In the late 1800s, Alexander Graham Bell, working with his father and inspired by Wheatstone's speaking machine, built a similar speaking machine of his own. Bell also carried out some remarkable experiments with his dog. He would hold the dog between his legs, get it to bark, and then reshape its vocal tract by hand to produce speech-like sounds.
Research on mechanical and semi-electrical replicas of the vocal system continued until the 1960s, but without notable success, even though it was carried out by well-known scientists. However, advances that would lay the foundations for speech synthesis technology were on the horizon…
The first fully electrical synthesis device was introduced by Stewart in 1922. The synthesizer had an excitation source and two resonant circuits that modeled the acoustic resonances of the vocal tract. The machine could generate single static vowel sounds with the two lowest formants, but it could not produce consonants or connected speech. Wagner built a similar synthesizer, consisting of four electrical resonators connected in parallel and excited by a buzz-like source. The outputs of the four resonators were mixed in the proper proportions to produce vowel spectra. In 1932, the Japanese scientists Obata and Teshima discovered the third formant in vowels. The first three formants are generally considered sufficient for intelligible synthetic speech.
The VODER (Voice Operating Demonstrator), introduced by Homer Dudley in 1939, was the first device to be considered a speech synthesizer. It was inspired by the VOCODER (Voice Coder), developed at Bell Laboratories in the mid-1930s. The original VOCODER analyzed speech into slowly varying acoustic parameters, which could then drive a synthesizer to reconstruct an approximation of the original speech signal. The technology that would lead to practical speech synthesis was beginning to take shape.
The VODER consisted of a wrist bar for selecting a voicing or noise source and a foot pedal for controlling the fundamental frequency. The source signal was routed through ten bandpass filters whose output levels were controlled with the fingers. It took considerable skill to play a sentence on the device. The speech quality and intelligibility were poor, but the potential for producing artificial speech was clearly demonstrated.
After the VODER was demonstrated, the scientific world became increasingly interested in speech synthesis. It had finally been shown that intelligible speech can be produced artificially. In fact, the basic structure of the VODER is very close to that of present-day systems built on the source-filter model of speech. Thus, today’s speech synthesis technology still reflects some of the earliest innovations.
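The source-filter arrangement described for the VODER can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a model of Dudley's actual circuitry: the sample rate, the band centre frequencies in `CENTERS`, the 200 Hz bandwidth, and all function names are hypothetical choices made for the example. A buzz or noise source is passed through ten bandpass filters, and each band is scaled by a level that stands in for one of the operator's finger keys.

```python
import math
import random

SAMPLE_RATE = 16000  # Hz; assumed for this sketch
# Ten band centres in Hz; hypothetical values, not the VODER's real bands.
CENTERS = [300.0 + 400.0 * k for k in range(10)]


def buzz(n_samples, f0):
    """Voiced source: an impulse train at fundamental frequency f0 (Hz)."""
    period = int(round(SAMPLE_RATE / f0))
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def noise(n_samples, seed=0):
    """Unvoiced source: uniform white noise (seeded so it is repeatable)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def bandpass(signal, center_hz, bandwidth_hz):
    """Two-pole resonator used here as a simple bandpass filter."""
    r = math.exp(-math.pi * bandwidth_hz / SAMPLE_RATE)  # pole radius
    theta = 2.0 * math.pi * center_hz / SAMPLE_RATE      # pole angle
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    gain = 1.0 - r  # rough level normalisation, not an exact one
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = gain * x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def voder_like(source, band_gains, centers):
    """Mix the bandpass outputs, each scaled by its 'key' level."""
    mixed = [0.0] * len(source)
    for gain, center in zip(band_gains, centers):
        if gain == 0.0:
            continue  # key not pressed: this band contributes nothing
        band = bandpass(source, center, bandwidth_hz=200.0)
        for i, value in enumerate(band):
            mixed[i] += gain * value
    return mixed


# Usage: a voiced sound at 100 Hz with only the two lowest bands open.
samples = voder_like(buzz(1600, 100.0), [1.0, 0.6] + [0.0] * 8, CENTERS)
```

Swapping `buzz(...)` for `noise(...)` corresponds to moving the wrist bar from the voicing source to the noise source, and changing `f0` plays the role of the foot pedal.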
In 1951, Franklin Cooper and his colleagues at the Haskins Laboratories developed the Pattern Playback synthesizer. It converted recorded spectrogram patterns back into sound, either in their original or in a modified form. The spectrogram patterns were recorded optically on a transparent belt.
PAT (Parametric Artificial Talker), introduced by Walter Lawrence in 1953, was the first formant synthesizer. PAT consisted of three electronic formant resonators connected in parallel. The excitation signal was either a buzz or noise. A moving glass slide converted painted patterns into six time functions that controlled the three formant frequencies, the fundamental frequency, the voicing amplitude, and the noise amplitude. Around the same time, Gunnar Fant introduced the first cascade formant synthesizer, OVE I (Orator Verbis Electris), whose formant resonators were connected in cascade. In 1962, Fant and Martony introduced the more advanced OVE II synthesizer, which had separate parts to model the transfer function of the vocal tract for vowels, nasals, and obstruent consonants. The available excitations included voicing, aspiration noise, and frication noise.
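The difference between the parallel arrangement used in PAT and the cascade arrangement used in OVE I can be illustrated with a small digital sketch. The original machines were analog, so this is only an analogy under stated assumptions: the two-pole resonator form, the sample rate, the formant values, and every function name below are hypothetical choices for the example rather than details of either synthesizer.

```python
import math

SAMPLE_RATE = 16000  # Hz; assumed for this sketch


def impulse_train(n_samples, f0):
    """Buzz-like excitation: one unit impulse per pitch period."""
    period = int(round(SAMPLE_RATE / f0))
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def resonator(signal, freq_hz, bandwidth_hz):
    """Two-pole digital resonator scaled to unity gain at 0 Hz."""
    r = math.exp(-math.pi * bandwidth_hz / SAMPLE_RATE)
    b = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / SAMPLE_RATE)
    c = -r * r
    a = 1.0 - b - c  # makes the gain at 0 Hz exactly one
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade(source, formants):
    """Cascade form: each resonator filters the previous one's output."""
    out = source
    for freq_hz, bandwidth_hz in formants:
        out = resonator(out, freq_hz, bandwidth_hz)
    return out


def parallel(source, formants, amplitudes):
    """Parallel form: every resonator sees the source; outputs are mixed."""
    mixed = [0.0] * len(source)
    for (freq_hz, bandwidth_hz), amp in zip(formants, amplitudes):
        for i, value in enumerate(resonator(source, freq_hz, bandwidth_hz)):
            mixed[i] += amp * value
    return mixed


# Illustrative (frequency, bandwidth) pairs in Hz for an open vowel.
FORMANTS = [(700.0, 60.0), (1200.0, 90.0), (2600.0, 150.0)]
excitation = impulse_train(1600, 100.0)
vowel_cascade = cascade(excitation, FORMANTS)
vowel_parallel = parallel(excitation, FORMANTS, [1.0, 0.5, 0.25])
```

In the cascade form the relative formant levels follow from the chain of resonators itself, whereas the parallel form needs an explicit amplitude for each formant, which is why `parallel` takes the extra `amplitudes` argument.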
Read part II to see where speech synthesis technology is formalized and becomes part of various applications.