Robot voices are all around us. Siri, Alexa, and our GPS systems have become staples of the modern technological ecosystem. These digital voice systems use computer algorithms to generate human speech on the fly.
Even with how good these computer voices have gotten in recent times, there's still an uncanny valley effect when we hear them. Even the best of them leave us feeling that something is just a little off. It goes to show that replicating the human voice isn't easy, even with all the tech we have today. Yet roughly 250 years ago, inventors were already building the world's first artificial voices.
Christian Kratzenstein, a German-born professor working in Copenhagen, built a machine consisting of a number of reeds and resonating tubes that imitated the sounds of the human vocal tract. Years later, in 1791, an inventor named Wolfgang von Kempelen iterated on the original design to create an even better voice machine.
The machine had bellows for airflow, reeds to simulate vocal cords, a tube for the larynx and vocal path, and even two nostrils, a tongue, and lips. With all of this put together, Kempelen was able to manipulate the shape of the tube, lips, and tongue to create consonants and vowels.
Devices like this, the first to mimic the way humans speak, continued to be iterated on for more than a century until, in the 1930s, Homer Dudley of Bell Labs created the best early speaking machine. Named the VODER, for Voice Operating Demonstrator, the machine was much more complicated than earlier designs. It replaced the bellows and reeds of earlier machines with a set of keys and controls that allowed an operator to play it like a piano.
In 1939, the VODER was unveiled at the New York World's Fair to the amazement of the audience. The New York Times described the voice as akin to "an alien speaking underwater..."
The voice shaped what people came to expect from machine voices, and its sound was cemented in history through a wealth of science fiction media.
The machine was controlled entirely by an operator, and it could create two basic sounds: a buzz and a hiss. The operator used the buzzing sound for vowels and nasal sounds, while the hissing sound was used for consonants.
These initial sounds, which the operator selected using a wrist bar, were then passed through several filters chosen with the keys on the operator's keyboard. Sounds for letters like P, D, J, or even CH were created using additional filters, as they didn't fit into the buzzing or hissing categories.
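The buzz-or-hiss source followed by key-selected filters can be sketched in a few lines of Python. This is only an illustrative digital approximation, not a model of the VODER's actual circuitry: the sample rate, the filter design, the frequencies, and every function name (`buzz`, `hiss`, `resonator`, `voder_frame`) are assumptions made for the example.

```python
import math
import random

SAMPLE_RATE = 8000  # samples per second (arbitrary choice for this sketch)

def buzz(n_samples, pitch_hz=100.0):
    """Periodic pulse train: a stand-in for the 'buzzing' source."""
    period = int(SAMPLE_RATE / pitch_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]

def hiss(n_samples, seed=0):
    """White noise: a stand-in for the 'hissing' source."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

def resonator(signal, center_hz, bandwidth_hz=100.0):
    """Two-pole resonant filter that emphasizes frequencies near center_hz."""
    r = math.exp(-math.pi * bandwidth_hz / SAMPLE_RATE)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * center_hz / SAMPLE_RATE)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0  # the two previous output samples
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def voder_frame(source, key_gains, n_samples=800):
    """Pass the chosen source through one resonator per pressed 'key'.

    key_gains maps a filter center frequency (Hz) to a gain in [0, 1],
    loosely playing the role of how far each key is pressed.
    """
    raw = buzz(n_samples) if source == "buzz" else hiss(n_samples)
    mixed = [0.0] * n_samples
    for center_hz, gain in key_gains.items():
        filtered = resonator(raw, center_hz)
        for i, value in enumerate(filtered):
            mixed[i] += gain * value
    return mixed

# Vowel-like sound: buzz source with two resonances (illustrative values).
vowel = voder_frame("buzz", {700.0: 1.0, 1200.0: 0.5})
# 's'-like sound: hiss source with one high resonance (illustrative value).
sibilant = voder_frame("hiss", {3500.0: 1.0})
```

Switching the `source` argument plays the role of the wrist bar, and the `key_gains` dictionary plays the role of the keyboard. A human operator had to change both continuously and in time, which is part of why the machine was so hard to play.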
The operator was even able to combine words into sentences by finely manipulating the keys. The original operator was Helen Harper. Harper said this about operating the machine:
"In producing the word 'concentration' on the VODER, I have to form thirteen different sounds in succession and make five up and down movements of the wrist bar and vary the position of the foot pedal from three to five times according to what expression I want the VODER to give the word. And of course, all this must be done with exactly the correct timing."
It reportedly took Harper a full year of practice to learn to operate the machine with high precision. Three hundred women went through training to become operators, but by the end, only 30 were able to master it.
Skilled operators were so good at manipulating the machine that they could make it speak any language and even make animal sounds. Simply put, the VODER was an instrument that allowed operators to mimic human speech.
Over the years, these mechanical machines gave way to electrical ones, and we now have devices that sound almost exactly like humans.