Artificial intelligence is no longer confined to the imagination of science fiction. If you were worried about some of the latest AI developments in 2017, 2018 does not look any better. One of the creepier developments has to be the latest progress with Google's voice-generating AI.
Now if you have used any Google product, or even just the Google Translate service, you are familiar with Google's AI voice. Available in both a male and a female version, the robotic voice is a staple of our culture, just like Apple's Siri or Microsoft's Cortana.
As the years have gone by, the Google voice has started to sound less robotic and more human. At this point, the new Tacotron 2 Google voice AI is almost indistinguishable from a human speaker.
Google's Voice-Generating AI
In a recently published research paper, a team at Google details an impressive speech system called Tacotron 2. The paper highlights the system's ability to speak almost identically to its human creators. The team describes the second-generation speech system as "a neural network architecture for speech synthesis directly from text. The system is composed of a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms, followed by a modified WaveNet model acting as a vocoder to synthesize time-domain waveforms from those spectrograms."
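To make the term "mel-scale" in that description a little more concrete, here is a minimal Python sketch of how frequencies in Hz map onto the mel scale, which spaces frequency bands closer together at low pitches, roughly the way human hearing does. It uses one widely used conversion formula; the function names, the band count, and the frequency range are illustrative assumptions, not values or code taken from Google's paper.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (a common HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_bands, f_min, f_max):
    """Center frequencies (Hz) of n_bands bands spaced evenly in mels.

    Hypothetical helper for illustration only.
    """
    lo = hz_to_mel(f_min)
    hi = hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_bands)]


if __name__ == "__main__":
    # Assumed values: 8 bands between 0 Hz and 8000 Hz.
    for center in mel_band_centers(n_bands=8, f_min=0.0, f_max=8000.0):
        print(round(center, 1))
```

Running it prints band centers that sit close together at low frequencies and spread out at high ones. A mel-scale spectrogram summarizes the energy of each short slice of audio in bands laid out this way, which is the intermediate representation the first network hands to the vocoder.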
As stated in the report, the technology comprises two deep neural networks. The first network translates the text into a spectrogram, which is then fed into WaveNet, a system created by DeepMind, to produce the audio. What do you get when you combine these systems? A voice that sounds like its human counterparts. Listen to the voice recordings presented below. One of the recordings is Tacotron 2, while the other is a paid actress. Can you tell the difference?
In these recordings, the voice says “That girl did a video about Star Wars lipstick.”
Or how about this one? “She earned a doctorate in sociology at Columbia University.”
If you want to hear the power of Tacotron 2, listen to it attempt these tongue twisters.
“Peter Piper picked a peck of pickled peppers. How many pickled peppers did Peter Piper pick?”
“She sells sea-shells on the sea-shore. The shells she sells are sea-shells I'm sure.”
The AI also does a fantastic job of parsing context and understanding where stress is supposed to lie. Listen to the perfect inflection it uses in the statement "He thought it was time to present the present."
It can also handle heteronyms, words that are spelled the same but pronounced differently, such as the past tense "read" and the infinitive "to read." Even some (human) native English speakers can struggle with those while reading aloud!
Though the system does occasionally struggle with the pronunciation of multi-syllable words, Tacotron 2 delivers some impressive vocal acoustics. Once the system is finalized for production, Tacotron 2 is sure to be a powerful voice across Google's ecosystem.