With human-like robots developing rapidly, the next foreseeable step in this era of artificial intelligence is to give them real-life voices. That is exactly what Montreal-based start-up Lyrebird has done by unveiling the world’s first speech synthesis technology that can replicate anyone’s voice. Watch out, Siri and Alexa: you may soon be obsolete!
[Image Source: Pixabay]
How does it work?
Lyrebird, founded by three Ph.D. students from the University of Montreal, has developed a speech synthesis solution that can copy someone’s voice and render it with a given emotion.
It does this by analyzing only a short audio recording. From one minute of someone’s speech, Lyrebird can compress that person’s audio DNA into a unique key. The AI speech generator can then use that key to produce any speech in that person’s voice and vocal range. If you don’t want to fake the voice of someone you know or use your own, you can design a unique voice for use in your app. You can also choose from thousands of predefined voices.
Of course, the AI speech generator wouldn’t be that special if it were as monotonous as current digital voices. Lyrebird’s synthetic vocal generator can control the emotion of generated voices, giving them anger, sympathy, stress, and many more human expressions. But the real highlight of this new digital voice generator is its ability to produce 1,000 sentences in less than half a second, which works out to less than half a millisecond per sentence. This puts Lyrebird’s technology at the forefront of AI speech synthesis.
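To make the idea of a voice "key" more concrete, here is a toy sketch in Python. It is not Lyrebird's method, which the article does not describe. It only assumes, for illustration, that a recording can be turned into a list of per-frame feature vectors and that averaging them gives a fixed-length key. The names `voice_key`, `cosine_similarity`, and `fake_recording`, the two speaker profiles, and the simulated data are all hypothetical.

```python
import math
import random


def voice_key(frames):
    """Average per-frame feature vectors into one fixed-length key."""
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def fake_recording(speaker_profile, n_frames, rng):
    """Simulate feature frames: the speaker's profile plus per-frame noise.

    A real system would extract features from audio; this stand-in only
    produces made-up numbers so the example runs without audio files.
    """
    return [
        [value + rng.gauss(0, 0.3) for value in speaker_profile]
        for _ in range(n_frames)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical "true" voice characteristics of two speakers.
alice = [1.0, -0.5, 0.8, 0.2]
bob = [-0.7, 0.9, -0.3, 1.1]

# Two separate recordings of Alice and one of Bob, 600 frames each.
key_alice_1 = voice_key(fake_recording(alice, 600, rng))
key_alice_2 = voice_key(fake_recording(alice, 600, rng))
key_bob = voice_key(fake_recording(bob, 600, rng))

same = cosine_similarity(key_alice_1, key_alice_2)
different = cosine_similarity(key_alice_1, key_bob)
print("same speaker more similar:", same > different)
```

In this toy setup, two recordings of the same simulated speaker yield nearly identical keys, while keys from different speakers point in different directions. A production system would presumably learn its key with a neural network rather than a plain average.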
One of Lyrebird’s co-founders, Alexandre de Brébisson, explained why the newly developed technology does not need much data to generate like-for-like voices.
“Different voices share a lot of information. After having learned several speakers’ voices, learning a whole new speaker’s voice is much faster. That’s why we don’t need so much data to learn a completely new voice. More data will still definitely help, yet one minute is enough to capture a lot of the voice DNA.”
The API under development is intended to be robust enough to learn even from noisy recordings. The current version of the technology can reproduce various intonations of one person’s voice, as in the generated recordings of Donald Trump’s voice.
Many research projects inevitably face ethical criticism and are questioned about the intended use of their findings. Lyrebird’s speech synthesis raises several ethical issues, such as identity theft, because only a very short audio recording is needed to replicate someone’s voice. If and when the speech generator is released to the public, it will be difficult to control once millions of people have access to it. Moreover, voice recordings used as evidence in criminal cases could be undermined, since it could be argued that the audio file was forged or tampered with.
However, misuse is not the intended purpose of this speech synthesis. The founders of Lyrebird are looking to apply the technology in positive ways, for example as a personal aid that reads books aloud in famous voices. It is also aimed at people with disabilities, who can use the speech synthesis to help them speak. A famous example of this kind of aid is the computer voice that Stephen Hawking uses. The technology could also be used extensively by animated film and video game studios.
Lyrebird’s technology is still in development. However, interested individuals can sign up on the start-up’s website to become beta testers or to be informed of the launch.