Text-To-Speech Conversion by Microsoft AI Is Incredibly Realistic

Taking its lead from the human brain, this text-to-speech AI shows how fast the technology is developing.

Microsoft and Chinese researchers may have discovered an efficient way to convert text to speech. Text-to-speech systems have been advancing in clever and inventive ways, but the training time and resources needed to produce natural-sounding output have been holding them back.


What the Microsoft and Chinese researchers have done is create a text-to-speech artificial intelligence (AI) system that uses 200 voice samples to generate realistic-sounding speech that matches transcriptions. That amounts to approximately 20 minutes' worth of audio.

How is it linked to the brain?

The system is built partly on Transformers, a type of deep neural network loosely modeled on the neurons in the brain. Much as synapses weight the signals passing between neurons, a Transformer weighs every piece of input and output as it processes them. This helps it work through long and complicated sequences, such as a complex sentence, in a well-organized way.
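
As a rough illustration of that weighing step, here is a minimal sketch of the attention calculation that Transformers in general are built on. It is not the researchers' code: the function names and the toy vectors are assumptions chosen for this example.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over plain Python lists.

    Each query is compared with every key; the resulting weights
    decide how much of each value flows into the output.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

# Toy example (hypothetical data): one query that resembles the first key.
result = attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[1.0, 0.0], [0.0, 1.0]],
)
print(result)
```

In this toy case the query lines up with the first key, so the first value receives the larger weight in the output.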


Even when working with relatively little data, as in this case, the AI manages quite well, helped by a noise-removing encoder added to the mix.

Although the output sounds slightly robotic, the word intelligibility of the recordings comes in at 99.84 percent. On top of that, the approach may make text-to-speech more accessible, since it would not take much more work to create realistic-sounding voices.

The researchers are continuing to improve the system and are hopeful that, in the future, it will take even less work to generate lifelike speech.
