2022-10-20

How the process works

In most cases, researchers train AI translation models by feeding them large amounts of written text, from which the models learn typical patterns of the language. For primarily oral languages such as Hokkien, however, there is not enough data to feed into the software, because the language has no widely used writing system.

The speech translation systems in use today rely on speech-to-text systems, but oral languages do not have transcribed texts to train them on. The company therefore created a new speech-to-speech translation system.

“We used speech-to-unit translation (S2UT) to translate input speech to a sequence of acoustic units directly in the path previously pioneered by Meta,” the company said in its blog post, referencing its past research on speech-to-speech translation initiatives. “Then, we generated waveforms from the units. In addition, UnitY was adopted for a two-pass decoding mechanism, where the first-pass decoder generates text in a related language (Mandarin) and the second-pass decoder creates units,” it stated.
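As a rough illustration of the data flow described in the quote, the sketch below chains the three stages: a first pass that produces text in a related language, a second pass that produces acoustic units, and a final step that turns units into a waveform. It is not Meta's code. Every name here (`translate_speech`, `TwoPassResult`, and the toy stand-in functions) is hypothetical, and the stand-ins return dummy values in place of trained models.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TwoPassResult:
    intermediate_text: str   # first-pass output (e.g. Mandarin text)
    units: List[int]         # second-pass output: discrete acoustic units
    waveform: List[float]    # audio samples generated from the units


def translate_speech(
    source_audio: List[float],
    first_pass: Callable[[List[float]], str],
    second_pass: Callable[[str], List[int]],
    vocoder: Callable[[List[int]], List[float]],
) -> TwoPassResult:
    """Run the three stages in order and keep each intermediate result."""
    text = first_pass(source_audio)   # speech -> text in a related language
    units = second_pass(text)         # text -> acoustic units (assumed interface)
    waveform = vocoder(units)         # units -> waveform
    return TwoPassResult(text, units, waveform)


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_first_pass(audio: List[float]) -> str:
    return "你好"  # pretend the decoder recognised a two-character phrase


def toy_second_pass(text: str) -> List[int]:
    return [ord(ch) % 100 for ch in text]  # one fake unit per character


def toy_vocoder(units: List[int]) -> List[float]:
    # Emit two fake audio samples per unit.
    return [u / 100.0 for u in units for _ in range(2)]


result = translate_speech([0.0, 0.1, -0.1], toy_first_pass, toy_second_pass, toy_vocoder)
print(result.intermediate_text, len(result.units), len(result.waveform))
```

In a real system each stand-in would be a trained neural model, and the second pass would likely draw on richer information than the plain text string used here.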

The new speech translator project

Meta decided to develop a new speech-to-speech translation (S2ST) system that extends beyond written languages to include primarily spoken ones. The main goal of the project is to build language tools that can be applied to most, if not all, of the world’s languages, both spoken and written. The system is part of Meta’s Universal Speech Translator (UST) project, which is developing new approaches to translating speech from one language to another, regardless of whether the languages are written, spoken, or both. Although the work is in its preliminary phases, the company hopes it is a step toward its goal of translating all oral and written languages.