Meta develops a new way for people to connect through language using AI

The artificial intelligence speech translation system can translate Hokkien, a primarily spoken language.
Brittney Grimes
Meta developing new language AI.


Meta has created a new speech translator that can translate Hokkien, a predominantly oral language widely spoken in the Chinese diaspora and one of the national languages of Taiwan.

The spoken language

Because Hokkien is a primarily oral language with millions of speakers, this is a significant breakthrough.

Millions of Hokkien speakers around the world.

Artificial intelligence (AI) translation has mainly focused on written languages. However, nearly half of the world’s approximately 7,000 living languages are primarily spoken. That leaves roughly 3,500 languages unaccounted for by most AI translation systems. Meta says its AI-powered speech translator is the first of its kind: the first AI translator built for a primarily oral language, in this case Hokkien.

How the process works

In most cases, to train AI translators, researchers feed large amounts of written text into the system, and the AI learns typical language patterns from it. For primarily oral languages such as Hokkien, however, there is not enough written data to train on, because the language has no widely used writing system.

Speech translation systems in use today rely on speech-to-text transcription. However, oral languages do not have transcribed texts. The company therefore created a new speech-to-speech translation approach.

“We used speech-to-unit translation (S2UT) to translate input speech to a sequence of acoustic units directly in the path previously pioneered by Meta,” the company said in its blog post, referencing its past research on speech-to-speech translation initiatives. “Then, we generated waveforms from the units. In addition, UnitY was adopted for a two-pass decoding mechanism, where the first-pass decoder generates text in a related language (Mandarin) and the second-pass decoder creates units.”

Representation of speech-to-unit translation.
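
To make the flow described in Meta's quote easier to follow, here is a minimal Python sketch of a two-pass pipeline: speech goes in, a first pass produces text in a related language, a second pass produces discrete acoustic units, and a vocoder turns the units into a waveform. This is not Meta's code. The class name `TwoPassSpeechTranslator` and the three `toy_*` functions are hypothetical stand-ins invented for illustration; in a real system each stage would be a trained neural model.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

Waveform = List[float]  # raw audio samples
Units = List[int]       # discrete acoustic unit IDs


@dataclass
class TwoPassSpeechTranslator:
    """Hypothetical wrapper: speech -> intermediate text -> units -> speech."""
    first_pass: Callable[[Waveform], List[str]]
    second_pass: Callable[[Waveform, List[str]], Units]
    vocoder: Callable[[Units], Waveform]

    def translate(self, speech: Waveform) -> Waveform:
        text = self.first_pass(speech)          # pass 1: text in a related language
        units = self.second_pass(speech, text)  # pass 2: discrete acoustic units
        return self.vocoder(units)              # units -> output waveform


# Toy stand-ins so the sketch runs; they do no real translation.
def toy_first_pass(speech: Waveform) -> List[str]:
    # Pretend every non-empty input decodes to the same two Mandarin characters.
    return ["你", "好"] if speech else []


def toy_second_pass(speech: Waveform, text: List[str]) -> Units:
    # Map each character to an arbitrary unit ID in the range [0, 100).
    return [ord(ch) % 100 for ch in text]


def toy_vocoder(units: Units, samples_per_unit: int = 4) -> Waveform:
    # Emit a few sine samples per unit; the frequency depends on the unit ID.
    return [
        math.sin(2 * math.pi * (unit + 1) * n / (samples_per_unit * 100))
        for unit in units
        for n in range(samples_per_unit)
    ]


translator = TwoPassSpeechTranslator(toy_first_pass, toy_second_pass, toy_vocoder)
output = translator.translate([0.0, 0.1, -0.1])
print(len(output))  # number of output samples produced by the toy vocoder
```

The point of the sketch is only the ordering of the stages: no written Hokkien is needed at any step, because the output side works with acoustic units rather than text.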

The new speech translator project

Meta developed the new speech-to-speech translation (S2ST) system so that translation tools can cover spoken languages as well as written ones. The main goal of the project is to build language tools that can be applied to most, if not all, of the world’s languages, both spoken and written. The system is part of Meta’s Universal Speech Translator (UST) project, which is developing new approaches for translating speech from one language to another, whether the languages are written, spoken, or both. Although the work is in its preliminary phases, the company hopes it is a step toward its goal of translating all oral and written languages.

“It will take much more work to provide everyone around the world with truly universal translation tools. But we believe the efforts described here are an important step forward,” the company said.

The broader effort also includes Meta’s No Language Left Behind initiative. For this part of the project, the company is building an advanced AI model that can learn languages with few training examples, so that lesser-known languages can also be translated.

Challenges of translating all languages

Meta also described the difficulties of translating every language. It named three challenges it will face as it works toward a universal AI language translator. The first is data scarcity. Meta explained that a lack of training data is difficult enough for written languages and is an even bigger obstacle for the oral languages its speech-to-speech translation work targets.

The second challenge is that most translation systems are bilingual, built for a single language pair such as English to Spanish. To cover spoken languages, the company said it would have to scale to dozens of language pairs that include both spoken and written languages.
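
To see why pairwise systems scale poorly, it helps to count the pairs. With n languages there are n × (n − 1) ordered source-to-target pairs, so the number of bilingual systems grows much faster than the number of languages. The short Python sketch below illustrates that arithmetic. The language list and the `directed_pairs` helper are illustrative assumptions, not something taken from Meta's post.

```python
from itertools import permutations


def directed_pairs(languages):
    """Return every ordered (source, target) pair of distinct languages."""
    return list(permutations(languages, 2))


# Illustrative list only; any set of languages works the same way.
languages = ["English", "Spanish", "Hokkien", "Mandarin"]
pairs = directed_pairs(languages)

print(len(pairs))                          # 4 * 3 = 12 ordered pairs
print(len(directed_pairs(range(10))))      # 10 * 9 = 90 ordered pairs
```

Going from 4 to 10 languages raises the count from 12 to 90 pairs, which is why a system limited to one pair at a time becomes impractical as more languages are added.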

The third challenge is translating speech to speech in real time, since word order can differ between the source and target languages.

Meta AI researchers also said that creating this system involved challenges shared with “traditional machine translation systems, including data gathering, model design, and evaluation.”

The future of translating spoken languages

Currently, the AI translator works only between Hokkien and English, but the company sees broad potential for applying the technology to other languages. The company hopes to translate other primarily spoken languages in the future. “We’re open-sourcing not just our Hokkien translation models but also the evaluation datasets and research papers, so that others can reproduce and build on our work,” Meta stated.

The company wants to use AI research to break down language barriers and build a world united through language in the future.
