Meta's new AI model can translate nearly 100 languages
There are over 7,000 languages spoken in the world today, and many people know at least two: typically a mother tongue, plus a second language learned in school.
Language is one of the biggest barriers to understanding other people, cultures, and communities. And while we would all love to have the skills of a polyglot, we can't possibly learn all 7,000 languages the world has to offer. So, we turn to technology.
Trained on 270,000 hours of speech and text
Meta has introduced a multilingual model for text and speech translation and transcription. Called SeamlessM4T, it can broadly perform five tasks: speech-to-text, speech-to-speech, text-to-speech, and text-to-text translation, along with speech recognition.

While it performs speech recognition and translation for only about 100 input languages and 35 output languages, it is a step closer to bringing communities together. For example, give it a speech input of 'Good morning' in English, select French, and the model will output 'Bonjour'.
“The world we live in has never been more interconnected, giving people access to more multilingual content than ever before. This also makes the ability to communicate and understand information in any language increasingly important,” said Meta in a statement.
Open-source model
SeamlessM4T could be beneficial for someone who wants to learn a new language, or who is in a country whose language they don't speak.
Staying true to its open-source approach, Meta has uploaded the collection of SeamlessM4T models to Hugging Face, a platform that allows developers and companies to upload their machine-learning models. The model comes in two checkpoint sizes, SeamlessM4T-Medium and SeamlessM4T-Large, allowing developers and researchers to build on this work.
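As an illustration of how a developer might exercise one of the model's tasks, here is a minimal sketch of text-to-text translation via the Hugging Face transformers library. The class names and the `facebook/hf-seamless-m4t-medium` checkpoint ID are assumptions based on the transformers SeamlessM4T integration and may differ by library version.

```python
def translate_text(text, src_lang="eng", tgt_lang="fra",
                   checkpoint="facebook/hf-seamless-m4t-medium"):
    """Translate text with a SeamlessM4T checkpoint from Hugging Face.

    NOTE: checkpoint name and API are assumptions based on the
    transformers SeamlessM4T integration; verify against your version.
    """
    # Imports are deferred so this sketch only needs transformers
    # (and a model download) when the function is actually called.
    from transformers import AutoProcessor, SeamlessM4TModel

    processor = AutoProcessor.from_pretrained(checkpoint)
    model = SeamlessM4TModel.from_pretrained(checkpoint)

    # Tokenize the source text, then generate text (not speech)
    # in the target language.
    inputs = processor(text=text, src_lang=src_lang, return_tensors="pt")
    output_tokens = model.generate(**inputs, tgt_lang=tgt_lang,
                                   generate_speech=False)
    return processor.decode(output_tokens[0].tolist()[0],
                            skip_special_tokens=True)

# Example call (downloads the checkpoint weights on first run):
# translate_text("Good morning", src_lang="eng", tgt_lang="fra")
```

The same `generate` call can instead produce audio by omitting `generate_speech=False`, which is how the speech-output tasks are exposed in this integration.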
Meta has also released the dataset on which SeamlessM4T was trained. It’s called SeamlessAlign, and as per Meta, it is the “biggest open multimodal translation dataset to date, totaling 270,000 hours of mined speech and text alignments.”
Built on previous similar models
Meta’s latest model builds on its previous models, such as No Language Left Behind (NLLB), a text-to-text translation model that supports 200 languages, and Universal Speech Translator, the first direct speech-to-speech translation system for Hokkien, a primarily oral language spoken within the Chinese diaspora.
Meta also released the Massively Multilingual Speech model, which can identify over 4,000 spoken languages and provides speech recognition, language identification, and speech synthesis technology across more than 1,100 languages.
Inching closer to a universal language translator
Google, a pioneer in the field, is the go-to for most people wanting to translate an article or convert speech from one language to another. The tech firm is now building its Universal Speech Model (USM) to support languages spoken by a limited number of people.
The AI-powered model would support 1,000 languages, with 2 billion parameters trained on 12 million hours of speech and 28 billion sentences of text. It would also enhance the automatic speech recognition software YouTube uses to create subtitles on the fly.
Since SeamlessM4T covers only a fraction of the world's languages, the model can be considered a stepping stone toward a universal language translator. OpenAI's ChatGPT can converse in 95 languages; Google's Bard speaks 40. As fast as the pace of technology is today, especially in artificial intelligence and generative AI, we still have a long way to go before a tool can effortlessly translate between all languages.