Meta used the Bible to train AI models to learn over 1,000 languages

Some words, however, may have been mistranscribed.
Loukia Papadopoulos
An illustration of an AI language model.


Meta’s new AI models were trained on recordings of the Bible to recognize and produce speech in more than 1,000 languages. The company now hopes these models will help preserve languages that are at risk of disappearing.

This is according to a report by MIT Technology Review published on Monday.

There are currently around 7,000 languages in the world.

The firm is releasing its new language models to the public via the code hosting service GitHub so that developers working in different languages can build new, more varied speech applications.

The new models were trained on two data sets: one that contains audio recordings of the New Testament in 1,107 languages and another containing unlabeled New Testament audio recordings in 3,809 languages.

“We can use what that model learned to then quickly build speech systems with very, very little data,” said Michael Auli, a research scientist at Meta who worked on the project.

“For English, we have lots and lots of good data sets, and we have that for a few more languages, but we just don’t have that for languages that are spoken by, say, 1,000 people.” 

The researchers now claim their models can recognize and produce speech in over 1,000 languages and identify more than 4,000.

In addition, compared to models from rival companies, including OpenAI’s Whisper, Meta’s models had half the error rate, despite covering 11 times more languages.

All is not rosy, however. The researchers acknowledge that their new models may mistranscribe certain words or phrases and that their speech recognition models yielded more biased words than other models, albeit only 0.7% more.

Chris Emezue, a researcher at Masakhane, an organization working on natural-language processing for African languages, who was not involved in the project, told MIT Technology Review that the use of religious text to train the models may be problematic.

“The Bible has a lot of bias and misrepresentations,” he explained. Is this development a step forward for language models or is it too controversial to be impactful?
