Google marks major milestone towards its 1,000-language AI model

Its base model performs better than state-of-the-art models available today.
Ameya Paleja
Google is looking to unveil a slew of AI-based products


Search giant Google has completed the 'critical first step' toward building an artificial intelligence (AI) model that will support the world's one thousand most-spoken languages. In a blog post, the company released details about its Universal Speech Model (USM).

Google's announcement is part of the build-up to its annual I/O event, where it plans to unveil a slew of products powered by AI. Scheduled for May this year, the event could see Google show off more than 20 products featuring AI capabilities, a much-needed boost as the company seems to be losing ground to Microsoft's aggressive pitch for OpenAI's GPT-powered products.

What is the Universal Speech Model?

In November 2022, Google unveiled its 1,000 Languages Initiative, an effort to build a machine learning model that aims to bring inclusivity to billions of people around the globe by making it easier to access the one thousand most-spoken languages.

According to the blog post, the Universal Speech Model (USM) is a family of speech models with two billion parameters, trained on 12 million hours of speech and 28 billion sentences of text. Currently, the model covers a little over 300 languages and is already in use in Google's products, such as YouTube.

If you have used Automatic Speech Recognition (ASR) while watching YouTube videos in a language you are not familiar with, it is the USM that makes the content easier to understand. Google researchers Yu Zhang and James Qin further elaborated on how the machine-learning model was trained.

The researchers state that the fundamental difficulty in training a model such as USM is access to enough data. In a conventional supervised learning approach, audio data must be manually labeled or collected from pre-existing transcriptions. Depending on the language and how well it is represented, such data can be too expensive, too time-consuming, or simply too hard to find.

USM's overall training pipeline

Google instead used a self-supervised learning approach that leveraged audio-only data, which is available in large quantities across languages, making it easier to scale. After self-supervised learning on audio, Google put the model through a second step in which its quality and language coverage were improved using text data, and then fine-tuned it on downstream tasks such as ASR.
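The pretrain-then-fine-tune idea can be illustrated with a deliberately simplified toy in Python: a "representation" is first learned from abundant unlabeled data, after which only a handful of labeled examples are needed for the supervised step. This is a conceptual sketch only; the synthetic features, normalization encoder, and nearest-centroid classifier are all stand-ins invented for illustration, not Google's actual USM pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for "audio features": two classes with different means.
def make_features(label, n):
    center = np.array([2.0, 0.0]) if label == 0 else np.array([-2.0, 0.0])
    return center + rng.normal(scale=1.0, size=(n, 2))

# Stage 1 (self-supervised): learn statistics from abundant *unlabeled*
# data -- no transcripts or labels are needed at this stage.
unlabeled = np.vstack([make_features(0, 500), make_features(1, 500)])
mean, std = unlabeled.mean(axis=0), unlabeled.std(axis=0)

def encode(x):
    return (x - mean) / std  # representation learned without labels

# Stage 2 (supervised fine-tuning): a tiny labeled set suffices because
# the representation is already informative.
train_x = np.vstack([make_features(0, 5), make_features(1, 5)])
train_y = np.array([0] * 5 + [1] * 5)
centroids = np.array(
    [encode(train_x[train_y == c]).mean(axis=0) for c in (0, 1)]
)

def predict(x):
    d = np.linalg.norm(encode(x)[None, :] - centroids, axis=1)
    return int(d.argmin())

test_x = np.vstack([make_features(0, 50), make_features(1, 50)])
test_y = np.array([0] * 50 + [1] * 50)
acc = float(np.mean([predict(x) == y for x, y in zip(test_x, test_y)]))
print(f"toy fine-tuned accuracy: {acc:.2f}")
```

The point of the sketch is the division of labor: the expensive-to-label data is only needed in the small second stage, which is what makes the approach scale to hundreds of languages.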

Using this approach, Google found that the model's word error rate (WER) was less than 30 percent across 73 languages, a milestone for the company. For commonly used languages like U.S. English, the WER was six percent lower in relative terms than that of the state-of-the-art internal model the company used. Comparisons on other publicly available datasets also showed that USM performs better at ASR and speech translation tasks.
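Word error rate is a standard ASR metric: the word-level edit distance (insertions, deletions, and substitutions) between the reference transcript and the model's output, divided by the number of reference words. A minimal implementation, for illustration only:

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by the
    number of words in the reference transcript."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref)

print(wer("the quick brown fox", "the quick brown fox"))   # 0.0
print(wer("the quick brown fox", "the quick browne fox"))  # 0.25
```

Note that a "six percent relative" improvement means the new error rate is the old one multiplied by 0.94 (e.g., a WER of 10 percent dropping to 9.4 percent), not a six-point drop.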

The company is now looking to use USM's base model architecture and training pipeline to build its 1,000-language model. Mark Zuckerberg's Meta, which has been betting on building the metaverse, also released its ChatGPT-like large language model last month.
