2022-09-22

Many variations on a theme

Capable speech recognition systems are nothing new; they sit at the heart of software and services from technology giants including Google, Meta, and Amazon. What makes Whisper different is that it was trained on 680,000 hours of multilingual and multitask data collected from the web.

According to OpenAI, this large and varied training set improved recognition of unique accents, background noise, and technical jargon.

“The primary intended users of Whisper models are AI researchers, studying robustness, capabilities, biases, generalization, and constraints of the current model. However, Whisper is also potentially useful as an automatic speech recognition solution for developers, especially for English speech recognition,” OpenAI said in the GitHub repo (program notes) for Whisper. Anyone can download Whisper from GitHub; it is entirely free to use.

The models are showing strength

Also in the repo, OpenAI wrote: “The models show strong ASR results in about 10 languages. They may exhibit additional capabilities, if fine-tuned on certain tasks, like voice activity detection, speaker diarization, and speaker classification, but have not been robustly evaluated in these areas.”