Microsoft has created an AI system so good at describing images that it can do so even better than humans.
The new model is also apparently twice as good as the image captioning model the company has been using since 2015.
Why this is interesting
Seeing AI, Microsoft's app for blind and visually impaired people, already uses the new AI system. The app assists its users by narrating the world around them, which is one clear example of how useful and important the new system is.
Further down the line, the system will also be available in PowerPoint for the web, Windows, and Mac, where it is likely to make presentations more engaging.
"[Image captioning] is one of the hardest problems in AI," said Eric Boyd, CVP of Azure AI, in an interview with Engadget. "It represents not only understanding the objects in a scene, but how they’re interacting, and how to describe them."
With the new system, blind and visually impaired people can navigate the internet and the world around them much more easily.
What truly stands out about Microsoft's work is how quickly it has been made available to the outside world. Xuedong Huang, CTO of Azure AI Cognitive Services, played a big part in this, as he understood how important this tech could be for many people.
Huang's team trained the AI model on images paired with specific word tags, which gave the system a visual vocabulary, something that is generally hard to come by. As Huang said in a Microsoft blog post, "This visual vocabulary pre-training essentially is the education needed to train the system; we are trying to educate this motor memory."