Several companies, such as SignAll and Kintrans, have created sign language translation systems, but sophisticated as they are, these have yet to go mainstream.
The ultimate mission for these companies is to let the millions of people who use sign language communicate easily with anyone.
Now, a new hand-tracking algorithm from Google's AI labs might be a big step toward making this ambitious type of software everything it originally promised.
Using nothing but a smartphone and its camera, the new system from Google's AI labs creates a highly detailed map of a person's hand that it can then track for communication.
“Whereas current state-of-the-art approaches rely primarily on powerful desktop environments for inference, our method achieves real-time performance on a mobile phone, and even scales to multiple hands,” Google researchers Valentin Bazarevsky and Fan Zhang said in a blog post.
[Image: 3D hand perception in real time on a mobile phone via MediaPipe. The solution uses machine learning to compute 21 3D keypoints of a hand from a video frame; depth is indicated in grayscale. Source: Google AI blog]
“Robust real-time hand perception is a decidedly challenging computer vision task, as hands often occlude themselves or each other (e.g. finger/palm occlusions and hand shakes) and lack high contrast patterns.”
As TechCrunch reports, companies like SignAll have turned to depth-sensing camera rigs to track hand movements. Even so, following hands whose fingers obscure each other and move quickly is a difficult task.
Faster calculations
One of the ways the researchers made their algorithm faster was by simplifying the process as much as they could: less data to analyze means less processing time.
First, the system trains on a person's palm rather than taking in the dimensions of the whole hand. Then, a separate algorithm looks at the fingers as well as the palm and assigns 21 coordinates to knuckles, fingertips, and other landmarks.
For the AI to learn these coordinates, the researchers had to manually annotate those 21 points on some 30,000 images of hands in various poses and lighting conditions.
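For readers curious what those 21 coordinates look like in practice, here is a minimal Python sketch of the hand landmark layout MediaPipe publishes (the wrist plus four joints per finger), together with a toy helper that counts extended fingers from normalized (x, y) positions. The index order follows MediaPipe's documented hand topology, but the `landmarks` input and the "finger extended" heuristic are illustrative assumptions, not part of Google's released code.

```python
# Illustrative sketch of the 21-keypoint hand layout used by MediaPipe Hands:
# index 0 is the wrist, followed by four joints per finger (base -> tip).
HAND_LANDMARKS = [
    "WRIST",
    "THUMB_CMC", "THUMB_MCP", "THUMB_IP", "THUMB_TIP",
    "INDEX_MCP", "INDEX_PIP", "INDEX_DIP", "INDEX_TIP",
    "MIDDLE_MCP", "MIDDLE_PIP", "MIDDLE_DIP", "MIDDLE_TIP",
    "RING_MCP", "RING_PIP", "RING_DIP", "RING_TIP",
    "PINKY_MCP", "PINKY_PIP", "PINKY_DIP", "PINKY_TIP",
]

# Fingertip / middle-joint index pairs for the four non-thumb fingers.
FINGERTIPS = {"INDEX": (8, 6), "MIDDLE": (12, 10), "RING": (16, 14), "PINKY": (20, 18)}


def count_extended_fingers(landmarks):
    """Toy heuristic: a finger counts as extended when its tip sits above
    its middle joint in image coordinates (y grows downward).

    `landmarks` is assumed to be a list of 21 (x, y) tuples normalized to
    [0, 1], ordered as in HAND_LANDMARKS.
    """
    extended = 0
    for name, (tip, pip) in FINGERTIPS.items():
        if landmarks[tip][1] < landmarks[pip][1]:
            extended += 1
    return extended
```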
The developers have open-sourced their code in the hope that others will find innovative ways to use and improve on it. The system is built on Google's existing MediaPipe framework for cross-platform machine-learning pipelines.
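The open-sourced MediaPipe project now also exposes this hand tracker through a Python solution API. As a rough illustration of what building on it looks like, the sketch below runs the tracker on webcam frames and draws the detected landmarks; it assumes the `mediapipe` and `opencv-python` packages are installed, and the parameter choices are illustrative rather than anything quoted in this article.

```python
# Minimal sketch: hand tracking on webcam frames with MediaPipe Hands
# (assumes `pip install mediapipe opencv-python`).
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=2, min_detection_confidence=0.5)
drawer = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)  # default camera
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB input; OpenCV delivers BGR frames.
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        for hand in results.multi_hand_landmarks:
            # Each detected hand carries 21 (x, y, z) landmarks, normalized to the frame.
            drawer.draw_landmarks(frame, hand, mp.solutions.hands.HAND_CONNECTIONS)
    cv2.imshow("MediaPipe Hands", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
hands.close()
```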
“We hope that providing this hand perception functionality to the wider research and development community will result in an emergence of creative use cases, stimulating new applications and new research avenues,” they write in the blog post.
There is likely still a long way to go before truly effective sign language recognition arrives; communication through sign language relies on hand gestures, facial expressions, and other cues. Nevertheless, this is an exciting step in the right direction.