New invention engineered to help the hearing-impaired could also help military

The innovation tracks the words that people utter using strain sensors.
Loukia Papadopoulos
Sensors track the mouth's movements.


A new silent speech recognition system developed by South Korean researchers can accurately recognize words by tracking facial movements, according to a report published Friday by Euronews.

The new invention was engineered to help hearing-impaired people, who cannot always communicate with others using sign language. Other applications include military uses for when radio communication is complicated by surrounding noise.

The technology uses strain sensors to detect the skin’s expansion and contraction as a person mouths words and further applies a deep learning algorithm to convert these facial movements into words.

“The strain sensor attached to the face stretches and shrinks according to the skin’s stretchiness when a person speaks. And the electric properties of the strain sensors change accordingly,” Taemin Kim, of Yonsei University School of Electrical and Electronic Engineering, told Euronews.
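The relationship Kim describes is commonly modeled with the standard strain-gauge equation, in which the relative change in resistance equals the gauge factor multiplied by the strain. The short Python sketch below illustrates that textbook relation only; the resting resistance, the gauge factor, and the readings are hypothetical values chosen for illustration and are not taken from the study.

```python
def strain_from_resistance(r_measured, r_rest, gauge_factor):
    """Estimate strain from a gauge reading using dR / R = GF * strain."""
    if r_rest <= 0:
        raise ValueError("resting resistance must be positive")
    if gauge_factor == 0:
        raise ValueError("gauge factor must be non-zero")
    return (r_measured - r_rest) / (r_rest * gauge_factor)


# Hypothetical values, assumed for illustration only.
R_REST_OHMS = 1000.0   # resistance with the skin at rest
GAUGE_FACTOR = 100.0   # assumed sensitivity of the sensor

# Hypothetical readings taken while a word is mouthed.
readings_ohms = [1000.0, 1005.0, 1010.0, 995.0]

strains = [
    strain_from_resistance(r, R_REST_OHMS, GAUGE_FACTOR)
    for r in readings_ohms
]

for r, s in zip(readings_ohms, strains):
    # Positive strain means the skin stretched; negative means it contracted.
    print(f"{r:7.1f} ohm -> strain {s:+.6f}")
```

A sequence of such strain values over time, one per sensor, is the kind of signal a recognition model would then take as input.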

So far, the system can recognize a set of 100 words with nearly 88 percent accuracy. The sensors are also resistant to sweat and sebum and are significantly smaller than previous similar devices.

A better understanding

The sensors’ small size means that more of them can be applied to a person’s face, resulting in a better understanding of the words pronounced.

The sensors track face movements to understand words

"To classify and recognise more words, a higher resolution of information is needed. And that is why researchers today are trying to develop a high-resolution silent speech system that combines our wearable strain sensor with a highly integrated circuit that’s normally used in display or semiconductor production," said Kim.

"If we manage to increase the amount of information, and therefore the system can recognise more words and sentences, we expect that one day people with language disorders could have conversations in their everyday life."

Making sign language obsolete?

Constant innovations such as these could soon make interpreters and even sign language obsolete. In March 2021, Google unveiled its Live Caption feature for the Chrome browser. Live Caption uses machine learning to instantly create closed captions for any video or audio clip, giving hearing-impaired or hard-of-hearing individuals greater access to internet content.

In the past, and still today, closed captions were either pre-programmed for video formats or typed almost instantly by a stenographer for broadcast on television. However, in places where captioning isn’t the “norm,” such as on apps like Instagram or TikTok, captions are almost impossible to find. Live Caption changes this: with a few taps on the screen, any user can have instantaneous, accurate captions that broaden the reach of audio and video.

Google’s Live Caption is an application of natural language processing (NLP). NLP is a form of artificial intelligence that uses algorithms to facilitate an “interaction” of sorts between people and machines. NLP systems help convert human language into a form that machines can process, and often vice versa.

The new study is published in Nature Communications.

Study abstract:

A wearable silent speech interface (SSI) is a promising platform that enables verbal communication without vocalization. The most widely studied methodology for SSI focuses on surface electromyography (sEMG). However, sEMG suffers from low scalability because of signal quality-related issues, including signal-to-noise ratio and interelectrode interference. Hence, here, we present a novel SSI by utilizing crystalline-silicon-based strain sensors combined with a 3D convolutional deep learning algorithm. Two perpendicularly placed strain gauges with minimized cell dimension (<0.1 mm²) could effectively capture the biaxial strain information with high reliability. We attached four strain sensors near the subject’s mouths and collected strain data of unprecedentedly large wordsets (100 words), which our SSI can classify at a high accuracy rate (87.53%). Several analysis methods were demonstrated to verify the system’s reliability, as well as the performance comparison with another SSI using sEMG electrodes with the same dimension, which exhibited a relatively low accuracy rate (42.60%).
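
For readers curious what a "3D convolutional" operation involves, the sketch below slides a small three-dimensional kernel over a block of strain readings. It is a minimal illustration, not the researchers' model: the data layout (time step × sensor × gauge axis), the averaging kernel, and the sample values are all assumptions made here, and a real deep learning network would learn many such kernels and stack several layers.

```python
def conv3d_valid(volume, kernel):
    """Slide a 3D kernel over a 3D volume with no padding and stride 1.

    As in most deep learning libraries, the kernel is not flipped, so this
    is technically a cross-correlation.
    """
    t_len, h_len, w_len = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(t_len - kt + 1):
        plane = []
        for h in range(h_len - kh + 1):
            row = []
            for w in range(w_len - kw + 1):
                acc = 0.0
                for a in range(kt):
                    for b in range(kh):
                        for c in range(kw):
                            acc += volume[t + a][h + b][w + c] * kernel[a][b][c]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Assumed layout: 5 time steps x 4 sensors x 2 perpendicular gauges.
# Hypothetical data: every gauge simply reads the time index.
TIME_STEPS, SENSORS, AXES = 5, 4, 2
volume = [
    [[float(t) for _ in range(AXES)] for _ in range(SENSORS)]
    for t in range(TIME_STEPS)
]

# A 2 x 2 x 2 averaging kernel (hand-picked here, not learned).
kernel = [[[1.0 / 8.0] * 2 for _ in range(2)] for _ in range(2)]

features = conv3d_valid(volume, kernel)
shape = (len(features), len(features[0]), len(features[0][0]))
print("feature map shape:", shape)
```

Because the kernel spans two time steps, two neighbouring sensors, and both gauge axes at once, each output value summarizes how strain changes across time and across the face together, which is the general appeal of a 3D convolution for this kind of signal.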