Chinese researchers synthesize tonal speech using neural cues

The research draws on recordings made during awake language mapping in brain tumor surgery.
Sejal Sharma


A team of Chinese researchers has developed a way to produce speech artificially, a process known as speech synthesis, using cues from brain activity.

According to the South China Morning Post, the researchers claim that they have a mind-reading machine that is capable of turning human thought into spoken Mandarin.

To achieve this, the team used a technique called electrocorticography (ECoG). It measures signals directly from the cerebral cortex using electrodes that are placed on the surface of the brain during surgery.

Tonal challenges

In tonal languages, tone works together with consonants and vowels to convey meaning. In languages like Mandarin, Vietnamese, Punjabi, Thai, Lao, and Cantonese, words can differ in tone as well as in their consonants and vowels.

“Considering that a tonal syllable can be divided into tone and base syllable that are independent of each other, we proposed a divide-and-conquer framework. We hypothesized that tone and base syllable can be decoded separately from the neural activity and then tonal speech can be synthesized using the combination of the decoded tone and base syllable,” explained the researchers in their paper.

Language mapping

The research involved five participants who underwent awake language mapping during brain tumor surgery in China. 

Two high-density subdural electrode arrays were placed on the lateral surface of each participant's brain to record neural activity during the surgery.

On each trial, the participant was instructed to produce one of eight syllables, “ma (tone 1), ma (tone 2), ma (tone 3), ma (tone 4), mi (tone 1), mi (tone 2), mi (tone 3), and mi (tone 4),” following an audio go cue. Each participant performed 160 trials.

The Mandarin syllable “ma” has four different tones that can mean “mother,” “hemp,” “horse,” and “scold,” respectively.
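The label set used in the task can be written out as a small illustration. The sketch below simply pairs the two base syllables with the four tones to get the eight tonal syllables and attaches the glosses for "ma" given above. The variable and function names are illustrative and do not come from the study.

```python
from itertools import product

# Base syllables and tones used in the task, as described in the article.
BASE_SYLLABLES = ["ma", "mi"]
TONES = [1, 2, 3, 4]

# Glosses for "ma" by tone, as listed in the article.
MA_MEANINGS = {1: "mother", 2: "hemp", 3: "horse", 4: "scold"}


def tonal_syllables(bases, tones):
    """Return every (base syllable, tone) pair, base-major order."""
    return list(product(bases, tones))


labels = tonal_syllables(BASE_SYLLABLES, TONES)

for base, tone in labels:
    # Only the glosses for "ma" are given in the article.
    gloss = MA_MEANINGS[tone] if base == "ma" else "(no gloss given)"
    print(f"{base} (tone {tone}): {gloss}")
```

Because a tonal syllable is just a pairing of a base syllable with a tone, the eight classes can be treated as two smaller problems: one with two outcomes and one with four.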

To accurately identify and reproduce tones in tonal languages, the team refined the algorithms that decode neural activity.

Audio recordings were obtained in synchronization with the ECoG recordings through a mounted microphone.

According to the study, the team designed a multi-stream modularized neural network model that decodes tone labels and base syllable labels in parallel and then synthesizes tonal syllable speech by combining the outputs of the tone and syllable modules.
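The idea of decoding in parallel streams and then combining the results can be illustrated with a toy example. This is a minimal sketch and not the researchers' model: each "stream" here is a single linear layer with a softmax, the feature vector and weights are made-up numbers, and the speech synthesis step is left out entirely.

```python
import math

TONES = [1, 2, 3, 4]
BASES = ["ma", "mi"]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear_stream(features, weights):
    """One toy stream: a linear layer (one weight row per class) plus softmax."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(scores)


def decode_tonal_syllable(features, tone_weights, base_weights):
    """Decode tone and base syllable independently, then pair the two labels."""
    tone_probs = linear_stream(features, tone_weights)
    base_probs = linear_stream(features, base_weights)
    tone = TONES[tone_probs.index(max(tone_probs))]
    base = BASES[base_probs.index(max(base_probs))]
    return base, tone


# Hypothetical three-value feature vector standing in for neural features.
features = [1.0, 0.0, 2.0]

# Hypothetical weights: four rows for the tone stream, two for the base stream.
tone_weights = [
    [0.1, 0.0, 0.0],
    [0.0, 0.5, 0.0],
    [0.2, 0.0, 1.0],
    [0.0, 0.0, 0.3],
]
base_weights = [
    [1.0, 0.0, 0.5],
    [0.2, 1.0, 0.1],
]

print(decode_tonal_syllable(features, tone_weights, base_weights))
```

Each stream only has to separate its own small set of classes, and the tonal syllable is recovered by pairing the two decoded labels.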

Several recent studies have shown the feasibility of synthesizing short sentences and a few specific words in non-tonal languages, such as English and Japanese, from neural recordings.

These advances not only provide methods for the treatment of anarthria (complete loss of speech) but also increase the communication efficiency of speech brain-computer interfaces.

“Our model is also applicable to other dialects of Chinese, such as Cantonese and Wu Chinese,” said the researchers.

The study was published in the journal Science Advances.

Study abstract:

Recent studies have shown the feasibility of speech brain-computer interfaces (BCIs) as a clinically valid treatment in helping nontonal language patients with communication disorders restore their speech ability. However, tonal language speech BCI is challenging because additional precise control of laryngeal movements to produce lexical tones is required. Thus, the model should emphasize the features from the tonal-related cortex. Here, we designed a modularized multistream neural network that directly synthesizes tonal language speech from intracranial recordings. The network decoded lexical tones and base syllables independently via parallel streams of neural network modules inspired by neuroscience findings. The speech was synthesized by combining tonal syllable labels with nondiscriminant speech neural activity. Compared to commonly used baseline models, our proposed models achieved higher performance with modest training data and computational costs. These findings raise a potential strategy for approaching tonal language speech restoration.
