MIT’s Advanced AI Aims to Predict the Mood of a Conversation

February 2, 2017

MIT is developing a wearable AI system that can accurately predict the mood of a conversation.

The tone and mood with which a person articulates a sentence can significantly alter its meaning, and ultimately, interpreting that meaning is left to the listener. Being able to distinguish the emotions a person is conveying is a critical component of conversation. However, not everyone is able to make those distinctions.

Some people, especially those who suffer from anxiety or Asperger's, may interpret a conversation differently than it was intended. The resulting miscommunication can make social interactions extremely stressful.

Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Institute of Medical Engineering and Science (IMES) say they may have the solution: a wearable AI device capable of distinguishing if a conversation is happy, sad, or neutral by actively monitoring the way a person speaks.

“Imagine if, at the end of a conversation, you could rewind it and see the moments when the people around you felt the most anxious,” says graduate student Tuka Alhanai. “Our work is a step in this direction, suggesting that we may not be that far away from a world where people can have an AI social coach right in their pocket.”

The mood-predicting wearables actively analyze a person’s speech patterns and physiological signals to determine the tones and moods expressed in a conversation with 83 percent accuracy. The system is programmed to record a “sentiment score” every five seconds during a conversation.
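To illustrate the idea, here is a minimal sketch (not MIT's actual code) of how a conversation could be scored window by window. The `extract_features` function and `model` classifier are hypothetical stand-ins for the real feature pipeline and trained system.

```python
# Hypothetical sketch: assign a "sentiment score" to every five-second
# window of a conversation, assuming a trained 3-class classifier.
import numpy as np

WINDOW_SECONDS = 5

def sentiment_timeline(audio, physio, sample_rate, model, extract_features):
    """Score each five-second window of synchronized audio/physiological data."""
    window = WINDOW_SECONDS * sample_rate
    scores = []
    for start in range(0, len(audio) - window + 1, window):
        segment_audio = audio[start:start + window]
        segment_physio = physio[start:start + window]
        features = extract_features(segment_audio, segment_physio)
        # Assumed output order: [p_negative, p_neutral, p_positive]
        p_neg, p_neu, p_pos = model.predict_proba([features])[0]
        scores.append(p_pos - p_neg)   # one signed score per window
    return np.array(scores)
```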

“As far as we know, this is the first experiment that collects both physical data and speech data in a passive but robust way, even while subjects are having natural, unstructured interactions,” says graduate student Mohammad Ghassemi. “Our results show that it’s possible to classify the emotional tone of conversations in real-time.”

Deep-learning techniques will continue to improve the system's performance as more people use it, generating more data for the algorithms to analyze. To protect user privacy, the data is processed locally on the device to prevent potential privacy breaches. Even so, privacy concerns remain, since the device could potentially record the conversations of unsuspecting individuals.

How the device operates

Previous studies examining the emotional content of a conversation required participants to artificially act out a specific emotion. To elicit more organic emotions, MIT researchers instead had participants tell a happy or sad story.

[Image Source: MITCSAIL/YouTube]

Participants of the study wore a Samsung Simband, a device able to capture high-resolution physiological waveforms and measure many attributes, including heart rate, blood pressure, blood flow, and skin temperature. The device simultaneously records audio data, which is then analyzed to determine tone, pitch, energy, and vocabulary.
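As a rough illustration of the kind of feature extraction this implies, the sketch below summarizes one window of audio and physiological data into a feature vector. The specific features and the crude pitch and energy estimates are assumptions for illustration, not the study's actual pipeline.

```python
# Illustrative only: turn one window of Simband-style physiological signals
# plus raw audio samples (float array) into a small feature vector.
import numpy as np

def window_features(audio, heart_rate, skin_temp, blood_flow, sample_rate):
    # Audio energy: root-mean-square amplitude of the window.
    energy = np.sqrt(np.mean(audio ** 2))

    # Crude pitch estimate: autocorrelation peak in the 80-400 Hz range.
    corr = np.correlate(audio, audio, mode="full")[len(audio) - 1:]  # lags 0..N-1
    lo, hi = int(sample_rate / 400), int(sample_rate / 80)
    lag = lo + np.argmax(corr[lo:hi])
    pitch_hz = sample_rate / lag

    # Physiological summaries: means and variability over the window.
    return np.array([
        energy,
        pitch_hz,
        heart_rate.mean(), heart_rate.std(),
        skin_temp.mean(),
        blood_flow.mean(),
    ])
```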

“The team’s usage of consumer market devices for collecting physiological data and speech data shows how close we are to having such tools in everyday devices,” says Björn Schuller, professor and chair of Complex and Intelligent Systems at the University of Passau in Germany. “Technology could soon feel much more emotionally intelligent, or even ‘emotional’ itself.”

MIT researchers recorded 31 conversations and used the data to train two separate algorithms. The first analyzes a conversation as a whole and categorizes it as either happy or sad. The second classifies each five-second interval of the conversation as positive, negative, or neutral.
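A hedged sketch of that two-classifier setup is shown below. The actual system relied on deep learning over the audio and physiological features; here scikit-learn logistic regression and random placeholder data simply stand in to show the two levels of labeling.

```python
# Sketch of the two-classifier structure, with placeholder data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Classifier 1: one feature vector per conversation, labeled happy (1) or sad (0).
conv_X = rng.normal(size=(31, 16))          # 31 conversations, 16 features each
conv_y = rng.integers(0, 2, size=31)
conversation_clf = LogisticRegression(max_iter=1000).fit(conv_X, conv_y)

# Classifier 2: one feature vector per five-second interval,
# labeled negative (0), neutral (1), or positive (2).
interval_X = rng.normal(size=(500, 16))
interval_y = rng.integers(0, 3, size=500)
interval_clf = LogisticRegression(max_iter=1000).fit(interval_X, interval_y)
```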

“The system picks up on how, for example, the sentiment in the text transcription was more abstract than the raw accelerometer data,” says Alhanai. “It’s quite remarkable that a machine could approximate how we humans perceive these interactions, without significant input from us as researchers.”

Does it work?

Surprisingly, the algorithms successfully determined most of the emotions a human would expect during a conversation. However, the model's results were only 18 percent above chance. Despite that modest margin, the new technique is still a full 7.5 percent more accurate than existing approaches.

Unfortunately, the model is still too underdeveloped to be of practical use as a social coach. However, researchers plan to scale up data collection by enabling the system to run on commercial devices like the Apple Watch.

“Our next step is to improve the algorithm’s emotional granularity so that it is more accurate at calling out boring, tense, and excited moments, rather than just labeling interactions as ‘positive’ or ‘negative,’” says Alhanai. “Developing technology that can take the pulse of human emotions has the potential to dramatically improve how we communicate with each other.”

SEE ALSO: Vessels, the Music Created by Physiological Signals of Emotions