One thing is for sure: AI is going places. Given the sheer amount of data that deep-learning neural networks can interpret, these systems are able to perform wonders.
Music is both an auditory and a visual experience. When watching an ensemble of musicians, we use visual cues to help us differentiate who is playing what.
Researchers at the MIT-IBM Watson AI Lab have developed a new AI tool that imitates this process. Building on work by Zhao et al., the researchers exploit the observable hand and body movements captured on video. A video analysis network compiles data from the musicians' movements, while an audio-visual separation network uses that data to separate each sound source.
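The core idea, visual features steering an audio separation network, can be sketched in a few lines. This is a minimal toy illustration, not the lab's actual model: the shapes, the random "visual embeddings," and the single linear projection are all assumptions made for the sake of the example. It shows the mask-based separation pattern, where each player's visual embedding produces a spectrogram mask and the masks jointly partition the mixture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: a mixture spectrogram plus one pooled visual
# embedding per musician (e.g. derived from keypoint/motion features).
freq_bins, frames = 64, 100
mixture = np.abs(rng.standard_normal((freq_bins, frames)))  # |STFT| of the mix

n_sources = 2
visual_feats = rng.standard_normal((n_sources, 16))  # one embedding per player

# A toy "separation network": project each visual embedding to
# per-frequency mask logits, then softmax across sources so the masks
# for all players sum to 1 at every frequency bin.
W = rng.standard_normal((16, freq_bins)) * 0.1
logits = visual_feats @ W                                   # (n_sources, freq_bins)
masks = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

# Apply each player's mask to the mixture to estimate their spectrogram.
separated = masks[:, :, None] * mixture[None, :, :]         # (2, 64, 100)

# Because the masks sum to 1, the estimates add back up to the mixture.
assert np.allclose(separated.sum(axis=0), mixture)
print(separated.shape)  # (2, 64, 100)
```

In a real system the masks would also vary over time and be produced by a trained network, but the conditioning pattern, visual features in, per-source masks out, is the same.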
This technology can potentially be utilized when mixing the audio of a concert performance. Audio producers could isolate an instrument and change its volume; this could dramatically improve remasters of old concert footage.
This technology could also be adapted to resolve the problem of people talking simultaneously in video conferences. Another potential use is in robotics, helping robots better understand environmental sounds such as animals, vehicles, or people.
The basis of the project's visual analysis, so-called keypoint analysis, also has applications in sports, providing a performance-tracking solution that requires less human input.
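To make the keypoint idea concrete, here is a minimal sketch of how tracked body keypoints can be turned into a simple performance statistic. The trajectory shapes and the 13-joint layout are assumptions for illustration; in practice the coordinates would come from a 2D pose estimator rather than random numbers.

```python
import numpy as np

# Hypothetical keypoint track: (frames, joints, 2) pixel coordinates,
# as a 2D pose estimator might produce for one athlete or musician.
rng = np.random.default_rng(1)
track = np.cumsum(rng.standard_normal((30, 13, 2)), axis=0)

# Frame-to-frame displacement per joint, then speed in pixels/frame.
disp = np.diff(track, axis=0)            # (29, 13, 2)
speed = np.linalg.norm(disp, axis=-1)    # (29, 13)

# A simple tracking statistic: each joint's mean speed over the clip.
mean_speed = speed.mean(axis=0)          # (13,)
print(mean_speed.shape)  # (13,)
```

Statistics like these, extracted automatically from video, are what let keypoint-based systems track movement without frame-by-frame human annotation.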
Previous research has shown that vision-audio pairing systems can be taught to recognize and differentiate a wide array of sound sources, from crashing waves to chirping birds.
There are many areas where this technology could find commercial use. A security system could be trained to react to the sound of breaking window glass, or a self-driving car's AI to predict the path of an oncoming ambulance.