Deep Learning Allows Accurate Real-Time Lip Sync for Live 2D Animation

Animation can now be generated in real time to mirror the movements and speech of voice actors.
Chris Young

Two researchers at Adobe Research and the University of Washington have introduced a deep-learning-based interactive system that takes live performances from actors and generates real-time lip sync for 2D animated characters.

The system allows for a different animation experience where audiences and actors can interact with each other while seeing their cartoon doppelgangers mirrored on a screen.


What is live 2D animation?

Back in 1997 in The Simpsons episode “The Itchy & Scratchy & Poochie Show," Homer Simpson asked a room of cartoon animators whether an episode they created was airing live.

A voice actor jokingly replied: “Very few cartoons are broadcast live. It’s a terrible strain on the animators’ wrists.”

Homer's question turned out to be prophetic: in 2019, live 2D animation is a powerful new medium in which actors can perform a scene that is animated on screen at the very same time.

The first-ever live cartoon broadcast, a segment of a 2016 episode of The Simpsons, aired that year. One recent example of the medium saw Stephen Colbert interviewing cartoon guests, including Donald Trump, on The Late Show.

Another saw Archer's main character talking to an audience live and answering questions during a ComicCon panel event.

The live animation technique uses facial detection software, cameras, recording equipment, and digital animation techniques to create a live animation that runs alongside a real acted scene.

Machine learning is making the whole pipeline run more smoothly, and it continues to improve.

Live lip syncing

A key aspect of live 2D animation is good lip syncing. It allows the mouths of animated characters to move appropriately when speaking and mirror the performances of human actors.

Poor lip syncing, on the other hand, like a bad language dub, can break the technique entirely by ruining immersion.

In a paper pre-published on arXiv, the two researchers at Adobe Research and the University of Washington introduced their deep-learning-based interactive system, which automatically generates live lip sync for layered 2D animated characters.

The neural network system they developed uses a long short-term memory (LSTM) model.
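To make the idea concrete, here is a minimal sketch of how an LSTM can map a sequence of per-frame audio features to per-frame viseme (mouth-shape) scores. This is an illustration only: the feature size, hidden size, viseme count, and random weights are placeholder assumptions, not the parameters or architecture details of the Adobe/UW model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTM:
    """Toy single-layer LSTM: audio feature frames in, viseme scores out.

    Hypothetical sizes for illustration; not the published model.
    """
    def __init__(self, n_features=13, n_hidden=32, n_visemes=12, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked weight matrix for the input, forget, cell, and output gates.
        self.W = rng.normal(0.0, 0.1, (4 * n_hidden, n_features + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.W_out = rng.normal(0.0, 0.1, (n_visemes, n_hidden))
        self.n_hidden = n_hidden

    def forward(self, frames):
        """frames: (T, n_features) array of audio features, one row per frame.
        Returns a (T, n_visemes) array of scores; argmax per row picks the
        mouth shape to draw for that frame."""
        H = self.n_hidden
        h = np.zeros(H)  # hidden state carries context across frames
        c = np.zeros(H)  # cell state (the LSTM's long-term memory)
        outputs = []
        for x in frames:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = z[:H], z[H:2*H], z[2*H:3*H], z[3*H:]
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            outputs.append(self.W_out @ h)
        return np.array(outputs)

# Usage: 5 frames of 13-dimensional features -> one viseme choice per frame.
model = TinyLSTM()
scores = model.forward(np.random.default_rng(1).normal(size=(5, 13)))
visemes = scores.argmax(axis=1)
```

Because the recurrence processes one frame at a time and carries state forward, a model of this shape can run incrementally on a live audio stream, which is what makes the real-time setting feasible.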

As TechXplore reports, the LSTM model developed by Deepali Aneja and Wilmot Li has shown such impressive results that Adobe decided to integrate a version of it into its Adobe Character Animator software, released in 2018.

The system can be used to drastically reduce training data requirements for models designed for producing real-time lip sync. At the same time, it provides effective lip syncing that isn't too far away from traditional animation — allowing 2D animation to come to life like never before.
