VisemeNet: Audio-Driven Animator-Centric Speech Animation

Chris Landreth; Evangelos Kalogerakis; Karan Singh; Subhransu Maji; Yang Zhou; Zhan Xu

arxiv: 1805.09488 · v1 · pith:AGMJVPC2new · submitted 2018-05-24 · 💻 cs.GR

Yang Zhou, Zhan Xu, Chris Landreth, Evangelos Kalogerakis, Subhransu Maji, Karan Singh

classification 💻 cs.GR

keywords speech, audio, motion, animation, animator, animator-centric, approach, deep-learning


We present a novel deep-learning-based approach to producing animator-centric speech motion curves that drive a JALI or standard FACS-based production face-rig, directly from input audio. Our three-stage Long Short-Term Memory (LSTM) network architecture is motivated by psycho-linguistic insights: segmenting speech audio into a stream of phonetic groups is sufficient for viseme construction; speech styles like mumbling or shouting are strongly correlated with the motion of facial landmarks; and animator style is encoded in viseme motion curve profiles. Our contribution is an automatic, real-time solution for lip-synchronization from audio that integrates seamlessly into existing animation pipelines. We evaluate our results by cross-validation against ground-truth data; animator critique and edits; visual comparison to recent deep-learning lip-synchronization solutions; and showing that our approach is resilient to diversity in speaker and language.
