A Deep Learning Approach for Generalized Speech Animation

We introduce a simple and effective deep learning approach to automatically generate natural-looking speech animation that synchronizes to input speech.

July 20, 2017
ACM SIGGRAPH 2017

Authors

Sarah Taylor (University of East Anglia)

Taehwan Kim (California Institute of Technology)

Yisong Yue (California Institute of Technology)

Moshe Mahler (Disney Research)

Jimmy Krahe (Disney Research)

Anastasio Garcia Rodrigues (Disney Research)

Jessica Hodgins (Disney Research)

Iain Matthews (Disney Research)

Abstract

We introduce a simple and effective deep learning approach to automatically generate natural-looking speech animation that synchronizes to input speech. Our approach uses a sliding window predictor that learns arbitrary nonlinear mappings from phoneme label input sequences to mouth movements in a way that accurately captures natural motion and visual coarticulation effects. Our deep learning approach enjoys several attractive properties: it runs in real time, requires minimal parameter tuning, generalizes well to novel input speech sequences, is easily edited to create stylized and emotional speech, and is compatible with existing animation retargeting approaches. One important focus of our work is to develop an effective approach for speech animation that can be easily integrated into existing production pipelines. We provide a detailed description of our end-to-end approach, including machine learning design decisions. Generalized speech animation results are demonstrated over a wide range of animation clips on a variety of characters and voices, including singing and foreign-language input. Our approach can also generate on-demand speech animation in real time from user speech input.
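The sliding window idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the window widths, the zero padding at the sequence edges, and the choice to average overlapping output windows are all assumptions made for illustration, and `toy_predictor` is a hypothetical stand-in for the trained neural network.

```python
from typing import Callable, List, Sequence

Window = List[int]          # a fixed-width run of per-frame phoneme labels
Frames = List[List[float]]  # one mouth-parameter vector per frame


def sliding_windows(labels: Sequence[int], width: int, pad: int = 0) -> List[Window]:
    """Return one window of `width` labels centred on each input frame."""
    if width % 2 == 0:
        raise ValueError("width must be odd so each window has a centre frame")
    half = width // 2
    # Assumption: pad the edges with a filler label (e.g. silence).
    padded = [pad] * half + list(labels) + [pad] * half
    return [padded[i:i + width] for i in range(len(labels))]


def predict_sequence(
    labels: Sequence[int],
    predictor: Callable[[Window], Frames],
    in_width: int,
    out_width: int,
    dim: int,
) -> Frames:
    """Slide over the labels and blend the overlapping output windows.

    `predictor` maps one input window to `out_width` frames of `dim`
    parameters. Averaging the overlaps is an assumption of this sketch.
    """
    if out_width % 2 == 0:
        raise ValueError("out_width must be odd")
    half_out = out_width // 2
    n = len(labels)
    sums = [[0.0] * dim for _ in range(n)]
    counts = [0] * n
    for centre, window in enumerate(sliding_windows(labels, in_width)):
        for k, frame in enumerate(predictor(window)):
            t = centre - half_out + k  # output frame this prediction covers
            if 0 <= t < n:
                for d in range(dim):
                    sums[t][d] += frame[d]
                counts[t] += 1
    # Every frame is covered at least once (by the window centred on it).
    return [[s / counts[t] for s in sums[t]] for t in range(n)]


def toy_predictor(window: Window) -> Frames:
    """Hypothetical stand-in for a trained network: echo the centre label."""
    centre = float(window[len(window) // 2])
    return [[centre], [centre], [centre]]


if __name__ == "__main__":
    # Two frames of phoneme 1 followed by two frames of phoneme 2.
    print(predict_sequence([1, 1, 2, 2], toy_predictor, in_width=3, out_width=3, dim=1))
```

Because each output frame blends predictions from neighbouring windows, the toy output changes gradually across the phoneme boundary instead of jumping, which loosely mirrors how context-dependent (coarticulated) motion can arise from a windowed predictor.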

Copyright Notice

The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author’s copyright. These works may not be reposted without the explicit permission of the copyright holder.
