DiVAS: Video and Audio Synchronization with Dynamic Frame Rates

June 17, 2024

CVPR (2024)

Authors

Clara Fernandez-Labrador (DisneyResearch|Studios)

Mertcan Akçay (DisneyResearch|Studios / ETH Zurich)

Eitan Abecassis (Disney Entertainment and ESPN Technology)

Joan Massich (DisneyResearch|Studios)

Christopher Schroers (DisneyResearch|Studios)

Abstract

Synchronization issues between audio and video are among the most disturbing quality defects in film production and live broadcasting. Even a discrepancy as short as 45 milliseconds can degrade the viewer’s experience enough to warrant manual quality checks over entire movies. In this paper, we study the automatic discovery of such issues. Specifically, we focus on the alignment of lip movements with spoken words, targeting realistic production scenarios that can include background noise and music, intricate head poses, excessive makeup, or scenes with multiple individuals where the speaker is unknown. Our model’s robustness also extends to various media specifications, including different video frame rates and audio sample rates. To address these challenges, we present a model fully based on Transformers that encodes face crops or full video frames and raw audio using timestamp information, identifies the speaker, and provides highly accurate synchronization predictions much faster than previous methods.
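To give a sense of scale for the 45 millisecond figure across media specifications, the following sketch converts a time offset into video frames and audio samples, and computes per-frame timestamps for a given frame rate. It is plain arithmetic for illustration only, not the model described in the paper; the function names and the example frame rates and sample rates are assumptions chosen for this sketch.

```python
from fractions import Fraction


def offset_in_frames(offset_ms: int, fps: Fraction) -> Fraction:
    """Express a time offset (milliseconds) as a number of video frames."""
    return Fraction(offset_ms, 1000) * fps


def offset_in_samples(offset_ms: int, sample_rate_hz: int) -> Fraction:
    """Express a time offset (milliseconds) as a number of audio samples."""
    return Fraction(offset_ms, 1000) * sample_rate_hz


def frame_timestamps(num_frames: int, fps: Fraction) -> list:
    """Presentation time in seconds of each frame, assuming a constant frame rate."""
    return [Fraction(i) / fps for i in range(num_frames)]


if __name__ == "__main__":
    # Hypothetical example rates; exact fractions avoid rounding drift
    # for rates such as 30000/1001.
    for fps in (Fraction(24), Fraction(30000, 1001), Fraction(60)):
        frames = offset_in_frames(45, fps)
        print(f"45 ms at {float(fps):.3f} fps = {float(frames):.3f} frames")
    for rate in (44100, 48000):
        samples = offset_in_samples(45, rate)
        print(f"45 ms at {rate} Hz = {float(samples):.1f} samples")
```

At 24 frames per second, 45 ms is only slightly more than one frame, which suggests why a detector needs timing finer than the frame period and why working from timestamps rather than frame indices is convenient when frame rates vary.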

Copyright Notice

The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author’s copyright. These works may not be reposted without the explicit permission of the copyright holder.
