Frame Interpolation Transformer and Uncertainty Guidance
We propose a transformer-based video frame interpolation (VFI) architecture that processes both source and target frames in a unified framework and compensates for motion through tightly integrated optical flow estimation and cross-backward warping. Our model improves over the current state of the art, as supported by extensive quantitative experiments and a user study.
Authors
Markus Plack (University of Bonn)
Matthias B. Hullin (University of Bonn)
Karlis Martins Briedis (DisneyResearch|Studios / ETH Zurich)
Markus Gross (DisneyResearch|Studios / ETH Zurich)
Abdelaziz Djelouah (DisneyResearch|Studios)
Christopher Schroers (DisneyResearch|Studios)
Video frame interpolation has seen important progress in recent years, thanks to developments in several directions. Some works leverage better optical flow methods with improved splatting strategies or additional cues from depth, while others have investigated alternative approaches through direct predictions or transformers. Still, the problem remains unsolved in more challenging conditions such as complex lighting or large motion. In this work, we bridge the gap towards video production with a novel transformer-based interpolation network architecture that estimates the expected error together with the interpolated frame.
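The summary above mentions motion compensation through optical flow and backward warping. As a rough illustration of that general idea only, the following minimal sketch shows plain backward warping with bilinear sampling on a grayscale image. It is not the cross-backward warping used in the model, whose details are not given here; the function name `backward_warp`, the list-based image layout, and the clamped border handling are assumptions made for this example.

```python
import math


def backward_warp(image, flow):
    """Sample `image` at positions displaced by `flow` (bilinear, clamped borders).

    image: H x W list of floats (a grayscale source frame, for brevity).
    flow:  H x W list of (dx, dy) tuples; for each target pixel (x, y) the
           value is read from the source frame at (x + dx, y + dy).
    This layout is a hypothetical choice for the sketch, not the paper's.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            # Clamp the sampling position to the image bounds.
            sx = min(max(x + dx, 0.0), w - 1.0)
            sy = min(max(y + dy, 0.0), h - 1.0)
            x0, y0 = int(math.floor(sx)), int(math.floor(sy))
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            ax, ay = sx - x0, sy - y0
            # Bilinear interpolation between the four neighbouring pixels.
            top = (1 - ax) * image[y0][x0] + ax * image[y0][x1]
            bottom = (1 - ax) * image[y1][x0] + ax * image[y1][x1]
            out[y][x] = (1 - ay) * top + ay * bottom
    return out


# Usage: every target pixel reads from one pixel to its right in the source.
source = [[0.0, 1.0, 2.0],
          [3.0, 4.0, 5.0]]
shift_right = [[(1.0, 0.0)] * 3 for _ in range(2)]
warped = backward_warp(source, shift_right)
```

Backward warping reads each output pixel from a single source location, so it leaves no holes in the result, which is one common reason it is paired with a flow field defined on the target frame.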