Controllable Tracking-Based Video Frame Interpolation
In this work, we address the less explored problem of user-assisted frame interpolation to improve quality and enable control over the appearance and motion of interpolated frames. To this end, we introduce a tracking-based video frame interpolation method that utilizes sparse point tracks, first estimated and interpolated with existing point tracking methods and then optionally refined by the user.
July 17, 2025
SIGGRAPH (2025)
Authors
Karlis Martins Briedis (DisneyResearch|Studios/ETH Zurich)
Abdelaziz Djelouah (DisneyResearch|Studios)
Raphaël Ortiz (DisneyResearch|Studios)
Markus Gross (DisneyResearch|Studios/ETH Zurich)
Christopher Schroers (DisneyResearch|Studios)

Temporal video frame interpolation has been an active area of research in recent years, with a primary focus on motion estimation, compensation, and synthesis of the final frame. While recent methods have shown good-quality results in many cases, they can still fail in challenging scenarios. Moreover, they typically produce fixed outputs with no means of control, further limiting their application in film production pipelines. In this work, we address the less-explored problem of user-assisted frame interpolation to improve quality and enable control over the appearance and motion of interpolated frames. To this end, we introduce a tracking-based video frame interpolation method that utilizes sparse point tracks, first estimated and interpolated with existing point tracking methods and then optionally refined by the user. Additionally, we propose a mechanism for controlling the level of hallucination in interpolated frames through inference-time model weight adaptation, allowing a continuous trade-off between hallucination and blurriness. Even without any user input, our model achieves state-of-the-art results in challenging test cases. By using points tracked over the whole sequence, we can use better motion trajectory interpolation methods, such as cubic splines, to more accurately represent the true motion and achieve significant improvements in results. Our experiments demonstrate that refining tracks and their trajectories through user interactions significantly improves the quality of interpolated frames.
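
To illustrate why tracks spanning the whole sequence help, the following minimal sketch compares linear interpolation between the two neighboring frames with a cubic interpolation that also uses the frames before and after them. It is not the authors' implementation: the function names `linear_track` and `cubic_track` are hypothetical, and the cubic variant is a cubic Hermite (Catmull-Rom style) spline with finite-difference tangents, chosen here only because it is short and needs no external libraries.

```python
from bisect import bisect_right


def _segment(times, t):
    """Index i such that times[i] <= t <= times[i + 1], clamped to valid segments."""
    i = bisect_right(times, t) - 1
    return max(0, min(i, len(times) - 2))


def linear_track(times, points, t):
    """Position at time t using only the two neighboring tracked positions."""
    i = _segment(times, t)
    s = (t - times[i]) / (times[i + 1] - times[i])
    return tuple(a + s * (b - a) for a, b in zip(points[i], points[i + 1]))


def _tangent(times, points, i):
    """Finite-difference velocity at keyframe i (one-sided at the track ends)."""
    lo, hi = max(i - 1, 0), min(i + 1, len(times) - 1)
    dt = times[hi] - times[lo]
    return tuple((b - a) / dt for a, b in zip(points[lo], points[hi]))


def cubic_track(times, points, t):
    """Position at time t from a cubic Hermite spline through all tracked positions.

    Assumes at least two keyframes and strictly increasing times.
    """
    i = _segment(times, t)
    h = times[i + 1] - times[i]
    s = (t - times[i]) / h
    # Cubic Hermite basis functions evaluated at s in [0, 1].
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    m0 = _tangent(times, points, i)
    m1 = _tangent(times, points, i + 1)
    return tuple(
        h00 * p0 + h10 * h * v0 + h01 * p1 + h11 * h * v1
        for p0, p1, v0, v1 in zip(points[i], points[i + 1], m0, m1)
    )


if __name__ == "__main__":
    # Hypothetical track of one point moving along a parabola, y = t^2.
    times = [0.0, 1.0, 2.0, 3.0]
    points = [(t, t * t) for t in times]
    print(linear_track(times, points, 1.5))  # y = 2.5, cuts the corner
    print(cubic_track(times, points, 1.5))   # y = 2.25, matches the true path
```

With only the two frames at t = 1 and t = 2, the midpoint estimate falls on the straight chord between them. Adding the neighboring frames lets the spline recover the curvature of the motion, which is the kind of improvement the abstract attributes to tracking over the whole sequence.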
