Combining Frame and GOP Embeddings for Neural Video Representation

In this paper, we propose T-NeRV, a hybrid video INR (implicit neural representation) that combines frame-specific embeddings with group-of-pictures (GOP) specific features, providing a lever for content-specific fine-tuning.

June 17, 2024

CVPR (2024)

Authors

Jens Eirik Saethre (DisneyResearch|Studios/ETH Zurich)

Roberto Azevedo (DisneyResearch|Studios)

Christopher Schroers (DisneyResearch|Studios)



Abstract

Implicit neural representations (INRs) were recently proposed as a new video compression paradigm, with existing approaches performing on par with HEVC. However, such methods only perform well in limited settings, e.g., specific model sizes, fixed aspect ratios, and low-motion videos. We address this issue by proposing T-NeRV, a hybrid video INR that combines frame-specific embeddings with GOP-specific features, providing a lever for content-specific fine-tuning. We employ entropy-constrained training to jointly optimize our model for rate and distortion and demonstrate that T-NeRV can thereby automatically adjust this lever during training, effectively fine-tuning itself to the target content. We evaluate T-NeRV on the UVG dataset, where it achieves state-of-the-art results on the video representation task, outperforming previous works by up to 3dB PSNR on challenging high-motion sequences. Further, our method improves on the compression performance of previous methods and is the first video INR to outperform HEVC on all UVG sequences.
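The abstract mentions entropy-constrained training that jointly optimizes rate and distortion. The snippet below is a minimal sketch of a generic rate-distortion objective of the form L = D + λ·R. It is not T-NeRV's implementation, and the abstract does not specify the exact entropy model or quantizer. Every function name here (`mse`, `psnr`, `quantize`, `estimated_bits`, `rd_loss`) and every parameter (`step`, `lam`) is hypothetical. The sketch assumes several simplifications: frames are flattened lists of pixel values in [0, 1], distortion is mean squared error, and rate is the empirical Shannon entropy of uniformly quantized weights. Real entropy-constrained training would instead use a differentiable entropy model so that gradients can flow. This version only evaluates the objective.

```python
import math
from collections import Counter


def mse(target, pred):
    """Mean squared error between two equally sized flat lists of pixel values."""
    return sum((t - p) ** 2 for t, p in zip(target, pred)) / len(target)


def psnr(target, pred, max_val=1.0):
    """PSNR in dB, assuming pixel values lie in [0, max_val]."""
    err = mse(target, pred)
    return float("inf") if err == 0 else 10 * math.log10(max_val ** 2 / err)


def quantize(weights, step):
    """Hypothetical uniform scalar quantizer: map each weight to an integer symbol."""
    return [round(w / step) for w in weights]


def estimated_bits(symbols):
    """Rate estimate: empirical Shannon entropy of the symbols times their count.

    A stand-in for a learned entropy model; it is not differentiable.
    """
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())


def rd_loss(target, pred, weights, step, lam):
    """Rate-distortion objective D + lam * R; returns (loss, distortion, rate)."""
    distortion = mse(target, pred)
    rate = estimated_bits(quantize(weights, step))
    return distortion + lam * rate, distortion, rate


if __name__ == "__main__":
    # Toy example: a perfect reconstruction, so the loss is pure rate.
    loss, d, r = rd_loss([0.0, 0.0], [0.0, 0.0], [0.0, 0.0, 0.5, 0.5], step=0.5, lam=0.1)
    print(loss, d, r)
```

In this formulation, λ trades reconstruction quality against model size. A larger λ pushes the quantized weights toward fewer distinct, more predictable symbols.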

Copyright Notice

The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author’s copyright. These works may not be reposted without the explicit permission of the copyright holder.
