FastGHA: Generalized Few-Shot 3D Gaussian Head Avatars with Real-Time Animation
In this work, we extend the explicit Gaussian representations with per-Gaussian features and introduce a lightweight MLP-based dynamic network to predict 3D Gaussian deformations from expression codes.
April 22, 2026
ICLR (2026)
Authors
Xinya Ji (ETH Zurich, DisneyResearch|Studios)
Sebastian Weiss (DisneyResearch|Studios)
Manuel Kansy (DisneyResearch|Studios)
Jacek Naruniec (ETH Zurich, DisneyResearch|Studios)
Xun Cao (Nanjing University)
Barbara Solenthaler (ETH Zurich)
Derek Bradley (DisneyResearch|Studios)
Despite recent progress in 3D Gaussian-based head avatar modeling, efficiently generating high-fidelity avatars remains a challenge. Current methods typically rely on extensive multi-view capture setups or on monocular videos with per-identity optimization during inference, limiting their scalability and ease of use on unseen subjects. To overcome these efficiency drawbacks, we propose FastGHA, a feed-forward method that generates high-quality Gaussian head avatars from only a few input images while supporting real-time animation. Our approach directly learns a per-pixel Gaussian representation from the input images and aggregates multi-view information using a transformer-based encoder that fuses image features from both DINOv3 and the Stable Diffusion VAE. For real-time animation, we extend the explicit Gaussian representation with per-Gaussian features and introduce a lightweight MLP-based dynamic network that predicts 3D Gaussian deformations from expression codes. Furthermore, to enhance the geometric smoothness of the 3D head, we employ point maps from a pre-trained large reconstruction model as geometry supervision. Experiments show that our approach significantly outperforms existing methods in both rendering quality and inference efficiency, while supporting real-time dynamic avatar animation.
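To make the dynamic component more concrete, below is a minimal PyTorch sketch of a lightweight MLP that maps a per-Gaussian feature and an expression code to a deformation of that Gaussian. All layer sizes, feature dimensions, and the output parameterization (position offset, rotation delta, log-scale delta) are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch: a small MLP-based dynamic network that predicts per-Gaussian
# deformations from per-Gaussian features and a global expression code.
# Dimensions and output split are assumptions for illustration only.
import torch
import torch.nn as nn


class DynamicDeformationMLP(nn.Module):
    def __init__(self, feat_dim=32, expr_dim=64, hidden_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + expr_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(inplace=True),
            # 3 (position offset) + 4 (rotation quaternion delta) + 3 (log-scale delta)
            nn.Linear(hidden_dim, 3 + 4 + 3),
        )

    def forward(self, gaussian_feats, expr_code):
        # gaussian_feats: (N, feat_dim) per-Gaussian features
        # expr_code:      (expr_dim,)   expression code for the current frame
        n = gaussian_feats.shape[0]
        expr = expr_code.unsqueeze(0).expand(n, -1)
        out = self.mlp(torch.cat([gaussian_feats, expr], dim=-1))
        d_pos, d_rot, d_scale = out.split([3, 4, 3], dim=-1)
        return d_pos, d_rot, d_scale


# Usage example: deform 10k Gaussians for one expression frame.
net = DynamicDeformationMLP()
feats = torch.randn(10_000, 32)
expr = torch.randn(64)
d_pos, d_rot, d_scale = net(feats, expr)
```

Because the network is a shallow MLP evaluated once per Gaussian and per frame, such a design is cheap enough to run at interactive rates, which is consistent with the real-time animation claim above.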