ScaffoldAvatar: High-Fidelity Gaussian Avatars with Patch Expressions
In this work, we propose to couple locally-defined facial expressions with 3D Gaussian splatting to create ultra-high-fidelity, expressive, and photorealistic head avatars.
July 10, 2025
SIGGRAPH (2025)
Authors
Shivangi Aneja (Technical University of Munich, DisneyResearch|Studios)
Sebastian Weiss (DisneyResearch|Studios)
Irene Baeza (DisneyResearch|Studios)
Prashanth Chandran (DisneyResearch|Studios)
Gaspard Zoss (DisneyResearch|Studios)
Matthias Niessner (Technical University of Munich)
Derek Bradley (DisneyResearch|Studios)

Generating high-fidelity, real-time animated sequences of photorealistic 3D head avatars is important for many graphics applications, including immersive telepresence and movies. The problem is particularly challenging when rendering digital avatar close-ups that show a character’s facial microfeatures and expressions. To capture the expressive, detailed nature of human heads, including skin furrowing and finer-scale facial movements, we propose to couple locally-defined facial expressions with 3D Gaussian splatting to create ultra-high-fidelity, expressive, and photorealistic head avatars. In contrast to previous works that operate on a global expression space, we condition our avatar’s dynamics on patch-based local expression features and synthesize 3D Gaussians at a patch level. In particular, we leverage a patch-based geometric 3D face model to extract patch expressions and learn how to translate these into local dynamic skin appearance and motion by coupling the patches with anchor points of Scaffold-GS, a recent hierarchical scene representation. These anchors are then used to synthesize 3D Gaussians on-the-fly, conditioned on patch expressions and viewing direction. We employ color-based densification and progressive training to obtain high-quality results and faster convergence for high-resolution 3K training images. By leveraging patch-level expressions, ScaffoldAvatar consistently achieves state-of-the-art performance with visually natural motion, while encompassing diverse facial expressions and styles in real time.
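To make the anchor-based synthesis step more concrete, the following is a minimal NumPy sketch of the idea, not the authors' implementation. It assumes each anchor carries a learned feature vector and the index of the face patch it is attached to, and that a small two-layer MLP decodes a fixed number of Gaussians per anchor. The function name `synthesize_gaussians`, the array layouts, the MLP shape, and the choice of seven outputs per Gaussian (offset, color, opacity) are all hypothetical; scale and rotation are omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def synthesize_gaussians(anchor_pos, anchor_feat, anchor_patch,
                         patch_expr, cam_pos, weights, k=4):
    """Decode k Gaussians per anchor (hypothetical layout, illustration only).

    anchor_pos:   (N, 3) anchor positions
    anchor_feat:  (N, F) per-anchor feature vectors
    anchor_patch: (N,)   index of the face patch each anchor belongs to
    patch_expr:   (P, E) one local expression code per patch
    cam_pos:      (3,)   camera position
    weights:      (w1, b1, w2, b2) of an assumed two-layer MLP
    """
    # Unit viewing direction from the camera to each anchor.
    view_dir = anchor_pos - cam_pos
    view_dir = view_dir / np.linalg.norm(view_dir, axis=1, keepdims=True)

    # Each anchor only sees the expression code of its own patch,
    # which is what makes the conditioning local rather than global.
    local_expr = patch_expr[anchor_patch]

    x = np.concatenate([anchor_feat, local_expr, view_dir], axis=1)
    w1, b1, w2, b2 = weights
    hidden = np.maximum(x @ w1 + b1, 0.0)           # ReLU
    out = (hidden @ w2 + b2).reshape(len(anchor_pos), k, 7)

    positions = anchor_pos[:, None, :] + out[..., :3]  # offsets from the anchor
    colors = sigmoid(out[..., 3:6])                    # RGB in [0, 1]
    opacity = sigmoid(out[..., 6])                     # alpha in [0, 1]
    return positions, colors, opacity
```

In this sketch, changing the expression code of one patch affects only the Gaussians decoded from anchors assigned to that patch, which mirrors the contrast with a single global expression vector drawn above.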
