arxiv: 2505.03351 · v2 · pith:TBJBVC73new · submitted 2025-05-06 · 💻 cs.CV

GUAVA: Generalizable Upper Body 3D Gaussian Avatar

classification 💻 cs.CV
keywords avatar · facial · guava · human · reconstruction · animatable · body · expressive

Reconstructing a high-quality, animatable 3D human avatar with expressive facial and hand motions from a single image has gained significant attention due to its broad application potential. Existing 3D human avatar reconstruction typically requires multi-view or monocular videos and per-identity training, which is both complex and time-consuming. Furthermore, limited by the expressiveness of SMPL-X, these methods often focus on body motion but struggle with facial expressions. To address these challenges, we first introduce an expressive human model (EHM) to enhance facial expression capabilities and develop an accurate tracking method. Based on this template model, we propose GUAVA, the first framework for fast animatable upper-body 3D Gaussian avatar reconstruction. We leverage inverse texture mapping and projection sampling techniques to infer Ubody (upper-body) Gaussians from a single image. The rendered images are then refined by a neural refiner. Experimental results demonstrate that GUAVA significantly outperforms previous methods in rendering quality and is substantially faster: reconstruction takes a fraction of a second (0.1s), and the resulting avatar supports real-time animation and rendering.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Hand-4DGS: Feed-Forward 3D Gaussian Splatting for 4D Hand Reconstruction from Egocentric Videos

    cs.CV · 2026-06 · unverdicted · novelty 6.0

    Hand-4DGS introduces the first feed-forward 3D Gaussian Splatting framework for 4D hand reconstruction from egocentric videos, achieving ~60 FPS inference and generalization on H2O and ARCTIC datasets.