PercHead: Perceptual Head Model for Single-Image 3D Head Reconstruction & Editing

Antonio Oroz , Matthias Nie{\ss}ner , Tobias Kirschstein

Authors on Pith no claims yet

classification 💻 cs.CV

keywords modeleditingheadimagespercheadbasedisentangledhttps

read the original abstract

We present PercHead, a model for single-image 3D head reconstruction and disentangled 3D editing - two tasks that are inherently challenging due to ambiguity in plausible explanations for the same input. At the heart of our approach lies our novel perceptual loss based on DINOv2 and SAM 2.1. Unlike widely-adopted low-level losses like LPIPS, SSIM or L1, we rely on deep visual understanding of images and the resulting generalized supervision signals. We show that our new loss can be a drop-in replacement for standard losses and used to improve visual quality in high-frequency areas. We base our model architecture on Vision Transformers (ViTs), allowing us to decouple the 3D representation from the 2D input. We train our method on multi-view images for view-consistency and in-the-wild images for strong transferability to new environments. Our model achieves state-of-the-art performance in novel-view synthesis and, furthermore, exhibits exceptional robustness to extreme viewing angles. We also extend our base model to disentangled 3D editing by swapping the encoder and fine-tuning the network. A segmentation map controls geometry and either a text prompt or a reference image specifies appearance. We highlight the intuitive and powerful 3D editing capabilities through an interactive GUI. Project Page: https://antoniooroz.github.io/PercHead Video: https://www.youtube.com/watch?v=4hFybgTk4kE

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures
cs.CV 2026-05 unverdicted novelty 7.0

HeadsUp maps multi-view captures to UV-parameterized 3D Gaussians on a template via an encoder-decoder, achieving state-of-the-art quality and generalization after training on more than 10,000 subjects.
Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures
cs.CV 2026-05 unverdicted novelty 5.0

Pith review generated a malformed one-line summary.