PercHead: Perceptual Head Model for Single-Image 3D Head Reconstruction & Editing

· 2025 · cs.CV · arXiv 2511.02777

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

open full Pith review browse 2 citing papers arXiv PDF

abstract

We present PercHead, a model for single-image 3D head reconstruction and disentangled 3D editing - two tasks that are inherently challenging due to ambiguity in plausible explanations for the same input. At the heart of our approach lies our novel perceptual loss based on DINOv2 and SAM 2.1. Unlike widely-adopted low-level losses like LPIPS, SSIM or L1, we rely on deep visual understanding of images and the resulting generalized supervision signals. We show that our new loss can be a drop-in replacement for standard losses and used to improve visual quality in high-frequency areas. We base our model architecture on Vision Transformers (ViTs), allowing us to decouple the 3D representation from the 2D input. We train our method on multi-view images for view-consistency and in-the-wild images for strong transferability to new environments. Our model achieves state-of-the-art performance in novel-view synthesis and, furthermore, exhibits exceptional robustness to extreme viewing angles. We also extend our base model to disentangled 3D editing by swapping the encoder and fine-tuning the network. A segmentation map controls geometry and either a text prompt or a reference image specifies appearance. We highlight the intuitive and powerful 3D editing capabilities through an interactive GUI. Project Page: https://antoniooroz.github.io/PercHead Video: https://www.youtube.com/watch?v=4hFybgTk4kE

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures

cs.CV · 2026-05-05 · unverdicted · novelty 7.0

HeadsUp maps multi-view captures to UV-parameterized 3D Gaussians on a template via an encoder-decoder, achieving state-of-the-art quality and generalization after training on more than 10,000 subjects.

FlexAvatar: Learning Complete 3D Head Avatars with Partial Supervision

cs.CV · 2025-12-17 · unverdicted · novelty 6.0

FlexAvatar introduces bias sinks in a transformer to unify monocular and multi-view training, yielding complete 3D head avatars with strong generalization and view extrapolation from single images.

citing papers explorer

Showing 2 of 2 citing papers.

Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures cs.CV · 2026-05-05 · unverdicted · none · ref 56 · internal anchor
HeadsUp maps multi-view captures to UV-parameterized 3D Gaussians on a template via an encoder-decoder, achieving state-of-the-art quality and generalization after training on more than 10,000 subjects.
FlexAvatar: Learning Complete 3D Head Avatars with Partial Supervision cs.CV · 2025-12-17 · unverdicted · none · ref 38 · internal anchor
FlexAvatar introduces bias sinks in a transformer to unify monocular and multi-view training, yielding complete 3D head avatars with strong generalization and view extrapolation from single images.

PercHead: Perceptual Head Model for Single-Image 3D Head Reconstruction & Editing

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer