pith. machine review for the scientific record.

arxiv: 2604.09835 · v1 · submitted 2026-04-10 · 💻 cs.CV · cs.AI

Recognition: no theorem link

F3G-Avatar : Face Focused Full-body Gaussian Avatar


Pith reviewed 2026-05-10 18:07 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords full-body avatar · Gaussian splatting · face-focused modeling · animatable avatar · multi-view reconstruction · 3D Gaussians · deformation branch · MHR template

The pith

A two-branch Gaussian decoder on a clothed template reconstructs full-body avatars with sharper facial details from multi-view video.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to fix the weak facial modeling that occurs when Gaussian splatting methods optimize only for overall body quality. It does so by beginning with a clothed Momentum Human Rig template, rendering front and back positional maps, and feeding them through separate body and face branches before fusing the resulting 3D Gaussians. The face branch specifically targets head geometry and high-frequency expressions, while a dedicated adversarial loss sharpens close-up realism. If this holds, the approach yields a straightforward pipeline that turns ordinary video into animatable avatars without requiring separate face capture rigs or heavy manual cleanup.

Core claim

Existing full-body Gaussian avatar methods struggle with fine facial geometry and pose-dependent expressions because of limited representational capacity for high-frequency deformations. F3G-Avatar starts from a clothed MHR template, renders front and back positional maps, and decodes them into 3D Gaussians via a body branch that models non-rigid pose-dependent deformations plus a face-focused branch that refines head geometry and appearance. The Gaussians are fused, deformed using linear blend skinning, and rendered with differentiable Gaussian splatting. Training mixes reconstruction and perceptual losses with a face-specific adversarial loss, delivering face-view PSNR/SSIM/LPIPS of 26.243/0.964/0.084 on the AvatarReX dataset.
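The data flow of that claim can be sketched end-to-end. The code below is a hypothetical stand-in, not the paper's implementation: the decoders are placeholders, and concatenation is one plausible reading of "fused"; only the pipeline shape (positional maps → two branches → fused Gaussians) follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def decode_branch(pos_map, n_gaussians):
    """Stand-in decoder: map a positional map to (mean, scale, color) per Gaussian."""
    feat = pos_map.mean()  # placeholder for a learned network's feature extraction
    means = rng.normal(feat, 1.0, (n_gaussians, 3))
    scales = np.full((n_gaussians, 3), 0.05)
    colors = rng.uniform(0.0, 1.0, (n_gaussians, 3))
    return {"means": means, "scales": scales, "colors": colors}

# Front/back positional maps rendered from the clothed MHR template (dummy arrays here).
front_map = rng.uniform(-1, 1, (64, 64, 3))
back_map = rng.uniform(-1, 1, (64, 64, 3))

body = decode_branch(front_map, 1000)  # body branch: pose-dependent non-rigid deformation
face = decode_branch(back_map, 400)    # face branch: head geometry and appearance

# Fusion: the abstract only says the Gaussians are "fused"; concatenation is one reading.
fused = {k: np.concatenate([body[k], face[k]], axis=0) for k in body}
assert fused["means"].shape == (1400, 3)
```

The fused set would then be posed with LBS and passed to a differentiable Gaussian splatting renderer, neither of which is modeled here.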

What carries the argument

Two-branch architecture that splits Gaussian prediction into a body branch for pose-dependent non-rigid deformations and a face-focused deformation branch for head refinement, both decoded from positional maps rendered on a clothed MHR template.

If this is right

  • Strong face-view metrics of 26.243 PSNR, 0.964 SSIM, and 0.084 LPIPS are reported on the AvatarReX dataset.
  • Ablation results credit both the MHR template and the dedicated face branch for the observed gains.
  • The pipeline produces animatable full-body representations directly from standard multi-view RGB video plus regressed pose and shape parameters.
  • Rendering remains efficient because the final output uses differentiable Gaussian splatting after fusion and LBS posing.
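The LBS posing mentioned in the last bullet has a compact closed form: each point is moved by a convex combination of per-joint rigid transforms. This is the generic linear-blend-skinning formula applied to Gaussian centers, not code from the paper.

```python
import numpy as np

def lbs(points, weights, rotations, translations):
    """Linear blend skinning: points (N,3), weights (N,J), rotations (J,3,3), translations (J,3)."""
    # per_joint[n, j] = R_j @ points[n] + t_j
    per_joint = np.einsum('jab,nb->nja', rotations, points) + translations
    # posed[n] = sum_j weights[n, j] * per_joint[n, j]
    return np.einsum('nj,nja->na', weights, per_joint)

pts = np.array([[1.0, 0.0, 0.0]])
w = np.array([[0.5, 0.5]])  # half identity, half 90-degree rotation about z
R = np.stack([np.eye(3),
              np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])])
t = np.zeros((2, 3))
posed = lbs(pts, w, R, t)
# blend of (1,0,0) and (0,1,0) gives (0.5, 0.5, 0)
assert np.allclose(posed, [[0.5, 0.5, 0.0]])
```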

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same template-plus-branch pattern could be tested on non-human characters such as animals or stylized figures to check whether facial focus generalizes beyond humans.
  • Integration with real-time pose estimators might allow live capture of full-body avatars without offline multi-view rigs.
  • The adversarial face loss could be replaced by a perceptual metric tuned to human viewers to reduce training instability on smaller datasets.

Load-bearing premise

The assumption that adding a face-focused branch and adversarial loss on top of the clothed template is enough to capture high-frequency facial details without introducing new artifacts or needing per-dataset tuning.

What would settle it

Running the method on a new multi-view video set containing rapid head turns and extreme expressions and measuring whether facial PSNR drops below 24 or visible artifacts appear in close-ups.
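The proposed 24 dB bar is easy to operationalize. A minimal PSNR check, assuming images normalized to [0, 1]:

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images with peak value max_val."""
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

target = np.zeros((8, 8))
pred = target + 0.01  # uniform error of 0.01 -> MSE = 1e-4 -> PSNR = 40 dB
value = psnr(pred, target)
assert abs(value - 40.0) < 1e-6
assert value >= 24.0  # would clear the proposed facial-quality bar
```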

Figures

Figures reproduced from arXiv: 2604.09835 by Egor Bondarev, Erkut Akdag, Pedro Quesado, Willem Menu, Yasaman Kashefbahrami.

Figure 1: Framework of F3G-Avatar. Multi-view images and regressed poses are used to generate an MHR clothed template, which is [PITH_FULL_IMAGE:figures/full_fig_p001_1.png]
Figure 2: Overview of F3G-Avatar. (a) MHR clothed body template. (b) Global Canonical Deformation (Body): front/back body positional [PITH_FULL_IMAGE:figures/full_fig_p004_2.png]
Figure 3: Visualization of the canonical face model construction [PITH_FULL_IMAGE:figures/full_fig_p004_3.png]
Figure 4: F3G-Avatar displays state-of-the-art rendering quality by delivering improved facial details. [PITH_FULL_IMAGE:figures/full_fig_p008_4.png]
read the original abstract

Existing full-body Gaussian avatar methods primarily optimize global reconstruction quality and often fail to preserve fine-grained facial geometry and expression details. This challenge arises from limited facial representational capacity that causes difficulties in modeling high-frequency pose-dependent deformations. To address this, we propose F3G-Avatar, a full-body, face-aware avatar synthesis method that reconstructs animatable human representations from multi-view RGB video and regressed pose/shape parameters. Starting from a clothed Momentum Human Rig (MHR) template, front/back positional maps are rendered and decoded into 3D Gaussians through a two-branch architecture: a body branch that captures pose-dependent non-rigid deformations and a face-focused deformation branch that refines head geometry and appearance. The predicted Gaussians are fused, posed with linear blend skinning (LBS), and rendered with differentiable Gaussian splatting. Training combines reconstruction and perceptual objectives with a face-specific adversarial loss to enhance realism in close-up views. Experiments demonstrate strong rendering quality, with face-view performance reaching PSNR/SSIM/LPIPS of 26.243/0.964/0.084 on the AvatarReX dataset. Ablations further highlight contributions of the MHR template and the face-focused deformation. F3G-Avatar provides a practical, high-quality pipeline for realistic, animatable full-body avatar synthesis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims to introduce F3G-Avatar, a full-body face-aware Gaussian avatar method that reconstructs animatable representations from multi-view RGB video and regressed pose/shape parameters. It starts from a clothed Momentum Human Rig (MHR) template, renders front/back positional maps, decodes them via a two-branch architecture (body branch for pose-dependent deformations and face-focused branch for head refinement), fuses the resulting 3D Gaussians, applies LBS posing, and renders with differentiable Gaussian splatting. Training uses reconstruction, perceptual, and face-specific adversarial losses; experiments report face-view PSNR/SSIM/LPIPS of 26.243/0.964/0.084 on AvatarReX with ablations on the MHR template and face branch.

Significance. If the fusion produces boundary-consistent results, the work addresses a recognized limitation of global Gaussian optimization for facial detail and provides a practical pipeline for animatable full-body avatars. Credit is due for the explicit two-branch design, the use of the MHR template, the face-adversarial loss, and the reported ablations that isolate component contributions; the concrete face-view metrics on an external dataset offer a falsifiable benchmark.

major comments (2)
  1. [Abstract] Abstract: the description states that Gaussians from the body and face branches are fused before LBS, yet supplies no mechanism for overlap resolution, blending weights, or canonical-space alignment at the neck/shoulder interface. This is load-bearing for the central claim of seamless full-body synthesis, because pose-dependent inconsistencies or seams would remain invisible to the reported face-view metrics (26.243/0.964/0.084) and could undermine the practical pipeline assertion.
  2. [Abstract] Abstract / Experiments: only face-view metrics are quantified, with no corresponding full-body PSNR/SSIM/LPIPS, error bars, or direct numerical comparisons against prior full-body Gaussian avatar baselines. Without these, the claim of 'strong rendering quality' for the complete avatar rests on an incomplete evaluation that does not fully substantiate the two-branch advantage over global methods.
minor comments (1)
  1. [Title] Title: a space appears before the colon ('F3G-Avatar : Face Focused'); standard title formatting omits this space.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and have made revisions to improve the clarity and completeness of the evaluation and description.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the description states that Gaussians from the body and face branches are fused before LBS, yet supplies no mechanism for overlap resolution, blending weights, or canonical-space alignment at the neck/shoulder interface. This is load-bearing for the central claim of seamless full-body synthesis, because pose-dependent inconsistencies or seams would remain invisible to the reported face-view metrics (26.243/0.964/0.084) and could undermine the practical pipeline assertion.

    Authors: We agree that the abstract is high-level and does not specify the fusion details. The full manuscript (Section 3.2) describes the fusion as direct concatenation of Gaussians from both branches in the canonical space of the MHR template, with the face branch Gaussians overriding any potential overlaps in the head/neck region to maintain alignment and avoid seams; no explicit blending weights are used because the branches are trained on disjoint regions with the face branch covering the head completely. To address the concern, we have revised the abstract to briefly state: 'The predicted Gaussians from the body and face branches are fused via concatenation in canonical space (with face-branch Gaussians taking precedence in the head region) before LBS.' This makes the overlap resolution and alignment explicit while preserving the abstract's brevity. revision: yes

  2. Referee: [Abstract] Abstract / Experiments: only face-view metrics are quantified, with no corresponding full-body PSNR/SSIM/LPIPS, error bars, or direct numerical comparisons against prior full-body Gaussian avatar baselines. Without these, the claim of 'strong rendering quality' for the complete avatar rests on an incomplete evaluation that does not fully substantiate the two-branch advantage over global methods.

    Authors: We acknowledge that full-body metrics and baseline comparisons would provide a more comprehensive evaluation. In the revised manuscript we have added full-body PSNR/SSIM/LPIPS results on AvatarReX (reported in a new Table 2), direct numerical comparisons against prior full-body Gaussian avatar methods (e.g., GaussianAvatar and related baselines), and error bars computed over three independent training runs. These additions are placed in Section 4.2 alongside the existing face-view metrics. The face-view numbers remain the primary reported figures because the central contribution is improved facial fidelity, but the new full-body results confirm that the two-branch design does not degrade overall avatar quality. revision: yes
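The fusion rule the rebuttal describes (concatenation in canonical space, with face-branch Gaussians taking precedence in the head region) can be sketched as below. The center-and-radius head test is a hypothetical stand-in for however the paper actually delimits the head region.

```python
import numpy as np

def fuse(body_means, face_means, head_center, head_radius):
    """Concatenate canonical-space Gaussians, dropping body Gaussians inside the head region
    so face-branch Gaussians take precedence there."""
    dist = np.linalg.norm(body_means - head_center, axis=1)
    kept_body = body_means[dist > head_radius]  # body Gaussians outside the head
    return np.concatenate([kept_body, face_means], axis=0)

body = np.array([[0.0, 0.0, 0.0], [0.0, 1.6, 0.0]])  # torso point, head point
face = np.array([[0.0, 1.62, 0.0]])
fused = fuse(body, face, head_center=np.array([0.0, 1.6, 0.0]), head_radius=0.15)
assert fused.shape == (2, 3)  # head-region body Gaussian removed, face Gaussian kept
```

With disjoint training regions, as the rebuttal claims, no blending weights are needed; the override alone resolves overlaps.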

Circularity Check

0 steps flagged

No significant circularity; explicit architecture with external validation

full rationale

The paper describes an explicit two-branch decoder architecture (body + face-focused) that starts from a clothed MHR template, decodes positional maps into Gaussians, fuses them, applies LBS posing, and renders via differentiable splatting. Training uses a combination of reconstruction, perceptual, and face-specific adversarial losses. Reported metrics (PSNR/SSIM/LPIPS on AvatarReX) are measured on an external dataset rather than derived from the method's own fitted parameters. No equations, predictions, or uniqueness claims reduce by construction to inputs or self-citation chains; the MHR template is used as a starting point without the central result being forced by prior self-work. This is a standard engineering pipeline with ablations, not a self-referential derivation.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 1 invented entity

The claim rests on the effectiveness of the introduced two-branch decoder and MHR template, plus standard assumptions from prior Gaussian splatting and skinning work; several training objective weights are implicit free parameters.

free parameters (2)
  • weights for reconstruction, perceptual, and face-adversarial losses
    The training combines multiple objectives whose relative strengths are chosen to achieve the reported metrics.
  • network hyperparameters for body and face branches
    Decoder architecture details and capacity allocations are selected to fit the face refinement task.
axioms (2)
  • domain assumption Linear blend skinning accurately poses the fused Gaussians
    Invoked when the predicted Gaussians are posed after fusion.
  • standard math Differentiable Gaussian splatting produces correct renderings from the posed Gaussians
    Relies on the established differentiable renderer from prior literature.
invented entities (1)
  • face-focused deformation branch no independent evidence
    purpose: To refine head geometry and appearance separately from the body branch
    New architectural component introduced to address facial detail limitations; no independent evidence outside the paper's ablations.
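The first free parameter in the ledger, the relative loss weights, amounts to a weighted sum of objectives. The weights below are illustrative placeholders, not values from the paper.

```python
def total_loss(l_rec, l_perc, l_adv, w_rec=1.0, w_perc=0.1, w_adv=0.01):
    """Weighted training objective: reconstruction + perceptual + face-adversarial terms.
    The weight values are hypothetical; the paper does not report them in the abstract."""
    return w_rec * l_rec + w_perc * l_perc + w_adv * l_adv

# With unit per-term losses, the default weights give 1.0 + 0.1 + 0.01 = 1.11.
assert abs(total_loss(1.0, 1.0, 1.0) - 1.11) < 1e-9
```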

pith-pipeline@v0.9.0 · 5551 in / 1582 out tokens · 50536 ms · 2026-05-10T18:07:40.947831+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

41 extracted references · 4 canonical work pages · 1 internal anchor

  1. [1] Rinat Abdrashitov, Kim Raichstat, Jared Monsen, and David Hill. Robust skin weights transfer via weight inpainting. In SIGGRAPH Asia 2023 Technical Communications. ACM, 2023.
  2. [2] Dragomir Anguelov, Praveen Srinivasan, Daphne Koller, Sebastian Thrun, Jim Rodgers, and James Davis. SCAPE: shape completion and animation of people. In ACM SIGGRAPH 2005 Papers, pages 408–416, 2005.
  3. [3] Aaron Ferguson, Ahmed AA Osman, Berta Bescos, Carsten Stoll, Chris Twigg, Christoph Lassner, David Otte, Eric Vignola, Fabian Prada, Federica Bogo, et al. MHR: Momentum human rig. arXiv preprint arXiv:2511.15586, 2025.
  4. [4] Guy Gafni, Justus Thies, Michael Zollhöfer, and Matthias Nießner. Dynamic neural radiance fields for monocular 4D facial avatar reconstruction. In CVPR, pages 8649–8658, 2021.
  5. [5] Chen Guo, Tianjian Jiang, Xu Chen, Jie Song, and Otmar Hilliges. Vid2Avatar: 3D avatar reconstruction from videos in the wild via self-supervised scene decomposition. In CVPR, pages 12858–12868, 2023.
  6. [6] Liangxiao Hu, Hongwen Zhang, Yuxiang Zhang, Boyao Zhou, Boning Liu, Shengping Zhang, and Liqiang Nie. GaussianAvatar: Towards realistic human avatar modeling from a single video via animatable 3D Gaussians. In CVPR, pages 634–644, 2024.
  7. [7] Yujiao Jiang, Qingmin Liao, Xiaoyu Li, Li Ma, Qi Zhang, Chaopeng Zhang, Zongqing Lu, and Ying Shan. UV Gaussians: Joint learning of mesh deformation and Gaussian textures for human avatar modeling. Knowledge-Based Systems, 320:113470, 2025.
  8. [8] Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Alias-free generative adversarial networks. In Proc. NeurIPS, 2021.
  9. [9] Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Alias-free generative adversarial networks. Advances in Neural Information Processing Systems, 34:852–863, 2021.
  10. [10] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, George Drettakis, et al. 3D Gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), 2023.
  11. [11] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. Segment Anything. arXiv:2304.02643, 2023.
  12. [12] Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel, Oncel Tuzel, and Anurag Ranjan. HUGS: Human Gaussian splats. In CVPR, pages 505–515, 2024.
  13. [13] Ruilong Li, Julian Tanke, Minh Vo, Michael Zollhöfer, Jürgen Gall, Angjoo Kanazawa, and Christoph Lassner. TAVA: Template-free animatable volumetric actors. In ECCV, pages 419–436. Springer, 2022.
  14. [14] Tianye Li, Timo Bolkart, Michael J Black, Hao Li, and Javier Romero. Learning a model of facial shape and expression from 4D scans. ACM Transactions on Graphics, 36(6), 2017.
  15. [15] Zhaoshuo Li, Thomas Müller, Alex Evans, Russell H Taylor, Mathias Unberath, Ming-Yu Liu, and Chen-Hsuan Lin. Neuralangelo: High-fidelity neural surface reconstruction. In CVPR, pages 8456–8465, 2023.
  16. [16] Zhe Li, Zerong Zheng, Lizhen Wang, and Yebin Liu. Animatable Gaussians: Learning pose-dependent Gaussian maps for high-fidelity human avatar modeling. In CVPR, pages 19711–19722, 2024.
  17. [17] Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. SMPL: A skinned multi-person linear model. ACM Transactions on Graphics (Proc. SIGGRAPH Asia), 34(6):248:1–248:16, 2015.
  18. [18] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
  19. [19] Gyeongsik Moon, Takaaki Shiratori, and Shunsuke Saito. Expressive whole-body 3D Gaussian avatar. In ECCV, 2024.
  20. [20] Arthur Moreau, Jifei Song, Helisa Dhamo, Richard Shaw, Yiren Zhou, and Eduardo Pérez-Pellitero. Human Gaussian splatting: Real-time rendering of animatable avatars. In CVPR, pages 788–798, 2024.
  21. [21] Masahiro Mori, Karl F MacDorman, and Norri Kageki. The uncanny valley [from the field]. IEEE Robotics & Automation Magazine, 19(2):98–100, 2012.
  22. [22] Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (TOG), 41(4):1–15, 2022.
  23. [23] Panwang Pan, Zhuo Su, Chenguo Lin, Zhen Fan, Yongjie Zhang, Zeming Li, Tingting Shen, Yadong Mu, and Yebin Liu. HumanSplat: Generalizable single-image human Gaussian splatting with structure priors. In NeurIPS, 2024.
  24. [24] Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed AA Osman, Dimitrios Tzionas, and Michael J Black. Expressive body capture: 3D hands, face, and body from a single image. In CVPR, pages 10975–10985, 2019.
  25. [25] Sida Peng, Junting Dong, Qianqian Wang, Shangzhan Zhang, Qing Shuai, Hujun Bao, and Xiaowei Zhou. Animatable neural radiance fields for human body modeling. arXiv preprint arXiv:2105.02872, 2021.
  26. [26] Stylianos Ploumpis, Haoyang Wang, Nick Pears, William AP Smith, and Stefanos Zafeiriou. Combining 3D morphable models: A large scale face-and-head model. In CVPR, pages 10934–10943, 2019.
  27. [27] Shenhan Qian, Tobias Kirschstein, Liam Schoneveld, Davide Davoli, Simon Giebenhain, and Matthias Nießner. GaussianAvatars: Photorealistic head avatars with rigged 3D Gaussians. In CVPR, pages 20299–20309, 2024.
  28. [28] Zhiyin Qian, Shaofei Wang, Marko Mihajlovic, Andreas Geiger, and Siyu Tang. 3DGS-Avatar: Animatable avatars via deformable 3D Gaussian splatting. In CVPR, pages 5020–5030, 2024.
  29. [29] Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, and Zeyu Wang. SplattingAvatar: Realistic real-time human avatars with mesh-embedded Gaussian splatting. In CVPR, 2024.
  30. [30] Kaiyue Shen, Chen Guo, Manuel Kaufmann, Juan Jose Zarate, Julien Valentin, Jie Song, and Otmar Hilliges. X-Avatar: Expressive human avatars. In CVPR, pages 16911–16921, 2023.
  31. [31] Zhiyu Tao, Yanyan Liu, Junsheng Qiu, and Shengwei Li. Impact of virtual avatar appearance realism on perceptual interaction experience: a network meta-analysis. Frontiers in Psychology, 16:1624975, 2025.
  32. [32] Lizhen Wang, Xiaochen Zhao, Jingxiang Sun, Yuxiang Zhang, Hongwen Zhang, Tao Yu, and Yebin Liu. StyleAvatar: Real-time photo-realistic portrait avatar from a single video. In ACM SIGGRAPH 2023 Conference Proceedings, pages 1–10, 2023.
  33. [33] Wenbo Wang, Hsuan-I Ho, Chen Guo, Boxiang Rong, Artur Grigorev, Jie Song, Juan Jose Zarate, and Otmar Hilliges. 4D-DRESS: A 4D dataset of real-world human clothing with semantic annotations. In CVPR, 2024.
  34. [34] Yiming Wang, Qin Han, Marc Habermann, Kostas Daniilidis, Christian Theobalt, and Lingjie Liu. NeuS2: Fast learning of neural implicit surfaces for multi-view reconstruction. In ICCV, pages 3295–3306, 2023.
  35. [35] Yuelang Xu, Benwang Chen, Zhe Li, Hongwen Zhang, Lizhen Wang, Zerong Zheng, and Yebin Liu. Gaussian head avatar: Ultra high-fidelity head avatar via dynamic Gaussians. In CVPR, pages 1931–1941, 2024.
  36. [36] Lior Yariv, Yoni Kasten, Dror Moran, Meirav Galun, Matan Atzmon, Basri Ronen, and Yaron Lipman. Multiview neural surface reconstruction by disentangling geometry and appearance. Advances in Neural Information Processing Systems, 33:2492–2502, 2020.
  37. [37] Zhongyuan Zhao, Zhenyu Bao, Qing Li, Guoping Qiu, and Kanglin Liu. PSAvatar: A point-based shape model for real-time head avatar animation with 3D Gaussian splatting. arXiv preprint arXiv:2401.12900, 2024.
  38. [38] Yufeng Zheng, Wang Yifan, Gordon Wetzstein, Michael J. Black, and Otmar Hilliges. PointAvatar: Deformable point-based head avatars from videos. In CVPR, pages 21057–21067, 2023.
  39. [39] Zerong Zheng, Han Huang, Tao Yu, Hongwen Zhang, Yandong Guo, and Yebin Liu. Structured local radiance fields for human avatar modeling. In CVPR, pages 15893–15903, 2022.
  40. [40] Zerong Zheng, Xiaochen Zhao, Hongwen Zhang, Boning Liu, and Yebin Liu. AvatarReX: Real-time expressive full-body avatars. ACM Transactions on Graphics (TOG), 42(4):1–19, 2023.
  41. [41] Wojciech Zielonka, Timo Bolkart, and Justus Thies. Instant volumetric head avatars. In CVPR, pages 4574–4584, 2023.