
arxiv: 2605.13853 · v1 · pith:TP44CXAXnew · submitted 2026-03-25 · 💻 cs.GR · cs.AI · cs.CV

FaceParts: Segmentation and Editing of Gaussian Splatting

Pith reviewed 2026-05-15 07:07 UTC · model grok-4.3

classification 💻 cs.GR · cs.AI · cs.CV
keywords Gaussian Splatting · Facial Segmentation · Unsupervised Editing · 3D Avatar Modeling · Part Transfer · FLAME Model · Density Clustering

The pith

Unsupervised segmentation decomposes Gaussian splatting avatars into editable facial parts like eyes and beards.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents FaceParts, a method that breaks down 3D Gaussian splatting representations of human faces into distinct semantic components without any manual labels or supervised training. It works entirely inside the Gaussian domain by first disentangling features, then applying density-based clustering, and finally using FLAME anchors to move parts from one avatar to another. The resulting segments adapt automatically to changes in pose and expression while keeping the original identity intact, as shown by strong scores on identity preservation, expression distance, and pose distance across eleven subjects in the NeRSemble dataset. This removes the need for labor-intensive manual editing that currently dominates 3D avatar work and opens direct editing paths for virtual reality and entertainment uses.

Core claim

Gaussian splatting avatars can be decomposed into semantically coherent facial parts through unsupervised feature disentanglement and density-based clustering, with FLAME-anchored transfer enabling precise editing and cross-avatar part swapping that preserves identity (ID = 0.943) while maintaining low average expression distance (AED = 0.021) and low average pose distance (APD = 0.004).
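
The page does not say how the three scores are computed. A minimal sketch, assuming the conventional definitions: ID as the cosine similarity between two identity embeddings (for example from a face recognition network), and AED and APD as the mean distance between per-frame expression or pose coefficient vectors. The function names and the choice of Euclidean distance are illustrative assumptions, not the paper's evaluation code.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (assumed form of the ID score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def average_distance(predicted, target):
    """Mean Euclidean distance between per-frame coefficient vectors.

    Assumed form of AED (expression coefficients) and APD (pose coefficients);
    some evaluations use an L1 distance instead.
    """
    dists = [
        sqrt(sum((p - t) ** 2 for p, t in zip(p_frame, t_frame)))
        for p_frame, t_frame in zip(predicted, target)
    ]
    return sum(dists) / len(dists)
```

Under these assumptions a higher ID and lower AED and APD are better, which matches how the reported values (0.943, 0.021, 0.004) are described.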

What carries the argument

Feature disentanglement combined with density-based clustering inside the Gaussian domain, guided by FLAME parametric anchors for part transfer.
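
To make the clustering step concrete, here is a minimal sketch of density-based clustering in the DBSCAN style (the algorithm named in the simulated rebuttal further down), written in plain Python. It assumes each Gaussian has already been mapped to a low-dimensional feature vector; the feature values, `eps`, and `min_samples` in the usage example are toy choices, not the paper's settings, and the brute-force neighbour search is only suitable for small inputs.

```python
from math import dist


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    n = len(points)
    # Brute-force neighbourhoods; every point counts as its own neighbour.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # only core points grow the cluster
    return labels


# Toy 2D "features": two dense groups and one isolated point.
features = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (10.0, 0.0),
]
labels = dbscan(features, eps=0.2, min_samples=3)
```

The result depends directly on `eps` and `min_samples`, which is the sensitivity the referee report raises.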

If this is right

  • Facial features such as beards, eyebrows, eyes, and mustaches can be isolated automatically and edited directly in the 3D Gaussian representation.
  • Parts from one avatar can be transferred to another while automatically adapting to the new pose and expression.
  • Identity consistency remains high after transfer, as measured by an ID score of 0.943 on the test set.
  • The pipeline eliminates manual mesh editing steps for common avatar customization tasks.
  • The same decomposition works across multiple subjects without per-avatar tuning.
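
One way to picture how a transferred part can follow the target's pose and expression: if each Gaussian is stored relative to a mesh triangle and both avatars share the same mesh topology, re-attaching it to the corresponding triangle on the target moves it with that triangle. The sketch below illustrates that idea for positions only. It is an assumption modeled on triangle-rigged Gaussian avatars, not the paper's implementation; all function names are hypothetical, and a full pipeline would also transform each Gaussian's rotation and scale.

```python
from math import sqrt


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    n = sqrt(dot(a, a))
    return tuple(x / n for x in a)


def triangle_frame(v0, v1, v2):
    """Orthonormal frame attached to a triangle: centroid plus (edge, in-plane, normal) axes."""
    origin = tuple((a + b + c) / 3.0 for a, b, c in zip(v0, v1, v2))
    e1 = unit(sub(v1, v0))
    normal = unit(cross(sub(v1, v0), sub(v2, v0)))
    e2 = cross(normal, e1)
    return origin, (e1, e2, normal)


def to_local(p, frame):
    origin, axes = frame
    d = sub(p, origin)
    return tuple(dot(d, axis) for axis in axes)


def to_world(local, frame):
    origin, axes = frame
    return tuple(
        origin[k] + sum(local[i] * axes[i][k] for i in range(3))
        for k in range(3)
    )


def transfer(p, src_tri, dst_tri):
    """Re-express a point bound to src_tri as if it were bound to dst_tri."""
    return to_world(to_local(p, triangle_frame(*src_tri)), triangle_frame(*dst_tri))


# Toy check: the target triangle is the source triangle shifted by 2 along z.
src_tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
dst_tri = ((0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 2.0))
moved = transfer((0.2, 0.3, 0.1), src_tri, dst_tri)
```

Because the local coordinates are expressed in the triangle's own frame, any rigid motion of the target triangle carries the point along with it.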

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The clustering step may generalize to other Gaussian-based scene elements beyond faces if similar feature spaces are used.
  • Real-time applications could arise by caching the part labels once computed, allowing live swapping in interactive environments.
  • Combining the segments with animation rigs might let users mix and match traits from different source avatars in a single model.
  • Failure cases on extreme expressions could point to the need for expression-aware feature extraction in follow-up work.

Load-bearing premise

Unsupervised clustering on disentangled Gaussian features will reliably group points into parts that match real semantic facial features across varied identities and expressions.

What would settle it

Finding that the extracted clusters do not correspond to visible facial landmarks or that swapped parts fail to match the target avatar's expression and pose in visual or metric checks.

Figures

Figures reproduced from arXiv: 2605.13853 by Dominik Galus, Julia Farganus, Mikołaj Czachorowski, Piotr Syga, Przemysław Spurek, Tymoteusz Zapała.

Figure 1: Overview of the FaceParts reconstruction pipeline. The unsupervised segmentation process has three stages: …
Figure 2: Framework overview: (a) Head avatars employ a hybrid solution of 3D Gaussians rigged to a parametric …
Figure 3: Examples of face-part transfer across identities. The top row shows extracted segments (beard, eyebrows, …
Figure 4: Dynamic behavior of face swapping. Top row: avatar 253 rendered across different poses and expressions. …
Figure 5: Segmentation results for different bottleneck sizes. Smaller bottlenecks (e.g., size 3 and 4) lead to in…
Figure 6: Example highlighting the limitations of segmentation and generative models. (a) Segmentation, left: ground …
Figure 7: Comparison of transferring hair from avatar 306 to avatar 218 using our FaceParts and MEGA [Wang et al., …
Figure 8: a) L_usage + L_sparsity b) L_sparsity c) L_usage d) no additional loss e) direct prediction on xyz coordinates f) prediction on xyz with the standard softmax. The sparsity loss encourages the network to cluster nearby regions, while the usage loss promotes a more uniform bottleneck distribution across the entire avatar. Models trained without HashGrid fail to disentangle smaller facial parts.
Figure 9: The first row illustrates the overlap strategy with different values of parameter …
Figure 10: Examples of facial part swaps. Paper under review
Figure 11: First row: hair extracted from avatar 306 using the disentanglement network followed by clustering. Second …
Original abstract

Facial editing is an important task with applications in entertainment, virtual reality, and digital avatars. Most existing approaches rely on generative models in the 2D image domain, while in 3D the task is typically performed through labor-intensive manual editing. We propose FaceParts, a framework for unsupervised segmentation and editing of Gaussian Splatting avatars. Unlike existing 2D or mesh-assisted methods, our approach operates directly in the Gaussian domain, decomposing avatars into semantically coherent facial parts without supervision. The method integrates feature disentanglement, density-based clustering, and FLAME-anchored part transfer, enabling precise editing and cross-avatar part swapping. Experiments on the NeRSemble dataset with 11 subjects demonstrate robust isolation of features such as beards, eyebrows, eyes and mustaches. Quantitative evaluation confirms that transferred segments adapt to pose and expression, while maintaining identity consistency (ID = 0.943), low Average Expression Distance (AED = 0.021) and low Average Pose Distance (APD = 0.004).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents FaceParts, a framework for unsupervised segmentation and editing of Gaussian Splatting avatars. It decomposes avatars into semantically coherent facial parts using feature disentanglement and density-based clustering, integrated with FLAME-anchored part transfer to enable precise editing and cross-avatar swapping. Experiments on the NeRSemble dataset with 11 subjects demonstrate isolation of features such as beards, eyebrows, eyes, and mustaches, supported by quantitative metrics of identity consistency (ID = 0.943), low Average Expression Distance (AED = 0.021), and low Average Pose Distance (APD = 0.004).

Significance. If the core unsupervised decomposition proves robust, the work would advance direct 3D Gaussian avatar editing without manual intervention or 2D intermediaries, with clear utility for VR and entertainment applications. The Gaussian-domain operation and cross-avatar transfer capability represent potential strengths, though they hinge on the reliability of the clustering step.

major comments (3)
  1. [Method] Method section on feature disentanglement and clustering: the central claim that density-based clustering produces semantically coherent parts (beards, eyebrows, etc.) 'without supervision' is load-bearing yet unsupported by any parameter-free derivation or stability analysis; density clustering is sensitive to feature scale and thresholds, and the reported isolation may depend on implicit choices or post-hoc selection.
  2. [Method] FLAME-anchored part transfer subsection: reliance on the external FLAME parametric model for transfer introduces a supervised prior that leaks semantic information, directly contradicting the 'without supervision' assertion for the overall pipeline even if the initial decomposition is unsupervised.
  3. [Experiments] Experiments and quantitative evaluation: the reported metrics (ID = 0.943, AED = 0.021, APD = 0.004) on NeRSemble lack error bars, ablation details on clustering parameters, or comparisons to supervised baselines, undermining assessment of whether transferred segments reliably adapt to pose/expression across identities.
minor comments (2)
  1. [Abstract] Abstract: the claim of 'robust isolation' would be strengthened by specifying the exact number of expressions/poses per subject and any failure cases observed.
  2. [Figures] Figures: visual results for part swapping and editing would benefit from explicit annotations or legends indicating source vs. transferred segments.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment point by point below, providing clarifications and committing to revisions that strengthen the presentation of our unsupervised segmentation claims and evaluation.

  1. Referee: [Method] Method section on feature disentanglement and clustering: the central claim that density-based clustering produces semantically coherent parts (beards, eyebrows, etc.) 'without supervision' is load-bearing yet unsupported by any parameter-free derivation or stability analysis; density clustering is sensitive to feature scale and thresholds, and the reported isolation may depend on implicit choices or post-hoc selection.

    Authors: We clarify that 'unsupervised' refers specifically to the lack of semantic labels or annotated training data for part decomposition; the density-based clustering (DBSCAN) operates directly on the disentangled Gaussian features. Hyperparameters are fixed based on the observed feature scale in the NeRSemble data and held constant across all 11 subjects to ensure reproducibility. We acknowledge the sensitivity concern and will add a stability analysis subsection, including results from varying epsilon and min_samples within data-driven ranges, plus qualitative figures showing consistent part isolation (e.g., beards and eyebrows) across these variations. Specific parameter values will also be reported explicitly. revision: yes

  2. Referee: [Method] FLAME-anchored part transfer subsection: reliance on the external FLAME parametric model for transfer introduces a supervised prior that leaks semantic information, directly contradicting the 'without supervision' assertion for the overall pipeline even if the initial decomposition is unsupervised.

    Authors: The referee is correct that FLAME is a supervised parametric model. It is used exclusively in the transfer stage to establish 3D vertex correspondences between avatars for part swapping, after the unsupervised decomposition has already occurred. No FLAME-derived semantics influence the feature disentanglement or clustering steps. We will revise the abstract, introduction, and method sections to explicitly distinguish the unsupervised segmentation from the alignment tool used for transfer, avoiding any overstatement of the 'without supervision' claim for the full pipeline. revision: partial

  3. Referee: [Experiments] Experiments and quantitative evaluation: the reported metrics (ID = 0.943, AED = 0.021, APD = 0.004) on NeRSemble lack error bars, ablation details on clustering parameters, or comparisons to supervised baselines, undermining assessment of whether transferred segments reliably adapt to pose/expression across identities.

    Authors: We agree these details are needed for a complete assessment. In the revised manuscript we will add error bars (standard deviation across the 11 subjects) for all reported metrics. We will include an ablation table showing the effect of clustering hyperparameters on ID, AED, and APD. We will also add a supervised baseline comparison (e.g., using manually annotated parts transferred via the same FLAME anchoring) to quantify how our unsupervised results compare in terms of pose/expression adaptation and identity preservation. revision: yes
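
Two of the promised revisions, the stability analysis and the error bars, come down to simple computations. The sketch below is one possible way to carry them out, not the authors' protocol. It measures clustering stability as the Rand index between the labels from a reference hyperparameter setting and a varied one, and reports a metric as mean and sample standard deviation across subjects. All labels and values in the usage lines are toy inputs, not results from the paper.

```python
from itertools import combinations
from statistics import mean, stdev


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree.

    A pair agrees if both clusterings put it in the same cluster or both
    separate it. Noise (-1) is treated as one ordinary cluster here.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def mean_std(values):
    """Mean and sample standard deviation, e.g. of one metric across subjects."""
    return mean(values), stdev(values)


# Toy labels: the varied setting absorbs a former noise point into cluster 1.
reference = [0, 0, 1, 1, -1]
variant = [0, 0, 1, 1, 1]
stability = rand_index(reference, variant)

# Toy per-subject scores, purely illustrative.
center, spread = mean_std([1.0, 2.0, 3.0])
```

A Rand index near 1 across a sweep of `eps` and `min_samples` would support the claim that part isolation does not hinge on one particular setting.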

Circularity Check

0 steps flagged

No circularity in derivation chain


The paper's core pipeline—feature disentanglement followed by density-based clustering and FLAME-anchored transfer—is presented as a procedural method without any equations that define outputs in terms of the same fitted parameters or self-referential predictions. No self-citations are invoked as load-bearing uniqueness theorems, no ansatzes are smuggled via prior author work, and no known results are merely renamed. Quantitative metrics (ID, AED, APD) are computed against an external dataset and FLAME model, keeping the derivation self-contained and falsifiable outside its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The method rests on standard assumptions from Gaussian Splatting and the FLAME face model; no new free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)
  • domain assumption: Feature disentanglement on Gaussian points yields clusters that correspond to semantic facial parts.
    Invoked to justify the unsupervised segmentation step.
  • domain assumption: The FLAME model provides reliable anchors for part transfer across avatars.
    Used to enable pose and expression adaptation.

pith-pipeline@v0.9.0 · 5507 in / 1187 out tokens · 44559 ms · 2026-05-15T07:07:26.794137+00:00 · methodology


Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages

  1. Bengio, Yoshua; LeCun, Yann. Scaling Learning Algorithms Towards
  2. Liu, Hanxi; Men, Yifang; Lian, Zhouhui. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.
  3. Gerogiannis, Dimitrios; Papantoniou, Foivos Paraperas; Potamias, Rolandos Alexandros; Lattas, Alexandros; Zafeiriou, Stefanos. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.
  4. Hinton, Geoffrey E.; Osindero, Simon; Teh, Yee Whye. A Fast Learning Algorithm for Deep Belief Nets.
  5. 3D Face Reconstruction with the Geometric Guidance of Facial Part Segmentation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  6. Li, Zhaoshuo; Müller, Thomas; Evans, Alex; Taylor, Russell H.; Unberath, Mathias; Liu, Ming-Yu; Lin, Chen-Hsuan. Neuralangelo: High-Fidelity Neural Surface Reconstruction.
  7. Guédon, Antoine; Lepetit, Vincent. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
  8. Deep learning. 2016.
  9. Gaussianavatars: Photorealistic head avatars with rigged 3d gaussians. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  10. Kirschstein, Tobias; Qian, Shenhan; Giebenhain, Simon; Walter, Tim; Nie. NeRSemble: Multi-View Radiance Field Reconstruction of Human Heads. doi:10.1145/3592455
  11. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. 2020.
  12. Moreau, Arthur; Song, Jifei; Dhamo, Helisa; Shaw, Richard; Zhou, Yiren; Pérez-Pellitero, Eduardo. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
  13. SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting. 2024.
  14. Jun Xiang; Xuan Gao; Yudong Guo; Juyong Zhang. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  15. Cha, Hyunsoo; Lee, Inhee; Joo, Hanbyul. Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025.
  16. Generating Editable Head Avatars with 3D Gaussian GANs. ICASSP 2025.
  17. 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM SIGGRAPH.
  18. COLMAP-Free 3D Gaussian Splatting. arXiv preprint arXiv:2312.02194.
  19. Gaussian Splatting with Neural Basis Extension. arXiv preprint arXiv:2410.xxxxx.
  20. Generating Editable Head Avatars with 3D Gaussian GANs. CVPR.
  21. AvatarPerfect: User-Assisted 3D Gaussian Splatting Avatar Construction. ACM Symposium on Virtual Reality Software and Technology (VRST).
  22. Unsupervised Face Part Discovery by Hierarchical Parsing. CVPR.
  23. Unsupervised Universal Image Segmentation (U2Seg). ICCV.
  24. Text-guided generation and editing of compositional 3d avatars. arXiv preprint arXiv:2309.07125.
  25. Avatarverse: High-quality & stable 3d avatar creation from text and pose. Proceedings of the AAAI Conference on Artificial Intelligence.
  26. Avatars in the educational metaverse. Visual Computing for Industry, Biomedicine, and Art, 2025.
  27. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. 2021.
  28. Learning a model of facial shape and expression from 4D scans. ACM Trans. Graph.
  29. 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Transactions on Graphics.
  30. Channel selection using gumbel softmax. European conference on computer vision, 2020.
  31. Face identity-aware disentanglement in stylegan. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision.
  32. On-Device Unsupervised Image Segmentation. ECCV.
  33. Facial Parts Swapping with Generative Adversarial Networks. Pattern Recognition.
  34. Face Editing Using Part-Based Optimization of the Latent Space. Computer Graphics Forum.
  35. Semantic Facial Attribute Editing: A Survey. IEEE Transactions on Multimedia.
  36. Facial Attribute Editing by Only Changing What You Want. CVPR.
  37. Generating Editable Head Avatars with 3D Gaussian GANs. 2024.
  38. Deng, Jiankang; Guo, Jia; Yang, Jing; Xue, Niannan; Kotsia, Irene; Zafeiriou, Stefanos. ArcFace: Additive Angular Margin Loss for Deep Face Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence. doi:10.1109/tpami.2021.3087709
  39. Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set. IEEE Computer Vision and Pattern Recognition Workshops.
  40. Yao, Xu; Newson, Alasdair; Gousseau, Yann; Hellier, Pierre. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
  41. Pixels2Points: Fusing 2D and 3D Features for Facial Skin Segmentation. 2025.
  42. Nerfstudio: A Modular Framework for Neural Radiance Field Development. ACM SIGGRAPH 2023 Conference Proceedings.
  43. Eric Jang; S. Gu; Ben Poole. ArXiv.
  44. PluGeN: Multi-Label Conditional Generation From Pre-Trained Models. 2022.
  45. SNARF: Differentiable Forward Skinning for Animating Non-Rigid Neural Implicit Shapes. 2021.
  46. HairCUP: Hair Compositional Universal Prior for 3D Gaussian Avatars. 2025.
  47. MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing. 2024.
  48. Kornel Howil; Joanna Waczyńska; Piotr Borycki; Tadeusz Dziarmaga; Marcin Mazur; Przemysław Spurek. 2025.