arxiv: 2604.08548 · v3 · submitted 2026-04-09 · 💻 cs.CV

ETCH-X: Robustify Expressive Body Fitting to Clothed Humans with Composable Datasets

Pith reviewed 2026-05-10 18:24 UTC · model grok-4.3

classification 💻 cs.CV
keywords body fitting · clothed humans · SMPL-X · dense correspondences · parametric body models · 3D point clouds · human pose · clothing dynamics

The pith

ETCH-X upgrades expressive body fitting to clothed humans by disentangling clothing removal from dense correspondence alignment.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper upgrades an existing body fitting method to handle clothed humans more effectively. It introduces a tightness-aware stage to filter clothing dynamics and replaces sparse markers with implicit dense correspondences for finer details and robustness. The modular design supports training on separate datasets for clothing, poses, and hands. A reader would care because it addresses practical challenges in 3D human modeling that affect animation, virtual try-on, and other applications.

Core claim

The authors show that by using a tightness-aware fitting paradigm to 'undress' the input and implicit dense correspondences for alignment, combined with SMPL-X for expressiveness, the system achieves robust fitting across diverse conditions when trained on composable data sources.

What carries the argument

The disentangled 'undress' and 'dense fit' modular stages that separate clothing dynamics filtering from fine-grained body alignment using implicit correspondences.
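
As a minimal illustration of the "undress" stage, the Figure 2 caption gives a per-point update of the form y_i = x_i + l_i v_i, where x_i is a scan point and l_i, v_i are a predicted tightness magnitude and direction. The sketch below applies that update in plain Python. The function name `undress`, the toy inputs, and the normalisation of v_i are assumptions made for illustration; none of them comes from the paper's code.

```python
import math


def undress(points, directions, magnitudes):
    """Move each scan point x_i along its tightness direction v_i by the
    magnitude l_i, i.e. y_i = x_i + l_i * v_i."""
    inner = []
    for x, v, l in zip(points, directions, magnitudes):
        norm = math.sqrt(sum(c * c for c in v))
        # Assumption of this sketch: v_i is normalised so that l_i alone
        # carries the distance. The source page does not state this.
        unit = [c / norm for c in v] if norm > 0 else [0.0, 0.0, 0.0]
        inner.append([xc + l * uc for xc, uc in zip(x, unit)])
    return inner


# Toy input: two scan points with hypothetical predictions.
scan = [[0.0, 0.0, 1.0], [0.5, 0.0, 0.0]]
dirs = [[0.0, 0.0, -2.0], [-1.0, 0.0, 0.0]]  # pointing toward the body
mags = [0.25, 0.0]                            # 0.0 models a skin-tight region
body_points = undress(scan, dirs, mags)
```

With a magnitude of zero a point stays where it is, which is how skin-tight regions would behave under this reading of the formula.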

If this is right

  • Improved performance on seen datasets like 4D-Dress and CAPE for body and hand fitting.
  • Stronger generalization to unseen data such as BEDLAM2.0.
  • Enhanced handling of partial or noisy inputs due to dense correspondences.
  • Scalable training by composing different data sources for garments, motions, and gestures.
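
The point about partial inputs, together with the "partial augmentation" named in the Figure 5 caption, can be made concrete with a small sketch. This page does not describe how the paper generates partial scans, so the half-space crop below is a swapped-in, commonly used stand-in, and `crop_half_space` and `keep_ratio` are hypothetical names.

```python
import random


def crop_half_space(points, keep_ratio, rng):
    """Simulate a partial scan: draw a random direction, project every point
    onto it, and keep the `keep_ratio` fraction with the smallest projection
    (everything beyond a random cutting plane is discarded)."""
    direction = [rng.gauss(0.0, 1.0) for _ in range(3)]
    ordered = sorted(
        points, key=lambda p: sum(a * b for a, b in zip(p, direction))
    )
    n_keep = max(1, int(round(keep_ratio * len(points))))
    return ordered[:n_keep]


rng = random.Random(0)
# Synthetic stand-in for a scan: 1000 points in a cube.
cloud = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(1000)]
partial = crop_half_space(cloud, keep_ratio=0.6, rng=rng)
```

Sorting by projection keeps the kept fraction exact, which makes it easy to sweep input completeness in a controlled way.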

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar modular disentanglement could be applied to other 3D fitting problems involving variable external factors.
  • The use of composable datasets suggests a way to leverage existing specialized datasets without needing a single massive one.
  • Downstream tasks like animation may benefit from the improved accuracy in hands and full body.

Load-bearing premise

The tightness-aware fitting accurately separates clothing effects from body shape without introducing estimation errors.

What would settle it

Evidence that the method incorrectly alters the estimated body shape when loose clothing closely follows body contours, or that it fails on severely incomplete point clouds.

Figures

Figures reproduced from arXiv: 2604.08548 by Boqian Li, Jingyi Wu, Siyuan Yu, Xiaoben Li, Yuliang Xiu, Zeyu Cai.

Figure 1: Strengths of ETCH-X. While NICP [31], which uses implicit dense correspondence but lacks tightness-aware undressing, consistently produces overweight bodies from clothed scans (A), ETCH [24], with tightness-aware undressing but sparse markers, fails to capture detailed body parts such as hands and face (B), and struggles with partial inputs due to missing markers (C). In contrast, our ETCH-X combines the s…
Figure 2: Two stages of ETCH-X: (A) Masked Undress, (B) Dense Fit. In the Masked Undress stage, we take a clothed scan as input and compute the undressed body ($\hat{y}_i = x_i + \hat{l}_i \hat{v}_i$). In the Dense Fit stage, we implicitly learn the deforming field, which deforms the canonical SMPL-X into a posed one. Thanks to the decoupled design, the robustness to dynamic clothing and pose variations could be improved with simulated …
Figure 3: Hand Refinement by Re-sampling. After obtaining the initial body fitting, we re-sample points around the hand and fit the hand model separately.
the neural field for model fitting. When J vertices are used, the SMPL-X model parameters are fitted by minimizing: \min_{\boldsymbol{\theta}, \boldsymbol{\beta}, \expressioncoeff, \mathbf{t}} \sum_{j=1}^{J} \left\| \mathbf{m}_j - \mathbf{M}(\shapecoeff, \pose…
Figure 4: Failure Case of ETCH [24] on BEDLAM2.0. Two representative reasons for ETCH failure are incorrect part labeling (above) and inaccurate inner points (both). The failure is reflected in the large V2V (12.209 cm) and MPJPE (15.031 cm) errors of ETCH reported in Tab. 2.
Figure 5: Partial Augmentation. ETCH-X predicts better body poses with partial augmentation.
and ETCH, trained exclusively on either AMASS or CLOTH3D. By leveraging tightness vectors, ETCH-X achieves more accurate undressing, particularly for loose garments in 4D-Dress where NICP often struggles. In contrast, ETCH is limited in pose generalization due to the constrained pose diversity in CLOTH3D.
Figure 6: Scaling Analysis of ETCH-X. Increasing the amount of training data from CLOTH3D [3] (left) and AMASS [30] (right) does not necessarily improve performance: tightness accuracy saturates, while pose robustness continues to increase.
Scaling Analysis. As discussed in Sec. 1, ETCH-X leverages both simulated garment data (CLOTH3D) and body pose libraries (AMASS), enabling scalability across diverse sources
Figure 7: Hand Refinement Results. ETCH-X produces much better hand poses with hand refinement.
Hand Refinement. As described in Sec. 3.2, we adopt re-sampling to fit the hand separately. The results under different settings are shown in Tab. 6, validating the effectiveness of our design. The visual comparison results in
Figure 8: Comparison with SOTAs on 4D-Dress.
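
The text carried along with the Figure 3 caption describes fitting model parameters by minimising a sum over J paired points of the distance between a target point m_j and the corresponding model output. The source equation is cut off, so the sketch below assumes a squared Euclidean norm and, instead of SMPL-X, uses a deliberately simplified stand-in model that can only translate a fixed template. All function names here are hypothetical.

```python
def fitting_energy(targets, model_vertices):
    """Sum over j of ||m_j - M_j||^2 for paired 3D points (squared norm is
    an assumption; the source equation is truncated)."""
    return sum(
        sum((m - v) ** 2 for m, v in zip(mj, vj))
        for mj, vj in zip(targets, model_vertices)
    )


def translate(vertices, t):
    """Stand-in 'body model': a fixed template shifted by t."""
    return [[v[k] + t[k] for k in range(3)] for v in vertices]


def best_translation(targets, template):
    """Closed-form minimiser of the energy over t alone: the mean of the
    per-point offsets m_j - template_j."""
    n = len(targets)
    return [
        sum(mj[k] - vj[k] for mj, vj in zip(targets, template)) / n
        for k in range(3)
    ]


template = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
targets = translate(template, [0.5, -0.25, 2.0])  # synthetic target points
t_star = best_translation(targets, template)
energy_before = fitting_energy(targets, template)
energy_after = fitting_energy(targets, translate(template, t_star))
```

A real fit would also optimise pose, shape, and expression, which has no closed form and is usually handled by an iterative optimiser; the translation-only case is shown because its optimum can be checked by hand.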

Abstract

Human body fitting, which aligns parametric body models such as SMPL to raw 3D point clouds of clothed humans, serves as a crucial first step for downstream tasks like animation and texturing. An effective fitting method should be both locally expressive, capturing fine details such as hands and facial features, and globally robust to handle real-world challenges, including clothing dynamics, pose variations, and noisy or partial inputs. Existing approaches typically excel in only one aspect, lacking an all-in-one solution. We upgrade ETCH to ETCH-X, which leverages a tightness-aware fitting paradigm to filter out clothing dynamics ("undress"), extends expressiveness with SMPL-X, and replaces explicit sparse markers (which are highly sensitive to partial data) with implicit dense correspondences ("dense fit") for more robust and fine-grained body fitting. Our disentangled "undress" and "dense fit" modular stages enable separate and scalable training on composable data sources, including diverse simulated garments (CLOTH3D), large-scale full-body motions (AMASS), and fine-grained hand gestures (InterHand2.6M), improving outfit generalization and pose robustness of both bodies and hands. Our approach achieves robust and expressive fitting across diverse clothing, poses, and levels of input completeness, delivering a substantial performance improvement over ETCH on both: 1) seen data, such as 4D-Dress (MPJPE-All, 33.0%) and CAPE (V2V-Hands, 35.8%), and 2) unseen data, such as BEDLAM2.0 (MPJPE-All, 80.8%; V2V-All, 80.5%). Code and models will be released at https://xiaobenli00.github.io/ETCH-X/.
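
The abstract reports gains as percentages on MPJPE and V2V without giving absolute values. For readers unfamiliar with the metrics, here is a minimal sketch of how a mean per-joint position error, a vertex-to-vertex error, and a relative improvement are commonly computed. Whether the paper computes its percentages exactly this way is an assumption, and the numbers below are made up for the example rather than taken from the paper.

```python
import math


def mean_point_error(pred, gt):
    """Mean Euclidean distance between paired 3D points. Applied to joints
    this is an MPJPE-style error; applied to mesh vertices, a V2V-style one."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def relative_improvement(baseline_error, new_error):
    """Percentage by which the error drops relative to the baseline."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Made-up two-joint example; units are arbitrary.
gt = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
baseline = [[0.0, 0.0, 4.0], [1.0, 4.0, 0.0]]  # 4 units off at each joint
improved = [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]  # 1 unit off at each joint
err_base = mean_point_error(baseline, gt)
err_new = mean_point_error(improved, gt)
gain = relative_improvement(err_base, err_new)
```

Because a relative figure hides the scale of the underlying error, the same percentage can correspond to very different absolute changes, which is why reporting the baseline value alongside it matters.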

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper introduces ETCH-X as an upgrade to ETCH for aligning SMPL-X body models to clothed human 3D point clouds. It proposes a modular disentangled pipeline with a tightness-aware 'undress' stage to filter clothing dynamics and an implicit dense correspondence 'dense fit' stage to replace sparse markers, enabling separate training on composable datasets (CLOTH3D for garments, AMASS for full-body motions, InterHand2.6M for hands). The approach claims improved expressiveness for hands/faces and robustness to pose, clothing, and partial/noisy inputs, with reported relative gains over ETCH of 33.0% MPJPE-All on 4D-Dress, 35.8% V2V-Hands on CAPE, and 80.8% MPJPE-All / 80.5% V2V-All on unseen BEDLAM2.0.

Significance. If validated, the work is significant for 3D human reconstruction in computer vision. The composable-dataset training and modular disentanglement of undressing and dense fitting represent a practical advance over monolithic approaches, supporting better generalization to diverse clothing and incomplete inputs. The large gains on unseen data and explicit promise to release code/models are notable strengths that could accelerate progress in animation, AR/VR, and downstream tasks requiring accurate body shape and pose under real-world conditions.

major comments (2)
  1. §3 (Method, tightness-aware fitting): The central claim that the tightness-aware paradigm filters clothing dynamics without introducing errors in body shape estimation lacks supporting ablations or error analysis; this assumption is load-bearing for the robustness results on both seen and unseen datasets such as BEDLAM2.0.
  2. §4 (Experiments): The substantial reported improvements (e.g., 80.8% MPJPE-All on BEDLAM2.0) are presented without component-wise ablations isolating the contributions of the undress stage versus the dense-fit stage versus SMPL-X expressiveness, making it difficult to attribute gains to the proposed disentanglement.
minor comments (3)
  1. Abstract: Relative improvements are given (e.g., 33.0%), but absolute baseline ETCH values should be included alongside for transparent comparison.
  2. Throughout: Ensure consistent definition of metrics (MPJPE-All, V2V-All) and acronyms at first use in the main text.
  3. §4: The evaluation on partial/noisy inputs would benefit from additional qualitative examples or failure-case analysis to complement the quantitative tables.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive review and the recommendation for minor revision. We appreciate the recognition of the work's significance and address the major comments point by point below, indicating the revisions we will make.

point-by-point responses
  1. Referee: §3 (Method, tightness-aware fitting): The central claim that the tightness-aware paradigm filters clothing dynamics without introducing errors in body shape estimation lacks supporting ablations or error analysis; this assumption is load-bearing for the robustness results on both seen and unseen datasets such as BEDLAM2.0.

    Authors: We agree that the manuscript would be strengthened by additional ablations and error analysis for the tightness-aware fitting. In the revised version, we will add a dedicated ablation in Section 4 that compares body shape estimation errors (using shape parameter differences and vertex-to-vertex errors against ground-truth undressed bodies from AMASS) with and without the tightness-aware undress stage. This will quantify any introduced errors while demonstrating the filtering of clothing dynamics, directly supporting the robustness claims on seen and unseen data including BEDLAM2.0. revision: yes

  2. Referee: §4 (Experiments): The substantial reported improvements (e.g., 80.8% MPJPE-All on BEDLAM2.0) are presented without component-wise ablations isolating the contributions of the undress stage versus the dense-fit stage versus SMPL-X expressiveness, making it difficult to attribute gains to the proposed disentanglement.

    Authors: We concur that component-wise ablations are necessary to isolate and attribute the gains. We will revise Section 4 to include a new ablation table that systematically disables the undress stage, the dense-fit stage, and compares SMPL-X against a non-expressive SMPL baseline. This will report incremental performance changes on the key metrics and datasets, clarifying the contributions of the disentangled pipeline and SMPL-X expressiveness. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper describes an empirical ML pipeline with modular, separately trained stages (tightness-aware undress on CLOTH3D, dense fit on AMASS + InterHand2.6M) evaluated on independent held-out datasets (4D-Dress, CAPE, BEDLAM2.0). Reported gains are direct empirical measurements on external ground truth, not reductions by construction. Minor self-citation to prior ETCH work exists but is not load-bearing for the performance claims, which rest on new composable training and implicit correspondences rather than self-referential definitions or fitted inputs renamed as predictions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach relies on standard assumptions of parametric body models and existing datasets without introducing new free parameters or invented entities beyond the proposed modular stages.

axioms (1)
  • domain assumption: The SMPL-X parametric model accurately represents human body shapes, poses, hands, and faces.
    Central to the fitting pipeline as stated in the abstract.

pith-pipeline@v0.9.0 · 5648 in / 1286 out tokens · 90642 ms · 2026-05-10T18:24:38.917698+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

59 extracted references · 59 canonical work pages · 1 internal anchor

  1. [1] Allen, B., Curless, B., Popović, Z.: The Space of Human Body Shapes: Reconstruction and Parameterization from Range Scans. Transactions on Graphics (TOG) (2003)

  2. [2] Antić, D., Tiwari, G., Ozcomlekci, B., Marin, R., Pons-Moll, G.: CloSe: A 3D Clothing Segmentation Dataset and Model. In: International Conference on 3D Vision (3DV) (2024)

  3. [3] Bertiche, H., Madadi, M., Escalera, S.: CLOTH3D: Clothed 3D Humans. In: European Conference on Computer Vision (ECCV) (2020)

  4. [4] Bhatnagar, B., Petrov, I., Xie, X.: RVH Mesh Registration. https://github.com/bharat-b7/RVH_Mesh_Registration (2022)

  5. [5] Bhatnagar, B.L., Sminchisescu, C., Theobalt, C., Pons-Moll, G.: Combining Implicit Function Learning and Parametric Models for 3D Human Reconstruction. In: European Conference on Computer Vision (ECCV) (2020)

  6. [6] Bhatnagar, B.L., Sminchisescu, C., Theobalt, C., Pons-Moll, G.: LoopReg: Self-supervised Learning of Implicit Surface Correspondences, Pose and Shape for 3D Human Mesh Registration. In: Conference on Neural Information Processing Systems (NeurIPS) (2020)

  7. [7] Bogo, F., Kanazawa, A., Lassner, C., Gehler, P.V., Romero, J., Black, M.J.: Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image. In: European Conference on Computer Vision (ECCV) (2016)

  8. [8] Cai, Z., Li, Z., Li, X., Li, B., Wang, Z., Zhang, Z., Xiu, Y.: Up2you: Fast reconstruction of yourself from unconstrained photo collections. In: International Conference on Learning Representations (ICLR) (2026)

  9. [9] Cai, Z., Yin, W., Zeng, A., Wei, C., Sun, Q., Yanjun, W., Pang, H.E., Mei, H., Zhang, M., Zhang, L., Loy, C.C., Yang, L., Liu, Z.: SMPLer-X: Scaling up expressive human pose and shape estimation. In: Advances in Neural Information Processing Systems (2023)

  10. [10] Cao, Z., Hidalgo, G., Simon, T., Wei, S.E., Sheikh, Y.: OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. Transactions on Pattern Analysis and Machine Intelligence (TPAMI) (2019)

  11. [11] Chen, H., Liu, S., Chen, W., Li, H., Hill, R.: Equivariant Point Network for 3D Point Cloud Analysis. In: Computer Vision and Pattern Recognition (CVPR). pp. 14514–14523 (2021)

  12. [12] Chen, X., Pang, A., Yang, W., Wang, P., Xu, L., Yu, J.: TightCap: 3D human shape capture with clothing tightness field. Transactions on Graphics (TOG) 41(1), 1–17 (2021)

  13. [13] Chen, Y., Medioni, G.: Object Modelling by Registration of Multiple Range Images. Image and Vision Computing (1992)

  14. [14] Chibane, J., Alldieck, T., Pons-Moll, G.: Implicit functions in feature space for 3d shape reconstruction and completion. In: Computer Vision and Pattern Recognition (CVPR). pp. 6970–6981 (2020)

  15. [15] Corona, E., Pons-Moll, G., Alenyà, G., Moreno-Noguer, F.: Learned Vertex Descent: A New Direction for 3D Human Model Fitting. In: European Conference on Computer Vision (ECCV) (2022)

  16. [16] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In: International Conference on Learning Representations (ICLR) (2021)

  17. [17] Embree: High Performance Ray Tracing Kernels 4.4.0. https://github.com/RenderKit/embree (2025)

  18. [18] Fang, H.S., Li, J., Tang, H., Xu, C., Zhu, H., Xiu, Y., Li, Y.L., Lu, C.: Alphapose: Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time. Transactions on Pattern Analysis and Machine Intelligence (TPAMI) (2022)

  19. [19] Feng, H., Kulits, P., Liu, S., Black, M.J., Abrevaya, V.F.: Generalizing Neural Human Fitting to Unseen Poses With Articulated SE(3) Equivariance. In: International Conference on Computer Vision (ICCV) (2023)

  20. [20] de Goes, F., Fong, D., O’Malley, M.: Garment Refitting for Digital Characters. In: SIGGRAPH Talks (2020)

  21. [21] Gong, K., Gao, Y., Liang, X., Shen, X., Wang, M., Lin, L.: Graphonomy: Universal Human Parsing via Graph Transfer Learning. In: Computer Vision and Pattern Recognition (CVPR) (2019)

  22. [22] Huang, C.H.P., Yi, H., Höschle, M., Safroshkin, M., Alexiadis, T., Polikovsky, S., Scharstein, D., Black, M.J.: Capturing and Inferring Dense Full-Body Human-Scene Contact. In: Computer Vision and Pattern Recognition (CVPR) (2022)

  23. [23] Jiang, H., Cai, J., Zheng, J.: Skeleton-Aware 3D Human Shape Reconstruction from Point Clouds. In: International Conference on Computer Vision (ICCV) (2019)

  24. [24] Li, B., Feng, H., Cai, Z., Black, M.J., Xiu, Y.: ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness. In: International Conference on Computer Vision (ICCV) (2025)

  25. [25] Li, B., Li, X., Jiang, Y., Xie, T., Gao, F., Wang, H., Yang, Y., Jiang, C.: Garmentdreamer: 3dgs guided garment synthesis with diverse geometry and texture details. In: International Conference on 3D Vision (3DV) (2025)

  26. [26] Liu, G., Rong, Y., Sheng, L.: VoteHMR: Occlusion-Aware Voting Network for Robust 3D Human Mesh Recovery from Partial Point Clouds. In: Proceedings of the 29th ACM International Conference on Multimedia (2021)

  27. [27] Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: A Skinned Multi-Person Linear Model. Transactions on Graphics (TOG) (2015)

  28. [28] Luo, Z., Wang, J., Liu, K., Zhang, H., Tessler, C., Wang, J., Yuan, Y., Cao, J., Lin, Z., Wang, F., et al.: SMPLOlympics: Sports Environments for Physically Simulated Humanoids. arXiv preprint arXiv:2407.00187 (2024)

  29. [29] Ma, Q., Yang, J., Ranjan, A., Pujades, S., Pons-Moll, G., Tang, S., Black, M.J.: Learning to Dress 3D People in Generative Clothing. In: Computer Vision and Pattern Recognition (CVPR) (2020)

  30. [30] Mahmood, N., Ghorbani, N., Troje, N.F., Pons-Moll, G., Black, M.J.: AMASS: Archive of Motion Capture As Surface Shapes. In: International Conference on Computer Vision (ICCV) (2019)

  31. [31] Marin, R., Corona, E., Pons-Moll, G.: NICP: Neural ICP for 3D Human Registration at Scale. In: European Conference on Computer Vision (ECCV) (2024)

  32. [32] Moon, G., Yu, S.I., Wen, H., Shiratori, T., Lee, K.M.: InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image. In: European Conference on Computer Vision (ECCV) (2020)

  33. [33] Müller, L., Osman, A.A.A., Tang, S., Huang, C.H.P., Black, M.J.: On self-contact and human pose. In: Computer Vision and Pattern Recognition (CVPR) (Jun 2021)

  34. [34] Patel, P., Huang, C.H.P., Tesch, J., Hoffmann, D.T., Tripathi, S., Black, M.J.: AGORA: Avatars in Geography Optimized for Regression Analysis. In: Computer Vision and Pattern Recognition (CVPR) (2021)

  35. [35] Pavlakos, G., Choutas, V., Ghorbani, N., Bolkart, T., Osman, A.A.A., Tzionas, D., Black, M.J.: Expressive Body Capture: 3D Hands, Face, and Body From a Single Image. In: Computer Vision and Pattern Recognition (CVPR) (2019)

  36. [36] Pons-Moll, G., Romero, J., Mahmood, N., Black, M.J.: Dyna: A Model of Dynamic Human Shape in Motion. Transactions on Graphics (TOG) (2015)

  37. [37] Prokudin, S., Lassner, C., Romero, J.: Efficient Learning on Point Clouds with Basis Point Sets. In: International Conference on Computer Vision (ICCV) (2019)

  38. [38] Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In: Computer Vision and Pattern Recognition (CVPR) (2017)

  39. [39] Qi, C.R., Yi, L., Su, H., Guibas, L.J.: PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In: Conference on Neural Information Processing Systems (NeurIPS) (2017)

  40. [40] Romero, J., Tzionas, D., Black, M.J.: Embodied hands: Modeling and capturing hands and bodies together. ACM Transactions on Graphics (Proc. SIGGRAPH Asia) 36(6) (Nov 2017)

  41. [41] Shao, Z., Wang, D., Tian, Q.Y., Yang, Y.D., Meng, H., Cai, Z., Dong, B., Zhang, Y., Zhang, K., Wang, Z.: DEGAS: Detailed Expressions on Full-Body Gaussian Avatars. In: International Conference on 3D Vision (3DV) (2025)

  42. [42] Shuai, Q., Fang, Q., Dong, J., Peng, S., Huang, D., Bao, H., Zhou, X.: EasyMoCap - Make human motion capture easier (2021), https://github.com/zju3dv/EasyMocap

  43. [43] Siméoni, O., Vo, H.V., Seitzer, M., Baldassarre, F., Oquab, M., Jose, C., Khalidov, V., Szafraniec, M., Yi, S., Ramamonjisoa, M., Massa, F., Haziza, D., Wehrstedt, L., Wang, J., Darcet, T., Moutakanni, T., Sentana, L., Roberts, C., Vedaldi, A., Tolan, J., Brandt, J., Couprie, C., Mairal, J., Jégou, H., Labatut, P., Bojanowski, P.: DINOv3. arXiv preprint a...

  44. [44] Tesch, J., Becherini, G., Achar, P., Yiannakidis, A., Kocabas, M., Patel, P., Black, M.J.: BEDLAM2.0: Synthetic Humans and Cameras in Motion. In: The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track (2025)

  45. [45] Thomas, H., Qi, C.R., Deschaud, J.E., Marcotegui, B., Goulette, F., Guibas, L.J.: KPConv: Flexible and Deformable Convolution for Point Clouds. In: International Conference on Computer Vision (ICCV). pp. 6411–6420 (2019)

  46. [46] Wang, J., Chen, M., Karaev, N., Vedaldi, A., Rupprecht, C., Novotny, D.: VGGT: Visual Geometry Grounded Transformer. In: Computer Vision and Pattern Recognition (CVPR) (2025)

  47. [47] Wang, K., Xie, J., Zhang, G., Liu, L., Yang, J.: Sequential 3D Human Pose and Shape Estimation from Point Clouds. In: Computer Vision and Pattern Recognition (CVPR) (2020)

  48. [48] Wang, S., Geiger, A., Tang, S.: Locally Aware Piecewise Transformation Fields for 3D Human Mesh Registration. In: Computer Vision and Pattern Recognition (CVPR) (2021)

  49. [49] Wang, W., Ho, H.I., Guo, C., Rong, B., Grigorev, A., Song, J., Zarate, J.J., Hilliges, O.: 4D-DRESS: A 4D Dataset of Real-world Human Clothing with Semantic Annotations. In: Computer Vision and Pattern Recognition (CVPR) (2024)

  50. [50] Wu, X., Jiang, L., Wang, P.S., Liu, Z., Liu, X., Qiao, Y., Ouyang, W., He, T., Zhao, H.: Point Transformer V3: Simpler, Faster, Stronger. In: Computer Vision and Pattern Recognition (CVPR) (2024)

  51. [51] Wu, X., Lao, Y., Jiang, L., Liu, X., Zhao, H.: Point Transformer V2: Grouped Vector Attention and Partition-based Pooling. In: Conference on Neural Information Processing Systems (NeurIPS) (2022)

  52. [52] Xu, H., Bazavan, E.G., Zanfir, A., Freeman, W.T., Sukthankar, R., Sminchisescu, C.: GHUM & GHUML: Generative 3D Human Shape and Articulated Pose Models. In: Computer Vision and Pattern Recognition (CVPR) (2020)

  53. [53] Yu, T., Zheng, Z., Guo, K., Liu, P., Dai, Q., Liu, Y.: Function4D: Real-time Human Volumetric Capture from Very Sparse Consumer RGBD Sensors. In: Computer Vision and Pattern Recognition (CVPR) (2021)

  54. [54] Zaheer, M., Kottur, S., Ravanbakhsh, S., Poczos, B., Salakhutdinov, R.R., Smola, A.J.: Deep Sets. In: Conference on Neural Information Processing Systems (NeurIPS) 30 (2017)

  55. [55] Zhang, C., Pujades, S., Black, M.J., Pons-Moll, G.: Detailed, Accurate, Human Shape Estimation From Clothed 3D Scan Sequences. In: Computer Vision and Pattern Recognition (CVPR) (2017)

  56. [56] Zhao, H., Jiang, L., Jia, J., Torr, P.H., Koltun, V.: Point Transformer. In: International Conference on Computer Vision (ICCV) (2021)

  57. [57] Zheng, Z., Yu, T., Wei, Y., Dai, Q., Liu, Y.: DeepHuman: 3D Human Reconstruction From a Single Image. In: International Conference on Computer Vision (ICCV) (2019)

  58. [58] Zhou, B., Franco, J.S., Bogo, F., Tekin, B., Boyer, E.: Reconstructing Human Body Mesh from Point Clouds by Adversarial GP Network. In: Asian Conference on Computer Vision (ACCV) (2020)

  59. [59] Zuffi, S., Black, M.J.: The Stitched Puppet: A Graphical Model of 3D Human Shape and Pose. In: Computer Vision and Pattern Recognition (CVPR) (2015)