
arxiv: 2605.25861 · v2 · pith:4NGLVMQXnew · submitted 2026-05-25 · 💻 cs.CV · cs.AI

MuNet: A Mutualistic Network for Joint 3D Human Mesh Recovery and 3D Clothed Human Reconstruction from Single Images

Pith reviewed 2026-06-29 22:58 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords 3D human mesh recovery · clothed human reconstruction · mutualistic network · graph convolutional network · single image · joint optimization · 2-manifold graphs · mutualistic mechanism

The pith

MuNet jointly optimizes 3D human mesh recovery and clothed reconstruction by letting each task guide the other during training on shared graphs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that 3D human mesh recovery and 3D clothed human reconstruction, long studied separately, can be addressed together in one framework to exploit their mutual dependencies. It uses 2-manifold graphs as a shared representation and an end-to-end graph convolutional network that starts with a coarse mesh and progressively refines it into a detailed clothed model. A mutualistic mechanism lets mesh recovery provide guidance to clothed reconstruction while reconstruction feedback improves the mesh output, all during training. A sympathetic reader would care because this avoids separate pipelines and could produce more accurate 3D human models from single images by capitalizing on the tasks' natural connections.
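To make the "2-manifold graph" requirement concrete, here is a minimal sketch of one necessary condition for a closed triangle mesh to be a 2-manifold: every edge is shared by exactly two faces. The function names and the tetrahedron test mesh are illustrative assumptions, not taken from the paper or its code release, and a full manifold test would also need to check the face fan around each vertex.

```python
from collections import Counter

def edge_face_counts(faces):
    """Count how many triangular faces touch each undirected edge."""
    counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1
    return counts

def is_closed_edge_manifold(faces):
    """True if every edge is shared by exactly two faces.

    This is a necessary condition for a closed 2-manifold triangle mesh,
    not a complete test (vertex neighbourhoods are not checked).
    """
    counts = edge_face_counts(faces)
    return bool(counts) and all(n == 2 for n in counts.values())

# Hypothetical test mesh: a tetrahedron (4 vertices, 4 faces, 6 edges).
TETRAHEDRON = [(0, 1, 2), (0, 3, 1), (0, 2, 3), (1, 3, 2)]

closed = is_closed_edge_manifold(TETRAHEDRON)       # watertight surface
opened = is_closed_edge_manifold(TETRAHEDRON[:3])   # one face removed
```

Removing a single face leaves boundary edges that touch only one face, so the check fails for the opened mesh.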

Core claim

MuNet is a mutualistic network that solves 3D human mesh recovery and 3D clothed human reconstruction jointly from single images. It adopts 2-manifold graphs as a unified representation for consistent modeling, employs an end-to-end graph convolutional network that progressively deforms an initial graph into a 3D human mesh and then refines it into a detailed clothed model, and introduces a mutualistic mechanism for reciprocal interaction between the tasks during training: mesh recovery guides reconstruction, and reconstruction feedback refines mesh recovery. The paper reports state-of-the-art performance on both tasks across the Human3.6M, 3DPW, MPI-INF-3DHP, THuman2.0, CAPE, and RenderPeople datasets.
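The idea of progressively deforming a graph with graph convolutions can be sketched in a few lines. The example below is an assumption-laden toy, not MuNet's architecture: it uses a single linear graph convolution with the symmetric normalization of Kipf and Welling [54], ignores image features entirely, and takes a hand-set weight matrix in place of learned parameters. All function names are hypothetical.

```python
import math

def normalized_adjacency(num_vertices, edges):
    """Return D^-1/2 (A + I) D^-1/2 as a dense list-of-lists matrix."""
    a = [[0.0] * num_vertices for _ in range(num_vertices)]
    for i in range(num_vertices):
        a[i][i] = 1.0  # self-loop
    for i, j in edges:
        a[i][j] = 1.0
        a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_vertices)]
            for i in range(num_vertices)]

def matmul(x, y):
    """Plain dense matrix product for list-of-lists matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def graph_conv(a_hat, features, weight):
    """One linear graph convolution: a_hat @ features @ weight."""
    return matmul(matmul(a_hat, features), weight)

def deform(vertices, a_hat, weight, steps):
    """Repeatedly add per-vertex offsets predicted by the graph convolution."""
    for _ in range(steps):
        offsets = graph_conv(a_hat, vertices, weight)
        vertices = [[v + o for v, o in zip(v_row, o_row)]
                    for v_row, o_row in zip(vertices, offsets)]
    return vertices

# Hypothetical initial graph: a tetrahedron centred at the origin.
TETRA_EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
TETRA_VERTS = [[1.0, 1.0, 1.0], [1.0, -1.0, -1.0],
               [-1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]]

a_hat = normalized_adjacency(4, TETRA_EDGES)
weight = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]]  # hand-set, not learned
refined = deform(TETRA_VERTS, a_hat, weight, steps=2)
```

Because the connectivity never changes, the output keeps the same graph as the input; only vertex positions move. That property is what lets a coarse mesh and a detailed clothed surface share one representation in a coarse-to-fine pipeline of this kind.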

What carries the argument

The mutualistic mechanism that enables reciprocal interaction between 3D human mesh recovery and 3D clothed human reconstruction during training on a shared 2-manifold graph representation.

If this is right

  • Each task receives guidance from the other and reaches higher accuracy than it would with isolated training.
  • A single end-to-end model produces both bare meshes and detailed clothed outputs without separate stages.
  • The same 2-manifold graph representation supports consistent modeling for mesh recovery and clothed reconstruction.
  • Training incorporates feedback between tasks to refine outputs progressively from coarse to detailed.
  • The approach delivers state-of-the-art results on both tasks across indoor, outdoor, and synthetic datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The shared graph representation could support adding further related tasks such as pose refinement or texture mapping without redesigning the backbone.
  • Practitioners could reduce the number of separate models needed for 3D human pipelines by adopting the joint training loop.
  • The progressive deformation strategy might generalize to other 3D reconstruction problems that move from coarse shape to fine surface detail.
  • Real-world deployment on varied camera qualities could test whether the mutualistic gains persist outside controlled benchmark conditions.

Load-bearing premise

The mutualistic interaction mechanism during training produces measurable reciprocal benefits that are not achievable by standard multi-task or sequential training on the same graph representation.

What would settle it

An ablation experiment that removes the mutualistic interaction, trains with standard multi-task or sequential methods on the identical graph setup, and measures no performance drop on the metrics for either task across the evaluation datasets.
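The control described above amounts to keeping the model fixed and toggling only the path by which the clothed-reconstruction loss reaches the mesh stage. The toy below sketches where that toggle sits. It is purely illustrative: a scalar "body" parameter `b`, a scalar "clothing offset" `c`, squared-error losses, and made-up targets, none of which come from the paper. It says nothing about whether the feedback helps on real data.

```python
def train(feedback, steps=200, lr=0.1, mesh_target=1.0, clothed_target=1.5):
    """Toy two-stage model: body b, clothing offset c, clothed surface b + c.

    feedback=True  lets the clothed-surface loss also update b (reciprocal path).
    feedback=False updates b from the mesh loss only (sequential-style control).
    Targets are arbitrary placeholder values, not data from the paper.
    """
    b, c = 0.0, 0.0
    for _ in range(steps):
        mesh_err = b - mesh_target
        clothed_err = (b + c) - clothed_target
        grad_b = 2.0 * mesh_err              # gradient of the mesh loss w.r.t. b
        if feedback:
            grad_b += 2.0 * clothed_err      # extra gradient from the clothed loss
        grad_c = 2.0 * clothed_err           # clothing stage always sees its own loss
        b -= lr * grad_b
        c -= lr * grad_c
    return b, c

with_feedback = train(feedback=True)
without_feedback = train(feedback=False)
```

In this toy the two targets are mutually consistent, so both variants settle on the same solution and differ only in their update trajectories. A real ablation would compare the two settings on the benchmark metrics for both tasks, with the graph representation and network held identical.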

Figures

Figures reproduced from arXiv: 2605.25861 by Changxin Gao, Jingying Chen, Leyuan Liu, Yuhan Li, Yunqi Gao.

Figure 1: (a) Framework of joint 3D human mesh recovery and 3D clothed human reconstruction. The …
Figure 2: Overview of MuNet. The T-pose SMPL template …
Figure 3: Detailed network architecture of MuNet. The 3D mesh recovery sub-network …
Figure 4: Example images in six testing datasets. The testing datasets span diverse scenarios: Human3.6M, …
Figure 5: Qualitative comparisons of different 3D human mesh recovery methods on the Human3.6M, 3DPW, MPI-INF-3DHP, and CAPE datasets, where humans wear non-loose clothing. We show the results from three viewpoints and highlight the obvious pose and body shape errors visible from each viewpoint. Please zoom in to see the details. 4.4. Results on 3D Clothed Human Reconstruction We compare our MuNet with four SOTA 3D …
Figure 6: Qualitative comparison of different 3D human mesh recovery methods on the THuman2.0 and RenderPeople datasets, where humans wear loose clothing. We show the results from three viewpoints and highlight the obvious pose and body shape errors visible from each viewpoint. Please zoom in to see the details.
Figure 7: Qualitative comparison of different 3D clothed human reconstruction methods on the CAPE dataset. For each input image, we show the SMPL-(X) recovered 3D human mesh and the reconstructed 3D clothed human models respectively in the even and odd rows. Please zoom in to see the details. ICON [19], ECON [22], D-IF [20], VS [23] and SIFU [48] are based on ground-truth SMPL-(X) and normal maps. However, SMPL-(X) …
Figure 8: Qualitative comparison of different 3D clothed human reconstruction methods on the THuman2.0 dataset. For each input image, we show the recovered 3D human meshes and the reconstructed 3D clothed human models respectively in the odd and even rows. Please zoom in to see the details. of clothing details (as indicated by higher εcos and εl2). It indicates that the accuracy of 3D mesh recovery is crucial for 3D…
Figure 9: Qualitative comparison of different 3D clothed human reconstruction methods on the RenderPeople dataset. For each input image, we show the recovered 3D human meshes and the reconstructed 3D clothed human models, respectively in the odd and even rows. Please zoom in to see the details. may produce incomplete clothes. In contrast, our MuNet demonstrates excellent generalization to loose clothing. 4.5. Ablat…
Figure 10: Comparisons of 3D human mesh recovery results with and without clothed human reconstruction …
Figure 11: The figure shows the results of clothed human reconstruction guided by advanced methods. The …
Original abstract

3D human mesh recovery and 3D clothed human reconstruction are inherently related, yet they have long been studied in isolation, thereby overlooking the potential gains of joint optimization. To overcome this limitation, we propose to address these two tasks within a unified framework, which allows their mutual dependencies to be effectively exploited. Building on this idea, we propose MuNet, a mutualistic network for joint 3D human mesh recovery and 3D clothed human reconstruction from single images. First, we adopt 2-manifold graphs as a unified representation for all 3D models, enabling consistent modeling across 3D human mesh recovery and clothed human reconstruction. Second, we design an end-to-end graph convolutional network that progressively deforms an initial graph into a 3D human mesh and refines it into a detailed 3D clothed human model. Third, we introduce a mutualistic mechanism that allows reciprocal interaction between the two tasks {during training}, where 3D human mesh recovery provides guidance for 3D clothed human reconstruction, and reconstruction feedback refines the 3D human mesh recovery. We extensively evaluate MuNet on six benchmark datasets for 3D human mesh recovery and 3D clothed human reconstruction, including Human3.6M, 3DPW, MPI-INF-3DHP, THuman2.0, CAPE, and RenderPeople. Experimental results demonstrate that MuNet achieves state-of-the-art performance on both tasks across all datasets. The code of MuNet is released for research purposes at https://github.com/starVisionTeam/MuNet.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes MuNet, a mutualistic network for joint 3D human mesh recovery and 3D clothed human reconstruction from single images. It adopts 2-manifold graphs as a unified representation, designs an end-to-end progressive graph convolutional network that deforms an initial graph into a mesh and then a detailed clothed model, and introduces a mutualistic training mechanism for reciprocal interaction (mesh recovery guiding clothed reconstruction and vice versa). The work claims state-of-the-art results on both tasks across six datasets (Human3.6M, 3DPW, MPI-INF-3DHP, THuman2.0, CAPE, RenderPeople) and releases code.

Significance. If the mutualistic mechanism can be shown to deliver measurable reciprocal benefits beyond what is achievable by standard multi-task or sequential training on the same graph/GCN backbone, the result would be significant for 3D human modeling, as it would demonstrate concrete gains from exploiting task interdependencies in a unified framework rather than treating the problems in isolation. The code release is a positive factor for reproducibility.

major comments (2)
  1. [Abstract] The central claim attributes SOTA gains specifically to the mutualistic interaction during training, yet no ablation studies are described that disable the reciprocal feedback loops while retaining the shared 2-manifold graph representation and progressive GCN. Without such controls the attribution cannot be verified, and the gains could instead arise from the unified representation or deformation pipeline alone.
  2. [Abstract] The manuscript states SOTA performance on six datasets but provides no quantitative tables, implementation details on the mutualistic mechanism, loss functions, or training schedule, preventing verification of the claimed performance or the role of the mutualistic component.
minor comments (1)
  1. [Abstract] The phrase 'mutualistic mechanism that allows reciprocal interaction between the two tasks {during training}' contains stray braces that appear to be a formatting artifact.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments point by point below, clarifying the role of the mutualistic mechanism and the placement of supporting details in the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The central claim attributes SOTA gains specifically to the mutualistic interaction during training, yet no ablation studies are described that disable the reciprocal feedback loops while retaining the shared 2-manifold graph representation and progressive GCN. Without such controls the attribution cannot be verified, and the gains could instead arise from the unified representation or deformation pipeline alone.

    Authors: The manuscript presents ablation studies comparing the full MuNet to variants using the same 2-manifold graph and progressive GCN but without the mutualistic training. To directly isolate the contribution of the reciprocal feedback, we will add a dedicated control experiment that disables the feedback loops while preserving all other components. This will be included in the revised version to strengthen verification of the mutualistic mechanism's role. revision: yes

  2. Referee: [Abstract] The manuscript states SOTA performance on six datasets but provides no quantitative tables, implementation details on the mutualistic mechanism, loss functions, or training schedule, preventing verification of the claimed performance or the role of the mutualistic component.

    Authors: Abstracts are subject to strict length limits and therefore omit tables and low-level implementation details. The full manuscript contains quantitative tables reporting results on all six datasets (Human3.6M, 3DPW, MPI-INF-3DHP, THuman2.0, CAPE, RenderPeople) in the Experiments section, together with explicit descriptions of the mutualistic mechanism, loss functions, and training schedule in the Method and Implementation sections. We will revise the abstract to add concise pointers to these sections. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical architecture with external benchmarks

full rationale

The paper proposes MuNet as an empirical network architecture using 2-manifold graphs and a mutualistic training interaction for joint mesh recovery and clothed reconstruction. No equations, derivations, or first-principles claims are presented that reduce performance gains to fitted parameters, self-citations, or definitional equivalences. Results rest on evaluations across six independent external datasets (Human3.6M, 3DPW, etc.), with the mutualistic mechanism described as a training procedure rather than a mathematical identity. This is a standard empirical contribution without load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on the abstract alone, no explicit free parameters, axioms, or invented entities are detailed beyond standard assumptions in graph convolutional networks for 3D reconstruction.

pith-pipeline@v0.9.1-grok · 5840 in / 1160 out tokens · 21727 ms · 2026-06-29T22:58:19.051699+00:00 · methodology


Reference graph

Works this paper leans on

51 extracted references · 42 canonical work pages

  1. [1]

    H. Zhang, Y. Tian, Y. Zhang, M. Li, L. An, Z. Sun, Y. Liu, PyMAF-X: Towards well-aligned full-body model regression from monocular images, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) (2023) 1–16. doi:10.1109/TPAMI.2023.3271691

  2. [2]

    H. Zhang, Y. Tian, X. Zhou, W. Ouyang, Y. Liu, L. Wang, Z. Sun, PyMAF: 3D human pose and shape regression with pyramidal mesh alignment feedback loop, in: IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 11426–11436. doi:10.1109/ICCV48922.2021.01125

  3. [3]

    A. Kanazawa, M. J. Black, D. W. Jacobs, J. Malik, End-to-end recovery of human shape and pose, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7122–7131. doi:10.1109/CVPR.2018.00744

  4. [5]

    Y. Tian, H. Zhang, Y. Liu, L. Wang, Recovering 3D human mesh from monocular images: A survey, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 45 (12) (2023) 15406–15425. doi:10.1109/TPAMI.2023.3298850

  5. [6]

    N. Kolotouros, G. Pavlakos, K. Daniilidis, Convolutional mesh regression for single-image human shape reconstruction, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 4501–4510. doi:10.1109/CVPR.2019.00463

  6. [7]

    W. Zeng, W. Ouyang, P. Luo, W. Liu, X. Wang, 3D human mesh regression with dense correspondence, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 7054–7063. doi:10.1109/CVPR42600.2020.00708

  7. [8]

    S. K. Dwivedi, Y. Sun, P. Patel, Y. Feng, M. J. Black, Tokenhmr: Advancing human mesh recovery with a tokenized pose representation, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 1323–1333. doi:10.1109/CVPR52733.2024.00132

  8. [9]

    E.-T. Le, A. Kakolvris, P. Koutras, H. Tam, E. Skordos, G. Papandreou, R. A. Güler, I. Kokkinos, Meshpose: Unifying densepose and 3D body mesh reconstruction, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 2405–2414. doi:10.1109/CVPR52733.2024.00233

  9. [10]

    M. Loper, N. Mahmood, J. Romero, G. Pons-Moll, M. J. Black, SMPL: A skinned multi-person linear model, ACM Transactions on Graphics (TOG) 34 (6) (2015) 1–16. doi:10.1145/2816795.2818013

  10. [11]

    G. Pavlakos, V. Choutas, N. Ghorbani, T. Bolkart, A. A. A. Osman, D. Tzionas, M. J. Black, Expressive body capture: 3D hands, face, and body from a single image, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 10975–10985. doi:10.1109/CVPR.2019.01123

  11. [12]

    F. Bogo, A. Kanazawa, C. Lassner, P. Gehler, J. Romero, M. J. Black, Keep it SMPL: Automatic estimation of 3D human pose and shape from a single image, in: European Conference on Computer Vision (ECCV), 2016, pp. 561–578. doi:10.1007/978-3-319-46454-1_34

  12. [13]

    A. Zanfir, E. G. Bazavan, M. Zanfir, W. T. Freeman, R. Sukthankar, C. Sminchisescu, Neural descent for visual 3D human pose and shape, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 14484–14493. doi:10.1109/CVPR46437.2021.01425

  13. [14]

    G. Pavlakos, L. Zhu, X. Zhou, K. Daniilidis, Learning to estimate 3D human pose and shape from a single color image, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 459–468. doi:10.1109/CVPR.2018.00055

  14. [16]

    L. Liu, Y. Gao, J. Sun, J. Chen, Single-image 3d human pose and shape estimation enhanced by clothed 3d human reconstruction, in: International Symposium on Artificial Intelligence and Robotics, 2023, pp. 33–44

  15. [17]

    Y. Gao, L. Liu, Y. Li, C. Gao, Y. Liu, J. Chen, Clothhmr: 3D mesh recovery of humans in diverse clothing from single image, in: International Conference on Multimedia Retrieval (ICMR), Association for Computing Machinery, 2025, pp. 368–377. doi:10.1145/3731715.3733288

  16. [19]

    Y. Xiu, J. Yang, D. Tzionas, M. J. Black, ICON: Implicit clothed humans obtained from normals, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 13296–13306. doi:10.1109/TPAMI.2021.3050505

  17. [20]

    X. Yang, Y. Luo, Y. Xiu, W. Wang, H. Xu, Z. Fan, D-IF: Uncertainty-aware human digitization via implicit distribution field, in: IEEE/CVF International Conference on Computer Vision, 2023, pp. 9122–9132. doi:10.1109/ICCV51070.2023.00837

  18. [21]

    Z. Zhang, L. Sun, Z. Yang, L. Chen, Y. Yang, Global-correlated 3D-decoupling transformer for clothed avatar reconstruction, in: Advances in Neural Information Processing Systems (NeurIPS), 2023, pp. 7818–7830

  19. [22]

    Y. Xiu, J. Yang, X. Cao, D. Tzionas, M. J. Black, ECON: Explicit clothed humans optimized via normal integration, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 512–523. doi:10.1109/CVPR52729.2023.00057

  20. [23]

    L. Liu, Y. Li, Y. Gao, C. Gao, Y. Liu, J. Chen, VS: Reconstructing clothed 3D human from single image via vertex shift, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 10498–10507. doi:10.1109/CVPR52733.2024.00999

  21. [24]

    Y. Feng, V. Choutas, T. Bolkart, D. Tzionas, M. J. Black, Collaborative regression of expressive bodies using moderation, in: International Conference on 3D Vision (3DV), 2021, pp. 792–804

  22. [25]

    A. O. Balan, L. Sigal, M. J. Black, J. E. Davis, H. W. Haussecker, Detailed human shape and pose from images, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2007, pp. 1–8. doi:10.1109/CVPR.2007.383340

  23. [26]

    D. Anguelov, P. Srinivasan, D. Koller, S. Thrun, J. Rodgers, J. Davis, SCAPE: Shape completion and animation of people, ACM Transactions on Graphics (TOG) 24 (3) (2005) 408–416. doi:10.1145/1073204.1073207

  24. [27]

    M. Loper, N. Mahmood, M. J. Black, MoSh: motion and shape capture from sparse markers, ACM Trans. Graph. 33 (6) (Nov. 2014). doi:10.1145/2661229.2661273

  25. [28]

    C. Ionescu, D. Papava, V. Olaru, C. Sminchisescu, Human3.6M: Large scale datasets and predictive methods for 3D human sensing in natural environments, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 36 (7) (2013) 1325–1339. doi:10.1109/TPAMI.2013.248

  26. [29]

    T. von Marcard, R. Henschel, M. J. Black, B. Rosenhahn, G. Pons-Moll, Recovering accurate 3D human pose in the wild using imus and a moving camera, in: European Conference on Computer Vision (ECCV), 2018, pp. 614–631. doi:10.1007/978-3-030-01249-6_37

  27. [30]

    D. Mehta, H. Rhodin, D. Casas, P. Fua, O. Sotnychenko, W. Xu, C. Theobalt, Monocular 3D human pose estimation in the wild using improved cnn supervision, in: International Conference on 3D Vision (3DV), 2017, pp. 506–516. doi:10.1109/3DV.2017.00064

  28. [31]

    Q. Ma, J. Yang, A. Ranjan, S. Pujades, G. Pons-Moll, S. Tang, M. J. Black, Learning to dress 3d people in generative clothing, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 6469–6478. doi:10.1109/CVPR42600.2020.00650

  29. [32]

    RenderPeople, www.renderpeople.com (2018)

  30. [33]

    T. Yu, Z. Zheng, K. Guo, P. Liu, Q. Dai, Y. Liu, Function4D: Real-time human volumetric capture from very sparse consumer RGBD sensors, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 5746–5756. doi:10.1109/CVPR46437.2021.00569

  31. [34]

    M. Kocabas, N. Athanasiou, M. J. Black, Vibe: Video inference for human body pose and shape estimation, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 5252–5262. doi:10.1109/CVPR42600.2020.00530

  32. [35]

    S. K. Dwivedi, N. Athanasiou, M. Kocabas, M. J. Black, Learning to regress bodies from images using differentiable semantic rendering, in: IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 11250–11259. doi:10.1109/ICCV48922.2021.01106

  33. [36]

    J. Li, C. Xu, Z. Chen, S. Bian, L. Yang, C. Lu, HybrIK: A hybrid analytical-neural inverse kinematics solution for 3D human pose and shape estimation, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 3382–3392. doi:10.1109/CVPR46437.2021.00339

  34. [37]

    H. Choi, G. Moon, J. Park, K. M. Lee, Learning to estimate robust 3D human mesh from in-the-wild crowded scenes, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 1465–1474. doi:10.1109/CVPR52688.2022.00153

  35. [38]

    Y. Liu, Z. Zhang, Stgformer: Spatio-temporal graphformer for 3d human pose estimation in video, Pattern Recognition 171 (2026) 112239. doi:10.1016/j.patcog.2025.112239

  36. [39]

    Y. Luo, C. Yuan, L. Gao, W. Xu, X. Yang, P. Wang, Fatnet: Feature-alignment transformer network for human pose transfer, Pattern Recognition 165 (2025) 111626. doi:10.1016/j.patcog.2025.111626

  37. [40]

    C. Lassner, J. Romero, M. Kiefel, F. Bogo, M. J. Black, P. V. Gehler, Unite the People: Closing the loop between 3D and 2D human representations, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 6050–6059. doi:10.1109/CVPR.2017.500

  38. [41]

    X. Ma, J. Su, C. Wang, W. Zhu, Y. Wang, 3D human mesh estimation from virtual markers, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 534–543. doi:10.1109/CVPR52729.2023.00059

  39. [42]

    A. Sengupta, I. Budvytis, R. Cipolla, Synthetic training for accurate 3D human pose and shape estimation in the wild, in: British Machine Vision Conference (BMVC), 2020

  40. [43]

    Y. Zhou, M. Habermann, I. Habibie, A. Tewari, C. Theobalt, F. Xu, Monocular real-time full body capture with inter-part correlations, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 4809–4820. doi:10.1109/CVPR46437.2021.00478

  41. [44]

    C. Doersch, A. Zisserman, Sim2real transfer learning for 3D human pose estimation: motion to the rescue, Curran Associates Inc., Red Hook, NY, USA, 2019

  42. [45]

    R. A. Güler, N. Neverova, I. Kokkinos, Densepose: Dense human pose estimation in the wild, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7297–7306. doi:10.1109/CVPR.2018.00762

  43. [46]

    Y. Jafarian, H. S. Park, Learning high fidelity depths of dressed humans by watching social media dance videos, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 12748–12757. doi:10.1109/CVPR46437.2021.01256

  44. [48]

    Z. Zhang, Z. Yang, Y. Yang, SIFU: Side-view conditioned implicit function for real-world usable clothed human reconstruction, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 9936–9947. doi:10.1109/CVPR52733.2024.00948

  45. [49]

    Z. Huang, S. M. Erfani, S. Lu, M. Gong, Efficient neural implicit representation for 3D human reconstruction, Pattern Recognition 156 (2024) 110758. doi:10.1016/j.patcog.2024.110758

  46. [50]

    R. B. Neupane, K. Li, Z. Mao, High-fidelity 3D reconstruction via unified nerf-mesh optimization with geometric and color consistency, Pattern Recognition 170 (2026) 112071. doi:10.1016/j.patcog.2025.112071

  47. [51]

    J. Pan, X. Li, J. Bai, J. Dai, Litenerfavatar: A lightweight nerf with local feature learning for dynamic human avatar, Pattern Recognition 170 (2026) 112008. doi:10.1016/j.patcog.2025.112008

  48. [52]

    S. Saito, Z. Huang, R. Natsume, S. Morishima, A. Kanazawa, H. Li, PIFu: Pixel-aligned implicit function for high-resolution clothed human digitization, in: IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 2304–2314. doi:10.1109/ICCV.2019.00239

  49. [53]

    J. Huang, H. Su, L. J. Guibas, Robust watertight manifold surface generation method for ShapeNet models, ArXiv (2018)

  50. [54]

    T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, in: International Conference on Learning Representations (ICLR), 2016

  51. [55]

    R. Hanocka, A. Hertz, N. Fish, R. Giryes, S. Fleishman, D. Cohen-Or, MeshCNN: A network with an edge, ACM Transactions on Graphics (TOG) 38 (4) (2019) 90:1–90:12