Better Rigs, Not Bigger Networks: A Body Model Ablation for Gaussian Avatars

Derek Austin

arxiv: 2604.01447 · v2 · submitted 2026-04-01 · 💻 cs.CV · cs.AI


Pith reviewed 2026-05-13 22:14 UTC · model grok-4.3


keywords: 3D Gaussian splatting · human avatar reconstruction · body model ablation · SMPL · Momentum Human Rig · pose estimation · PeopleSnapshot · ZJU-MoCap


The pith

Replacing SMPL with the Momentum Human Rig yields higher PSNR in a minimal Gaussian avatar pipeline.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper sets out to show that much of the added complexity in recent 3D Gaussian avatar methods is unnecessary. It replaces the standard SMPL body model with the Momentum Human Rig (MHR), estimated via SAM-3D-Body, and trains a basic pipeline that contains no learned deformations or pose-dependent corrections. The result is the highest reported PSNR, together with competitive or better LPIPS and SSIM scores, on the PeopleSnapshot and ZJU-MoCap datasets. Two controlled ablations, one translating SAM-3D-Body meshes to SMPL-X and one translating the datasets' original SMPL poses to MHR, are retrained under identical conditions and confirm that body-model expressiveness has been a central bottleneck.

Core claim

The central claim is that body model representational capacity has been a primary bottleneck in avatar reconstruction. A minimal pipeline built on the Momentum Human Rig (MHR), estimated via SAM-3D-Body and using no learned deformations or pose-dependent corrections, reaches the highest reported PSNR and competitive or superior LPIPS and SSIM on PeopleSnapshot and ZJU-MoCap. Two ablations, one translating SAM-3D-Body meshes into SMPL-X and one translating the datasets' SMPL poses into MHR, are both retrained identically and establish that the gains arise from both improved mesh capacity and better pose estimation.
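Since the headline claim is stated in PSNR, it may help to recall how that metric is computed. The following is a minimal sketch, assuming images are float arrays scaled to [0, 1]; the function name `psnr` and the `max_value` parameter are illustrative choices, and the paper's own evaluation code may mask, crop, or average over frames differently.

```python
import numpy as np


def psnr(rendered, reference, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    PSNR = 10 * log10(max_value^2 / MSE), so higher is better and
    identical images give +inf.
    """
    rendered = np.asarray(rendered, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    if rendered.shape != reference.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((rendered - reference) ** 2)
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

Because PSNR is logarithmic in the mean squared error, small differences in background handling or cropping can shift the value, which is one reason cross-paper comparisons need matching evaluation protocols.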

What carries the argument

The Momentum Human Rig (MHR) body model estimated via SAM-3D-Body, which supplies greater expressiveness than SMPL while requiring no additional learned deformation networks.
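To make "no additional learned deformation networks" concrete, here is a minimal sketch of what rig-only deformation could look like: points (for example Gaussian centers) follow the skeleton through linear blend skinning alone. This is an assumption for illustration. The summary above does not describe how Gaussians are bound to the rig, MHR's actual skinning formulation may differ, and `skin_points` and its arguments are hypothetical names.

```python
import numpy as np


def skin_points(rest_points, skin_weights, joint_transforms):
    """Linear blend skinning of N points by J joints.

    rest_points:      (N, 3) positions in the rest pose.
    skin_weights:     (N, J) per-point weights, each row summing to 1.
    joint_transforms: (J, 4, 4) rest-to-posed rigid transforms.
    Returns (N, 3) posed positions; no learned correction is applied.
    """
    rest_points = np.asarray(rest_points, dtype=np.float64)
    skin_weights = np.asarray(skin_weights, dtype=np.float64)
    joint_transforms = np.asarray(joint_transforms, dtype=np.float64)

    # Homogeneous coordinates so rotation and translation share one matrix.
    ones = np.ones((rest_points.shape[0], 1))
    homogeneous = np.concatenate([rest_points, ones], axis=1)  # (N, 4)

    # Blend the joint transforms per point, then apply the blended matrix.
    blended = np.einsum("nj,jab->nab", skin_weights, joint_transforms)
    posed = np.einsum("nab,nb->na", blended, homogeneous)
    return posed[:, :3]
```

In a design like this, any detail the rig cannot express stays unmodeled, which is why the expressiveness of the rig would matter more than in pipelines that add a learned residual on top.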

If this is right

Simpler pipelines without learned corrections can surpass elaborate networks when the underlying body model is more expressive.
Both mesh representational capacity and pose estimation quality contribute independently to reconstruction fidelity.
Performance gains from the rig change appear consistently on standard human avatar benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Adopting more expressive rigs may reduce reliance on large deformation networks in real-time avatar systems.
The same ablation logic could be applied to other parametric models or reconstruction pipelines beyond Gaussians.
Further improvements to rig expressiveness might produce additional gains without any increase in network size.

Load-bearing premise

The two controlled ablations fully isolate pose estimation quality from the body model's intrinsic representational capacity when everything else is retrained identically.

What would settle it

An experiment that applies the identical pose estimates to both SMPL and MHR rigs under the same Gaussian pipeline and checks whether the quality gap remains.
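One way to read such an experiment is as a two-by-two grid of body model against pose source. The sketch below, under that assumption, separates the rig effect, the pose effect, and their interaction from four measured scores. The function name, the key labels ("baseline", "expressive", "original", "improved"), and the grid framing are hypothetical and not taken from the paper; no values are supplied here because they would have to come from actual retraining runs.

```python
from typing import Dict, Mapping, Tuple


def decompose_gain(scores: Mapping[Tuple[str, str], float]) -> Dict[str, float]:
    """Split a quality gain into rig, pose, and interaction terms.

    `scores` maps (rig, pose_source) to a metric such as PSNR, with
    rig in {"baseline", "expressive"} and
    pose_source in {"original", "improved"}.
    """
    base_orig = scores[("baseline", "original")]
    base_impr = scores[("baseline", "improved")]
    expr_orig = scores[("expressive", "original")]
    expr_impr = scores[("expressive", "improved")]

    rig_at_original = expr_orig - base_orig   # rig swap, poses held fixed
    rig_at_improved = expr_impr - base_impr
    pose_at_baseline = base_impr - base_orig  # pose swap, rig held fixed
    pose_at_expressive = expr_impr - expr_orig

    return {
        "rig_effect_original_pose": rig_at_original,
        "rig_effect_improved_pose": rig_at_improved,
        "pose_effect_baseline_rig": pose_at_baseline,
        "pose_effect_expressive_rig": pose_at_expressive,
        # Nonzero interaction means the two factors are not simply additive.
        "interaction": rig_at_improved - rig_at_original,
        "total_gain": expr_impr - base_orig,
    }
```

If the rig effect stays clearly positive with poses held fixed, the quality gap survives the control; if it collapses toward zero, the gain is mostly a pose-estimation effect.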

Figures

Figures reproduced from arXiv: 2604.01447 by Derek Austin.

Original abstract

Recent 3D Gaussian splatting methods built atop SMPL achieve remarkable visual fidelity while continually increasing the complexity of the overall training architecture. We demonstrate that much of this complexity is unnecessary: by replacing SMPL with the Momentum Human Rig (MHR), estimated via SAM-3D-Body, a minimal pipeline with no learned deformations or pose-dependent corrections achieves the highest reported PSNR and competitive or superior LPIPS and SSIM on PeopleSnapshot and ZJU-MoCap. To disentangle pose estimation quality from body model representational capacity, we perform two controlled ablations: translating SAM-3D-Body meshes to SMPL-X, and translating the original dataset's SMPL poses into MHR, both retrained under identical conditions. These ablations confirm that body model expressiveness has been a primary bottleneck in avatar reconstruction, with both mesh representational capacity and pose estimation quality contributing meaningfully to the full pipeline's gains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that replacing SMPL with the Momentum Human Rig (MHR), estimated via SAM-3D-Body, enables a minimal Gaussian splatting pipeline (no learned deformations or pose-dependent corrections) to achieve the highest reported PSNR and competitive or superior LPIPS and SSIM on PeopleSnapshot and ZJU-MoCap. Two controlled ablations (translating SAM-3D-Body meshes to SMPL-X and the original SMPL poses to MHR, both retrained identically) are presented to disentangle pose estimation quality from body model representational capacity. The paper concludes that body model expressiveness has been a primary bottleneck.

Significance. If the ablation controls hold, the result is significant: it shows that a simpler, parameter-light pipeline can outperform increasingly complex deformation networks by improving the underlying body rig, shifting focus from architectural scaling to representational fidelity in Gaussian avatars. The empirical gains on standard datasets provide a falsifiable benchmark for future rig comparisons.

major comments (2)

[Ablation experiments] Ablation experiments: the two translations (SAM-3D-Body meshes to SMPL-X; SMPL poses to MHR) are asserted to be controlled and lossless under identical retraining, yet no vertex-to-vertex or pose-error metrics (e.g., MPJPE or mesh-to-mesh distance) are reported on the translated data. Without these, approximation artifacts cannot be ruled out as a confound, undermining the isolation of representational capacity from translation fidelity.
[Results] Results tables: the claim of 'highest reported PSNR' is load-bearing for the central thesis, but the manuscript does not include an exhaustive comparison table listing all cited prior methods with identical metrics, training iterations, and hardware; this prevents verification that the reported gains are not due to unstated differences in optimization schedule.
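The translation-fidelity metrics named in the first major comment are straightforward to compute. The sketch below shows MPJPE for corresponding joints and a one-sided nearest-vertex distance for meshes whose topologies differ. The function names are illustrative, joint correspondence between the two rigs is assumed to be given, and the nearest-vertex measure is a simple proxy rather than the specific mesh-to-mesh distance the referee or authors have in mind.

```python
import numpy as np


def mpjpe(pred_joints, ref_joints) -> float:
    """Mean per-joint position error.

    Mean Euclidean distance over corresponding 3D points; inputs share
    a shape ending in 3, e.g. (frames, joints, 3). Units follow the input.
    """
    pred = np.asarray(pred_joints, dtype=np.float64)
    ref = np.asarray(ref_joints, dtype=np.float64)
    if pred.shape != ref.shape or pred.shape[-1] != 3:
        raise ValueError("inputs must share a shape ending in 3")
    return float(np.linalg.norm(pred - ref, axis=-1).mean())


def mean_nearest_vertex_distance(source_verts, target_verts) -> float:
    """One-sided mesh distance for meshes without shared topology.

    For each source vertex, take the distance to the closest target
    vertex, then average. Brute force: O(N * M) memory.
    """
    src = np.asarray(source_verts, dtype=np.float64)  # (N, 3)
    tgt = np.asarray(target_verts, dtype=np.float64)  # (M, 3)
    pairwise = np.linalg.norm(src[:, None, :] - tgt[None, :, :], axis=-1)
    return float(pairwise.min(axis=1).mean())
```

Reporting both directions of the nearest-vertex distance (source to target and target to source) gives a more symmetric picture, since a one-sided value can hide regions of the target mesh that the source never covers.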

minor comments (2)

[Method] Notation: MHR is introduced as an 'invented entity' without an explicit equation or parameter count in the main text; a short table comparing degrees of freedom (joints, blend shapes, etc.) versus SMPL/SMPL-X would clarify the expressiveness claim.
[Figures] Figure clarity: the ablation diagrams should annotate the exact fitting/optimization steps used in each translation direction so readers can assess potential error sources.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the strength of our claims. We respond to each major point below and will incorporate revisions to address the concerns raised.


Referee: [Ablation experiments] Ablation experiments: the two translations (SAM-3D-Body meshes to SMPL-X; SMPL poses to MHR) are asserted to be controlled and lossless under identical retraining, yet no vertex-to-vertex or pose-error metrics (e.g., MPJPE or mesh-to-mesh distance) are reported on the translated data. Without these, approximation artifacts cannot be ruled out as a confound, undermining the isolation of representational capacity from translation fidelity.

Authors: We agree that explicit quantitative metrics on translation fidelity would strengthen the controlled nature of the ablations. In the revised manuscript we will add MPJPE for the SMPL-to-MHR pose translations and mean vertex-to-vertex Euclidean distances for the SAM-3D-Body-to-SMPL-X mesh translations, computed on the same subjects used in the main experiments. These numbers will be reported in a new supplementary table together with a brief discussion of any residual error.
Revision: yes
Referee: [Results] Results tables: the claim of 'highest reported PSNR' is load-bearing for the central thesis, but the manuscript does not include an exhaustive comparison table listing all cited prior methods with identical metrics, training iterations, and hardware; this prevents verification that the reported gains are not due to unstated differences in optimization schedule.

Authors: We will expand the main results table to list every method cited in the paper, reporting the PSNR, LPIPS and SSIM values exactly as published by the original authors. Where training iteration counts or hardware details appear in the source papers we will include them; otherwise we will mark the entry as “not reported.” A revised table caption will explicitly note that cross-paper comparisons are subject to implementation and hardware differences and that our own runs used the same 300k-iteration schedule and hardware for all ablations.
Revision: partial

Circularity Check

0 steps flagged

No significant circularity: empirical ablation study with externally falsifiable metrics


The paper reports an empirical ablation comparing SMPL and MHR body models in Gaussian splatting avatars, using two controlled mesh/pose translations retrained identically on PeopleSnapshot and ZJU-MoCap. No derivation chain, equations, or predictions are present that reduce to fitted inputs by construction. Claims rest on reported PSNR/LPIPS/SSIM values, which are externally replicable and falsifiable. No self-citation load-bearing steps, self-definitional relations, or ansatz smuggling appear in the abstract or described pipeline. The work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the unverified assumption that MHR has superior representational capacity and that the ablations isolate this factor cleanly; no free parameters or invented entities are explicitly fitted in the abstract.

axioms (1)

domain assumption: PeopleSnapshot and ZJU-MoCap are appropriate and representative benchmarks for evaluating avatar reconstruction quality.
Evaluation metrics and comparisons are reported exclusively on these two datasets.

invented entities (1)

Momentum Human Rig (MHR) · no independent evidence
purpose: More expressive alternative to SMPL for representing human body shape and pose in avatar pipelines.
Presented as the key replacement that enables the minimal pipeline; no independent evidence outside the reported experiments is given.

pith-pipeline@v0.9.0 · 5449 in / 1277 out tokens · 69522 ms · 2026-05-13T22:14:18.956227+00:00 · methodology


Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages

[1] Thiemo Alldieck, Marcus Magnor, Weipeng Xu, Christian Theobalt, and Gerard Pons-Moll. Video based reconstruction of 3d people models. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8387–8397, 2018.

[2] Aaron Ferguson, Ahmed A. A. Osman, Berta Bescos, Carsten Stoll, Chris Twigg, Christoph Lassner, David Otte, Eric Vignola, Fabian Prada, Federica Bogo, Igor Santesteban, Javier Romero, Jenna Zarate, Jeongseok Lee, Jinhyung Park, Jinlong Yang, John Doublestein, Kishore Venkateshan, Kris Kitani, Ladislav Kavan, Marco Dal Farra, Matthew Hu, Matthew Cioffi, ...

[3] Liangxiao Hu, Hongwen Zhang, Yuxiang Zhang, Boyao Zhou, Boning Liu, Shengping Zhang, and Liqiang Nie. Gaussianavatar: Towards realistic human avatar modeling from a single video via animatable 3d gaussians. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

[4] Tianjian Jiang, Xu Chen, Jie Song, and Otmar Hilliges. Instantavatar: Learning avatars from monocular video in 60 seconds. arXiv, 2022.

[5] Diederik P. Kingma and Jimmy Lei Ba. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR),

[6] Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel, Oncel Tuzel, and Anurag Ranjan. HUGS: Human gaussian splats. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 505–515, 2024.

[7] Tianye Li, Timo Bolkart, Michael J. Black, Hao Li, and Javier Romero. Learning a model of facial shape and expression from 4D scans. ACM Transactions on Graphics, (Proc. SIGGRAPH Asia), 36(6):194:1–194:17, 2017.

[8] Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. SMPL: A skinned multi-person linear model. ACM Trans. Graphics (Proc. SIGGRAPH Asia), 34(6):248:1–248:16, 2015.

[9] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. In Proceedings of the European Conference on Computer Vision (ECCV), 2020.

[10] Ahmed A A Osman, Timo Bolkart, and Michael J. Black. STAR: A sparse trained articulated human body regressor. In European Conference on Computer Vision (ECCV), pages 598–613, 2020.

[11] Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, and Michael J. Black. Expressive body capture: 3D hands, face, and body from a single image. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 10975–10985, 2019.

[12] Sida Peng, Junting Dong, Qianqian Wang, Shangzhan Zhang, Qing Shuai, Xiaowei Zhou, and Hujun Bao. Animatable neural radiance fields for modeling dynamic human bodies. In ICCV, 2021.

[13] Sida Peng, Yuanqing Zhang, Yinghao Xu, Qianqian Wang, Qing Shuai, Hujun Bao, and Xiaowei Zhou. Neural body: Implicit neural representations with structured latent codes for novel view synthesis of dynamic humans. In CVPR,

[14] Sen Peng, Weixing Xie, Zilong Wang, Xiaohu Guo, Zhonggui Chen, Baorong Yang, and Xiao Dong. RMAvatar: Photorealistic human avatar reconstruction from monocular video based on rectified mesh-embedded gaussians. arXiv preprint arXiv:2501.07104, 2025.

[15] Shenhan Qian, Tobias Kirschstein, Liam Schoneveld, Davide Davoli, Simon Giebenhain, and Matthias Nießner. Gaussianavatars: Photorealistic head avatars with rigged 3d gaussians. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20299–20309,

[16] Zhiyin Qian, Shaofei Wang, Marko Mihajlovic, Andreas Geiger, and Siyu Tang. 3DGS-Avatar: Animatable avatars via deformable 3D gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5020–5030, 2024.

[17] Javier Romero, Dimitrios Tzionas, and Michael J. Black. Embodied hands: Modeling and capturing hands and bodies together. ACM Transactions on Graphics, (Proc. SIGGRAPH Asia), 36(6), 2017.

[18] Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, and Zeyu Wang. SplattingAvatar: Realistic real-time human avatars with mesh-embedded gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1606–1616, 2024.

[19] Chung-Yi Weng, Brian Curless, Pratul P. Srinivasan, Jonathan T. Barron, and Ira Kemelmacher-Shlizerman. HumanNeRF: Free-viewpoint rendering of moving people from monocular video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16210–16220, 2022.

[20] Hongyi Xu, Eduard Gabriel Bazavan, Andrei Zanfir, Bill Freeman, Rahul Sukthankar, and Cristian Sminchisescu. Ghum & ghuml: Generative 3d human shape and articulated pose models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (Oral), pages 6184–6193, 2020.

[21] Xitong Yang, Devansh Kukreja, Don Pinkus, Anushka Sagar, Taosha Fan, Jinhyung Park, Soyong Shin, Jinkun Cao, Jiawei Liu, Nicolas Ugrinovic, Matt Feiszli, Jitendra Malik, Piotr Dollar, and Kris Kitani. Sam 3d body: Robust full-body human mesh recovery. arXiv preprint arXiv:2602.15989, 2026.

[22] Vickie Ye, Ruilong Li, Justin Kerr, Matias Turkulainen, Brent Yi, Zhuoyang Pan, Otto Seiskari, Jianbo Ye, Jeffrey Hu, Matthew Tancik, and Angjoo Kanazawa. gsplat: An open-source library for gaussian splatting. Journal of Machine Learning Research, 26(34):1–17, 2025.
