Latent Dynamics for Full Body Avatar Animation
Pith reviewed 2026-05-21 04:42 UTC · model grok-4.3
The pith
A residual latent evolved by force-decomposed dynamics captures history-dependent clothing motion beyond pose in full-body avatars.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We augment a pose-conditioned 3D Gaussian avatar with a transformer-based decoder and a dynamics residual latent that captures temporal appearance and geometry variation beyond the driving signals. At inference, a learned latent dynamics model evolves the residual latent from a short pose history and the previous latent state. The model decomposes each update into driving, restoring, and dissipative forces, producing temporally coherent, history-dependent rollouts with negligible added cost. Different initial conditions yield diverse yet plausible motion trajectories, and the force decomposition exposes controls such as stiffness.
What carries the argument
Dynamics residual latent whose updates are predicted by a latent dynamics model that decomposes each step into driving, restoring, and dissipative forces.
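As a rough illustration of the mechanism named above, the sketch below advances a residual latent by one step using the three-force split. Only the overall form of the update (driving minus dissipative minus restoring force, divided element-wise by a mass term) comes from the paper passage quoted further down this page. Everything else is an assumption made for illustration: the linear spring and damper terms, the rest state at zero, the constants `stiffness`, `damping`, and `dt`, the semi-implicit Euler integrator, and the function name `force_step`. In the paper the force terms are predicted by a learned model, not hand-written.

```python
from typing import List, Tuple

Vector = List[float]


def force_step(
    z: Vector,                # current residual latent
    v: Vector,                # current latent velocity
    f_drive: Vector,          # driving force (in the paper: predicted from pose history)
    mass: Vector,             # per-dimension mass, must be positive
    stiffness: float = 4.0,   # hypothetical restoring gain
    damping: float = 0.5,     # hypothetical dissipative gain
    dt: float = 1.0 / 30.0,   # hypothetical frame interval
) -> Tuple[Vector, Vector]:
    """One semi-implicit Euler step of a spring-damper system in latent space."""
    z_next: Vector = []
    v_next: Vector = []
    for zi, vi, fi, mi in zip(z, v, f_drive, mass):
        f_spring = stiffness * zi               # restoring: pulls the latent toward rest (0)
        f_damping = damping * vi                # dissipative: opposes the current velocity
        a = (fi - f_damping - f_spring) / mi    # element-wise division by mass
        vi_new = vi + dt * a                    # update velocity first
        z_next.append(zi + dt * vi_new)         # then position, using the new velocity
        v_next.append(vi_new)
    return z_next, v_next
```

In this toy form, raising `stiffness` pulls the latent back toward rest more quickly, which is one way to read the stiffness control mentioned in the core claim.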
If this is right
- Different initial conditions produce diverse yet plausible motion trajectories for identical pose sequences.
- Force decomposition supplies interpretable controls such as stiffness adjustment.
- Quantitative metrics and user-study scores improve over prior data-driven baselines across nine everyday-motion sequences with loose garments.
- Temporally coherent animation is obtained with negligible added inference cost.
Where Pith is reading between the lines
- The short-history latent model could support real-time interactive avatar control by letting users set initial latent conditions.
- The same force-decomposed latent update structure might transfer to other dynamic avatar elements such as hair or held objects.
- Hybrid use with occasional explicit simulation steps could correct rare contact failures while retaining the low-cost rollout.
- Extending the history window or testing on longer sequences would show whether the short-history assumption limits accuracy for sustained motion.
Load-bearing premise
A short window of pose history plus the previous latent state is sufficient for a learned model to produce plausible, generalizable future clothing states without explicit garment geometry or runtime physics simulation.
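The premise can be made concrete as an autoregressive loop in which each new latent depends only on a sliding pose window and the previous latent. The sketch below is a minimal illustration under stated assumptions: `rollout`, `toy_step`, and the default `history=8` are hypothetical names and values, and `toy_step` is a hand-written stand-in for the learned dynamics model, not the paper's network.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

Pose = List[float]
Latent = List[float]


def rollout(
    poses: Sequence[Pose],
    z0: Latent,
    step: Callable[[Sequence[Pose], Latent], Latent],
    history: int = 8,  # hypothetical window length
) -> List[Latent]:
    """Evolve a latent autoregressively from a sliding pose window and the previous latent."""
    window: Deque[Pose] = deque(maxlen=history)  # keeps only the last `history` poses
    z = list(z0)
    states: List[Latent] = []
    for pose in poses:
        window.append(pose)
        z = step(list(window), z)  # next latent sees the window and the previous latent only
        states.append(z)
    return states


def toy_step(window: Sequence[Pose], z_prev: Latent) -> Latent:
    """Toy stand-in for a learned model: leaky memory pulled toward the mean windowed pose."""
    mean_pose = [sum(col) / len(window) for col in zip(*window)]
    return [0.9 * zi + 0.1 * pi for zi, pi in zip(z_prev, mean_pose)]
```

Running `rollout` twice on the same pose sequence with different `z0` values gives different trajectories, which is the history dependence the premise relies on.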
What would settle it
Compare generated clothing states against ground-truth captures for sequences containing strong inertia events such as abrupt stops or swings; the claim fails if the rollouts remain inconsistent across different short histories for the same current pose or if perceptual studies show no improvement over baselines.
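One way to run the comparison against ground-truth captures is a per-frame drift curve. The sketch below uses mean vertex displacement, a metric the simulated rebuttal on this page names; its exact definition here (average Euclidean distance over corresponding vertices) and the function names are assumptions, not the paper's evaluation code.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def mean_vertex_displacement(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Average Euclidean distance between corresponding vertices of one frame."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("frames must be non-empty and have matching vertex counts")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def drift_curve(
    pred_frames: Sequence[Sequence[Point]],
    gt_frames: Sequence[Sequence[Point]],
) -> List[float]:
    """Per-frame drift over a free rollout; a rising curve indicates accumulating error."""
    return [mean_vertex_displacement(p, g) for p, g in zip(pred_frames, gt_frames)]
```

Computing the curve for rollouts that share a current pose but start from different short histories would show whether the predictions stay consistent around an abrupt stop or swing.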
Original abstract
Pose-driven full-body avatars built on neural rendering produce high-quality novel views of a captured subject. Yet loose clothing and other dynamic elements deform in ways pose alone cannot explain: the same pose can correspond to many different states, because their motion depends on history, inertia, and contact. Explicit simulation and layered-garment methods can model such dynamics, but they require either a dedicated garment template, which raw multi-view capture does not naturally provide, or a test-time physics simulator with non-trivial runtime cost. A parallel line of work learns data-driven clothing avatars that avoid explicit garment layers. These methods add an auxiliary latent for variation beyond pose; at inference, they fix it, regress it from pose, or retrieve it from training data, without explicitly modeling how the latent evolves with its own dynamics. Additionally, even in everyday motion with loose clothing, existing architectures often struggle to capture fine-grained detail, producing blurry renderings and temporal artifacts. We augment a pose-conditioned 3D Gaussian avatar with a transformer-based decoder and a dynamics residual latent that captures temporal appearance and geometry variation beyond the driving signals. At inference, a learned latent dynamics model evolves the residual latent from a short pose history and the previous latent state. The model decomposes each update into driving, restoring, and dissipative forces, producing temporally coherent, history-dependent rollouts with negligible added cost. Different initial conditions yield diverse yet plausible motion trajectories, and the force decomposition exposes controls such as stiffness. Across nine captured sequences of everyday motion with diverse loose garments, quantitative metrics and a perceptual user study show improved animation quality over recent data-driven baselines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript augments a pose-conditioned 3D Gaussian avatar with a transformer-based decoder and a dynamics residual latent that captures temporal appearance and geometry variation beyond pose signals. At inference a learned latent dynamics model evolves the residual from a short pose history and prior latent state, decomposing each update into driving, restoring, and dissipative forces to produce history-dependent, temporally coherent rollouts at negligible added cost. Different initial conditions yield diverse yet plausible trajectories, and the force decomposition exposes interpretable controls such as stiffness. Quantitative metrics and a perceptual user study on nine captured sequences of everyday motion with loose garments demonstrate improvements over recent data-driven baselines.
Significance. If the results hold, the work supplies a practical data-driven route to modeling clothing inertia, contact, and history dependence in full-body neural avatars without garment templates or test-time physics. The negligible runtime overhead and the explicit force decomposition are genuine strengths; the latter supplies controllable parameters that prior latent-augmented avatars lack. The evaluation on nine diverse sequences plus a user study is a positive step toward reproducibility, though the overall impact hinges on whether the learned dynamics generalize beyond the captured distribution rather than merely interpolating observed patterns.
major comments (2)
- [§3.2 and §4.2] §3.2 (Inference procedure) and §4.2 (Latent dynamics model): the central claim that a short fixed pose-history window plus previous latent state suffices for plausible long-horizon clothing states is load-bearing, yet the manuscript provides no ablations on history length nor quantitative long-horizon rollout metrics (e.g., drift after 5–10 s of free evolution). Without these, it remains unclear whether the transformer plus force decomposition learns genuine dynamics or merely reproduces patterns within the nine-sequence training distribution.
- [§5.1 and Table 2] §5.1 (Quantitative evaluation) and Table 2: reported gains over baselines are presented without error bars, statistical significance tests, or explicit confirmation that sequence selection and hyper-parameters were fixed before seeing test results. Given the small training corpus, this weakens the claim that the dynamics residual and force decomposition are responsible for the observed improvements rather than post-hoc fitting.
minor comments (2)
- [§3] Notation for the residual latent z_t and the force terms (F_drive, F_restore, F_dissip) should be introduced once in §3 and used consistently in all equations and figures to avoid reader confusion.
- [Figure 4] Figure 4 (force visualization): arrows for restoring and dissipative components would be clearer if accompanied by a short legend or color key directly on the figure rather than only in the caption.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and have revised the manuscript to incorporate additional experiments and clarifications that directly respond to the concerns raised.
- Referee: [§3.2 and §4.2] §3.2 (Inference procedure) and §4.2 (Latent dynamics model): the central claim that a short fixed pose-history window plus previous latent state suffices for plausible long-horizon clothing states is load-bearing, yet the manuscript provides no ablations on history length nor quantitative long-horizon rollout metrics (e.g., drift after 5–10 s of free evolution). Without these, it remains unclear whether the transformer plus force decomposition learns genuine dynamics or merely reproduces patterns within the nine-sequence training distribution.
  Authors: We agree that explicit ablations on history length and quantitative long-horizon metrics would strengthen the central claim. In the revised manuscript we have added an ablation varying the input pose history from 1 to 20 frames and report drift metrics (mean vertex displacement and appearance feature error) for free rollouts of 5 s and 10 s on held-out motion segments. The results show that performance plateaus after approximately 8 frames and that drift remains lower than pose-only and fixed-latent baselines, supporting that the force-decomposed transformer captures history-dependent dynamics rather than simple pattern reproduction within the training distribution. Visualizations of extended rollouts are also included.
  Revision: yes
- Referee: [§5.1 and Table 2] §5.1 (Quantitative evaluation) and Table 2: reported gains over baselines are presented without error bars, statistical significance tests, or explicit confirmation that sequence selection and hyper-parameters were fixed before seeing test results. Given the small training corpus, this weakens the claim that the dynamics residual and force decomposition are responsible for the observed improvements rather than post-hoc fitting.
  Authors: We acknowledge the need for greater statistical rigor given the modest dataset size. The revised Section 5.1 now reports error bars as standard deviation across five independent training runs with different random seeds. We have added paired t-test p-values comparing our method against each baseline in Table 2. We have also clarified in the experimental protocol that all sequence splits, hyper-parameter selections, and evaluation procedures were fixed prior to any test-set evaluation. We further added a limitations paragraph noting that, while the current results are encouraging, broader generalization claims would benefit from larger and more diverse capture corpora.
  Revision: yes
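The statistics described in the second response can be sketched with the standard library alone. This is a minimal illustration, assuming scores are paired by training seed and that the sample standard deviation (n - 1 denominator) is the one meant; the function names are hypothetical. It returns the paired t statistic and degrees of freedom but not the p-value, which needs a Student t distribution (for example `scipy.stats.ttest_rel`).

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_and_std(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across runs, e.g. for error bars."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_t_statistic(
    ours: Sequence[float], baseline: Sequence[float]
) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched runs (assumed paired by seed)."""
    if len(ours) != len(baseline) or len(ours) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [a - b for a, b in zip(ours, baseline)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0.0:
        raise ValueError("all differences are identical; t statistic is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

With five seeds the test has only four degrees of freedom, so a large t statistic is needed before a difference counts as significant.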
Circularity Check
No circularity: learned dynamics model is externally validated
The paper describes a data-driven architecture that augments a pose-conditioned 3D Gaussian avatar with a transformer decoder and a residual latent evolved by a learned dynamics model. All central claims (temporally coherent rollouts, history-dependent behavior, force decomposition) are obtained by training on captured sequences and evaluating against external baselines plus a user study. No derivation step reduces by construction to its own fitted inputs, no self-citation is invoked as a uniqueness theorem, and no ansatz or renaming is presented as an independent result. The approach is self-contained against held-out motion data and perceptual metrics, satisfying the criteria for a non-circular, externally falsifiable contribution.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: A short pose history plus previous latent state suffices to predict future clothing dynamics
invented entities (2)
- dynamics residual latent: no independent evidence
- driving, restoring, and dissipative forces: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: The model decomposes each update into driving, restoring, and dissipative forces... $a_t = (F_{\text{pose},t} - F_{\text{damping},t} - F_{\text{spring},t}) \oslash m_t$ ... spring-damper ordinary differential equation
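The quoted passage fixes only the form of the per-step acceleration. A worked version under explicit assumptions may help readers see how it relates to a spring-damper ordinary differential equation. Here the linear force laws, the symbols $k$ (stiffness), $c$ (damping), $v_t$ (latent velocity), the rest state at zero, and the semi-implicit Euler step $\Delta t$ are all illustrative choices, not taken from the paper.

```latex
% Assumed continuous-time form (linear spring and damper, rest state at zero)
m \odot \ddot{z}(t) = F_{\text{pose}}(t) - c \odot \dot{z}(t) - k \odot z(t)

% Per-step acceleration as quoted, with \oslash denoting element-wise division
a_t = \left(F_{\text{pose},t} - F_{\text{damping},t} - F_{\text{spring},t}\right) \oslash m_t,
\qquad F_{\text{damping},t} = c \odot v_t, \quad F_{\text{spring},t} = k \odot z_t

% One possible discretization (semi-implicit Euler; an assumption)
v_{t+1} = v_t + \Delta t \, a_t, \qquad z_{t+1} = z_t + \Delta t \, v_{t+1}
```

Under these assumptions, scaling $k$ changes how strongly the latent is pulled back toward rest, and scaling $c$ changes how quickly oscillations die out.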
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.