LIMMT: Less is More for Motion Tracking
Pith reviewed 2026-06-27 22:11 UTC · model grok-4.3
The pith
Training on under 3% of high-quality motion data outperforms the full AMASS dataset for humanoid tracking policies.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Motion data selected according to physics feasibility, diversity, and complexity allows small subsets to guide humanoid tracking policies to better optimization trajectories than the full AMASS dataset, establishing the first data-centric approach for this task.
What carries the argument
A three-dimensional quality metric that scores each motion clip for physics feasibility, diversity, and complexity, and is used to select the data that yields better policy optimization trajectories.
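A minimal sketch of how a metric of this kind could drive selection, assuming each clip already carries three scores in [0, 1]. The `ClipScore` fields, the unweighted mean, and the `select_top_fraction` helper are illustrative assumptions; this page does not describe how the paper computes or combines the three dimensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipScore:
    """Hypothetical per-clip scores, each assumed to lie in [0, 1]."""
    clip_id: str
    feasibility: float
    diversity: float
    complexity: float

    def quality(self) -> float:
        # Assumption: unweighted mean of the three dimensions.
        return (self.feasibility + self.diversity + self.complexity) / 3.0


def select_top_fraction(clips: list[ClipScore], fraction: float) -> list[ClipScore]:
    """Return the highest-quality clips, always keeping at least one."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    keep = max(1, int(len(clips) * fraction))
    ranked = sorted(clips, key=lambda c: c.quality(), reverse=True)
    return ranked[:keep]


# Made-up clips and scores, for illustration only.
clips = [
    ClipScore("walk", 0.9, 0.2, 0.3),
    ClipScore("cartwheel", 0.8, 0.9, 0.9),
    ClipScore("floating_glitch", 0.1, 0.7, 0.6),
    ClipScore("dance", 0.9, 0.8, 0.7),
]
subset = select_top_fraction(clips, fraction=0.5)
print([c.clip_id for c in subset])
```

Diversity is arguably a property of a set of clips rather than of a single clip, so a real selector would probably need a set-aware step; the per-clip score here is a deliberate simplification.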
If this is right
- Policies trained on quality-selected subsets reach higher tracking accuracy with less data.
- The selection method improves performance on both curated AMASS motions and noisy web-sourced motion capture data.
- Early-stage optimization trajectories improve when low-quality clips are removed rather than retained.
- Dataset size alone does not determine tracking performance when quality criteria are applied.
Where Pith is reading between the lines
- Data selection of this form could lower the compute required to train effective humanoid controllers.
- Similar quality filters might improve results in other imitation-learning settings beyond tracking.
- Many large motion datasets may contain substantial portions that slow rather than help policy learning.
Load-bearing premise
The three quality dimensions correctly identify motion clips that produce superior optimization trajectories for tracking policies, and experiments isolate the effect of data selection from other training factors.
What would settle it
The central claim would be falsified by an experiment that trains an identical policy on the full AMASS dataset, with the same hyperparameters and compute budget, and matches or exceeds the tracking performance of the quality-selected subset (under 3% of AMASS).
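The test described above can be sketched as two checks: that the two runs differ only in their dataset, and that the full-dataset run does not track at least as well. All keys, values, and error numbers below are placeholders chosen for illustration, not settings or results from the paper; "lower error is better" is also an assumption about the metric.

```python
from statistics import mean

# Hypothetical run descriptions; every key and value is a placeholder.
subset_run = {"dataset": "amass_subset", "epochs": 100, "lr": 3e-4, "seeds": (0, 1, 2)}
full_run = {"dataset": "amass_full", "epochs": 100, "lr": 3e-4, "seeds": (0, 1, 2)}


def mismatched_settings(a: dict, b: dict, allowed: frozenset = frozenset({"dataset"})) -> set:
    """Return the keys on which two runs differ, ignoring the allowed ones."""
    keys = (a.keys() | b.keys()) - allowed
    return {k for k in keys if a.get(k) != b.get(k)}


def central_claim_falsified(subset_errors: list[float], full_errors: list[float]) -> bool:
    """True if the full dataset tracks at least as well (lower error is better)."""
    return mean(full_errors) <= mean(subset_errors)


# The comparison is only meaningful if nothing but the dataset differs.
assert mismatched_settings(subset_run, full_run) == set()

# Placeholder per-seed tracking errors, not results from the paper.
print(central_claim_falsified([0.10, 0.12, 0.11], [0.15, 0.14, 0.16]))
```

Comparing seed means ignores run-to-run variance; a careful replication would likely add a significance test or confidence intervals before declaring the claim settled either way.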
Original abstract
We argue that high-quality motion data can steer tracking policies toward better optimization trajectories early in training. In this work, we introduce LIMMT (Less Is More for Motion Tracking). To our knowledge, this is the first data-centric study for physics-based humanoid motion tracking. We go beyond simply removing low-quality and erroneous clips, but define motion data quality through three dimensions: physics feasibility, diversity, and complexity. We show that even training with under 3% of AMASS yields better tracking performance than training with the full dataset. We further conduct data cleaning on the estimated web-sourced mocap data. Extensive experiments and analyses validate the effectiveness of our framework.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces LIMMT, a data-centric framework for physics-based humanoid motion tracking. It defines motion data quality along three dimensions (physics feasibility, diversity, complexity) and claims that training tracking policies on a curated subset of under 3% of AMASS yields better performance than training on the full dataset. The work also applies data cleaning to estimated web-sourced mocap data and validates the approach through experiments and analyses.
Significance. If the central empirical result holds under properly controlled conditions, the finding would demonstrate that targeted data selection can outperform scale in motion tracking, with implications for more efficient training of humanoid policies. The approach is grounded in held-out tracking metrics rather than circular definitions, providing a falsifiable empirical basis.
major comments (2)
- [Abstract] The headline claim that training with under 3% of AMASS outperforms the full dataset is load-bearing for the contribution, yet the description provides no confirmation that total optimization effort (epochs, gradient steps per epoch, or learning-rate schedules) was equalized between the subset and full-dataset runs; without this, performance differences cannot be attributed solely to the three quality dimensions.
- [Abstract] The three quality dimensions (physics feasibility, diversity, complexity) are presented as correctly identifying motions that produce superior optimization trajectories, but the manuscript does not report whether hyperparameter tuning or random-seed averaging was performed identically for both conditions, leaving open confounding factors in the subset-vs-full comparison.
minor comments (1)
- [Abstract] The claim 'to our knowledge, this is the first data-centric study' should be supported by a brief literature review in the introduction rather than left as an abstract statement.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on ensuring fair experimental comparisons. We provide clarifications below and will update the manuscript to address these concerns.
- Referee: [Abstract] The headline claim that training with under 3% of AMASS outperforms the full dataset is load-bearing for the contribution, yet the description provides no confirmation that total optimization effort (epochs, gradient steps per epoch, or learning-rate schedules) was equalized between the subset and full-dataset runs; without this, performance differences cannot be attributed solely to the three quality dimensions.
  Authors: The training protocol was identical for both the curated subset and the full AMASS dataset, including the same number of epochs, gradient steps per epoch, and learning-rate schedules. This ensures that performance differences can be attributed to the data quality dimensions. We will explicitly state this in the revised abstract and experimental setup section.
  Revision: yes
- Referee: [Abstract] The three quality dimensions (physics feasibility, diversity, complexity) are presented as correctly identifying motions that produce superior optimization trajectories, but the manuscript does not report whether hyperparameter tuning or random-seed averaging was performed identically for both conditions, leaving open confounding factors in the subset-vs-full comparison.
  Authors: Hyperparameters were tuned using the same procedure for both conditions, and all reported results are averaged over the same set of random seeds. We will add this information to the manuscript to confirm the comparisons are controlled.
  Revision: yes
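One consequence of equalizing the optimization budget is worth making explicit with a small worked calculation. Assuming reference clips are sampled uniformly during training (an assumption; this page does not state the sampling scheme), the same total number of samples spread over a smaller subset means each retained clip is revisited proportionally more often. The helper name below is hypothetical.

```python
def per_clip_exposure_ratio(subset_fraction: float) -> float:
    """Ratio of how often each retained clip is sampled, relative to uniform
    sampling over the full dataset, under the same total sample budget."""
    if not 0.0 < subset_fraction <= 1.0:
        raise ValueError("subset_fraction must be in (0, 1]")
    return 1.0 / subset_fraction


# A subset of 3% of the clips: each clip is seen about 33 times as often.
print(round(per_clip_exposure_ratio(0.03), 1))
```

Under that assumption, an equal-budget comparison changes per-clip repetition along with data quality, so the two effects are worth reporting separately when attributing the gain.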
Circularity Check
No circularity: empirical data-selection result grounded in held-out metrics
The paper is an empirical study that curates a motion subset via three quality dimensions and reports superior tracking performance on held-out metrics when training on <3% of AMASS versus the full set. No equations, fitted parameters, or derivations are present that reduce the reported gains to the quality definitions by construction. No self-citation chains, uniqueness theorems, or ansatzes are invoked as load-bearing steps. The central claim rests on experimental comparison rather than any of the enumerated circular patterns, making the result self-contained against external benchmarks.