One-shot Adaptation of Humanoid Whole-body Motion with Walking Priors
Pith reviewed 2026-05-18 03:53 UTC · model grok-4.3
The pith
Order-preserving optimal transport lets a walking-trained humanoid model adapt to a new whole-body motion from a single target sample plus auxiliary walking motions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that order-preserving optimal transport distances between walking and non-walking sequences, followed by geodesic interpolation to produce intermediate pose skeletons, yield configurations that remain useful after collision optimization and retargeting, enabling effective reinforcement-learning policy adaptation from a single non-walking target sample together with auxiliary walking motions and a walking-trained base model.
What carries the argument
Order-preserving optimal transport that computes distances between walking and non-walking sequences to generate intermediate pose skeletons via geodesic interpolation.
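To make this machinery concrete, here is a minimal NumPy sketch of an order-preserving Wasserstein (OPW) distance in the spirit of Su and Hua's formulation. It is a Sinkhorn solver whose kernel combines a Gaussian temporal prior centered on the diagonal with an inverse-difference-moment term that rewards near-diagonal matches. This is not the authors' code. The function name `opw_distance`, the squared-Euclidean frame cost, and the values of `lambda1`, `lambda2`, and `sigma` are illustrative assumptions. Using relative positions i/N and j/M is one way to compare sequences of different lengths.

```python
import numpy as np


def opw_distance(x, y, lambda1=1.0, lambda2=0.1, sigma=1.0, n_iter=1000):
    """Order-preserving Wasserstein distance between sequences x (N, D) and y (M, D).

    Returns (distance, transport_plan). Hyperparameter defaults are
    illustrative assumptions, not values from the reviewed paper.
    """
    n, m = len(x), len(y)
    # Ground cost: squared Euclidean distance between frame feature vectors.
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    # Normalizing the cost (equivalent to rescaling lambda1, lambda2) keeps exp() tame.
    cost_n = cost / cost.max() if cost.max() > 0 else cost

    # Relative temporal positions i/N and j/M, so sequences of different length compare.
    diff = np.arange(1, n + 1)[:, None] / n - np.arange(1, m + 1)[None, :] / m
    # Inverse difference moment: larger reward for matches near the diagonal.
    idm = lambda1 / (diff ** 2 + 1.0)
    # Log of a Gaussian temporal prior centered on the diagonal (constant factor dropped).
    dist_to_diag = np.abs(diff) / np.sqrt(1.0 / n**2 + 1.0 / m**2)
    log_prior = -dist_to_diag**2 / (2.0 * sigma**2)

    # Kernel K = P * exp((S - D) / lambda2); a global shift does not change the plan.
    log_k = log_prior + (idm - cost_n) / lambda2
    kernel = np.exp(log_k - log_k.max())

    # Sinkhorn scaling to uniform marginals.
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    v = np.ones(m)
    for _ in range(n_iter):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    plan = u[:, None] * kernel * v[None, :]
    return float((plan * cost).sum()), plan
```

Because the temporal prior penalizes mass far from the diagonal, reversing a sequence increases its OPW distance to the original even though the set of frames is unchanged. Plain optimal transport, which ignores frame order, would not register this difference.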
If this is right
- A new whole-body motion can be learned from only one non-walking sample plus walking auxiliaries instead of multiple samples.
- The generated policies consistently outperform baseline adaptation methods across standard motion quality metrics on the CMU MoCap dataset.
- Collision-free optimization followed by retargeting produces skeletons that integrate directly into simulated environments for reinforcement learning.
- The walking-trained base model serves as a reusable prior that supports adaptation to diverse non-walking targets.
Where Pith is reading between the lines
- The same transport-based interpolation might reduce sample needs when adapting policies across different humanoid morphologies or hardware platforms.
- If the generated skeletons transfer well to real robots, the method could shorten the gap between motion capture and deployed behaviors in unstructured environments.
- Combining this one-shot adaptation with online feedback from physical trials could enable continual improvement without retraining from scratch.
Load-bearing premise
The intermediate skeletons created by order-preserving optimal transport remain useful after collision optimization, retargeting to the humanoid, and reinforcement-learning policy training.
What would settle it
Run the full pipeline on CMU MoCap non-walking motions and measure whether the resulting policies achieve lower success rates or higher error metrics than the reported baselines in simulation trials.
Abstract
Whole-body humanoid motion represents a fundamental challenge in robotics, requiring balance, coordination, and adaptability to enable human-like behaviors. However, existing methods typically require multiple training samples per motion, rendering the collection of high-quality human motion datasets both labor-intensive and costly. To address this, we propose a data-efficient adaptation approach that learns a new humanoid motion from a single non-walking target sample together with auxiliary walking motions and a walking-trained base model. The core idea lies in leveraging order-preserving optimal transport to compute distances between walking and non-walking sequences, followed by interpolation along geodesics to generate new intermediate pose skeletons, which are then optimized for collision-free configurations and retargeted to the humanoid before integration into a simulated environment for policy adaptation via reinforcement learning. Experimental evaluations on the CMU MoCap dataset demonstrate that our method consistently outperforms baselines, achieving superior performance across metrics. Our code is available at: https://github.com/hhuang-code/One-shot-WBM.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to introduce a data-efficient one-shot adaptation method for humanoid whole-body motions. It uses a single non-walking target sample together with auxiliary walking motions and a walking-trained base model; order-preserving optimal transport computes distances between sequences, geodesic interpolation generates intermediate pose skeletons, and these are collision-optimized, retargeted to the humanoid, and fed into reinforcement-learning policy adaptation in simulation. Experiments on the CMU MoCap dataset are reported to show consistent outperformance over baselines.
Significance. If the central empirical claims hold after verification, the work could meaningfully reduce data-collection costs for diverse humanoid behaviors, a practical bottleneck in robotics. The pipeline that combines order-preserving OT interpolation with a pre-trained walking prior and RL adaptation is a coherent technical contribution. Public release of the code at the cited GitHub repository is a clear strength that supports reproducibility.
major comments (2)
- [§3.2, Eq. (4)] The assumption that order-preserving OT geodesic interpolation between walking and non-walking sequences produces kinematically plausible intermediate skeletons that survive collision optimization and retargeting is load-bearing for the one-shot claim. Yet the manuscript supplies no quantitative validation (e.g., joint-limit violation rates, velocity smoothness, or distribution distance to the target motion) of the post-optimization intermediates themselves.
- [Experimental evaluations] The abstract states that the method 'consistently outperforms baselines, achieving superior performance across metrics,' but the provided description contains no numerical results, baseline definitions, error bars, or ablation studies. Without these, the strength of the performance claim cannot be assessed.
minor comments (1)
- Clarify in the method description how sequence-length differences between walking and non-walking motions are handled during OT alignment and whether timing or support-phase information is explicitly preserved after retargeting.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major comment point by point below, indicating where revisions will be made to strengthen the manuscript.
Point-by-point responses
- Referee: [§3.2, Eq. (4)] The assumption that order-preserving OT geodesic interpolation between walking and non-walking sequences produces kinematically plausible intermediate skeletons that survive collision optimization and retargeting is load-bearing for the one-shot claim. Yet the manuscript supplies no quantitative validation (e.g., joint-limit violation rates, velocity smoothness, or distribution distance to the target motion) of the post-optimization intermediates themselves.
Authors: We agree that direct quantitative validation of the interpolated and optimized intermediate skeletons is valuable for supporting the central assumption behind the one-shot claim. While the downstream task success rates provide indirect support, we will revise §3.2 to include explicit metrics on the post-optimization and retargeted intermediates. These will comprise joint-limit violation rates (percentage of poses with any joint exceeding limits), velocity smoothness (mean squared jerk across the sequence), and distribution distance to the target motion (using Fréchet distance on pose embeddings). The added analysis will report these values before and after collision optimization to demonstrate plausibility. Revision: yes.
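The three diagnostics proposed in this response are simple to compute. The sketch below is one possible reading, not the authors' protocol. It assumes joint angles arrive as a (T, J) array with per-joint limits, takes jerk as the third finite difference of positions, and fits Gaussians to pose embeddings produced by an encoder that the response does not specify. All function names are hypothetical.

```python
import numpy as np


def joint_limit_violation_rate(q, lower, upper):
    """Fraction of frames in which any joint angle leaves [lower, upper].

    q: (T, J) joint angles; lower, upper: (J,) per-joint limits.
    """
    outside = (q < lower) | (q > upper)
    return float(outside.any(axis=1).mean())


def mean_squared_jerk(pos, dt):
    """Mean over frames of the squared norm of the third finite difference.

    pos: (T, D) positions sampled every dt seconds (T >= 4).
    """
    jerk = np.diff(pos, n=3, axis=0) / dt**3
    return float((jerk**2).sum(axis=1).mean())


def _psd_sqrt(mat):
    """Square root of a symmetric positive semidefinite matrix via eigh."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def frechet_distance(emb_a, emb_b):
    """Frechet distance between Gaussians fitted to two embedding sets (N, K)."""
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)
    # tr((cov_a cov_b)^{1/2}) via the symmetric form cov_a^{1/2} cov_b cov_a^{1/2},
    # which has the same eigenvalues as cov_a cov_b.
    root_a = _psd_sqrt(cov_a)
    middle = root_a @ cov_b @ root_a
    tr_cross = np.sqrt(np.clip(np.linalg.eigvalsh(middle), 0.0, None)).sum()
    mean_term = ((mu_a - mu_b) ** 2).sum()
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_cross)
```

Reporting each value before and after collision optimization, as the response proposes, amounts to calling these functions on both versions of the intermediate sequence.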
- Referee: [Experimental evaluations] The abstract states that the method 'consistently outperforms baselines, achieving superior performance across metrics,' but the provided description contains no numerical results, baseline definitions, error bars, or ablation studies. Without these, the strength of the performance claim cannot be assessed.
Authors: We acknowledge that the experimental claims require clearer and more complete numerical support for full assessment. The manuscript reports results on the CMU MoCap dataset, but to address the concern we will expand the experimental section with explicit definitions and implementation details for all baselines, complete numerical tables including means and standard deviations (with error bars) over multiple random seeds, and additional ablation studies isolating the contributions of order-preserving OT, geodesic interpolation, collision optimization, and the RL adaptation stage. These revisions will make the performance comparisons fully transparent and reproducible. Revision: yes.
Circularity Check
Derivation chain is self-contained with independent pipeline and external validation
Full rationale
The paper describes a data-efficient adaptation method that applies order-preserving optimal transport to compute distances between one non-walking target sequence and auxiliary walking sequences, performs geodesic interpolation to create intermediate pose skeletons, optimizes those for collision-free configurations, retargets them to the humanoid, and uses the results for reinforcement-learning policy adaptation from a walking-trained base model. This pipeline is presented as a sequence of distinct processing steps whose outputs are not defined in terms of the inputs by construction, nor are any central claims justified solely by self-citations or fitted parameters renamed as predictions. Experimental results are reported on the external CMU MoCap dataset with comparisons to baselines, providing an independent check rather than a tautological re-expression of the same quantities. No equations or sections in the provided description exhibit self-definitional loops, uniqueness theorems imported from the authors' prior work, or ansatzes smuggled via citation that would force the reported performance.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: walking motions provide useful priors for generating intermediate poses for non-walking target motions.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Passage: "order-preserving optimal transport to compute distances between walking and non-walking sequences, followed by interpolation along geodesics to generate new intermediate pose skeletons... collision-free configurations and retargeted to the humanoid before integration into a simulated environment for policy adaptation via reinforcement learning"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Passage: "geodesic distance... $d\big((x_1,\{q_{1,j}\}),(x_2,\{q_{2,j}\})\big) = d_t(x_1,x_2) + w\sum_j d_r(q_{1,j},q_{2,j})$"
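The geodesic distance quoted in the second passage adds a root-translation term to a weighted sum of per-joint rotation terms. The following is a minimal sketch under assumptions the passage does not fix: root positions are 3-vectors, joint rotations are unit quaternions in (w, x, y, z) order, d_t is Euclidean distance, d_r is the SO(3) geodesic angle, and the weight w defaults to 1. For a sum metric like this, moving the root linearly and each joint by slerp at the same rate traces one geodesic between two poses. That is one plausible way to generate intermediate skeletons; the function names are hypothetical.

```python
import numpy as np


def rotation_distance(q1, q2):
    """Geodesic angle on SO(3) between unit quaternions of shape (..., 4)."""
    # |<q1, q2>| handles the double cover (q and -q are the same rotation).
    dot = np.abs(np.sum(q1 * q2, axis=-1))
    return 2.0 * np.arccos(np.clip(dot, -1.0, 1.0))


def pose_distance(x1, q1, x2, q2, w=1.0):
    """d = d_t(x1, x2) + w * sum_j d_r(q1_j, q2_j) for root x (3,) and joints q (J, 4)."""
    return float(np.linalg.norm(x1 - x2) + w * rotation_distance(q1, q2).sum())


def slerp(q0, q1, tau):
    """Spherical linear interpolation between unit quaternions (..., 4)."""
    dot = np.sum(q0 * q1, axis=-1, keepdims=True)
    q1 = np.where(dot < 0.0, -q1, q1)  # take the shorter arc
    dot = np.clip(np.abs(dot), 0.0, 1.0)
    theta = np.arccos(dot)
    sin_theta = np.sin(theta)
    near = sin_theta < 1e-8  # nearly identical rotations: fall back to lerp
    safe = np.where(near, 1.0, sin_theta)
    w0 = np.where(near, 1.0 - tau, np.sin((1.0 - tau) * theta) / safe)
    w1 = np.where(near, tau, np.sin(tau * theta) / safe)
    q = w0 * q0 + w1 * q1
    return q / np.linalg.norm(q, axis=-1, keepdims=True)


def interpolate_pose(x1, q1, x2, q2, tau):
    """Intermediate pose at fraction tau in [0, 1] along the geodesic."""
    return (1.0 - tau) * x1 + tau * x2, slerp(q1, q2, tau)
```

At tau = 0.5, the midpoint sits at half the total distance from each endpoint, because both the translation term and every rotation term are halved.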
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Supplemental Materials: One-shot Humanoid Whole-body Motion Learning
Collision detection plays a pivotal role in generating physically plausible poses for articulated structures, where self-intersections may occur due to intricate joint arrangements. The process entails representing the skeleton...
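The supplemental note breaks off before it says how the skeleton is represented. A common representation, and the one assumed in this hypothetical sketch, wraps each bone in a capsule (a segment plus a radius). A self-collision is flagged when two non-adjacent capsules come closer than the sum of their radii. Distances come from the clamped closest-points-between-segments routine described in Ericson's Real-time Collision Detection. The function names, the single shared radius, and the rule that bones sharing a joint are skipped are all assumptions, not the paper's implementation.

```python
import itertools

import numpy as np


def segment_distance(p1, q1, p2, q2, eps=1e-12):
    """Minimum distance between segments p1-q1 and p2-q2 (clamped closest points)."""
    d1, d2, r = q1 - p1, q2 - p2, p1 - p2
    a, e, f = d1 @ d1, d2 @ d2, d2 @ r
    if a <= eps and e <= eps:  # both segments degenerate to points
        return float(np.linalg.norm(r))
    if a <= eps:  # first segment is a point
        s, t = 0.0, float(np.clip(f / e, 0.0, 1.0))
    else:
        c = d1 @ r
        if e <= eps:  # second segment is a point
            s, t = float(np.clip(-c / a, 0.0, 1.0)), 0.0
        else:
            b = d1 @ d2
            denom = a * e - b * b
            # Parallel segments (denom ~ 0): any s works, so pick 0.
            s = float(np.clip((b * f - c * e) / denom, 0.0, 1.0)) if denom > eps else 0.0
            t = (b * s + f) / e
            if t < 0.0:
                t, s = 0.0, float(np.clip(-c / a, 0.0, 1.0))
            elif t > 1.0:
                t, s = 1.0, float(np.clip((b - c) / a, 0.0, 1.0))
    return float(np.linalg.norm((p1 + s * d1) - (p2 + t * d2)))


def self_collisions(joints, parents, radius):
    """Pairs of non-adjacent bone capsules that overlap.

    joints: (J, 3) positions; parents[k] is the parent of joint k (-1 for the root).
    Each bone (parents[k], k) is treated as a capsule with the given radius.
    """
    bones = [(p, k) for k, p in enumerate(parents) if p >= 0]
    hits = []
    for b1, b2 in itertools.combinations(bones, 2):
        if set(b1) & set(b2):  # bones sharing a joint always touch; skip them
            continue
        dist = segment_distance(joints[b1[0]], joints[b1[1]], joints[b2[0]], joints[b2[1]])
        if dist < 2.0 * radius:
            hits.append((b1, b2))
    return hits
```

A collision-free optimization step could then penalize, for each flagged pair, how far the capsule distance falls short of the radius sum.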