arxiv: 2510.25241 · v2 · submitted 2025-10-29 · 💻 cs.RO · cs.AI

One-shot Adaptation of Humanoid Whole-body Motion with Walking Priors

Hao Huang, Geeta Chandra Raju Bethala, Shuaihang Yuan, Congcong Wen, Mengyu Wang, Anthony Tzes, Yi Fang

Pith reviewed 2026-05-18 03:53 UTC · model grok-4.3

keywords humanoid whole-body motion · one-shot adaptation · optimal transport · walking priors · reinforcement learning · motion retargeting · CMU MoCap · collision optimization

The pith

Order-preserving optimal transport lets a walking-trained humanoid model adapt to a new whole-body motion from one target sample plus auxiliary walking motions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a data-efficient way to teach humanoid robots new whole-body actions without needing many examples of each motion. Starting from a base model trained only on walking, the method measures distances between walking sequences and the single non-walking target using order-preserving optimal transport. It then creates intermediate pose skeletons by interpolating along geodesics, cleans them for collisions, retargets them to the robot body, and trains a new policy with reinforcement learning in simulation. This reduces the cost of building large human motion datasets while still producing usable policies. A sympathetic reader would care because collecting high-quality motion capture data for every possible action is currently a major bottleneck for practical humanoid robots.

Core claim

The central claim is that order-preserving optimal transport distances between walking and non-walking sequences, followed by geodesic interpolation to produce intermediate pose skeletons, yield configurations that remain useful after collision optimization and retargeting, enabling effective reinforcement-learning policy adaptation from a single non-walking target sample together with auxiliary walking motions and a walking-trained base model.

What carries the argument

Order-preserving optimal transport that computes distances between walking and non-walking sequences to generate intermediate pose skeletons via geodesic interpolation.
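
As a rough illustration of the transport step, the sketch below computes an entropy-regularized plan between two frame sequences while penalizing couplings that stray from the temporal diagonal. It loosely follows the published order-preserving Wasserstein formulation of Su and Hua and is not taken from the paper's code. The function name `order_preserving_plan`, the squared Euclidean frame cost, and the defaults for `lam1`, `lam2`, and `sigma` are all assumptions made for this example.

```python
import math


def order_preserving_plan(X, Y, lam1=1.0, lam2=1.0, sigma=1.0, iters=300):
    """Return (plan, distance) for two frame sequences of possibly different length.

    Hypothetical helper: names and default values are illustrative assumptions.
    """
    n, m = len(X), len(Y)
    # Ground cost: squared Euclidean distance between frame feature vectors.
    D = [[sum((a - b) ** 2 for a, b in zip(x, y)) for y in Y] for x in X]

    # Log-kernel = log temporal prior + (locality bonus - cost) / lam2.
    logK = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            gap = (i + 1) / n - (j + 1) / m           # relative-time mismatch
            ell = abs(gap) / math.sqrt(1 / n**2 + 1 / m**2)
            log_prior = -ell**2 / (2 * sigma**2)       # Gaussian band (constant factor dropped)
            bonus = lam1 / (gap**2 + 1.0)              # rewards near-diagonal couplings
            logK[i][j] = log_prior + (bonus - D[i][j]) / lam2
    shift = max(max(row) for row in logK)              # constant shift for numerical stability
    K = [[math.exp(val - shift) for val in row] for row in logK]

    a, b = [1.0 / n] * n, [1.0 / m] * m                # uniform frame weights
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):                             # Sinkhorn scaling iterations
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]

    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    distance = sum(plan[i][j] * D[i][j] for i in range(n) for j in range(m))
    return plan, distance


walk = [[0.0], [1.0], [2.0], [3.0]]    # toy 1-D "walking" features
target = [[0.0], [1.5], [3.0]]         # toy 1-D "target" features, different length
plan, dist = order_preserving_plan(walk, target)
```

Because the marginals are uniform over frames, sequences of different lengths need no resampling before alignment; whether the paper handles length differences this way is not stated on this page.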

If this is right

A new whole-body motion can be learned from only one non-walking sample plus walking auxiliaries instead of multiple samples.
The authors report that the resulting policies consistently outperform baselines across metrics on the CMU MoCap dataset.
Collision-free optimization followed by retargeting produces skeletons that integrate directly into simulated environments for reinforcement learning.
The walking-trained base model serves as a reusable prior that supports adaptation to diverse non-walking targets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same transport-based interpolation might reduce sample needs when adapting policies across different humanoid morphologies or hardware platforms.
If the generated skeletons transfer well to real robots, the method could shorten the gap between motion capture and deployed behaviors in unstructured environments.
Combining this one-shot adaptation with online feedback from physical trials could enable continual improvement without retraining from scratch.

Load-bearing premise

The intermediate skeletons created by order-preserving optimal transport remain useful after collision optimization, retargeting to the humanoid, and reinforcement-learning policy training.

What would settle it

Run the full pipeline on CMU MoCap non-walking motions and measure whether the resulting policies achieve lower success rates or higher error metrics than the reported baselines in simulation trials.

Figures

Figures reproduced from arXiv: 2510.25241 by Anthony Tzes, Congcong Wen, Geeta Chandra Raju Bethala, Hao Huang, Mengyu Wang, Shuaihang Yuan, Yi Fang.

**Figure 1.** Sampled frames from motion sequences of a humanoid (Unitree H1) performing four distinct actions in sim-to-sim

**Figure 2.** Given a sequence of walking motion pose skeletons and a target sequence comprising non-walking motions, we …

**Figure 3.** Illustration of spheres for joints, capsules for bones, and line segment distance between two bones.
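
The capsule picture in Figure 3 reduces bone-to-bone collision checks to the distance between two line segments. The sketch below uses the standard clamped closest-point routine for segments and derives a penetration depth for two capsules. The helper names (`segment_distance`, `capsule_penetration`) and the idea of using penetration depth as a penalty are assumptions for illustration; the paper's actual collision objective is not shown on this page.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _clamp01(x):
    return max(0.0, min(1.0, x))


def segment_distance(p1, q1, p2, q2, eps=1e-12):
    """Shortest distance between segments p1-q1 and p2-q2 (3-D points as lists)."""
    d1, d2, r = _sub(q1, p1), _sub(q2, p2), _sub(p1, p2)
    a, e, f = _dot(d1, d1), _dot(d2, d2), _dot(d2, r)
    if a <= eps and e <= eps:            # both segments degenerate to points
        s = t = 0.0
    elif a <= eps:                       # first segment is a point
        s, t = 0.0, _clamp01(f / e)
    else:
        c = _dot(d1, r)
        if e <= eps:                     # second segment is a point
            t, s = 0.0, _clamp01(-c / a)
        else:
            b = _dot(d1, d2)
            denom = a * e - b * b        # zero when the segments are parallel
            s = _clamp01((b * f - c * e) / denom) if denom > eps else 0.0
            t = (b * s + f) / e
            if t < 0.0:                  # clamp t, then recompute s for that endpoint
                t, s = 0.0, _clamp01(-c / a)
            elif t > 1.0:
                t, s = 1.0, _clamp01((b - c) / a)
    c1 = [p + s * d for p, d in zip(p1, d1)]
    c2 = [p + t * d for p, d in zip(p2, d2)]
    return math.dist(c1, c2)


def capsule_penetration(p1, q1, r1, p2, q2, r2):
    """Overlap depth of two capsules (bone segment plus radius); 0.0 if they are apart."""
    return max(0.0, r1 + r2 - segment_distance(p1, q1, p2, q2))
```

A joint sphere is the degenerate case in which both endpoints of a segment coincide, so the same routine covers sphere-to-capsule checks.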

Original abstract

Whole-body humanoid motion represents a fundamental challenge in robotics, requiring balance, coordination, and adaptability to enable human-like behaviors. However, existing methods typically require multiple training samples per motion, rendering the collection of high-quality human motion datasets both labor-intensive and costly. To address this, we propose a data-efficient adaptation approach that learns a new humanoid motion from a single non-walking target sample together with auxiliary walking motions and a walking-trained base model. The core idea lies in leveraging order-preserving optimal transport to compute distances between walking and non-walking sequences, followed by interpolation along geodesics to generate new intermediate pose skeletons, which are then optimized for collision-free configurations and retargeted to the humanoid before integration into a simulated environment for policy adaptation via reinforcement learning. Experimental evaluations on the CMU MoCap dataset demonstrate that our method consistently outperforms baselines, achieving superior performance across metrics. Our code is available at: https://github.com/hhuang-code/One-shot-WBM.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows a workable pipeline for one-shot humanoid motion adaptation by aligning a single non-walking sample to walking priors with order-preserving OT and geodesic interpolation before RL fine-tuning, but the abstract gives almost no numbers to judge whether it actually works.

The main thing here is a concrete way to bootstrap a new whole-body humanoid motion from one target example plus some walking data and a walking-trained base policy. They align the sequences with order-preserving optimal transport, generate intermediate poses by geodesic interpolation, clean them up for collisions, retarget to the robot, and then run RL adaptation in simulation. That pipeline is the core of the work and it is presented clearly enough to follow.

Referee Report

2 major / 1 minor

Summary. The paper claims to introduce a data-efficient one-shot adaptation method for humanoid whole-body motions. It uses a single non-walking target sample together with auxiliary walking motions and a walking-trained base model; order-preserving optimal transport computes distances between sequences, geodesic interpolation generates intermediate pose skeletons, and these are collision-optimized, retargeted to the humanoid, and fed into reinforcement-learning policy adaptation in simulation. Experiments on the CMU MoCap dataset are reported to show consistent outperformance over baselines.

Significance. If the central empirical claims hold after verification, the work could meaningfully reduce data-collection costs for diverse humanoid behaviors, a practical bottleneck in robotics. The pipeline that combines order-preserving OT interpolation with a pre-trained walking prior and RL adaptation is a coherent technical contribution. Public release of the code at the cited GitHub repository is a clear strength that supports reproducibility.

major comments (2)

[§3.2, Eq. (4)] The assumption that order-preserving OT geodesic interpolation between walking and non-walking sequences produces kinematically plausible intermediate skeletons that survive collision optimization and retargeting is load-bearing for the one-shot claim, yet the manuscript supplies no quantitative validation (e.g., joint-limit violation rates, velocity smoothness, or distribution distance to target motion) on the post-optimization intermediates themselves.

[Experimental section] The abstract states that the method 'consistently outperforms baselines, achieving superior performance across metrics,' but the provided description contains no numerical results, baseline definitions, error bars, or ablation studies; without these the strength of the performance claim cannot be assessed.

minor comments (1)

Clarify in the method description how sequence-length differences between walking and non-walking motions are handled during OT alignment and whether timing or support-phase information is explicitly preserved after retargeting.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment point by point below, indicating where revisions will be made to strengthen the manuscript.

Referee: [§3.2, Eq. (4)] The assumption that order-preserving OT geodesic interpolation between walking and non-walking sequences produces kinematically plausible intermediate skeletons that survive collision optimization and retargeting is load-bearing for the one-shot claim, yet the manuscript supplies no quantitative validation (e.g., joint-limit violation rates, velocity smoothness, or distribution distance to target motion) on the post-optimization intermediates themselves.

Authors: We agree that direct quantitative validation of the interpolated and optimized intermediate skeletons is valuable for supporting the central assumption behind the one-shot claim. While the downstream task success rates provide indirect support, we will revise §3.2 to include explicit metrics on the post-optimization and retargeted intermediates. These will comprise joint-limit violation rates (percentage of poses with any joint exceeding limits), velocity smoothness (mean squared jerk across the sequence), and distribution distance to the target motion (using Fréchet distance on pose embeddings). The added analysis will report these values before and after collision optimization to demonstrate plausibility.

Revision: yes

Referee: [Experimental section] The abstract states that the method 'consistently outperforms baselines, achieving superior performance across metrics,' but the provided description contains no numerical results, baseline definitions, error bars, or ablation studies; without these the strength of the performance claim cannot be assessed.

Authors: We acknowledge that the experimental claims require clearer and more complete numerical support for full assessment. The manuscript reports results on the CMU MoCap dataset, but to address the concern we will expand the experimental section with: explicit definitions and implementation details for all baselines, complete numerical tables including means and standard deviations (with error bars) over multiple random seeds, and additional ablation studies isolating the contributions of order-preserving OT, geodesic interpolation, collision optimization, and the RL adaptation stage. These revisions will make the performance comparisons fully transparent and reproducible.

Revision: yes
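
Two of the plausibility metrics named in the first response are simple enough to sketch. The code below assumes a motion is a list of per-frame joint-angle lists sampled at a fixed timestep `dt`; the function names and the finite-difference jerk estimate are illustrative choices, not the authors' implementation. The Fréchet distance on pose embeddings is left out because it depends on an embedding model that is not specified here.

```python
def joint_limit_violation_rate(poses, lower, upper):
    """Percentage of poses in which at least one joint lies outside [lower, upper]."""
    bad = sum(
        1
        for pose in poses
        if any(q < lo or q > hi for q, lo, hi in zip(pose, lower, upper))
    )
    return 100.0 * bad / len(poses)


def mean_squared_jerk(poses, dt):
    """Mean squared third finite difference of joint angles, divided by dt**3."""
    if len(poses) < 4:
        raise ValueError("need at least 4 frames to estimate jerk")
    total, count = 0.0, 0
    for t in range(len(poses) - 3):
        for j in range(len(poses[0])):
            jerk = (
                poses[t + 3][j] - 3 * poses[t + 2][j] + 3 * poses[t + 1][j] - poses[t][j]
            ) / dt**3
            total += jerk * jerk
            count += 1
    return total / count


# Toy single-joint trajectory with limits of -1 and 1 radians.
frames = [[0.0], [1.5], [-2.0], [0.5]]
rate = joint_limit_violation_rate(frames, lower=[-1.0], upper=[1.0])  # 2 of 4 frames violate
```

Reporting both numbers before and after collision optimization, as the response proposes, would show whether the cleanup step trades smoothness for feasibility.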

Circularity Check

0 steps flagged

Derivation chain is self-contained with independent pipeline and external validation

The paper describes a data-efficient adaptation method that applies order-preserving optimal transport to compute distances between one non-walking target sequence and auxiliary walking sequences, performs geodesic interpolation to create intermediate pose skeletons, optimizes those for collision-free configurations, retargets them to the humanoid, and uses the results for reinforcement-learning policy adaptation from a walking-trained base model. This pipeline is presented as a sequence of distinct processing steps whose outputs are not defined in terms of the inputs by construction, nor are any central claims justified solely by self-citations or fitted parameters renamed as predictions. Experimental results are reported on the external CMU MoCap dataset with comparisons to baselines, providing an independent check rather than a tautological re-expression of the same quantities. No equations or sections in the provided description exhibit self-definitional loops, uniqueness theorems imported from the authors' prior work, or ansatzes smuggled via citation that would force the reported performance.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that walking motions supply transferable structure for non-walking targets and that optimal transport alignment yields usable intermediate poses; no explicit free parameters or invented entities are named in the abstract.

axioms (1)

domain assumption: Walking motions provide useful priors for generating intermediate poses for non-walking target motions
Invoked in the core idea: auxiliary walking motions are combined with the single target sample to compute transport distances and geodesics.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: order-preserving optimal transport to compute distances between walking and non-walking sequences, followed by interpolation along geodesics to generate new intermediate pose skeletons... collision-free configurations and retargeted to the humanoid before integration into a simulated environment for policy adaptation via reinforcement learning

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: geodesic distance... $d\big((x_1,\{q_{1,j}\}),(x_2,\{q_{2,j}\})\big) = d_t(x_1,x_2) + w \sum_j d_r(q_{1,j},q_{2,j})$
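
To make the quoted distance concrete, here is a minimal sketch that treats a pose as a root translation plus one unit quaternion per joint. It assumes the translation term is Euclidean distance and the rotation term is the geodesic angle between rotations; neither definition is given in the excerpt above, so both are labeled assumptions, as are the helper names. Interpolation is linear in translation and spherical (slerp) in rotation, which traces the geodesic of this product metric.

```python
import math


def rot_angle(q1, q2):
    """Geodesic angle (radians) between two unit quaternions given as [w, x, y, z]."""
    dot = abs(sum(a * b for a, b in zip(q1, q2)))   # abs(): q and -q are the same rotation
    return 2.0 * math.acos(min(1.0, dot))


def pose_distance(x1, Q1, x2, Q2, w=1.0):
    """Translation distance plus w times the summed per-joint rotation angles."""
    d_t = math.dist(x1, x2)                          # assumed Euclidean translation term
    d_r = sum(rot_angle(a, b) for a, b in zip(Q1, Q2))
    return d_t + w * d_r


def slerp(q1, q2, t):
    """Spherical linear interpolation along the shorter arc between unit quaternions."""
    dot = sum(a * b for a, b in zip(q1, q2))
    if dot < 0.0:                                    # flip sign to take the shorter arc
        q2, dot = [-c for c in q2], -dot
    theta = math.acos(min(1.0, dot))
    if theta < 1e-8:                                 # nearly identical rotations
        return list(q1)
    s = math.sin(theta)
    w1, w2 = math.sin((1.0 - t) * theta) / s, math.sin(t * theta) / s
    return [w1 * a + w2 * b for a, b in zip(q1, q2)]


def interpolate_pose(x1, Q1, x2, Q2, t):
    """Pose at fraction t in [0, 1] along the geodesic from pose 1 to pose 2."""
    x = [(1.0 - t) * a + t * b for a, b in zip(x1, x2)]
    Q = [slerp(a, b, t) for a, b in zip(Q1, Q2)]
    return x, Q
```

With this choice, the pose at `t = 0.5` sits at half the total distance from either endpoint, which is the property that makes the interpolated skeletons evenly spaced between the walking and target poses.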

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Supplemental Materials: One-shot Humanoid Whole-body Motion Learning

Collision detection plays a pivotal role in generating physically plausible poses for articulated structures, where self-intersections may occur due to intricate joint arrangements. The process entails representing the skeleton...