Direct Dynamic Retargeting for Humanoid Imitation Learning from Videos

Armand Jordana; Constant Roux; Ludovic De Matte\"is; Nicolas Mansard; Olivier Stasse; Philippe Sou\`eres; Valentin Guillet

arxiv: 2605.23762 · v1 · pith:IDCWPEO3new · submitted 2026-05-22 · 💻 cs.RO

Direct Dynamic Retargeting for Humanoid Imitation Learning from Videos

Constant Roux , Ludovic De Matte\"is , Armand Jordana , Valentin Guillet , Nicolas Mansard , Olivier Stasse , Philippe Sou\`eres This is my paper

Pith reviewed 2026-05-25 03:56 UTC · model grok-4.3

classification 💻 cs.RO

keywords Direct Dynamic RetargetingHumanoid Imitation LearningMotion RetargetingModel Predictive ControlVideo DemonstrationsReinforcement LearningPhysics Simulation

0 comments

The pith

Direct Dynamic Retargeting generates dynamically feasible humanoid trajectories straight from video by skipping intermediate kinematic projections.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard imitation learning pipelines for humanoids from monocular video first map motions through geometric retargeting or indirect dynamic steps that introduce a geometric bias and shrink the space of possible behaviors. The paper proposes Direct Dynamic Retargeting as a single-stage alternative that works directly in task space inside a physics simulator and uses a sampling-based model predictive controller to optimize contact sequences and mitigate drift. Experiments show this produces references that track demonstrations more accurately than prior methods and, when supplied to reinforcement learning agents, speed convergence while improving agile and balancing performance.

Core claim

Direct Dynamic Retargeting (DDR) formulates the retargeting problem in task space and solves it with a sampling-based Model Predictive Control solver inside a physics simulator, producing high-fidelity, dynamically feasible trajectories directly from expert videos without any intermediate kinematic projection step.

What carries the argument

Direct Dynamic Retargeting (DDR) as a single-stage task-space optimization using sampling-based MPC inside a physics simulator

If this is right

DDR achieves higher demonstration tracking accuracy than geometric or indirect dynamic retargeting baselines.
Physically viable references from DDR accelerate convergence when used to train reinforcement learning policies.
RL agents trained with DDR references execute agile and balancing behaviors more successfully than those trained with biased references.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same direct task-space formulation could be applied to retargeting between other robot morphologies that differ in mass distribution.
Removing the kinematic stage may reduce the number of hyperparameters that must be tuned when adapting new video datasets.
The approach could be tested on contact-rich motions such as climbing or manipulation where intermediate projections are especially restrictive.

Load-bearing premise

The geometric bias created by intermediate kinematic projections is the main reason standard retargeting pipelines produce suboptimal dynamic behavior.

What would settle it

A controlled experiment that applies the same sampling-based MPC solver and simulator to both a DDR pipeline and a two-stage geometric-plus-dynamic pipeline and finds no accuracy difference in demonstration tracking.

Figures

Figures reproduced from arXiv: 2605.23762 by Armand Jordana, Constant Roux, Ludovic De Matte\"is, Nicolas Mansard, Olivier Stasse, Philippe Sou\`eres, Valentin Guillet.

**Figure 1.** Figure 1: Our proposed framework uses expert demonstration from monocular video, extracts human keypoints using part of [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗

**Figure 2.** Figure 2: Schematic comparison of Geometric Retargeting [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Experimental motion dataset for the IL benchmark. [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: Qualitative analysis of contact sequence accuracy [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: Temporal evolution of pelvis Laplacian error during [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

**Figure 6.** Figure 6: Temporal evolution of left foot Laplacian error during [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗

**Figure 8.** Figure 8: The Unitree robot H1-2 performing (a) a stable squat, (b) a static kung fu pose, (c) an extended one-foot balance, (d) a dynamic pistol squat, and (e) a balancing stick pose. ing. By formulating the reference generation problem directly in the task space using a Laplacian graph distance, our approach effectively mitigates the impact of noisy and drifting human data extracted from videos. Furthermore, by e… view at source ↗

read the original abstract

Imitation Learning from monocular video demonstrations provides a scalable approach for teaching complex skills to humanoid robots. However, translating human motion to humanoids requires overcoming significant morphological mismatches. Standard approaches rely on Geometric Retargeting or Indirect Dynamic Retargeting pipelines. We identify that these intermediate kinematic projections introduce a geometric bias, restricting the search space and yielding suboptimal dynamic behaviors. In this paper, we propose Direct Dynamic Retargeting (DDR), a novel single-stage framework that generates high-fidelity, dynamically feasible trajectories directly from expert videos. By formulating the problem in the task space and leveraging a sampling-based Model Predictive Control solver within a physics simulator, DDR natively optimizes over complex contact sequences while mitigating input drift. Our experiments demonstrate that bypassing the geometric bias allows DDR to outperform state-of-the-art baselines in demonstration tracking accuracy. Furthermore, we establish that providing such physically viable references to RL agents accelerates training convergence and enhances the final execution of agile and balancing behaviors. Source code will be made publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

DDR gives a clean single-stage task-space MPC formulation for video retargeting that skips geometric steps, but the reported gains are not isolated from solver or simulator differences.

read the letter

The paper's main move is Direct Dynamic Retargeting: it casts the video-to-humanoid problem as a single task-space optimization solved by sampling-based MPC inside a physics simulator, so it can handle contact sequences without first projecting through kinematic retargeting. That formulation is the actual novelty and it is presented cleanly in the abstract. Releasing code is also useful for the subfield. The claim that this produces better tracking and faster RL convergence for balancing and agile motions follows logically from removing the intermediate projection step. If the full experiments back that up with matched solver budgets, the approach could be a practical improvement for imitation pipelines. The soft spot is exactly the one in the stress-test note. Nothing in the abstract shows that the baselines received equivalent MPC resources, contact models, or simulator fidelity. Without an ablation that applies the same solver to geometrically retargeted references, it is impossible to tell whether the performance delta comes from bypassing geometric bias or from the solver itself. That gap makes the central explanation provisional rather than demonstrated. The work is aimed at people already doing video-based imitation on humanoids or legged RL. A reader who needs a new retargeting option or wants to compare against direct optimization will get something concrete from it. The idea is coherent enough on its own terms to deserve referee time, though the authors will have to supply the missing controls before the claims can be taken as settled.

Referee Report

2 major / 1 minor

Summary. The paper claims that geometric and indirect dynamic retargeting from monocular videos introduce geometric bias via intermediate kinematic projections, yielding suboptimal dynamic behaviors for humanoid robots. It proposes Direct Dynamic Retargeting (DDR), a single-stage task-space framework that uses a sampling-based MPC solver inside a physics simulator to directly optimize dynamically feasible trajectories and complex contact sequences from expert videos. Experiments are said to show that DDR outperforms state-of-the-art baselines in demonstration tracking accuracy and that the resulting references accelerate RL convergence while improving execution of agile and balancing behaviors.

Significance. If the central claims hold after proper controls, the approach could improve the quality of video-based imitation references for humanoids, reducing reliance on biased kinematic intermediates and thereby accelerating RL for dynamic skills. The stated plan to release source code supports reproducibility.

major comments (2)

[Abstract] The abstract states that DDR 'outperform[s] state-of-the-art baselines in demonstration tracking accuracy' by 'bypassing the geometric bias.' No information is given on whether the baselines were run with the identical sampling-based MPC solver, contact model, simulator fidelity, or solver budget. Without this control, performance deltas cannot be attributed to the absence of intermediate kinematic projections rather than solver or implementation differences.
[Abstract] The claim that 'providing such physically viable references to RL agents accelerates training convergence' is load-bearing for the second contribution. The abstract provides no description of an ablation that applies the same MPC solver to geometrically retargeted references, leaving the geometric-bias explanation unisolated from other factors such as simulator or MPC differences.

minor comments (1)

The abstract notes that source code will be made publicly available; confirming this in the camera-ready version would strengthen the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. The concerns about isolating the effect of geometric bias from solver differences are valid, and we will strengthen the manuscript by adding the suggested controls and ablations.

read point-by-point responses

Referee: [Abstract] The abstract states that DDR 'outperform[s] state-of-the-art baselines in demonstration tracking accuracy' by 'bypassing the geometric bias.' No information is given on whether the baselines were run with the identical sampling-based MPC solver, contact model, simulator fidelity, or solver budget. Without this control, performance deltas cannot be attributed to the absence of intermediate kinematic projections rather than solver or implementation differences.

Authors: The baselines follow the standard geometric and indirect dynamic retargeting pipelines from prior literature, which are purely kinematic and do not employ sampling-based MPC. DDR's formulation directly optimizes task-space trajectories inside the simulator. To isolate the contribution of bypassing kinematic projections, we will add an ablation that applies the identical MPC solver, contact model, and solver budget to the geometrically retargeted references and report the resulting tracking accuracy. This will appear in the revised experiments section. revision: yes
Referee: [Abstract] The claim that 'providing such physically viable references to RL agents accelerates training convergence' is load-bearing for the second contribution. The abstract provides no description of an ablation that applies the same MPC solver to geometrically retargeted references, leaving the geometric-bias explanation unisolated from other factors such as simulator or MPC differences.

Authors: We agree that the current experiments do not fully isolate whether the RL acceleration arises from dynamic feasibility versus other implementation differences. We will add an ablation that feeds the same MPC solver the outputs of the geometric retargeting baselines and compare the resulting RL training curves and final policy performance. These results will be incorporated into the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper introduces Direct Dynamic Retargeting as a single-stage optimization framework that directly solves a task-space MPC problem inside a physics simulator. No equations, fitted parameters, or self-citations appear in the abstract or described method that reduce any claimed prediction or result to the inputs by construction. The central premise (that bypassing intermediate kinematic projections removes geometric bias) is presented as an empirical hypothesis tested against baselines rather than a self-definitional or self-citation-dependent derivation. The work is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only input supplies no mathematical derivations, datasets, or implementation details, so no free parameters, axioms, or invented entities can be identified.

pith-pipeline@v0.9.0 · 5720 in / 1148 out tokens · 24547 ms · 2026-05-25T03:56:35.226639+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · 2 internal anchors

[1]

Visual imitation enables contextual humanoid control,

A. Allshire, H. Choi, J. Zhang, D. McAllister, A. Zhang, C. M. Kim, T. Darrell, P. Abbeel, J. Malik, and A. Kanazawa, “Visual imitation enables contextual humanoid control,” inProceedings of the Conf. on Robot Learning, 2025. TABLE VII: Reference tracking metrics of imitation policies: Root Mean Square Error (RMSE) on the joint configurations, mean cartes...

work page 2025
[2]

Omniretarget: Interaction-preserving data generation for humanoid whole-body loco-manipulation and scene interaction,

L. Yang, X. Huang, Z. Wu, A. Kanazawa, P. Abbeel, C. Sferrazza, C. K. Liu, R. Duan, and G. Shi, “Omniretarget: Interaction-preserving data generation for humanoid whole-body loco-manipulation and scene interaction,” 2025

work page 2025
[3]

Learning to walk via deep reinforcement learning,

T. Haarnoja, S. Ha,et al., “Learning to walk via deep reinforcement learning,” inRobotics: Science and Systems XV, 2019

work page 2019
[4]

Deepmimic: Example-guided deep reinforcement learning of physics-based character skills,

X. B. Penget al., “Deepmimic: Example-guided deep reinforcement learning of physics-based character skills,”ACM Trans. Graph., 2018

work page 2018
[5]

Latent conditioned loco-manipulation using motion priors,

M. St˛ epie´n, R. Kourdis, C. Roux, and O. Stasse, “Latent conditioned loco-manipulation using motion priors,” inIEEE-RAS 24th Int. Conf. on Humanoid Robots, 2025

work page 2025
[6]

Beyondmimic: From motion tracking to versatile humanoid control via guided diffusion,

Q. Liao, T. E. Truong, X. Huang, Y . Gao, G. Tevet, K. Sreenath, and C. K. Liu, “Beyondmimic: From motion tracking to versatile humanoid control via guided diffusion,” 2025

work page 2025
[7]

Humanoid robot locomotion,

X.-S. Lei, J. Pan, and J.-B. Su, “Humanoid robot locomotion,” inInt. Conf. on Machine Learning and Cybernetics, 2005

work page 2005
[8]

Hub: Learning extreme humanoid balance,

T. Zhang, B. Zheng, R. Nai, Y . Hu, Y .-J. Wang, G. Chen, F. Lin, J. Li, C. Hong, K. Sreenath, and Y . Gao, “Hub: Learning extreme humanoid balance,”arXiv preprint arXiv:.07294, 2025

work page 2025
[9]

Learning agile robotic locomotion skills by imitating animals,

X. Bin Penget al., “Learning agile robotic locomotion skills by imitating animals,” inRobotics: Science and Systems XVI, 2020

work page 2020
[10]

Ppr: Physically plausible reconstruction from monocular videos,

G. Yang, S. Yang, J. Z. Zhang, Z. Manchester, and D. Ramanan, “Ppr: Physically plausible reconstruction from monocular videos,” in IEEE/CVF Int. Conf. on Computer Vision, 2023

work page 2023
[11]

DynaRetarget: Dynamically-Feasible retargeting using Sampling- Based trajectory optimization,

V . Dhedin, I. Taouil, S. Omar, D. Yu, K. Tao, A. Dai, and M. Khadiv, “DynaRetarget: Dynamically-Feasible retargeting using Sampling- Based trajectory optimization,” 2026

work page 2026
[12]

Spatio-temporal motion retargeting for quadruped robots,

T. Yoon, D. Kang, S. Kim, J. Cheng, M. S. Ahn, S. Coros, and S. Choi, “Spatio-temporal motion retargeting for quadruped robots,” IEEE Trans. on Robotics, 2025

work page 2025
[13]

Spider: Scalable physics-informed dexterous retargeting,

C. Pan, C. Wang, H. Qi, Z. Liu, H. Bharadhwaj, A. Sharma, T. Wu, G. Shi, J. Malik, and F. Hogan, “Spider: Scalable physics-informed dexterous retargeting,” 2025

work page 2025
[14]

A tutorial on the cross-entropy method,

P.-T. De Boer, D. P. Kroese, S. Mannor, and R. Y . Rubinstein, “A tutorial on the cross-entropy method,”Annals of operations research, 2005

work page 2005
[15]

Crocoddyl: An Efficient and Versatile Framework for Multi-Contact Optimal Control,

C. Mastalli, R. Budhiraja, W. Merkt, G. Saurel, B. Hammoud, M. Naveau, J. Carpentier, L. Righetti, S. Vijayakumar, and N. Mansard, “Crocoddyl: An Efficient and Versatile Framework for Multi-Contact Optimal Control,” inIEEE Int. Conf. on Robotics and Automation, 2020

work page 2020
[16]

Motion retargeting for humanoid robots based on simultaneous morphing parameter identification and motion optimization,

K. Ayusawa and E. Yoshida, “Motion retargeting for humanoid robots based on simultaneous morphing parameter identification and motion optimization,”IEEE Trans. on Robotics, 2017

work page 2017
[17]

Pink: Python inverse kinematics based on Pinocchio,

S. Caronet al., “Pink: Python inverse kinematics based on Pinocchio,”

work page
[18]

Available: https://github.com/stephane-caron/pink

[Online]. Available: https://github.com/stephane-caron/pink

work page
[19]

Pyroki: A modular toolkit for robot kinematic optimization,

C. M. Kim*, B. Yi*, H. Choi, Y . Ma, K. Goldberg, and A. Kanazawa, “Pyroki: A modular toolkit for robot kinematic optimization,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2025

work page 2025
[20]

Optimization-based motion retargeting integrating spatial and dynamic constraints for humanoid,

T. Moulard, E. Yoshida, and S. Nakaoka, “Optimization-based motion retargeting integrating spatial and dynamic constraints for humanoid,” inIEEE ISR, 2013

work page 2013
[21]

Kinodynamic motion retargeting for humanoid locomotion via multi-contact whole-body trajectory optimization,

X. Zhanget al., “Kinodynamic motion retargeting for humanoid locomotion via multi-contact whole-body trajectory optimization,” 2026

work page 2026
[22]

Agility meets stability: Versatile humanoid control with heterogeneous data,

Y . Pan, R. Qiao, L. Chen, K. Chitta, L. Pan, H. Mai, Q. Bu, C. Zheng, H. Zhao, P. Luo, and H. Li, “Agility meets stability: Versatile humanoid control with heterogeneous data,”arXiv preprint arXiv:.17373, 2025

work page 2025
[23]

Amp: Adversarial motion priors for stylized physics-based character control,

X. B. Penget al., “Amp: Adversarial motion priors for stylized physics-based character control,”ACM Trans. on Graphics, 2021

work page 2021
[24]

Smpl: A skinned multi-person linear model,

M. Loperet al., “Smpl: A skinned multi-person linear model,” in Seminal Graphics Papers: Pushing the Boundaries, V olume 2, 2023

work page 2023
[25]

Hydrax: Sampling-based model predictive control on gpu with jax and mujoco mjx,

V . Kurtz, “Hydrax: Sampling-based model predictive control on gpu with jax and mujoco mjx,” 2024, https://github.com/vincekurtz/hydrax

work page 2024
[26]

Cat: Constraints as terminations for legged locomotion reinforcement learning,

E. Chane-Sane, P.-A. Leziart,et al., “Cat: Constraints as terminations for legged locomotion reinforcement learning,” inIEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2024

work page 2024
[27]

Constrained reinforcement learning for unstable point-feet bipedal locomotion applied to the bolt robot,

C. Roux, E. Chane-Sane, L. de Matteïs, T. Flayols, J. Manhes, O. Stasse, P. Souères, and N. Mansard, “Constrained reinforcement learning for unstable point-feet bipedal locomotion applied to the bolt robot,” inIEEE-RAS 24rd Int. Conf. on Humanoid Robots, 2025

work page 2025
[28]

Proximal Policy Optimization Algorithms

J. Schulmanet al., “Proximal policy optimization algorithms,”arXiv preprint arXiv:1707.06347, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[29]

skrl: Modular and flexible library for reinforcement learning,

A. Serrano-Muñozet al., “skrl: Modular and flexible library for reinforcement learning,”Journal of Machine Learning Research, 2023

work page 2023
[30]

Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning

M. Mittalet al., “Isaac lab: A gpu-accelerated simulation framework for multi-modal robot learning,”arXiv preprint arXiv:2511.04831, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[31]

The pinocchio c++ library – a fast and flexible implementation of rigid body dynamics algorithms and their analytical derivatives,

J. Carpentier, G. Saurel, G. Buondonno, J. Mirabel, F. Lamiraux, O. Stasse, and N. Mansard, “The pinocchio c++ library – a fast and flexible implementation of rigid body dynamics algorithms and their analytical derivatives,” inIEEE Int. Symposium on System Integrations, 2019

work page 2019
[32]

Proxqp: an efficient and versatile quadratic programming solver for real-time robotics applications and beyond,

A. Bambadeet al., “Proxqp: an efficient and versatile quadratic programming solver for real-time robotics applications and beyond,” IEEE Transactions on Robotics, 2025

work page 2025
[33]

Mujoco: A physics engine for model-based control,

E. Todorov, T. Erez, and Y . Tassa, “Mujoco: A physics engine for model-based control,” inIEEE/RSJ int. conf. on intelligent robots and systems, 2012

work page 2012

[1] [1]

Visual imitation enables contextual humanoid control,

A. Allshire, H. Choi, J. Zhang, D. McAllister, A. Zhang, C. M. Kim, T. Darrell, P. Abbeel, J. Malik, and A. Kanazawa, “Visual imitation enables contextual humanoid control,” inProceedings of the Conf. on Robot Learning, 2025. TABLE VII: Reference tracking metrics of imitation policies: Root Mean Square Error (RMSE) on the joint configurations, mean cartes...

work page 2025

[2] [2]

Omniretarget: Interaction-preserving data generation for humanoid whole-body loco-manipulation and scene interaction,

L. Yang, X. Huang, Z. Wu, A. Kanazawa, P. Abbeel, C. Sferrazza, C. K. Liu, R. Duan, and G. Shi, “Omniretarget: Interaction-preserving data generation for humanoid whole-body loco-manipulation and scene interaction,” 2025

work page 2025

[3] [3]

Learning to walk via deep reinforcement learning,

T. Haarnoja, S. Ha,et al., “Learning to walk via deep reinforcement learning,” inRobotics: Science and Systems XV, 2019

work page 2019

[4] [4]

Deepmimic: Example-guided deep reinforcement learning of physics-based character skills,

X. B. Penget al., “Deepmimic: Example-guided deep reinforcement learning of physics-based character skills,”ACM Trans. Graph., 2018

work page 2018

[5] [5]

Latent conditioned loco-manipulation using motion priors,

M. St˛ epie´n, R. Kourdis, C. Roux, and O. Stasse, “Latent conditioned loco-manipulation using motion priors,” inIEEE-RAS 24th Int. Conf. on Humanoid Robots, 2025

work page 2025

[6] [6]

Beyondmimic: From motion tracking to versatile humanoid control via guided diffusion,

Q. Liao, T. E. Truong, X. Huang, Y . Gao, G. Tevet, K. Sreenath, and C. K. Liu, “Beyondmimic: From motion tracking to versatile humanoid control via guided diffusion,” 2025

work page 2025

[7] [7]

Humanoid robot locomotion,

X.-S. Lei, J. Pan, and J.-B. Su, “Humanoid robot locomotion,” inInt. Conf. on Machine Learning and Cybernetics, 2005

work page 2005

[8] [8]

Hub: Learning extreme humanoid balance,

T. Zhang, B. Zheng, R. Nai, Y . Hu, Y .-J. Wang, G. Chen, F. Lin, J. Li, C. Hong, K. Sreenath, and Y . Gao, “Hub: Learning extreme humanoid balance,”arXiv preprint arXiv:.07294, 2025

work page 2025

[9] [9]

Learning agile robotic locomotion skills by imitating animals,

X. Bin Penget al., “Learning agile robotic locomotion skills by imitating animals,” inRobotics: Science and Systems XVI, 2020

work page 2020

[10] [10]

Ppr: Physically plausible reconstruction from monocular videos,

G. Yang, S. Yang, J. Z. Zhang, Z. Manchester, and D. Ramanan, “Ppr: Physically plausible reconstruction from monocular videos,” in IEEE/CVF Int. Conf. on Computer Vision, 2023

work page 2023

[11] [11]

DynaRetarget: Dynamically-Feasible retargeting using Sampling- Based trajectory optimization,

V . Dhedin, I. Taouil, S. Omar, D. Yu, K. Tao, A. Dai, and M. Khadiv, “DynaRetarget: Dynamically-Feasible retargeting using Sampling- Based trajectory optimization,” 2026

work page 2026

[12] [12]

Spatio-temporal motion retargeting for quadruped robots,

T. Yoon, D. Kang, S. Kim, J. Cheng, M. S. Ahn, S. Coros, and S. Choi, “Spatio-temporal motion retargeting for quadruped robots,” IEEE Trans. on Robotics, 2025

work page 2025

[13] [13]

Spider: Scalable physics-informed dexterous retargeting,

C. Pan, C. Wang, H. Qi, Z. Liu, H. Bharadhwaj, A. Sharma, T. Wu, G. Shi, J. Malik, and F. Hogan, “Spider: Scalable physics-informed dexterous retargeting,” 2025

work page 2025

[14] [14]

A tutorial on the cross-entropy method,

P.-T. De Boer, D. P. Kroese, S. Mannor, and R. Y . Rubinstein, “A tutorial on the cross-entropy method,”Annals of operations research, 2005

work page 2005

[15] [15]

Crocoddyl: An Efficient and Versatile Framework for Multi-Contact Optimal Control,

C. Mastalli, R. Budhiraja, W. Merkt, G. Saurel, B. Hammoud, M. Naveau, J. Carpentier, L. Righetti, S. Vijayakumar, and N. Mansard, “Crocoddyl: An Efficient and Versatile Framework for Multi-Contact Optimal Control,” inIEEE Int. Conf. on Robotics and Automation, 2020

work page 2020

[16] [16]

Motion retargeting for humanoid robots based on simultaneous morphing parameter identification and motion optimization,

K. Ayusawa and E. Yoshida, “Motion retargeting for humanoid robots based on simultaneous morphing parameter identification and motion optimization,”IEEE Trans. on Robotics, 2017

work page 2017

[17] [17]

Pink: Python inverse kinematics based on Pinocchio,

S. Caronet al., “Pink: Python inverse kinematics based on Pinocchio,”

work page

[18] [18]

Available: https://github.com/stephane-caron/pink

[Online]. Available: https://github.com/stephane-caron/pink

work page

[19] [19]

Pyroki: A modular toolkit for robot kinematic optimization,

C. M. Kim*, B. Yi*, H. Choi, Y . Ma, K. Goldberg, and A. Kanazawa, “Pyroki: A modular toolkit for robot kinematic optimization,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2025

work page 2025

[20] [20]

Optimization-based motion retargeting integrating spatial and dynamic constraints for humanoid,

T. Moulard, E. Yoshida, and S. Nakaoka, “Optimization-based motion retargeting integrating spatial and dynamic constraints for humanoid,” inIEEE ISR, 2013

work page 2013

[21] [21]

Kinodynamic motion retargeting for humanoid locomotion via multi-contact whole-body trajectory optimization,

X. Zhanget al., “Kinodynamic motion retargeting for humanoid locomotion via multi-contact whole-body trajectory optimization,” 2026

work page 2026

[22] [22]

Agility meets stability: Versatile humanoid control with heterogeneous data,

Y . Pan, R. Qiao, L. Chen, K. Chitta, L. Pan, H. Mai, Q. Bu, C. Zheng, H. Zhao, P. Luo, and H. Li, “Agility meets stability: Versatile humanoid control with heterogeneous data,”arXiv preprint arXiv:.17373, 2025

work page 2025

[23] [23]

Amp: Adversarial motion priors for stylized physics-based character control,

X. B. Penget al., “Amp: Adversarial motion priors for stylized physics-based character control,”ACM Trans. on Graphics, 2021

work page 2021

[24] [24]

Smpl: A skinned multi-person linear model,

M. Loperet al., “Smpl: A skinned multi-person linear model,” in Seminal Graphics Papers: Pushing the Boundaries, V olume 2, 2023

work page 2023

[25] [25]

Hydrax: Sampling-based model predictive control on gpu with jax and mujoco mjx,

V . Kurtz, “Hydrax: Sampling-based model predictive control on gpu with jax and mujoco mjx,” 2024, https://github.com/vincekurtz/hydrax

work page 2024

[26] [26]

Cat: Constraints as terminations for legged locomotion reinforcement learning,

E. Chane-Sane, P.-A. Leziart,et al., “Cat: Constraints as terminations for legged locomotion reinforcement learning,” inIEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2024

work page 2024

[27] [27]

Constrained reinforcement learning for unstable point-feet bipedal locomotion applied to the bolt robot,

C. Roux, E. Chane-Sane, L. de Matteïs, T. Flayols, J. Manhes, O. Stasse, P. Souères, and N. Mansard, “Constrained reinforcement learning for unstable point-feet bipedal locomotion applied to the bolt robot,” inIEEE-RAS 24rd Int. Conf. on Humanoid Robots, 2025

work page 2025

[28] [28]

Proximal Policy Optimization Algorithms

J. Schulmanet al., “Proximal policy optimization algorithms,”arXiv preprint arXiv:1707.06347, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[29] [29]

skrl: Modular and flexible library for reinforcement learning,

A. Serrano-Muñozet al., “skrl: Modular and flexible library for reinforcement learning,”Journal of Machine Learning Research, 2023

work page 2023

[30] [30]

Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning

M. Mittalet al., “Isaac lab: A gpu-accelerated simulation framework for multi-modal robot learning,”arXiv preprint arXiv:2511.04831, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[31] [31]

The pinocchio c++ library – a fast and flexible implementation of rigid body dynamics algorithms and their analytical derivatives,

J. Carpentier, G. Saurel, G. Buondonno, J. Mirabel, F. Lamiraux, O. Stasse, and N. Mansard, “The pinocchio c++ library – a fast and flexible implementation of rigid body dynamics algorithms and their analytical derivatives,” inIEEE Int. Symposium on System Integrations, 2019

work page 2019

[32] [32]

Proxqp: an efficient and versatile quadratic programming solver for real-time robotics applications and beyond,

A. Bambadeet al., “Proxqp: an efficient and versatile quadratic programming solver for real-time robotics applications and beyond,” IEEE Transactions on Robotics, 2025

work page 2025

[33] [33]

Mujoco: A physics engine for model-based control,

E. Todorov, T. Erez, and Y . Tassa, “Mujoco: A physics engine for model-based control,” inIEEE/RSJ int. conf. on intelligent robots and systems, 2012

work page 2012