arxiv: 2605.23762 · v1 · pith:IDCWPEO3new · submitted 2026-05-22 · 💻 cs.RO

Direct Dynamic Retargeting for Humanoid Imitation Learning from Videos

Pith reviewed 2026-05-25 03:56 UTC · model grok-4.3

classification 💻 cs.RO
keywords Direct Dynamic Retargeting · Humanoid Imitation Learning · Motion Retargeting · Model Predictive Control · Video Demonstrations · Reinforcement Learning · Physics Simulation

The pith

Direct Dynamic Retargeting generates dynamically feasible humanoid trajectories straight from video by skipping intermediate kinematic projections.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard imitation learning pipelines for humanoids from monocular video first map motions through geometric retargeting or indirect dynamic steps that introduce a geometric bias and shrink the space of possible behaviors. The paper proposes Direct Dynamic Retargeting as a single-stage alternative that works directly in task space inside a physics simulator and uses a sampling-based model predictive controller to optimize contact sequences and mitigate drift. Experiments show this produces references that track demonstrations more accurately than prior methods and, when supplied to reinforcement learning agents, speed convergence while improving agile and balancing performance.

Core claim

Direct Dynamic Retargeting (DDR) formulates the retargeting problem in task space and solves it with a sampling-based Model Predictive Control solver inside a physics simulator, producing high-fidelity, dynamically feasible trajectories directly from expert videos without any intermediate kinematic projection step.

What carries the argument

Direct Dynamic Retargeting (DDR) as a single-stage task-space optimization using sampling-based MPC inside a physics simulator
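
To make the mechanism concrete, here is a minimal sketch of sampling-based MPC in its simplest form (predictive sampling) tracking a task-space reference. It is not the paper's solver: a one-dimensional point mass stands in for the physics simulator, and every name and constant (`step`, `rollout_cost`, `plan`, `track`, `HORIZON`, `NUM_SAMPLES`, `NOISE_STD`) is a hypothetical choice made for illustration.

```python
import random

DT = 0.05          # integration step in seconds (illustrative value)
HORIZON = 15       # planning horizon in steps
NUM_SAMPLES = 64   # candidate control sequences per MPC step
NOISE_STD = 2.0    # standard deviation of the sampling noise on the force


def step(state, force):
    """Toy stand-in for a physics simulator: unit point mass, semi-implicit Euler."""
    pos, vel = state
    vel = vel + DT * force
    pos = pos + DT * vel
    return (pos, vel)


def rollout_cost(state, controls, reference):
    """Simulate a control sequence and accumulate the task-space tracking cost."""
    cost = 0.0
    for force, target in zip(controls, reference):
        state = step(state, force)
        cost += (state[0] - target) ** 2 + 1e-4 * force ** 2
    return cost


def plan(state, nominal, reference, rng):
    """Predictive sampling: perturb the nominal plan and keep the cheapest rollout."""
    best, best_cost = nominal, rollout_cost(state, nominal, reference)
    for _ in range(NUM_SAMPLES):
        candidate = [u + rng.gauss(0.0, NOISE_STD) for u in nominal]
        cost = rollout_cost(state, candidate, reference)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best


def track(reference, seed=0):
    """Receding-horizon loop: plan, apply the first control, shift the plan."""
    rng = random.Random(seed)
    state = (0.0, 0.0)
    nominal = [0.0] * HORIZON
    trajectory = []
    for t in range(len(reference) - HORIZON):
        window = reference[t + 1 : t + 1 + HORIZON]
        nominal = plan(state, nominal, window, rng)
        state = step(state, nominal[0])
        trajectory.append(state[0])
        nominal = nominal[1:] + [nominal[-1]]  # warm start for the next step
    return trajectory
```

Calling `track(reference)` with a list of target positions longer than `HORIZON` returns the position reached after each control step. Because the unperturbed plan is always among the candidates, each call to `plan` can only match or lower the rollout cost of its warm start. A humanoid setup would replace `step` with full simulator rollouts that include contacts.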

If this is right

  • DDR achieves higher demonstration tracking accuracy than geometric or indirect dynamic retargeting baselines.
  • Physically viable references from DDR accelerate convergence when used to train reinforcement learning policies.
  • RL agents trained with DDR references execute agile and balancing behaviors more successfully than those trained with biased references.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same direct task-space formulation could be applied to retargeting between other robot morphologies that differ in mass distribution.
  • Removing the kinematic stage may reduce the number of hyperparameters that must be tuned when adapting new video datasets.
  • The approach could be tested on contact-rich motions such as climbing or manipulation where intermediate projections are especially restrictive.

Load-bearing premise

The geometric bias created by intermediate kinematic projections is the main reason standard retargeting pipelines produce suboptimal dynamic behavior.

What would settle it

A controlled experiment that applies the same sampling-based MPC solver and simulator to both a DDR pipeline and a two-stage geometric-plus-dynamic pipeline and finds no accuracy difference in demonstration tracking.

Figures

Figures reproduced from arXiv: 2605.23762 by Armand Jordana, Constant Roux, Ludovic De Matteïs, Nicolas Mansard, Olivier Stasse, Philippe Souères, Valentin Guillet.

Figure 1: Our proposed framework uses expert demonstration from monocular video, extracts human keypoints using part of …
Figure 2: Schematic comparison of Geometric Retargeting
Figure 3: Experimental motion dataset for the IL benchmark.
Figure 4: Qualitative analysis of contact sequence accuracy
Figure 5: Temporal evolution of pelvis Laplacian error during …
Figure 6: Temporal evolution of left foot Laplacian error during …
Figure 8: The Unitree robot H1-2 performing (a) a stable squat, (b) a static kung fu pose, (c) an extended one-foot balance, (d) a dynamic pistol squat, and (e) a balancing stick pose.

… By formulating the reference generation problem directly in the task space using a Laplacian graph distance, our approach effectively mitigates the impact of noisy and drifting human data extracted from videos. Furthermore, by e…
Original abstract

Imitation Learning from monocular video demonstrations provides a scalable approach for teaching complex skills to humanoid robots. However, translating human motion to humanoids requires overcoming significant morphological mismatches. Standard approaches rely on Geometric Retargeting or Indirect Dynamic Retargeting pipelines. We identify that these intermediate kinematic projections introduce a geometric bias, restricting the search space and yielding suboptimal dynamic behaviors. In this paper, we propose Direct Dynamic Retargeting (DDR), a novel single-stage framework that generates high-fidelity, dynamically feasible trajectories directly from expert videos. By formulating the problem in the task space and leveraging a sampling-based Model Predictive Control solver within a physics simulator, DDR natively optimizes over complex contact sequences while mitigating input drift. Our experiments demonstrate that bypassing the geometric bias allows DDR to outperform state-of-the-art baselines in demonstration tracking accuracy. Furthermore, we establish that providing such physically viable references to RL agents accelerates training convergence and enhances the final execution of agile and balancing behaviors. Source code will be made publicly available.
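
The figure captions above refer to a Laplacian error and a Laplacian graph distance, and the abstract credits the task-space formulation with mitigating input drift. This page does not give the paper's definition, so the following is only a sketch under one common assumption: uniform-weight Laplacian coordinates, where each keypoint is expressed relative to the mean of its graph neighbours. The toy skeleton, the edge list, and the function names are hypothetical.

```python
def laplacian_coordinates(points, edges):
    """Return each 3D keypoint minus the mean of its graph neighbours.

    Assumes uniform edge weights and that every keypoint has at least one neighbour.
    """
    neighbours = {i: [] for i in range(len(points))}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    coords = []
    for i, point in enumerate(points):
        nbrs = neighbours[i]
        mean = [sum(points[j][d] for j in nbrs) / len(nbrs) for d in range(3)]
        coords.append(tuple(point[d] - mean[d] for d in range(3)))
    return coords


def laplacian_distance(points_a, points_b, edges):
    """Sum of squared differences between the two sets of Laplacian coordinates."""
    coords_a = laplacian_coordinates(points_a, edges)
    coords_b = laplacian_coordinates(points_b, edges)
    return sum(
        sum((x - y) ** 2 for x, y in zip(u, v))
        for u, v in zip(coords_a, coords_b)
    )


# Hypothetical four-keypoint skeleton: 0 pelvis, 1 chest, 2 head, 3 hand.
EDGES = [(0, 1), (1, 2), (1, 3)]
human = [(0.0, 0.0, 0.9), (0.0, 0.0, 1.4), (0.0, 0.0, 1.7), (0.4, 0.0, 1.4)]

# Same pose with a constant translational offset, mimicking global drift.
drifted = [(x + 0.3, y - 0.2, z + 0.1) for x, y, z in human]
```

A constant offset cancels in neighbour-relative coordinates, so `laplacian_distance(human, drifted, EDGES)` is zero up to floating-point rounding, whereas a raw position distance would grow with the offset. This illustrates only why a cost of this kind is insensitive to global translational drift; the paper's actual cost may be weighted or normalized differently.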

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that geometric and indirect dynamic retargeting from monocular videos introduce geometric bias via intermediate kinematic projections, yielding suboptimal dynamic behaviors for humanoid robots. It proposes Direct Dynamic Retargeting (DDR), a single-stage task-space framework that uses a sampling-based MPC solver inside a physics simulator to directly optimize dynamically feasible trajectories and complex contact sequences from expert videos. Experiments are said to show that DDR outperforms state-of-the-art baselines in demonstration tracking accuracy and that the resulting references accelerate RL convergence while improving execution of agile and balancing behaviors.

Significance. If the central claims hold after proper controls, the approach could improve the quality of video-based imitation references for humanoids, reducing reliance on biased kinematic intermediates and thereby accelerating RL for dynamic skills. The stated plan to release source code supports reproducibility.

major comments (2)
  1. [Abstract] The abstract states that DDR 'outperform[s] state-of-the-art baselines in demonstration tracking accuracy' by 'bypassing the geometric bias.' No information is given on whether the baselines were run with the identical sampling-based MPC solver, contact model, simulator fidelity, or solver budget. Without this control, performance deltas cannot be attributed to the absence of intermediate kinematic projections rather than solver or implementation differences.
  2. [Abstract] The claim that 'providing such physically viable references to RL agents accelerates training convergence' is load-bearing for the second contribution. The abstract provides no description of an ablation that applies the same MPC solver to geometrically retargeted references, leaving the geometric-bias explanation unisolated from other factors such as simulator or MPC differences.
minor comments (1)
  1. The abstract notes that source code will be made publicly available; confirming this in the camera-ready version would strengthen the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. The concerns about isolating the effect of geometric bias from solver differences are valid, and we will strengthen the manuscript by adding the suggested controls and ablations.

Point-by-point responses
  1. Referee: [Abstract] The abstract states that DDR 'outperform[s] state-of-the-art baselines in demonstration tracking accuracy' by 'bypassing the geometric bias.' No information is given on whether the baselines were run with the identical sampling-based MPC solver, contact model, simulator fidelity, or solver budget. Without this control, performance deltas cannot be attributed to the absence of intermediate kinematic projections rather than solver or implementation differences.

    Authors: The baselines follow the standard geometric and indirect dynamic retargeting pipelines from prior literature, which are purely kinematic and do not employ sampling-based MPC. DDR's formulation directly optimizes task-space trajectories inside the simulator. To isolate the contribution of bypassing kinematic projections, we will add an ablation that applies the identical MPC solver, contact model, and solver budget to the geometrically retargeted references and report the resulting tracking accuracy. This will appear in the revised experiments section. revision: yes

  2. Referee: [Abstract] The claim that 'providing such physically viable references to RL agents accelerates training convergence' is load-bearing for the second contribution. The abstract provides no description of an ablation that applies the same MPC solver to geometrically retargeted references, leaving the geometric-bias explanation unisolated from other factors such as simulator or MPC differences.

    Authors: We agree that the current experiments do not fully isolate whether the RL acceleration arises from dynamic feasibility versus other implementation differences. We will add an ablation that feeds the same MPC solver the outputs of the geometric retargeting baselines and compare the resulting RL training curves and final policy performance. These results will be incorporated into the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

Full rationale

The paper introduces Direct Dynamic Retargeting as a single-stage optimization framework that directly solves a task-space MPC problem inside a physics simulator. No equations, fitted parameters, or self-citations appear in the abstract or described method that reduce any claimed prediction or result to the inputs by construction. The central premise (that bypassing intermediate kinematic projections removes geometric bias) is presented as an empirical hypothesis tested against baselines rather than a self-definitional or self-citation-dependent derivation. The work is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only input supplies no mathematical derivations, datasets, or implementation details, so no free parameters, axioms, or invented entities can be identified.

pith-pipeline@v0.9.0 · 5720 in / 1148 out tokens · 24547 ms · 2026-05-25T03:56:35.226639+00:00 · methodology

