
arXiv: 2606.30290 · v1 · pith:2RKSWZQC · submitted 2026-06-29 · 💻 cs.RO

X-Morph: Human Motion Priors for Scalable Robot Learning Across Morphologies

Pith reviewed 2026-06-30 05:17 UTC · model grok-4.3

classification 💻 cs.RO
keywords: human motion retargeting · legged robot learning · reinforcement learning · motion priors · cross-morphology transfer · locomotion policies · loco-manipulation

The pith

Human motion data can be retargeted to train deployable locomotion and loco-manipulation policies for quadrupeds, hexapods, and quadrupeds equipped with a manipulator.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a pipeline that adapts abundant human motion data to create control policies for legged robots with different body plans. It first converts human motions into references suited to each robot's structure while preserving intent, then trains a policy with privileged information to follow those references, and finally distills the policy into one that operates with only onboard observations. This approach matters because robot-specific motion data is scarce while human data is plentiful, opening a route to scalable behavior learning across embodiments. Evaluation on three distinct platforms demonstrates that the resulting policies track varied motions, handle new human sequences, and enable tasks such as video teleoperation and text-driven motion generation.

Core claim

X-Morph converts human motions into kinematically plausible robot references through cross-morphology retargeting, tracks those references with a privileged reinforcement learning policy, and distills the result into a causal student policy; the resulting policies track diverse retargeted motions, generalize to unseen human motions, and support downstream applications including video-based teleoperation, behavior-prior control, and text-conditioned motion generation on quadruped, hexapod, and quadruped-manipulator platforms.
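The teacher-to-student step can be illustrated with a DAgger-style distillation loop. The sketch below is a hypothetical 1-D toy, not X-Morph's training code: the teacher sees a hidden actuator bias (standing in for privileged simulator state), while the causal student sees only its current tracking error and a bias estimate recovered from its own last velocity and action. The student is rolled out, every state it visits is labeled with the teacher's action, the data are aggregated, and a linear student is refit by least squares. All names (`teacher_action`, `dagger`, `DT`) are illustrative assumptions.

```python
import random

DT = 0.05  # control period of the toy system (hypothetical)


def teacher_action(err, bias):
    # Privileged teacher: sees the hidden actuator bias and cancels it.
    return -2.0 * err - bias


def step(q, action, bias):
    # Toy first-order joint dynamics with a hidden constant bias.
    return q + DT * (action + bias)


def student_features(q, q_prev, q_ref, a_prev):
    # Causal observation: tracking error, plus a bias estimate recovered
    # from the last observed velocity minus the last commanded action.
    vel = (q - q_prev) / DT
    return (q - q_ref, vel - a_prev)


def fit_least_squares(xs, ys):
    # Solve the 2x2 normal equations for y ~ w0 * x0 + w1 * x1.
    s00 = sum(x[0] * x[0] for x in xs)
    s01 = sum(x[0] * x[1] for x in xs)
    s11 = sum(x[1] * x[1] for x in xs)
    b0 = sum(x[0] * y for x, y in zip(xs, ys))
    b1 = sum(x[1] * y for x, y in zip(xs, ys))
    det = s00 * s11 - s01 * s01
    return ((s11 * b0 - s01 * b1) / det, (s00 * b1 - s01 * b0) / det)


def rollout(weights, bias, q_ref, steps=40):
    # One zero-action warm-up step so the first velocity reading is valid.
    q_prev, q, a_prev = 0.0, step(0.0, 0.0, bias), 0.0
    obs, labels = [], []
    for _ in range(steps):
        x = student_features(q, q_prev, q_ref, a_prev)
        obs.append(x)
        labels.append(teacher_action(x[0], bias))  # teacher labels the visited state
        a = weights[0] * x[0] + weights[1] * x[1]  # but the student acts
        q_prev, q, a_prev = q, step(q, a, bias), a
    return obs, labels, abs(q - q_ref)


def dagger(iterations=3, episodes=20, seed=0):
    rng = random.Random(seed)
    weights = (0.0, 0.0)  # untrained student
    data_x, data_y = [], []
    for _ in range(iterations):
        for _ in range(episodes):
            bias, q_ref = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
            xs, ys, _ = rollout(weights, bias, q_ref)
            data_x += xs  # aggregate data across iterations
            data_y += ys
        weights = fit_least_squares(data_x, data_y)
    return weights
```

Because the student's own observation history suffices to recover the hidden bias in this toy, the refit weights converge to the teacher's gains (-2, -1). On a real robot the privileged information is usually only partially observable, so the student matches the teacher approximately rather than exactly.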

What carries the argument

Cross-morphology retargeting stage that produces kinematically plausible, intent-preserving robot motion references from human data for subsequent tracking by privileged RL.
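For contrast, the simplest form of direct retargeting scales a hip-relative human foot position by the ratio of leg lengths and solves inverse kinematics on the robot leg. The sketch below assumes a planar two-link leg (hip pitch, knee) with angles measured from straight down; the function names and the projection of unreachable targets onto the workspace boundary are illustrative choices, not the paper's learned retargeting model.

```python
import math


def retarget_foot(human_foot_xz, human_leg_len, robot_leg_len):
    # Scale a hip-relative human foot position (x forward, z up) by leg length.
    s = robot_leg_len / human_leg_len
    return human_foot_xz[0] * s, human_foot_xz[1] * s


def leg_ik(x, z, l1, l2):
    # Planar 2-link IK (hip pitch h, knee k), angles measured from straight down.
    # Returns (h, k, reachable); unreachable targets are projected onto the
    # workspace boundary so the returned angles are always valid.
    u = -z  # distance below the hip
    r = math.hypot(x, u)
    r_min, r_max = abs(l1 - l2) + 1e-9, l1 + l2 - 1e-9
    reachable = r_min <= r <= r_max
    if r < 1e-12:
        x, u, r = 0.0, r_min, r_min  # degenerate target: point straight down
    r_c = min(max(r, r_min), r_max)
    x, u = x * r_c / r, u * r_c / r
    cos_k = (r_c ** 2 - l1 ** 2 - l2 ** 2) / (2.0 * l1 * l2)
    k = -math.acos(max(-1.0, min(1.0, cos_k)))  # knee bends backward
    h = math.atan2(x, u) - math.atan2(l2 * math.sin(k), l1 + l2 * math.cos(k))
    return h, k, reachable


def leg_fk(h, k, l1, l2):
    # Forward kinematics matching leg_ik: foot (x, z) relative to the hip.
    x = l1 * math.sin(h) + l2 * math.sin(h + k)
    z = -(l1 * math.cos(h) + l2 * math.cos(h + k))
    return x, z
```

With a 0.9 m human leg and a 0.4 m robot leg (two 0.2 m links), a foot 0.1 m ahead of and 0.8 m below the hip maps to a reachable target, while a target such as (0.5, -0.5) is flagged unreachable and clamped. This per-leg mapping only guarantees kinematic validity; it says nothing about balance, contacts, or torques, which is the gap the abstract attributes to direct retargeting.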

If this is right

  • Policies track diverse retargeted motions across three morphologically distinct platforms.
  • Policies generalize to human motions not seen during training.
  • The approach supports video-based teleoperation, behavior-prior control, and text-conditioned motion generation.
  • Large-scale human motion data can serve as a substrate for reusable behavior priors on non-humanoid robots.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Shared human priors could reduce the cost of collecting robot-specific demonstration data for each new morphology.
  • The retargeting-plus-distillation pattern might extend to wheeled or aerial platforms if suitable kinematic mappings are defined.
  • Combining the distilled policies with sim-to-real transfer methods could accelerate physical robot deployment without additional motion capture.

Load-bearing premise

The retargeted motions remain kinematically and dynamically feasible for the target robot to track under its own physics.
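A first, necessary (but not sufficient) screen for this premise is kinematic: every reference frame should respect joint position limits, and finite-difference joint velocities should respect velocity limits. A minimal sketch, assuming references are lists of per-frame joint angles sampled at a fixed `dt`; the `JointLimits` type and all limit values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class JointLimits:
    q_min: float  # rad
    q_max: float  # rad
    qd_max: float  # rad/s


def first_violation(traj, limits, dt):
    # traj: list of frames, each a list of joint angles (one per joint).
    # Returns (frame, joint, reason) for the first violation, or None.
    for t, frame in enumerate(traj):
        for j, (q, lim) in enumerate(zip(frame, limits)):
            if not lim.q_min <= q <= lim.q_max:
                return t, j, "position"
            if t > 0 and abs(q - traj[t - 1][j]) / dt > lim.qd_max:
                return t, j, "velocity"
    return None


def feasible_fraction(refs, limits, dt):
    # Fraction of reference clips that pass the kinematic screen.
    passed = sum(first_violation(r, limits, dt) is None for r in refs)
    return passed / len(refs)
```

Passing this screen says nothing about contact forces or torques; those require a dynamic check on top of it.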

What would settle it

The claim would be undermined if policies trained via the pipeline failed to track the retargeted references, or showed no advantage over direct robot-specific training, when evaluated on held-out human motion sequences.

Figures

Figures reproduced from arXiv: 2606.30290 by Arhaan Jain, Chengyang He, Guillaume Sartoretti, Ritwik Sharma, Shivam Sood, Shyam Charan Kesavamoorthi.

Figure 1. X-Morph framework. Source human/G1 motions are converted into target robot references by a cross-morphology retargeting model and then refined by a physics-aware corrector to reduce contact and ground-interaction artifacts. The resulting clean retargeted motions provide reference data for learning the reference-conditioned tracker and for distilling a causal retargeting model from the offline retargeting…

Figure 2. Video-driven teleoperation across non-humanoid morphologies. X-Morph converts monocular human motion into executable robot references for multiple target platforms. (a) Forward walking is transferred to both a quadruped and a hexapod. (b) Human body rotation induces turning behaviors on both robots. (c) A squat motion is retargeted to a quadruped while preserving the high-level lowering intent. (d) Large a…

Figure 3. Text-conditioned skill execution. A language command is converted into a human motion through a text-conditioned human-motion model or retrieval system. X-Morph retargets the resulting G1 motion to the target morphology and executes it with the same deployed tracker.

Figure 4. Downstream door-opening case study. Left: X-Morph retargets a Kimodo-generated…

Figure 5. Retargeting specifications for locomotion and loco-manipulation. Colored skeleton seg…

Figure 6. Offline retargeting teacher architecture. Solid arrows denote motion and latent flow, gray…

Figure 7. Qualitative generalization to unseen manipulation motions. X-Morph transfers source…
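The Figure 1 caption describes a physics-aware corrector that reduces contact and ground-interaction artifacts. The corrector itself is not described in enough detail here to reproduce; as a stand-in, the sketch below shows a common heuristic cleanup for the two most visible artifacts, ground penetration and foot skating. It assumes a foot trajectory of (x, y, z) positions with the ground at z = 0 and a hypothetical `contact_height` threshold for detecting stance.

```python
def clean_foot_track(track, contact_height=0.02):
    # track: list of (x, y, z) foot positions over time (z up, ground at z = 0).
    cleaned = []
    anchor = None  # touchdown (x, y) for the current stance phase
    for x, y, z in track:
        z = max(z, 0.0)  # remove ground penetration
        if z <= contact_height:
            if anchor is None:
                anchor = (x, y)  # touchdown: remember where the foot landed
            x, y = anchor  # pin the stance foot to remove skating
            z = 0.0  # snap the stance foot onto the ground
        else:
            anchor = None  # swing phase: release the anchor
        cleaned.append((x, y, z))
    return cleaned
```

A heuristic like this fixes only the foot trajectory; a physics-aware corrector would additionally have to keep the rest of the body consistent with the pinned contacts.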
Original abstract

Recent progress in humanoid behavior models has been driven in large part by abundant human motion data, but comparable motion data is scarce for non-humanoid legged robots such as quadrupeds, hexapods, and quadruped manipulators. A promising alternative is to repurpose human motion across embodiments; however, direct retargeting often produces motions that are visually plausible yet physically inconsistent or difficult to track under robot dynamics. We present X-Morph, a human-motion-to-robot-behavior pipeline that converts human motion into deployable locomotion and loco-manipulation policies for diverse non-humanoid legged morphologies. A cross-morphology retargeting stage converts human motions into kinematically plausible, intent-preserving robot references, which are then tracked by a privileged RL policy and distilled into a causal student policy. We evaluate X-Morph on three morphologically distinct platforms: a quadruped, a hexapod, and a quadruped equipped with a manipulator. The resulting policies track diverse retargeted motions, generalize to unseen human motions, and support downstream use cases including video-based teleoperation, behavior-prior control, and text-conditioned motion generation. These results suggest that large-scale human motion can serve as a substrate for learning broad, reusable behavior priors beyond humanoid robots. Project page: https://maker-rat.github.io/morph/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper presents X-Morph, a pipeline that repurposes abundant human motion data for non-humanoid legged robots via a cross-morphology retargeting stage that produces kinematically plausible, intent-preserving robot references. These references are tracked by a privileged RL policy and distilled into a causal student policy. The method is evaluated on three platforms (quadruped, hexapod, quadruped-with-manipulator), with claims that the resulting policies track diverse retargeted motions, generalize to unseen human motions, and enable downstream tasks including video-based teleoperation, behavior-prior control, and text-conditioned motion generation.

Significance. If the core pipeline is validated with quantitative evidence of dynamic feasibility and generalization, the work would be significant for enabling scalable behavior learning across morphologies where robot-specific motion data is scarce. The explicit evaluation across three distinct platforms and the demonstration of multiple downstream use cases provide concrete evidence of breadth that strengthens the contribution relative to single-morphology retargeting approaches.

major comments (1)
  1. [Abstract] Abstract (and pipeline description): The central claim that retargeted references are tracked by the privileged RL policy and yield generalizable student policies rests on the assumption that the cross-morphology retargeting stage produces dynamically feasible trajectories. Kinematic plausibility and intent preservation do not imply satisfaction of the target robot's equations of motion, friction cones, or torque limits; no explicit feasibility check, failure-mode analysis, or fraction of invalid references is reported.
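The referee's distinction can be made concrete per frame: a reference is dynamically plausible only if its ground reaction force lies inside the friction cone and the joint torques needed to produce it stay within actuator limits. The sketch below assumes a planar two-link leg and a quasi-static model (τ = Jᵀ F, with F the force the foot exerts on the ground, ignoring inertial terms) and uses one torque limit for both joints. In practice the ground reaction forces would come from an inverse-dynamics or contact solve that this sketch does not implement; all names and numbers are hypothetical.

```python
import math


def static_torques(h, k, l1, l2, foot_force):
    # Quasi-static joint torques tau = J^T F for a planar 2-link leg, where F is
    # the force the foot exerts on the ground and the foot position is
    # x = l1 sin h + l2 sin(h + k), z = -(l1 cos h + l2 cos(h + k)).
    fx, fz = foot_force
    dx_dh = l1 * math.cos(h) + l2 * math.cos(h + k)
    dz_dh = l1 * math.sin(h) + l2 * math.sin(h + k)
    dx_dk = l2 * math.cos(h + k)
    dz_dk = l2 * math.sin(h + k)
    return dx_dh * fx + dz_dh * fz, dx_dk * fx + dz_dk * fz


def frame_feasible(h, k, grf, l1, l2, mu, tau_max):
    # grf: ground reaction force on the foot (fx, fz), z up.
    fx, fz = grf
    if fz < 0.0 or abs(fx) > mu * fz:
        return False, "friction cone"  # foot would pull on, or slip along, the ground
    # The foot pushes on the ground with -grf; the motors must supply these torques.
    tau = static_torques(h, k, l1, l2, (-fx, -fz))
    if any(abs(t) > tau_max for t in tau):
        return False, "torque limit"
    return True, "ok"
```

Counting the frames (or clips) that fail either check would give the "fraction of invalid references" the referee asks for, at least under this simplified model.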

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for this constructive comment on dynamic feasibility. We address it directly below.

Point-by-point responses
  1. Referee: [Abstract] Abstract (and pipeline description): The central claim that retargeted references are tracked by the privileged RL policy and yield generalizable student policies rests on the assumption that the cross-morphology retargeting stage produces dynamically feasible trajectories. Kinematic plausibility and intent preservation do not imply satisfaction of the target robot's equations of motion, friction cones, or torque limits; no explicit feasibility check, failure-mode analysis, or fraction of invalid references is reported.

    Authors: We agree that kinematic plausibility does not guarantee dynamic feasibility under the robot's equations of motion, friction cones, or torque limits. In our pipeline the privileged RL policy is trained to track the retargeted references using the full robot dynamics (including contact forces and actuator limits), and successful low-error tracking on the training set provides implicit evidence that the references are feasible for the motions retained. However, we did not report an explicit success rate (fraction of retargeted references trackable within a defined error threshold) or a failure-mode analysis of the retargeting stage. We will add this quantitative analysis and any failure examples to the revised manuscript. revision: yes
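The success rate promised in the rebuttal could be computed roughly as follows. This is a sketch under assumptions: tracking error is taken as the RMS joint-position error between reference and simulated rollout, a rollout that terminates early (shorter than its reference) counts as a failure, and `threshold` is a free choice, since the paper's actual metric and threshold are not stated here.

```python
import math


def clip_tracking_error(ref, sim):
    # RMS joint-position error over all frames and joints that were simulated.
    sq = [(r - s) ** 2 for rf, sf in zip(ref, sim) for r, s in zip(rf, sf)]
    return math.sqrt(sum(sq) / len(sq))


def success_rate(pairs, threshold):
    # pairs: list of (reference clip, simulated rollout); clips are lists of
    # per-frame joint-angle lists. Returns (rate, indices of failed clips).
    failures = []
    for i, (ref, sim) in enumerate(pairs):
        terminated_early = len(sim) < len(ref)  # e.g. the robot fell
        if terminated_early or clip_tracking_error(ref, sim) > threshold:
            failures.append(i)
    return 1.0 - len(failures) / len(pairs), failures
```

Returning the failed indices alongside the rate makes it straightforward to pull out the failure examples the authors also commit to reporting.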

Circularity Check

0 steps flagged

No significant circularity; standard retarget-RL-distill pipeline with independent empirical claims.

Full rationale

The described pipeline converts human motion via cross-morphology retargeting into references that are then tracked by a privileged RL policy and distilled to a student policy. No equations, fitted parameters, or central claims reduce by construction to inputs defined within the paper or via load-bearing self-citations. The abstract and provided text present the method as a sequence of distinct stages whose success is evaluated empirically on multiple platforms, without renaming known results or smuggling ansatzes through prior author work. This is the common case of a self-contained engineering pipeline.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no equations, no explicit parameters, and no stated assumptions beyond the high-level pipeline description; therefore no free parameters, axioms, or invented entities can be identified.

pith-pipeline@v0.9.1-grok · 5787 in / 1192 out tokens · 27387 ms · 2026-06-30T05:17:17.689064+00:00 · methodology
