pith. machine review for the scientific record.

arxiv: 2602.15827 · v2 · submitted 2026-02-17 · 💻 cs.RO · cs.AI · cs.LG · cs.SY · eess.SY

Recognition: no theorem link

Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching


Pith reviewed 2026-05-15 21:25 UTC · model grok-4.3

classification 💻 cs.RO · cs.AI · cs.LG · cs.SY · eess.SY
keywords humanoid parkour · motion matching · reinforcement learning · depth perception · skill composition · locomotion · vision-based control

The pith

A humanoid robot chains retargeted human parkour skills into one depth-driven policy that autonomously chooses whether to step over, climb onto, vault over, or roll off each obstacle.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates a method for humanoid robots to perform extended sequences of agile actions such as stepping over, climbing onto, vaulting over, or rolling off obstacles of different sizes and shapes. It first uses motion matching to link short human motion clips into longer, smooth trajectories. Expert reinforcement learning policies are trained to track these trajectories and are then distilled, via a combination of DAgger and RL, into a single student policy that receives only onboard depth images and a discrete 2D velocity command. Because perception and skill composition are integrated in one policy, the robot decides its next action from what it sees, without extra sensors or per-obstacle tuning. Real-world tests on a Unitree G1 show the robot climbing obstacles up to 1.25 m tall and adapting when obstacles are moved during traversal.
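The distillation step pairs DAgger [32], in which the teacher relabels the states the student itself visits, with an RL objective (Figure 2 calls it "a hybrid DAgger and RL objective"). The page does not give PHP's exact loss, so the sketch below assumes a simple additive form, L = L_RL + λ · MSE(student actions, teacher actions). The names `relabel`, `imitation_mse`, `hybrid_loss`, `bc_weight` (λ), and the toy teacher are hypothetical, and states here are plain vectors rather than depth images or privileged observations.

```python
from typing import Callable, List, Sequence

Action = List[float]


def relabel(states: Sequence[Sequence[float]],
            teacher: Callable[[Sequence[float]], Action]) -> List[Action]:
    """DAgger relabelling: ask the teacher what it would do in states the student visited."""
    return [teacher(s) for s in states]


def imitation_mse(student: Sequence[Action], teacher: Sequence[Action]) -> float:
    """Mean over samples of the per-dimension squared action error."""
    per_sample = [
        sum((a - b) ** 2 for a, b in zip(sa, ta)) / len(sa)
        for sa, ta in zip(student, teacher)
    ]
    if not per_sample:
        raise ValueError("need at least one (student, teacher) action pair")
    return sum(per_sample) / len(per_sample)


def hybrid_loss(rl_loss: float, student: Sequence[Action],
                teacher: Sequence[Action], bc_weight: float) -> float:
    """Assumed hybrid objective: RL surrogate loss plus a weighted DAgger imitation term."""
    return rl_loss + bc_weight * imitation_mse(student, teacher)


# Toy usage with a hypothetical teacher that maps each state to its negation.
visited_states = [[1.0, 0.0], [0.0, 2.0]]      # states reached by the *student's* rollout
student_actions = [[-1.0, 0.0], [0.0, -1.0]]   # what the student did in those states
teacher_actions = relabel(visited_states, lambda s: [-x for x in s])
loss = hybrid_loss(rl_loss=0.1, student=student_actions,
                   teacher=teacher_actions, bc_weight=1.0)
```

In a real pipeline the student would observe depth while the teacher observes privileged state, and `rl_loss` would come from a policy-gradient method; the appendix excerpt mentions PPO for the student stage.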

Core claim

Retargeted human kinematic trajectories are composed into long-horizon motions through nearest-neighbor search in feature space, tracked by expert RL policies, and distilled into a unified depth-based multi-skill policy. With only onboard depth sensing and discrete 2D velocity commands, the policy selects and executes context-appropriate skills such as stepping over, climbing onto, vaulting, or rolling off obstacles of varying geometries and heights, enabling autonomous long-horizon parkour that adapts to real-time perturbations.

What carries the argument

Motion matching formulated as nearest-neighbor search in feature space, which composes atomic human skills into continuous kinematic trajectories for subsequent RL tracking and distillation into a single depth-based policy.
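Motion matching in the sense of [5, 9, 14] reduces to a nearest-neighbor query over per-frame feature vectors; in PHP this search is used to synthesize reference trajectories for the experts, not inside the deployed student. A minimal sketch, assuming z-score-normalized features and a brute-force weighted squared-Euclidean cost; the feature contents, weights, and function names are hypothetical, and practical systems usually replace the linear scan with an acceleration structure such as a KD-tree.

```python
import math
from typing import List, Optional, Sequence, Tuple

Feature = List[float]


def normalize_database(db: Sequence[Feature]) -> Tuple[List[Feature], Feature, Feature]:
    """Z-score each feature dimension so positions, velocities, etc. are comparable."""
    n, dims = len(db), len(db[0])
    means = [sum(f[d] for f in db) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((f[d] - means[d]) ** 2 for f in db) / n
        stds.append(math.sqrt(var) or 1.0)  # avoid dividing by zero for constant dims
    normed = [[(f[d] - means[d]) / stds[d] for d in range(dims)] for f in db]
    return normed, means, stds


def nearest_frame(query: Feature, normed_db: Sequence[Feature], means: Feature,
                  stds: Feature, weights: Optional[Feature] = None) -> int:
    """Brute-force nearest neighbor under a weighted squared Euclidean cost."""
    q = [(x - m) / s for x, m, s in zip(query, means, stds)]
    w = weights if weights is not None else [1.0] * len(q)
    best_idx, best_cost = -1, math.inf
    for i, f in enumerate(normed_db):
        cost = sum(wd * (qd - fd) ** 2 for wd, qd, fd in zip(w, q, f))
        if cost < best_cost:
            best_idx, best_cost = i, cost
    return best_idx


# Toy database of 2-D features (e.g. a foot position and a future root offset).
db = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [1.0, 2.0]]
normed, means, stds = normalize_database(db)
best = nearest_frame([0.9, 1.8], normed, means, stds)  # -> 3
```

Motion-matching systems commonly re-run this search every few frames and jump only when the best match differs from continuing the current clip; the appendix excerpt notes that PHP smooths such jumps with inertialization [4].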

Load-bearing premise

Retargeted human motion data can be tracked by RL policies and successfully distilled into one depth-only policy that handles real-world changes in obstacle geometry without further tuning or sensing.

What would settle it

Deploy the policy on an obstacle whose height or shape lies outside the range used in training and check whether the robot still selects and completes an appropriate skill without falling or requiring manual retuning.

Figures

Figures reproduced from arXiv: 2602.15827 by Angjoo Kanazawa, Carmelo Sferrazza, C. Karen Liu, Guanya Shi, Lujie Yang, Pieter Abbeel, Rocky Duan, Xiaoyu Huang, Xi Chen, Yuanhang Zhang, Zhen Wu.

Figure 1: Perceptive Humanoid Parkour (PHP) enables a Unitree G1 humanoid robot to execute highly dynamic, long-horizon parkour behaviors using onboard perception. By composing various agile human skills via motion matching and a teacher-student training pipeline, we train a single multi-skill visuomotor policy capable of complex contact-rich maneuvers including (a) cat-vaulting over a short obstacle followed by das…
Figure 2: Perceptive Humanoid Parkour overview. Atomic parkour skills are composed into long-horizon kinematic reference trajectories via motion matching. Single-skill teacher policies are trained with privileged information using RL-based motion tracking. Multiple teachers are distilled into a single depth-based student policy using a hybrid DAgger and RL objective. This scalable recipe enables zero-shot sim-to-rea…
Figure 3: Diverse variations of composed parkour skills synthesized via motion matching. (a) Different approach distances trigger varying stride phases and entry poses. (b) Diverse locomotion speeds, directions, and durations. (c) Randomized terrain poses and shapes.
Dataset construction. We synthesize long-horizon reference trajectories by rolling out the motion-matching composition procedure as follows. As visu…
Figure 4: Side-by-side comparison of high-climb agility. The…
Figure 5: Hardware results demonstrating agile, long-horizon parkour behaviors, including (a) a cat vault, (b) a drop landing from…
Original abstract

While recent advances in humanoid locomotion have achieved stable walking on varied terrains, capturing the agility and adaptivity of highly dynamic human motions remains an open challenge. In particular, agile parkour in complex environments demands not only low-level robustness, but also human-like motion expressiveness, long-horizon skill composition, and perception-driven decision-making. In this paper, we present Perceptive Humanoid Parkour (PHP), a modular framework that enables humanoid robots to autonomously perform long-horizon, vision-based parkour across challenging obstacle courses. Our approach first leverages motion matching, formulated as nearest-neighbor search in a feature space, to compose retargeted atomic human skills into long-horizon kinematic trajectories. This framework enables the flexible composition and smooth transition of complex skill chains while preserving the elegance and fluidity of dynamic human motions. Next, we train motion-tracking reinforcement learning (RL) expert policies for these composed motions, and distill them into a single depth-based, multi-skill student policy, using a combination of DAgger and RL. Crucially, the combination of perception and skill composition enables autonomous, context-aware decision-making: using only onboard depth sensing and a discrete 2D velocity command, the robot selects and executes whether to step over, climb onto, vault or roll off obstacles of varying geometries and heights. We validate our framework with extensive real-world experiments on a Unitree G1 humanoid robot, demonstrating highly dynamic parkour skills such as climbing tall obstacles up to 1.25m (96% robot height), as well as long-horizon multi-obstacle traversal with closed-loop adaptation to real-time obstacle perturbations.
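A 2D velocity command also drives motion matching when the reference trajectories are synthesized: the appendix excerpt on query feature construction (entry [54] in the reference graph below) builds the query from pose features of q_t (local foot state, root velocity) plus a short-horizon future root trajectory computed from the command, but the extraction cuts off before saying how. The sketch below assumes constant-velocity extrapolation rotated into the robot's heading frame; `future_root_offsets`, `build_query`, the sample horizons, and the feature sizes are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def future_root_offsets(cmd_vx: float, cmd_vy: float, root_yaw: float,
                        horizons: Sequence[float]) -> List[Tuple[float, float]]:
    """Predict future root offsets in the robot's heading frame from a 2D velocity command.

    Assumes the command is a world-frame planar velocity held constant over the horizon;
    pass root_yaw=0.0 if the command is already expressed in the heading frame.
    """
    c, s = math.cos(root_yaw), math.sin(root_yaw)
    # Rotate the world-frame velocity into the heading frame (rotation by -yaw).
    local_vx = c * cmd_vx + s * cmd_vy
    local_vy = -s * cmd_vx + c * cmd_vy
    return [(local_vx * t, local_vy * t) for t in horizons]


def build_query(foot_state: Sequence[float], root_velocity: Sequence[float],
                trajectory: Sequence[Tuple[float, float]]) -> List[float]:
    """Concatenate pose features (feet, root velocity) with the future trajectory."""
    query = list(foot_state) + list(root_velocity)
    for dx, dy in trajectory:
        query.extend((dx, dy))
    return query


# Robot facing world +y, commanded to move along world +x at 1 m/s:
# in the heading frame that is motion to the robot's right (negative local y).
traj = future_root_offsets(1.0, 0.0, math.pi / 2, horizons=(0.5, 1.0))
query = build_query(foot_state=[0.1, 0.2, -0.1, 0.2], root_velocity=[0.0, 0.0],
                    trajectory=traj)
```

Expressing the trajectory in the heading frame keeps the query invariant to where the robot stands and which way it faces, which is what lets one clip match many placements.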

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents Perceptive Humanoid Parkour (PHP), a modular framework for humanoid robots to perform long-horizon, vision-based parkour. It uses motion matching via nearest-neighbor search in feature space to compose retargeted human skills into kinematic trajectories, trains RL expert policies to track them, and distills the experts into a single depth-image-conditioned student policy via DAgger+RL. The student policy, given only onboard depth sensing and a discrete 2D velocity command, autonomously selects and executes skills such as step-over, climb, vault, or roll-off for varying obstacle geometries. The framework is validated through real-world experiments on a Unitree G1 humanoid, including climbs up to 1.25 m and closed-loop adaptation to perturbations.

Significance. If the central claims hold, the work would represent a meaningful advance in agile humanoid locomotion by demonstrating perception-driven composition of dynamic human skills without hand-crafted controllers or additional sensing. The combination of motion-matching composition with policy distillation offers a scalable path toward long-horizon behaviors, and successful real-world transfer on a commercial platform would strengthen evidence for sim-to-real methods in high-dynamics settings.

major comments (2)
  1. [Abstract] The claim of 'extensive real-world experiments' demonstrating 'highly dynamic parkour skills' and 'closed-loop adaptation to real-time obstacle perturbations' is not accompanied by any quantitative metrics, success rates, failure modes, baselines, or ablation studies. This leaves the central claim of reliable autonomous skill selection under depth-only sensing only partially supported.
  2. [Method (distillation subsection)] The distillation step (DAgger+RL from multiple RL experts into a single depth-based student) is load-bearing for the autonomous decision-making claim, yet the manuscript provides no analysis of whether depth observations alone suffice to recover both geometry classification and precise timing for transitions without mode collapse or loss of dynamic fidelity on perturbed geometries.
minor comments (2)
  1. [Method] Notation for the motion-matching feature space and nearest-neighbor distance metric should be defined explicitly with equations rather than described in prose.
  2. [Experiments] The abstract mentions '96% robot height' for the 1.25 m climb; the corresponding robot height and exact obstacle dimensions should be stated consistently in the experiments section.
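The two figures quoted in the abstract already imply the robot height this comment asks to see stated. As a quick consistency check, using only the quoted numbers:

```latex
h_{\text{robot}} = \frac{1.25\ \text{m}}{0.96} \approx 1.30\ \text{m},
\qquad
0.955 \le \frac{1.25\ \text{m}}{h} < 0.965
\;\Longleftrightarrow\;
\frac{1.25}{0.965}\ \text{m} < h \le \frac{1.25}{0.955}\ \text{m}
\;\approx\; 1.295\ \text{m} < h \le 1.309\ \text{m}.
```

So "96%" is consistent only with a stated robot height of roughly 1.30 m; any height outside about 1.295 to 1.309 m would be the inconsistency the comment flags.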

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below, clarifying the experimental support in the manuscript while committing to revisions that strengthen the presentation of quantitative evidence and analysis.

Point-by-point responses
  1. Referee: [Abstract] The claim of 'extensive real-world experiments' demonstrating 'highly dynamic parkour skills' and 'closed-loop adaptation to real-time obstacle perturbations' is not accompanied by any quantitative metrics, success rates, failure modes, baselines, or ablation studies. This leaves the central claim of reliable autonomous skill selection under depth-only sensing only partially supported.

    Authors: We agree that the abstract would be strengthened by explicit quantitative metrics. The manuscript body (Section 5 and supplementary material) presents results from extensive real-world trials on the Unitree G1, including success across multiple obstacle types up to 1.25 m, long-horizon traversals, and adaptation to perturbations, with comparisons to non-perceptive baselines. We will revise the abstract to incorporate key metrics such as overall success rates, adaptation performance, and references to the ablation studies and failure mode analysis already present in the main text. revision: yes

  2. Referee: [Method (distillation subsection)] The distillation step (DAgger+RL from multiple RL experts into a single depth-based student) is load-bearing for the autonomous decision-making claim, yet the manuscript provides no analysis of whether depth observations alone suffice to recover both geometry classification and precise timing for transitions without mode collapse or loss of dynamic fidelity on perturbed geometries.

    Authors: The distillation subsection describes the DAgger+RL procedure that produces a single depth-conditioned student policy from the expert set. Real-world experiments demonstrate that this policy achieves autonomous skill selection and closed-loop adaptation to perturbations on varied geometries without evident mode collapse or loss of dynamic fidelity. We acknowledge that an explicit analysis of depth sufficiency for geometry classification and transition timing would provide additional rigor. We will add a targeted discussion in the revised method section examining the policy's observed behavior under depth-only input, drawing on the experimental outcomes. revision: yes

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

Full rationale

The paper describes a modular pipeline that first applies motion matching (nearest-neighbor search in feature space) to compose retargeted human skills into kinematic trajectories, then trains separate RL expert policies to track those trajectories, and finally distills the experts into one depth-conditioned student policy via DAgger plus RL. None of these steps reduce by construction to quantities defined inside the paper: motion matching is a standard external technique, the RL tracking objective is independent of the final student policy, and the distillation step is a conventional teacher-student transfer (DAgger combined with RL) whose success is measured by external real-world experiments on the Unitree G1 rather than by internal re-use of fitted parameters. No self-citations are invoked to establish uniqueness or to smuggle in ansatzes, and the central claim of perception-driven skill selection rests on the empirical behavior of the trained policy rather than on any definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on standard robotics assumptions about motion retargeting and policy transfer; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption Retargeted human motions preserve sufficient dynamic properties for stable robot execution
    Invoked when composing retargeted atomic human skills into kinematic trajectories for RL tracking.

pith-pipeline@v0.9.0 · 5648 in / 1320 out tokens · 26219 ms · 2026-05-15T21:25:42.825073+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Learning Versatile Humanoid Manipulation with Touch Dreaming

    cs.RO 2026-04 conditional novelty 5.0

    HTD, a multimodal transformer policy trained with behavioral cloning and touch dreaming to predict future tactile latents, achieves a 90.9% relative success rate improvement over baselines on five real-world contact-r...

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  1. [1]

    Legged locomotion in challenging terrains using egocentric vision

    Ananye Agarwal, Ashish Kumar, Jitendra Malik, and Deepak Pathak. Legged locomotion in challenging terrains using egocentric vision. In Conference on Robot Learning, pages 403–415. PMLR, 2023

  2. [2]

    Gallant: Voxel grid-based humanoid locomotion and local-navigation across 3d constrained terrains, 2025

    Qingwei Ben, Botian Xu, Kailin Li, Feiyu Jia, Wentao Zhang, Jingping Wang, Jingbo Wang, Dahua Lin, and Jiangmiao Pang. Gallant: Voxel grid-based humanoid locomotion and local-navigation across 3d constrained terrains, 2025. URL https://arxiv.org/abs/2511.14625

  3. [3]

    Drecon: data-driven responsive control of physics-based characters. ACM Transactions On Graphics (TOG), 38(6):1–11, 2019

    Kevin Bergamin, Simon Clavet, Daniel Holden, and James Richard Forbes. Drecon: data-driven responsive control of physics-based characters. ACM Transactions On Graphics (TOG), 38(6):1–11, 2019

  4. [4]

    Inertialization: High-performance animation transitions in Gears of War

    David Bollo. Inertialization: High-performance animation transitions in Gears of War. Proc. of GDC, 2018

  5. [5]

    Motion matching - the road to next gen animation

    Michael Büttner and Simon Clavet. Motion matching - the road to next gen animation. Proc. of Nucl.ai, 2015

  6. [6]

    Barkour: Benchmarking animal-level agility with quadruped robots. arXiv preprint arXiv:2305.14654, 2023

    Ken Caluwaerts, Atil Iscen, J Chase Kew, Wenhao Yu, Tingnan Zhang, Daniel Freeman, Kuang-Huei Lee, Lisa Lee, Stefano Saliceti, Vincent Zhuang, et al. Barkour: Benchmarking animal-level agility with quadruped robots. arXiv preprint arXiv:2305.14654, 2023

  7. [7]

    Gmt: General motion tracking for humanoid whole-body control. arXiv preprint arXiv:2506.14770, 2025

    Zixuan Chen, Mazeyu Ji, Xuxin Cheng, Xuanbin Peng, Xue Bin Peng, and Xiaolong Wang. Gmt: General motion tracking for humanoid whole-body control. arXiv preprint arXiv:2506.14770, 2025

  8. [8]

    Extreme parkour with legged robots

    Xuxin Cheng, Kexin Shi, Ananye Agarwal, and Deepak Pathak. Extreme parkour with legged robots. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 11443–11450. IEEE, 2024

  9. [9]

    Motion matching and the road to next-gen animation

    Simon Clavet. Motion matching and the road to next-gen animation. Proc. of GDC, 2016

  10. [10]

    Humanplus: Humanoid shadowing and imitation from humans. arXiv preprint arXiv:2406.10454, 2024

    Zipeng Fu, Qingqing Zhao, Qi Wu, Gordon Wetzstein, and Chelsea Finn. Humanplus: Humanoid shadowing and imitation from humans. arXiv preprint arXiv:2406.10454, 2024

  11. [11]

    Control operators for interactive character animation

    Ruiyu Gou, Michiel van de Panne, and Daniel Holden. Control operators for interactive character animation. ACM Transactions on Graphics (TOG), 2025

  12. [12]

    Attention-based map encoding for learning generalized legged locomotion. Science Robotics, 10(105):eadv3604, 2025

    Junzhe He, Chong Zhang, Fabian Jenelten, Ruben Grandia, Moritz Bächer, and Marco Hutter. Attention-based map encoding for learning generalized legged locomotion. Science Robotics, 10(105):eadv3604, 2025

  13. [13]

    Anymal parkour: Learning agile navigation for quadrupedal robots. Science Robotics, 9(88):eadi7566, 2024

    David Hoeller, Nikita Rudin, Dhionis Sako, and Marco Hutter. Anymal parkour: Learning agile navigation for quadrupedal robots. Science Robotics, 9(88):eadi7566, 2024

  14. [14]

    Learned motion matching. ACM Transactions on Graphics (TOG), 2020

    Daniel Holden, Anas Kanoun, Michiel Büttner, Sofien Bouaziz, Sebastian Thrun, and Aaron Hertzmann. Learned motion matching. ACM Transactions on Graphics (TOG), 2020

  15. [15]

    Huang, et al., Diffuse-CLoC: Guided Diffusion for Physics-based Character Look-ahead Control (2025), https://arxiv.org/abs/2503.11801

    Xiaoyu Huang, Takara Truong, Yunbo Zhang, Fangzhou Yu, Jean Pierre Sleiman, Jessica Hodgins, Koushil Sreenath, and Farbod Farshidian. Diffuse-cloc: Guided diffusion for physics-based character look-ahead control. arXiv preprint arXiv:2503.11801, 2025

  16. [16]

    Dreamcontrol: Human-inspired whole-body humanoid control for scene interaction via guided diffusion. arXiv preprint arXiv:2509.14353, 2025

    Dvij Kalaria, Sudarshan S Harithas, Pushkal Katara, Sangkyung Kwak, Sarthak Bhagat, Shankar Sastry, Srinath Sridhar, Sai Vemprala, Ashish Kapoor, and Jonathan Chung-Kuan Huang. Dreamcontrol: Human-inspired whole-body humanoid control for scene interaction via guided diffusion. arXiv preprint arXiv:2509.14353, 2025

  17. [17]

    Animal gaits on quadrupedal robots using motion matching and model-based control

    Dongho Kang, Simon Zimmermann, and Stelian Coros. Animal gaits on quadrupedal robots using motion matching and model-based control. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 8500–8507. IEEE, 2021

  18. [18]

    Rma: Rapid motor adaptation for legged robots

    Ashish Kumar, Zipeng Fu, Deepak Pathak, and Jitendra Malik. Rma: Rapid motor adaptation for legged robots. arXiv preprint arXiv:2107.04034, 2021

  19. [19]

    Learning quadrupedal locomotion over challenging terrain. Science Robotics, 5(47):eabc5986, 2020

    Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, and Marco Hutter. Learning quadrupedal locomotion over challenging terrain. Science Robotics, 5(47):eabc5986, 2020

  20. [20]

    BeyondMimic: From Motion Tracking to Versatile Humanoid Control via Guided Diffusion

    Qiayuan Liao, Takara E Truong, Xiaoyu Huang, Yuman Gao, Guy Tevet, Koushil Sreenath, and C Karen Liu. Beyondmimic: From motion tracking to versatile humanoid control via guided diffusion. arXiv preprint arXiv:2508.08241, 2025

  21. [21]

    Hybrid internal model: Learning agile legged locomotion with simulated robot response. arXiv preprint arXiv:2312.11460, 2023

    Junfeng Long, Zirui Wang, Quanyi Li, Jiawei Gao, Liu Cao, and Jiangmiao Pang. Hybrid internal model: Learning agile legged locomotion with simulated robot response. arXiv preprint arXiv:2312.11460, 2023

  22. [22]

    Learning humanoid locomotion with perceptive internal model

    Junfeng Long, Junli Ren, Moji Shi, Zirui Wang, Tao Huang, Ping Luo, and Jiangmiao Pang. Learning humanoid locomotion with perceptive internal model. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 9997–10003. IEEE, 2025

  23. [23]

    Pie: Parkour with implicit-explicit learning framework for legged robots. IEEE Robotics and Automation Letters, 2024

    Shixin Luo, Songbo Li, Ruiqi Yu, Zhicheng Wang, Jun Wu, and Qiuguo Zhu. Pie: Parkour with implicit-explicit learning framework for legged robots. IEEE Robotics and Automation Letters, 2024

  24. [24]

    Sonic: Supersizing motion tracking for natural humanoid whole-body control. arXiv preprint arXiv:2511.07820, 2025

    Zhengyi Luo, Ye Yuan, Tingwu Wang, Chenran Li, Sirui Chen, Fernando Castañeda, Zi-Ang Cao, Jiefeng Li, David Minor, Qingwei Ben, et al. Sonic: Supersizing motion tracking for natural humanoid whole-body control. arXiv preprint arXiv:2511.07820, 2025

  25. [25]

    Warp: A high-performance python framework for gpu simulation and graphics

    Miles Macklin. Warp: A high-performance python framework for gpu simulation and graphics. https://github.com/nvidia/warp, March 2022. NVIDIA GPU Technology Conference (GTC)

  26. [26]

    Learning robust perceptive locomotion for quadrupedal robots in the wild. Science Robotics, 7(62):eabk2822, 2022

    Takahiro Miki, Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, and Marco Hutter. Learning robust perceptive locomotion for quadrupedal robots in the wild. Science Robotics, 7(62):eabk2822, 2022

  27. [27]

    Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning

    Mayank Mittal, Pascal Roth, James Tigue, Antoine Richard, Octi Zhang, Peter Du, Antonio Serrano-Muñoz, Xinjie Yao, René Zurbrügg, Nikita Rudin, et al. Isaac lab: A gpu-accelerated simulation framework for multi-modal robot learning. arXiv preprint arXiv:2511.04831, 2025

  28. [28]

    Dreamwaq: Learning robust quadrupedal locomotion with implicit terrain imagination via deep reinforcement learning. arXiv preprint arXiv:2301.10602, 2023

    I Nahrendra, Byeongho Yu, and Hyun Myung. Dreamwaq: Learning robust quadrupedal locomotion with implicit terrain imagination via deep reinforcement learning. arXiv preprint arXiv:2301.10602, 2023

  29. [29]

    Agility meets stability: Versatile humanoid control with heterogeneous data. arXiv preprint arXiv:2511.17373, 2025

    Yixuan Pan, Ruoyi Qiao, Li Chen, Kashyap Chitta, Liang Pan, Haoguang Mai, Qingwen Bu, Hao Zhao, Cunyuan Zheng, Ping Luo, et al. Agility meets stability: Versatile humanoid control with heterogeneous data. arXiv preprint arXiv:2511.17373, 2025

  30. [30]

    Deepmimic: Example-guided deep reinforcement learning of physics-based character skills

    Xue Bin Peng, Pieter Abbeel, Sergey Levine, and Michiel Van de Panne. Deepmimic: Example-guided deep reinforcement learning of physics-based character skills. ACM Transactions On Graphics (TOG), 37(4):1–14, 2018

  31. [31]

    Amp: Adversarial motion priors for stylized physics-based character control. ACM Transactions on Graphics (TOG), 2021

    Xue Bin Peng, Ze Ma, Pieter Abbeel, Sergey Levine, and Angjoo Kanazawa. Amp: Adversarial motion priors for stylized physics-based character control. ACM Transactions on Graphics (TOG), 2021

  32. [32]

    A reduction of imitation learning and structured prediction to no-regret online learning

    Stéphane Ross, Geoffrey Gordon, and Drew Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the fourteenth international conference on artificial intelligence and statistics, pages 627–635. JMLR Workshop and Conference Proceedings, 2011

  33. [33]

    Parkour in the wild: Learning a general and extensible agile locomotion policy using multi-expert distillation and rl fine-tuning. arXiv preprint arXiv:2505.11164, 2025

    Nikita Rudin, Junzhe He, Joshua Aurand, and Marco Hutter. Parkour in the wild: Learning a general and extensible agile locomotion policy using multi-expert distillation and rl fine-tuning. arXiv preprint arXiv:2505.11164, 2025

  34. [34]

    Learn parkour - climb up tutorial

    Salgadopk. Learn parkour - climb up tutorial. URL https://youtu.be/6U1sIgqgPFo?si=339TPTxlFB5lWGB1

  35. [35]

    Dpl: Depth-only perceptive humanoid locomotion via realistic depth synthesis and cross-attention terrain reconstruction

    Jingkai Sun, Gang Han, Pihai Sun, Wen Zhao, Jiahang Cao, Jiaxu Wang, Yijie Guo, and Qiang Zhang. Dpl: Depth-only perceptive humanoid locomotion via realistic depth synthesis and cross-attention terrain reconstruction. arXiv preprint arXiv:2510.07152, 2025

  36. [36]

    Human motion diffusion model

    Guy Tevet, Sigal Raab, Brian Gordon, Yoni Shafir, Daniel Cohen-or, and Amit Haim Bermano. Human motion diffusion model. In ICLR, 2023

  37. [37]

    Beamdojo: Learning agile humanoid locomotion on sparse footholds

    Huayi Wang, Zirui Wang, Junli Ren, Qingwei Ben, Tao Huang, Weinan Zhang, and Jiangmiao Pang. Beamdojo: Learning agile humanoid locomotion on sparse footholds. In Robotics: Science and Systems (RSS), 2025

  38. [38]

    Wang, et al., PhysHSI: Towards a Real-World Generalizable and Natural Humanoid-Scene Interaction System. arXiv preprint arXiv:2510.11072 (2025)

    Huayi Wang, Wentao Zhang, Runyi Yu, Tao Huang, Junli Ren, Feiyu Jia, Zirui Wang, Xiaojie Niu, Xiao Chen, Jiahe Chen, et al. Physhsi: Towards a real-world generalizable and natural humanoid-scene interaction system. arXiv preprint arXiv:2510.11072, 2025

  39. [39]

    Learning robust and agile legged locomotion using adversarial motion priors. IEEE Robotics and Automation Letters, 8(8):4975–4982, 2023

    Jinze Wu, Guiyang Xin, Chenkun Qi, and Yufei Xue. Learning robust and agile legged locomotion using adversarial motion priors. IEEE Robotics and Automation Letters, 8(8):4975–4982, 2023

  40. [40]

    Xie, et al., KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills. arXiv preprint arXiv:2506.12851 (2025)

    Weiji Xie, Jinrui Han, Jiakun Zheng, Huanyu Li, Xinzhe Liu, Jiyuan Shi, Weinan Zhang, Chenjia Bai, and Xuelong Li. Kungfubot: Physics-based humanoid whole-body control for learning highly-dynamic skills. arXiv preprint arXiv:2506.12851, 2025

  41. [41]

    Parc: Physics-based augmentation with reinforcement learning for character controllers

    Michael Xu, Yi Shi, KangKang Yin, and Xue Bin Peng. Parc: Physics-based augmentation with reinforcement learning for character controllers. In ACM SIGGRAPH, 2025

  42. [42]

    Learning to ball: Composing policies for long-horizon basketball moves. ACM Transactions on Graphics (TOG), 44(6):1–14, 2025

    Pei Xu, Zhen Wu, Ruocheng Wang, Vishnu Sarukkai, Kayvon Fatahalian, Ioannis Karamouzas, Victor Zordan, and C Karen Liu. Learning to ball: Composing policies for long-horizon basketball moves. ACM Transactions on Graphics (TOG), 44(6):1–14, 2025

  43. [43]

    Omniretarget: Interaction-preserving data generation for humanoid whole-body loco-manipulation and scene interaction. arXiv preprint arXiv:2509.26633, 2025

    Lujie Yang, Xiaoyu Huang, Zhen Wu, Angjoo Kanazawa, Pieter Abbeel, Carmelo Sferrazza, C Karen Liu, Rocky Duan, and Guanya Shi. Omniretarget: Interaction-preserving data generation for humanoid whole-body loco-manipulation and scene interaction. arXiv preprint arXiv:2509.26633, 2025

  44. [44]

    Neural volumetric memory for visual locomotion control

    Ruihan Yang, Ge Yang, and Xiaolong Wang. Neural volumetric memory for visual locomotion control. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1430–1440, 2023

  45. [45]

    Learning visual parkour from generated images

    Alan Yu, Ge Yang, Ran Choi, Yajvan Ravan, John Leonard, and Phillip Isola. Learning visual parkour from generated images. In 8th Annual Conference on Robot Learning, 2024

  46. [46]

    Walking with terrain reconstruction: Learning to traverse risky sparse footholds

    Ruiqi Yu, Qianshi Wang, Yizhen Wang, Zhicheng Wang, Jun Wu, and Qiuguo Zhu. Walking with terrain reconstruction: Learning to traverse risky sparse footholds. arXiv preprint arXiv:2409.15692, 2024

  47. [47]

    Twist2: Scalable, portable, and holistic humanoid data collection system. arXiv preprint arXiv:2511.02832, 2025

    Yanjie Ze, Siheng Zhao, Weizhuo Wang, Angjoo Kanazawa, Rocky Duan, Pieter Abbeel, Guanya Shi, Jiajun Wu, and C Karen Liu. Twist2: Scalable, portable, and holistic humanoid data collection system. arXiv preprint arXiv:2511.02832, 2025

  48. [48]

    Hub: Learning extreme humanoid balance. CoRL, 2025

    Tong Zhang, Boyuan Zheng, Ruiqian Nai, Yingdong Hu, Yen-Jen Wang, Geng Chen, Fanqi Lin, Jiongye Li, Chuye Hong, Koushil Sreenath, et al. Hub: Learning extreme humanoid balance. CoRL, 2025

  49. [49]

    Add: Physics-based motion imitation with adversarial differential discriminators. arXiv preprint arXiv:2505.04961, 2025

    Ziyu Zhang, Sergey Bashkirov, Dun Yang, Michael Taylor, and Xue Bin Peng. Add: Physics-based motion imitation with adversarial differential discriminators. arXiv preprint arXiv:2505.04961, 2025

  50. [50]

    Hiking in the wild: A scalable perceptive parkour framework for humanoids. arXiv preprint arXiv:2601.07718, 2026

    Shaoting Zhu, Ziwen Zhuang, Mengjie Zhao, Kun-Ying Lee, and Hang Zhao. Hiking in the wild: A scalable perceptive parkour framework for humanoids. arXiv preprint arXiv:2601.07718, 2026

  51. [51]

    Robot parkour learning. arXiv preprint arXiv:2309.05665, 2023

    Ziwen Zhuang, Zipeng Fu, Jianren Wang, Christopher Atkeson, Soeren Schwertfeger, Chelsea Finn, and Hang Zhao. Robot parkour learning. arXiv preprint arXiv:2309.05665, 2023

  52. [52]

    Humanoid parkour learning. arXiv preprint arXiv:2406.10759, 2024

    Ziwen Zhuang, Shenzhe Yao, and Hang Zhao. Humanoid parkour learning. arXiv preprint arXiv:2406.10759, 2024.

    APPENDIX A. Motion Matching Implementation Details. This section provides implementation details for the motion matching procedure used to synthesize long-horizon parkour reference trajectories.

  53. [53]

    Motion Database and Feature Precomputation: All motion clips are first retargeted to a 29-DOF Unitree G1 humanoid using OmniRetarget [43] and represented as frame sequences. At each frame $i$, we store the robot configuration $q_i = (p_i, r_i, \theta_i)$, consisting of the root translation $p_i \in \mathbb{R}^3$, root quaternion $r_i \in \mathbb{R}^4$, and joint angles $\theta_i \in \mathbb{R}^{29}$. For each frame,...

  54. [54]

    Query Feature Construction: At runtime, a query feature $\hat{x}_t$ is constructed from the current robot configuration $q_t$ and a 2D velocity command. We first extract the kinematic features from $q_t$ to form the pose-based part of the query, namely the local foot state $\hat{f}_t$ and the root velocity $\hat{h}_t$. We then compute the short-horizon future root trajectory from the 2...

  55. [55]

    Transition Smoothing via Inertialization: To ensure smooth transitions when switching the playback index to a newly retrieved frame, we adopt inertialization [4]. The key idea is to compute an offset between the currently playing motion and the target motion at the transition instant, apply this offset after switching so the output remains continuous, and ...

  56. [56]

    Skill List: Our motion library includes locomotion and a set of atomic parkour skills. Locomotion provides a shared transition manifold and includes standing, walking, and running motions spanning commanded speeds from 0.8 to 3.5 m/s. Most parkour skills are instantiated at 1.0 m/s and 2.0 m/s. We additionally include a single 3.0 m/s cat-vault skill to ...

  57. [57]

    Motion Tracking Details: Specific reward formulations and domain randomization settings used for expert policy learning from [20] are summarized in Table IV and Table V for reference.

  58. [58]

    Distillation Details: During student training, we relax the termination conditions relative to the expert to prevent premature termination of valid but mirrored executions. While this improves PPO stability, the student may visit states that are out-of-distribution for the expert policies, which were trained under the original termination thresholds and ma...

  59. [59]

    Training Hyperparameters: We include all hyperparameters for two-stage training in Table VI for reference.

    C. Details for Baselines

  60. [60]

    Velocity Tracking Baseline: To show the importance of human reference motion in our framework, we include a standard reward-shaping velocity-tracking baseline that learns locomotion purely from handcrafted rewards and a terrain curriculum, without any motion imitation or human reference trajectories.

    Skill | Duration (s)
    Locomotion | 495.5
    Parkour skills @ 1.0 m/s:
    Step (36 cm) | 2.2
    Climb (58 cm) | 12.1
    Climb (76 cm) | 8.8
    Clim...

  61. [61]

    AMP Baseline: Since AMP [31] is a popular algorithm for chaining skills with human reference data, we also implemented an AMP baseline by following the MimicKit AMP implementation released by the original AMP authors. In our experiments, this baseline can walk stably and track the commanded velocity, but it does not perform well on obstacle traversal: i...
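The appendix excerpt on transition smoothing (entry [55] above) describes inertialization [4]: at the switch instant, an offset between the playing motion and the newly retrieved motion is computed and then applied after switching so the output stays continuous; the rest of that sentence is truncated. A minimal single-channel sketch, assuming the offset (position and velocity) is then decayed to zero by an exact critically damped spring. This decay law is a swapped-in choice common in game-animation practice; Bollo's talk uses a polynomial blend, and the paper's choice is not visible here. `Inertializer`, `rate`, and the toy values are hypothetical, and root orientation would need a quaternion version of the same idea.

```python
import math
from dataclasses import dataclass
from typing import Tuple


def decay_critically_damped(x: float, v: float, rate: float, dt: float) -> Tuple[float, float]:
    """Exact dt-step of a critically damped spring that pulls (x, v) toward zero.

    Closed form: x(t) = (x0 + j*t) * exp(-rate*t), with j = v0 + rate*x0.
    """
    j = v + rate * x
    e = math.exp(-rate * dt)
    return e * (x + j * dt), e * (v - rate * j * dt)


@dataclass
class Inertializer:
    """Offset-based transition smoothing for one scalar channel (e.g. one joint angle)."""
    rate: float = 20.0   # 1/s; hypothetical decay rate, larger means a faster blend
    off_x: float = 0.0   # position offset still being blended out
    off_v: float = 0.0   # velocity offset still being blended out

    def transition(self, out_x: float, out_v: float, new_x: float, new_v: float) -> None:
        # At the switch, store the gap between what was being output and the newly
        # matched frame, so output = new motion + offset is continuous in x and v.
        self.off_x = out_x - new_x
        self.off_v = out_v - new_v

    def step(self, target_x: float, target_v: float, dt: float) -> Tuple[float, float]:
        # Decay the stored offset toward zero, then add it on top of the new motion.
        self.off_x, self.off_v = decay_critically_damped(self.off_x, self.off_v, self.rate, dt)
        return target_x + self.off_x, target_v + self.off_v


# Jump from a clip holding a joint at 1.0 rad to a matched clip holding it at 0.0 rad.
blend = Inertializer()
blend.transition(out_x=1.0, out_v=0.0, new_x=0.0, new_v=0.0)
trace = [blend.step(0.0, 0.0, dt=0.01)[0] for _ in range(100)]
```

Because the offset lives outside the clips, the matcher can jump between any two frames while the emitted pose moves smoothly from the old value to the new clip without overshoot.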