arXiv: 2602.15827 · v2 · submitted 2026-02-17 · 💻 cs.RO · cs.AI · cs.LG · cs.SY · eess.SY

Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching

Zhen Wu, Xiaoyu Huang, Lujie Yang, Yuanhang Zhang, Xi Chen, Pieter Abbeel, Rocky Duan, Angjoo Kanazawa, Carmelo Sferrazza, Guanya Shi, C. Karen Liu

Pith reviewed 2026-05-15 21:25 UTC · model grok-4.3

keywords: humanoid parkour, motion matching, reinforcement learning, depth perception, skill composition, locomotion, vision-based control

The pith

A humanoid robot chains retargeted human parkour skills into one depth-driven policy that autonomously chooses and executes climbs, vaults, or rolls over obstacles.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates a method for humanoid robots to perform extended sequences of agile actions such as stepping over, climbing, vaulting, or rolling off obstacles of different sizes and shapes. It begins by using motion matching to link short human motion clips into longer smooth trajectories. Expert reinforcement learning policies are trained to follow these trajectories, then distilled via DAgger and RL into a single student policy that receives only onboard depth images and a simple 2D velocity command. This integrated perception and skill system lets the robot decide its next action based on what it sees without extra sensors or per-obstacle adjustments. Real-world tests on a Unitree G1 robot confirm the robot can handle obstacles up to 1.25 meters tall and adapt when obstacles shift during traversal.

Core claim

Retargeted human kinematic trajectories are composed into long-horizon motions through nearest-neighbor search in feature space, tracked by expert RL policies, and distilled into a unified depth-based multi-skill policy. With only onboard depth sensing and discrete 2D velocity commands, the policy selects and executes context-appropriate skills such as stepping over, climbing onto, vaulting, or rolling off obstacles of varying geometries and heights, enabling autonomous long-horizon parkour that adapts to real-time perturbations.

What carries the argument

Motion matching formulated as nearest-neighbor search in feature space, which composes atomic human skills into continuous kinematic trajectories for subsequent RL tracking and distillation into a single depth-based policy.
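The nearest-neighbor formulation named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature layout, the per-dimension normalization, the optional weights, and the function names `build_feature_database` and `motion_match` are all hypothetical choices made for the example.

```python
import numpy as np


def build_feature_database(features: np.ndarray):
    """Normalize each feature dimension (assumed preprocessing step).

    features: array of shape (num_frames, feature_dim), one row per frame.
    Returns the normalized database plus the statistics needed for queries.
    """
    mean = features.mean(axis=0)
    std = features.std(axis=0) + 1e-8  # avoid division by zero
    return (features - mean) / std, mean, std


def motion_match(query, db_norm, mean, std, weights=None) -> int:
    """Return the index of the database frame closest to the query.

    Uses a brute-force squared Euclidean distance in normalized feature
    space. `weights` is a hypothetical per-dimension importance vector.
    """
    q = (np.asarray(query, dtype=float) - mean) / std
    diff = db_norm - q
    if weights is not None:
        diff = diff * np.asarray(weights, dtype=float)
    dists = np.einsum("ij,ij->i", diff, diff)  # row-wise squared norms
    return int(np.argmin(dists))
```

In a playback loop, the returned index would become the new frame to play from whenever the match is better than simply continuing the current clip. A brute-force search is shown for clarity; a spatial index such as a k-d tree is a common substitute for large databases.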

Load-bearing premise

Retargeted human motion data can be tracked by RL policies and successfully distilled into one depth-only policy that handles real-world changes in obstacle geometry without further tuning or sensing.

What would settle it

Deploy the policy on an obstacle whose height or shape lies outside the range used in training and check whether the robot still selects and completes an appropriate skill without falling or requiring manual retuning.

Figures

Figures reproduced from arXiv: 2602.15827 by Angjoo Kanazawa, Carmelo Sferrazza, C. Karen Liu, Guanya Shi, Lujie Yang, Pieter Abbeel, Rocky Duan, Xiaoyu Huang, Xi Chen, Yuanhang Zhang, Zhen Wu.

**Figure 1.** Perceptive Humanoid Parkour (PHP) enables a Unitree G1 humanoid robot to execute highly dynamic, long-horizon parkour behaviors using onboard perception. By composing various agile human skills via motion matching and a teacher-student training pipeline, we train a single multi-skill visuomotor policy capable of complex contact-rich maneuvers including (a) cat-vaulting over a short obstacle followed by das…

**Figure 2.** Perceptive Humanoid Parkour overview. Atomic parkour skills are composed into long-horizon kinematic reference trajectories via motion matching. Single-skill teacher policies are trained with privileged information using RL-based motion tracking. Multiple teachers are distilled into a single depth-based student policy using a hybrid DAgger and RL objective. This scalable recipe enables zero-shot sim-to-rea…

**Figure 3.** Diverse variations of composed parkour skills synthesized via motion matching. (a) Different approach distances trigger varying stride phases and entry poses. (b) Diverse locomotion speeds, directions, and durations. (c) Randomized terrain poses and shapes.

Dataset construction. We synthesize long-horizon reference trajectories by rolling out the motion-matching composition procedure as follows. As visu…

**Figure 4.** Side-by-side comparison of high-climb agility. The…

**Figure 5.** Hardware results demonstrating agile, long-horizon parkour behaviors, including (a) a cat vault, (b) a drop landing from…

Original abstract

While recent advances in humanoid locomotion have achieved stable walking on varied terrains, capturing the agility and adaptivity of highly dynamic human motions remains an open challenge. In particular, agile parkour in complex environments demands not only low-level robustness, but also human-like motion expressiveness, long-horizon skill composition, and perception-driven decision-making. In this paper, we present Perceptive Humanoid Parkour (PHP), a modular framework that enables humanoid robots to autonomously perform long-horizon, vision-based parkour across challenging obstacle courses. Our approach first leverages motion matching, formulated as nearest-neighbor search in a feature space, to compose retargeted atomic human skills into long-horizon kinematic trajectories. This framework enables the flexible composition and smooth transition of complex skill chains while preserving the elegance and fluidity of dynamic human motions. Next, we train motion-tracking reinforcement learning (RL) expert policies for these composed motions, and distill them into a single depth-based, multi-skill student policy, using a combination of DAgger and RL. Crucially, the combination of perception and skill composition enables autonomous, context-aware decision-making: using only onboard depth sensing and a discrete 2D velocity command, the robot selects and executes whether to step over, climb onto, vault or roll off obstacles of varying geometries and heights. We validate our framework with extensive real-world experiments on a Unitree G1 humanoid robot, demonstrating highly dynamic parkour skills such as climbing tall obstacles up to 1.25m (96% robot height), as well as long-horizon multi-obstacle traversal with closed-loop adaptation to real-time obstacle perturbations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper shows a real Unitree G1 doing dynamic parkour climbs up to 1.25 m with only depth sensing by chaining human motions via matching and distilling RL experts into one policy, but the abstract gives no numbers to back the reliability claims.

The main point is that they have a working system on hardware where the robot picks and executes skills like stepping over, climbing, vaulting or rolling off obstacles using just depth images and a simple 2D velocity command. Motion matching composes retargeted human trajectories into long chains, RL experts track them, and DAgger plus RL distills the lot into a single depth-based student policy that runs closed-loop on the G1 with some adaptation to perturbations.

Referee Report

2 major / 2 minor

Summary. The paper presents Perceptive Humanoid Parkour (PHP), a modular framework for humanoid robots to perform long-horizon, vision-based parkour. It uses motion matching via nearest-neighbor search in feature space to compose retargeted human skills into kinematic trajectories, trains RL expert policies to track them, and distills the experts into a single depth-image-conditioned student policy via DAgger+RL. The student policy, given only onboard depth sensing and a discrete 2D velocity command, autonomously selects and executes skills such as step-over, climb, vault, or roll-off for varying obstacle geometries. The framework is validated through real-world experiments on a Unitree G1 humanoid, including climbs up to 1.25 m and closed-loop adaptation to perturbations.
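To make the distillation step concrete, here is a toy sketch of the DAgger idea on a 1D system: the student is rolled out, the expert relabels the states the student actually visited, and the student is refit on the aggregated dataset. Everything here is an assumption for illustration: the linear `teacher_action`, the scalar dynamics, and the least-squares student are hypothetical, the student is rolled out from the first iteration (a simplification of the original algorithm's expert mixing), and the RL term of the paper's hybrid objective is not reproduced. In the paper the student observes depth images, whereas this toy student sees the same state as the teacher.

```python
import numpy as np


def teacher_action(state):
    """Hypothetical expert: a fixed linear feedback law."""
    return -0.8 * state


def rollout(weight, steps, rng):
    """Roll out the student policy a = weight * s on toy 1D dynamics."""
    state = rng.normal()
    states = []
    for _ in range(steps):
        states.append(state)
        action = weight * state
        state = state + action + 0.05 * rng.normal()  # noisy integrator
    return np.array(states)


def dagger(iterations=5, steps=50, seed=0):
    """Aggregate student-visited states with expert labels, then refit."""
    rng = np.random.default_rng(seed)
    weight = 0.0  # untrained student
    all_states, all_labels = [], []
    for _ in range(iterations):
        states = rollout(weight, steps, rng)        # student chooses the states
        all_states.append(states)
        all_labels.append(teacher_action(states))   # expert supplies the labels
        x = np.concatenate(all_states)
        y = np.concatenate(all_labels)
        weight = float(x @ y / (x @ x))             # least-squares fit on all data
    return weight
```

Because the labels come from states the student itself reaches, the training distribution tracks the student's own mistakes, which is the property that separates DAgger from plain behavioral cloning on expert rollouts.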

Significance. If the central claims hold, the work would represent a meaningful advance in agile humanoid locomotion by demonstrating perception-driven composition of dynamic human skills without hand-crafted controllers or additional sensing. The combination of motion-matching composition with policy distillation offers a scalable path toward long-horizon behaviors, and successful real-world transfer on a commercial platform would strengthen evidence for sim-to-real methods in high-dynamics settings.

major comments (2)

[Abstract] Abstract: the claim of 'extensive real-world experiments' demonstrating 'highly dynamic parkour skills' and 'closed-loop adaptation to real-time obstacle perturbations' is not accompanied by any quantitative metrics, success rates, failure modes, baselines, or ablation studies. This leaves the central claim of reliable autonomous skill selection under depth-only sensing only partially supported.
[Method (distillation subsection)] The distillation step (DAgger+RL from multiple RL experts into a single depth-based student) is load-bearing for the autonomous decision-making claim, yet the manuscript provides no analysis of whether depth observations alone suffice to recover both geometry classification and precise timing for transitions without mode collapse or loss of dynamic fidelity on perturbed geometries.

minor comments (2)

[Method] Notation for the motion-matching feature space and nearest-neighbor distance metric should be defined explicitly with equations rather than described in prose.
[Experiments] The abstract mentions '96% robot height' for the 1.25 m climb; the corresponding robot height and exact obstacle dimensions should be stated consistently in the experiments section.
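As a quick consistency check on the second minor comment, and assuming the 96% figure is the ratio of obstacle height to the robot's standing height (the symbol $h_{\text{robot}}$ is introduced here only for illustration), the implied robot height is:

$$
h_{\text{robot}} \approx \frac{1.25\ \text{m}}{0.96} \approx 1.30\ \text{m}
$$

The experiments section would need to state a robot height close to this value for the two numbers in the abstract to agree.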

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below, clarifying the experimental support in the manuscript while committing to revisions that strengthen the presentation of quantitative evidence and analysis.

Referee: [Abstract] Abstract: the claim of 'extensive real-world experiments' demonstrating 'highly dynamic parkour skills' and 'closed-loop adaptation to real-time obstacle perturbations' is not accompanied by any quantitative metrics, success rates, failure modes, baselines, or ablation studies. This leaves the central claim of reliable autonomous skill selection under depth-only sensing only partially supported.

Authors: We agree that the abstract would be strengthened by explicit quantitative metrics. The manuscript body (Section 5 and supplementary material) presents results from extensive real-world trials on the Unitree G1, including success across multiple obstacle types up to 1.25 m, long-horizon traversals, and adaptation to perturbations, with comparisons to non-perceptive baselines. We will revise the abstract to incorporate key metrics such as overall success rates, adaptation performance, and references to the ablation studies and failure mode analysis already present in the main text.

Revision: yes

Referee: [Method (distillation subsection)] The distillation step (DAgger+RL from multiple RL experts into a single depth-based student) is load-bearing for the autonomous decision-making claim, yet the manuscript provides no analysis of whether depth observations alone suffice to recover both geometry classification and precise timing for transitions without mode collapse or loss of dynamic fidelity on perturbed geometries.

Authors: The distillation subsection describes the DAgger+RL procedure that produces a single depth-conditioned student policy from the expert set. Real-world experiments demonstrate that this policy achieves autonomous skill selection and closed-loop adaptation to perturbations on varied geometries without evident mode collapse or loss of dynamic fidelity. We acknowledge that an explicit analysis of depth sufficiency for geometry classification and transition timing would provide additional rigor. We will add a targeted discussion in the revised method section examining the policy's observed behavior under depth-only input, drawing on the experimental outcomes.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

The paper describes a modular pipeline that first applies motion matching (nearest-neighbor search in feature space) to compose retargeted human skills into kinematic trajectories, then trains separate RL expert policies to track those trajectories, and finally distills the experts into one depth-conditioned student policy via DAgger plus RL. None of these steps reduce by construction to quantities defined inside the paper; motion matching is a standard external technique, the RL tracking objective is independent of the final student policy, and the distillation step is a conventional supervised transfer process whose success is measured by external real-world experiments on the Unitree G1 rather than by internal re-use of fitted parameters. No self-citations are invoked to establish uniqueness or to smuggle in ansatzes, and the central claim of perception-driven skill selection rests on the empirical behavior of the trained policy rather than on any definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on standard robotics assumptions about motion retargeting and policy transfer; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

domain assumption: Retargeted human motions preserve sufficient dynamic properties for stable robot execution.
Invoked when composing retargeted atomic human skills into kinematic trajectories for RL tracking.

pith-pipeline@v0.9.0 · 5648 in / 1320 out tokens · 26219 ms · 2026-05-15T21:25:42.825073+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Learning Versatile Humanoid Manipulation with Touch Dreaming
cs.RO 2026-04 conditional novelty 5.0

HTD, a multimodal transformer policy trained with behavioral cloning and touch dreaming to predict future tactile latents, achieves a 90.9% relative success rate improvement over baselines on five real-world contact-r...

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · cited by 1 Pith paper · 2 internal anchors

[1] Ananye Agarwal, Ashish Kumar, Jitendra Malik, and Deepak Pathak. Legged locomotion in challenging terrains using egocentric vision. In Conference on robot learning, pages 403–415. PMLR, 2023.
[2] Qingwei Ben, Botian Xu, Kailin Li, Feiyu Jia, Wentao Zhang, Jingping Wang, Jingbo Wang, Dahua Lin, and Jiangmiao Pang. Gallant: Voxel grid-based humanoid locomotion and local-navigation across 3d constrained terrains, 2025. URL https://arxiv.org/abs/2511.14625
[3] Kevin Bergamin, Simon Clavet, Daniel Holden, and James Richard Forbes. Drecon: data-driven responsive control of physics-based characters. ACM Transactions On Graphics (TOG), 38(6):1–11, 2019.
[4] David Bollo. Inertialization: High-performance animation transitions in Gears of War. Proc. of GDC, 2018.
[5] Michael Büttner and Simon Clavet. Motion matching - the road to next gen animation. Proc. of Nucl.ai, 2015.
[6] Ken Caluwaerts, Atil Iscen, J Chase Kew, Wenhao Yu, Tingnan Zhang, Daniel Freeman, Kuang-Huei Lee, Lisa Lee, Stefano Saliceti, Vincent Zhuang, et al. Barkour: Benchmarking animal-level agility with quadruped robots. arXiv preprint arXiv:2305.14654, 2023.
[7] Zixuan Chen, Mazeyu Ji, Xuxin Cheng, Xuanbin Peng, Xue Bin Peng, and Xiaolong Wang. Gmt: General motion tracking for humanoid whole-body control. arXiv preprint arXiv:2506.14770, 2025.
[8] Xuxin Cheng, Kexin Shi, Ananye Agarwal, and Deepak Pathak. Extreme parkour with legged robots. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 11443–11450. IEEE, 2024.
[9] Simon Clavet. Motion matching and the road to next-gen animation. Proc. of GDC, 2016.
[10] Zipeng Fu, Qingqing Zhao, Qi Wu, Gordon Wetzstein, and Chelsea Finn. Humanplus: Humanoid shadowing and imitation from humans. arXiv preprint arXiv:2406.10454, 2024.
[11] Ruiyu Gou, Michiel van de Panne, and Daniel Holden. Control operators for interactive character animation. ACM Transactions on Graphics (TOG), 2025.
[12] Junzhe He, Chong Zhang, Fabian Jenelten, Ruben Grandia, Moritz Bächer, and Marco Hutter. Attention-based map encoding for learning generalized legged locomotion. Science Robotics, 10(105):eadv3604, 2025.
[13] David Hoeller, Nikita Rudin, Dhionis Sako, and Marco Hutter. Anymal parkour: Learning agile navigation for quadrupedal robots. Science Robotics, 9(88):eadi7566, 2024.
[14] Daniel Holden, Anas Kanoun, Michiel Bŭttner, Sofien Bouaziz, Sebastian Thrun, and Aaron Hertzmann. Learned motion matching. ACM Transactions on Graphics (TOG), 2020.
[15] Xiaoyu Huang, Takara Truong, Yunbo Zhang, Fangzhou Yu, Jean Pierre Sleiman, Jessica Hodgins, Koushil Sreenath, and Farbod Farshidian. Diffuse-cloc: Guided diffusion for physics-based character look-ahead control. arXiv preprint arXiv:2503.11801, 2025.
[16] Dvij Kalaria, Sudarshan S Harithas, Pushkal Katara, Sangkyung Kwak, Sarthak Bhagat, Shankar Sastry, Srinath Sridhar, Sai Vemprala, Ashish Kapoor, and Jonathan Chung-Kuan Huang. Dreamcontrol: Human-inspired whole-body humanoid control for scene interaction via guided diffusion. arXiv preprint arXiv:2509.14353, 2025.
[17] Dongho Kang, Simon Zimmermann, and Stelian Coros. Animal gaits on quadrupedal robots using motion matching and model-based control. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 8500–8507. IEEE, 2021.
[18] Ashish Kumar, Zipeng Fu, Deepak Pathak, and Jitendra Malik. Rma: Rapid motor adaptation for legged robots. arXiv preprint arXiv:2107.04034, 2021.
[19] Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, and Marco Hutter. Learning quadrupedal locomotion over challenging terrain. Science robotics, 5(47):eabc5986, 2020.
[20] Qiayuan Liao, Takara E Truong, Xiaoyu Huang, Yuman Gao, Guy Tevet, Koushil Sreenath, and C Karen Liu. Beyondmimic: From motion tracking to versatile humanoid control via guided diffusion. arXiv preprint arXiv:2508.08241, 2025.
[21] Junfeng Long, Zirui Wang, Quanyi Li, Jiawei Gao, Liu Cao, and Jiangmiao Pang. Hybrid internal model: Learning agile legged locomotion with simulated robot response. arXiv preprint arXiv:2312.11460, 2023.
[22] Junfeng Long, Junli Ren, Moji Shi, Zirui Wang, Tao Huang, Ping Luo, and Jiangmiao Pang. Learning humanoid locomotion with perceptive internal model. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 9997–10003. IEEE, 2025.
[23] Shixin Luo, Songbo Li, Ruiqi Yu, Zhicheng Wang, Jun Wu, and Qiuguo Zhu. Pie: Parkour with implicit-explicit learning framework for legged robots. IEEE Robotics and Automation Letters, 2024.
[24] Zhengyi Luo, Ye Yuan, Tingwu Wang, Chenran Li, Sirui Chen, Fernando Castañeda, Zi-Ang Cao, Jiefeng Li, David Minor, Qingwei Ben, et al. Sonic: Supersizing motion tracking for natural humanoid whole-body control. arXiv preprint arXiv:2511.07820, 2025.
[25] Miles Macklin. Warp: A high-performance python framework for gpu simulation and graphics. https://github.com/nvidia/warp, March 2022. NVIDIA GPU Technology Conference (GTC).
[26] Takahiro Miki, Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, and Marco Hutter. Learning robust perceptive locomotion for quadrupedal robots in the wild. Science robotics, 7(62):eabk2822, 2022.
[27] Mayank Mittal, Pascal Roth, James Tigue, Antoine Richard, Octi Zhang, Peter Du, Antonio Serrano-Muñoz, Xinjie Yao, René Zurbrügg, Nikita Rudin, et al. Isaac lab: A gpu-accelerated simulation framework for multi-modal robot learning. arXiv preprint arXiv:2511.04831, 2025.
[28] I Nahrendra, Byeongho Yu, and Hyun Myung. Dreamwaq: Learning robust quadrupedal locomotion with implicit terrain imagination via deep reinforcement learning. arXiv preprint arXiv:2301.10602, 2023.
[29] Yixuan Pan, Ruoyi Qiao, Li Chen, Kashyap Chitta, Liang Pan, Haoguang Mai, Qingwen Bu, Hao Zhao, Cunyuan Zheng, Ping Luo, et al. Agility meets stability: Versatile humanoid control with heterogeneous data. arXiv preprint arXiv:2511.17373, 2025.
[30] Xue Bin Peng, Pieter Abbeel, Sergey Levine, and Michiel Van de Panne. Deepmimic: Example-guided deep reinforcement learning of physics-based character skills. ACM Transactions On Graphics (TOG), 37(4):1–14, 2018.
[31] Xue Bin Peng, Ze Ma, Pieter Abbeel, Sergey Levine, and Angjoo Kanazawa. Amp: Adversarial motion priors for stylized physics-based character control. ACM Transactions on Graphics (TOG), 2021.
[32] Stéphane Ross, Geoffrey Gordon, and Drew Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the fourteenth international conference on artificial intelligence and statistics, pages 627–635. JMLR Workshop and Conference Proceedings, 2011.
[33] Nikita Rudin, Junzhe He, Joshua Aurand, and Marco Hutter. Parkour in the wild: Learning a general and extensible agile locomotion policy using multi-expert distillation and rl fine-tuning. arXiv preprint arXiv:2505.11164, 2025.
[34] Salgadopk. Learn parkour - climb up tutorial. URL https://youtu.be/6U1sIgqgPFo?si=339TPTxlFB5lWGB1
[35] Jingkai Sun, Gang Han, Pihai Sun, Wen Zhao, Jiahang Cao, Jiaxu Wang, Yijie Guo, and Qiang Zhang. Dpl: Depth-only perceptive humanoid locomotion via realistic depth synthesis and cross-attention terrain reconstruction. arXiv preprint arXiv:2510.07152, 2025.
[36] Guy Tevet, Sigal Raab, Brian Gordon, Yoni Shafir, Daniel Cohen-or, and Amit Haim Bermano. Human motion diffusion model. In ICLR, 2023.
[37] Huayi Wang, Zirui Wang, Junli Ren, Qingwei Ben, Tao Huang, Weinan Zhang, and Jiangmiao Pang. Beamdojo: Learning agile humanoid locomotion on sparse footholds. In Robotics: Science and Systems (RSS), 2025.
[38] Huayi Wang, Wentao Zhang, Runyi Yu, Tao Huang, Junli Ren, Feiyu Jia, Zirui Wang, Xiaojie Niu, Xiao Chen, Jiahe Chen, et al. Physhsi: Towards a real-world generalizable and natural humanoid-scene interaction system. arXiv preprint arXiv:2510.11072, 2025.
[39] Jinze Wu, Guiyang Xin, Chenkun Qi, and Yufei Xue. Learning robust and agile legged locomotion using adversarial motion priors. IEEE Robotics and Automation Letters, 8(8):4975–4982, 2023.
[40] Weiji Xie, Jinrui Han, Jiakun Zheng, Huanyu Li, Xinzhe Liu, Jiyuan Shi, Weinan Zhang, Chenjia Bai, and Xuelong Li. Kungfubot: Physics-based humanoid whole-body control for learning highly-dynamic skills. arXiv preprint arXiv:2506.12851, 2025.
[41] Michael Xu, Yi Shi, KangKang Yin, and Xue Bin Peng. Parc: Physics-based augmentation with reinforcement learning for character controllers. In ACM SIGGRAPH, 2025.
[42] Pei Xu, Zhen Wu, Ruocheng Wang, Vishnu Sarukkai, Kayvon Fatahalian, Ioannis Karamouzas, Victor Zordan, and C Karen Liu. Learning to ball: Composing policies for long-horizon basketball moves. ACM Transactions on Graphics (TOG), 44(6):1–14, 2025.
[43] Lujie Yang, Xiaoyu Huang, Zhen Wu, Angjoo Kanazawa, Pieter Abbeel, Carmelo Sferrazza, C Karen Liu, Rocky Duan, and Guanya Shi. Omniretarget: Interaction-preserving data generation for humanoid whole-body loco-manipulation and scene interaction. arXiv preprint arXiv:2509.26633, 2025.
[44] Ruihan Yang, Ge Yang, and Xiaolong Wang. Neural volumetric memory for visual locomotion control. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1430–1440, 2023.
[45] Alan Yu, Ge Yang, Ran Choi, Yajvan Ravan, John Leonard, and Phillip Isola. Learning visual parkour from generated images. In 8th Annual Conference on Robot Learning, 2024.
[46] Ruiqi Yu, Qianshi Wang, Yizhen Wang, Zhicheng Wang, Jun Wu, and Qiuguo Zhu. Walking with terrain reconstruction: Learning to traverse risky sparse footholds. arXiv preprint arXiv:2409.15692, 2024.
[47] Yanjie Ze, Siheng Zhao, Weizhuo Wang, Angjoo Kanazawa, Rocky Duan, Pieter Abbeel, Guanya Shi, Jiajun Wu, and C Karen Liu. Twist2: Scalable, portable, and holistic humanoid data collection system. arXiv preprint arXiv:2511.02832, 2025.
[48] Tong Zhang, Boyuan Zheng, Ruiqian Nai, Yingdong Hu, Yen-Jen Wang, Geng Chen, Fanqi Lin, Jiongye Li, Chuye Hong, Koushil Sreenath, et al. Hub: Learning extreme humanoid balance. CoRL, 2025.
[49] Ziyu Zhang, Sergey Bashkirov, Dun Yang, Michael Taylor, and Xue Bin Peng. Add: Physics-based motion imitation with adversarial differential discriminators. arXiv preprint arXiv:2505.04961, 2025.
[50] Shaoting Zhu, Ziwen Zhuang, Mengjie Zhao, Kun-Ying Lee, and Hang Zhao. Hiking in the wild: A scalable perceptive parkour framework for humanoids. arXiv preprint arXiv:2601.07718, 2026.
[51] Ziwen Zhuang, Zipeng Fu, Jianren Wang, Christopher Atkeson, Soeren Schwertfeger, Chelsea Finn, and Hang Zhao. Robot parkour learning. arXiv preprint arXiv:2309.05665, 2023.
[52] Ziwen Zhuang, Shenzhe Yao, and Hang Zhao. Humanoid parkour learning. arXiv preprint arXiv:2406.10759, 2024.

APPENDIX

A. Motion Matching Implementation Details

This section provides implementation details for the motion matching procedure used to synthesize long-horizon parkour reference trajectories.

Motion Database and Feature Precomputation: All motion clips are first retargeted to a 29-DOF Unitree G1 humanoid using OmniRetarget [43] and represented as frame sequences. At each frame $i$, we store the robot configuration $q_i = (p_i, r_i, \theta_i)$, consisting of the root translation $p_i \in \mathbb{R}^3$, root quaternion $r_i \in \mathbb{R}^4$, and joint angles $\theta_i \in \mathbb{R}^{29}$. For each frame,...
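The per-frame configuration described above maps naturally onto a small container type. The sketch below is only illustrative: the class name `FrameConfig`, the field names, the helper `clip_to_array`, and the (w, x, y, z) quaternion ordering are assumptions, while the dimensions 3, 4, and 29 come from the text.

```python
from dataclasses import dataclass

import numpy as np

NUM_JOINTS = 29  # 29-DOF Unitree G1, as stated in the text


@dataclass(frozen=True)
class FrameConfig:
    """One frame q_i = (p_i, r_i, theta_i) of a retargeted motion clip."""

    root_translation: np.ndarray  # p_i, shape (3,)
    root_quaternion: np.ndarray   # r_i, shape (4,); (w, x, y, z) order is an assumption
    joint_angles: np.ndarray      # theta_i, shape (29,)

    def __post_init__(self):
        # Validate shapes early so malformed clips fail at load time.
        expected = {
            "root_translation": (3,),
            "root_quaternion": (4,),
            "joint_angles": (NUM_JOINTS,),
        }
        for name, shape in expected.items():
            if getattr(self, name).shape != shape:
                raise ValueError(f"{name} must have shape {shape}")

    def to_vector(self) -> np.ndarray:
        """Flatten the frame into a single 36-dimensional vector."""
        return np.concatenate(
            [self.root_translation, self.root_quaternion, self.joint_angles]
        )


def clip_to_array(frames) -> np.ndarray:
    """Stack a sequence of frames into an array of shape (num_frames, 36)."""
    return np.stack([frame.to_vector() for frame in frames])
```

Storing clips as a dense array of flattened frames keeps later feature precomputation vectorized, at the cost of having to remember the column layout.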

Query Feature Construction: At runtime, a query feature $\hat{x}_t$ is constructed from the current robot configuration $q_t$ and a 2D velocity command. We first extract the kinematic features from $q_t$ to form the pose-based part of the query, namely the local foot state $\hat{f}_t$ and the root velocity $\hat{h}_t$. We then compute the short-horizon future root trajectory from the 2...
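One way to picture the query assembly is sketched below. The source text is cut off before it specifies how the future root trajectory is computed, so the constant-velocity extrapolation, the horizon times, the feature ordering, and the function names `future_root_trajectory` and `build_query` are all assumptions for illustration rather than the paper's procedure.

```python
import numpy as np


def future_root_trajectory(velocity_cmd, horizons=(0.2, 0.4, 0.6)):
    """Extrapolate future root positions in the robot's local frame.

    Assumes the commanded 2D velocity (vx, vy) is held constant, so the
    position at horizon t is simply t * v. Returns shape (len(horizons), 2).
    """
    v = np.asarray(velocity_cmd, dtype=float)
    t = np.asarray(horizons, dtype=float)[:, None]
    return t * v


def build_query(foot_state, root_velocity, velocity_cmd, horizons=(0.2, 0.4, 0.6)):
    """Concatenate pose-based features with the predicted trajectory.

    foot_state and root_velocity stand in for the local foot state and
    root velocity extracted from the current configuration.
    """
    trajectory = future_root_trajectory(velocity_cmd, horizons)
    return np.concatenate(
        [np.ravel(foot_state), np.ravel(root_velocity), trajectory.ravel()]
    )
```

The resulting vector must use the same layout and normalization as the precomputed database features, otherwise distances between query and database rows are not meaningful.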

Transition Smoothing via Inertialization: To ensure smooth transitions when switching the playback index to a newly retrieved frame, we adopt inertialization [4]. The key idea is to compute an offset between the currently playing motion and the target motion at the transition instant, apply this offset after switching so the output remains continuous, and ...
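The offset-and-decay idea described above can be sketched in a few lines. This is a deliberately simplified stand-in, not the scheme from [4]: it decays only a position-level offset with an exponential half-life, ignores velocity matching, and treats the pose as a plain vector, which is adequate for joint angles but not for quaternions. The class name `OffsetBlender` and the `half_life` parameter are hypothetical.

```python
import numpy as np


def decay(elapsed, half_life):
    """Fraction of the offset remaining after `elapsed` seconds."""
    return 0.5 ** (elapsed / half_life)


class OffsetBlender:
    """Keep output continuous across a clip switch by decaying an offset."""

    def __init__(self, half_life=0.1):
        self.half_life = half_life
        self.offset = None
        self.elapsed = 0.0

    def start_transition(self, current_pose, target_pose):
        # Record the pose difference at the switch instant.
        self.offset = np.asarray(current_pose, float) - np.asarray(target_pose, float)
        self.elapsed = 0.0

    def apply(self, target_pose, dt):
        # Add the decayed offset to the new clip's pose, then advance time.
        target = np.asarray(target_pose, float)
        if self.offset is None:
            return target
        out = target + self.offset * decay(self.elapsed, self.half_life)
        self.elapsed += dt
        return out
```

At the switch instant the decayed offset equals the full offset, so the output matches the previously playing pose exactly; as time passes the output converges to the newly selected clip.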

Skill List: Our motion library includes locomotion and a set of atomic parkour skills. Locomotion provides a shared transition manifold and includes standing, walking, and running motions spanning commanded speeds from 0.8 to 3.5 m/s. Most parkour skills are instantiated at 1.0 m/s and 2.0 m/s. We additionally include a single 3.0 m/s cat-vault skill to ...

Motion Tracking Details: Specific reward formulations and domain randomization settings used for expert policy learning from [20] are summarized in Table IV and Table V for reference.

Distillation Details: During student training, we relax the termination conditions relative to the expert to prevent premature termination of valid but mirrored executions. While this improves PPO stability, the student may visit states that are out-of-distribution for the expert policies, which were trained under the original termination thresholds and ma...

Training Hyperparameters: We include all hyperparameters for two-stage training in Table VI for reference.

C. Details for Baselines

Velocity Tracking Baseline: To show the importance of human reference motion in our framework, we include a standard reward-shaping velocity-tracking baseline that learns locomotion purely from handcrafted rewards and a terrain curriculum, without any motion imitation or human reference trajectories.

| Category | Skill | Duration (s) |
| --- | --- | --- |
| Locomotion | Locomotion | 495.5 |
| Parkour skills @ 1.0 m/s | Step (36 cm) | 2.2 |
| Parkour skills @ 1.0 m/s | Climb (58 cm) | 12.1 |
| Parkour skills @ 1.0 m/s | Climb (76 cm) | 8.8 |
| | Clim... | |

AMP Baseline: Since AMP [31] is a popular algorithm for chaining skills with human reference data, we also implemented an AMP baseline by following the MimicKit AMP implementation released by the original AMP authors. In our experiments, this baseline can walk stably and track the commanded velocity, but it does not perform well on obstacle traversal: i...
