
arxiv: 2606.08059 · v1 · pith:AMZGSIGXnew · submitted 2026-06-06 · 💻 cs.RO

Perceptive Behavior Foundation Model: Adapting Human Motion Priors to Robot-Centric Terrain

Pith reviewed 2026-06-27 19:47 UTC · model grok-4.3

classification 💻 cs.RO
keywords humanoid control · motion priors · terrain adaptation · foundation model · perceptive behavior · reference synthesis · teacher-student transfer · residual learning

The pith

Human motion priors are adapted to a robot's local terrain by synthesizing conformal references from raw clips and transferring them to a student policy via residual corrections.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to remove the assumption that human motion references are already compatible with the robot's surroundings. It does so by creating terrain-conformal references from locomotion clips and training a student policy that receives only the original raw references. Terrain features reach the policy only through residual pathways that start at zero and learn corrections only when needed. This keeps the motion-tracking prior intact while allowing adaptation of contacts, posture, and timing to the robot's actual ground.
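The "residual pathways that start at zero" idea can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the class name `ZeroInitResidual`, the single linear layer, and the toy numbers are hypothetical and are not taken from the paper, which describes an identity-gated Transformer. The sketch only shows the property the summary relies on: with zero-initialized weights the terrain branch leaves the base action untouched, and a correction appears only once training moves a weight away from zero.

```python
from typing import List


class ZeroInitResidual:
    """Hypothetical linear terrain branch whose weights start at zero."""

    def __init__(self, n_terrain: int, n_action: int) -> None:
        # Zero initialization: the branch contributes nothing before training.
        self.weights = [[0.0] * n_terrain for _ in range(n_action)]

    def correction(self, terrain: List[float]) -> List[float]:
        # One weighted sum of terrain features per action dimension.
        return [sum(w * t for w, t in zip(row, terrain)) for row in self.weights]


def act(base_action: List[float], terrain: List[float],
        residual: ZeroInitResidual) -> List[float]:
    """Base (motion-tracking) action plus the terrain-dependent correction."""
    return [a + d for a, d in zip(base_action, residual.correction(terrain))]


base = [0.2, -0.1]            # assumed output of a motion-tracking prior
scan = [0.05, 0.00, 0.10]     # assumed local terrain heights in metres
branch = ZeroInitResidual(n_terrain=3, n_action=2)

before = act(base, scan, branch)   # identical to the base action at init
branch.weights[0][2] = 1.0         # stand-in for a learned weight
after = act(base, scan, branch)    # only the first action dimension changes
```

In this toy setting the untrained policy reproduces the prior exactly, which is the sense in which such an initialization "preserves" it; whether the paper's gating behaves the same way in practice is something only its experiments can show.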

Core claim

Perceptive BFM grounds human motion priors in robot-centric perception while preserving raw kinematic motion references as the behavioral interface. TCRS converts locomotion-oriented human motion clips into terrain-consistent references through contact-aware foothold construction, foot-geometry-aware swing optimization, support-aware root reconstruction, collision repair, and multi-point inverse kinematics. A blind adapted-reference teacher is trained and its terrain-conformal behavior is transferred to a deployed raw-reference student through target-frame action alignment in an identity-gated Transformer tracker whose terrain features enter through residual pathways initialized to preserve the motion-tracking prior and trained to produce local corrections only when needed.

What carries the argument

terrain-conformal reference synthesis (TCRS), the pipeline that converts human motion clips into terrain-consistent references via contact-aware foothold construction, foot-geometry-aware swing optimization, support-aware root reconstruction, collision repair, and multi-point inverse kinematics; paired with residual pathways in an identity-gated Transformer tracker that add local terrain corrections only when required.
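To make "terrain-consistent reference" concrete, the sketch below shows the simplest possible adjustment: adding the terrain height under each sample to a flat-ground foot trajectory. This is not the TCRS pipeline. The Figure 3 caption lists direct terrain-height z-lifting as a comparison against the optimization TCRS uses. The function names, the step-shaped height field, and the sample trajectory are all hypothetical.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) foot position in metres


def z_lift(foot_traj: List[Point],
           height_at: Callable[[float, float], float]) -> List[Point]:
    """Add the terrain height under each sample to the flat-ground foot height."""
    return [(x, y, z + height_at(x, y)) for (x, y, z) in foot_traj]


def step_terrain(x: float, y: float) -> float:
    """Hypothetical height field: a 0.10 m step that starts at x = 0.5 m."""
    return 0.10 if x >= 0.5 else 0.0


# Assumed flat-ground swing: lift-off, two mid-swing samples, touchdown.
flat_swing = [(0.0, 0.0, 0.0), (0.25, 0.0, 0.08), (0.5, 0.0, 0.08), (0.75, 0.0, 0.0)]
lifted = z_lift(flat_swing, step_terrain)

# Largest height change between consecutive samples after lifting.
max_jump = max(abs(b[2] - a[2]) for a, b in zip(lifted, lifted[1:]))
```

In this toy case the touchdown lands on top of the step, but the foot height jumps by the full step height between two neighbouring samples at the edge. That kind of discontinuity is one plausible reason to prefer an explicit swing optimization over pointwise lifting.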

If this is right

  • Raw kinematic references remain usable as the behavioral interface even when human and robot environments differ.
  • Local terrain observations adapt contacts, posture, and timing without retraining the core motion prior.
  • Terrain features enter the policy only through residuals, so corrections occur only when the raw reference is incompatible.
  • Scalable terrain supervision is obtained from automated synthesis rather than hand-designed or terrain-specific motion data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The residual-pathway design could allow perception modules to be added to existing motion trackers without full retraining.
  • If TCRS generalizes beyond locomotion clips, the same separation might support non-walking behaviors such as manipulation or climbing.
  • The teacher-student split separates the problem of reference synthesis from the problem of learning terrain corrections, which could be tested independently.

Load-bearing premise

That TCRS can reliably turn human locomotion clips into terrain-consistent references without artifacts that break policy training or cause real-world instability.

What would settle it

Run the trained student policy on terrain where the TCRS pipeline produces incorrect footholds or swing trajectories and check whether tracking fails or the robot falls.

Figures

Figures reproduced from arXiv: 2606.08059 by Hao Xu, Junwei Liang, Qiang Zhang, Shuo Yang, Teli Ma, Yizhao Li, Yudong Fan, Zifan Wang.

Figure 1: Single-policy terrain grounding. A single Perceptive BFM tracks diverse flat-ground human-motion commands while adapting them to randomly placed robot-side terrains. Robot-centric perception adjusts footholds, swing clearance, posture, and contact timing online.
Figure 2: Perceptive BFM overview. TCRS synthesizes terrain-conformal references offline only; it is never queried at deployment. A blind teacher learns adapted-reference tracking on this supervision; the deployed identity-gated Transformer student receives the raw reference and a robot-centric terrain scan, and learns local residual corrections through target-frame action alignment. The deployment command remains t…
Figure 3: TCRS trajectory synthesis. The blue ghost is the raw reference placed on terrain; the opaque robot is the TCRS output. Foot traces compare the sampling-based (model predictive path integral, MPPI) foot-end optimization used in TCRS (yellow), Cubic Interp (blue), and direct terrain-height z-lifting (black).
[Plot text extracted alongside this caption: x-axis Iterations (10^3), y-axis Mean Reward; legend: PMT, PMT w/o distillation, Flat MLP, MLP-…]
Figure 4
Figure 5: Real-robot mocap mismatch. (a) Human mocap motion captured on flat ground; (b,c) the robot tracks the (a) command over robot-side terrain; (d) a separate walk-and-dance motion deployed in the wild.
Figure 6: Representative failure. The upper-body command is collision-unaware, so arms or torso can strike obstacles.
Assumptions. TCRS is a kinematic synthesizer: it builds contact-consistent, style-preserving references without solving contact-rich dynamics, and assumes a static, rigid, observable height field, so it does not model deformable, granular, or slippery media, and assumes the upper-body command stays …
Figure 7: Detailed PMT network architecture. (A) Inputs: policy observation $o_t$, 10-step proprioceptive history $h_{t-k:t}$, command window $c_{t-m:t}$, terrain map $(H_t, M_t)$, critic observation, and supervision targets. (B) PMT actor: a Transformer motion-tracking backbone with cross-attention encoders $E_h$, $E_c$ produces a motion intent $u_t$; the terrain perception branch (Map CNN $f_{cnn}$ followed by a query-conditioned MapTransformer…
Original abstract

Humanoid behavior foundation models aim to acquire reusable whole-body control policies from broad human motion priors, enabling a single controller to produce diverse and expressive behaviors. However, existing motion-centric foundation policies largely assume that the reference motion is already physically compatible with the robot's surroundings. This assumption breaks when the demonstrator, operator, and robot inhabit different environments: a human motion may specify the intended behavior, but not the footholds, clearance, body height, or contact timing required by the robot's local terrain. We introduce Perceptive Behavior Foundation Model (Perceptive BFM), a terrain-aware humanoid control framework that grounds human motion priors in robot-centric perception. The model preserves raw kinematic motion references as the behavioral interface, while using local terrain observations to adapt contacts, posture, and timing. To provide scalable terrain supervision, we develop terrain-conformal reference synthesis (TCRS), which converts locomotion-oriented human motion clips into terrain-consistent references through contact-aware foothold construction, foot-geometry-aware swing optimization, support-aware root reconstruction, collision repair, and multi-point inverse kinematics. We then train a blind adapted-reference teacher and transfer its terrain-conformal behavior to a deployed raw-reference student through target-frame action alignment. The student is an identity-gated Transformer tracker whose terrain features enter through residual pathways initialized to preserve the motion-tracking prior and trained to produce local corrections only when needed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces the Perceptive Behavior Foundation Model (Perceptive BFM), a terrain-aware humanoid control framework that grounds human motion priors in robot-centric perception. It preserves raw kinematic references as the behavioral interface and uses local terrain observations to adapt contacts, posture, and timing. The key technical contribution is terrain-conformal reference synthesis (TCRS), a five-stage pipeline (contact-aware foothold construction, foot-geometry-aware swing optimization, support-aware root reconstruction, collision repair, multi-point inverse kinematics) that converts locomotion-oriented human motion clips into terrain-consistent references. A blind adapted-reference teacher is trained and its behavior transferred to a deployed raw-reference student (an identity-gated Transformer tracker) via target-frame action alignment, with terrain features entering through residual pathways initialized to preserve the motion-tracking prior.

Significance. If the TCRS pipeline reliably produces artifact-free references and the teacher-student transfer succeeds, the work would enable scalable reuse of human motion data for expressive humanoid behaviors on varied real-world terrain without requiring terrain-compatible demonstrations, addressing a key limitation of existing motion-centric foundation policies.

major comments (2)
  1. [TCRS pipeline description] The TCRS description (abstract and method) claims the five-stage process produces terrain-consistent references faithful enough for stable teacher training and student transfer, but supplies no quantitative validation such as contact timing error, foot clearance statistics, kinematic deviation metrics, or success rates on downstream policy training. This is load-bearing for the central claim, as any systematic distortion in timing, posture, or clearance would undermine the raw-reference student's ability to recover intended behavior via residual corrections.
  2. [Experiments / Evaluation] No ablation studies or quantitative results are reported to isolate the contribution of the residual terrain pathways, the target-frame action alignment transfer, or the identity-gated Transformer architecture. Without these, it is not possible to verify whether the student produces local corrections only when needed or whether the framework outperforms baselines that assume terrain-compatible references.
minor comments (2)
  1. [Method] Clarify the exact definition of 'target-frame action alignment' and how it differs from standard imitation or distillation losses used in prior motion-tracking work.
  2. [Student architecture] The abstract states the student is 'trained to produce local corrections only when needed,' but the initialization and training details for the residual pathways should be expanded for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for quantitative validation of TCRS and ablations on the transfer components. We address each major comment below and will revise the manuscript accordingly to strengthen the central claims.

Point-by-point responses
  1. Referee: [TCRS pipeline description] The TCRS description (abstract and method) claims the five-stage process produces terrain-consistent references faithful enough for stable teacher training and student transfer, but supplies no quantitative validation such as contact timing error, foot clearance statistics, kinematic deviation metrics, or success rates on downstream policy training. This is load-bearing for the central claim, as any systematic distortion in timing, posture, or clearance would undermine the raw-reference student's ability to recover intended behavior via residual corrections.

    Authors: We agree that explicit quantitative validation of TCRS is essential to support the claim that the synthesized references are sufficiently faithful. The current manuscript emphasizes the pipeline design and qualitative demonstrations but does not report the requested metrics. In the revised version we will add a new evaluation subsection that computes and reports contact timing error (mean absolute deviation in stance/swing phases), foot clearance statistics (minimum and average clearance over swing trajectories), kinematic deviation metrics (joint angle and root position RMSE relative to original human references), and downstream success rates (percentage of stable teacher training episodes and student transfer success across terrain types). These will be evaluated on a held-out set of locomotion clips adapted to procedurally generated terrains. revision: yes

  2. Referee: [Experiments / Evaluation] No ablation studies or quantitative results are reported to isolate the contribution of the residual terrain pathways, the target-frame action alignment transfer, or the identity-gated Transformer architecture. Without these, it is not possible to verify whether the student produces local corrections only when needed or whether the framework outperforms baselines that assume terrain-compatible references.

    Authors: We concur that isolating the contributions of the residual terrain pathways, target-frame action alignment, and identity-gated Transformer is necessary to substantiate the design choices. The present manuscript presents the integrated framework and overall results but omits these controlled ablations. In revision we will expand the experiments section with quantitative ablations: (1) variants with/without residual pathways (measuring terrain adaptation error and tracking fidelity), (2) alternative transfer methods versus target-frame action alignment (reporting policy success rate and correction magnitude), and (3) comparisons against non-gated Transformer baselines. All ablations will include performance on both simulated and real-robot terrain tasks to demonstrate when and how local corrections are applied. revision: yes
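The TCRS metrics named in the first response (contact timing error as a mean absolute deviation, foot clearance statistics, and RMSE-style kinematic deviation) are straightforward to compute once events and trajectories are paired. The sketch below is one possible reading, not the authors' evaluation code: the function names, the pairing of contact events by index, and the definition of clearance as foot height minus terrain height are all assumptions.

```python
import math
from typing import Sequence, Tuple


def _check_paired(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def contact_timing_mae(ref_times: Sequence[float],
                       adapted_times: Sequence[float]) -> float:
    """Mean absolute difference between index-paired contact event times (s)."""
    _check_paired(ref_times, adapted_times)
    return sum(abs(r - a) for r, a in zip(ref_times, adapted_times)) / len(ref_times)


def clearance_stats(foot_heights: Sequence[float],
                    terrain_heights: Sequence[float]) -> Tuple[float, float]:
    """Minimum and mean foot-above-terrain clearance over swing samples (m)."""
    _check_paired(foot_heights, terrain_heights)
    clearances = [f - t for f, t in zip(foot_heights, terrain_heights)]
    return min(clearances), sum(clearances) / len(clearances)


def rmse(reference: Sequence[float], adapted: Sequence[float]) -> float:
    """Root-mean-square deviation between two equally sampled signals."""
    _check_paired(reference, adapted)
    squared = [(r - a) ** 2 for r, a in zip(reference, adapted)]
    return math.sqrt(sum(squared) / len(squared))


# Toy values, invented purely to exercise the functions.
timing_error = contact_timing_mae([0.0, 0.5, 1.0], [0.0, 0.6, 1.1])
min_clear, mean_clear = clearance_stats([0.10, 0.20], [0.00, 0.05])
root_rmse = rmse([0.0, 0.0], [3.0, 4.0])
```

A negative minimum clearance would indicate that the synthesized swing passes through the terrain, which is the kind of artifact the referee's first major comment asks the authors to rule out.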

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained method description

Full rationale

The provided abstract and framework description introduce Perceptive BFM and TCRS as a sequence of processing stages (contact-aware foothold construction, swing optimization, root reconstruction, collision repair, multi-point IK) followed by teacher-student transfer via target-frame action alignment. No equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations are present that would reduce any claimed output to its inputs by construction. The central claim rests on the described pipeline producing usable references, which is an empirical precondition rather than a circular reduction. This matches the default case of a non-circular proposal of a new control architecture.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Review is based solely on the abstract; the ledger reflects the high-level components described. TCRS is presented as a composite procedure whose internal optimization steps are likely to contain tunable weights or thresholds not enumerated here.

axioms (1)
  • [domain assumption] Human motion priors remain a valid behavioral interface even after terrain-induced modifications to contacts and timing.
    Invoked when the paper states that raw kinematic references are preserved as the behavioral interface while local terrain observations adapt the execution.
invented entities (1)
  • terrain-conformal reference (no independent evidence)
    purpose: Provides a terrain-consistent motion target derived from raw human clips for training the teacher policy.
    New construct introduced via the TCRS pipeline; no independent evidence outside the paper is supplied in the abstract.


