arxiv: 2606.28476 · v1 · pith:XV5NWH7Nnew · submitted 2026-06-26 · 💻 cs.RO

FADA: Few-Shot Domain Adaptation via Dynamics Alignment for Humanoid Control

Pith reviewed 2026-06-30 01:24 UTC · model grok-4.3

classification 💻 cs.RO
keywords few-shot adaptation · domain adaptation · humanoid control · inverse dynamics · robot learning · dynamics alignment · DAgger

The pith

FADA adapts humanoid controllers to target dynamics by finetuning only the inverse dynamics model on short target rollouts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FADA as a three-stage framework that first trains an oracle policy with privileged information and distills it into a Planner-IDM student using DAgger. At deployment, the planner stays fixed while the IDM is updated with standard supervised learning on roughly two minutes of target-domain data consisting of paired actions and observations. This approach addresses dynamics mismatch in humanoid control caused by changes in terrain, payload, or actuators without needing rewards or full policy retraining. A sympathetic reader would care because it offers a lightweight way to achieve high-precision whole-body control on physical robots in new environments.
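The distillation stage can be pictured with a toy DAgger loop. This is a minimal sketch under strong assumptions, not the paper's implementation: the 1-D state, the proportional oracle, the linear student, and every function name below are hypothetical. It shows only the defining DAgger pattern, in which the student drives the rollout while the oracle labels the states the student actually visits.

```python
def oracle_action(x, goal, k=0.8):
    """Hypothetical privileged teacher: proportional controller with gain k."""
    return k * (goal - x)


def student_action(w, x, goal):
    """Hypothetical linear student with a single weight w."""
    return w * (goal - x)


def fit(dataset):
    """Least-squares fit of w on aggregated (error, oracle_action) pairs."""
    num = sum(e * a for e, a in dataset)
    den = sum(e * e for e, _ in dataset)
    return num / den if den > 0 else 0.0


def dagger(goals, iterations=3, horizon=5):
    """Roll out the student, label visited states with the oracle, refit."""
    w, dataset = 0.0, []
    for _iteration in range(iterations):
        for goal in goals:
            x = 0.0
            for _step in range(horizon):
                # The oracle labels the state the student actually reached.
                dataset.append((goal - x, oracle_action(x, goal)))
                # The student's own action decides where the rollout goes next.
                x = x + student_action(w, x, goal)
        # Retrain on everything collected so far (dataset aggregation).
        w = fit(dataset)
    return w


w_student = dagger(goals=[1.0, -0.5, 2.0])
print(w_student)
```

Because the oracle here is itself linear, the student recovers its gain almost immediately; with a richer oracle the repeated aggregation rounds are what keep the student's training states matched to the states it visits.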

Core claim

FADA is a Planner-IDM framework for few-shot domain adaptation in humanoid control. It trains an oracle policy with privileged information and distills that behavior into a deployable Planner-IDM student through DAgger. At deployment it freezes the planner and finetunes only the IDM on approximately 2 minutes of target-domain rollouts, using standard supervised learning on the observed action-observation pairs to align the IDM with target dynamics.

What carries the argument

The Planner-IDM architecture: the planner predicts short-horizon future proprioception from the task command and observation history, the IDM maps that predicted future to actions, and adaptation is performed solely by updating the IDM via supervised learning on target rollouts.
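A toy version of this split may help make the adaptation step concrete. The sketch below is an illustration under stated assumptions, not the authors' code: the 1-D dynamics with an actuator gain, the halfway-to-goal planner, the single-weight IDM, and all names are hypothetical. It follows one plausible reading of the abstract, in which the IDM is regressed from the realized displacement to the action that was actually executed, while the planner is never touched.

```python
def planner(x, goal):
    """Frozen planner: propose the next desired state, halfway to the goal."""
    return x + 0.5 * (goal - x)


def idm(w, x, x_des):
    """Linear inverse dynamics model: action intended to move x to x_des."""
    return w * (x_des - x)


def step(x, a, gain):
    """Toy dynamics: the actuator gain differs between source and target."""
    return x + gain * a


def rollout(w, gain, goals, steps_per_goal=5):
    """Run planner and IDM in closed loop; log (x, a, x_next, x_des)."""
    x, log = 0.0, []
    for goal in goals:
        for _step in range(steps_per_goal):
            x_des = planner(x, goal)
            a = idm(w, x, x_des)
            x_next = step(x, a, gain)
            log.append((x, a, x_next, x_des))
            x = x_next
    return log


def finetune_idm(log):
    """Least-squares fit of w: realized displacement -> executed action."""
    num = sum(a * (x_next - x) for x, a, x_next, _ in log)
    den = sum((x_next - x) ** 2 for x, _, x_next, _ in log)
    return num / den


def tracking_error(log):
    """Mean absolute gap between planned and realized next state."""
    return sum(abs(x_des - x_next) for _, _, x_next, x_des in log) / len(log)


SOURCE_GAIN, TARGET_GAIN = 1.0, 0.5
w_source = 1.0 / SOURCE_GAIN  # IDM that is exact in the source domain
goals = [1.0, -1.0, 2.0]

target_log = rollout(w_source, TARGET_GAIN, goals)  # short target rollout
w_adapted = finetune_idm(target_log)                # planner stays frozen

err_before = tracking_error(target_log)
err_after = tracking_error(rollout(w_adapted, TARGET_GAIN, goals))
print(w_adapted, err_before, err_after)
```

No reward or expert demonstration appears anywhere in the fit: the rollout was generated by a mismatched IDM, yet the logged pairs still describe the target dynamics correctly, which is the property the adaptation step relies on.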

If this is right

  • FADA outperforms in-context and end-to-end adaptation baselines on task performance under dynamics shifts.
  • Real humanoid robots can perform diverse high-precision whole-body tasks after adaptation.
  • Adaptation uses only paired actions and observations from short rollouts without requiring optimal demonstrations or rewards.
  • The planner does not need updates, allowing modular adaptation focused on dynamics alignment.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This separation of planner and IDM could generalize to other control systems where dynamics vary but task planning remains stable.
  • Further work might test whether similar few-shot alignment holds for longer horizons or for tasks more complex than those evaluated.
  • In sim-to-real transfer, the method could reduce reliance on extensive domain randomization by enabling quick post-deployment correction.

Load-bearing premise

That supervised learning on the observed action-observation pairs collected during short target rollouts is sufficient to align the IDM to the new dynamics without optimal demonstrations, rewards, or updates to the planner.

What would settle it

The alignment claim would be falsified if, in a repeatable dynamics-shift scenario, the IDM adapted on the 2-minute rollouts still produced actions that fail to realize the planned motions when executed on the target robot.
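Such a check could be scripted roughly as follows. This is a hedged sketch: the RMSE metric, the tolerance, the function names, and the joint-angle traces are all hypothetical placeholders chosen for illustration, not values or procedures from the paper.

```python
import math


def tracking_rmse(planned, realized):
    """Root-mean-square gap between planned and realized observations."""
    if len(planned) != len(realized) or not planned:
        raise ValueError("trajectories must be non-empty and equally long")
    squared = [(p - r) ** 2 for p, r in zip(planned, realized)]
    return math.sqrt(sum(squared) / len(squared))


def alignment_holds(trials, tolerance):
    """True only if every repeated trial keeps the RMSE within tolerance."""
    return all(tracking_rmse(p, r) <= tolerance for p, r in trials)


# Hypothetical 1-D joint-angle traces (radians); not measured data.
trials_adapted = [
    ([0.00, 0.10, 0.20, 0.30], [0.00, 0.09, 0.21, 0.30]),
    ([0.00, 0.10, 0.20, 0.30], [0.01, 0.10, 0.19, 0.31]),
]
trials_zero_shot = [
    ([0.00, 0.10, 0.20, 0.30], [0.00, 0.05, 0.10, 0.15]),
]

print(alignment_holds(trials_adapted, tolerance=0.02))
print(alignment_holds(trials_zero_shot, tolerance=0.02))
```

Requiring every repeated trial to pass, rather than the average, matches the "repeatable" condition in the falsification statement: a single consistent failure under the same dynamics shift would count against the claim.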

Figures

Figures reproduced from arXiv: 2606.28476 by Alan Wang, Angchen Xie, Guanya Shi, Ishayu Shikhare, Max Simchowitz, Nikhil Sobanbabu.

Figure 1: FADA enables high-precision whole-body skills through dynamics alignment. Through few-shot adaptation, humanoid robots can stably execute diverse real-world tasks that fail under zero-shot transfer. (a) and (b) illustrate the adaptation effect: Only after adaptation is Unitree G1 able to precisely track a line on a slope, and Booster T1 able to pull a 6 kg laundry basket across the finish line. (c)–(f) sh…
Figure 2: Adaptation taxonomy. Existing approaches differ in whether they use target rollouts for model updates and which component they update. FADA updates the IDM with target-domain rollouts. Existing approaches for improving target-domain deployment broadly fall into two categories, as summarized in …
Figure 3: Overview of FADA. FADA first trains a privileged oracle policy in the source simulator, then distills it into a deployable Planner–IDM student through DAgger-style supervision. The planner predicts short-horizon future proprioception from the task command and observation history, while the IDM maps this future to actions. During target adaptation, FADA freezes the planner and finetunes only the IDM using …
Figure 4: Planner–IDM interface. The planner predicts proprioceptive intent, and the IDM maps intent and execution history to an action chunk. Given the source-domain data and few-shot target rollouts defined in Section 3, FADA is built on a simple observation: under target-domain dynamics shifts, the task intention often remains meaningful, but the action required to realize it can change substantially. For exam…
Figure 5: Baseline interfaces. We compare FADA with source-trained transformer DAgger, zero-shot co-prediction, and target-domain co-prediction finetuning. The comparison isolates whether target rollouts are most useful when they update the execution module rather than a monolithic student or a future-prediction objective. where zero-shot transfer is unreliable, (2) whether the framework improves transfer across emb…
Figure 6: Qualitative sim-to-real deployment. Zero-shot and IDM-adapted rollouts for (a) G1 Slope Traversal, (b) G1 Kungfu + Soft Terrain, (c) G1 Loco. + Payload (grocery carrying through poles), and (d) T1 Loco. + Payload. Adaptation improves execution-critical behavior, including foot placement, posture recovery, and payload compensation. adaptation policy FADA-zs, freeze the planner, finetune only the IDM, and re…
Figure 7: Attribution over predicted future steps for K …
Figure 8: Zero-shot transfer to MuJoCo on G1 whole-body tracking (n = 10). Our loss formulation in Section 4 has two design choices: (A) training the planner through the stop-gradient IDM via action-prediction loss (Eq. (4.3)); (B) supervising the IDM only on the executed first action (Eq. (4.2)). Few-shot adaptation requires the zero-shot policy to remain deployable long enough to collect target rollouts. We ther…
Figure 9: Target data-size ablation. We report $\bar{E}_v$ ↓ on T1 Loco. + Payload, normalized by the 100-step setting. The 6000-step budget used in the main experiments reaches the performance plateau, and larger budgets do not provide consistent gains. LoRA vs. full IDM finetuning
Figure 10: Arm-tracking task. A fixed-base arm tracks end-effector targets under wrist payloads. We evaluate three diagnostics before and after few-shot LoRA finetuning of the IDM, with the planner frozen: end-effector tracking error, planner prediction RMSE, and an IDM consistency gap. The consistency gap compares the IDM action produced using the planner-predicted next observation with the IDM action produced usin…
Figure 11: Planner IK map across payload conditions before finetuning.
Figure 12: Fixed-base arm tracking under payload variation. (a) …
Figure 13: Additional qualitative sim-to-real deployments.
Abstract

High-precision humanoid control is limited by target-domain dynamics mismatch, where the same control objective can induce different realized motions under changes in terrain, payload, or actuator response. Existing methods either pursue zero-shot transfer through domain randomization or in-context adaptation without target-domain specialization, or require heavy adaptation pipelines that leverage target-domain data, such as model calibration, residual learning, or policy retraining. In this paper, we present FADA (Few-Shot Domain Adaptation via Dynamics Alignment), a three-stage Planner-Inverse Dynamics Model (Planner-IDM) framework for few-shot adaptation in humanoid control. FADA first trains an oracle policy with privileged information and then distills the oracle behavior into a deployable Planner-IDM student through DAgger. At deployment, FADA freezes the planner and finetunes only the IDM using approximately 2 minutes of target-domain rollouts with standard supervised learning. Rather than requiring optimal demonstrations or rewards, FADA uses the paired actions and observations that are observed during these rollouts as supervision, aligning the IDM's action generation with target-domain dynamics. Experiments show that FADA outperforms both in-context and end-to-end adaptation baselines, improving task performance under dynamics shifts and enabling real humanoid robots to execute diverse high-precision whole-body tasks. Implementation details and qualitative hardware rollout videos are available at https://lecar-lab.github.io/FADA-humanoid/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript presents FADA, a three-stage Planner-IDM framework for few-shot domain adaptation in humanoid control. An oracle policy is first trained with privileged information and distilled into a deployable Planner-IDM student via DAgger. At deployment the planner is frozen and only the IDM is finetuned via standard supervised learning on paired actions and observations from approximately 2 minutes of target-domain rollouts. The paper claims this outperforms both in-context and end-to-end adaptation baselines, improves task performance under dynamics shifts, and enables real humanoid robots to execute diverse high-precision whole-body tasks.

Significance. If the central claim holds, the result would be significant for practical humanoid deployment: it demonstrates that a lightweight, reward-free adaptation step using only short uncurated rollouts can bridge dynamics mismatch while keeping the planner fixed. The real-robot experiments and the explicit separation of planner and IDM adaptation are concrete strengths that, if quantitatively supported, would distinguish the method from heavier residual-learning or full-policy-retraining pipelines.

major comments (1)
  1. [Deployment stage / abstract] Deployment stage (abstract and corresponding method section): the claim that supervised regression on observed (obs, action) pairs from ~2 min of target rollouts generated by the unadapted Planner-IDM suffices to produce an IDM compatible with the frozen source planner is load-bearing. The training actions are those emitted by the source IDM; the resulting (obs, action) distribution may therefore differ from the state-action distribution the planner will actually query once the adapted IDM is inserted, leaving residual dynamics error unaddressed. A direct test (e.g., comparison against rollouts collected with an oracle target IDM or closed-loop planner-IDM interaction) is needed to substantiate the assumption.
minor comments (2)
  1. [Abstract] Abstract: states that FADA "outperforms both in-context and end-to-end adaptation baselines" yet supplies no numerical metrics, baseline names, or ablation summary; adding at least one key quantitative result would strengthen the abstract.
  2. [Abstract] The manuscript provides a project page with implementation details and qualitative videos; this is helpful for reproducibility and should be retained.
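The distribution concern in the major comment can be made concrete with a coverage diagnostic on a toy system. This is only an illustrative sketch: the 1-D dynamics, the gain values, the halfway-to-goal planner, and all names are hypothetical assumptions rather than anything from the manuscript. It compares the IDM inputs seen during finetuning (realized displacements under the unadapted IDM) with the inputs the frozen planner requests once an adapted IDM is in the loop.

```python
def planner(x, goal):
    """Frozen toy planner: next desired state is halfway to the goal."""
    return x + 0.5 * (goal - x)


def rollout_inputs(w, gain, goals, steps_per_goal=5):
    """Closed-loop rollout returning (realized, planned) displacements.

    realized: x_next - x, the inputs that finetuning regresses on.
    planned:  x_des - x, the inputs the planner asks the IDM to realize.
    """
    x, realized, planned = 0.0, [], []
    for goal in goals:
        for _step in range(steps_per_goal):
            x_des = planner(x, goal)
            a = w * (x_des - x)      # linear IDM with weight w
            x_next = x + gain * a    # toy dynamics with actuator gain
            realized.append(x_next - x)
            planned.append(x_des - x)
            x = x_next
    return realized, planned


def fraction_outside(queries, train):
    """Share of queries that fall outside the range seen in training."""
    lo, hi = min(train), max(train)
    return sum(1 for q in queries if q < lo or q > hi) / len(queries)


goals = [1.0, -1.0, 2.0]
# Finetuning data: source IDM (w = 1.0) run on the target gain (0.5).
train_inputs, _ = rollout_inputs(w=1.0, gain=0.5, goals=goals)
# Deployment queries: planner requests once the IDM matches the target.
_, adapted_queries = rollout_inputs(w=2.0, gain=0.5, goals=goals)

uncovered = fraction_outside(adapted_queries, train_inputs)
print(uncovered)
```

In this toy case a linear IDM extrapolates without harm, so a nonzero uncovered fraction is not a failure by itself; for a neural IDM on a real robot, the same diagnostic would indicate how much of the deployed query distribution the 2-minute rollouts actually covered.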

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on the deployment stage of FADA. We address the concern regarding the distribution of training data for the IDM finetuning below and outline the planned revisions.

Point-by-point responses
  1. Referee: [Deployment stage / abstract] Deployment stage (abstract and corresponding method section): the claim that supervised regression on observed (obs, action) pairs from ~2 min of target rollouts generated by the unadapted Planner-IDM suffices to produce an IDM compatible with the frozen source planner is load-bearing. The training actions are those emitted by the source IDM; the resulting (obs, action) distribution may therefore differ from the state-action distribution the planner will actually query once the adapted IDM is inserted, leaving residual dynamics error unaddressed. A direct test (e.g., comparison against rollouts collected with an oracle target IDM or closed-loop planner-IDM interaction) is needed to substantiate the assumption.

    Authors: We appreciate this observation on the potential covariate shift in the IDM training distribution. The rollouts are generated in closed-loop by the planner commanding the source IDM in the target domain, so the observations are drawn from the target dynamics under the planner's state queries. The IDM is then trained to map these target observations to the actions that were executed, effectively learning an inverse dynamics model aligned to the target. While the adapted IDM could in principle alter the closed-loop trajectory distribution, the empirical evidence from both simulation and real-robot experiments shows substantial gains in task success rates, indicating practical compatibility. To further substantiate the assumption as suggested, we will add in the revised manuscript a simulated comparison of the adapted IDM against an oracle target IDM (trained with privileged target information) to measure any remaining dynamics error.

    revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external supervised learning on observed rollouts

full rationale

The paper's core pipeline (oracle policy training with privileged information, DAgger distillation to Planner-IDM, then freezing the planner and applying standard supervised regression to the IDM on ~2 minutes of target-domain (observation, action) pairs) contains no equations or claims that reduce a prediction to its own inputs by construction. The finetuning step uses externally observed data from rollouts as supervision rather than any self-referential fit or self-citation chain. This matches the default expectation of a non-circular empirical method paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities stated. Relies on standard assumptions of RL/imitation learning (e.g., that DAgger distillation preserves behavior and supervised IDM updates align dynamics).

