Any2Any: Efficient Cross-Embodiment Transfer for Humanoid Whole-Body Tracking

Feng Li; Hua Chen; Ming Yang; Tao Yu

arxiv: 2605.23733 · v1 · pith:PO4F2ZR5new · submitted 2026-05-22 · 💻 cs.RO · cs.AI


Ming Yang, Tao Yu, Feng Li, Hua Chen

Pith reviewed 2026-05-25 03:59 UTC · model grok-4.3

classification 💻 cs.RO cs.AI

keywords cross-embodiment transfer · whole-body tracking · humanoid control · policy transfer · kinematic alignment · parameter-efficient fine-tuning · robot imitation · embodiment adaptation


The pith

Pretrained whole-body tracking models transfer to new humanoids with 1 percent of the data and compute.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates whether whole-body tracking models for humanoids can move from one robot body to another without retraining everything. It introduces Any2Any, which aligns the kinematics of the source and target robots so the existing policy can generate actions for the target, then applies lightweight fine-tuning to selected modules to adjust for different dynamics. This lets the model achieve good tracking results while using far fewer resources than training from scratch. Readers might care because it lowers the barrier to putting advanced control on new humanoid platforms that would otherwise require massive datasets and compute.

Core claim

Any2Any first performs kinematic alignment between source and target humanoids to align their input and output spaces, enabling reuse of the pretrained source policy on the target embodiment. It then applies lightweight parameter-efficient fine-tuning components to selected dynamics-sensitive modules. This preserves behavioral priors while adapting to the target robot, and experiments demonstrate competitive tracking performance on multiple platforms using only 1% of the compute and data needed for full training from scratch.
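A minimal sketch of what aligning input and output spaces could look like, assuming the alignment reduces to a name-based joint correspondence. The joint names, the `TARGET_TO_SOURCE` table, and both helper functions are hypothetical illustrations; this summary does not describe the paper's actual alignment procedure.

```python
from typing import List

# Hypothetical joint orderings; real humanoids have many more joints.
SOURCE_JOINTS = ["hip_pitch", "knee", "ankle_pitch", "waist_yaw"]
TARGET_JOINTS = ["knee_joint", "hip_pitch_joint", "ankle_joint"]

# Assumed hand-written correspondence: target joint name -> source joint name.
TARGET_TO_SOURCE = {
    "knee_joint": "knee",
    "hip_pitch_joint": "hip_pitch",
    "ankle_joint": "ankle_pitch",
}


def target_obs_to_source(target_q: List[float], default: float = 0.0) -> List[float]:
    """Re-order target joint positions into the source policy's input layout."""
    by_source_name = {
        TARGET_TO_SOURCE[name]: q for name, q in zip(TARGET_JOINTS, target_q)
    }
    # Source joints with no target counterpart are filled with a default value.
    return [by_source_name.get(name, default) for name in SOURCE_JOINTS]


def source_action_to_target(source_action: List[float]) -> List[float]:
    """Select the source policy outputs that correspond to each target joint."""
    by_name = dict(zip(SOURCE_JOINTS, source_action))
    return [by_name[TARGET_TO_SOURCE[name]] for name in TARGET_JOINTS]


if __name__ == "__main__":
    print(target_obs_to_source([0.5, 0.1, -0.2]))
    print(source_action_to_target([1.0, 2.0, 3.0, 4.0]))
```

In this sketch the pretrained policy itself is untouched; only thin wrappers around its inputs and outputs change, which is one way to read "reuse of the pretrained source policy on the target embodiment".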

What carries the argument

Kinematic alignment of input and output spaces combined with parameter-efficient fine-tuning on dynamics-sensitive modules.
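The figure captions on this page name LoRA as one of the PEFT variants. The following is a rough pure-Python sketch of a LoRA-style linear layer, assuming the standard formulation y = W x + (alpha / r) B A x with W frozen. The class name `LoRALinear` and the deterministic initialization of `A` are illustrative assumptions (LoRA normally initializes `A` randomly), and which modules the paper treats as dynamics-sensitive is not specified here.

```python
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update B @ A (illustrative only)."""

    def __init__(self, weight: Matrix, rank: int, alpha: float = 1.0) -> None:
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # pretrained, kept frozen
        # Small deterministic values stand in for the usual random init of A.
        self.A = [[0.01 * (i + j + 1) for j in range(d_in)] for i in range(rank)]
        # B starts at zero, so the layer initially matches the pretrained one.
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def num_trainable(self) -> int:
        """Only A and B would receive gradients: rank * (d_in + d_out) values."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]], rank=1)
    print(layer.forward([1.0, 1.0, 1.0]), layer.num_trainable())
```

Because `B` starts at zero, the adapted layer reproduces the pretrained output before any update, which is consistent with the stated goal of preserving behavioral priors.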

If this is right

Pretrained models can be reused across different humanoid embodiments with minimal adaptation.
Training costs drop substantially compared to training from scratch.
Convergence speeds up while maintaining or improving tracking performance.
The method works on multiple humanoid platforms and various pretrained backbones.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar alignment techniques might allow quick adaptation of other robot skills like walking or grasping to new hardware designs.
If the method scales, companies could maintain a single high-quality model and deploy it to many robot variants at low cost.
Testing on robots with greater differences in size or joint configuration would reveal the limits of the kinematic step.

Load-bearing premise

Aligning the kinematics of the source and target robots makes the pretrained policy generate meaningful actions on the new robot even before dynamics are adjusted.

What would settle it

Running the source policy on the target robot after only kinematic alignment and observing whether it can track motions without falling or producing useless joint commands.
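One way such an alignment-only check could be scored is sketched below. Both metrics (mean absolute joint error and a base-height fall test) and the 0.4 m threshold are assumptions chosen for illustration, not the evaluation protocol used in the paper.

```python
from typing import List, Sequence


def mean_joint_error(
    reference: Sequence[Sequence[float]], actual: Sequence[Sequence[float]]
) -> float:
    """Mean absolute joint-position error over all frames and joints."""
    total, count = 0.0, 0
    for ref_frame, act_frame in zip(reference, actual):
        for r, a in zip(ref_frame, act_frame):
            total += abs(r - a)
            count += 1
    if count == 0:
        raise ValueError("empty rollout")
    return total / count


def survived(base_heights: List[float], min_height: float = 0.4) -> bool:
    """Treat the rollout as a fall if the base ever drops below min_height (assumed, in metres)."""
    return all(h >= min_height for h in base_heights)


if __name__ == "__main__":
    ref = [[0.0, 1.0], [0.5, 0.5]]
    act = [[0.1, 1.0], [0.5, 0.3]]
    print(mean_joint_error(ref, act), survived([0.9, 0.8, 0.3]))
```

Comparing these numbers for the aligned-but-unadapted policy against the post-PEFT policy would separate the contribution of alignment from that of fine-tuning.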

Figures

Figures reproduced from arXiv: 2605.23733 by Feng Li, Hua Chen, Ming Yang, Tao Yu.

**Figure 1.** Illustration of ANY2ANY. A pretrained whole-body tracker (WBT) learned on a specific humanoid can be efficiently transferred to another humanoid platform through the proposed ANY2ANY. For example, GEAR-SONIC [1], a large-scale pretrained WBT, can be adapted to a target robot LimX Oli using only a small fraction of the original training compute and data.

**Figure 2.** Architecture of ANY2ANY. The proposed framework adapts a pretrained whole-body tracker to arbitrary humanoid embodiments by combining Kinematic Alignment, which maps observations and actions across different robot morphologies, with Dynamics Adaptation, which efficiently fine-tunes lightweight modules to account for target-specific dynamics.

**Figure 3.** ANY2ANY transfer from Sonic to LimX humanoids, including SONIC2OLI and SONIC2LUNA. The curves compare ANY2ANY with the baseline, and the snapshots show stable rollout motions after adaptation.

**Figure 4.** ANY2ANY transfer from the Oli-pretrained WBT policy to three target humanoids: OLIWBT2LUNA, OLIWBT2G1, and OLIWBT2H1. ANY2ANY is compared with the baseline trained from scratch. (a) Tracking-error radar plots. (b) Training curves of normalized tracking reward. (c) Sim-to-sim rollouts on diverse motions.

**Figure 5.** Ablation of ANY2ANY architectural components on OLIWBT2LUNA. The top table compares aligned full fine-tuning and ANY2ANY with LoRA. (a) Kinematic alignment ablation. (b) PEFT method ablation under kinematic alignment. ANY2ANY-LoRA achieves comparable rewards to full fine-tuning while using fewer trainable parameters and lower training cost.

**Figure 6.** Ablation of LoRA injection scopes on OLIWBT2LUNA. The table summarizes the component-level injection locations across actor and critic modules, while the curves show the resulting joint tracking reward and mean episode reward.

**Figure 7.** Performance comparison under varying data scales. Left: quantitative tracking errors.

**Figure 8.** Comparison of reward curves under different GPU settings.

Original abstract

Whole-body tracking (WBT) models have become a key foundation for humanoid robots, enabling them to imitate diverse motions with high fidelity. Training such models from scratch requires large-scale data and computation, making rapid deployment on new humanoid platforms costly. This raises a natural question: Can pretrained WBT models transfer across embodiments with minimal adaptation? To answer this question, we propose Any2Any, a paradigm that efficiently transfers an existing WBT specialist to a new humanoid embodiment with only a small amount of data and compute. Any2Any first performs kinematic alignment between source and target humanoids, aligning their input and output spaces so that the pretrained source policy can be meaningfully reused on the target embodiment. Any2Any then performs dynamics adaptation by applying lightweight parameter-efficient fine-tuning (PEFT) components to selected dynamics-sensitive modules, preserving useful behavioral priors while enabling targeted adaptation to the target robot. Extensive experiments on multiple humanoid platforms and pretrained backbones show that Any2Any substantially accelerates convergence and reduces training cost compared with training from scratch, while achieving competitive or superior tracking performance. Notably, using only 1% of the compute and data required for full training, Any2Any successfully transfers Sonic models pre-trained on Unitree G1 to LimX Oli and LimX Luna. These results suggest that pretrained WBT specialists can be efficiently reused across embodiments, providing a scalable path toward deploying humanoid whole-body control on new robots.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes Any2Any, a cross-embodiment transfer method for humanoid whole-body tracking (WBT) models. It performs kinematic alignment of input/output spaces to enable reuse of a pretrained source policy (e.g., Sonic on Unitree G1), followed by targeted parameter-efficient fine-tuning (PEFT) on dynamics-sensitive modules for adaptation to target embodiments (LimX Oli, LimX Luna). The central empirical claim is that this achieves competitive or superior tracking performance using only 1% of the compute and data required for training from scratch.

Significance. If the transfer results hold with supporting ablations and metrics, the work would be significant for robotics by demonstrating a scalable, low-cost path to reuse WBT specialists across hardware platforms, reducing the data and compute barriers that currently limit rapid deployment of humanoid controllers.

major comments (2)

[Abstract] Abstract: the claim that kinematic alignment enables the pretrained source policy to be 'meaningfully reused' on the target embodiment before PEFT is load-bearing for the efficiency narrative (1% compute/data). No quantitative metrics, baseline tracking errors, or ablation isolating alignment-only performance versus post-PEFT performance are referenced, leaving open whether alignment alone yields stable/useful actions or whether gains derive primarily from the subsequent adaptation step.
[Experiments] Experiments (implied by abstract claims): the reported success transferring Sonic models to LimX Oli and Luna requires explicit reporting of failure modes, embodiment-specific dynamics mismatches (inertia, actuator response), and comparisons to training-from-scratch baselines with the same data budget to substantiate that the method 'substantially accelerates convergence'.

minor comments (1)

[Abstract] Abstract: include at least one key quantitative result (e.g., tracking error or success rate) to ground the 'competitive or superior' claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on clarifying the role of kinematic alignment and strengthening the experimental analysis. We address each major comment below.


Referee: [Abstract] Abstract: the claim that kinematic alignment enables the pretrained source policy to be 'meaningfully reused' on the target embodiment before PEFT is load-bearing for the efficiency narrative (1% compute/data). No quantitative metrics, baseline tracking errors, or ablation isolating alignment-only performance versus post-PEFT performance are referenced, leaving open whether alignment alone yields stable/useful actions or whether gains derive primarily from the subsequent adaptation step.

Authors: We agree that the abstract claim would be strengthened by quantitative support. In the revised manuscript we will add an ablation reporting tracking metrics (joint errors, success rates) for the kinematically aligned source policy on target embodiments before PEFT, with direct comparison to post-PEFT performance and from-scratch baselines. This will isolate the contribution of alignment. revision: yes
Referee: [Experiments] Experiments (implied by abstract claims): the reported success transferring Sonic models to LimX Oli and Luna requires explicit reporting of failure modes, embodiment-specific dynamics mismatches (inertia, actuator response), and comparisons to training-from-scratch baselines with the same data budget to substantiate that the method 'substantially accelerates convergence'.

Authors: We will expand the experiments section to report observed failure modes, discuss specific dynamics mismatches (inertia, actuator response) across G1/Oli/Luna, and include side-by-side convergence curves and final performance against from-scratch training restricted to the same 1% data/compute budget. These additions will directly address the acceleration claim. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical transfer method with no derivation chain


The paper presents Any2Any as an empirical procedure consisting of kinematic alignment of input/output spaces followed by lightweight PEFT on dynamics-sensitive modules. No equations, first-principles derivations, or predictions are claimed; performance claims rest on experimental comparisons (1% compute/data yielding competitive tracking on LimX platforms) rather than any quantity that reduces to its own fitted inputs or self-citations by construction. The method is framed as a practical reuse strategy whose validity is tested externally via ablation-style experiments, satisfying the criteria for a self-contained empirical result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are described in the abstract; the method relies on standard kinematic mapping and existing PEFT techniques.



Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 6 internal anchors

[1] Z. Luo, Y. Yuan, T. Wang, C. Li, S. Chen, F. Castaneda, Z.-A. Cao, J. Li, D. Minor, Q. Ben, et al. Sonic: Supersizing motion tracking for natural humanoid whole-body control. arXiv preprint arXiv:2511.07820, 2025.

[2] M. Pirotta, A. Tirinzoni, A. Touati, A. Lazaric, and Y. Ollivier. Fast imitation via behavior foundation models. In International Conference on Learning Representations, volume 2024, pages 12685–12724, 2024.

[3] M. Yuan, T. Yu, W. Ge, X. Yao, D. Li, H. Wang, J. Chen, B. Li, W. Zhang, W. Zeng, et al. A survey of behavior foundation model: Next-generation whole-body control system of humanoid robots. IEEE transactions on pattern analysis and machine intelligence, 2025.

[4] E. Cetin, A. Touati, and Y. Ollivier. Finer behavioral foundation models via auto-regressive features and advantage weighting. arXiv preprint arXiv:2412.04368, 2024.

[5] Y. Li, Z. Luo, T. Zhang, C. Dai, A. Kanervisto, A. Tirinzoni, H. Weng, K. Kitani, M. Guzek, A. Touati, et al. Bfm-zero: A promptable behavioral foundation model for humanoid control using unsupervised reinforcement learning. arXiv preprint arXiv:2511.04131, 2025.

[6] Z. Gu, J. Li, W. Shen, W. Yu, Z. Xie, S. McCrory, X. Cheng, A. Shamsah, R. Griffin, C. K. Liu, et al. Humanoid locomotion and manipulation: Current progress and challenges in control, planning, and learning. IEEE/ASME Transactions on Mechatronics, 31(2):2300–2330, 2026.

[7] X. Cheng, Y. Ji, J. Chen, R. Yang, G. Yang, and X. Wang. Expressive whole-body control for humanoid robots. arXiv preprint arXiv:2402.16796, 2024.

[8] T. He, Z. Luo, X. He, W. Xiao, C. Zhang, W. Zhang, K. Kitani, C. Liu, and G. Shi. Omnih2o: Universal and dexterous human-to-humanoid whole-body teleoperation and learning. arXiv preprint arXiv:2406.08858, 2024.

[9] Y. Ze, Z. Chen, J. P. Araújo, Z.-a. Cao, X. B. Peng, J. Wu, and C. K. Liu. Twist: Teleoperated whole-body imitation system. arXiv preprint arXiv:2505.02833, 2025.

[10] Z. Chen, M. Ji, X. Cheng, X. Peng, X. B. Peng, and X. Wang. Gmt: General motion tracking for humanoid whole-body control. arXiv preprint arXiv:2506.14770, 2025.

[11] A. Gupta, L. Fan, S. Ganguli, and L. Fei-Fei. Metamorph: Learning universal controllers with transformers. arXiv preprint arXiv:2203.11931, 2022.

[13] Y. Xue, Y. Lin, W. Dong, Y. Tang, J. Wang, J. Pang, M. Zhou, M. Liu, and W. Zhang. Scalable and general whole-body control for cross-humanoid locomotion. arXiv preprint arXiv:2602.05791, 2026.

[14] M. Liu, D. Pathak, and A. Agarwal. Locoformer: Generalist locomotion via long-context adaptation. In Proceedings of The 9th Conference on Robot Learning, 2025.

[15] S. Yang, Z. Fu, Z. Cao, J. Guo, P. Wensing, W. Zhang, and H. Chen. Multi-loco: Unifying multi-embodiment legged locomotion via reinforcement learning augmented diffusion. arXiv preprint arXiv:2506.11470, 2025.

[18] S. Bai, M. Li, X. Lv, J. Wang, X. Wang, F. Liao, C. Hou, L. Gu, W. Zhou, K. Wu, et al. Hex: Humanoid-aligned experts for cross-embodiment whole-body manipulation. arXiv preprint arXiv:2604.07993, 2026.

[19] D. Kim, J. Lee, J. Ahn, O. Campbell, H. Hwang, and L. Sentis. Computationally-robust and efficient prioritized whole-body controller with contact constraints. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1–8. IEEE, 2018. doi:10.1109/IROS.2018.8593767.

[20] M. Chignoli, D. Kim, E. Stanger-Jones, and S. Kim. The mit humanoid robot: Design, motion planning, and control for acrobatic behaviors. In 2020 IEEE-RAS 20th International Conference on Humanoid Robots (Humanoids), pages 1–8. IEEE, 2021. doi:10.1109/HUMANOIDS47582.2021.9555782.

[21] J. P. Araujo, Y. Ze, P. Xu, J. Wu, and C. K. Liu. Retargeting matters: General motion retargeting for humanoid motion tracking. arXiv preprint arXiv:2510.02252, 2025.

[22] W. Zeng, S. Lu, K. Yin, X. Niu, M. Dai, J. Wang, and J. Pang. Behavior foundation model for humanoid robots. arXiv preprint arXiv:2509.13780, 2025.

[23] T. Zhu, G. Cai, Y. Zhaohui, G. Ren, H. Xie, Z. Wang, J. Wu, J. Wang, X. Yang, Y. Mu, et al. Clot: Closed-loop global motion tracking for whole-body humanoid teleoperation. arXiv preprint arXiv:2602.15060, 2026.

[24] M. Chen, K. Wang, B. Zhang, X. Ma, Z. Yang, Y. Ren, Q. Huang, Z. Zhu, Y. Wang, and Z. Su. Holomotion-1 technical report. arXiv preprint arXiv:2605.15336, 2026.

[25] N. Ding, Y. Qin, G. Yang, F. Wei, Z. Yang, Y. Su, S. Hu, Y. Chen, C.-M. Chan, W. Chen, et al. Parameter-efficient fine-tuning of large-scale pre-trained language models. Nature Machine Intelligence, 5(3):220–235, 2023. doi:10.1038/s42256-023-00626-4. URL https://www.nature.com/articles/s42256-023-00626-4

[26] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen. Lora: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=nZeVKeeFYf9

[27] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. de Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly. Parameter-efficient transfer learning for nlp. In International Conference on Machine Learning, pages 2790–2799, 2019. URL https://arxiv.org/abs/1902.00751

[28] X. L. Li and P. Liang. Prefix-tuning: Optimizing continuous prompts for generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, pages 4582–4597, 2021. doi:10.18653/v1/2021.acl-long.353. URL https://aclanthology.org/2021.acl-long.353/

[29] X. B. Peng, P. Abbeel, S. Levine, and M. van de Panne. Deepmimic: Example-guided deep reinforcement learning of physics-based character skills. ACM Transactions on Graphics, 37(4):1–14, 2018. doi:10.1145/3197517.3201311. URL https://arxiv.org/abs/1804.02717

[30] Z. Luo, J. Cao, A. Winkler, K. Kitani, and W. Xu. Perpetual humanoid control for real-time simulated avatars. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023. URL https://openaccess.thecvf.com/content/ICCV2023/html/Luo_Perpetual_Humanoid_Control_for_Real-time_Simulated_Avatars_ICCV_2023_paper.html

[31] Y. Li, Y. Lin, J. Cui, T. Liu, W. Liang, Y. Zhu, and S. Huang. Clone: Closed-loop whole-body humanoid teleoperation for long-horizon tasks. In Proceedings of The 9th Conference on Robot Learning, 2025. URL https://arxiv.org/abs/2506.08931

[32] Y. Pan, R. Qiao, L. Chen, K. Chitta, L. Pan, H. Mai, Q. Bu, H. Zhao, C. Zheng, P. Luo, et al. Agility meets stability: Versatile humanoid control with heterogeneous data. arXiv preprint arXiv:2511.17373, 2025.

[33] Z. Sun, B.-S. Huang, Y. Peng, X. Li, J. Ma, Y. Sun, Z. Li, H. Jiang, B. Gao, Z. Bing, et al. Mosaic: Bridging the sim-to-real gap in generalist humanoid motion tracking and teleoperation with rapid residual adaptation. arXiv preprint arXiv:2602.08594, 2026.

[34] Y. Wang, S. Zhu, P. Zhi, Y. Li, J. Li, Y.-L. Li, Y. Xiao, X. Wang, B. Jia, and S. Huang. Omnixtreme: Breaking the generality barrier in high-dynamic humanoid control. arXiv preprint arXiv:2602.23843, 2026.

[35] Y. Lin, M. Liu, Y. Xue, M. Zhou, Y. Yu, J. Pang, and W. Zhang. H-zero: Cross-humanoid locomotion pretraining enables few-shot novel embodiment transfer. arXiv preprint arXiv:2512.00971, 2025. URL https://arxiv.org/abs/2512.00971

[36] Open X-Embodiment Collaboration, A. O’Neill, A. Rehman, A. Gupta, A. Maddukuri, A. Gupta, A. Padalkar, A. Lee, A. Pooley, A. Gupta, et al. Open x-embodiment: Robotic learning datasets and rt-x models. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 6892–6903, 2024. doi:10.1109/ICRA57147.2024.10611477. URL https...

[37] Octo Model Team, D. Ghosh, H. Walke, K. Pertsch, K. Black, O. Mees, S. Dasari, J. Hejna, T. Kreiman, C. Xu, J. Luo, Y. L. Tan, L. Y. Chen, P. Sanketi, Q. Vuong, T. Xiao, D. Sadigh, C. Finn, and S. Levine. Octo: An open-source generalist robot policy. In Proceedings of Robotics: Science and Systems (RSS), 2024. doi:10.15607/RSS.2024.XX.090. URL https://ar...

[38] J. Bjorck, F. Castaneda, N. Cherniadev, X. Da, R. Ding, L. Fan, Y. Fang, D. Fox, F. Hu, S. Huang, et al. GR00T N1: An open foundation model for generalist humanoid robots. arXiv preprint arXiv:2503.14734, 2025. URL https://arxiv.org/abs/2503.14734

[39] H. Luo, Y. Wang, W. Zhang, S. Zheng, Z. Xi, C. Xu, H. Xu, H. Yuan, C. Zhang, Y. Wang, Y. Feng, and Z. Lu. Being-H0.5: Scaling human-centric robot learning for cross-embodiment generalization. arXiv preprint arXiv:2601.12993, 2026. URL https://arxiv.org/abs/2601.12993

[40] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4171–4186. Association for Computational Linguistics, 2019. doi: 10.1...

[41] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. de Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly. Parameter-efficient transfer learning for nlp. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 2790–2799. PMLR, 2019. URL https://proceedings.mlr.press/v97/h...

[42] M. J. Kim, C. Finn, and P. Liang. Fine-tuning vision-language-action models: Optimizing speed and success, 2025.

[43] Y. Wang, P. Ding, L. Li, C. Cui, Z. Ge, X. Tong, W. Song, H. Zhao, W. Zhao, P. Hou, S. Huang, Y. Tang, W. Wang, R. Zhang, J. Liu, and D. Wang. Vla-adapter: An effective paradigm for tiny-scale vision-language-action model, 2025.

[44] LimX Dynamics. LimX Oli: Full-Size General-Purpose Humanoid Robot. https://www.limxdynamics.com/en/products/oli, 2025. Accessed: 2026-05-22.

[45] LimX Dynamics. LimX Luna Humanoid Robot. https://x.com/LimX_Dynamics, 2026. Official product page not yet publicly available at the time of access; accessed: 2026-05-22.

[46] Unitree Robotics. Unitree G1 Humanoid Robot. https://www.unitree.com/g1, 2024. Accessed: 2026-05-22.

[47] Unitree Robotics. Unitree H1 Universal Humanoid Robot. https://www.unitree.com/h1. Accessed: 2026-05-22.

[49] N. Mahmood, N. Ghorbani, N. F. Troje, G. Pons-Moll, and M. J. Black. Amass: Archive of motion capture as surface shapes. In Proceedings of the IEEE/CVF international conference on computer vision, pages 5442–5451, 2019.

[1] [1]

Z. Luo, Y . Yuan, T. Wang, C. Li, S. Chen, F. Castaneda, Z.-A. Cao, J. Li, D. Minor, Q. Ben, et al. Sonic: Supersizing motion tracking for natural humanoid whole-body control.arXiv preprint arXiv:2511.07820, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[2] [2]

Pirotta, A

M. Pirotta, A. Tirinzoni, A. Touati, A. Lazaric, and Y . Ollivier. Fast imitation via behavior foundation models. InInternational Conference on Learning Representations, volume 2024, pages 12685–12724, 2024

work page 2024

[3] [3]

M. Yuan, T. Yu, W. Ge, X. Yao, D. Li, H. Wang, J. Chen, B. Li, W. Zhang, W. Zeng, et al. A sur- vey of behavior foundation model: Next-generation whole-body control system of humanoid robots.IEEE transactions on pattern analysis and machine intelligence, 2025

work page 2025

[4] [4]

Cetin, A

E. Cetin, A. Touati, and Y . Ollivier. Finer behavioral foundation models via auto-regressive features and advantage weighting.arXiv preprint arXiv:2412.04368, 2024

work page arXiv 2024

[5] [5]

Y . Li, Z. Luo, T. Zhang, C. Dai, A. Kanervisto, A. Tirinzoni, H. Weng, K. Kitani, M. Guzek, A. Touati, et al. Bfm-zero: A promptable behavioral foundation model for humanoid control using unsupervised reinforcement learning.arXiv preprint arXiv:2511.04131, 2025

work page arXiv 2025

[6] [6]

Z. Gu, J. Li, W. Shen, W. Yu, Z. Xie, S. McCrory, X. Cheng, A. Shamsah, R. Griffin, C. K. Liu, et al. Humanoid locomotion and manipulation: Current progress and challenges in control, planning, and learning.IEEE/ASME Transactions on Mechatronics, 31(2):2300–2330, 2026

work page 2026

[7] [7]

Cheng, Y

X. Cheng, Y . Ji, J. Chen, R. Yang, G. Yang, and X. Wang. Expressive whole-body control for humanoid robots.arXiv preprint arXiv:2402.16796, 2024

work page arXiv 2024

[8] [8]

T. He, Z. Luo, X. He, W. Xiao, C. Zhang, W. Zhang, K. Kitani, C. Liu, and G. Shi. Omnih2o: Universal and dexterous human-to-humanoid whole-body teleoperation and learning.arXiv preprint arXiv:2406.08858, 2024

work page arXiv 2024

[9] [9]

Y . Ze, Z. Chen, J. P. Ara´ujo, Z.-a. Cao, X. B. Peng, J. Wu, and C. K. Liu. Twist: Teleoperated whole-body imitation system.arXiv preprint arXiv:2505.02833, 2025

work page arXiv 2025

[10] [10]

Z. Chen, M. Ji, X. Cheng, X. Peng, X. B. Peng, and X. Wang. Gmt: General motion tracking for humanoid whole-body control.arXiv preprint arXiv:2506.14770, 2025

work page arXiv 2025

[11] [11]

Gupta, L

A. Gupta, L. Fan, S. Ganguli, and L. Fei-Fei. Metamorph: Learning universal controllers with transformers.arXiv preprint arXiv:2203.11931, 2022

work page arXiv 2022

[12] [13]

Y . Xue, Y . Lin, W. Dong, Y . Tang, J. Wang, J. Pang, M. Zhou, M. Liu, and W. Zhang. Scalable and general whole-body control for cross-humanoid locomotion.arXiv preprint arXiv:2602.05791, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[13] [14]

M. Liu, D. Pathak, and A. Agarwal. Locoformer: Generalist locomotion via long-context adaptation. InProceedings of The 9th Conference on Robot Learning, 2025

work page 2025

[14] [15]

S. Yang, Z. Fu, Z. Cao, J. Guo, P. Wensing, W. Zhang, and H. Chen. Multi-loco: Unifying multi-embodiment legged locomotion via reinforcement learning augmented diffusion.arXiv preprint arXiv:2506.11470, 2025

work page arXiv 2025

[15] [18]

S. Bai, M. Li, X. Lv, J. Wang, X. Wang, F. Liao, C. Hou, L. Gu, W. Zhou, K. Wu, et al. Hex: Humanoid-aligned experts for cross-embodiment whole-body manipulation.arXiv preprint arXiv:2604.07993, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[16] [19]

D. Kim, J. Lee, J. Ahn, O. Campbell, H. Hwang, and L. Sentis. Computationally-robust and efficient prioritized whole-body controller with contact constraints. In2018 IEEE/RSJ In- ternational Conference on Intelligent Robots and Systems (IROS), pages 1–8. IEEE, 2018. doi:10.1109/IROS.2018.8593767

work page doi:10.1109/iros.2018.8593767 2018

[17] [20]

Chignoli, D

M. Chignoli, D. Kim, E. Stanger-Jones, and S. Kim. The mit humanoid robot: De- sign, motion planning, and control for acrobatic behaviors. In2020 IEEE-RAS 20th In- ternational Conference on Humanoid Robots (Humanoids), pages 1–8. IEEE, 2021. doi: 10.1109/HUMANOIDS47582.2021.9555782

work page doi:10.1109/humanoids47582.2021.9555782 2021

[18] [21]

J. P. Araujo, Y . Ze, P. Xu, J. Wu, and C. K. Liu. Retargeting matters: General motion retargeting for humanoid motion tracking.arXiv preprint arXiv:2510.02252, 2025

work page arXiv 2025

[19] [22]

W. Zeng, S. Lu, K. Yin, X. Niu, M. Dai, J. Wang, and J. Pang. Behavior foundation model for humanoid robots.arXiv preprint arXiv:2509.13780, 2025

work page arXiv 2025

[20] [23]

T. Zhu, G. Cai, Y . Zhaohui, G. Ren, H. Xie, Z. Wang, J. Wu, J. Wang, X. Yang, Y . Mu, et al. Clot: Closed-loop global motion tracking for whole-body humanoid teleoperation.arXiv preprint arXiv:2602.15060, 2026

work page arXiv 2026

[21] [24]

M. Chen, K. Wang, B. Zhang, X. Ma, Z. Yang, Y . Ren, Q. Huang, Z. Zhu, Y . Wang, and Z. Su. Holomotion-1 technical report.arXiv preprint arXiv:2605.15336, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[22] [25]

N. Ding, Y . Qin, G. Yang, F. Wei, Z. Yang, Y . Su, S. Hu, Y . Chen, C.-M. Chan, W. Chen, et al. Parameter-efficient fine-tuning of large-scale pre-trained language models.Nature Machine Intelligence, 5(3):220–235, 2023. doi:10.1038/s42256-023-00626-4. URLhttps://www. nature.com/articles/s42256-023-00626-4

work page doi:10.1038/s42256-023-00626-4 2023

[23] [26]

E. J. Hu, Y . Shen, P. Wallis, Z. Allen-Zhu, Y . Li, S. Wang, L. Wang, and W. Chen. Lora: Low-rank adaptation of large language models. InInternational Conference on Learning Rep- resentations, 2022. URLhttps://openreview.net/forum?id=nZeVKeeFYf9

work page 2022

[24] [27]

Houlsby, A

N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. de Laroussilhe, A. Gesmundo, M. At- tariyan, and S. Gelly. Parameter-efficient transfer learning for nlp. InInternational Confer- ence on Machine Learning, pages 2790–2799, 2019. URLhttps://arxiv.org/abs/1902. 00751

work page 2019

[25] [28]

X. L. Li and P. Liang. Prefix-tuning: Optimizing continuous prompts for generation. InPro- ceedings of the 59th Annual Meeting of the Association for Computational Linguistics, pages 4582–4597, 2021. doi:10.18653/v1/2021.acl-long.353. URLhttps://aclanthology.org/ 2021.acl-long.353/

[26] [29]

X. B. Peng, P. Abbeel, S. Levine, and M. van de Panne. Deepmimic: Example-guided deep reinforcement learning of physics-based character skills. ACM Transactions on Graphics, 37(4):1–14, 2018. doi:10.1145/3197517.3201311. URL https://arxiv.org/abs/1804.02717

[27] [30]

Z. Luo, J. Cao, A. Winkler, K. Kitani, and W. Xu. Perpetual humanoid control for real-time simulated avatars. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023. URL https://openaccess.thecvf.com/content/ICCV2023/html/Luo_Perpetual_Humanoid_Control_for_Real-time_Simulated_Avatars_ICCV_2023_paper.html

[28] [31]

Y. Li, Y. Lin, J. Cui, T. Liu, W. Liang, Y. Zhu, and S. Huang. Clone: Closed-loop whole-body humanoid teleoperation for long-horizon tasks. In Proceedings of The 9th Conference on Robot Learning, 2025. URL https://arxiv.org/abs/2506.08931

[29] [32]

Y. Pan, R. Qiao, L. Chen, K. Chitta, L. Pan, H. Mai, Q. Bu, H. Zhao, C. Zheng, P. Luo, et al. Agility meets stability: Versatile humanoid control with heterogeneous data. arXiv preprint arXiv:2511.17373, 2025.

[30] [33]

Z. Sun, B.-S. Huang, Y. Peng, X. Li, J. Ma, Y. Sun, Z. Li, H. Jiang, B. Gao, Z. Bing, et al. Mosaic: Bridging the sim-to-real gap in generalist humanoid motion tracking and teleoperation with rapid residual adaptation. arXiv preprint arXiv:2602.08594, 2026.

[31] [34]

Y. Wang, S. Zhu, P. Zhi, Y. Li, J. Li, Y.-L. Li, Y. Xiao, X. Wang, B. Jia, and S. Huang. Omnixtreme: Breaking the generality barrier in high-dynamic humanoid control. arXiv preprint arXiv:2602.23843, 2026.

[32] [35]

Y. Lin, M. Liu, Y. Xue, M. Zhou, Y. Yu, J. Pang, and W. Zhang. H-zero: Cross-humanoid locomotion pretraining enables few-shot novel embodiment transfer. arXiv preprint arXiv:2512.00971, 2025. URL https://arxiv.org/abs/2512.00971

[33] [36]

Open X-Embodiment Collaboration, A. O’Neill, A. Rehman, A. Gupta, A. Maddukuri, A. Gupta, A. Padalkar, A. Lee, A. Pooley, A. Gupta, et al. Open x-embodiment: Robotic learning datasets and rt-x models. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 6892–6903, 2024. doi:10.1109/ICRA57147.2024.10611477. URL https...

[34] [37]

Octo Model Team, D. Ghosh, H. Walke, K. Pertsch, K. Black, O. Mees, S. Dasari, J. Hejna, T. Kreiman, C. Xu, J. Luo, Y. L. Tan, L. Y. Chen, P. Sanketi, Q. Vuong, T. Xiao, D. Sadigh, C. Finn, and S. Levine. Octo: An open-source generalist robot policy. In Proceedings of Robotics: Science and Systems (RSS), 2024. doi:10.15607/RSS.2024.XX.090. URL https://ar...

[35] [38]

J. Bjorck, F. Castaneda, N. Cherniadev, X. Da, R. Ding, L. Fan, Y. Fang, D. Fox, F. Hu, S. Huang, et al. GR00T N1: An open foundation model for generalist humanoid robots. arXiv preprint arXiv:2503.14734, 2025. URL https://arxiv.org/abs/2503.14734

[36] [39]

H. Luo, Y. Wang, W. Zhang, S. Zheng, Z. Xi, C. Xu, H. Xu, H. Yuan, C. Zhang, Y. Wang, Y. Feng, and Z. Lu. Being-H0.5: Scaling human-centric robot learning for cross-embodiment generalization. arXiv preprint arXiv:2601.12993, 2026. URL https://arxiv.org/abs/2601.12993

[37] [40]

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4171–4186. Association for Computational Linguistics, 2019. doi: 10.1...

[38] [41]

N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. de Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly. Parameter-efficient transfer learning for nlp. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 2790–2799. PMLR, 2019. URL https://proceedings.mlr.press/v97/h...

[39] [42]

M. J. Kim, C. Finn, and P. Liang. Fine-tuning vision-language-action models: Optimizing speed and success, 2025.

[40] [43]

Y. Wang, P. Ding, L. Li, C. Cui, Z. Ge, X. Tong, W. Song, H. Zhao, W. Zhao, P. Hou, S. Huang, Y. Tang, W. Wang, R. Zhang, J. Liu, and D. Wang. Vla-adapter: An effective paradigm for tiny-scale vision-language-action model, 2025.

[41] [44]

LimX Dynamics. LimX Oli: Full-Size General-Purpose Humanoid Robot. https://www.limxdynamics.com/en/products/oli, 2025. Accessed: 2026-05-22.

[42] [45]

LimX Dynamics. LimX Luna Humanoid Robot. https://x.com/LimX_Dynamics, 2026. Official product page not yet publicly available at the time of access; accessed: 2026-05-22.

[43] [46]

Unitree Robotics. Unitree G1 Humanoid Robot. https://www.unitree.com/g1, 2024. Accessed: 2026-05-22.

[44] [47]

Unitree Robotics. Unitree H1 Universal Humanoid Robot. https://www.unitree.com/h1,

[45] [48]

Accessed: 2026-05-22

[46] [49]

N. Mahmood, N. Ghorbani, N. F. Troje, G. Pons-Moll, and M. J. Black. Amass: Archive of motion capture as surface shapes. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5442–5451, 2019.
