Any2Any: Efficient Cross-Embodiment Transfer for Humanoid Whole-Body Tracking
Pith reviewed 2026-05-25 03:59 UTC · model grok-4.3
The pith
Pretrained whole-body tracking models transfer to new humanoids with 1 percent of the data and compute.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Any2Any first performs kinematic alignment between source and target humanoids to align their input and output spaces, enabling reuse of the pretrained source policy on the target embodiment. It then applies lightweight parameter-efficient fine-tuning components to selected dynamics-sensitive modules. This preserves behavioral priors while adapting to the target robot, and experiments demonstrate competitive tracking performance on multiple platforms using only 1% of the compute and data needed for full training from scratch.
What carries the argument
Kinematic alignment of input and output spaces combined with parameter-efficient fine-tuning on dynamics-sensitive modules.
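To make the second ingredient concrete, here is a minimal sketch of one common PEFT component, a LoRA-style low-rank adapter on a single linear layer. The abstract does not say which PEFT components Any2Any uses or which modules count as dynamics-sensitive, so the choice of LoRA, the class name `LoRALinear`, the helper `matvec`, and the parameters `rank` and `alpha` are all assumptions made for illustration.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class LoRALinear:
    """Frozen weight W (out x in) plus a trainable low-rank update B @ A.

    Illustrative only: the paper's actual PEFT components are not specified
    in the abstract.
    """

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(W), len(W[0])
        self.W = W  # pretrained weight, kept frozen
        # A starts small and random, B starts at zero, so B @ A is zero at first.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)                    # pretrained behavior
        delta = matvec(self.B, matvec(self.A, x))   # low-rank correction
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        # Only A and B would be updated during fine-tuning.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Hypothetical 3x2 pretrained weight, adapted with rank 1.
layer = LoRALinear([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], rank=1)
```

Because `B` starts at zero, the adapted layer reproduces the frozen layer exactly before any fine-tuning step, which is one way to read the claim that behavioral priors are preserved. For a layer of shape out x in, the adapter trains rank x (in + out) values instead of in x out.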
If this is right
- Pretrained models can be reused across different humanoid embodiments with minimal adaptation.
- Training costs drop substantially compared to training from scratch.
- Convergence speeds up while tracking performance is maintained or improved.
- The method works on multiple humanoid platforms and various pretrained backbones.
Where Pith is reading between the lines
- Similar alignment techniques might allow quick adaptation of other robot skills like walking or grasping to new hardware designs.
- If the method scales, companies could maintain a single high-quality model and deploy it to many robot variants at low cost.
- Testing on robots with greater differences in size or joint configuration would reveal the limits of the kinematic step.
Load-bearing premise
Aligning the kinematics of the source and target robots makes the pretrained policy generate meaningful actions on the new robot even before dynamics are adjusted.
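One simple reading of this premise is a joint-level remapping around the frozen source policy. The sketch below assumes alignment can be expressed as reordering joints by name, which the abstract does not confirm; the joint names, the orderings, and the functions `target_obs_to_source`, `source_action_to_target`, and `aligned_policy` are hypothetical. It also assumes every target joint has a counterpart on the source robot.

```python
# Hypothetical joint orderings; real robots expose different names and counts.
SOURCE_JOINTS = ["hip_pitch", "knee", "ankle_pitch", "waist_yaw", "shoulder_pitch"]
TARGET_JOINTS = ["knee", "hip_pitch", "shoulder_pitch", "ankle_pitch"]


def target_obs_to_source(target_obs, default=0.0):
    """Reorder target joint readings into the source policy's input order.

    Joints the target robot lacks are filled with a default value.
    """
    by_name = dict(zip(TARGET_JOINTS, target_obs))
    return [by_name.get(name, default) for name in SOURCE_JOINTS]


def source_action_to_target(source_action):
    """Select and reorder source policy outputs for the target's actuators.

    Outputs for joints the target lacks are dropped.
    """
    by_name = dict(zip(SOURCE_JOINTS, source_action))
    return [by_name[name] for name in TARGET_JOINTS]


def aligned_policy(source_policy, target_obs):
    """Run an unmodified source policy on the target robot's joint layout."""
    source_obs = target_obs_to_source(target_obs)
    return source_action_to_target(source_policy(source_obs))
```

A remapping like this fixes only the interface. Differences in link lengths, inertia, and actuator response remain, which is why the premise that the remapped policy already produces meaningful actions needs its own evidence.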
What would settle it
Running the source policy on the target robot after only kinematic alignment and observing whether it can track motions without falling or producing useless joint commands.
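Such a test could be scored with two simple numbers per rollout: a joint tracking error and a fall check. The sketch below is an assumed evaluation harness, not the paper's protocol; the episode fields `reference`, `actual`, and `root_heights`, the function names, and the 0.5 m height threshold are placeholders.

```python
def mean_joint_error(reference, actual):
    """Mean absolute joint-position error over all frames and joints."""
    total, count = 0.0, 0
    for ref_frame, act_frame in zip(reference, actual):
        for r, a in zip(ref_frame, act_frame):
            total += abs(r - a)
            count += 1
    return total / count


def episode_succeeded(root_heights, min_height):
    """Count an episode as a success if the root never drops below min_height."""
    return all(h >= min_height for h in root_heights)


def summarize(episodes, min_height=0.5):
    """Aggregate tracking error and success rate over a list of rollouts.

    min_height is a placeholder fall threshold in meters.
    """
    errors = [mean_joint_error(e["reference"], e["actual"]) for e in episodes]
    successes = [episode_succeeded(e["root_heights"], min_height) for e in episodes]
    return {
        "mean_joint_error": sum(errors) / len(errors),
        "success_rate": sum(successes) / len(successes),
    }
```

Running `summarize` once on alignment-only rollouts and once on post-PEFT rollouts would give the side-by-side comparison this test calls for.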
Original abstract
Whole-body tracking (WBT) models have become a key foundation for humanoid robots, enabling them to imitate diverse motions with high fidelity. Training such models from scratch requires large-scale data and computation, making rapid deployment on new humanoid platforms costly. This raises a natural question: Can pretrained WBT models transfer across embodiments with minimal adaptation? To answer this question, we propose Any2Any, a paradigm that efficiently transfers an existing WBT specialist to a new humanoid embodiment with only a small amount of data and compute. Any2Any first performs kinematic alignment between source and target humanoids, aligning their input and output spaces so that the pretrained source policy can be meaningfully reused on the target embodiment. Any2Any then performs dynamics adaptation by applying lightweight parameter-efficient fine-tuning (PEFT) components to selected dynamics-sensitive modules, preserving useful behavioral priors while enabling targeted adaptation to the target robot. Extensive experiments on multiple humanoid platforms and pretrained backbones show that Any2Any substantially accelerates convergence and reduces training cost compared with training from scratch, while achieving competitive or superior tracking performance. Notably, using only 1% of the compute and data required for full training, Any2Any successfully transfers Sonic models pre-trained on Unitree G1 to LimX Oli and LimX Luna. These results suggest that pretrained WBT specialists can be efficiently reused across embodiments, providing a scalable path toward deploying humanoid whole-body control on new robots.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Any2Any, a cross-embodiment transfer method for humanoid whole-body tracking (WBT) models. It performs kinematic alignment of input/output spaces to enable reuse of a pretrained source policy (e.g., Sonic on Unitree G1), followed by targeted parameter-efficient fine-tuning (PEFT) on dynamics-sensitive modules for adaptation to target embodiments (LimX Oli, LimX Luna). The central empirical claim is that this achieves competitive or superior tracking performance using only 1% of the compute and data required for training from scratch.
Significance. If the transfer results hold with supporting ablations and metrics, the work would be significant for robotics by demonstrating a scalable, low-cost path to reuse WBT specialists across hardware platforms, reducing the data and compute barriers that currently limit rapid deployment of humanoid controllers.
major comments (2)
- [Abstract] Abstract: the claim that kinematic alignment enables the pretrained source policy to be 'meaningfully reused' on the target embodiment before PEFT is load-bearing for the efficiency narrative (1% compute/data). No quantitative metrics, baseline tracking errors, or ablation isolating alignment-only performance versus post-PEFT performance are referenced, leaving open whether alignment alone yields stable/useful actions or whether gains derive primarily from the subsequent adaptation step.
- [Experiments] Experiments (implied by abstract claims): the reported success transferring Sonic models to LimX Oli and Luna requires explicit reporting of failure modes, embodiment-specific dynamics mismatches (inertia, actuator response), and comparisons to training-from-scratch baselines with the same data budget to substantiate that the method 'substantially accelerates convergence'.
minor comments (1)
- [Abstract] Abstract: include at least one key quantitative result (e.g., tracking error or success rate) to ground the 'competitive or superior' claim.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on clarifying the role of kinematic alignment and strengthening the experimental analysis. We address each major comment below.
- Referee: [Abstract] Abstract: the claim that kinematic alignment enables the pretrained source policy to be 'meaningfully reused' on the target embodiment before PEFT is load-bearing for the efficiency narrative (1% compute/data). No quantitative metrics, baseline tracking errors, or ablation isolating alignment-only performance versus post-PEFT performance are referenced, leaving open whether alignment alone yields stable/useful actions or whether gains derive primarily from the subsequent adaptation step.
  Authors: We agree that the abstract claim would be strengthened by quantitative support. In the revised manuscript we will add an ablation reporting tracking metrics (joint errors, success rates) for the kinematically aligned source policy on target embodiments before PEFT, with direct comparison to post-PEFT performance and from-scratch baselines. This will isolate the contribution of alignment.
  revision: yes
- Referee: [Experiments] Experiments (implied by abstract claims): the reported success transferring Sonic models to LimX Oli and Luna requires explicit reporting of failure modes, embodiment-specific dynamics mismatches (inertia, actuator response), and comparisons to training-from-scratch baselines with the same data budget to substantiate that the method 'substantially accelerates convergence'.
  Authors: We will expand the experiments section to report observed failure modes, discuss specific dynamics mismatches (inertia, actuator response) across G1/Oli/Luna, and include side-by-side convergence curves and final performance against from-scratch training restricted to the same 1% data/compute budget. These additions will directly address the acceleration claim.
  revision: yes
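One way to turn side-by-side convergence curves into a single comparable number is steps-to-threshold: the first training step at which tracking error falls to a chosen level. The helper below is a sketch of that idea only; `steps_to_threshold` and the `(step, error)` curve format are assumptions, and no values from the paper are used.

```python
def steps_to_threshold(curve, threshold):
    """Return the first training step whose tracking error is <= threshold.

    curve is a list of (step, error) pairs in increasing step order.
    Returns None if the threshold is never reached within the logged budget.
    """
    for step, error in curve:
        if error <= threshold:
            return step
    return None
```

Applied to a transfer run and a from-scratch run logged under the same budget, a smaller result for the transfer run (or `None` for the from-scratch run) would support the acceleration claim at that threshold.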
Circularity Check
No circularity: empirical transfer method with no derivation chain
The paper presents Any2Any as an empirical procedure consisting of kinematic alignment of input/output spaces followed by lightweight PEFT on dynamics-sensitive modules. No equations, first-principles derivations, or predictions are claimed; performance claims rest on experimental comparisons (1% compute/data yielding competitive tracking on LimX platforms) rather than any quantity that reduces to its own fitted inputs or self-citations by construction. The method is framed as a practical reuse strategy whose validity is tested externally via ablation-style experiments, satisfying the criteria for a self-contained empirical result.