arxiv: 2606.03297 · v1 · pith:ITCLI4ZD · submitted 2026-06-02 · 💻 cs.RO

SplitAdapter: Load-Aware Humanoid Loco-Manipulation via Factorized Adaptation

Pith reviewed 2026-06-28 09:49 UTC · model grok-4.3

classification 💻 cs.RO
keywords: humanoid loco-manipulation, sim-to-real transfer, factorized adaptation, load awareness, FiLM modulation, context encoders, GRL regularization, whole-body control

The pith

SplitAdapter uses separate load and dynamics encoders to raise humanoid loco-manipulation success under heavy objects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to establish that freezing a pretrained manipulation policy and extending it with distinct encoders for object load and robot dynamics, trained via split world-model objectives plus GRL regularization and hierarchical FiLM, yields more robust full-task performance than unified adapters. A reader would care because humanoid robots must walk while manipulating objects whose weight and placement height change the forces during contact, and these variations compound with sim-to-real mismatch to cause instability. The method targets a compression problem in history-based adapters, which fold load and dynamics effects into a single latent, by keeping the two factors separate so that neither dilutes the other.

Core claim

SplitAdapter freezes a pretrained box manipulation policy and augments it with object/load and dynamics-aware context encoders trained with split world-model objectives, GRL-based cross-adversarial regularization, and hierarchical Feature-wise Linear Modulation (FiLM). In sim-to-sim experiments and real-world deployment this produces higher Full-task success than the base policy and world-model FiLM baselines across object masses of 2, 4, and 6 kg and pickup/placement heights of 0, 30, and 60 cm, with the largest gains under heavy-load conditions.
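To make "split world-model objectives" concrete, here is a minimal sketch under stated assumptions: each encoder's latent is assumed to feed its own prediction head, with the load branch scored on object-side next-state targets and the dynamics branch scored on robot-side targets. The function names, the MSE choice, and the weights w_load and w_dyn are hypothetical and do not come from the paper.

```python
from typing import Dict, Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def split_world_model_loss(
    load_pred: Sequence[float],   # output of the (hypothetical) load head
    obj_next: Sequence[float],    # object-side next-state target
    dyn_pred: Sequence[float],    # output of the (hypothetical) dynamics head
    robot_next: Sequence[float],  # robot-side next-state target
    w_load: float = 1.0,
    w_dyn: float = 1.0,
) -> Dict[str, float]:
    """Score each branch only on the part of the transition it should explain.

    A unified adapter would instead regress a single latent onto the
    concatenated target, so load and dynamics effects compete for capacity.
    """
    l_load = mse(load_pred, obj_next)
    l_dyn = mse(dyn_pred, robot_next)
    return {"load": l_load, "dyn": l_dyn, "total": w_load * l_load + w_dyn * l_dyn}


losses = split_world_model_loss([0.1, 0.2], [0.0, 0.2], [1.0], [0.5])
print(losses)
```

Keeping the two terms separate also makes it possible to log and weight each branch independently, which a single unified reconstruction loss does not allow.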

What carries the argument

SplitAdapter, which applies separate load and dynamics context encoders modulated by hierarchical FiLM onto a frozen base policy.
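A minimal sketch of hierarchical FiLM on top of frozen features, assuming "hierarchical" means the load conditioning is applied first and the dynamics conditioning is applied on top of it. The FiLMHead class, the 1 + delta parameterisation of gamma, and the zero initialisation are illustrative assumptions, not the paper's implementation.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(W: Matrix, b: Vector, z: Vector) -> Vector:
    """Affine map W @ z + b written with plain lists."""
    return [sum(w * zj for w, zj in zip(row, z)) + bi for row, bi in zip(W, b)]


def film(h: Vector, gamma: Vector, beta: Vector) -> Vector:
    """Feature-wise linear modulation: h_i -> gamma_i * h_i + beta_i."""
    return [g * x + b for x, g, b in zip(h, gamma, beta)]


class FiLMHead:
    """Maps a context latent z to (gamma, beta) for a feature layer of size dim_h.

    gamma = 1 + delta and all weights start at zero, so an untrained head
    leaves the frozen policy's features unchanged (a hypothetical design choice).
    """

    def __init__(self, dim_h: int, dim_z: int) -> None:
        self.Wg: Matrix = [[0.0] * dim_z for _ in range(dim_h)]
        self.bg: Vector = [0.0] * dim_h
        self.Wb: Matrix = [[0.0] * dim_z for _ in range(dim_h)]
        self.bb: Vector = [0.0] * dim_h

    def __call__(self, z: Vector) -> Tuple[Vector, Vector]:
        gamma = [1.0 + d for d in linear(self.Wg, self.bg, z)]
        beta = linear(self.Wb, self.bb, z)
        return gamma, beta


def hierarchical_film(h: Vector, z_load: Vector, z_dyn: Vector,
                      load_head: FiLMHead, dyn_head: FiLMHead) -> Vector:
    """Apply load conditioning first, then dynamics conditioning on its output."""
    g1, b1 = load_head(z_load)
    h = film(h, g1, b1)
    g2, b2 = dyn_head(z_dyn)
    return film(h, g2, b2)


h = [1.0, 2.0]                     # hidden features from the frozen base policy
load_head, dyn_head = FiLMHead(2, 1), FiLMHead(2, 1)
identity_out = hierarchical_film(h, [0.7], [0.3], load_head, dyn_head)
load_head.bb = [0.5, 0.5]          # load branch shifts the features
dyn_head.bg = [1.0, 1.0]           # dynamics branch doubles them (gamma = 2)
modulated_out = hierarchical_film(h, [0.7], [0.3], load_head, dyn_head)
```

Under this setup only the FiLMHead parameters (and the encoders producing z_load and z_dyn) would be trained, which matches the idea of reusing the frozen base policy unchanged.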

If this is right

  • Full-task success rates rise over both the base policy and unified world-model FiLM baselines.
  • The largest gains occur under the heaviest tested load (6 kg).
  • The approach supports stable performance in both sim-to-sim and real-world loco-manipulation settings.
  • The frozen base policy can be reused while only the context encoders are trained for new load or dynamics conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same split could let teams swap only the dynamics encoder when moving the method to a different robot body without retraining the core policy.
  • Factorization may extend to other variable-payload tasks such as legged carrying or mobile manipulation where multiple disturbance sources coexist.
  • Testing the encoders on continuous rather than discrete mass values would check whether the split objectives scale beyond the reported 2-6 kg range.

Load-bearing premise

The split encoders and regularization will stay stable and effective when object-induced load changes interact with robot dynamics mismatch during physical contact in sim-to-real transfer.

What would settle it

Real-world trials in which success with 6 kg objects at 60 cm height falls to or below the base-policy level would falsify the claim of effective factorized adaptation.

Figures

Figures reproduced from arXiv: 2606.03297 by Donghan Koo, Hanbyel Cho, Jeonguk Kang, Sanghyun Kang.

Figure 1: Real-world humanoid loco-manipulation experiments with the proposed method under …
Figure 2: SplitAdapter overview. A frozen humanoid manipulation policy is adapted with ob…
Figure 3: Qualitative comparisons of the proposed and baseline methods. (a) t-SNE visualization …
Figure 4: Online mass and loaded-state estimation across two payload conditions. Both the esti…
Figure 5: (a) World-model transition losses with and without split designs. (b) Predictability gaps …
Figure 6: Real-world experiment setup. (a) High-friction gloves attached to the Unitree G1 hands to …
Figure 7: Supplementary visualization of the simulation results in Table …
Figure 8: Additional online mass and loaded-state estimation results across four payload conditions …
Figure 9: Box-transport trajectories under 9 object mass and pickup/placement-height conditions, …
Original abstract

Humanoid loco-manipulation requires stable whole-body control under varying object masses and pickup/placement heights. This becomes particularly challenging in sim-to-real transfer, where object-induced load variation and robot-side dynamics mismatch interact during physical contact. Existing history-based adapters often compress these factors into a single latent representation, which can weaken robustness under heavy-load manipulation. We propose SplitAdapter: Load-Aware Humanoid Loco-Manipulation via Factorized Adaptation, which freezes a pretrained box manipulation policy and extends it with object/load and dynamics-aware context encoders trained with split world-model objectives, GRL-based cross-adversarial regularization, and hierarchical Feature-wise Linear Modulation (FiLM). In sim-to-sim experiments and real-world deployment, SplitAdapter improves Full-task success over the base policy and world-model FiLM baselines across object masses of 2, 4, and 6 kg and pickup/placement heights of 0, 30, and 60 cm, with the largest improvements under heavy-load conditions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes SplitAdapter, a factorized adaptation method for humanoid loco-manipulation. It freezes a pretrained base policy and augments it with separate object/load-aware and dynamics-aware context encoders. These are trained using split world-model objectives, GRL-based cross-adversarial regularization, and hierarchical FiLM conditioning. The central empirical claim is that this yields higher full-task success rates than the base policy and world-model FiLM baselines in both sim-to-sim and real-world settings, across object masses of 2/4/6 kg and pickup/placement heights of 0/30/60 cm, with the largest gains under the 6 kg condition.

Significance. If the reported gains are statistically reliable and the factorization remains effective under contact, the work would offer a concrete architectural pattern for handling multiple sources of variation in sim-to-real humanoid control. The explicit separation of load and dynamics encoders plus the use of GRL to encourage disentanglement constitute a clear methodological contribution. Real-world deployment on a physical humanoid further strengthens the result relative to purely simulated studies.
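The GRL mechanism the report credits with encouraging disentanglement can be illustrated with a scalar toy. This is a sketch under stated assumptions, not the paper's training code: a hypothetical adversary tries to recover a dynamics target from the load latent, and the gradient reversal layer (Ganin and Lempitsky, ref. [14]) flips the gradient reaching the load encoder. Only one direction is shown; a cross-adversarial setup would presumably mirror it with a load adversary on the dynamics latent. All names and values are illustrative.

```python
class GradientReversal:
    """Gradient reversal layer: identity forward, gradient scaled by -lam backward."""

    def __init__(self, lam: float = 1.0) -> None:
        self.lam = lam

    def forward(self, x: float) -> float:
        return x

    def backward(self, grad_out: float) -> float:
        return -self.lam * grad_out


def adversarial_step(w_enc: float, w_adv: float, c: float, d: float,
                     lam: float = 1.0, lr: float = 0.1):
    """One SGD step of a scalar toy: z = w_enc * c, d_hat = w_adv * GRL(z).

    Both parameters take a gradient step on the same loss (d_hat - d)^2, but the
    GRL flips the sign reaching the encoder, so the encoder effectively ascends
    the adversary's loss and makes z less predictive of the dynamics target d.
    """
    grl = GradientReversal(lam)
    z = w_enc * c
    d_hat = w_adv * grl.forward(z)
    err = d_hat - d
    loss = err ** 2
    grad_w_adv = 2.0 * err * grl.forward(z)    # dL/dw_adv
    grad_z = grl.backward(2.0 * err * w_adv)   # reversed dL/dz
    grad_w_enc = grad_z * c                    # chain rule through z = w_enc * c
    return w_enc - lr * grad_w_enc, w_adv - lr * grad_w_adv, loss


new_enc, new_adv, loss = adversarial_step(w_enc=1.0, w_adv=1.0, c=1.0, d=0.0)
```

In a real model the encoder would also receive ordinary (non-reversed) gradients from its own world-model loss, so the GRL term only pushes out information belonging to the other factor.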

major comments (1)
  1. [Experiments] Experiments section (and abstract): the reported conditions vary mass and height independently but do not isolate or stress the coupled regime in which object-induced load variation interacts with robot-side dynamics mismatch during physical contact—the exact difficulty flagged in the abstract as the core sim-to-real challenge. Without such conditions or an ablation that perturbs both factors simultaneously, it remains unclear whether the deliberate separation of encoders actually captures the interaction the method is intended to solve.
minor comments (1)
  1. [Abstract] Abstract: quantitative success rates, number of trials, and any statistical tests are omitted; these details belong in the abstract or a prominent results table for immediate evaluability.
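With only a handful of real-world trials per condition, an exact test is one way to report whether a success-rate gap is meaningful, and it also makes the falsification criterion above checkable. The sketch below computes a one-sided Fisher exact test from success counts; the counts in the example are hypothetical and are not results from the paper.

```python
from math import comb


def fisher_one_sided(succ_a: int, n_a: int, succ_b: int, n_b: int) -> float:
    """One-sided Fisher exact p-value for H1: success rate of A exceeds that of B.

    Conditions on the total number of successes and sums the hypergeometric
    probabilities of every table at least as favourable to A as the observed one.
    """
    if not (0 <= succ_a <= n_a and 0 <= succ_b <= n_b):
        raise ValueError("success counts must lie in [0, n]")
    k = succ_a + succ_b
    denom = comb(n_a + n_b, k)
    upper = min(n_a, k)
    # comb(n_b, k - x) is 0 whenever k - x > n_b, so infeasible tables drop out.
    return sum(comb(n_a, x) * comb(n_b, k - x) for x in range(succ_a, upper + 1)) / denom


# Hypothetical counts for one condition (e.g. 6 kg at 60 cm): method 9/10, base 3/10.
p = fisher_one_sided(succ_a=9, n_a=10, succ_b=3, n_b=10)
```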

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the major comment below.

  1. Referee: [Experiments] Experiments section (and abstract): the reported conditions vary mass and height independently but do not isolate or stress the coupled regime in which object-induced load variation interacts with robot-side dynamics mismatch during physical contact—the exact difficulty flagged in the abstract as the core sim-to-real challenge. Without such conditions or an ablation that perturbs both factors simultaneously, it remains unclear whether the deliberate separation of encoders actually captures the interaction the method is intended to solve.

    Authors: The reported experiments evaluate full-task success across all combinations of masses (2/4/6 kg) and heights (0/30/60 cm). These joint conditions require the policy to manage the interaction between load-induced effects and contact dynamics during pickup and placement. The real-world deployment on the physical humanoid further couples these load variations with inherent robot-side dynamics mismatch. We therefore maintain that the current results do stress the regime highlighted in the abstract. To provide additional explicit evidence for the value of the factorization under simultaneous perturbations, we will add a targeted sim-to-sim ablation in the revision. revision: partial
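The targeted ablation promised in the rebuttal could take the form of an evaluation grid that crosses the paper's mass and height settings with robot-side dynamics mismatch. This sketch assumes two hypothetical mismatch knobs (actuator-strength and joint-damping scales). The knobs and their values are illustrative and not taken from the paper; only the mass and height values are.

```python
from dataclasses import dataclass
from itertools import product
from typing import List

MASSES_KG = (2.0, 4.0, 6.0)    # object masses evaluated in the paper
HEIGHTS_CM = (0, 30, 60)       # pickup/placement heights evaluated in the paper
# Hypothetical robot-side mismatch settings (not from the paper): multiplicative
# scales on actuator strength and joint damping, applied only at evaluation time.
STRENGTH_SCALES = (0.8, 1.0)
DAMPING_SCALES = (1.0, 1.5)


@dataclass(frozen=True)
class Condition:
    mass_kg: float
    height_cm: int
    strength_scale: float
    damping_scale: float

    @property
    def nominal_dynamics(self) -> bool:
        return self.strength_scale == 1.0 and self.damping_scale == 1.0

    @property
    def coupled(self) -> bool:
        """Heaviest load combined with a non-nominal robot model."""
        return self.mass_kg == max(MASSES_KG) and not self.nominal_dynamics


def build_grid() -> List[Condition]:
    return [Condition(*combo) for combo in
            product(MASSES_KG, HEIGHTS_CM, STRENGTH_SCALES, DAMPING_SCALES)]


grid = build_grid()
coupled = [c for c in grid if c.coupled]
nominal = [c for c in grid if c.nominal_dynamics]
```

The nominal-dynamics slice reproduces the paper's 9 mass/height conditions, so each coupled cell can be compared directly against its nominal counterpart.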

Circularity Check

0 steps flagged

No circularity in derivation chain; claims rest on empirical validation

full rationale

The provided text (abstract plus summary) contains no equations, no derivation steps, no fitted parameters presented as predictions, and no self-citations invoked as load-bearing uniqueness theorems or ansatzes. The method is described as an architectural extension (frozen base policy plus factorized encoders with split objectives, GRL, and FiLM) whose performance is asserted via sim-to-sim and real-world experiments across discrete mass/height conditions. No step reduces by construction to its own inputs; the central claims are therefore not circular but rest on external experimental outcomes.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only text supplies no explicit free parameters, axioms, or invented entities; full manuscript required to audit any fitted scales, normalization choices, or new latent variables.

pith-pipeline@v0.9.1-grok · 5714 in / 1114 out tokens · 18639 ms · 2026-06-28T09:49:35.933679+00:00 · methodology

Reference graph

Works this paper leans on

32 extracted references · 14 canonical work pages · 2 internal anchors

  1. [1]

    X. B. Peng, P. Abbeel, S. Levine, and M. van de Panne. DeepMimic: Example-guided deep reinforcement learning of physics-based character skills. ACM Transactions on Graphics, 37(4):1–14, 2018

  2. [2]

    X. B. Peng, Z. Ma, P. Abbeel, S. Levine, and A. Kanazawa. AMP: Adversarial motion priors for stylized physics-based character control. ACM Transactions on Graphics, 40(4):1–20, 2021

  3. [3]

    H. Wang, W. Zhang, R. Yu, T. Huang, J. Ren, F. Jia, Z. Wang, X. Niu, X. Chen, J. Chen, Q. Chen, J. Wang, and J. Pang. PhysHSI: Towards a real-world generalizable and natural humanoid-scene interaction system. arXiv preprint arXiv:2510.11072, 2025

  4. [4]

    X. He, S. Xu, X. Li, R. Dong, L. Bian, Y.-X. Wang, and L.-Y. Gui. ULTRA: Unified multimodal control for autonomous humanoid whole-body loco-manipulation. arXiv preprint arXiv:2603.03279, 2026

  5. [5]

    Y. Lin, J. Shi, D. Wang, J. Kong, Y. Liu, C. Bai, and X. Li. Pro-HOI: Perceptive root-guided humanoid-object interaction. arXiv preprint arXiv:2603.01126, 2026

  6. [6]

    S. Zhao, Y. Ze, Y. Wang, C. K. Liu, P. Abbeel, G. Shi, and R. Duan. ResMimic: From general motion tracking to humanoid whole-body loco-manipulation via residual learning, 2025. URL https://arxiv.org/abs/2510.05070

  7. [7]

    Y. Zhang, Y. Yuan, P. Gurunath, I. Gupta, S. Omidshafiei, A. akbar Agha-mohammadi, M. Vazquez-Chanlatte, L. Pedersen, T. He, and G. Shi. FALCON: Learning force-adaptive humanoid loco-manipulation. arXiv preprint arXiv:2505.06776, 2025

  8. [8]

    D. Li, X. Chen, Q. Wu, B. Chen, S. Wu, H. Wu, G. Zhang, L. Li, M. Zhou, D. Xiang, J. Ma, Q. Zhang, and R. Xu. HAIC: Humanoid agile object interaction control via dynamics-aware world model. arXiv preprint arXiv:2602.11758, 2026

  9. [9]

    A. Kumar, Z. Fu, D. Pathak, and J. Malik. RMA: Rapid motor adaptation for legged robots. In Robotics: Science and Systems (RSS), 2021

  10. [10]

    X. Gu, Y.-J. Wang, X. Zhu, C. Shi, Y. Guo, Y. Liu, and J. Chen. Advancing humanoid locomotion: Mastering challenging terrains with denoising world model learning. arXiv preprint arXiv:2408.14472, 2024

  11. [11]

    W. Sun, L. Chen, Y. Su, B. Cao, Y. Liu, and Z. Xie. Learning humanoid locomotion with world model reconstruction. arXiv preprint arXiv:2502.16230, 2025

  12. [12]

    C. Li, A. Krause, and M. Hutter. Robotic world model: A neural network simulator for robust policy optimization in robotics. arXiv preprint arXiv:2501.10100, 2025

  13. [13]

    Z. Zhang, J. Guo, C. Chen, J. Wang, C. Lin, Y. Lian, H. Xue, Z. Wang, M. Liu, J. Lyu, H. Liu, H. Wang, and L. Yi. Track any motions under any disturbances. arXiv preprint arXiv:2509.13833, 2025

  14. [14]

    Y. Ganin and V. Lempitsky. Unsupervised domain adaptation by backpropagation. In Proceedings of the International Conference on Machine Learning (ICML), 2015

  15. [15]

    E. Perez, F. Strub, H. de Vries, V. Dumoulin, and A. Courville. FiLM: Visual reasoning with a general conditioning layer. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018

  16. [16]

    E. Todorov, T. Erez, and Y. Tassa. MuJoCo: A physics engine for model-based control. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2012

  17. [17]

    S. Kajita, F. Kanehiro, K. Kaneko, K. Fujiwara, K. Harada, K. Yokoi, and H. Hirukawa. Biped walking pattern generation by using preview control of zero-moment point. In IEEE International Conference on Robotics and Automation (ICRA), 2003

  18. [18]

    M. Murooka, K. Chappellet, A. Tanguy, M. Benallegue, I. Kumagai, M. Morisawa, F. Kanehiro, and A. Kheddar. Humanoid loco-manipulations pattern generation and stabilization control. IEEE Robotics and Automation Letters, 6(3):5597–5604, 2021

  19. [19]

    F. Ruscelli, M. P. Polverini, A. Laurenzi, E. M. Hoffman, and N. G. Tsagarakis. A multi-contact motion planning and control strategy for physical interaction tasks using a humanoid robot. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020

  20. [20]

    J. Dao, H. Duan, and A. Fern. Sim-to-real learning for humanoid box loco-manipulation. In IEEE International Conference on Robotics and Automation (ICRA), pages 16930–16936, 2024

  21. [21]

    A. Rigo, M. Hu, S. K. Gupta, and Q. Nguyen. Hierarchical optimization-based control for whole-body loco-manipulation of heavy objects. In IEEE International Conference on Robotics and Automation (ICRA), pages 15322–15328, 2024

  22. [22]

    H. Weng, Y. Li, N. Sobanbabu, Z. Wang, Z. Luo, T. He, D. Ramanan, and G. Shi. HDMI: Learning interactive humanoid whole-body control from human videos. arXiv preprint arXiv:2509.16757, 2025

  23. [23]

    L. Yang, X. Huang, Z. Wu, A. Kanazawa, P. Abbeel, C. Sferrazza, C. K. Liu, R. Duan, and G. Shi. OmniRetarget: Interaction-preserving data generation for humanoid whole-body loco-manipulation and scene interaction. arXiv preprint arXiv:2509.26633, 2025

  24. [24]

    S. Yin, Y. Ze, H.-X. Yu, C. K. Liu, and J. Wu. VisualMimic: Visual humanoid loco-manipulation via motion tracking and generation. arXiv preprint arXiv:2509.20322, 2025

  25. [25]

    S. Xu, H. Y. Ling, Y.-X. Wang, and L.-Y. Gui. InterMimic: Towards universal whole-body control for physics-based human-object interactions. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

  26. [26]

    Y. Fu, F. Xie, C. Xu, J. Xiong, H. Yuan, and Z. Lu. DemoHLM: From one demonstration to generalizable humanoid loco-manipulation. arXiv preprint arXiv:2510.11258, 2025

  27. [27]

    A. Kumar, Z. Li, J. Zeng, D. Pathak, K. Sreenath, and J. Malik. Adapting rapid motor adaptation for bipedal robots. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1161–1168, 2022

  28. [28]

    Z. Li, X. B. Peng, P. Abbeel, S. Levine, G. Berseth, and K. Sreenath. Reinforcement learning for versatile, dynamic, and robust bipedal locomotion control. The International Journal of Robotics Research, 2024

  29. [29]

    X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel. Sim-to-real transfer of robotic control with dynamics randomization. In IEEE International Conference on Robotics and Automation (ICRA), 2018

  30. [30]

    L. Pinto, M. Andrychowicz, P. Welinder, W. Zaremba, and P. Abbeel. Asymmetric actor critic for image-based robot learning. In Robotics: Science and Systems (RSS), 2018

  31. [31]

    V. Makoviychuk, L. Wawrzyniak, Y. Guo, M. Lu, K. Storey, M. Macklin, D. Hoeller, N. Rudin, A. Allshire, A. Handa, et al. Isaac Gym: High performance GPU-based physics simulation for robot learning. In Advances in Neural Information Processing Systems (NeurIPS), 2021

  32. [32]

    Unitree. Humanoid agent AI avatar, 2024. URL https://www.unitree.com/g1. Accessed: 2026-05-22.

A Appendix

A.1 Observation and Reward Details

The frozen base policy follows the AMP-based locomotion framework of PhysHSI [3]. The policy outputs 29-D joint-position targets executed through low-level PD control. Here, we only summarize the task-specific obse...
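The appendix states that the frozen policy's 29-D joint-position targets are tracked by low-level PD control. A minimal sketch of such a tracking law, assuming the standard joint-space form with a zero velocity target; the gains, the optional torque limit, and the function name pd_torques are hypothetical and not taken from the paper.

```python
from typing import List, Optional, Sequence

NUM_JOINTS = 29  # the base policy outputs 29-D joint-position targets


def pd_torques(q_target: Sequence[float], q: Sequence[float], qdot: Sequence[float],
               kp: Sequence[float], kd: Sequence[float],
               tau_limit: Optional[float] = None) -> List[float]:
    """Joint-space PD law: tau_i = kp_i * (q_target_i - q_i) - kd_i * qdot_i.

    The damping term acts on joint velocity alone (zero velocity target);
    torques are optionally clipped to a symmetric limit.
    """
    n = len(q_target)
    if not all(len(v) == n for v in (q, qdot, kp, kd)):
        raise ValueError("all inputs must have the same length")
    taus = []
    for qt, qi, vi, p, d in zip(q_target, q, qdot, kp, kd):
        tau = p * (qt - qi) - d * vi
        if tau_limit is not None:
            tau = max(-tau_limit, min(tau_limit, tau))
        taus.append(tau)
    return taus


# Hypothetical gains and state, identical across joints purely for illustration.
tau = pd_torques([0.5] * NUM_JOINTS, [0.0] * NUM_JOINTS, [1.0] * NUM_JOINTS,
                 kp=[100.0] * NUM_JOINTS, kd=[2.0] * NUM_JOINTS)
tau_clipped = pd_torques([0.5] * NUM_JOINTS, [0.0] * NUM_JOINTS, [1.0] * NUM_JOINTS,
                         kp=[100.0] * NUM_JOINTS, kd=[2.0] * NUM_JOINTS, tau_limit=40.0)
```

Because the policy only sets position targets, payload effects reach the controller as tracking error, which is one reason load-aware conditioning upstream of the PD loop can matter.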