SplitAdapter: Load-Aware Humanoid Loco-Manipulation via Factorized Adaptation

Donghan Koo; Hanbyel Cho; Jeonguk Kang; Sanghyun Kang

arxiv: 2606.03297 · v1 · pith:ITCLI4ZDnew · submitted 2026-06-02 · 💻 cs.RO

Pith reviewed 2026-06-28 09:49 UTC · model grok-4.3

classification 💻 cs.RO

keywords: humanoid loco-manipulation, sim-to-real transfer, factorized adaptation, load awareness, FiLM modulation, context encoders, GRL regularization, whole-body control

The pith

SplitAdapter uses separate load and dynamics encoders to raise humanoid loco-manipulation success under heavy objects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to establish that freezing a pretrained manipulation policy and extending it with distinct encoders for object load and robot dynamics, trained via split world-model objectives plus GRL regularization and hierarchical FiLM, yields more robust full-task performance than unified adapters. A reader would care because humanoid robots must walk and manipulate objects whose weight and placement height change the forces during contact, and these variations compound with sim-to-real mismatches to cause instability. The method targets the compression problem in history-based adapters by keeping load and dynamics factors separate so each can be handled without diluting the other.

Core claim

SplitAdapter freezes a pretrained box manipulation policy and augments it with object/load and dynamics-aware context encoders trained with split world-model objectives, GRL-based cross-adversarial regularization, and hierarchical Feature-wise Linear Modulation (FiLM). In sim-to-sim experiments and real-world deployment this produces higher Full-task success than the base policy and world-model FiLM baselines across object masses of 2, 4, and 6 kg and pickup/placement heights of 0, 30, and 60 cm, with the largest gains under heavy-load conditions.
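The core claim names GRL-based cross-adversarial regularization without giving details. As a minimal sketch of the gradient reversal idea alone (identity in the forward pass, gradient scaled by a negative factor in the backward pass), the scalar toy below shows how the same adversary loss pushes the adversary and the encoder in opposite directions. The names `w_enc`, `w_adv`, and `lam`, the squared-error adversary, and all numeric values are illustrative assumptions, not the paper's formulation.

```python
def adversary_loss(w_enc: float, w_adv: float, x: float, target: float) -> float:
    """Squared error of an adversary that reads `target` out of the latent."""
    z = w_enc * x          # latent produced by the encoder
    pred = w_adv * z       # adversary prediction from the latent
    return (pred - target) ** 2


def grads_with_grl(w_enc: float, w_adv: float, x: float, target: float, lam: float):
    """Manual backward pass with a gradient reversal layer between z and the adversary."""
    z = w_enc * x
    err = w_adv * z - target
    grad_w_adv = 2.0 * err * z        # ordinary gradient for the adversary weight
    grad_z = 2.0 * err * w_adv        # gradient arriving at the GRL output
    grad_w_enc = (-lam * grad_z) * x  # GRL: sign flipped (and scaled) on the way back
    return grad_w_enc, grad_w_adv


# Hypothetical toy values, chosen only to make the effect visible.
w_enc, w_adv, x, target, lam, lr = 1.0, 0.5, 1.0, 2.0, 1.0, 0.1

loss_before = adversary_loss(w_enc, w_adv, x, target)
g_enc, g_adv = grads_with_grl(w_enc, w_adv, x, target, lam)

# Gradient descent on both parameters, using the GRL-modified encoder gradient.
loss_after_adv_step = adversary_loss(w_enc, w_adv - lr * g_adv, x, target)
loss_after_enc_step = adversary_loss(w_enc - lr * g_enc, w_adv, x, target)
```

In this toy, the adversary step lowers the adversary loss while the encoder step raises it, which is the mechanism a cross-adversarial setup would presumably use to discourage one latent from carrying the other factor's information.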

What carries the argument

SplitAdapter, which uses separate load and dynamics context encoders to modulate a frozen base policy through hierarchical FiLM.
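A small sketch may help make the FiLM mechanism concrete. FiLM scales and shifts each feature as `gamma * h + beta`, with `gamma` and `beta` produced from a context vector. The summary does not say how the hierarchy is arranged, so the ordering below (dynamics modulation first, load modulation second), the toy `make_head` function, and every numeric value are assumptions for illustration only.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
FilmHead = Callable[[Sequence[float]], Tuple[Vector, Vector]]


def film(features: Sequence[float], gamma: Sequence[float], beta: Sequence[float]) -> Vector:
    """Feature-wise linear modulation: out_i = gamma_i * h_i + beta_i."""
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("features, gamma and beta must have the same length")
    return [g * h + b for h, g, b in zip(features, gamma, beta)]


def make_head(scale_w: float, shift_w: float, width: int) -> FilmHead:
    """Toy head (hypothetical): maps a context vector to (gamma, beta).

    gamma is centred at 1 so that a zero context leaves the features unchanged.
    """
    def head(context: Sequence[float]) -> Tuple[Vector, Vector]:
        s = sum(context)
        return [1.0 + scale_w * s] * width, [shift_w * s] * width
    return head


def hierarchical_film(features: Sequence[float], z_dyn: Sequence[float],
                      z_load: Sequence[float], dyn_head: FilmHead,
                      load_head: FilmHead) -> Vector:
    """Assumed ordering: dynamics modulation first, load modulation second."""
    gamma_d, beta_d = dyn_head(z_dyn)
    h = film(features, gamma_d, beta_d)
    gamma_l, beta_l = load_head(z_load)
    return film(h, gamma_l, beta_l)


frozen_features = [0.5, -1.0, 2.0]  # stand-in for a hidden layer of the frozen policy
dyn_head = make_head(0.1, 0.0, width=3)
load_head = make_head(0.0, 0.5, width=3)

unchanged = hierarchical_film(frozen_features, [0.0, 0.0], [0.0], dyn_head, load_head)
modulated = hierarchical_film(frozen_features, [1.0, 1.0], [2.0], dyn_head, load_head)
```

Centring `gamma` at 1 is a design choice of this sketch: with zero context the frozen policy's features pass through untouched, so the adapter can only change behaviour when the encoders report something.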

If this is right

Full-task success rates rise over both the base policy and unified world-model FiLM baselines.
The largest gains occur under the heaviest tested load of 6 kg across all pickup and placement heights.
The approach supports stable performance in both sim-to-sim and real-world loco-manipulation settings.
The frozen base policy can be reused while only the context encoders are trained for new load or dynamics conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same split could let teams swap only the dynamics encoder when moving the method to a different robot body without retraining the core policy.
Factorization may extend to other variable-payload tasks such as legged carrying or mobile manipulation where multiple disturbance sources coexist.
Testing the encoders on continuous rather than discrete mass values would check whether the split objectives scale beyond the reported 2-6 kg range.

Load-bearing premise

The split encoders and regularization will stay stable and effective when object-induced load changes interact with robot dynamics mismatch during physical contact in sim-to-real transfer.

What would settle it

Real-world trials in which success with 6 kg objects at 60 cm height falls to or below the base-policy level would falsify the claim of effective factorized adaptation.

Figures

Figures reproduced from arXiv: 2606.03297 by Donghan Koo, Hanbyel Cho, Jeonguk Kang, Sanghyun Kang.

**Figure 2.** SplitAdapter overview. A frozen humanoid manipulation policy is adapted with ob...

**Figure 3.** Qualitative comparisons of the proposed and baseline methods. (a) t-SNE visualization...

**Figure 4.** Online mass and loaded-state estimation across two payload conditions. Both the esti...

**Figure 5.** (a) World-model transition losses with and without split designs. (b) Predictability gaps...

**Figure 6.** Real-world experiment setup. (a) High-friction gloves attached to the Unitree G1 hands to...

**Figure 7.** Supplementary visualization of the simulation results in Table...

**Figure 8.** Additional online mass and loaded-state estimation results across four payload conditions...

**Figure 9.** Box-transport trajectories under 9 object mass and pickup/placement-height conditions, ...

Original abstract

Humanoid loco-manipulation requires stable whole-body control under varying object masses and pickup/placement heights. This becomes particularly challenging in sim-to-real transfer, where object-induced load variation and robot-side dynamics mismatch interact during physical contact. Existing history-based adapters often compress these factors into a single latent representation, which can weaken robustness under heavy-load manipulation. We propose SplitAdapter: Load-Aware Humanoid Loco-Manipulation via Factorized Adaptation, which freezes a pretrained box manipulation policy and extends it with object/load and dynamics-aware context encoders trained with split world-model objectives, GRL-based cross-adversarial regularization, and hierarchical Feature-wise Linear Modulation (FiLM). In sim-to-sim experiments and real-world deployment, SplitAdapter improves Full-task success over the base policy and world-model FiLM baselines across object masses of 2, 4, and 6 kg and pickup/placement heights of 0, 30, and 60 cm, with the largest improvements under heavy-load conditions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SplitAdapter factorizes load and dynamics adaptation for humanoid loco-manipulation and reports gains on heavy loads, but the experiments do not clearly test the coupled contact regime that the abstract itself flags as the core difficulty.

The paper freezes a base policy and adds separate load-aware and dynamics-aware encoders trained with split world-model losses, GRL cross-adversarial regularization, and hierarchical FiLM. That specific combination is new enough to be worth noting, even if it builds on prior adapter work. The claim is that this gives higher full-task success than the base policy or a single world-model FiLM baseline, with the biggest lift at 6 kg across the tested heights.

What stands out is the practical focus on object mass variation in whole-body control and the attempt to keep the base policy untouched. That matches a real deployment need.

The soft spot is the stress test: the abstract calls out that object load and robot dynamics mismatch interact during physical contact, yet the reported conditions (masses 2/4/6 kg, heights 0/30/60 cm) do not isolate or stress that interaction. If the factorization cannot capture the coupling, the real-world gains may not generalize beyond the tested conditions. The abstract supplies no numbers, error bars, or trial counts, so the size of the improvement and its statistical reliability cannot be judged from the summary alone.

This is for people already working on sim-to-real humanoid manipulation who need a concrete adapter trick. It is not a broad framework paper. The idea is coherent on its own terms and the authors engage the right prior work, so it deserves a serious referee to check the full results, ablations, and whether the coupled regime was actually probed. I would send it out rather than desk reject.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes SplitAdapter, a factorized adaptation method for humanoid loco-manipulation. It freezes a pretrained base policy and augments it with separate object/load-aware and dynamics-aware context encoders. These are trained using split world-model objectives, GRL-based cross-adversarial regularization, and hierarchical FiLM conditioning. The central empirical claim is that this yields higher full-task success rates than the base policy and world-model FiLM baselines in both sim-to-sim and real-world settings, across object masses of 2/4/6 kg and pickup/placement heights of 0/30/60 cm, with the largest gains under the 6 kg condition.

Significance. If the reported gains are statistically reliable and the factorization remains effective under contact, the work would offer a concrete architectural pattern for handling multiple sources of variation in sim-to-real humanoid control. The explicit separation of load and dynamics encoders plus the use of GRL to encourage disentanglement constitute a clear methodological contribution. Real-world deployment on a physical humanoid further strengthens the result relative to purely simulated studies.

major comments (1)

[Experiments] Experiments section (and abstract): the reported conditions vary mass and height independently but do not isolate or stress the coupled regime in which object-induced load variation interacts with robot-side dynamics mismatch during physical contact—the exact difficulty flagged in the abstract as the core sim-to-real challenge. Without such conditions or an ablation that perturbs both factors simultaneously, it remains unclear whether the deliberate separation of encoders actually captures the interaction the method is intended to solve.

minor comments (1)

[Abstract] Abstract: quantitative success rates, number of trials, and any statistical tests are omitted; these details belong in the abstract or a prominent results table for immediate evaluability.
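To illustrate why trial counts matter for the comment above, here is a minimal sketch of a Wilson score interval for a binomial success rate. The function name `wilson_interval` and the counts in the usage lines are placeholders chosen for illustration; they are not results from the paper, and the paper may use a different statistical treatment.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a success rate (z = 1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    p = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    centre = (p + z2 / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials + z2 / (4.0 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Placeholder counts (NOT from the paper): the same 80% rate at two sample sizes.
low_small, high_small = wilson_interval(8, 10)
low_large, high_large = wilson_interval(80, 100)
```

With the same observed rate, the interval from 10 trials is much wider than the one from 100 trials, which is why a success rate reported without its trial count is hard to evaluate.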

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the major comment below.

Referee: [Experiments] Experiments section (and abstract): the reported conditions vary mass and height independently but do not isolate or stress the coupled regime in which object-induced load variation interacts with robot-side dynamics mismatch during physical contact—the exact difficulty flagged in the abstract as the core sim-to-real challenge. Without such conditions or an ablation that perturbs both factors simultaneously, it remains unclear whether the deliberate separation of encoders actually captures the interaction the method is intended to solve.

Authors: The reported experiments evaluate full-task success across all combinations of masses (2/4/6 kg) and heights (0/30/60 cm). These joint conditions require the policy to manage the interaction between load-induced effects and contact dynamics during pickup and placement. The real-world deployment on the physical humanoid further couples these load variations with inherent robot-side dynamics mismatch. We therefore maintain that the current results do stress the regime highlighted in the abstract. To provide additional explicit evidence for the value of the factorization under simultaneous perturbations, we will add a targeted sim-to-sim ablation in the revision.

Revision: partial

Circularity Check

0 steps flagged

No circularity in derivation chain; claims rest on empirical validation

The provided text (abstract plus summary) contains no equations, no derivation steps, no fitted parameters presented as predictions, and no self-citations invoked as load-bearing uniqueness theorems or ansatzes. The method is described as an architectural extension (frozen base policy plus factorized encoders with split objectives, GRL, and FiLM) whose performance is asserted via sim-to-sim and real-world experiments across discrete mass/height conditions. No step reduces by construction to its own inputs; the central claims are therefore not circular but rest on external experimental outcomes.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only text supplies no explicit free parameters, axioms, or invented entities; full manuscript required to audit any fitted scales, normalization choices, or new latent variables.

pith-pipeline@v0.9.1-grok · 5714 in / 1114 out tokens · 18639 ms · 2026-06-28T09:49:35.933679+00:00 · methodology

Reference graph

Works this paper leans on

32 extracted references · 14 canonical work pages · 2 internal anchors

[1] X. B. Peng, P. Abbeel, S. Levine, and M. van de Panne. DeepMimic: Example-guided deep reinforcement learning of physics-based character skills. ACM Transactions on Graphics, 37(4):1–14, 2018.
[2] X. B. Peng, Z. Ma, P. Abbeel, S. Levine, and A. Kanazawa. AMP: Adversarial motion priors for stylized physics-based character control. ACM Transactions on Graphics, 40(4):1–20, 2021.
[3] H. Wang, W. Zhang, R. Yu, T. Huang, J. Ren, F. Jia, Z. Wang, X. Niu, X. Chen, J. Chen, Q. Chen, J. Wang, and J. Pang. PhysHSI: Towards a real-world generalizable and natural humanoid-scene interaction system. arXiv preprint arXiv:2510.11072, 2025.
[4] X. He, S. Xu, X. Li, R. Dong, L. Bian, Y.-X. Wang, and L.-Y. Gui. ULTRA: Unified multimodal control for autonomous humanoid whole-body loco-manipulation. arXiv preprint arXiv:2603.03279, 2026.
[5] Y. Lin, J. Shi, D. Wang, J. Kong, Y. Liu, C. Bai, and X. Li. Pro-HOI: Perceptive root-guided humanoid-object interaction. arXiv preprint arXiv:2603.01126, 2026.
[6] S. Zhao, Y. Ze, Y. Wang, C. K. Liu, P. Abbeel, G. Shi, and R. Duan. Resmimic: From general motion tracking to humanoid whole-body loco-manipulation via residual learning, 2025. URL https://arxiv.org/abs/2510.05070.
[7] Y. Zhang, Y. Yuan, P. Gurunath, I. Gupta, S. Omidshafiei, A. akbar Agha-mohammadi, M. Vazquez-Chanlatte, L. Pedersen, T. He, and G. Shi. FALCON: Learning force-adaptive humanoid loco-manipulation. arXiv preprint arXiv:2505.06776, 2025.
[8] D. Li, X. Chen, Q. Wu, B. Chen, S. Wu, H. Wu, G. Zhang, L. Li, M. Zhou, D. Xiang, J. Ma, Q. Zhang, and R. Xu. HAIC: Humanoid agile object interaction control via dynamics-aware world model. arXiv preprint arXiv:2602.11758, 2026.
[9] A. Kumar, Z. Fu, D. Pathak, and J. Malik. RMA: Rapid motor adaptation for legged robots. In Robotics: Science and Systems (RSS), 2021.
[10] X. Gu, Y.-J. Wang, X. Zhu, C. Shi, Y. Guo, Y. Liu, and J. Chen. Advancing humanoid locomotion: Mastering challenging terrains with denoising world model learning. arXiv preprint arXiv:2408.14472, 2024.
[11] W. Sun, L. Chen, Y. Su, B. Cao, Y. Liu, and Z. Xie. Learning humanoid locomotion with world model reconstruction. arXiv preprint arXiv:2502.16230, 2025.
[12] C. Li, A. Krause, and M. Hutter. Robotic world model: A neural network simulator for robust policy optimization in robotics. arXiv preprint arXiv:2501.10100, 2025.
[13] Z. Zhang, J. Guo, C. Chen, J. Wang, C. Lin, Y. Lian, H. Xue, Z. Wang, M. Liu, J. Lyu, H. Liu, H. Wang, and L. Yi. Track any motions under any disturbances. arXiv preprint arXiv:2509.13833, 2025.
[14] Y. Ganin and V. Lempitsky. Unsupervised domain adaptation by backpropagation. In Proceedings of the International Conference on Machine Learning (ICML), 2015.
[15] E. Perez, F. Strub, H. de Vries, V. Dumoulin, and A. Courville. FiLM: Visual reasoning with a general conditioning layer. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
[16] E. Todorov, T. Erez, and Y. Tassa. MuJoCo: A physics engine for model-based control. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2012.
[17] S. Kajita, F. Kanehiro, K. Kaneko, K. Fujiwara, K. Harada, K. Yokoi, and H. Hirukawa. Biped walking pattern generation by using preview control of zero-moment point. In IEEE International Conference on Robotics and Automation (ICRA), 2003.
[18] M. Murooka, K. Chappellet, A. Tanguy, M. Benallegue, I. Kumagai, M. Morisawa, F. Kanehiro, and A. Kheddar. Humanoid loco-manipulations pattern generation and stabilization control. IEEE Robotics and Automation Letters, 6(3):5597–5604, 2021.
[19] F. Ruscelli, M. P. Polverini, A. Laurenzi, E. M. Hoffman, and N. G. Tsagarakis. A multi-contact motion planning and control strategy for physical interaction tasks using a humanoid robot. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020.
[20] J. Dao, H. Duan, and A. Fern. Sim-to-real learning for humanoid box loco-manipulation. In IEEE International Conference on Robotics and Automation (ICRA), pages 16930–16936, 2024.
[21] A. Rigo, M. Hu, S. K. Gupta, and Q. Nguyen. Hierarchical optimization-based control for whole-body loco-manipulation of heavy objects. In IEEE International Conference on Robotics and Automation (ICRA), pages 15322–15328, 2024.
[22] H. Weng, Y. Li, N. Sobanbabu, Z. Wang, Z. Luo, T. He, D. Ramanan, and G. Shi. Hdmi: Learning interactive humanoid whole-body control from human videos. arXiv preprint arXiv:2509.16757, 2025.
[23] L. Yang, X. Huang, Z. Wu, A. Kanazawa, P. Abbeel, C. Sferrazza, C. K. Liu, R. Duan, and G. Shi. Omniretarget: Interaction-preserving data generation for humanoid whole-body loco-manipulation and scene interaction. arXiv preprint arXiv:2509.26633, 2025.
[24] S. Yin, Y. Ze, H.-X. Yu, C. K. Liu, and J. Wu. Visualmimic: Visual humanoid loco-manipulation via motion tracking and generation. arXiv preprint arXiv:2509.20322, 2025.
[25] S. Xu, H. Y. Ling, Y.-X. Wang, and L.-Y. Gui. InterMimic: Towards universal whole-body control for physics-based human-object interactions. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.
[26] Y. Fu, F. Xie, C. Xu, J. Xiong, H. Yuan, and Z. Lu. DemoHLM: From one demonstration to generalizable humanoid loco-manipulation. arXiv preprint arXiv:2510.11258, 2025.
[27] A. Kumar, Z. Li, J. Zeng, D. Pathak, K. Sreenath, and J. Malik. Adapting rapid motor adaptation for bipedal robots. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1161–1168, 2022.
[28] Z. Li, X. B. Peng, P. Abbeel, S. Levine, G. Berseth, and K. Sreenath. Reinforcement learning for versatile, dynamic, and robust bipedal locomotion control. The International Journal of Robotics Research, 2024.
[29] X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel. Sim-to-real transfer of robotic control with dynamics randomization. In IEEE International Conference on Robotics and Automation (ICRA), 2018.
[30] L. Pinto, M. Andrychowicz, P. Welinder, W. Zaremba, and P. Abbeel. Asymmetric actor critic for image-based robot learning. In Robotics: Science and Systems (RSS), 2018.
[31] V. Makoviychuk, L. Wawrzyniak, Y. Guo, M. Lu, K. Storey, M. Macklin, D. Hoeller, N. Rudin, A. Allshire, A. Handa, et al. Isaac Gym: High performance GPU-based physics simulation for robot learning. In Advances in Neural Information Processing Systems (NeurIPS), 2021.
[32] Unitree. Humanoid agent AI avatar, 2024. URL https://www.unitree.com/g1. Accessed: 2026-05-22.

A Appendix

A.1 Observation and Reward Details

The frozen base policy follows the AMP-based locomotion framework of PhysHSI [3]. The policy outputs 29-D joint-position targets executed through low-level PD control. Here, we only summarize the task-specific obse...
