Pith · machine review for the scientific record

arxiv: 2603.02856 · v2 · submitted 2026-03-03 · 💻 cs.RO

Recognition: 2 theorem links


Rhythm: Learning Interactive Whole-Body Control for Dual Humanoids


Pith reviewed 2026-05-15 17:08 UTC · model grok-4.3

classification 💻 cs.RO
keywords: humanoid robots · whole-body control · reinforcement learning · multi-robot interaction · sim-to-real transfer · motion retargeting · dual humanoid systems

The pith

The Rhythm framework enables real-world interactive whole-body control for pairs of humanoid robots, transferring behaviors such as hugging and dancing from simulation to hardware.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a unified framework called Rhythm to make pairs of humanoid robots interact physically in the real world. It tackles kinematic mismatches between robots and complex contact forces by first retargeting human motion data into feasible robot references, then training a policy with graph-based rewards that account for the coupled dynamics of two agents. A deployment pipeline then transfers the learned behaviors to physical Unitree G1 robots. If the approach holds, it would allow collaborative robot teams to perform joint tasks such as coordinated movement or physical assistance without requiring separate controllers for each robot.
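The three-stage structure described above can be sketched end-to-end. This is a minimal structural illustration, not the authors' implementation; every name in it (`retarget_interaction`, `train_igrl`, `deploy`) is hypothetical, and each stage is stubbed.

```python
# Schematic sketch of a Rhythm-style pipeline: retarget paired human motion,
# train a coupled policy, then roll it out. All names are hypothetical
# illustrations of the three stages, not the authors' API.

def retarget_interaction(human_motion_pair):
    """IAMR-like stage: map paired human motion into per-robot reference
    tracks (stubbed as an identity mapping for illustration)."""
    return [{"agent": i, "frames": m} for i, m in enumerate(human_motion_pair)]

def train_igrl(references):
    """IGRL-like stage: in the paper this is multi-agent RL with graph-based
    rewards; stubbed here as a reference-lookup policy."""
    return lambda t: [ref["frames"][t % len(ref["frames"])] for ref in references]

def deploy(policy, steps):
    """Deployment stage: roll the policy out (here, purely in software)."""
    return [policy(t) for t in range(steps)]

human_pair = [["wave", "reach", "hug"], ["wave", "reach", "hug"]]
refs = retarget_interaction(human_pair)
policy = train_igrl(refs)
trajectory = deploy(policy, 3)
print(trajectory[2])  # → ['hug', 'hug']
```

The point of the sketch is only the data flow: one retargeting pass produces per-agent references, and a single shared policy drives both robots rather than one controller per robot.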

Core claim

The paper claims that combining (1) an Interaction-Aware Motion Retargeting module that produces kinematically feasible references from human data, (2) an Interaction-Guided Reinforcement Learning policy that masters coupled dynamics via graph-based rewards, and (3) a real-world deployment system achieves robust transfer of diverse interactive behaviors, such as hugging and dancing, from simulation to physical dual-humanoid systems.

What carries the argument

The Interaction-Aware Motion Retargeting (IAMR) module that converts human interaction data into feasible dual-humanoid references, paired with the Interaction-Guided Reinforcement Learning (IGRL) policy that uses graph-based rewards to handle coupled contact dynamics.

Load-bearing premise

The retargeted motion references remain kinematically feasible and dynamically stable when the robots experience real contact forces and sensor noise on physical hardware.

What would settle it

The claim would be undermined by repeated real-world trials in which the dual robots fail to complete a hugging or dancing sequence without falling, exceeding joint limits, or losing balance under contact.
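One way such trials become decisive is via reported success counts with uncertainty rather than anecdotal demos. As an illustration only: the trial numbers below are hypothetical, and the Wilson score interval is a standard statistical choice, not something the paper specifies.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """95% Wilson score confidence interval for a binomial success rate."""
    if trials == 0:
        return (0.0, 1.0)
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return (center - half, center + half)

# Hypothetical: 18 successful hug sequences out of 20 hardware trials.
lo, hi = wilson_interval(18, 20)
print(f"success rate 0.90, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Even at a 90% observed success rate, 20 trials leave a wide interval (roughly 0.70 to 0.97 here), which is why the referee's request for explicit trial counts matters.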

Figures

Figures reproduced from arXiv: 2603.02856 by Hongjin Chen, Jieru Zhao, Ke Ma, Pengfei Li, Shihao Ma, Wei Zhang, Wenchao Ding, Xiaohui Wang, Yilun Chen, Yujie Jin, Yupeng Zheng, Zijun Xu, Zining Wang.

Figure 1
Figure 1: The proposed framework, Rhythm, facilitates a spectrum of humanoid–humanoid interactions. (a–c) Contact-Rich Interaction: the method handles interactions ranging from light contact (Greeting) to intensive contact (Hug, Shoulder-to-Shoulder), maintaining fine-grained contact geometry without penetration (shown in the zoomed-in views). (d) Coordinated Interaction: the humanoids perform synchronized long-horizon …
Figure 2
Figure 2: Overview of Rhythm. IAMR utilizes decoupled optimization to generate high-quality humanoid–humanoid interaction references from human demonstrations. Guided by these references, IGRL employs MAPPO and graph-based rewards to learn robust coupled dynamics. Finally, the deployment module facilitates sim-to-real transfer via Lidar-fused state estimation and inter-agent synchronization.
Figure 3
Figure 3: Overview of MAGIC. MAGIC contains ∼3 hours of high-fidelity interaction data balanced across five semantic categories (inner chart). Representative snapshots (outer ring) illustrate the diversity, ranging from loose spatiotemporal coordination to intensive contact.
Figure 4
Figure 4: Qualitative Visualization of Retargeting on Inter-X.
Figure 5
Figure 5: Qualitative Visualization of Policy. Single Agent (blue) drifts into collisions. w/o Contact Rew (green) achieves low error but exhibits physical “ghosting”. In contrast, Ours enforces valid physical contact.
Figure 7
Figure 7: Visualization of Topological Interaction Priors.
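The deployment module's inter-agent synchronization, quoted in the extracted text, applies a phase-rate correction ϕ̇_ego = 1.0 + k(ϕ_peer − ϕ_ego), where k is the synchronization gain and 1.0 the base phase increment. A minimal simulation of this rule follows; the gain, timestep, and initial offset are illustrative values, not taken from the paper.

```python
def synchronize(phi_ego, phi_peer, k=0.5, dt=0.01, steps=500):
    """Advance two agents' motion phases; each corrects its rate toward
    its peer: phi_dot = 1.0 + k * (phi_peer - phi_ego)."""
    for _ in range(steps):
        # Compute both rates from the same state (simultaneous update).
        d_ego = 1.0 + k * (phi_peer - phi_ego)
        d_peer = 1.0 + k * (phi_ego - phi_peer)
        phi_ego += d_ego * dt
        phi_peer += d_peer * dt
    return phi_ego, phi_peer

# Start the two robots 0.3 phase units apart; the gap decays toward zero
# while both phases keep advancing at the base rate.
a, b = synchronize(0.0, 0.3)
print(abs(b - a))
```

The phase difference obeys e ← e·(1 − 2k·dt) per step, so it shrinks geometrically; this is why a single scalar gain suffices to keep the two robots' motion clocks aligned.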
Original abstract

Realizing interactive whole-body control for multi-humanoid systems is critical for unlocking complex collaborative capabilities in shared environments. Although recent advancements have significantly enhanced the agility of individual robots, bridging the gap to physically coupled multi-humanoid interaction remains challenging, primarily due to severe kinematic mismatches and complex contact dynamics. To address this, we introduce Rhythm, the first unified framework enabling real-world deployment of dual-humanoid systems for complex, physically plausible interactions. Our framework integrates three core components: (1) an Interaction-Aware Motion Retargeting (IAMR) module that generates feasible humanoid interaction references from human data; (2) an Interaction-Guided Reinforcement Learning (IGRL) policy that masters coupled dynamics via graph-based rewards; and (3) a real-world deployment system that enables robust transfer of dual-humanoid interaction. Extensive experiments on physical Unitree G1 robots demonstrate that our framework achieves robust interactive whole-body control, successfully transferring diverse behaviors such as hugging and dancing from simulation to reality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces the Rhythm framework for interactive whole-body control of dual-humanoid systems. It comprises an Interaction-Aware Motion Retargeting (IAMR) module that generates kinematically feasible references from human data, an Interaction-Guided Reinforcement Learning (IGRL) policy that learns coupled dynamics using graph-based rewards, and a real-world deployment system. The central claim is that this unified approach enables robust sim-to-real transfer of complex physically coupled behaviors such as hugging and dancing on physical Unitree G1 robots, as shown via extensive experiments.

Significance. If the hardware results hold under quantitative scrutiny, the work would advance multi-robot collaboration by addressing kinematic mismatches and contact dynamics in shared environments, an important gap beyond single-robot agility. The graph-reward formulation in IGRL and the IAMR retargeting represent a coherent technical integration worth further study in the field.

major comments (3)
  1. [Abstract] The assertion of 'robust interactive whole-body control' and 'successfully transferring' hugging and dancing rests on 'extensive experiments' yet supplies no trial counts, success rates, force-tracking errors, baseline comparisons, or failure-mode statistics, leaving the central sim-to-real claim unquantified and difficult to assess.
  2. [IAMR module and deployment system] The assumption that IAMR produces references stable under real contact forces, latency, and sensor noise on Unitree G1 hardware is load-bearing for the transfer claim but is stated without supporting analysis, error metrics, or ablation on kinematic feasibility under dynamics mismatch.
  3. [IGRL policy] The graph-based reward formulation is presented as key to mastering coupled dynamics, but the manuscript provides no ablations, sensitivity analysis on reward weights, or comparisons showing its contribution relative to standard RL objectives.
minor comments (2)
  1. [Methods] Notation for graph reward weights and interaction terms should be defined with explicit equations or pseudocode in the methods to support reproducibility.
  2. [Figures] Figure captions describing real-robot trials would benefit from added quantitative labels (e.g., number of successful runs) even if detailed tables appear elsewhere.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, indicating revisions where we have strengthened the presentation of results and analysis.

read point-by-point responses
  1. Referee: [Abstract] The assertion of 'robust interactive whole-body control' and 'successfully transferring' hugging and dancing rests on 'extensive experiments' yet supplies no trial counts, success rates, force-tracking errors, baseline comparisons, or failure-mode statistics, leaving the central sim-to-real claim unquantified and difficult to assess.

    Authors: We agree that the abstract would be strengthened by explicit quantitative support. In the revised manuscript we have updated the abstract to reference the key metrics already reported in the experiments section (trial counts, success rates for hugging and dancing, and baseline comparisons) and added a concise summary of failure-mode statistics. This makes the central sim-to-real claim directly quantifiable without altering the underlying experimental results. revision: yes

  2. Referee: [IAMR module and deployment system] The assumption that IAMR produces references stable under real contact forces, latency, and sensor noise on Unitree G1 hardware is load-bearing for the transfer claim but is stated without supporting analysis, error metrics, or ablation on kinematic feasibility under dynamics mismatch.

    Authors: We acknowledge that the original manuscript provided limited quantitative backing for IAMR stability under real-world conditions. We have added retargeting error metrics, an ablation on kinematic feasibility under simulated dynamics mismatch, and a new discussion of latency and sensor noise effects observed during deployment. Full quantitative force-tracking under contact remains limited by our current hardware instrumentation; we have therefore added an explicit limitations paragraph while retaining the qualitative success of the deployed interactions as supporting evidence. revision: partial

  3. Referee: [IGRL policy] The graph-based reward formulation is presented as key to mastering coupled dynamics, but the manuscript provides no ablations, sensitivity analysis on reward weights, or comparisons showing its contribution relative to standard RL objectives.

    Authors: We agree that ablations and comparisons would better isolate the contribution of the graph-based rewards. In the revised manuscript we have moved the existing ablation studies from the supplement into the main text, added sensitivity analysis on reward weights, and included direct comparisons against standard RL objectives (dense reward without graph structure). These additions demonstrate the performance gain attributable to the interaction-guided formulation. revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on empirical sim-to-real experiments without self-referential derivations or fitted predictions

full rationale

The paper presents a three-component framework (IAMR for retargeting, IGRL for policy learning via graph rewards, and a deployment system) whose central claim of robust dual-humanoid interaction transfer is asserted via 'extensive experiments' on Unitree G1 robots. No equations, uniqueness theorems, or ansatzes are supplied in the text that reduce reported success metrics to quantities defined by the same fitted parameters or prior self-citations. The derivation chain is therefore self-contained as an empirical engineering contribution rather than a closed mathematical loop.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Framework rests on standard RL convergence assumptions and the premise that human-derived references can be retargeted without introducing unmodeled instabilities; no new physical entities are postulated.

free parameters (1)
  • graph reward weights
    Weights balancing contact, balance, and interaction terms are tuned to produce stable policies.
axioms (1)
  • domain assumption: graph-based rewards suffice to master coupled contact dynamics
    Policy training assumes the chosen graph formulation captures all relevant interaction forces.
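To make the ledger's single free-parameter family concrete, a graph-structured reward typically sums weighted per-edge and per-agent terms. The sketch below is hypothetical: the term definitions, the exponential shaping, and the weight values are illustrative, not taken from the paper.

```python
import math

# Hypothetical sketch of a graph-structured reward: edges between robot
# links carry contact and cross-agent interaction errors, plus a per-agent
# balance term. Weights are the tuned free parameters the ledger flags.

def graph_reward(contact_err, interaction_err, balance_err,
                 w_contact=1.0, w_interact=0.5, w_balance=0.2):
    """Combine exponentiated tracking errors into a single scalar reward."""
    r_contact = sum(math.exp(-e) for e in contact_err)       # edge terms
    r_interact = sum(math.exp(-e) for e in interaction_err)  # cross-agent edges
    r_balance = sum(math.exp(-e) for e in balance_err)       # per-agent terms
    return (w_contact * r_contact
            + w_interact * r_interact
            + w_balance * r_balance)

# Perfect tracking (all errors zero) gives the maximum reward:
r = graph_reward([0.0, 0.0], [0.0], [0.0, 0.0])
print(r)  # ≈ 2.9 (1.0*2 + 0.5*1 + 0.2*2)
```

The referee's sensitivity concern maps directly onto the three weights here: shifting mass between the contact and interaction terms changes which errors the policy tolerates, which is why an ablation over them is informative.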

pith-pipeline@v0.9.0 · 5511 in / 1100 out tokens · 56958 ms · 2026-05-15T17:08:33.559439+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

65 extracted references · 65 canonical work pages · 3 internal anchors

  1. [1]

    Differential coordinates for local mesh morphing and deformation. The Visual Computer, 19(2):105–114, 2003

    Marc Alexa. Differential coordinates for local mesh morphing and deformation. The Visual Computer, 19(2):105–114, 2003

  2. [2]

    Visual imitation enables contextual humanoid control

    Arthur Allshire, Hongsuk Choi, Junyi Zhang, David McAllister, Anthony Zhang, Chung Min Kim, Trevor Darrell, Pieter Abbeel, Jitendra Malik, and Angjoo Kanazawa. Visual imitation enables contextual humanoid control. In Conference on Robot Learning, 2025

  3. [3]

    Retargeting Matters: General motion retargeting for humanoid motion tracking

    Joao Pedro Araujo, Yanjie Ze, Pei Xu, Jiajun Wu, and C Karen Liu. Retargeting Matters: General motion retargeting for humanoid motion tracking. arXiv preprint arXiv:2510.02252, 2025

  4. [4]

    HOMIE: Humanoid loco-manipulation with isomorphic exoskeleton cockpit

    Qingwei Ben, Feiyu Jia, Jia Zeng, Junting Dong, Dahua Lin, and Jiangmiao Pang. HOMIE: Humanoid loco-manipulation with isomorphic exoskeleton cockpit. In Robotics: Science and Systems, 2025

  5. [5]

    Humanoid robots and humanoid AI: Review, perspectives and directions. ACM Computing Surveys, 58(4):1–37, 2025

    Longbing Cao. Humanoid robots and humanoid AI: Review, perspectives and directions. ACM Computing Surveys, 58(4):1–37, 2025

  6. [6]

    Symbridge: A human-in-the-loop cyber-physical interactive system for adaptive human-robot symbiosis

    Haoran Chen, Yiteng Xu, Yiming Ren, Yaoqin Ye, Xinran Li, Ning Ding, Yuxuan Wu, Yaoze Liu, Peishan Cong, Ziyi Wang, et al. Symbridge: A human-in-the-loop cyber-physical interactive system for adaptive human-robot symbiosis. In Proceedings of the SIGGRAPH Asia 2025 Conference Papers, pages 1–12, 2025

  7. [7]

    GMT: General motion tracking for humanoid whole-body control. arXiv preprint arXiv:2506.14770, 2025

    Zixuan Chen, Mazeyu Ji, Xuxin Cheng, Xuanbin Peng, Xue Bin Peng, and Xiaolong Wang. GMT: General motion tracking for humanoid whole-body control. arXiv preprint arXiv:2506.14770, 2025

  8. [8]

    Expressive whole-body control for humanoid robots

    Xuxin Cheng, Yandong Ji, Junming Chen, Ruihan Yang, Ge Yang, and Xiaolong Wang. Expressive whole-body control for humanoid robots. In Robotics: Science and Systems, 2024

  9. [9]

    Learning human-humanoid coordination for collaborative object carrying. arXiv preprint arXiv:2510.14293, 2025

    Yushi Du, Yixuan Li, Baoxiong Jia, Yutang Lin, Pei Zhou, Wei Liang, Yanchao Yang, and Siyuan Huang. Learning human-humanoid coordination for collaborative object carrying. arXiv preprint arXiv:2510.14293, 2025

  10. [10]

    CooHOI: Learning cooperative human-object interaction with manipulated object dynamics

    Jiawei Gao, Ziqin Wang, Zeqi Xiao, Jingbo Wang, Tai Wang, Jinkun Cao, Xiaolin Hu, Si Liu, Jifeng Dai, and Jiangmiao Pang. CooHOI: Learning cooperative human-object interaction with manipulated object dynamics. Advances in Neural Information Processing Systems, 37:79741–79763, 2024

  11. [11]

    ReMoS: 3D motion-conditioned reaction synthesis for two-person interactions

    Anindita Ghosh, Rishabh Dabral, Vladislav Golyanik, Christian Theobalt, and Philipp Slusallek. ReMoS: 3D motion-conditioned reaction synthesis for two-person interactions. In European Conference on Computer Vision (ECCV), 2024

  12. [12]

    Advancing humanoid locomotion: Mastering challenging terrains with denoising world model learning

    Xinyang Gu, Yen-Jen Wang, Xiang Zhu, Chengming Shi, Yanjiang Guo, Yichen Liu, and Jianyu Chen. Advancing humanoid locomotion: Mastering challenging terrains with denoising world model learning. In Robotics: Science and Systems, 2024

  13. [13]

    KungfuBot2: Learning versatile motion skills for humanoid whole-body control. arXiv preprint arXiv:2509.16638, 2025

    Jinrui Han, Weiji Xie, Jiakun Zheng, Jiyuan Shi, Weinan Zhang, Ting Xiao, and Chenjia Bai. KungfuBot2: Learning versatile motion skills for humanoid whole-body control. arXiv preprint arXiv:2509.16638, 2025

  14. [14]

    Point-LIO: robust high-bandwidth light detection and ranging inertial odometry

    Dongjiao He, Wei Xu, Nan Chen, Fanze Kong, Chongjian Yuan, and Fu Zhang. Point-LIO: robust high-bandwidth light detection and ranging inertial odometry. Advanced Intelligent Systems, 5(7):2200459, 2023

  15. [15]

    OmniH2O: Universal and dexterous human-to-humanoid whole-body teleoperation and learning

    Tairan He, Zhengyi Luo, Xialin He, Wenli Xiao, Chong Zhang, Weinan Zhang, Kris Kitani, Changliu Liu, and Guanya Shi. OmniH2O: Universal and dexterous human-to-humanoid whole-body teleoperation and learning. In Conference on Robot Learning, 2024

  16. [16]

    Learning human-to-humanoid real-time whole-body teleoperation

    Tairan He, Zhengyi Luo, Wenli Xiao, Chong Zhang, Kris Kitani, Changliu Liu, and Guanya Shi. Learning human-to-humanoid real-time whole-body teleoperation. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024

  17. [17]

    ASAP: Aligning simulation and real-world physics for learning agile humanoid whole-body skills

    Tairan He, Jiawei Gao, Wenli Xiao, Yuanhang Zhang, Zi Wang, Jiashun Wang, Zhengyi Luo, Guanqi He, Nikhil Sobanbabu, Chaoyi Pan, Zeji Yi, Guannan Qu, Kris Kitani, Jessica K. Hodgins, Linxi Fan, Yuke Zhu, Changliu Liu, and Guanya Shi. ASAP: Aligning simulation and real-world physics for learning agile humanoid whole-body skills. In Robotics: Science and Systems, 2025

  18. [18]

    Learning getting-up policies for real-world humanoid robots

    Xialin He, Runpei Dong, Zixuan Chen, and Saurabh Gupta. Learning getting-up policies for real-world humanoid robots. In Robotics: Science and Systems, 2025

  19. [19]

    Learning humanoid standing-up control across diverse postures

    Tao Huang, Junli Ren, Huayi Wang, Zirui Wang, Qingwei Ben, Muning Wen, Xiao Chen, Jianan Li, and Jiangmiao Pang. Learning humanoid standing-up control across diverse postures. In Robotics: Science and Systems, 2025

  20. [20]

    Learning whole-body human-humanoid interaction from human-human demonstrations. arXiv preprint arXiv:2601.09518, 2026

    Wei-Jin Huang, Yue-Yi Zhang, Yi-Lin Wei, Zhi-Wei Xia, Juantao Tan, Yuan-Ming Li, Zhilin Zhao, and Wei-Shi Zheng. Learning whole-body human-humanoid interaction from human-human demonstrations. arXiv preprint arXiv:2601.09518, 2026

  21. [21]

    LCM: A multicast core management protocol for link-state routing networks

    Yih Huang, Eric Fleury, and Philip K McKinley. LCM: A multicast core management protocol for link-state routing networks. In ICC’98. 1998 IEEE International Conference on Communications. Conference Record. Affiliated with SUPERCOMM’98 (Cat. No. 98CH36220), volume 2, pages 1197–1201. IEEE, 1998

  22. [22]

    AMO: Adaptive motion optimization for hyper-dexterous humanoid whole-body control

    Jialong Li, Xuxin Cheng, Tianshu Huang, Shiqi Yang, Ri-Zhao Qiu, and Xiaolong Wang. AMO: Adaptive motion optimization for hyper-dexterous humanoid whole-body control. In Robotics: Science and Systems, 2025

  23. [23]

    Gait-Net-augmented implicit kino-dynamic MPC for dynamic variable-frequency humanoid locomotion over discrete terrains

    Junheng Li, Ziwei Duan, Junchao Ma, and Quan Nguyen. Gait-Net-augmented implicit kino-dynamic MPC for dynamic variable-frequency humanoid locomotion over discrete terrains. In Robotics: Science and Systems, 2025

  24. [24]

    CLONE: Closed-loop whole-body humanoid teleoperation for long-horizon tasks

    Yixuan Li, Yutang Lin, Jieming Cui, Tengyu Liu, Wei Liang, Yixin Zhu, and Siyuan Huang. CLONE: Closed-loop whole-body humanoid teleoperation for long-horizon tasks. In Conference on Robot Learning, 2025

  25. [25]

    InterGen: Diffusion-based multi-human motion generation under complex interactions. International Journal of Computer Vision (IJCV), 2024

    Han Liang, Wenqian Zhang, Wenxuan Li, Jingyi Yu, and Lan Xu. InterGen: Diffusion-based multi-human motion generation under complex interactions. International Journal of Computer Vision (IJCV), 2024

  26. [26]

    BeyondMimic: From Motion Tracking to Versatile Humanoid Control via Guided Diffusion

    Qiayuan Liao, Takara E Truong, Xiaoyu Huang, Yuman Gao, Guy Tevet, Koushil Sreenath, and C Karen Liu. BeyondMimic: From motion tracking to versatile humanoid control via guided diffusion. arXiv preprint arXiv:2508.08241, 2025

  27. [27]

    Markov games as a framework for multi-agent reinforcement learning

    Michael L. Littman. Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the Eleventh International Conference on Machine Learning (ICML), volume 157, pages 157–163. Morgan Kaufmann, 1994

  28. [28]

    Humanoid Whole-Body Badminton via Multi-Stage Reinforcement Learning

    Chenhao Liu, Leyun Jiang, Yibo Wang, Kairan Yao, Jinchen Fu, and Xiaoyu Ren. Humanoid whole-body badminton via multi-stage reinforcement learning. arXiv preprint arXiv:2511.11218, 2025

  29. [29]

    It Takes Two: Learning interactive whole-body control between humanoid robots. arXiv preprint arXiv:2510.10206, 2025

    Zuhong Liu, Junhao Ge, Minhao Xiong, Jiahao Gu, Bowei Tang, Wei Jing, and Siheng Chen. It Takes Two: Learning interactive whole-body control between humanoid robots. arXiv preprint arXiv:2510.10206, 2025

  30. [30]

    Perpetual humanoid control for real-time simulated avatars

    Zhengyi Luo, Jinkun Cao, Alexander Winkler, Kris Kitani, and Weipeng Xu. Perpetual humanoid control for real-time simulated avatars. In IEEE/CVF International Conference on Computer Vision (ICCV), pages 10895–10904, 2023

  31. [31]

    DeepMimic: Example-guided deep reinforcement learning of physics-based character skills

    Xue Bin Peng, Pieter Abbeel, Sergey Levine, and Michiel Van de Panne. DeepMimic: Example-guided deep reinforcement learning of physics-based character skills. ACM Transactions On Graphics (TOG), 37(4):1–14, 2018

  32. [32]

    ASE: Large-scale reusable adversarial skill embeddings for physically simulated characters. ACM Transactions on Graphics (TOG), 41(4):1–17, 2022

    Xue Bin Peng, Yunrong Guo, Lina Halper, Sergey Levine, and Sanja Fidler. ASE: Large-scale reusable adversarial skill embeddings for physically simulated characters. ACM Transactions on Graphics (TOG), 41(4):1–17, 2022

  33. [33]

    Non-conflicting energy minimization in reinforcement learning based robot control

    Skand Peri, Akhil Perincherry, Bikram Pandit, and Stefan Lee. Non-conflicting energy minimization in reinforcement learning based robot control. In Conference on Robot Learning, 2025

  34. [34]

    Humanoid Goalkeeper: Learning from position conditioned task-motion constraints. arXiv preprint arXiv:2510.18002, 2025

    Junli Ren, Junfeng Long, Tao Huang, Huayi Wang, Zirui Wang, Feiyu Jia, Wentao Zhang, Jingbo Wang, Ping Luo, and Jiangmiao Pang. Humanoid Goalkeeper: Learning from position conditioned task-motion constraints. arXiv preprint arXiv:2510.18002, 2025

  35. [35]

    Generalized-ICP

    Aleksandr Segal, Dirk Haehnel, and Sebastian Thrun. Generalized-ICP. In Robotics: Science and Systems, volume 2, page 435. Seattle, WA, 2009

  36. [36]

    LangWBC: Language-directed humanoid whole-body control via end-to-end learning

    Yiyang Shao, Bike Zhang, Qiayuan Liao, Xiaoyu Huang, Yuman Gao, Yufeng Chi, Zhongyu Li, Sophia Shao, and Koushil Sreenath. LangWBC: Language-directed humanoid whole-body control via end-to-end learning. In Robotics: Science and Systems, 2025

  37. [37]

    A comprehensive review of humanoid robots. SmartBot, 1(1):e12008, 2025

    Qincheng Sheng, Zhongxiang Zhou, Jinhao Li, Xiangyu Mi, Pingyu Xiang, Zhenghan Chen, Haocheng Xu, Shenhan Jia, Xiyang Wu, Yuxiang Cui, et al. A comprehensive review of humanoid robots. SmartBot, 1(1):e12008, 2025

  38. [38]

    Local motion phases for learning multi-contact character movements. ACM Transactions on Graphics, 39(4), 2020

    Sebastian Starke, Yiwei Zhao, Taku Komura, and Kazi Zaman. Local motion phases for learning multi-contact character movements. ACM Transactions on Graphics, 39(4), 2020

  39. [39]

    HITTER: A humanoid table tennis robot via hierarchical planning and learning. arXiv preprint arXiv:2508.21043, 2025

    Zhi Su, Bike Zhang, Nima Rahmanian, Yuman Gao, Qiayuan Liao, Caitlin Regan, Koushil Sreenath, and S Shankar Sastry. HITTER: A humanoid table tennis robot via hierarchical planning and learning. arXiv preprint arXiv:2508.21043, 2025

  40. [40]

    BeamDojo: Learning agile humanoid locomotion on sparse footholds

    Huayi Wang, Zirui Wang, Junli Ren, Qingwei Ben, Tao Huang, Weinan Zhang, and Jiangmiao Pang. BeamDojo: Learning agile humanoid locomotion on sparse footholds. In Robotics: Science and Systems, 2025

  41. [41]

    InterMoE: Individual-specific 3D human interaction generation via dynamic temporal-selective MoE

    Lipeng Wang, Hongxing Fan, Haohua Chen, Zehuan Huang, and Lu Sheng. InterMoE: Individual-specific 3D human interaction generation via dynamic temporal-selective MoE. In Proceedings of the AAAI Conference on Artificial Intelligence, 2026

  42. [42]

    SkillMimic: Learning basketball interaction skills from demonstrations

    Yinhuai Wang, Qihan Zhao, Runyi Yu, Hok Wai Tsui, Ailing Zeng, Jing Lin, Zhengyi Luo, Jiwen Yu, Xiu Li, Qifeng Chen, Jian Zhang, Lei Zhang, and Ping Tan. SkillMimic: Learning basketball interaction skills from demonstrations. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 17540–17549, June 2025

  43. [43]

    Control strategies for physically simulated characters performing two-player competitive sports. ACM Transactions on Graphics (TOG), 40(4):1–11, 2021

    Jungdam Won, Deepak Gopinath, and Jessica Hodgins. Control strategies for physically simulated characters performing two-player competitive sports. ACM Transactions on Graphics (TOG), 40(4):1–11, 2021

  44. [44]

    KungfuBot: Physics-based humanoid whole-body control for learning highly-dynamic skills. Advances in Neural Information Processing Systems, 2025

    Weiji Xie, Jinrui Han, Jiakun Zheng, Huanyu Li, Xinzhe Liu, Jiyuan Shi, Weinan Zhang, Chenjia Bai, and Xuelong Li. KungfuBot: Physics-based humanoid whole-body control for learning highly-dynamic skills. Advances in Neural Information Processing Systems, 2025

  45. [45]

    Inter-X: Towards versatile human-human interaction analysis

    Liang Xu, Xintao Lv, Yichao Yan, Xin Jin, Shuwen Wu, Congsheng Xu, Yifan Liu, Yizhou Zhou, Fengyun Rao, Xingdong Sheng, Yunhui Liu, Wenjun Zeng, and Xiaokang Yang. Inter-X: Towards versatile human-human interaction analysis. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

  46. [46]

    Perceiving and acting in first-person: A dataset and benchmark for egocentric human-object-human interactions

    Liang Xu, Chengqun Yang, Zili Lin, Fei Xu, Yifan Liu, Congsheng Xu, Yiyi Zhang, Jie Qin, Xingdong Sheng, Yunhui Liu, et al. Perceiving and acting in first-person: A dataset and benchmark for egocentric human-object-human interactions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12535–12548, 2025

  47. [47]

    Learning Agile Striker Skills for Humanoid Soccer Robots from Noisy Sensory Input

    Zifan Xu, Myoungkyu Seo, Dongmyeong Lee, Hao Fu, Jiaheng Hu, Jiaxun Cui, Yuqian Jiang, Zhihan Wang, Anastasiia Brund, Joydeep Biswas, et al. Learning agile striker skills for humanoid soccer robots from noisy sensory input. arXiv preprint arXiv:2512.06571, 2025

  48. [48]

    A unified and general humanoid whole-body controller for fine-grained locomotion

    Yufei Xue, Wentao Dong, Minghuan Liu, Weinan Zhang, and Jiangmiao Pang. A unified and general humanoid whole-body controller for fine-grained locomotion. In Robotics: Science and Systems, 2025

  49. [49]

    OmniRetarget: Interaction-preserving data generation for humanoid whole-body loco-manipulation and scene interaction. arXiv preprint arXiv:2509.26633, 2025

    Lujie Yang, Xiaoyu Huang, Zhen Wu, Angjoo Kanazawa, Pieter Abbeel, Carmelo Sferrazza, C Karen Liu, Rocky Duan, and Guanya Shi. OmniRetarget: Interaction-preserving data generation for humanoid whole-body loco-manipulation and scene interaction. arXiv preprint arXiv:2509.26633, 2025

  50. [50]

    PhysiInter: Integrating physical mapping for high-fidelity human interaction generation.arXiv preprint arXiv:2506.07456, 2025

    Wei Yao, Yunlian Sun, Chang Liu, Hongwen Zhang, and Jinhui Tang. PhysiInter: Integrating physical mapping for high-fidelity human interaction generation.arXiv preprint arXiv:2506.07456, 2025

  51. [51]

    Hi4D: 4D instance segmentation of close human interaction

    Yifei Yin, Chen Guo, Manuel Kaufmann, Juan Jose Zarate, Jie Song, and Otmar Hilliges. Hi4D: 4D instance segmentation of close human interaction. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

  52. [52]

    TWIST: Tele- operated whole-body imitation system

    Yanjie Ze, Zixuan Chen, Joao Pedro Araujo, Zi-ang Cao, Xue Bin Peng, Jiajun Wu, and Karen Liu. TWIST: Tele- operated whole-body imitation system. InConference on Robot Learning, 2025

  53. [53]

    HuB: Learning extreme humanoid balance

    Tong Zhang, Boyuan Zheng, Ruiqian Nai, Yingdong Hu, Yen-Jen Wang, Geng Chen, Fanqi Lin, Jiongye Li, Chuye Hong, Koushil Sreenath, and Yang Gao. HuB: Learning extreme humanoid balance. InConference on Robot Learning, 2025

  54. [54]

    Simulation and retargeting of complex multi-character interactions

    Yunbo Zhang, Deepak Gopinath, Yuting Ye, Jessica Hodgins, Greg Turk, and Jungdam Won. Simulation and retargeting of complex multi-character interactions. In ACM SIGGRAPH 2023 Conference Proceedings, pages 1–11, 2023

  55. [55]

    Zhikai Zhang, Jun Guo, Chao Chen, Jilong Wang, Chenghuai Lin, Yunrui Lian, Han Xue, Zhenrong Wang, Maoqi Liu, Jiangran Lyu, et al. Track any motions under any disturbances. arXiv preprint arXiv:2509.13833, 2025

  56. [56]

    Kun Zhou, Jin Huang, John Snyder, Xinguo Liu, Hujun Bao, Baining Guo, and Heung-Yeung Shum. Large mesh deformation using the volumetric graph Laplacian. ACM Transactions on Graphics (TOG), 24(3):496–503, 2005

APPENDIX

Overview

This appendix is organized into three main sections (A–C) to support the clarity and reproducibility of the proposed framework, Rhythm.

Compatibility with Heterogeneous Motion Sources: Human motion datasets contain rich pose and interaction information but differ significantly in data format and physical attributes (e.g., height, body proportions). A key strength of our framework is its input-agnostic design: we first abstract diverse inputs into a standardized representation—time-series ...
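The input-agnostic abstraction above can be illustrated with a minimal sketch. The container below and its height-based normalization are assumptions for illustration; the paper's actual standardized representation is only partially visible in this excerpt.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class StandardizedMotion:
    """Hypothetical input-agnostic representation: a time-series of 3D
    keypoint positions plus the subject's height, so sources with
    different body proportions become directly comparable."""
    keypoints: np.ndarray  # (T, K, 3) keypoint positions in meters
    height: float          # subject height in meters

    def normalized(self) -> np.ndarray:
        # Scale positions by height so, e.g., a 1.8 m human and a 1.3 m
        # G1 map into the same unit-height coordinate frame.
        return self.keypoints / self.height

# Two sources with different heights yield identical normalized tracks
# when their motions are geometrically similar.
human = StandardizedMotion(np.ones((10, 5, 3)) * 1.8, height=1.8)
robot = StandardizedMotion(np.ones((10, 5, 3)) * 1.3, height=1.3)
assert np.allclose(human.normalized(), robot.normalized())
```

The point of the design is that everything downstream (retargeting, reward computation) consumes only this normalized form, never the source-specific formats.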

Optimization Details and Hyperparameters: We explicitly formulate the optimization objectives and constraints used to solve the kinematic conflict described in Sec. III-A.

Optimization Formulation. We solve the retargeting problem frame-by-frame using a Sequential Quadratic Programming (SQP) approach. For each frame t, we optimize the joint configurations q ...
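The frame-by-frame SQP formulation can be sketched in miniature. The toy two-link arm, cost weights, and joint limits below are illustrative assumptions standing in for the full humanoid chain; only the solver family (SLSQP, a sequential quadratic programming method) follows the text.

```python
import numpy as np
from scipy.optimize import minimize

def fk(q):
    # Forward kinematics of a toy planar 2-link arm (unit link lengths);
    # a stand-in for the humanoid's full kinematic chain.
    x = np.cos(q[0]) + np.cos(q[0] + q[1])
    y = np.sin(q[0]) + np.sin(q[0] + q[1])
    return np.array([x, y])

def retarget_frame(target, q_prev):
    # One frame of the retargeting optimization: track the reference
    # keypoint while staying close to the previous frame's solution
    # (temporal smoothness), with joint limits as box constraints.
    cost = lambda q: (np.sum((fk(q) - target) ** 2)
                      + 1e-3 * np.sum((q - q_prev) ** 2))
    res = minimize(cost, q_prev, method="SLSQP",
                   bounds=[(-np.pi, np.pi), (0.0, np.pi)])
    return res.x

q = retarget_frame(np.array([1.0, 1.0]), q_prev=np.zeros(2))
assert np.linalg.norm(fk(q) - np.array([1.0, 1.0])) < 0.05
```

Solving per frame with the previous configuration as both warm start and smoothness anchor keeps each SQP solve small and the resulting trajectory continuous.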

Topological Graph Visualization: To provide an intuitive understanding of the topological priors, we visualize the extracted graph structures using a representative interaction case (e.g., a handshake task), as shown in Fig. 7. The visualization highlights two distinct connectivity types used by IAMR:

• The Interaction Graph (Yellow Edges): Bridges the key...
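A minimal sketch of how such cross-agent interaction edges could be extracted from keypoint proximity. The distance threshold and edge rule are illustrative assumptions, not values from the paper.

```python
import numpy as np

def interaction_edges(kp_a, kp_b, radius=0.3):
    """Sketch of a cross-agent interaction graph: connect every pair of
    keypoints (one from each robot) closer than `radius` meters. The
    radius is an assumed illustrative value."""
    diffs = kp_a[:, None, :] - kp_b[None, :, :]   # (Ka, Kb, 3) pairwise offsets
    dists = np.linalg.norm(diffs, axis=-1)
    return [(int(i), int(j)) for i, j in zip(*np.nonzero(dists < radius))]

# Handshake-like example: the right hands (index 0) of both robots are
# close, the other keypoints are far apart, so one edge appears.
a = np.array([[0.0, 0.0, 1.0], [-0.5, 0.0, 1.5]])
b = np.array([[0.1, 0.0, 1.0], [ 0.6, 0.0, 1.5]])
assert interaction_edges(a, b) == [(0, 0)]
```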

Network Architecture: Our policy π_θ(a_t | o_t) employs a hierarchical encoder-decoder architecture designed to process heterogeneous temporal data. The network input is composed of three semantic groups, which are processed by specialized encoders before being fused for action generation.

Observation Space & Inputs. To capture complex coupled dynamics, the policy ...

Reward Definitions: The total reward r_t is computed as a weighted sum of terms designed to balance kinematic fidelity with interaction plausibility, as detailed in Table IV. We prioritize interaction-centric objectives (e.g., relative geometry and contact) over individual tracking precision to encourage compliant multi-agent coupling. Interaction Graph Rew...

Robust Training Strategy: To ensure transferability to the physical world and handle the complexity of coupled interaction phases, we implement a rigorous training protocol comprising error-aware curriculum-based adaptive sampling and extensive domain randomization.

Curriculum-based Adaptive Sampling. Standard Reference State Initialization (RSI) relies on ...

Baseline Implementation Details: To strictly validate our contributions, we benchmark our framework against two sets of baselines: kinematic retargeting methods (answering Q1) and dynamic policy learning variants (answering Q2).

Retargeting Baselines (Kinematic Level). All baselines utilize the same source motion data and undergo identical skeletal scaling ...

Evaluation Metric Implementation Details: We employ specific metrics for each component of our framework to evaluate kinematic quality and dynamic performance respectively.

Retargeting Evaluation Metrics (Q1). We evaluate retargeting quality from three complementary aspects: physical feasibility, interaction fidelity, and downstream utility.

• Inter-Pene...

Sim-to-Real Hardware: We validate our approach on the Unitree G1 humanoid robot platform. To bridge the gap between ideal simulation states and real-world noisy sensor data, we implement a fully onboard perception and control stack written in C++ for real-time performance.

Robot Platform & Compute. The Unitree G1 (approx. 1.3 m height, 29 DoF) serves as our...