Learning to Assist: Physics-Grounded Human-Human Control via Multi-Agent Reinforcement Learning

Kashu Yamazaki; Katerina Fragkiadaki; Lalit Jayanti; Mariko Isogawa; Yoshimitsu Aoki; Yuto Shibata

arxiv: 2603.11346 · v2 · submitted 2026-03-11 · 💻 cs.CV · cs.GR· cs.RO

Learning to Assist: Physics-Grounded Human-Human Control via Multi-Agent Reinforcement Learning

Yuto Shibata , Kashu Yamazaki , Lalit Jayanti , Yoshimitsu Aoki , Mariko Isogawa , Katerina Fragkiadaki This is my paper

Pith reviewed 2026-05-15 12:32 UTC · model grok-4.3

classification 💻 cs.CV cs.GRcs.RO

keywords assistive roboticsmulti-agent reinforcement learninghumanoid motion trackingphysics simulationhuman-human interactioncontact-rich controlpartner-aware policies

0 comments

The pith

Multi-agent reinforcement learning allows humanoid robots to track and assist in force-exchanging human motions for the first time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper frames the imitation of closely interacting, force-exchanging human-human motions as a multi-agent reinforcement learning problem in which a supporter agent and recipient agent are trained jointly inside a physics simulator. It introduces partner-policy initialization from single-human trackers, dynamic reference retargeting that updates the assistant's target pose from the recipient's live state, and contact-promoting rewards to keep support physically meaningful. A sympathetic reader would care because assistive robotics for caregiving requires continuous awareness of a partner's evolving posture and dynamics, capabilities that prior single-agent motion trackers lack. The result is the first method shown to track such interactive sequences on established benchmarks.

Core claim

We formulate the imitation of closely interacting, force-exchanging human-human motion sequences as a multi-agent reinforcement learning problem. We jointly train partner-aware policies for both the supporter and recipient agents in a physics simulator, using a partner policies initialization scheme that transfers priors from single-human motion-tracking controllers, together with dynamic reference retargeting and contact-promoting rewards, to achieve the first successful tracking of assistive interaction motions on established benchmarks.

What carries the argument

Jointly trained partner-aware multi-agent policies that combine initialization from single-human trackers, dynamic reference retargeting to the recipient's current pose, and contact-promoting rewards inside a physics simulator.

If this is right

Humanoid robots can now produce continuous, force-exchanging support behaviors rather than isolated contact-free motions.
Policies become responsive to the real-time posture and dynamics of a human partner.
Simulation training yields both physically stable and socially aware control policies.
The multi-agent formulation extends general motion tracking to interactive caregiving scenarios.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Successful simulation policies could reduce the need for large-scale real-robot data collection in assistive applications.
The same joint-training structure might transfer to other multi-person physical tasks such as collaborative lifting or object passing.
Dynamic retargeting could be adapted into non-RL controllers to improve responsiveness without full retraining.

Load-bearing premise

The assumption that policies trained with the proposed partner initialization, dynamic retargeting, and contact reward in simulation will produce behaviors that are both physically stable and transferable to real humanoid robots without additional real-world fine-tuning.

What would settle it

Deploying the trained supporter policy on a physical humanoid robot and observing whether it maintains stable contact, adapts to a live human partner's posture changes, and completes assistive tasks such as supported walking without falling or losing physical grounding.

Figures

Figures reproduced from arXiv: 2603.11346 by Kashu Yamazaki, Katerina Fragkiadaki, Lalit Jayanti, Mariko Isogawa, Yoshimitsu Aoki, Yuto Shibata.

**Figure 1.** Figure 1: AssistMimic: We propose a multi-agent RL framework capable of learning robust Supporter and Recipient policies from noisy, close-proximity motion sequences. By leveraging single-person motion priors, a novel recipient-adaptive reference retargeting mechanism, and contact-promoting rewards, AssistMimic becomes the first physics-based controller to successfully track such complex, high-contact reference moti… view at source ↗

**Figure 2.** Figure 2: Learning contact-rich assistive behaviors is substantially [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗

**Figure 3.** Figure 3: Overview of AssistMimic. We train tracking-based humanoid control policies for both the recipient and the supporter, optimizing them to imitate a paired reference motion sequence. Our architecture builds on the single-agent tracking framework of PHC [10], extending it with partner-aware state inputs and augmenting standard imitation rewards with recipient-aware reference retargeting and contact-incentiviz… view at source ↗

**Figure 4.** Figure 4: Failure of Kinematic Baselines. We first show that the recipient kinematic replay is ill-posed for assistive tasks. In [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

**Figure 7.** Figure 7: Qualitative results: unseen interactions and failures. [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗

**Figure 6.** Figure 6: Tracking results of AssistMimic on interaction trajecto [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗

**Figure 8.** Figure 8: The qualitative comparison of the specialist with Inter-X dataset. Supporter (orange) and Recipient (blue). Left to Right in [PITH_FULL_IMAGE:figures/full_fig_p012_8.png] view at source ↗

**Figure 9.** Figure 9: The qualitative comparison with HHI-Assist dataset. Supporter (orange) and Recipient (blue). [PITH_FULL_IMAGE:figures/full_fig_p012_9.png] view at source ↗

read the original abstract

Humanoid robotics has strong potential to transform daily service and caregiving applications. Although recent advances in general motion tracking within physics engines (GMT) have enabled virtual characters and humanoid robots to reproduce a broad range of human motions, these behaviors are primarily limited to contact-less social interactions or isolated movements. Assistive scenarios, by contrast, require continuous awareness of a human partner and rapid adaptation to their evolving posture and dynamics. In this paper, we formulate the imitation of closely interacting, force-exchanging human-human motion sequences as a multi-agent reinforcement learning problem. We jointly train partner-aware policies for both the supporter (assistant) agent and the recipient agent in a physics simulator to track assistive motion references. To make this problem tractable, we introduce a partner policies initialization scheme that transfers priors from single-human motion-tracking controllers, greatly improving exploration. We further propose dynamic reference retargeting and contact-promoting reward, which adapt the assistant's reference motion to the recipient's real-time pose and encourage physically meaningful support. We show that AssistMimic is the first method capable of successfully tracking assistive interaction motions on established benchmarks, demonstrating the benefits of a multi-agent RL formulation for physically grounded and socially aware humanoid control.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper extends motion tracking to force-exchanging assistance with multi-agent RL and some practical add-ons, but the multi-agent benefit still needs direct ablations to hold up.

read the letter

The main point is that this work takes general motion tracking and reframes assistive human-human interactions as a two-agent RL problem where both the supporter and recipient learn policies that stay aware of each other. The initialization from single-human trackers, the dynamic retargeting of the reference motion, and the contact-promoting reward are the concrete pieces that make the joint training feasible in simulation. These choices address real exploration and adaptation issues that single-agent methods hit when contact and force exchange enter the picture. The relevance to caregiving and service tasks is straightforward and the positioning against prior GMT work is clear. If the full paper shows quantitative tracking success on the benchmarks with those components, it gives the field a workable starting point for physically grounded assistance. The soft spot is exactly the one the stress-test flags: without an ablation that runs a single-agent assistant against a replayed or fixed recipient using the identical retargeting and contact reward, it is hard to know whether the joint multi-agent training is doing the heavy lifting or whether the other engineering pieces are sufficient. The abstract claim of being first to succeed rests on that distinction. Sim-to-real transfer is also left as an assumption rather than demonstrated, which is common but still a gap for a robotics paper. This is for people already working on humanoid RL and physics-based control. A reader in that area would get value from the formulation and the specific techniques even if the experiments need tightening. It is worth sending for peer review so the authors can add the missing single-agent comparisons and any available metrics or failure cases.

Referee Report

3 major / 1 minor

Summary. The paper introduces AssistMimic, a multi-agent reinforcement learning approach for physics-based simulation of assistive human-human interactions. It jointly trains supporter and recipient policies to track reference motions by transferring priors via a partner initialization scheme from single-human trackers, applying dynamic reference retargeting based on the recipient's real-time pose, and adding a contact-promoting reward term. The central claim is that this is the first method to successfully track such force-exchanging assistive motions on established benchmarks, demonstrating advantages of the multi-agent formulation for physically grounded and socially aware humanoid control.

Significance. If the performance claims hold under quantitative scrutiny, the work would advance humanoid robotics toward realistic caregiving and service scenarios by extending motion tracking to continuous physical contact and mutual adaptation. The initialization scheme and retargeting mechanism are practical contributions that could improve training stability in multi-agent settings. However, the significance hinges on whether the multi-agent aspect provides gains beyond the auxiliary components, which remains unverified.

major comments (3)

[Abstract] Abstract: The assertion that AssistMimic is the first method capable of successfully tracking assistive interaction motions on benchmarks is unsupported by any reported quantitative metrics (e.g., tracking error, success rate), ablation results, or baseline comparisons, rendering the performance claim impossible to evaluate.
[Method and Experiments] Method and Experiments sections: The central claim credits the multi-agent RL formulation for enabling stable tracking, yet the manuscript introduces partner initialization, dynamic retargeting, and contact reward without an ablation against a single-agent baseline (assistant policy only, with recipient motion replayed) that retains the same three components; this leaves open whether joint training is load-bearing or if the auxiliary techniques suffice.
[Experiments] Experiments: The assumption that simulation-trained policies will produce stable, transferable behaviors on real humanoid robots without additional fine-tuning is stated as a strength but is unsupported by any real-world experiments, sim-to-real gap analysis, or failure-case discussion.

minor comments (1)

[Abstract] Abstract: Include at least one key quantitative result (e.g., average tracking error or success percentage) to ground the claim of successful tracking.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, clarifying our contributions and outlining planned revisions to improve the presentation and analysis.

read point-by-point responses

Referee: [Abstract] Abstract: The assertion that AssistMimic is the first method capable of successfully tracking assistive interaction motions on benchmarks is unsupported by any reported quantitative metrics (e.g., tracking error, success rate), ablation results, or baseline comparisons, rendering the performance claim impossible to evaluate.

Authors: The experiments section reports quantitative tracking errors, success rates, and comparisons against prior single-agent motion tracking baselines, which fail to maintain stable contact and force exchange on the assistive benchmarks. We will revise the abstract to explicitly reference these metrics and the performance gap relative to baselines, making the 'first successful method' claim directly supported by the reported results. revision: partial
Referee: [Method and Experiments] Method and Experiments sections: The central claim credits the multi-agent RL formulation for enabling stable tracking, yet the manuscript introduces partner initialization, dynamic retargeting, and contact reward without an ablation against a single-agent baseline (assistant policy only, with recipient motion replayed) that retains the same three components; this leaves open whether joint training is load-bearing or if the auxiliary techniques suffice.

Authors: We agree that an explicit ablation isolating the multi-agent joint training is needed. Although replaying a fixed recipient trajectory would not capture the mutual adaptation essential to assistive tasks, we will add a new experiment comparing the full multi-agent policies against a single-agent assistant policy that uses identical partner initialization, dynamic retargeting, and contact-promoting reward but with a non-adaptive recipient. This will quantify the additional benefit of joint optimization. revision: yes
Referee: [Experiments] Experiments: The assumption that simulation-trained policies will produce stable, transferable behaviors on real humanoid robots without additional fine-tuning is stated as a strength but is unsupported by any real-world experiments, sim-to-real gap analysis, or failure-case discussion.

Authors: The manuscript emphasizes the simulation results as the core algorithmic contribution enabling physically grounded assistive tracking. We will add a dedicated paragraph discussing observed simulation failure modes (e.g., contact instability under high forces) and potential sim-to-real challenges such as actuator delays and sensor noise. Full real-robot validation, however, lies outside the current scope. revision: partial

standing simulated objections not resolved

Comprehensive real-world experiments on physical humanoid robots to validate sim-to-real transfer and stability without fine-tuning

Circularity Check

0 steps flagged

No circularity; new components proposed independently of results

full rationale

The paper formulates assistive human-human tracking as a multi-agent RL problem and introduces three distinct engineering components: a partner initialization scheme transferring priors from single-human trackers, dynamic reference retargeting that adapts the assistant's motion to the recipient's real-time pose, and a contact-promoting reward. These are presented as novel inputs to the training process rather than quantities derived from the final policies or metrics. No equations reduce the claimed success to a fitted parameter or self-referential definition, and no load-bearing self-citations or uniqueness theorems are invoked to force the multi-agent choice. The derivation chain is therefore self-contained as an empirical method proposal whose validity rests on benchmark performance, not on tautological reduction to its own inputs.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim depends on the physics simulator faithfully reproducing human-like contact forces and on the heuristic initialization and reward terms being sufficient to solve the exploration problem; these are not derived from first principles.

free parameters (1)

reward weights for contact-promoting term and tracking terms
Standard RL practice requires tuning scalar weights on each reward component; these are not stated to be derived analytically.

axioms (1)

domain assumption The physics engine produces sufficiently accurate contact forces and dynamics for human-like assistance motions
All training and evaluation occurs inside the simulator; no real-world validation is described.

pith-pipeline@v0.9.0 · 5538 in / 1242 out tokens · 35847 ms · 2026-05-15T12:32:47.721154+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 2 internal anchors

[1]

Emergent complexity via multi-agent competition

Trapit Bansal, Jakub Pachocki, Szymon Sidor, Ilya Sutskever, and Igor Mordatch. Emergent complexity via multi-agent competition. InInternational Conference on Learning Representations, pages 1–12, 2018. 3

work page 2018
[2]

Gmt: General motion tracking for humanoid whole-body control.arXiv preprint arXiv:2506.14770, 2025

Zixuan Chen, Mazeyu Ji, Xuxin Cheng, Xuanbin Peng, Xue Bin Peng, and Xiaolong Wang. Gmt: General motion tracking for humanoid whole-body control.arXiv preprint arXiv:2506.14770, 2025. 1, 2

work page arXiv 2025
[3]

Interaction transformer for human reaction generation.IEEE Transactions on Multimedia, 25: 8842–8854, 2023

Baptiste Chopin, Hao Tang, Naima Otberdout, Mohamed Daoudi, and Nicu Sebe. Interaction transformer for human reaction generation.IEEE Transactions on Multimedia, 25: 8842–8854, 2023. 3

work page 2023
[4]

Coohoi: Learning cooperative human-object interac- tion with manipulated object dynamics.Advances in Neural Information Processing Systems, 37:79741–79763, 2024

Jiawei Gao, Ziqin Wang, Zeqi Xiao, Jingbo Wang, Tai Wang, Jinkun Cao, Xiaolin Hu, Si Liu, Jifeng Dai, and Jiangmiao Pang. Coohoi: Learning cooperative human-object interac- tion with manipulated object dynamics.Advances in Neural Information Processing Systems, 37:79741–79763, 2024. 2, 3

work page 2024
[5]

Hover: Versatile neural whole-body con- troller for humanoid robots

Tairan He, Wenli Xiao, Toru Lin, Zhengyi Luo, Zhenjia Xu, Zhenyu Jiang, Jan Kautz, Changliu Liu, Guanya Shi, Xiao- long Wang, et al. Hover: Versatile neural whole-body con- troller for humanoid robots. InIEEE International Confer- ence on Robotics and Automation (ICRA), pages 9989–9996. IEEE, 2025. 2

work page 2025
[6]

Towards immersive human-x interaction: A real-time framework for physically plausible motion synthesis

Kaiyang Ji, Ye Shi, Zichen Jin, Kangyi Chen, Lan Xu, Yuexin Ma, Jingyi Yu, and Jingya Wang. Towards immersive human-x interaction: A real-time framework for physically plausible motion synthesis. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 10173–10183, 2025. 2, 3, 6

work page 2025
[7]

Exbody2: Advanced expressive humanoid whole-body control.arXiv preprint arXiv:2412.13196, 2024

Mazeyu Ji, Xuanbin Peng, Fangchen Liu, Jialong Li, Ge Yang, Xuxin Cheng, and Xiaolong Wang. Exbody2: Ad- vanced expressive humanoid whole-body control.arXiv preprint arXiv:2412.13196, 2024. 2

work page arXiv 2024
[8]

Intergen: Diffusion-based multi-human motion gener- ation under complex interactions.International Journal of Computer Vision, 132(9):3463–3483, 2024

Huan Liang, Wenyu Zhang, Wenhao Li, Jingyi Yu, and Lan Xu. Intergen: Diffusion-based multi-human motion gener- ation under complex interactions.International Journal of Computer Vision, 132(9):3463–3483, 2024. 3

work page 2024
[9]

Phys- reaction: Physically plausible real-time humanoid reaction synthesis via forward dynamics guided 4d imitation

Yunze Liu, Changxi Chen, Chenjing Ding, and Li Yi. Phys- reaction: Physically plausible real-time humanoid reaction synthesis via forward dynamics guided 4d imitation. InPro- ceedings of the 32nd ACM International Conference on Mul- timedia, pages 3771–3780, 2024. 2, 3, 6

work page 2024
[10]

Perpetual humanoid control for real-time simulated avatars

Zhengyi Luo, Jinkun Cao, Alexander Winkler, Kris Kitani, and Weipeng Xu. Perpetual humanoid control for real-time simulated avatars. InProceedings of the IEEE/CVF In- ternational Conference on Computer Vision (ICCV), pages 10895–10904, 2023. 1, 2, 3, 4, 6, 12

work page 2023
[11]

Amass: Archive of motion capture as surface shapes

Naureen Mahmood, Nima Ghorbani, Nikolaus F Troje, Ger- ard Pons-Moll, and Michael J Black. Amass: Archive of motion capture as surface shapes. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 5442–5451, 2019. 12

work page 2019
[12]

Neural probabilistic motor primitives for humanoid control

Josh Merel, Leonard Hasenclever, Alexandre Galashov, Arun Ahuja, Vu Pham, Greg Wayne, Yee Whye Teh, and Nicolas Manfred Otto Heess. Neural probabilistic motor primitives for humanoid control. InInternational Confer- ence on Learning Representations, pages 1–14, 2019. 3

work page 2019
[13]

Learning predict-and-simulate policies from un- organized human motion data.ACM Transactions on Graph- ics, 38(6), 2019

Soohwan Park, Hoseok Ryu, Seyoung Lee, Sunmin Lee, and Jehee Lee. Learning predict-and-simulate policies from un- organized human motion data.ACM Transactions on Graph- ics, 38(6), 2019. 3

work page 2019
[14]

Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, and Michael J. Black. Expressive body capture: 3d hands, face, and body from a single image. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10975–10985, 2019. 6, 15

work page 2019
[15]

Deepmimic: Example-guided deep reinforce- ment learning of physics-based character skills.ACM Trans- actions on Graphics, 37(4):143:1–143:14, 2018

Xue Bin Peng, Pieter Abbeel, Sergey Levine, and Michiel van de Panne. Deepmimic: Example-guided deep reinforce- ment learning of physics-based character skills.ACM Trans- actions on Graphics, 37(4):143:1–143:14, 2018. 3

work page 2018
[16]

Amp: Adversarial motion priors for styl- ized physics-based character control.ACM Transactions on Graphics, 40(4):1–20, 2021

Xue Bin Peng, Ze Ma, Pieter Abbeel, Sergey Levine, and Angjoo Kanazawa. Amp: Adversarial motion priors for styl- ized physics-based character control.ACM Transactions on Graphics, 40(4):1–20, 2021. 3, 13

work page 2021
[17]

Ase: Large-scale reusable adversarial skill embeddings for physically simulated characters.ACM Transactions on Graphics, 41(4), 2022

Xue Bin Peng, Yunrong Guo, Lina Halper, Sergey Levine, and Sanja Fidler. Ase: Large-scale reusable adversarial skill embeddings for physically simulated characters.ACM Transactions on Graphics, 41(4), 2022. 3

work page 2022
[18]

A re- duction of imitation learning and structured prediction to no- regret online learning

St ´ephane Ross, Geoffrey Gordon, and Drew Bagnell. A re- duction of imitation learning and structured prediction to no- regret online learning. InProceedings of the fourteenth inter- national conference on artificial intelligence and statistics, pages 627–635. JMLR Workshop and Conference Proceed- ings, 2011. 5, 12, 13

work page 2011
[19]

Hhi-assist: A dataset and benchmark of human-human interaction in phys- ical assistance scenario.IEEE Robotics and Automation Let- ters, 2025

Saeed Saadatnejad, Reyhaneh Hosseininejad, Jose Barreiros, Katherine M Tsui, and Alexandre Alahi. Hhi-assist: A dataset and benchmark of human-human interaction in phys- ical assistance scenario.IEEE Robotics and Automation Let- ters, 2025. 2, 3, 6

work page 2025
[20]

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter.ArXiv, abs/1910.01108, 2019. 15

work page internal anchor Pith review Pith/arXiv arXiv 1910
[21]

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Rad- ford, and Oleg Klimov. Proximal policy optimization algo- rithms.arXiv preprint arXiv:1707.06347, 2017. 5, 12

work page internal anchor Pith review Pith/arXiv arXiv 2017
[22]

Human motion diffusion as a generative prior

Yoni Shafir, Guy Tevet, Roy Kapon, and Amit Haim Bermano. Human motion diffusion as a generative prior. InInternational Conference on Learning Representations, pages 1–17, 2024. 3

work page 2024
[23]

Maskedmimic: Unified physics-based char- acter control through masked motion inpainting.ACM Trans- actions on Graphics, 43(6), 2024

Chen Tessler, Yunrong Guo, Ofir Nabati, Gal Chechik, and Xue Bin Peng. Maskedmimic: Unified physics-based char- acter control through masked motion inpainting.ACM Trans- actions on Graphics, 43(6), 2024. 1

work page 2024
[24]

arXiv preprint arXiv:2505.19086 (2025) 2, 3

Chen Tessler, Yifeng Jiang, Erwin Coumans, Zhengyi Luo, Gal Chechik, and Xue Bin Peng. Maskedmanipulator: Versatile whole-body control for loco-manipulation.arXiv preprint arXiv:2505.19086, 2025. 5 9

work page arXiv 2025
[25]

Human motion diffu- sion model

Guy Tevet, Sigal Raab, Brian Gordon, Yoni Shafir, Daniel Cohen-or, and Amit Haim Bermano. Human motion diffu- sion model. InInternational Conference on Learning Repre- sentations, 2023. 8, 15

work page 2023
[26]

Actformer: A gan- based transformer towards general action-conditioned 3d hu- man motion generation

Liang Xu, Ziyang Song, Dongliang Wang, Jing Su, Zhicheng Fang, Chenjing Ding, Weihao Gan, Yichao Yan, Xin Jin, Xi- aokang Yang, Wenjun Zeng, and Wei Wu. Actformer: A gan- based transformer towards general action-conditioned 3d hu- man motion generation. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 2228–2238, 2023. 3

work page 2023
[27]

Inter-x: Towards versatile human- human interaction analysis

Liang Xu, Xintao Lv, Yichao Yan, Xin Jin, Shuwen Wu, Congsheng Xu, Yifan Liu, Yizhou Zhou, Fengyun Rao, Xingdong Sheng, et al. Inter-x: Towards versatile human- human interaction analysis. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 22260–22271, 2024. 2, 3, 6

work page 2024
[28]

Regennet: Towards human action-reaction synthesis

Liang Xu, Yizhou Zhou, Yichao Yan, Xin Jin, Wenhan Zhu, Fengyun Rao, Xiaokang Yang, and Wenjun Zeng. Regennet: Towards human action-reaction synthesis. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1759–1769, 2024. 3

work page 2024
[29]

Parc: Physics-based augmentation with reinforcement learning for character controllers

Michael Xu, Yi Shi, KangKang Yin, and Xue Bin Peng. Parc: Physics-based augmentation with reinforcement learning for character controllers. InProceedings of the Special Inter- est Group on Computer Graphics and Interactive Techniques Conference, number 131, pages 1–11, 2025. 2

work page 2025
[30]

Intermimic: Towards universal whole-body control for physics-based human-object interactions

Sirui Xu, Hung Yu Ling, Yu-Xiong Wang, and Liang-Yan Gui. Intermimic: Towards universal whole-body control for physics-based human-object interactions. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12266–12277, 2025. 1, 5

work page 2025
[31]

The surprising effec- tiveness of ppo in cooperative multi-agent games.Advances in Neural Information Processing Systems, 35:24611–24624,

Chao Yu, Akash Velu, Eugene Vinitsky, Jiaxuan Gao, Yu Wang, Alexandre Bayen, and Yi Wu. The surprising effec- tiveness of ppo in cooperative multi-agent games.Advances in Neural Information Processing Systems, 35:24611–24624,

work page
[32]

Simpoe: Simulated character control for 3d human pose estimation

Ye Yuan, Shih-En Wei, Tomas Simon, Kris Kitani, and Jason Saragih. Simpoe: Simulated character control for 3d human pose estimation. InProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR), pages 7159–7169, 2021. 3

work page 2021
[33]

Physdiff: Physics-guided human motion diffusion model.Proceedings of the IEEE/CVF International Confer- ence on Computer Vision (ICCV), pages 16010–16021, 2023

Ye Yuan, Jiaming Song, Umar Iqbal, Arash Vahdat, and Jan Kautz. Physdiff: Physics-guided human motion diffusion model.Proceedings of the IEEE/CVF International Confer- ence on Computer Vision (ICCV), pages 16010–16021, 2023. 3

work page 2023
[34]

Trackanymotionsunderanydisturbances.arXivpreprintarXiv:2509.13833,

Zhikai Zhang, Jun Guo, Chao Chen, Jilong Wang, Chenghuai Lin, Yunrui Lian, Han Xue, Zhenrong Wang, Maoqi Liu, Jiangran Lyu, Huaping Liu, He Wang, and Li Yi. Track any motions under any disturbances.arXiv preprint arXiv:2509.13833, 2025. 2

work page arXiv 2025
[35]

Resmimic: From general motion tracking to humanoid whole-body loco-manipulation via residual learning.arXiv preprint arXiv:2510.05070, 2025

Siheng Zhao, Yanjie Ze, Yue Wang, C Karen Liu, Pieter Abbeel, Guanya Shi, and Rocky Duan. Resmimic: From general motion tracking to humanoid whole-body loco-manipulation via residual learning.arXiv preprint arXiv:2510.05070, 2025. 1

work page arXiv 2025
[36]

Humanoid parkour learning

Ziwen Zhuang, Shenzhe Yao, and Hang Zhao. Humanoid parkour learning. InProceedings of The 8th Conference on Robot Learning (CoRL), pages 1975–1991. PMLR, 2025. 1 10 Learning to Assist: Physics-Grounded Human-Human Control via Multi-Agent Reinforcement Learning Supplementary Material A. Overview of the Supplementary Materials This supplementary document co...

work page arXiv 1975

[1] [1]

Emergent complexity via multi-agent competition

Trapit Bansal, Jakub Pachocki, Szymon Sidor, Ilya Sutskever, and Igor Mordatch. Emergent complexity via multi-agent competition. InInternational Conference on Learning Representations, pages 1–12, 2018. 3

work page 2018

[2] [2]

Gmt: General motion tracking for humanoid whole-body control.arXiv preprint arXiv:2506.14770, 2025

Zixuan Chen, Mazeyu Ji, Xuxin Cheng, Xuanbin Peng, Xue Bin Peng, and Xiaolong Wang. Gmt: General motion tracking for humanoid whole-body control.arXiv preprint arXiv:2506.14770, 2025. 1, 2

work page arXiv 2025

[3] [3]

Interaction transformer for human reaction generation.IEEE Transactions on Multimedia, 25: 8842–8854, 2023

Baptiste Chopin, Hao Tang, Naima Otberdout, Mohamed Daoudi, and Nicu Sebe. Interaction transformer for human reaction generation.IEEE Transactions on Multimedia, 25: 8842–8854, 2023. 3

work page 2023

[4] [4]

Coohoi: Learning cooperative human-object interac- tion with manipulated object dynamics.Advances in Neural Information Processing Systems, 37:79741–79763, 2024

Jiawei Gao, Ziqin Wang, Zeqi Xiao, Jingbo Wang, Tai Wang, Jinkun Cao, Xiaolin Hu, Si Liu, Jifeng Dai, and Jiangmiao Pang. Coohoi: Learning cooperative human-object interac- tion with manipulated object dynamics.Advances in Neural Information Processing Systems, 37:79741–79763, 2024. 2, 3

work page 2024

[5] [5]

Hover: Versatile neural whole-body con- troller for humanoid robots

Tairan He, Wenli Xiao, Toru Lin, Zhengyi Luo, Zhenjia Xu, Zhenyu Jiang, Jan Kautz, Changliu Liu, Guanya Shi, Xiao- long Wang, et al. Hover: Versatile neural whole-body con- troller for humanoid robots. InIEEE International Confer- ence on Robotics and Automation (ICRA), pages 9989–9996. IEEE, 2025. 2

work page 2025

[6] [6]

Towards immersive human-x interaction: A real-time framework for physically plausible motion synthesis

Kaiyang Ji, Ye Shi, Zichen Jin, Kangyi Chen, Lan Xu, Yuexin Ma, Jingyi Yu, and Jingya Wang. Towards immersive human-x interaction: A real-time framework for physically plausible motion synthesis. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 10173–10183, 2025. 2, 3, 6

work page 2025

[7] [7]

Exbody2: Advanced expressive humanoid whole-body control.arXiv preprint arXiv:2412.13196, 2024

Mazeyu Ji, Xuanbin Peng, Fangchen Liu, Jialong Li, Ge Yang, Xuxin Cheng, and Xiaolong Wang. Exbody2: Ad- vanced expressive humanoid whole-body control.arXiv preprint arXiv:2412.13196, 2024. 2

work page arXiv 2024

[8] [8]

Intergen: Diffusion-based multi-human motion gener- ation under complex interactions.International Journal of Computer Vision, 132(9):3463–3483, 2024

Huan Liang, Wenyu Zhang, Wenhao Li, Jingyi Yu, and Lan Xu. Intergen: Diffusion-based multi-human motion gener- ation under complex interactions.International Journal of Computer Vision, 132(9):3463–3483, 2024. 3

work page 2024

[9] [9]

Phys- reaction: Physically plausible real-time humanoid reaction synthesis via forward dynamics guided 4d imitation

Yunze Liu, Changxi Chen, Chenjing Ding, and Li Yi. Phys- reaction: Physically plausible real-time humanoid reaction synthesis via forward dynamics guided 4d imitation. InPro- ceedings of the 32nd ACM International Conference on Mul- timedia, pages 3771–3780, 2024. 2, 3, 6

work page 2024

[10] [10]

Perpetual humanoid control for real-time simulated avatars

Zhengyi Luo, Jinkun Cao, Alexander Winkler, Kris Kitani, and Weipeng Xu. Perpetual humanoid control for real-time simulated avatars. InProceedings of the IEEE/CVF In- ternational Conference on Computer Vision (ICCV), pages 10895–10904, 2023. 1, 2, 3, 4, 6, 12

work page 2023

[11] [11]

Amass: Archive of motion capture as surface shapes

Naureen Mahmood, Nima Ghorbani, Nikolaus F Troje, Ger- ard Pons-Moll, and Michael J Black. Amass: Archive of motion capture as surface shapes. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 5442–5451, 2019. 12

work page 2019

[12] [12]

Neural probabilistic motor primitives for humanoid control

Josh Merel, Leonard Hasenclever, Alexandre Galashov, Arun Ahuja, Vu Pham, Greg Wayne, Yee Whye Teh, and Nicolas Manfred Otto Heess. Neural probabilistic motor primitives for humanoid control. InInternational Confer- ence on Learning Representations, pages 1–14, 2019. 3

work page 2019

[13] [13]

Learning predict-and-simulate policies from un- organized human motion data.ACM Transactions on Graph- ics, 38(6), 2019

Soohwan Park, Hoseok Ryu, Seyoung Lee, Sunmin Lee, and Jehee Lee. Learning predict-and-simulate policies from un- organized human motion data.ACM Transactions on Graph- ics, 38(6), 2019. 3

work page 2019

[14] [14]

Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, and Michael J. Black. Expressive body capture: 3d hands, face, and body from a single image. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10975–10985, 2019. 6, 15

work page 2019

[15] [15]

Deepmimic: Example-guided deep reinforce- ment learning of physics-based character skills.ACM Trans- actions on Graphics, 37(4):143:1–143:14, 2018

Xue Bin Peng, Pieter Abbeel, Sergey Levine, and Michiel van de Panne. Deepmimic: Example-guided deep reinforce- ment learning of physics-based character skills.ACM Trans- actions on Graphics, 37(4):143:1–143:14, 2018. 3

work page 2018

[16] [16]

Amp: Adversarial motion priors for styl- ized physics-based character control.ACM Transactions on Graphics, 40(4):1–20, 2021

Xue Bin Peng, Ze Ma, Pieter Abbeel, Sergey Levine, and Angjoo Kanazawa. Amp: Adversarial motion priors for styl- ized physics-based character control.ACM Transactions on Graphics, 40(4):1–20, 2021. 3, 13

work page 2021

[17] [17]

Ase: Large-scale reusable adversarial skill embeddings for physically simulated characters.ACM Transactions on Graphics, 41(4), 2022

Xue Bin Peng, Yunrong Guo, Lina Halper, Sergey Levine, and Sanja Fidler. Ase: Large-scale reusable adversarial skill embeddings for physically simulated characters.ACM Transactions on Graphics, 41(4), 2022. 3

work page 2022

[18] [18]

A re- duction of imitation learning and structured prediction to no- regret online learning

St ´ephane Ross, Geoffrey Gordon, and Drew Bagnell. A re- duction of imitation learning and structured prediction to no- regret online learning. InProceedings of the fourteenth inter- national conference on artificial intelligence and statistics, pages 627–635. JMLR Workshop and Conference Proceed- ings, 2011. 5, 12, 13

work page 2011

[19] [19]

Hhi-assist: A dataset and benchmark of human-human interaction in phys- ical assistance scenario.IEEE Robotics and Automation Let- ters, 2025

Saeed Saadatnejad, Reyhaneh Hosseininejad, Jose Barreiros, Katherine M Tsui, and Alexandre Alahi. Hhi-assist: A dataset and benchmark of human-human interaction in phys- ical assistance scenario.IEEE Robotics and Automation Let- ters, 2025. 2, 3, 6

work page 2025

[20] [20]

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter.ArXiv, abs/1910.01108, 2019. 15

work page internal anchor Pith review Pith/arXiv arXiv 1910

[21] [21]

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Rad- ford, and Oleg Klimov. Proximal policy optimization algo- rithms.arXiv preprint arXiv:1707.06347, 2017. 5, 12

work page internal anchor Pith review Pith/arXiv arXiv 2017

[22] [22]

Human motion diffusion as a generative prior

Yoni Shafir, Guy Tevet, Roy Kapon, and Amit Haim Bermano. Human motion diffusion as a generative prior. InInternational Conference on Learning Representations, pages 1–17, 2024. 3

work page 2024

[23] [23]

Maskedmimic: Unified physics-based char- acter control through masked motion inpainting.ACM Trans- actions on Graphics, 43(6), 2024

Chen Tessler, Yunrong Guo, Ofir Nabati, Gal Chechik, and Xue Bin Peng. Maskedmimic: Unified physics-based char- acter control through masked motion inpainting.ACM Trans- actions on Graphics, 43(6), 2024. 1

work page 2024

[24] [24]

arXiv preprint arXiv:2505.19086 (2025) 2, 3

Chen Tessler, Yifeng Jiang, Erwin Coumans, Zhengyi Luo, Gal Chechik, and Xue Bin Peng. Maskedmanipulator: Versatile whole-body control for loco-manipulation.arXiv preprint arXiv:2505.19086, 2025. 5 9

work page arXiv 2025

[25] [25]

Human motion diffu- sion model

Guy Tevet, Sigal Raab, Brian Gordon, Yoni Shafir, Daniel Cohen-or, and Amit Haim Bermano. Human motion diffu- sion model. InInternational Conference on Learning Repre- sentations, 2023. 8, 15

work page 2023

[26] [26]

Actformer: A gan- based transformer towards general action-conditioned 3d hu- man motion generation

Liang Xu, Ziyang Song, Dongliang Wang, Jing Su, Zhicheng Fang, Chenjing Ding, Weihao Gan, Yichao Yan, Xin Jin, Xi- aokang Yang, Wenjun Zeng, and Wei Wu. Actformer: A gan- based transformer towards general action-conditioned 3d hu- man motion generation. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 2228–2238, 2023. 3

work page 2023

[27] [27]

Inter-x: Towards versatile human- human interaction analysis

Liang Xu, Xintao Lv, Yichao Yan, Xin Jin, Shuwen Wu, Congsheng Xu, Yifan Liu, Yizhou Zhou, Fengyun Rao, Xingdong Sheng, et al. Inter-x: Towards versatile human- human interaction analysis. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 22260–22271, 2024. 2, 3, 6

work page 2024

[28] [28]

Regennet: Towards human action-reaction synthesis

Liang Xu, Yizhou Zhou, Yichao Yan, Xin Jin, Wenhan Zhu, Fengyun Rao, Xiaokang Yang, and Wenjun Zeng. Regennet: Towards human action-reaction synthesis. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1759–1769, 2024. 3

work page 2024

[29] [29]

Parc: Physics-based augmentation with reinforcement learning for character controllers

Michael Xu, Yi Shi, KangKang Yin, and Xue Bin Peng. Parc: Physics-based augmentation with reinforcement learning for character controllers. InProceedings of the Special Inter- est Group on Computer Graphics and Interactive Techniques Conference, number 131, pages 1–11, 2025. 2

work page 2025

[30] [30]

Intermimic: Towards universal whole-body control for physics-based human-object interactions

Sirui Xu, Hung Yu Ling, Yu-Xiong Wang, and Liang-Yan Gui. Intermimic: Towards universal whole-body control for physics-based human-object interactions. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12266–12277, 2025. 1, 5

work page 2025

[31] [31]

The surprising effec- tiveness of ppo in cooperative multi-agent games.Advances in Neural Information Processing Systems, 35:24611–24624,

Chao Yu, Akash Velu, Eugene Vinitsky, Jiaxuan Gao, Yu Wang, Alexandre Bayen, and Yi Wu. The surprising effec- tiveness of ppo in cooperative multi-agent games.Advances in Neural Information Processing Systems, 35:24611–24624,

work page

[32] [32]

Simpoe: Simulated character control for 3d human pose estimation

Ye Yuan, Shih-En Wei, Tomas Simon, Kris Kitani, and Jason Saragih. Simpoe: Simulated character control for 3d human pose estimation. InProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR), pages 7159–7169, 2021. 3

work page 2021

[33] [33]

Physdiff: Physics-guided human motion diffusion model.Proceedings of the IEEE/CVF International Confer- ence on Computer Vision (ICCV), pages 16010–16021, 2023

Ye Yuan, Jiaming Song, Umar Iqbal, Arash Vahdat, and Jan Kautz. Physdiff: Physics-guided human motion diffusion model.Proceedings of the IEEE/CVF International Confer- ence on Computer Vision (ICCV), pages 16010–16021, 2023. 3

work page 2023

[34] [34]

Trackanymotionsunderanydisturbances.arXivpreprintarXiv:2509.13833,

Zhikai Zhang, Jun Guo, Chao Chen, Jilong Wang, Chenghuai Lin, Yunrui Lian, Han Xue, Zhenrong Wang, Maoqi Liu, Jiangran Lyu, Huaping Liu, He Wang, and Li Yi. Track any motions under any disturbances.arXiv preprint arXiv:2509.13833, 2025. 2

work page arXiv 2025

[35] [35]

Resmimic: From general motion tracking to humanoid whole-body loco-manipulation via residual learning.arXiv preprint arXiv:2510.05070, 2025

Siheng Zhao, Yanjie Ze, Yue Wang, C Karen Liu, Pieter Abbeel, Guanya Shi, and Rocky Duan. Resmimic: From general motion tracking to humanoid whole-body loco-manipulation via residual learning.arXiv preprint arXiv:2510.05070, 2025. 1

work page arXiv 2025

[36] [36]

Humanoid parkour learning

Ziwen Zhuang, Shenzhe Yao, and Hang Zhao. Humanoid parkour learning. InProceedings of The 8th Conference on Robot Learning (CoRL), pages 1975–1991. PMLR, 2025. 1 10 Learning to Assist: Physics-Grounded Human-Human Control via Multi-Agent Reinforcement Learning Supplementary Material A. Overview of the Supplementary Materials This supplementary document co...

work page arXiv 1975