
arxiv: 2604.07984 · v1 · submitted 2026-04-09 · 💻 cs.GR

Physics-Based Motion Tracking of Contact-Rich Interacting Characters

Pith reviewed 2026-05-10 17:41 UTC · model grok-4.3

classification 💻 cs.GR
keywords physics-based motion tracking, contact-rich interactions, progressive neural networks, motion synthesis, interacting characters, automatic expert assignment, character animation, physics simulation

The pith

A progressive neural network assigns training samples to specialized experts automatically to enable stable physics-based tracking of contact-rich character interactions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to extend physics-based motion tracking to scenarios where multiple characters interact densely through contact, such as combat or collaborative movements. Single-character methods become unstable because of the forces exchanged at contact points, and the higher control requirements exceed typical model capacities. The proposed approach uses a progressive neural network that adds experts, each specialized in progressively more difficult skills. Training data is routed to the appropriate expert automatically, with no manual scheduling of which expert handles what. According to the paper, this yields steadier tracking of complex interactive motions and more efficient model training.

Core claim

By structuring the tracker as a progressive neural network with multiple experts, each handling skills of increasing difficulty, and training it so that samples are automatically assigned to the appropriate expert, the method achieves stable imitation of contact-rich interactions between characters in a physics simulation, outperforming extensions of single-character trackers in both stability and training efficiency.

What carries the argument

The progressive neural network (PNN) with automatically assigned experts that specialize in control demands of varying difficulty levels.
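
As a rough illustration of this mechanism, the sketch below wires up a toy progressive policy in plain Python. The class name `ProgressivePolicy`, the layer sizes, the random initialization, and the soft gate-weighted mixing of expert outputs are all assumptions made for illustration. The page itself only indicates (in the framework figure caption) that experts are activated sequentially, that adapters transfer knowledge from earlier experts, and that a gating network is involved.

```python
import math
import random


def linear(weights, bias, x):
    """Dense layer: `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(v):
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]


class ProgressivePolicy:
    """Hypothetical progressive policy: expert columns, lateral adapters, gate."""

    def __init__(self, obs_dim, hidden_dim, act_dim, num_experts, seed=0):
        rng = random.Random(seed)
        self.num_experts = num_experts
        self.act_dim = act_dim
        self.w_in = [rand_matrix(rng, hidden_dim, obs_dim)
                     for _ in range(num_experts)]
        # adapters[k][j] feeds hidden features of earlier expert j into expert k.
        self.adapters = [[rand_matrix(rng, hidden_dim, hidden_dim)
                          for _ in range(k)] for k in range(num_experts)]
        self.w_out = [rand_matrix(rng, act_dim, hidden_dim)
                      for _ in range(num_experts)]
        self.w_gate = rand_matrix(rng, num_experts, obs_dim)
        self.zero_h = [0.0] * hidden_dim
        self.zero_a = [0.0] * act_dim
        self.zero_g = [0.0] * num_experts

    def forward(self, obs, active_experts=None):
        """Return (action, gate weights) using the first `active_experts` experts."""
        n = self.num_experts if active_experts is None else active_experts
        hiddens, actions = [], []
        for k in range(n):
            pre = linear(self.w_in[k], self.zero_h, obs)
            for j in range(k):  # lateral connections from earlier experts
                lateral = linear(self.adapters[k][j], self.zero_h, hiddens[j])
                pre = [p + l for p, l in zip(pre, lateral)]
            h = [math.tanh(p) for p in pre]
            hiddens.append(h)
            actions.append(linear(self.w_out[k], self.zero_a, h))
        # Assumed soft gating: softmax over the active experts only.
        gate = softmax(linear(self.w_gate, self.zero_g, obs)[:n])
        action = [sum(g * a[i] for g, a in zip(gate, actions))
                  for i in range(self.act_dim)]
        return action, gate
```

Calling `ProgressivePolicy(6, 8, 3, 4).forward([0.1] * 6, active_experts=2)` returns a 3-dimensional action and two gate weights that sum to one. In a real progressive setup the earlier expert columns would typically be frozen while a new one trains; that training logic is omitted here.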

Load-bearing premise

The instability when extending single-character trackers to interactions comes mainly from contact force transfers, and a progressive expert setup can address the higher control needs without extra manual design.

What would settle it

Run the method on a test set of highly dynamic, contact-rich interactions and check whether tracking remains stable relative to a non-progressive baseline. The claim would be undermined if stability does not improve, or if expert assignment fails to specialize and training collapses.
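
One way to make the specialization check concrete is to log which expert each sample is routed to and summarize the usage. The sketch below is a hypothetical diagnostic, not something described in the paper: `expert_usage` and `normalized_entropy` are illustrative names, and hard per-sample assignments (one expert index per sample) are assumed.

```python
import math
from collections import Counter


def expert_usage(assignments, num_experts):
    """Fraction of samples routed to each expert index 0..num_experts-1."""
    if not assignments:
        raise ValueError("assignments must be non-empty")
    counts = Counter(assignments)
    total = len(assignments)
    return [counts.get(k, 0) / total for k in range(num_experts)]


def normalized_entropy(fractions):
    """Entropy of the usage distribution scaled to [0, 1]."""
    n = len(fractions)
    if n < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in fractions if p > 0.0)
    return h / math.log(n)
```

A normalized entropy near 0 means almost every sample lands on one expert, which would be consistent with routing collapse. Uneven usage alone is not proof of failure, since later experts are meant to handle harder and possibly rarer motions.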

Figures

Figures reproduced from arXiv: 2604.07984 by Hubert P. H. Shum, Qianhui Men, Xiaotang Zhang, Ziyi Chang.

Figure 1: Physics-based motion tracking of two humanoid characters performing contact-rich interactions such as boxing, pushing, and grappling. The objective is to track and reproduce stable motions under frequent physical contacts and complex force exchanges.
Figure 2: Framework overview. We train a progressive learning model in which later experts build on the knowledge from earlier experts, but specializing in more challenging motions. The policy receives humanoid state and goal state, and outputs actions for the proportional derivative (PD) controller to generate torques. Experts are activated sequentially, with adapters enabling knowledge transfer and a gating networ…
Figure 3: Qualitative comparison of tracking results across different models. From top to bottom, our method, MLP, MoE and PNN are shown that performs boxing interaction, respectively. Baseline models often exhibit instability or loss of balance under dense contact, while our method produces more stable and realistic interactions that closely follow the target motions.
… value losses, as they follow the standard PPO f…
Figure 4: Tracking under external perturbations with different object masses (3 kg, 7 kg, 15 kg). As the perturbation strength increases, the characters experience growing difficulty in maintaining stable interaction.
Figure 5: Tracking under observation noise with different noise scales (0.1, 0.3, 0.7). Larger noise levels lead to instability and loss of balance in the interactions.
… high success rates, while baseline methods show marked degradation. Notably, PNN fails under perturbations due to its gating network’s reliance on dataset-specific specialization, which does not generalize when noise shifts the input distribution. …
Figure 6: Tracking with interaction skill transitions. The target interaction shifts from spinning to boxing, and our system remains robust to these abrupt changes without failure.
Figure 7: Training reward curves of baselines and our method. Our approach shows smooth transitions between experts and faster convergence compared to PNN.
… returns once the model capacity surpasses the scale of InterHuman. This analysis confirms that four experts are sufficient for the present dataset, and that our automatic routing strategy naturally balances capacity and efficiency without requiring manual dataset…
Figure 8: Ablation on the number of experts in our progressive framework. With only 1–2 experts, the characters often fail to maintain stable interactions. Adding more experts (3–4) improves tracking quality, showing that later experts specialize in handling more challenging motion dynamics.
… termination (e.g., falling or large tracking error) or truncation at the maximum allowed horizon
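
The Figure 2 caption mentions that policy actions drive a PD controller that generates torques. A textbook PD law with a zero target velocity is sketched below for reference. The function name `pd_torques`, the per-joint scalar angles, and the gain values in the usage note are illustrative assumptions; the paper's actual controller settings are not given on this page.

```python
def pd_torques(q, qd, q_target, kp, kd):
    """Per-joint PD torque: kp * (target angle - angle) - kd * angular velocity.

    All arguments are equal-length sequences with one entry per joint.
    The target velocity is assumed to be zero (a common simplification).
    """
    return [kp_i * (qt_i - q_i) - kd_i * qd_i
            for q_i, qd_i, qt_i, kp_i, kd_i in zip(q, qd, q_target, kp, kd)]
```

For example, a joint at rest at angle 0.0 with target 1.0, `kp = 10.0`, and `kd = 1.0` gets a torque of 10.0, while a joint already at its target but moving at 2.0 rad/s gets a damping torque of -2.0.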
Original abstract

Motion tracking has been an important technique for imitating human-like movement from large-scale datasets in physics-based motion synthesis. However, existing approaches focus on tracking either single character or a particular type of interaction, limiting their ability to handle contact-rich interactions. Extending single-character tracking approaches suffers from the instability due to the challenge of forces transferred through contacts. Contact-rich interactions requires levels of control, which places much greater demands on model capacity. To this end, we propose a robust tracking method based on progressive neural network (PNN) where multiple experts are specialized in learning skills of various difficulties. Our method learns to assign training samples to experts automatically without requiring manually scheduling. Both qualitative and quantitative results show that our method delivers more stable motion tracking in densely interactive movements while enabling more efficient model training.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a physics-based motion tracking approach for contact-rich interactions between multiple characters. It identifies instability in extending single-character trackers as arising from contact force transfers and increased control demands, and addresses this via a Progressive Neural Network (PNN) architecture in which multiple experts specialize in skills of varying difficulty levels. The method claims to learn automatic assignment of training samples to experts without manual scheduling, yielding more stable tracking in densely interactive motions and more efficient training.

Significance. If the stability and efficiency claims hold under rigorous evaluation, the work could advance scalable physics-based multi-character animation by reducing reliance on manual expert scheduling or custom contact/reward engineering. The automatic specialization via PNN offers a capacity-adaptive alternative to monolithic policies, with potential applicability to games, simulation, and robotics. However, the absence of detailed metrics, baselines, and ablation studies in the evaluation limits the assessed impact.

major comments (3)
  1. [Abstract and §4] The abstract states that 'both qualitative and quantitative results show... more stable motion tracking' but provides no concrete metrics (e.g., joint-position RMSE, contact-force error, or penetration depth), no list of baselines, no data-exclusion criteria, and no error analysis or trial counts. This information is load-bearing for the central stability claim and cannot be verified from the text.
  2. [§3.1] The claim that 'extending single-character tracking approaches suffers from the instability due to the challenge of forces transferred through contacts' is presented without an ablation that isolates contact-force propagation from model-capacity limits. If standard single-body contact solvers and rewards are retained (as implied by the lack of described changes), expert specialization alone may increase capacity without mitigating the stated source of instability.
  3. [§3.2] The PNN description does not specify the underlying physics engine, contact-resolution algorithm, or multi-character reward formulation. Without these details, it is impossible to assess whether the automatic expert assignment actually resolves force-transfer issues or merely scales model capacity.
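
To make the first comment's example metric concrete, a joint-position RMSE can be computed as sketched below. This is one common definition (root of the mean squared Euclidean distance over all frames and joints), offered as an assumption; the page does not say which exact metric the paper reports, and `joint_position_rmse` is an illustrative name.

```python
import math


def joint_position_rmse(pred, ref):
    """RMSE between predicted and reference joint positions.

    `pred` and `ref` are lists of frames; each frame is a list of
    (x, y, z) joint positions in the same order and units.
    """
    sq_sum = 0.0
    count = 0
    for pred_frame, ref_frame in zip(pred, ref):
        for p, r in zip(pred_frame, ref_frame):
            sq_sum += sum((a - b) ** 2 for a, b in zip(p, r))
            count += 1
    if count == 0:
        raise ValueError("no joints to compare")
    return math.sqrt(sq_sum / count)
```

For a multi-character scene, the same function could be applied per character and the results reported separately, so that one stable character does not mask a failing partner.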
minor comments (2)
  1. [Figures] Figure captions and axis labels in the qualitative results could more explicitly indicate the number of characters, contact density, and failure modes being compared.
  2. [§3] The notation for progressive layer addition and sample-to-expert routing would benefit from a short pseudocode block or diagram for reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. We address each major comment point by point below and indicate the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract and §4] The abstract states that 'both qualitative and quantitative results show... more stable motion tracking' but provides no concrete metrics (e.g., joint-position RMSE, contact-force error, or penetration depth), no list of baselines, no data-exclusion criteria, and no error analysis or trial counts. This information is load-bearing for the central stability claim and cannot be verified from the text.

    Authors: We agree that the abstract would benefit from greater specificity. In the revised manuscript we will update the abstract to report the key quantitative metrics (joint-position RMSE, contact-force error, and penetration depth) that support the stability claim, along with a brief mention of the baselines used. Section 4 already contains the full quantitative evaluation with baselines (single-character trackers and monolithic policies), trial counts, and error analysis; we will add an explicit subsection on data-exclusion criteria and statistical reporting to make these details immediately verifiable from the main text. revision: yes

  2. Referee: [§3.1] The claim that 'extending single-character tracking approaches suffers from the instability due to the challenge of forces transferred through contacts' is presented without an ablation that isolates contact-force propagation from model-capacity limits. If standard single-body contact solvers and rewards are retained (as implied by the lack of described changes), expert specialization alone may increase capacity without mitigating the stated source of instability.

    Authors: The referee correctly notes that an explicit ablation isolating contact-force effects from capacity would strengthen the argument. While our existing experiments demonstrate that simply enlarging a monolithic policy does not resolve the observed instabilities, we will add a dedicated ablation study in the revision that directly compares a high-capacity monolithic policy against the PNN under identical contact solvers and reward formulations. This will clarify whether the observed gains stem from automatic expert specialization rather than capacity alone. revision: yes

  3. Referee: [§3.2] The PNN description does not specify the underlying physics engine, contact-resolution algorithm, or multi-character reward formulation. Without these details, it is impossible to assess whether the automatic expert assignment actually resolves force-transfer issues or merely scales model capacity.

    Authors: We will revise §3.2 to explicitly document the simulation details: the MuJoCo physics engine with its default multi-body contact solver, and the multi-character reward formulation (weighted sum of per-character pose/velocity tracking errors plus inter-character contact consistency and penetration penalties). These elements were described at a high level in the supplementary material; we will integrate the precise formulations into the main text so readers can evaluate how the PNN interacts with the underlying physics and reward structure. revision: yes
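
The reward structure named in the third response (a weighted sum of per-character tracking errors plus contact-consistency and penetration penalties) could take a shape like the sketch below. Everything here is hypothetical: the function name `tracking_reward`, the linear cost form, and the default weights are placeholders, and the simulated rebuttal does not give the actual formulation.

```python
def tracking_reward(pose_errors, vel_errors, contact_error, penetration_depth,
                    w_pose=1.0, w_vel=0.1, w_contact=0.5, w_pen=1.0):
    """Negative weighted cost over all characters (placeholder weights).

    `pose_errors` and `vel_errors` hold one non-negative error per character;
    `contact_error` and `penetration_depth` are shared interaction terms.
    """
    per_character = sum(w_pose * p + w_vel * v
                        for p, v in zip(pose_errors, vel_errors))
    interaction = w_contact * contact_error + w_pen * penetration_depth
    return -(per_character + interaction)
```

Perfect tracking with no contact mismatch and no penetration gives a reward of zero, and any error lowers it. Imitation methods often pass such errors through an exponential instead of using them linearly; which form the paper uses is not stated here.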

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper's core proposal is a progressive neural network (PNN) architecture with automatic expert specialization for contact-rich multi-character motion tracking. The abstract and provided text describe the motivation (instability from contact force transfer in single-character trackers) and the method (multiple experts learning skills of varying difficulty, with automatic sample assignment) without presenting equations, fitted parameters, or self-citations that reduce the claimed stability/efficiency gains to inputs by construction. No load-bearing steps match the enumerated circularity patterns: there are no self-definitional relations, no 'predictions' that are statistically forced by prior fits, and no uniqueness theorems or ansatzes imported via self-citation. Experimental results are presented as independent validation rather than tautological outputs. The derivation chain therefore stands on its own architectural and empirical content.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on the standard assumption that physics simulation provides a reliable forward model and that motion capture data can be treated as target trajectories. No new physical constants or invented entities are introduced.

axioms (1)
  • domain assumption: Physics simulation accurately models contact forces between characters
    Invoked when stating that force transfer through contacts causes instability in single-character trackers.

pith-pipeline@v0.9.0 · 5435 in / 1103 out tokens · 51883 ms · 2026-05-10T17:41:23.450343+00:00 · methodology

