ReActor: Reinforcement Learning for Physics-Aware Motion Retargeting
Pith reviewed 2026-05-08 08:44 UTC · model grok-4.3
The pith
A bilevel optimization framework retargets human motions to robots by jointly training a tracking policy with reinforcement learning inside physics simulation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper casts retargeting as the upper level of a bilevel program whose lower level trains a reinforcement-learning policy to track the retargeted trajectory inside a physics simulator. With an approximate gradient for the upper-level objective, the framework finds retargeting parameters that preserve the original motion's character while keeping the result physically plausible, using only a sparse set of semantic rigid-body correspondences.
What carries the argument
Bilevel optimization that couples an upper-level retargeting parameterization with a lower-level reinforcement-learning policy trained to track the motion under full physics simulation, using an approximate gradient to make the outer optimization tractable.
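A hedged sketch of the bilevel structure described above, in notation that is assumed here rather than taken from the paper: $\theta$ denotes the retargeting parameters, $r_{\theta}$ the retargeted reference trajectory, $\pi$ the tracking policy (treated as a parameter vector), $J$ its expected tracking return in simulation, and $\mathcal{L}_{\text{upper}}$ the upper-level loss.

```latex
\begin{aligned}
\min_{\theta}\quad & \mathcal{L}_{\text{upper}}\bigl(\theta,\, \pi^{*}_{\theta}\bigr) \\
\text{s.t.}\quad & \pi^{*}_{\theta} \in \arg\max_{\pi}\; J\bigl(\pi;\, r_{\theta}\bigr), \\[4pt]
\frac{\mathrm{d}\mathcal{L}_{\text{upper}}}{\mathrm{d}\theta}
  &= \frac{\partial \mathcal{L}_{\text{upper}}}{\partial \theta}
   + \frac{\partial \mathcal{L}_{\text{upper}}}{\partial \pi^{*}_{\theta}}\,
     \frac{\mathrm{d}\pi^{*}_{\theta}}{\mathrm{d}\theta}
\end{aligned}
```

The second term of the total derivative depends on how the trained policy responds to a change in $\theta$. That response is expensive to obtain when the lower level is a full reinforcement-learning run, which is why an approximate gradient is needed at all.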
If this is right
- Only a sparse set of semantic rigid-body correspondences is required instead of dense or manual feature matching.
- Retargeting parameters are identified automatically, removing the need for per-motion manual tuning.
- The adapted motions are free of physical inconsistencies such as foot sliding or dynamically infeasible forces.
- The motions support robust imitation learning that transfers from simulation to hardware.
- The same pipeline works across morphologies that differ substantially from the human source, including quadrupeds.
Where Pith is reading between the lines
- Embedding physics simulation inside the retargeting loop may shrink the usual sim-to-real gap for imitation policies.
- The bilevel structure could be reused for other motion-adaptation tasks where control and morphology changes must be solved together.
- Once trained, the retargeting parameters might serve as a fast feed-forward adapter for new reference motions without re-optimizing.
- Extending the approach to online retargeting during execution could let robots adjust human-like motions on the fly.
Load-bearing premise
The approximate gradient computed for the upper-level loss is accurate enough to drive the bilevel optimizer to useful parameter values without trapping it in poor local minima or demanding prohibitive computation.
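One way to probe this premise on a simplified case is to compare a candidate approximate gradient against central finite differences. The sketch below does this on a toy scalar bilevel problem with a closed-form lower level. It does not reproduce the paper's derivation: the quadratic losses, the constants, and the `sensitivity` stand-in for the lower-level response are all assumptions chosen for illustration.

```python
def lower_solution(theta, a=2.0, lam=0.5):
    """Toy lower level: argmin_w 0.5*(w - a*theta)**2 + 0.5*lam*w**2 (closed form)."""
    return a * theta / (1.0 + lam)


def upper_loss(theta, target=1.0, mu=0.1, theta_ref=0.0):
    """Toy upper-level loss evaluated at the lower-level optimum."""
    w = lower_solution(theta)
    return 0.5 * (w - target) ** 2 + 0.5 * mu * (theta - theta_ref) ** 2


def approx_gradient(theta, sensitivity, target=1.0, mu=0.1, theta_ref=0.0):
    """Candidate gradient that replaces the true dw*/dtheta with `sensitivity`."""
    w = lower_solution(theta)
    return (w - target) * sensitivity + mu * (theta - theta_ref)


def finite_difference(f, theta, eps=1e-5):
    """Central finite-difference estimate of df/dtheta."""
    return (f(theta + eps) - f(theta - eps)) / (2.0 * eps)


def relative_error(theta, sensitivity):
    """Relative gap between the candidate gradient and the finite-difference reference."""
    reference = finite_difference(upper_loss, theta)
    return abs(approx_gradient(theta, sensitivity) - reference) / abs(reference)


if __name__ == "__main__":
    theta = 0.3
    exact_sensitivity = 2.0 / 1.5  # true dw*/dtheta = a / (1 + lam) for this toy
    print("error with exact sensitivity:", relative_error(theta, exact_sensitivity))
    print("error with crude sensitivity:", relative_error(theta, 1.0))
```

In this toy the crude gradient still points in the descent direction but is off in magnitude. A real check would replace `lower_solution` with a converged tracking policy, which makes each finite-difference sample far more expensive.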
What would settle it
Running the method on a new robot morphology and observing that the output motion still exhibits foot sliding, self-collisions, or cannot be tracked by the learned policy without further manual fixes would show the central claim is false.
Original abstract
Retargeting human kinematic reference motion onto a robot's morphology remains a formidable challenge. Existing methods often produce physical inconsistencies, such as foot sliding, self-collisions, or dynamically infeasible motions, which hinder downstream imitation learning. We propose a bilevel optimization framework that jointly adapts reference motions to a robot's morphology while training a tracking policy using reinforcement learning. To make the optimization tractable, we derive an approximate gradient for the upper-level loss. Our framework requires only a sparse set of semantic rigid-body correspondences and eliminates the need for manual tuning by identifying optimal values for a parameterization expressive enough to preserve characteristic motion across different embodiments. Moreover, by integrating retargeting directly with physics simulation, we produce physically plausible motions that facilitate robust imitation learning. We validate our method in simulation and on hardware, demonstrating challenging motions for morphologies that differ significantly from a human, including retargeting onto a quadruped.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes ReActor, a bilevel optimization framework for physics-aware motion retargeting that adapts a human reference motion to a target robot morphology (via a parameterization) while jointly training a tracking policy with reinforcement learning. An approximate gradient is derived for the upper-level loss to make the optimization tractable. The method requires only sparse semantic rigid-body correspondences, claims to eliminate manual tuning by automatically identifying optimal parameterization values that preserve characteristic motion, and integrates retargeting directly with physics simulation to produce plausible motions suitable for imitation learning. Validation is reported in simulation and on hardware for morphologies that differ significantly from the human form, including a quadruped.
Significance. If the central claims hold, the work would be significant for robotics motion retargeting and imitation learning: it automates what is typically a manual, embodiment-specific process, directly enforces physical plausibility via simulation, and demonstrates results on challenging cross-embodiment cases (e.g., human-to-quadruped). The bilevel formulation that couples retargeting and policy training is a constructive idea that could reduce downstream sim-to-real gaps.
major comments (2)
- [§3.2] Bilevel Optimization and Approximate Gradient: The derivation of the approximate gradient for the upper-level loss is load-bearing for the tractability and automation claims, yet the manuscript provides no quantitative validation of its accuracy (e.g., comparison to finite differences on a simplified case, error bounds, or ablation against exact gradients where feasible). Without this, it is unclear whether the optimizer reliably reaches solutions that preserve characteristic motion or merely converges to local minima that still require post-hoc tuning.
- [§5] Experiments, quadruped and hardware results: The reported success on morphologies differing significantly from human form rests on qualitative and aggregate metrics, but lacks direct quantitative comparison to hand-tuned baselines or alternative retargeting methods on load-bearing physical-plausibility measures (foot sliding, self-collision rate, torque feasibility). This weakens the claim that the automatic parameterization eliminates manual intervention.
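Two of the physical-plausibility measures named above can be computed from logged trajectories in a few lines. The following is a minimal sketch, not the paper's evaluation code: the function names, the z-up convention, the per-frame contact flags, and the symmetric per-joint torque limits are all assumptions.

```python
import numpy as np


def foot_sliding_distance(foot_pos, in_contact):
    """Total horizontal distance a foot travels while it is flagged as in contact.

    foot_pos:   (T, 3) array of world positions, assuming z is up.
    in_contact: (T,) boolean array, True when the foot touches the ground.
    """
    foot_pos = np.asarray(foot_pos, dtype=float)
    in_contact = np.asarray(in_contact, dtype=bool)
    # Horizontal (x, y) displacement between consecutive frames.
    step = np.linalg.norm(np.diff(foot_pos[:, :2], axis=0), axis=1)
    # Count a step only when the foot is in contact at both of its end frames.
    both_in_contact = in_contact[:-1] & in_contact[1:]
    return float(step[both_in_contact].sum())


def torque_violation_rate(torques, limits):
    """Fraction of frames in which any joint torque exceeds its limit in magnitude.

    torques: (T, J) array of joint torques.
    limits:  (J,) array of positive, symmetric per-joint torque limits.
    """
    torques = np.asarray(torques, dtype=float)
    limits = np.asarray(limits, dtype=float)
    violated = (np.abs(torques) > limits).any(axis=1)
    return float(violated.mean())


if __name__ == "__main__":
    foot = [[0.0, 0.0, 0.0], [0.01, 0.0, 0.0], [0.02, 0.0, 0.0], [0.5, 0.0, 0.1]]
    contact = [True, True, True, False]
    print("foot sliding [m]:", foot_sliding_distance(foot, contact))
    print("torque violation rate:", torque_violation_rate([[1, 2], [3, 1], [0.5, 0.5]], [2, 2]))
```

Reporting such numbers for both the proposed method and a hand-tuned baseline on the same motions would give the direct comparison this comment asks for. A self-collision rate would additionally need contact data from the simulator and is not sketched here.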
minor comments (2)
- [§3] The notation for the retargeting parameterization (free parameters, bounds, and how semantic correspondences map to them) is introduced piecemeal; a consolidated table or diagram in §3 would improve readability.
- [Figures 4-6] Figure captions for qualitative motion results could explicitly state the source human motion, target robot, and any post-processing applied, to allow readers to assess preservation of characteristic features.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and describe the revisions we will incorporate to strengthen the work.
Point-by-point responses
- Referee: [§3.2] (Bilevel Optimization and Approximate Gradient): The derivation of the approximate gradient for the upper-level loss is load-bearing for the tractability and automation claims, yet the manuscript provides no quantitative validation of its accuracy (e.g., comparison to finite differences on a simplified case, error bounds, or ablation against exact gradients where feasible). Without this, it is unclear whether the optimizer reliably reaches solutions that preserve characteristic motion or merely converges to local minima that still require post-hoc tuning.
  Authors: We agree that explicit quantitative validation of the approximate gradient would strengthen the tractability claims. The derivation enables the bilevel optimization to remain computationally feasible, and the successful retargeting results across embodiments provide indirect empirical support. However, we acknowledge the absence of direct comparisons such as finite-difference checks or error bounds in the current manuscript. In the revised version, we will add an ablation study on simplified cases that compares the approximate gradient to finite differences and exact gradients (where feasible), including quantitative error metrics and analysis of convergence to characteristic-motion-preserving solutions.
  Revision: yes
- Referee: [§5] (Experiments, quadruped and hardware results): The reported success on morphologies differing significantly from human form rests on qualitative and aggregate metrics, but lacks direct quantitative comparison to hand-tuned baselines or alternative retargeting methods on load-bearing physical-plausibility measures (foot sliding, self-collision rate, torque feasibility). This weakens the claim that the automatic parameterization eliminates manual intervention.
  Authors: We appreciate the call for stronger quantitative evidence on physical plausibility. The current experiments emphasize feasibility on challenging cross-embodiment cases (including human-to-quadruped) using qualitative demonstrations and aggregate metrics to highlight the automation benefit. We agree that direct comparisons on specific measures would better support the elimination of manual tuning. In the revision, we will add quantitative evaluations against hand-tuned baselines and alternative retargeting methods, reporting foot-sliding distances, self-collision rates, and torque feasibility for the quadruped simulation and hardware results.
  Revision: yes
Circularity Check
No significant circularity; derivation relies on external physics simulation and standard bilevel RL optimization.
Full rationale
The paper's core contribution is a bilevel optimization that jointly performs motion retargeting and RL policy training, using an approximate gradient derived for tractability. This does not reduce any claimed prediction or result to its inputs by construction, nor does it rely on self-definitional parameters, fitted inputs renamed as predictions, or load-bearing self-citations. The framework explicitly integrates external physics simulation and requires only sparse rigid-body correspondences as input, with the optimization discovering parameterization values rather than presupposing them. No ansatz is smuggled via citation, and no uniqueness theorem from prior author work is invoked to force the method. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- retargeting parameterization
axioms (1)
- Domain assumption: the physics simulator provides sufficiently accurate dynamics for the target robot morphology.