Model-Based Reinforcement Learning Exploits Passive Body Dynamics for High-Performance Biped Robot Locomotion
Pith reviewed 2026-05-10 11:38 UTC · model grok-4.3
The pith
Biped robots with passive elements learn high-performance locomotion by exploiting stable limit cycles
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that the training of the model with passive elements is highly affected by the attractor of the system. This leads trajectories to converge quickly to limit cycles, and although it takes a long time to obtain large rewards, the acquired locomotion is robust and energy-efficient. Robots with passive elements can efficiently acquire high-performance locomotion by utilizing stable limit cycles generated through dynamic interaction between the body and ground.
What carries the argument
The stable limit cycles created by dynamic interaction between passive body elements and the ground, which act as attractors that shape and accelerate the reinforcement learning process toward efficient policies.
If this is right
- Locomotion policies learned with passive elements prove more robust and energy-efficient than those from rigid models.
- Trajectories reach limit cycles rapidly because the passive dynamics create strong attractors.
- The approach demonstrates that passive body properties support high-performance bipedal gaits in model-based reinforcement learning.
- Implementing passive properties in robot bodies becomes a practical route to better embodied locomotion.
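To make the attractor idea above concrete, here is a minimal sketch of a stable limit cycle. It is a toy system, not the paper's biped model: the radial equation dr/dt = r(1 - r^2) of a Hopf-type oscillator is an assumption chosen only because its limit cycle (r = 1) is known in closed form. The function name, step size, and step count are hypothetical.

```python
def simulate_radius(r0, dt=0.01, steps=2000):
    """Euler-integrate dr/dt = r * (1 - r**2) from initial radius r0.

    Toy stand-in for an attractor: r = 1 is a stable limit cycle,
    so trajectories starting inside or outside it should approach 1.
    """
    r = r0
    history = [r]
    for _ in range(steps):
        r = r + dt * r * (1.0 - r * r)  # explicit Euler step
        history.append(r)
    return history


# Two starting points, one inside and one outside the cycle.
inside = simulate_radius(0.1)
outside = simulate_radius(2.0)
print(f"start 0.1 -> final radius {inside[-1]:.4f}")
print(f"start 2.0 -> final radius {outside[-1]:.4f}")
```

Both trajectories end near the same radius regardless of where they start, which is the property the review attributes to the passive body-ground dynamics: the policy search operates on states that the mechanics already pull toward a periodic orbit.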
Where Pith is reading between the lines
- Physical robots that embed springs or other compliant elements directly in their structure could realize the same learning gains outside simulation.
- The same passive-dynamics principle might improve learning for other dynamic robot behaviors such as balancing or jumping.
- Varying the stiffness or placement of passive elements could be tested to find optimal settings for faster reward improvement.
Load-bearing premise
The simulation of passive elements and the resulting attractor dynamics will match real robot behavior closely enough for learned policies to transfer without major hardware changes or retuning.
What would settle it
A physical biped robot built with springs fails to produce the same robust and energy-efficient locomotion when running the policy trained in the passive-element simulator, or shows no clear advantage over an identical robot without the springs.
Original abstract
Embodiment is a significant keyword in recent machine learning fields. This study focused on the passive nature of the body of a biped robot to generate walking and running locomotion using model-based deep reinforcement learning. We constructed two models in a simulator, one with passive elements (e.g., springs) and the other, which is similar to general humanoids, without passive elements. The training of the model with passive elements was highly affected by the attractor of the system. As a result, although the trajectories quickly converged to limit cycles, it took a long time to obtain large rewards. However, thanks to the attractor-driven learning, the acquired locomotion was robust and energy-efficient. The results revealed that robots with passive elements could efficiently acquire high-performance locomotion by utilizing stable limit cycles generated through dynamic interaction between the body and ground. This study demonstrates the importance of implementing passive properties in the body for future embodied AI.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that model-based deep RL on simulated biped robots exploits passive body elements (e.g., springs) to converge rapidly to stable limit cycles arising from body-ground dynamics, yielding more robust and energy-efficient locomotion than equivalent models without passive elements. Training is attractor-driven, producing high-performance gaits despite slower reward growth.
Significance. If validated, the result would strengthen the case for incorporating passive dynamics into embodied RL controllers, showing how mechanical attractors can simplify learning of efficient periodic behaviors. The simulation comparison between passive and rigid models provides a concrete demonstration of embodiment benefits within the manuscript's scope.
Major comments (3)
- Abstract and Results: the central claims of 'high-performance locomotion' and 'energy-efficient' gaits rest on qualitative descriptions ('quickly converged', 'robust and energy-efficient') with no reported quantitative metrics such as final rewards, energy consumption (e.g., torque integrals), convergence episode counts, or statistical significance across seeds, leaving the magnitude of improvement unassessable.
- Experiments/Results: the comparison of the two simulated models demonstrates attractor effects but provides no ablations on spring stiffness, friction parameters, or actuator models, nor any domain-randomization tests, which are load-bearing for the claim that passive limit cycles survive real-world mismatches.
- Introduction and Conclusion: the emphasis on 'embodied AI' and 'future embodied AI' is undermined by the complete absence of hardware experiments or sim-to-real transfer results, despite the skeptic note highlighting that attractor behavior may not survive stiffness, friction, delay, and noise discrepancies.
Minor comments (2)
- Methods: specify the exact model-based RL algorithm (e.g., which planner or dynamics model is used) and the precise passive-element implementation (spring constants, damping) so that the attractor claim can be reproduced.
- Figures: add plots of state trajectories, limit-cycle projections, and learning curves for both models to make the qualitative statements visually verifiable.
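The quantitative metrics requested in the first major comment could be computed roughly as follows. This is a sketch under stated assumptions, not the authors' analysis code: "torque integral" is taken to mean the time integral of summed absolute joint torques, and convergence is taken to mean the first episode after which rewards stay within a tolerance band around the final plateau. Both definitions, and all function and parameter names, are hypothetical.

```python
def integrated_abs_torque(torques, dt):
    """Rectangle-rule integral of summed |joint torque| over one episode.

    torques: list of timesteps, each a list of per-joint torques.
    dt: simulation timestep in seconds (assumed constant).
    """
    total = sum(sum(abs(tau) for tau in step) for step in torques)
    return total * dt


def episodes_to_convergence(rewards, tolerance=0.05, tail=10):
    """Index of the first episode from which rewards stay near the plateau.

    The plateau is the mean of the last `tail` rewards (an assumption).
    Returns len(rewards) if even the last reward is outside the band.
    """
    window = rewards[-tail:]
    plateau = sum(window) / len(window)
    band = tolerance * abs(plateau)
    first = len(rewards)
    # Walk backwards while rewards remain inside the tolerance band.
    for i in range(len(rewards) - 1, -1, -1):
        if abs(rewards[i] - plateau) <= band:
            first = i
        else:
            break
    return first
```

Reporting both numbers per seed, for the passive and rigid models alike, would let a reader judge the size of the claimed difference rather than only its direction.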
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major point below and have revised the manuscript to improve quantitative support and clarify scope where feasible.
- Referee: Abstract and Results: the central claims of 'high-performance locomotion' and 'energy-efficient' gaits rest on qualitative descriptions ('quickly converged', 'robust and energy-efficient') with no reported quantitative metrics such as final rewards, energy consumption (e.g., torque integrals), convergence episode counts, or statistical significance across seeds, leaving the magnitude of improvement unassessable.
  Authors: We agree that quantitative metrics are needed to make the performance claims assessable. The original manuscript focused on qualitative descriptions of training dynamics and gait stability. For the revision, we re-analyzed the existing training logs and will add tables reporting mean final rewards, episodes to convergence (defined as the reward reaching a plateau within 5%), energy consumption via integrated torque, and results across five random seeds with standard deviations and statistical comparisons between the passive and rigid models.
  Revision: yes
- Referee: Experiments/Results: the comparison of the two simulated models demonstrates attractor effects but provides no ablations on spring stiffness, friction parameters, or actuator models, nor any domain-randomization tests, which are load-bearing for the claim that passive limit cycles survive real-world mismatches.
  Authors: The central experiment isolates the effect of passive elements by comparing otherwise identical models. We have added a limited sensitivity study on spring stiffness in the revision, confirming that limit-cycle behavior persists across a neighborhood of the nominal value. Broader ablations on friction, actuator dynamics, and domain randomization were not performed because they would shift the focus away from the core attractor-driven learning phenomenon under ideal conditions; we have expanded the discussion to acknowledge these as limitations and future directions.
  Revision: partial
- Referee: Introduction and Conclusion: the emphasis on 'embodied AI' and 'future embodied AI' is undermined by the complete absence of hardware experiments or sim-to-real transfer results, despite the skeptic note highlighting that attractor behavior may not survive stiffness, friction, delay, and noise discrepancies.
  Authors: The manuscript is explicitly framed as a simulation study demonstrating the principle that passive body dynamics can simplify model-based RL. The skeptic note is already addressed in the limitations paragraph. We have revised the introduction and conclusion to temper language and explicitly list hardware validation as future work. We cannot perform hardware experiments or sim-to-real transfer at this time because no physical biped platform is available in our current laboratory setup.
  Revision: partial
- Not addressed in the revision: hardware experiments or sim-to-real transfer, as the study is purely simulation-based and no physical robot hardware is accessible.
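The rebuttal promises results across five random seeds with statistical comparisons between the passive and rigid models. One way such a comparison could be set up is Welch's t-test, sketched below with the standard library only. The choice of Welch's test is an assumption (the rebuttal does not name a test), the function name is hypothetical, and the numbers in the usage lines are placeholders that do not come from the paper.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and degrees of freedom for two independent samples.

    Does not assume equal variances; each sample needs at least two values.
    """
    # Squared standard error of each sample mean.
    se_a = variance(sample_a) / len(sample_a)
    se_b = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(sample_a) - 1) + se_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Placeholder per-seed values, NOT results from the paper.
placeholder_passive = [1.0, 2.0, 3.0, 4.0, 5.0]
placeholder_rigid = [2.0, 3.0, 4.0, 5.0, 6.0]
t_stat, dof = welch_t(placeholder_passive, placeholder_rigid)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

With only five seeds per model the test has little power, so reporting the per-seed values alongside the statistic would be more informative than a significance verdict alone.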
Circularity Check
No circularity: empirical simulation comparison with independent experimental outcomes
The paper reports an empirical study constructing two simulated biped models (with/without passive springs) and training model-based RL policies on them. Claims about attractor-driven convergence to limit cycles, robustness, and energy efficiency are direct observations from the training runs and reward curves in simulation. No equations, fitted parameters, or self-citations are presented as load-bearing derivations; the central results are falsifiable experimental outcomes rather than reductions to inputs by construction. The derivation chain is self-contained as a standard sim-based RL ablation.