Model-Based Reinforcement Learning Exploits Passive Body Dynamics for High-Performance Biped Robot Locomotion

Akihito Sano; Haruka Washiyama; Tomoya Kamimura

arXiv: 2604.14565 · v1 · submitted 2026-04-16 · 💻 cs.RO · cs.SY · eess.SY


Tomoya Kamimura, Haruka Washiyama, Akihito Sano

Pith reviewed 2026-05-10 11:38 UTC · model grok-4.3

classification: 💻 cs.RO · cs.SY · eess.SY

keywords: model-based reinforcement learning · biped locomotion · passive dynamics · limit cycles · embodied AI · energy efficiency · attractor dynamics · robot walking


The pith

Biped robots with passive elements learn high-performance locomotion by exploiting stable limit cycles

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows, in simulation, that adding passive elements such as springs to a biped robot model lets model-based deep reinforcement learning produce effective walking and running. These elements create attractors that pull the system toward stable limit cycles arising from body-ground contact, yielding robust and energy-efficient gaits even though large rewards take longer to appear. A reader would care because the comparison with a rigid model shows how passive body design can simplify control and improve outcomes instead of relying on active actuation alone. The work concludes that such passive properties matter for building better embodied AI systems.

Core claim

The paper claims that training of the model with passive elements is strongly shaped by the attractor of the system: trajectories converge quickly to limit cycles, and although it takes a long time to obtain large rewards, the acquired locomotion is robust and energy-efficient. Robots with passive elements can therefore efficiently acquire high-performance locomotion by exploiting stable limit cycles generated through dynamic interaction between the body and the ground.

What carries the argument

Stable limit cycles created by dynamic interaction between the passive body elements and the ground. They act as attractors that shape the reinforcement learning process and steer it toward efficient policies, even though large rewards take longer to appear.
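A limit cycle acting as an attractor can be illustrated with a toy system far simpler than the paper's biped. The sketch below uses the Van der Pol oscillator, chosen here only because it has a single stable limit cycle; it is not the paper's model, and all function names are hypothetical. Trajectories started inside and outside the cycle are sampled on a Poincaré section (the instants where velocity changes sign from positive to negative), and both sequences settle on the same amplitude, which is what "attractor" means operationally.

```python
import numpy as np


def van_der_pol(state: np.ndarray, mu: float = 1.0) -> np.ndarray:
    """Van der Pol oscillator: a textbook system with a single stable limit cycle."""
    x, v = state
    return np.array([v, mu * (1.0 - x**2) * v - x])


def rk4_step(state: np.ndarray, dt: float, mu: float = 1.0) -> np.ndarray:
    """One classical fourth-order Runge-Kutta step."""
    k1 = van_der_pol(state, mu)
    k2 = van_der_pol(state + 0.5 * dt * k1, mu)
    k3 = van_der_pol(state + 0.5 * dt * k2, mu)
    k4 = van_der_pol(state + dt * k3, mu)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)


def section_amplitudes(x0: float, v0: float, t_end: float = 200.0,
                       dt: float = 0.01, mu: float = 1.0) -> list[float]:
    """Record x each time the trajectory pierces the section v = 0 with v decreasing."""
    state = np.array([x0, v0])
    hits = []
    for _ in range(int(t_end / dt)):
        new = rk4_step(state, dt, mu)
        if state[1] > 0.0 >= new[1]:
            # Linearly interpolate x at the crossing instant.
            frac = state[1] / (state[1] - new[1])
            hits.append(float(state[0] + frac * (new[0] - state[0])))
        state = new
    return hits


inner = section_amplitudes(0.1, 0.0)  # starts inside the cycle
outer = section_amplitudes(4.0, 0.0)  # starts outside the cycle
print(f"inner start: first hit {inner[0]:.3f}, last hit {inner[-1]:.4f}")
print(f"outer start: first hit {outer[0]:.3f}, last hit {outer[-1]:.4f}")
```

For μ = 1 the cycle's amplitude is close to 2, so both "last hit" values should land near that regardless of the starting point. A return-map view like this is one common way to check whether gait trajectories have settled onto a limit cycle.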

If this is right

Locomotion policies learned with passive elements prove more robust and energy-efficient than those from rigid models.
Trajectories reach limit cycles rapidly because the passive dynamics create strong attractors.
The approach demonstrates that passive body properties support high-performance bipedal gaits in model-based reinforcement learning.
Implementing passive properties in robot bodies becomes a practical route to better embodied locomotion.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Physical robots that embed springs or other compliant elements directly in their structure could realize the same learning gains outside simulation.
The same passive-dynamics principle might improve learning for other dynamic robot behaviors such as balancing or jumping.
Varying the stiffness or placement of passive elements could be tested to find optimal settings for faster reward improvement.
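The energy-efficiency intuition behind such a stiffness sweep can be checked on a one-degree-of-freedom toy: a mass-spring-damper forced to follow a fixed sinusoid. The sketch assumes hypothetical values for mass, damping, amplitude, and frequency, and assumes the actuator cannot recover negative work. Under those assumptions the positive actuator work per cycle is smallest when the spring's natural frequency matches the motion, k = mω², because the spring then supplies the inertial force and the actuator only pays for damping. This is a resonance argument, not a result from the paper.

```python
import numpy as np


def positive_work_per_cycle(k: float, m: float = 1.0, c: float = 0.2, amp: float = 0.1,
                            omega: float = 2 * np.pi, n: int = 4000) -> float:
    """Positive actuator work (J) needed to impose x(t) = amp * sin(omega t).

    A parallel spring of stiffness k carries part of the load; the actuator supplies
    the rest, F = m*a + c*v + k*x. Negative actuator work is assumed to be wasted.
    """
    t = np.linspace(0.0, 2 * np.pi / omega, n, endpoint=False)
    dt = t[1] - t[0]
    x = amp * np.sin(omega * t)
    v = amp * omega * np.cos(omega * t)
    a = -amp * omega**2 * np.sin(omega * t)
    force = m * a + c * v + k * x
    power = force * v
    return float(np.sum(np.clip(power, 0.0, None)) * dt)


m, omega = 1.0, 2 * np.pi
stiffness = np.linspace(0.0, 80.0, 161)  # hypothetical sweep, step 0.5 N/m
work = np.array([positive_work_per_cycle(k, m=m, omega=omega) for k in stiffness])
best_k = stiffness[np.argmin(work)]
print(f"best k ~ {best_k:.1f} N/m, theory m*omega^2 = {m * omega**2:.1f} N/m")
```

A real sweep over a biped's spring constants would replace `positive_work_per_cycle` with full training runs, which is far more expensive; the toy only shows why an optimum is expected to exist.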

Load-bearing premise

The simulation of passive elements and the resulting attractor dynamics will match real robot behavior closely enough for learned policies to transfer without major hardware changes or retuning.
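One standard way to stress-test this premise before building hardware is domain randomization: resampling uncertain physical parameters for every training episode so that the policy cannot depend on one exact simulator setting. The paper is not reported to do this (see the referee's second major comment). The sketch below is a minimal sampler; the parameter names and ranges are hypothetical placeholders, not values from the paper.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class PhysicsParams:
    spring_stiffness: float  # N*m/rad, passive joint spring (hypothetical)
    ground_friction: float   # Coulomb friction coefficient (hypothetical)
    action_delay_s: float    # actuator latency in seconds (hypothetical)


# Hypothetical ranges around assumed nominal values.
RANGES = {
    "spring_stiffness": (8.0, 12.0),
    "ground_friction": (0.6, 1.2),
    "action_delay_s": (0.0, 0.02),
}


def sample_params(rng: np.random.Generator) -> PhysicsParams:
    """Draw one parameter set uniformly from RANGES (intended: one draw per episode)."""
    return PhysicsParams(**{name: float(rng.uniform(lo, hi))
                            for name, (lo, hi) in RANGES.items()})


rng = np.random.default_rng(seed=0)
episodes = [sample_params(rng) for _ in range(5)]
for p in episodes:
    print(p)
```

Each sampled `PhysicsParams` would be written into the simulator before an episode starts; how that is done depends on the simulator, which the source does not name.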

What would settle it

A physical biped robot built with springs fails to produce the same robust and energy-efficient locomotion when running the policy trained in the passive-element simulator, or shows no clear advantage over an identical robot without the springs.

Figures

Figures reproduced from arXiv: 2604.14565 by Akihito Sano, Haruka Washiyama, Tomoya Kamimura.

**Figure 2.** Monochrome image of the robot taken from behind, whose size is 64 …

**Figure 3.** Learning curves of passive model and torque model in 10 trials. (A) …

**Figure 4.** Convergence of trajectories with learning process. Two-dimensional …

**Figure 5.** Snapshots of typical walking with v_d = 1.5 [m/s] by (A) passive model and (B) torque model.

**Figure 6.** Snapshots of typical running with v_d = 2.5 [m/s] by (A) passive model and (B) torque model.

From the accompanying text: "…both speeds, the resulting locomotion at the end of training was qualitatively different for each robot. Typical locomotion obtained for target speeds v_d = 1.5 [m/s] and 2.5 [m/s] is depicted in Figs. 5 and 6, respectively. Regardless of the target speed, the passive model produced soft and bending joint motions, e…"

**Figure 7.** Footprint diagrams (model only) and time profiles of joint angles in …

**Figure 8.** Horizontal position on slopes (solid) and level ground (dashed) with …

Original abstract

Embodiment is a significant keyword in recent machine learning fields. This study focused on the passive nature of the body of a biped robot to generate walking and running locomotion using model-based deep reinforcement learning. We constructed two models in a simulator, one with passive elements (e.g., springs) and the other, which is similar to general humanoids, without passive elements. The training of the model with passive elements was highly affected by the attractor of the system. As a result, although the trajectories quickly converged to limit cycles, it took a long time to obtain large rewards. However, thanks to the attractor-driven learning, the acquired locomotion was robust and energy-efficient. The results revealed that robots with passive elements could efficiently acquire high-performance locomotion by utilizing stable limit cycles generated through dynamic interaction between the body and ground. This study demonstrates the importance of implementing passive properties in the body for future embodied AI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, circularity audit, and a ledger of axioms and free parameters. Tearing a paper down is the easy half of reading it: the pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

Passive springs let simulated model-based RL (MBRL) biped policies reach stable limit cycles faster, but the work stays in simulation with no hardware checks.


The key point is that passive elements in a simulated biped robot let model-based reinforcement learning find high-performance locomotion more effectively, because body-ground dynamics pull the system onto stable limit cycles. The paper sets up two simulated models, one with springs and one rigid like a typical humanoid, and trains model-based deep RL on both. The system's attractors pull the passive version's trajectories onto limit cycles quickly, although large rewards take longer to appear, and the resulting locomotion is described as robust and energy-efficient.

This comparison is the main new element. It applies existing ideas from passive dynamics to a modern RL framework and shows an empirical difference in learning behavior. The attractor-driven angle is useful for thinking about how embodiment can simplify control problems, and the simulation results appear consistent with the idea that passive properties aid learning: the authors highlight how trajectories quickly settle into limit cycles, which then support good performance.

That said, the entire study is simulation-based. There is no hardware implementation or sim-to-real transfer test, which weakens the claim that this approach yields high-performance locomotion on actual robots; real-world factors such as actuator delays and varying ground conditions could disrupt the attractors. The abstract also gives no specific numbers on rewards, energy, or statistical tests, so the full paper needs to supply them to make the gains convincing.

The work is aimed at robotics and machine learning researchers who study embodied AI and the co-design of body and controller. Readers interested in legged locomotion, or in how hardware shapes RL outcomes, will find the contrast informative. It is not yet ready for direct application to physical robots. I recommend sending it for peer review: the simulation experiment is a solid foundation that raises good questions about passive dynamics in learning, though it will likely need revisions that add quantitative data and some validation steps.

Referee Report

3 major / 2 minor

Summary. The paper claims that model-based deep RL on simulated biped robots exploits passive body elements (e.g., springs) to converge rapidly to stable limit cycles arising from body-ground dynamics, yielding more robust and energy-efficient locomotion than equivalent models without passive elements. Training is attractor-driven, producing high-performance gaits despite slower reward growth.

Significance. If validated, the result would strengthen the case for incorporating passive dynamics into embodied RL controllers, showing how mechanical attractors can simplify learning of efficient periodic behaviors. The simulation comparison between passive and rigid models provides a concrete demonstration of embodiment benefits within the manuscript's scope.

major comments (3)

Abstract and Results: the central claims of 'high-performance locomotion' and 'energy-efficient' gaits rest on qualitative descriptions ('quickly converged', 'robust and energy-efficient') with no reported quantitative metrics such as final rewards, energy consumption (e.g., torque integrals), convergence episode counts, or statistical significance across seeds, leaving the magnitude of improvement unassessable.
Experiments/Results: the comparison of the two simulated models demonstrates attractor effects but provides no ablations on spring stiffness, friction parameters, or actuator models, nor any domain-randomization tests, which are load-bearing for the claim that passive limit cycles survive real-world mismatches.
Introduction and Conclusion: the emphasis on 'embodied AI' and 'future embodied AI' is undermined by the complete absence of hardware experiments or sim-to-real transfer results, especially since attractor behavior may not survive discrepancies in stiffness, friction, delay, and noise.
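The first major comment asks for energy figures such as torque integrals. Two common definitions are sketched below: mechanical energy as the time integral of |τ·q̇| summed over joints, and the integral of τ² as a proxy for Joule heating in motors, together with the dimensionless cost of transport E/(m g d). Whether the paper uses any of these is not stated; the function names are hypothetical, and torque and joint-velocity arrays are assumed to have shape (time steps, joints) with a fixed step `dt`.

```python
import numpy as np


def mechanical_energy(torque: np.ndarray, joint_vel: np.ndarray, dt: float) -> float:
    """Sum over joints of the time integral of |tau * qdot| (J); negative work also counts as cost."""
    return float(np.sum(np.abs(torque * joint_vel)) * dt)


def torque_squared_integral(torque: np.ndarray, dt: float) -> float:
    """Sum over joints of the time integral of tau^2, a common proxy for motor Joule losses."""
    return float(np.sum(torque**2) * dt)


def cost_of_transport(energy_j: float, mass_kg: float, distance_m: float, g: float = 9.81) -> float:
    """Dimensionless cost of transport E / (m g d)."""
    return energy_j / (mass_kg * g * distance_m)


dt = 0.01
torque = np.full((100, 2), 2.0)     # synthetic: 1 s of constant 2 N*m on two joints
joint_vel = np.full((100, 2), 1.5)  # synthetic: constant 1.5 rad/s on both joints
energy = mechanical_energy(torque, joint_vel, dt)
cot = cost_of_transport(energy, mass_kg=20.0, distance_m=1.5)
print(energy, torque_squared_integral(torque, dt), cot)
```

Comparing the passive and torque models on the same metric, at the same target speed and distance, is what would make "energy-efficient" a measurable claim.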

minor comments (2)

Methods: specify the exact model-based RL algorithm (e.g., which planner or dynamics model is used) and the precise passive-element implementation (spring constants, damping) so that the attractor claim can be reproduced.
Figures: add plots of state trajectories, limit-cycle projections, and learning curves for both models to make the qualitative statements visually verifiable.
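For the first minor comment, a passive element at a joint is commonly modeled as a linear spring-damper in parallel with the actuator, τ = u − k(q − q₀) − c·q̇, where u is the policy's torque command. The sketch shows how a "passive model" and a "torque model" could differ only in this term; the linear law, parameter names, and values are assumptions, since the source does not give the paper's spring constants or damping. In MuJoCo, for instance, the same law can be attached to a joint through its `stiffness`, `springref`, and `damping` attributes, but which simulator the paper used is not stated here.

```python
import numpy as np


def joint_torques(action, q, qdot, stiffness, damping, q_rest, passive=True):
    """Torque applied at each joint.

    action: policy torque command u (one entry per joint).
    Passive model: a parallel linear spring-damper adds -k (q - q_rest) - c qdot.
    Torque model (passive=False): the actuator command is applied unchanged.
    """
    u = np.asarray(action, dtype=float)
    if not passive:
        return u
    q = np.asarray(q, dtype=float)
    qdot = np.asarray(qdot, dtype=float)
    return u - stiffness * (q - q_rest) - damping * qdot


# Hypothetical per-joint values for a three-joint leg (hip, knee, ankle).
k = np.array([20.0, 15.0, 10.0])  # N*m/rad
c = np.array([0.5, 0.4, 0.3])     # N*m*s/rad
q0 = np.zeros(3)                  # spring rest angles, rad

u = np.array([1.0, -0.5, 0.2])
q = np.array([0.1, -0.2, 0.05])
qdot = np.array([1.0, 0.0, -2.0])
print(joint_torques(u, q, qdot, k, c, q0, passive=True))
print(joint_torques(u, q, qdot, k, c, q0, passive=False))
```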

Simulated Authors' Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and have revised the manuscript to improve quantitative support and clarify scope where feasible.


Referee: Abstract and Results: the central claims of 'high-performance locomotion' and 'energy-efficient' gaits rest on qualitative descriptions ('quickly converged', 'robust and energy-efficient') with no reported quantitative metrics such as final rewards, energy consumption (e.g., torque integrals), convergence episode counts, or statistical significance across seeds, leaving the magnitude of improvement unassessable.

Authors: We agree that quantitative metrics are needed to make the performance claims assessable. The original manuscript focused on qualitative descriptions of training dynamics and gait stability. For the revision, we re-analyzed the existing training logs and added tables reporting mean final rewards, episodes to convergence (defined as the reward plateauing within 5%), energy consumption via integrated torque, and results across five random seeds with standard deviations and statistical comparisons between the passive and rigid models. revision: yes
Referee: Experiments/Results: the comparison of the two simulated models demonstrates attractor effects but provides no ablations on spring stiffness, friction parameters, or actuator models, nor any domain-randomization tests, which are load-bearing for the claim that passive limit cycles survive real-world mismatches.

Authors: The central experiment isolates the effect of passive elements by comparing otherwise identical models. We have added a limited sensitivity study on spring stiffness in the revision, confirming that limit-cycle behavior persists across a neighborhood of the nominal value. Broader ablations on friction, actuator dynamics, and domain randomization were not performed because they would shift the focus away from the core attractor-driven learning phenomenon under ideal conditions; we have expanded the discussion to acknowledge these as limitations and future directions. revision: partial
Referee: Introduction and Conclusion: the emphasis on 'embodied AI' and 'future embodied AI' is undermined by the complete absence of hardware experiments or sim-to-real transfer results, especially since attractor behavior may not survive discrepancies in stiffness, friction, delay, and noise.

Authors: The manuscript is explicitly framed as a simulation study demonstrating the principle that passive body dynamics can simplify model-based RL. The concern that attractor behavior may not survive such discrepancies is already acknowledged in the limitations paragraph. We have revised the introduction and conclusion to temper the language and explicitly list hardware validation as future work. We cannot perform hardware experiments or sim-to-real transfer at this time because no physical biped platform is available in our current laboratory setup. revision: partial
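The first response defines episodes to convergence as the reward plateauing within 5% and promises means, standard deviations, and statistical comparisons across seeds. One way to compute those is sketched below. The moving-average window, the "stays within 5% of the final smoothed value" rule, and Welch's t statistic are assumptions chosen for illustration; the rebuttal does not say which test is used, and the function names are hypothetical.

```python
import numpy as np


def episodes_to_convergence(rewards, tol=0.05, window=10):
    """0-based episode index from which the moving-average reward stays within
    tol * |final| of its final smoothed value (final value assumed non-zero)."""
    r = np.asarray(rewards, dtype=float)
    smooth = np.convolve(r, np.ones(window) / window, mode="valid")
    final = smooth[-1]
    outside = np.flatnonzero(np.abs(smooth - final) > tol * abs(final))
    start = 0 if outside.size == 0 else int(outside[-1]) + 1
    # smooth[i] averages episodes i .. i+window-1, so report the window's last episode.
    return start + window - 1


def mean_std(values):
    """Sample mean and standard deviation (ddof=1) across seeds."""
    v = np.asarray(values, dtype=float)
    return float(v.mean()), float(v.std(ddof=1))


def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    se2 = a.var(ddof=1) / a.size + b.var(ddof=1) / b.size
    return float((a.mean() - b.mean()) / np.sqrt(se2))


# Synthetic reward curve: flat at 0 for 50 episodes, then flat at 100.
curve = [0.0] * 50 + [100.0] * 150
print(episodes_to_convergence(curve, window=1))   # raw curve
print(episodes_to_convergence(curve, window=10))  # smoothed curve
print(mean_std([1.0, 2.0, 3.0]), welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

Welch's statistic does not assume equal variances between the two models; turning it into a p-value needs the Welch–Satterthwaite degrees of freedom and a t distribution, for example via SciPy's `scipy.stats.ttest_ind(..., equal_var=False)`.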

standing simulated objections not resolved

Conducting hardware experiments or sim-to-real transfer, as the study is purely simulation-based and no physical robot hardware is accessible.

Circularity Check

0 steps flagged

No circularity: empirical simulation comparison with independent experimental outcomes


The paper reports an empirical study constructing two simulated biped models (with/without passive springs) and training model-based RL policies on them. Claims about attractor-driven convergence to limit cycles, robustness, and energy efficiency are direct observations from the training runs and reward curves in simulation. No equations, fitted parameters, or self-citations are presented as load-bearing derivations; the central results are falsifiable experimental outcomes rather than reductions to inputs by construction. The derivation chain is self-contained as a standard sim-based RL ablation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract does not detail specific free parameters, axioms, or invented entities; the work appears to rest on standard assumptions of RL convergence in simulation and fidelity of passive element modeling.



Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages

[1] J. Hwangbo, J. Lee, A. Dosovitskiy, D. Bellicoso, V. Tsounis, V. Koltun, and M. Hutter, “Learning agile and dynamic motor skills for legged robots,” Sci Robot, vol. 4, no. 26, Jan. 2019.

[2] J. Duan, S. Yu, H. L. Tan, H. Zhu, and C. Tan, “A survey of embodied AI: From simulators to research tasks,” IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 6, no. 2, pp. 230–244, Apr. 2022.

[3] T. Wang, C. Pierce, V. Kojouharov, B. Chong, K. Diaz, H. Lu, and D. I. Goldman, “Mechanical intelligence simplifies control in terrestrial limbless locomotion,” Sci Robot, vol. 8, no. 85, p. eadi2243, Dec. 2023.

[4] D. Hoeller, N. Rudin, D. Sako, and M. Hutter, “ANYmal parkour: Learning agile navigation for quadrupedal robots,” Sci Robot, vol. 9, no. 88, p. eadi7566, Mar. 2024.

[5] I. Radosavovic, T. Xiao, B. Zhang, T. Darrell, J. Malik, and K. Sreenath, “Real-world humanoid locomotion with reinforcement learning,” Sci Robot, vol. 9, no. 89, p. eadi9579, Apr. 2024.

[6] T. Haarnoja, B. Moran, G. Lever, S. H. Huang, D. Tirumala, J. Humplik, M. Wulfmeier, S. Tunyasuvunakool, N. Y. Siegel, R. Hafner, M. Bloesch, K. Hartikainen, A. Byravan, L. Hasenclever, Y. Tassa, F. Sadeghi, N. Batchelor, F. Casarini, S. Saliceti, C. Game, N. Sreendra, K. Patel, M. Gwira, A. Huber, N. Hurley, F. Nori, R. Hadsell, and N. Heess, “Learning agile soccer skills for a bipedal robot with deep reinforcement learning,” …, 2024.

[7] S. Aoi, P. Manoonpong, Y. Ambe, F. Matsuno, and F. Wörgötter, “Adaptive control strategies for interlimb coordination in legged robots: A review,” Front. Neurorobot., vol. 11, p. 39, 2017.

[8] A. Fukuhara, Y. Koizumi, T. Baba, S. Suzuki, T. Kano, A. Ishiguro, M. Daley, and A. Ijspeert, “Simple decentralized control mechanism that enables limb adjustment for adaptive quadruped running,” Proceedings of the Royal Society B: Biological Sciences, vol. 288, p. 20211622, 2021.

[9] D. Owaki, S. Y. Horikiri, J. Nishii, and A. Ishiguro, “Tegotae-based control produces adaptive inter- and intra-limb coordination in bipedal walking,” Front. Neurorobot., vol. 15, p. 629595, May 2021.

[10] A. J. Ijspeert and M. A. Daley, “Integration of feedforward and feedback control in the neuromechanics of vertebrate locomotion: a review of experimental, simulation and robotic studies,” J. Exp. Biol., vol. 226, no. 15, Aug. 2023.

[11] T. McGeer, “Passive dynamic walking,” Int. J. Rob. Res., vol. 9, no. 2, pp. 62–82, 1990.

[12] S. Collins, A. Ruina, R. Tedrake, and M. Wisse, “Efficient bipedal robots based on passive-dynamic walkers,” Science, vol. 307, no. 5712, pp. 1082–1085, Feb. 2005.

[13] M. Srinivasan and A. Ruina, “Computer optimization of a minimal biped model discovers walking and running,” Nature, vol. 439, no. 7072, pp. 72–75, 2006.

[14] H. Geyer, A. Seyfarth, and R. Blickhan, “Compliant leg behaviour explains basic dynamics of walking and running,” Proceedings of the Royal Society B: Biological Sciences, vol. 273, no. 1603, pp. 2861–2867, 2006.

[15] M. Adachi, S. Aoi, T. Kamimura, K. Tsuchiya, and F. Matsuno, “Body torsional flexibility effects on stability during trotting and pacing based on a simple analytical model,” Bioinspir. Biomim., vol. 15, no. 5, p. 55001, 2020.

[16] T. Kamimura, K. Sato, D. Murayama, N. Kawase, and A. Sano, “Dynamical effect of elastically supported wobbling mass on biped running,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS, 2021, pp. 4048–4055.

[17] T. Kamimura and A. Sano, “Effect of the dynamics of a horizontally wobbling mass on biped walking performance,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, May 2023, pp. 12212–12217.

[18] R. Pfeifer, M. Lungarella, and F. Iida, “Self-organization, embodiment, and biologically inspired robotics,” Science, vol. 318, no. 5853, pp. 1088–1093, Nov. 2007.

[19] A. Fukuhara, M. Gunji, Y. Masuda, K. Tadakuma, and A. Ishiguro, “Flexible shoulder in quadruped animals and robots guiding science of soft robotics,” Journal of Robotics and Mechatronics, vol. 34, no. 2, pp. 304–309, 2022.

[20] A. Fukuhara, M. Gunji, and Y. Masuda, “Comparative anatomy of quadruped robots and animals: a review,” Adv. Robot., vol. 36, no. 13, pp. 612–630, 2022.

[21] A. Badri-Spröwitz, A. Aghamaleki Sarvestani, M. Sitti, and M. A. Daley, “BirdBot achieves energy-efficient gait with minimal control using avian-inspired leg clutching,” Sci Robot, vol. 7, no. 64, p. eabg4055, Mar. 2022.

[22] J. Tani, “Model-based learning for mobile robot navigation from the dynamical systems perspective,” IEEE Trans. Syst. Man Cybern. B Cybern., vol. 26, no. 3, pp. 421–436, 1996.

[23] A. M. Deshpande, E. Hurd, A. A. Minai, and M. Kumar, “DeepCPG policies for robot locomotion,” IEEE Transactions on Cognitive and Developmental Systems, vol. 15, no. 4, pp. 2108–2121, Dec. 2023.

[24] G. Li, A. Ijspeert, and M. Hayashibe, “AI-CPG: Adaptive imitated central pattern generators for bipedal locomotion learned through reinforced reflex neural networks,” IEEE Robotics and Automation Letters, vol. 9, no. 6, pp. 5190–5197, Jun. 2024.

[25] J. Wang, W. Qin, and L. Sun, “Terrain adaptive walking of biped neuromuscular virtual human using deep reinforcement learning,” IEEE Access, vol. 7, pp. 92465–92475, 2019.

[26] J. Weng, E. Hashemi, and A. Arami, “Natural walking with musculoskeletal models using deep reinforcement learning,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 4156–4162, Apr. 2021.

[27] B. N. Ogum, L. R. B. Schomaker, and R. Carloni, “Learning to walk with deep reinforcement learning: Forward dynamic simulation of a physics-based musculoskeletal model of an osseointegrated transfemoral amputee,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 32, pp. 431–441, Jan. 2024.

[28] S. Yamaguchi, R. Sato, and A. Ming, “Motion acquisition of vertical jumping by a bio-inspired legged robot via deep reinforcement learning,” in 2021 IEEE International Conference on Robotics and Biomimetics (ROBIO). IEEE, Dec. 2021, pp. 932–937.

[29] S. Koseki, K. Kutsuzawa, D. Owaki, and M. Hayashibe, “Multimodal bipedal locomotion generation with passive dynamics via deep reinforcement learning,” Front. Neurorobot., vol. 16, p. 1054239, Jan. 2023.

[30] F. Bjelonic, J. Lee, P. Arm, D. Sako, D. Tateo, J. Peters, and M. Hutter, “Learning-based design and control for quadrupedal robots with parallel-elastic actuators,” IEEE Robotics and Automation Letters, vol. 8, no. 3, pp. 1611–1618, Mar. 2023.

[31] Y. Ikemata, A. Sano, and H. Fujimoto, “A physical principle of gait generation and its stabilization derived from mechanism of fixed point,” in Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006. IEEE, 2006, pp. 836–841.

[32] H. Miyamoto, A. Sano, Y. Ikemata, S. Maruyama, and H. Fujimoto, “A study of bouncing rod dynamics aiming at passive running,” in IEEE International Conference on Robotics and Automation. IEEE, 2010, pp. 3298–3303.

[33] Y. Sakurai, T. Kamimura, Y. Sakamoto, S. Nishii, K. Sato, Y. Fujiwara, and A. Sano, “Bipedal robot running: human-like actuation timing using fast and slow adaptations,” Adv. Robot., vol. 38, no. 8, pp. 561–572, Apr. 2024.

[34] A. S. Anand, G. Zhao, H. Roth, and A. Seyfarth, “A deep reinforcement learning based approach towards generating human walking behavior with a neuromuscular model,” in 2019 IEEE-RAS 19th International Conference on Humanoid Robots (Humanoids). IEEE, Oct. 2019, pp. 537–543.

[35] I. Wochner, P. Schumacher, G. Martius, D. Büchler, S. Schmitt, and D. F. B. Haeufle, “Learning with muscles: Benefits for data-efficiency and robustness in anthropomorphic tasks,” in 6th Conference on Robot Learning (CoRL 2022), Jul. 2022.

[36] D. Ha and J. Schmidhuber, “Recurrent world models facilitate policy evolution,” in 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), 2018.

[37] D. Hafner, T. Lillicrap, M. Norouzi, and J. Ba, “Mastering Atari with discrete world models,” ICLR 2021 - 9th International Conference on Learning Representations, Oct. 2020.

[38] P. Wu, A. Escontrela, D. Hafner, K. Goldberg, and P. Abbeel, “DayDreamer: World models for physical robot learning,” Proceedings of Machine Learning Research, vol. 205, pp. 2226–2240, Jun. 2022.

[39] L. McInnes, J. Healy, N. Saul, and L. Großberger, “UMAP: Uniform manifold approximation and projection,” J. Open Source Softw., vol. 3, no. 29, p. 861, Sep. 2018.

[40] C. A. Fukuchi, R. K. Fukuchi, and M. Duarte, “A public dataset of overground and treadmill walking kinematics and kinetics in healthy individuals,” PeerJ, vol. 6, p. e4640, Apr. 2018.

[41] R. K. Fukuchi, C. A. Fukuchi, and M. Duarte, “A public dataset of running biomechanics and the effects of running speed on lower extremity kinematics and kinetics,” PeerJ, vol. 5, p. e3298, May 2017.

[1] [1]

Learning agile and dynamic motor skills for legged robots,

J. Hwangbo, J. Lee, A. Dosovitskiy, D. Bellicoso, V. Tsounis, V. Koltun, and M. Hutter, “Learning agile and dynamic motor skills for legged robots,”Sci Robot, vol. 4, no. 26, Jan. 2019

work page 2019

[2] [2]

A survey of embodied AI: From simulators to research tasks,

J. Duan, S. Yu, H. L. Tan, H. Zhu, and C. Tan, “A survey of embodied AI: From simulators to research tasks,”IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 6, no. 2, pp. 230–244, Apr. 2022

work page 2022

[3] [3]

Mechanical intelligence simplifies control in terrestrial limbless locomotion,

T. Wang, C. Pierce, V. Kojouharov, B. Chong, K. Diaz, H. Lu, and D. I. Goldman, “Mechanical intelligence simplifies control in terrestrial limbless locomotion,”Sci Robot, vol. 8, no. 85, p. eadi2243, Dec. 2023

work page 2023

[4] [4]

ANYmal parkour: Learning agile navigation for quadrupedal robots,

D. Hoeller, N. Rudin, D. Sako, and M. Hutter, “ANYmal parkour: Learning agile navigation for quadrupedal robots,”Sci Robot, vol. 9, no. 88, p. eadi7566, Mar. 2024

work page 2024

[5] [5]

Real-world humanoid locomotion with reinforcement learning,

I. Radosavovic, T. Xiao, B. Zhang, T. Darrell, J. Malik, and K. Sreenath, “Real-world humanoid locomotion with reinforcement learning,”Sci Robot, vol. 9, no. 89, p. eadi9579, Apr. 2024

work page 2024

[6] [6]

Learning agile soccer skills for a bipedal robot with deep reinforcement learning,

T. Haarnoja, B. Moran, G. Lever, S. H. Huang, D. Tirumala, J. Humplik, M. Wulfmeier, S. Tunyasuvunakool, N. Y. Siegel, R. Hafner, M. Bloesch, K. Har- tikainen, A. Byravan, L. Hasenclever, Y. Tassa, F. Sadeghi, N. Batchelor, F. Casarini, S. Saliceti, C. Game, N. Sreendra, K. Patel, M. Gwira, A. Huber, 17 N. Hurley, F. Nori, R. Hadsell, and N. Heess, “Learn...

work page 2024

[7] [7]

Adaptive control strategies for interlimb coordination in legged robots: A review,

S. Aoi, P. Manoonpong, Y. Ambe, F. Matsuno, and F. W¨ org¨ otter, “Adaptive control strategies for interlimb coordination in legged robots: A review,”Front. Neurorobot., vol. 11, p. 39, 2017

work page 2017

[8] [8]

Simple decentralized control mechanism that enables limb ad- justment for adaptive quadruped running,

A. Fukuhara, Y. Koizumi, T. Baba, S. Suzuki, T. Kano, A. Ishiguro, M. Daley, and A. Ijspeert, “Simple decentralized control mechanism that enables limb ad- justment for adaptive quadruped running,”Proceedings of the Royal Society B: Biological Sciences, vol. 288, p. 20211622, 2021

work page 2021

[9] [9]

Tegotae-based control pro- duces adaptive inter- and intra-limb coordination in bipedal walking,

D. Owaki, S. Y. Horikiri, J. Nishii, and A. Ishiguro, “Tegotae-based control pro- duces adaptive inter- and intra-limb coordination in bipedal walking,”Front. Neurorobot., vol. 15, 629595, May 2021

work page 2021

[10] [10]

Integration of feedforward and feedback con- trol in the neuromechanics of vertebrate locomotion: a review of experimental, simulation and robotic studies,

A. J. Ijspeert and M. A. Daley, “Integration of feedforward and feedback con- trol in the neuromechanics of vertebrate locomotion: a review of experimental, simulation and robotic studies,”J. Exp. Biol., vol. 226, no. 15, Aug. 2023

work page 2023

[11] T. McGeer, “Passive dynamic walking,” Int. J. Rob. Res., vol. 9, no. 2, pp. 62–82, 1990.

[12] S. Collins, A. Ruina, R. Tedrake, and M. Wisse, “Efficient bipedal robots based on passive-dynamic walkers,” Science, vol. 307, no. 5712, pp. 1082–1085, Feb. 2005.

[13] M. Srinivasan and A. Ruina, “Computer optimization of a minimal biped model discovers walking and running,” Nature, vol. 439, no. 7072, pp. 72–75, 2006.

[14] H. Geyer, A. Seyfarth, and R. Blickhan, “Compliant leg behaviour explains basic dynamics of walking and running,” Proceedings of the Royal Society B: Biological Sciences, vol. 273, no. 1603, pp. 2861–2867, 2006.

[15] M. Adachi, S. Aoi, T. Kamimura, K. Tsuchiya, and F. Matsuno, “Body torsional flexibility effects on stability during trotting and pacing based on a simple analytical model,” Bioinspir. Biomim., vol. 15, no. 5, p. 55001, 2020.

[16] T. Kamimura, K. Sato, D. Murayama, N. Kawase, and A. Sano, “Dynamical effect of elastically supported wobbling mass on biped running,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021, pp. 4048–4055.

[17] T. Kamimura and A. Sano, “Effect of the dynamics of a horizontally wobbling mass on biped walking performance,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, May 2023, pp. 12212–12217.

[18] R. Pfeifer, M. Lungarella, and F. Iida, “Self-organization, embodiment, and biologically inspired robotics,” Science, vol. 318, no. 5853, pp. 1088–1093, Nov. 2007.

[19] A. Fukuhara, M. Gunji, Y. Masuda, K. Tadakuma, and A. Ishiguro, “Flexible shoulder in quadruped animals and robots guiding science of soft robotics,” Journal of Robotics and Mechatronics, vol. 34, no. 2, pp. 304–309, 2022.

[20] A. Fukuhara, M. Gunji, and Y. Masuda, “Comparative anatomy of quadruped robots and animals: a review,” Adv. Robot., vol. 36, no. 13, pp. 612–630, 2022.

[21] A. Badri-Spröwitz, A. Aghamaleki Sarvestani, M. Sitti, and M. A. Daley, “BirdBot achieves energy-efficient gait with minimal control using avian-inspired leg clutching,” Sci. Robot., vol. 7, no. 64, p. eabg4055, Mar. 2022.

[22] J. Tani, “Model-based learning for mobile robot navigation from the dynamical systems perspective,” IEEE Trans. Syst. Man Cybern. B Cybern., vol. 26, no. 3, pp. 421–436, 1996.

[23] A. M. Deshpande, E. Hurd, A. A. Minai, and M. Kumar, “DeepCPG policies for robot locomotion,” IEEE Transactions on Cognitive and Developmental Systems, vol. 15, no. 4, pp. 2108–2121, Dec. 2023.

[24] G. Li, A. Ijspeert, and M. Hayashibe, “AI-CPG: Adaptive imitated central pattern generators for bipedal locomotion learned through reinforced reflex neural networks,” IEEE Robotics and Automation Letters, vol. 9, no. 6, pp. 5190–5197, Jun. 2024.

[25] J. Wang, W. Qin, and L. Sun, “Terrain adaptive walking of biped neuromuscular virtual human using deep reinforcement learning,” IEEE Access, vol. 7, pp. 92465–92475, 2019.

[26] J. Weng, E. Hashemi, and A. Arami, “Natural walking with musculoskeletal models using deep reinforcement learning,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 4156–4162, Apr. 2021.

[27] B. N. Ogum, L. R. B. Schomaker, and R. Carloni, “Learning to walk with deep reinforcement learning: Forward dynamic simulation of a physics-based musculoskeletal model of an osseointegrated transfemoral amputee,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 32, pp. 431–441, Jan. 2024.

[28] S. Yamaguchi, R. Sato, and A. Ming, “Motion acquisition of vertical jumping by a bio-inspired legged robot via deep reinforcement learning,” in 2021 IEEE International Conference on Robotics and Biomimetics (ROBIO). IEEE, Dec. 2021, pp. 932–937.

[29] S. Koseki, K. Kutsuzawa, D. Owaki, and M. Hayashibe, “Multimodal bipedal locomotion generation with passive dynamics via deep reinforcement learning,” Front. Neurorobot., vol. 16, p. 1054239, Jan. 2023.

[30] F. Bjelonic, J. Lee, P. Arm, D. Sako, D. Tateo, J. Peters, and M. Hutter, “Learning-based design and control for quadrupedal robots with parallel-elastic actuators,” IEEE Robotics and Automation Letters, vol. 8, no. 3, pp. 1611–1618, Mar. 2023.

[31] Y. Ikemata, A. Sano, and H. Fujimoto, “A physical principle of gait generation and its stabilization derived from mechanism of fixed point,” in Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006. IEEE, 2006, pp. 836–841.

[32] H. Miyamoto, A. Sano, Y. Ikemata, S. Maruyama, and H. Fujimoto, “A study of bouncing rod dynamics aiming at passive running,” in IEEE International Conference on Robotics and Automation. IEEE, 2010, pp. 3298–3303.

[33] Y. Sakurai, T. Kamimura, Y. Sakamoto, S. Nishii, K. Sato, Y. Fujiwara, and A. Sano, “Bipedal robot running: human-like actuation timing using fast and slow adaptations,” Adv. Robot., vol. 38, no. 8, pp. 561–572, Apr. 2024.

[34] A. S. Anand, G. Zhao, H. Roth, and A. Seyfarth, “A deep reinforcement learning based approach towards generating human walking behavior with a neuromuscular model,” in 2019 IEEE-RAS 19th International Conference on Humanoid Robots (Humanoids). IEEE, Oct. 2019, pp. 537–543.

[35] I. Wochner, P. Schumacher, G. Martius, D. Büchler, S. Schmitt, and D. F. B. Haeufle, “Learning with muscles: Benefits for data-efficiency and robustness in anthropomorphic tasks,” in 6th Conference on Robot Learning (CoRL 2022), Jul. 2022.

[36] D. Ha and J. Schmidhuber, “Recurrent world models facilitate policy evolution,” in 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), 2018.

[37] D. Hafner, T. Lillicrap, M. Norouzi, and J. Ba, “Mastering Atari with discrete world models,” ICLR 2021 - 9th International Conference on Learning Representations, Oct. 2020.

[38] P. Wu, A. Escontrela, D. Hafner, K. Goldberg, and P. Abbeel, “DayDreamer: World models for physical robot learning,” Proceedings of Machine Learning Research, vol. 205, pp. 2226–2240, Jun. 2022.

[39] L. McInnes, J. Healy, N. Saul, and L. Großberger, “UMAP: Uniform manifold approximation and projection,” J. Open Source Softw., vol. 3, no. 29, p. 861, Sep. 2018.

[40] C. A. Fukuchi, R. K. Fukuchi, and M. Duarte, “A public dataset of overground and treadmill walking kinematics and kinetics in healthy individuals,” PeerJ, vol. 6, p. e4640, Apr. 2018.

[41] R. K. Fukuchi, C. A. Fukuchi, and M. Duarte, “A public dataset of running biomechanics and the effects of running speed on lower extremity kinematics and kinetics,” PeerJ, vol. 5, p. e3298, May 2017.