pith. machine review for the scientific record.

arxiv: 2107.04034 · v1 · submitted 2021-07-08 · 💻 cs.LG · cs.AI · cs.CV · cs.RO

Recognition: 2 theorem links


RMA: Rapid Motor Adaptation for Legged Robots

Authors on Pith no claims yet

Pith reviewed 2026-05-16 13:27 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CV · cs.RO
keywords rapid motor adaptation · legged robots · quadruped locomotion · sim-to-real transfer · online adaptation · reinforcement learning · robot control

The pith

A two-part algorithm lets quadruped robots adapt motor control to new terrains and dynamics in fractions of a second.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Rapid Motor Adaptation (RMA) to solve real-time adaptation for legged robots facing unseen conditions such as changing surfaces, payloads, or wear. It trains a base policy and an adaptation module entirely in simulation on a varied terrain generator, using bioenergetics-inspired rewards and no domain knowledge such as reference trajectories or predefined foot trajectory generators. The trained system deploys directly on the physical A1 robot and adjusts its actions on the fly. A reader would care because legged robots must handle unpredictable environments immediately, yet most current approaches either require slow real-world retraining or rely on hand-crafted components that limit flexibility.

Core claim

RMA consists of a base policy that maps robot states to actions and an adaptation module that processes a short history of states and actions to produce a latent representation of current terrain or dynamics. The combination is trained end-to-end in simulation and enables the robot to adapt to novel situations in fractions of a second. When deployed zero-shot on the A1 quadruped, the system achieves state-of-the-art performance across rocky, slippery, deformable, and vegetated surfaces in both simulation and real-world tests.

What carries the argument

The adaptation module, which infers a latent vector of environmental changes from recent state-action history and feeds it to the base policy for immediate adjustment.
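The two-component control loop described above can be sketched in a few lines. This is a minimal sketch with numpy, not the paper's implementation: the dimensions, the linear stand-in networks, and the history length are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- the paper's actual sizes are not given in this review.
STATE_DIM, ACTION_DIM, LATENT_DIM, HISTORY = 30, 12, 8, 50

# Stand-ins for the two trained networks: fixed random linear maps, not learned weights.
W_adapt = rng.standard_normal((LATENT_DIM, HISTORY * (STATE_DIM + ACTION_DIM))) * 0.01
W_base = rng.standard_normal((ACTION_DIM, STATE_DIM + LATENT_DIM)) * 0.01

def adaptation_module(history):
    """Infer a latent environment vector from a rolling window of (state, action) pairs."""
    flat = np.concatenate([np.concatenate(pair) for pair in history])
    return np.tanh(W_adapt @ flat)

def base_policy(state, latent):
    """Map the current state plus the inferred latent to an action."""
    return np.tanh(W_base @ np.concatenate([state, latent]))

# One control step: the latent is re-estimated from recent history every step,
# so a change in terrain or dynamics shows up in the very next action.
history = [(rng.standard_normal(STATE_DIM), np.zeros(ACTION_DIM))
           for _ in range(HISTORY)]
state = rng.standard_normal(STATE_DIM)
z = adaptation_module(history)
action = base_policy(state, z)
assert z.shape == (LATENT_DIM,) and action.shape == (ACTION_DIM,)
```

The point of the structure is visible even in this toy form: the base policy never sees privileged environment parameters, only the latent that the adaptation module recovers from proprioceptive history.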

If this is right

  • The robot adapts its motor commands in fractions of a second to previously unseen conditions.
  • Training requires no reference trajectories or predefined foot trajectory generators.
  • Direct sim-to-real deployment works without any real-world fine-tuning or domain knowledge.
  • State-of-the-art performance holds on both simulated and physical experiments with diverse terrains including stairs, sand, grass, and pebbles.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same base-plus-adaptation structure could be tested on other robot morphologies if the latent inference generalizes beyond quadrupeds.
  • Performance would likely degrade on conditions that the simulation generator never sampled, pointing to the value of expanding terrain randomization.
  • Pairing the adaptation module with onboard perception could reduce reliance on proprioception alone for even harder unseen environments.

Load-bearing premise

The distribution of terrains and dynamics in the simulation's varied terrain generator is representative enough of real-world conditions that the adaptation module transfers without further training.

What would settle it

Placing the robot on a real surface whose properties fall clearly outside the simulation generator's range, such as an extreme mud or ice patch never approximated in training, and checking whether rapid successful adaptation still occurs.
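The decisive test above amounts to a coverage check: estimate the physical parameters of the real surface and ask whether they fall inside the intervals the terrain generator ever sampled. A minimal sketch, assuming hypothetical randomization ranges and parameter names (the paper's actual ranges are not reported in this review):

```python
# Hypothetical simulation randomization ranges and a real-surface estimate.
SIM_RANGES = {"friction": (0.4, 1.25), "compliance": (0.0, 0.1)}

def out_of_distribution(estimate, ranges):
    """Return the parameters whose estimated real-world value falls outside
    the interval the simulated terrain generator sampled during training."""
    return [name for name, (lo, hi) in ranges.items()
            if not lo <= estimate[name] <= hi]

# An ice patch with friction far below anything sampled in simulation.
ice_patch = {"friction": 0.05, "compliance": 0.0}
assert out_of_distribution(ice_patch, SIM_RANGES) == ["friction"]
```

Rapid, successful adaptation on a surface this check flags would be evidence that the adaptation module extrapolates rather than merely interpolates within the training distribution.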

Original abstract

Successful real-world deployment of legged robots would require them to adapt in real-time to unseen scenarios like changing terrains, changing payloads, wear and tear. This paper presents Rapid Motor Adaptation (RMA) algorithm to solve this problem of real-time online adaptation in quadruped robots. RMA consists of two components: a base policy and an adaptation module. The combination of these components enables the robot to adapt to novel situations in fractions of a second. RMA is trained completely in simulation without using any domain knowledge like reference trajectories or predefined foot trajectory generators and is deployed on the A1 robot without any fine-tuning. We train RMA on a varied terrain generator using bioenergetics-inspired rewards and deploy it on a variety of difficult terrains including rocky, slippery, deformable surfaces in environments with grass, long vegetation, concrete, pebbles, stairs, sand, etc. RMA shows state-of-the-art performance across diverse real-world as well as simulation experiments. Video results at https://ashish-kmr.github.io/rma-legged-robots/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents Rapid Motor Adaptation (RMA) for quadruped robots, consisting of a base policy and an adaptation module. RMA is trained entirely in simulation using a varied terrain generator and bioenergetics-inspired rewards, with no reference trajectories or foot-trajectory generators. The central claim is zero-shot real-world deployment on the Unitree A1 robot, enabling adaptation to novel terrains (rocky, slippery, deformable, grass, sand, stairs) in fractions of a second and achieving state-of-the-art performance across simulation and real-world experiments.

Significance. If the adaptation module generalizes as claimed, the result would be significant for legged-robot control: it demonstrates practical, real-time online adaptation to changing dynamics and payloads without fine-tuning or domain-specific knowledge. The empirical focus on diverse real-world surfaces and the use of bioenergetics-inspired rewards are strengths that could influence future sim-to-real pipelines.

major comments (2)
  1. [§3.2 and §4] §3.2 (Varied Terrain Generator) and §4 (Experiments): the zero-shot generalization claim requires that the distribution of simulated terrains and dynamics overlaps with real-world test conditions (friction, compliance, contact parameters). No quantitative overlap metric, parameter sensitivity sweep, or out-of-distribution detection result is reported; without this, it is impossible to rule out that successful real-robot trials reflect post-hoc terrain selection rather than true rapid adaptation.
  2. [§4.2] §4.2 (Real-world results): the reported adaptation time of 'fractions of a second' is central to the contribution, yet the manuscript provides no latency breakdown (encoder inference + policy execution) or ablation showing that the adaptation module, rather than the base policy alone, is responsible for the observed robustness on slippery and deformable surfaces.
minor comments (2)
  1. [§3.1] The reward function definition in §3.1 uses several bioenergetics-inspired terms whose relative weighting is stated but not justified by an ablation; a short sensitivity table would improve reproducibility.
  2. [Figure 3] Figure 3 (real-robot snapshots) would benefit from explicit labeling of terrain type and quantitative metrics (e.g., success rate, traversal time) directly on the figure or in an accompanying table.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and indicate the revisions we will incorporate.

Point-by-point responses
  1. Referee: [§3.2 and §4] §3.2 (Varied Terrain Generator) and §4 (Experiments): the zero-shot generalization claim requires that the distribution of simulated terrains and dynamics overlaps with real-world test conditions (friction, compliance, contact parameters). No quantitative overlap metric, parameter sensitivity sweep, or out-of-distribution detection result is reported; without this, it is impossible to rule out that successful real-robot trials reflect post-hoc terrain selection rather than true rapid adaptation.

    Authors: We agree that a quantitative overlap analysis would strengthen the zero-shot claim. In the revision we will add a parameter sensitivity sweep over friction, compliance, and contact stiffness, reporting the simulation ranges against values estimated from the real-world test surfaces. We will also clarify the design rationale of the varied terrain generator to show it was constructed to cover the diversity of conditions later tested on hardware. While the breadth of successful real-world deployments (rocky, slippery, sand, stairs) supports generalization beyond post-hoc selection, we acknowledge the referee's point and will include the requested metrics. revision: partial

  2. Referee: [§4.2] §4.2 (Real-world results): the reported adaptation time of 'fractions of a second' is central to the contribution, yet the manuscript provides no latency breakdown (encoder inference + policy execution) or ablation showing that the adaptation module, rather than the base policy alone, is responsible for the observed robustness on slippery and deformable surfaces.

    Authors: We will revise §4.2 to include a hardware latency breakdown separating adaptation-module encoder inference from base-policy execution. We will also add an ablation that directly compares the base policy alone against the full RMA system on the same real-world slippery and deformable surfaces, quantifying the improvement attributable to the adaptation module. revision: yes
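The latency breakdown the referee asks for is cheap to instrument. A minimal sketch, assuming stand-in linear networks of hypothetical size in place of the real adaptation encoder and base policy:

```python
import time
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a larger encoder over the state-action history
# and a small base policy over state plus latent. Sizes are assumptions.
encoder = rng.standard_normal((8, 2100))
policy = rng.standard_normal((12, 38))

def mean_latency(fn, arg, reps=1000):
    """Average wall-clock time per call over `reps` repetitions."""
    t0 = time.perf_counter()
    for _ in range(reps):
        fn(arg)
    return (time.perf_counter() - t0) / reps

hist = rng.standard_normal(2100)
obs = rng.standard_normal(38)
t_enc = mean_latency(lambda h: np.tanh(encoder @ h), hist)
t_pol = mean_latency(lambda o: np.tanh(policy @ o), obs)
print(f"encoder: {t_enc * 1e6:.1f} us/call, policy: {t_pol * 1e6:.1f} us/call")
```

Reporting these two numbers separately, on the robot's onboard compute, is what would substantiate the fractions-of-a-second claim against the control-loop rate.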

Circularity Check

0 steps flagged

No circularity: empirical training and external validation

Full rationale

The paper trains a base policy and adaptation module end-to-end in simulation using a terrain generator and bioenergetics rewards, then measures zero-shot real-robot performance on external test terrains. No equations reduce reported adaptation speed or success rates back to fitted parameters by construction, and no load-bearing self-citations or uniqueness theorems are invoked. The derivation chain consists of standard RL components whose outputs are falsifiable against real hardware.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that simulation-trained policies plus a learned adapter can cover the distribution shift to real terrains without explicit modeling of contact dynamics or sensor noise; no new physical entities are introduced.

free parameters (2)
  • adaptation module output dimension and update rate
    Chosen to allow rapid adjustment; value not stated in abstract but required for the fractions-of-a-second claim.
  • bioenergetics-inspired reward weights
    Multiple scalar weights used to shape the base policy training; these are fitted or tuned to produce stable gaits.
axioms (1)
  • domain assumption Simulation dynamics are close enough to reality that zero-shot transfer is possible after adaptation
    Invoked when claiming deployment on A1 without fine-tuning.

pith-pipeline@v0.9.0 · 5491 in / 1355 out tokens · 41931 ms · 2026-05-16T13:27:23.460384+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 18 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Neuromorphic Reinforcement Learning for Quadruped Locomotion Control on Uneven Terrain

    cs.NE 2026-05 unverdicted novelty 7.0

    An equilibrium-propagation-based PPO controller for a 12-DoF quadruped achieves locomotion performance comparable to backpropagation-trained PPO on uneven terrain while using 4.3 times less GPU memory.

  2. Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift

    cs.LG 2026-05 unverdicted novelty 7.0

    SeqRejectron builds a stopping rule from a small set of validator policies to achieve horizon-free sample-complexity guarantees for selective imitation learning under arbitrary train-test dynamics shifts.

  3. Mitigating Error Accumulation in Continuous Navigation via Memory-Augmented Kalman Filtering

    cs.RO 2026-01 unverdicted novelty 7.0

    NeuroKalman mitigates state drift in vision-language UAV navigation by using memory-augmented Kalman filtering where attention retrieves historical anchors to correct predictions without gradient updates.

  4. Offline Reinforcement Learning for Rotation Profile Control in Tokamaks

    cs.LG 2026-05 unverdicted novelty 6.0

    Offline RL policies trained solely on DIII-D historical data were deployed on the tokamak and produced promising real-world control of the plasma rotation profile.

  5. SixthSense: Task-Agnostic Proprioception-Only Whole-Body Wrench Estimation for Humanoids

    cs.RO 2026-05 unverdicted novelty 6.0

    SixthSense infers whole-body contact events and wrenches in humanoids from proprioception and IMU data alone by tokenizing histories and estimating a sparse contact-event flow with conditional flow matching.

  6. GS-Playground: A High-Throughput Photorealistic Simulator for Vision-Informed Robot Learning

    cs.RO 2026-04 unverdicted novelty 6.0

    GS-Playground delivers a high-throughput photorealistic simulator for vision-informed robot learning via parallel physics integrated with batch 3D Gaussian Splatting at 10^4 FPS and an automated Real2Sim workflow for ...

  7. Wiggle and Go! System Identification for Zero-Shot Dynamic Rope Manipulation

    cs.RO 2026-04 unverdicted novelty 6.0

    Wiggle and Go! uses system identification from rope motion observations to predict parameters that enable zero-shot goal-conditioned dynamic manipulation, achieving 3.55 cm accuracy on 3D target striking versus 15.34 ...

  8. Abstract Sim2Real through Approximate Information States

    cs.RO 2026-04 unverdicted novelty 6.0

    Abstract simulators can be grounded to real tasks by making their dynamics history-dependent and correcting them with real data, enabling RL policy transfer.

  9. FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control

    cs.LG 2026-04 unverdicted novelty 6.0

    FlashSAC scales up Soft Actor-Critic with fewer updates, larger models, higher data throughput, and norm bounds to deliver faster, more stable training than PPO on high-dimensional robot control tasks across dozens of...

  10. Sim2Real-AD: A Modular Sim-to-Real Framework for Deploying VLM-Guided Reinforcement Learning in Real-World Autonomous Driving

    cs.RO 2026-04 unverdicted novelty 6.0

    Sim2Real-AD enables zero-shot transfer of CARLA-trained VLM-guided RL policies to full-scale vehicles, reporting 75-90% success rates in car-following, obstacle avoidance, and stop-sign scenarios without real-world RL...

  11. Learning Task-Invariant Properties via Dreamer: Enabling Efficient Policy Transfer for Quadruped Robots

    cs.RO 2026-04 unverdicted novelty 6.0

    DreamTIP adds LLM-identified task-invariant properties as auxiliary targets in Dreamer's world model plus a mixed-replay adaptation step, delivering 28.1% average simulated transfer gains and 100% real-world climb suc...

  12. Simulation Distillation: Pretraining World Models in Simulation for Rapid Real-World Adaptation

    cs.RO 2026-03 unverdicted novelty 6.0

    SimDist pretrains world models in simulation and adapts them to real-world robots by updating only the latent dynamics model, enabling rapid improvement on contact-rich tasks where prior methods fail.

  13. PTLD: Sim-to-real Privileged Tactile Latent Distillation for Dexterous Manipulation

    cs.RO 2026-03 unverdicted novelty 6.0

    PTLD distills real privileged tactile data into a state estimator to boost sim-to-real performance of proprioceptive dexterous manipulation policies, yielding 182% improvement on in-hand rotation and 57% on reorientat...

  14. Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching

    cs.RO 2026-02 unverdicted novelty 6.0

    A modular system uses motion matching to compose long-horizon human skill chains, trains RL experts, and distills them into a depth-based policy that lets a Unitree G1 humanoid autonomously climb, vault, and roll over...

  15. MUJICA: Multi-skill Unified Joint Integration of Control Architecture for Wheeled-Legged Robots

    cs.RO 2026-05 unverdicted novelty 5.0

    A single reinforcement learning policy jointly trains multiple locomotion skills for wheeled-legged robots with DC-motor constraints and learns a proprioceptive skill selector for adaptive behavior.

  16. Agentic AIs Are the Missing Paradigm for Out-of-Distribution Generalization in Foundation Models

    cs.LG 2026-05 unverdicted novelty 5.0

    Agentic AI systems are required to overcome the parameter coverage ceiling that prevents foundation models from handling certain out-of-distribution cases.

  17. UniCon: A Unified System for Efficient Robot Learning Transfers

    cs.RO 2026-01 unverdicted novelty 5.0

    UniCon standardizes states and control logic into modular execution graphs for efficient transfer of learning controllers across heterogeneous robots, with lower latency than ROS.

  18. Quadruped Parkour Learning: Sparsely Gated Mixture of Experts with Visual Input

    cs.RO 2026-04 unverdicted novelty 4.0

    Sparsely gated MoE policies double the success rate of a real Unitree Go2 quadruped on large-obstacle parkour versus matched-active-parameter MLP baselines while cutting inference time compared with a scaled-up MLP.

Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages · cited by 18 Pith papers · 1 internal anchor

  1. [1]

    Surrogate-based aerodynamic design optimization: Use of surrogates in aerodynamic design optimization

    MYM Ahmed and N Qin. Surrogate-based aerodynamic design optimization: Use of surrogates in aerodynamic design optimization. In International Conference on Aerospace Sciences and Aviation Technology , 2009. 3

  2. [2]

    Rapidly exponentially stabilizing control lyapunov functions and hybrid zero dynamics

    Aaron D Ames, Kevin Galloway, Koushil Sreenath, and Jessy W Grizzle. Rapidly exponentially stabilizing control lyapunov functions and hybrid zero dynamics. IEEE Transactions on Automatic Control , 2014. 1, 3

  3. [3]

    Fast online trajectory optimization for the bipedal robot cassie

    Taylor Apgar, Patrick Clary, Kevin Green, Alan Fern, and Jonathan W Hurst. Fast online trajectory optimization for the bipedal robot cassie. In Robotics: Science and Systems, 2018. 3

  4. [4]

    Design and Control of Small Legged Robots

    Monica Barragan, Nikolai Flowers, and Aaron M. Johnson. MiniRHex: A small, open-source, fully programmable walking hexapod. In Robotics: Science and Systems Workshop on “Design and Control of Small Legged Robots”, 2018. 3

  5. [5]

    Mit chee- tah 3: Design and control of a robust, dynamic quadruped robot

    Gerardo Bledt, Matthew J Powell, Benjamin Katz, Jared Di Carlo, Patrick M Wensing, and Sangbae Kim. Mit chee- tah 3: Design and control of a robust, dynamic quadruped robot. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) . IEEE, 2018. 3

  6. [6]

    Leveraging multiple simulators for crossing the reality gap

    Adrian Boeing and Thomas Br ¨aunl. Leveraging multiple simulators for crossing the reality gap. In 2012 12th International Conference on Control Automation Robotics & Vision (ICARCV) . IEEE, 2012. 3

  7. [7]

    Nonlinear system identification using coevolution of models and tests

    Josh C Bongard and Hod Lipson. Nonlinear system identification using coevolution of models and tests. IEEE Transactions on Evolutionary Computation , 2005. 3

  8. [8]

    Bayesian optimization for learning gaits under uncertainty

    Roberto Calandra, Andr ´e Seyfarth, Jan Peters, and Marc Peter Deisenroth. Bayesian optimization for learning gaits under uncertainty. Annals of Mathematics and Artificial Intelligence, 2016. 3

  9. [9]

    Optimizing simulations with noise-tolerant structured exploration

    Krzysztof Choromanski, Atil Iscen, Vikas Sindhwani, Jie Tan, and Erwin Coumans. Optimizing simulations with noise-tolerant structured exploration. In 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018. 3

  10. [10]

    Fearing, Pieter Abbeel, Sergey Levine, and Chelsea Finn

    Ignasi Clavera, Anusha Nagabandi, Simin Liu, Ronald S. Fearing, Pieter Abbeel, Sergey Levine, and Chelsea Finn. Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In International Conference on Learning Representations , 2019. 4

  11. [11]

    Feature-based locomotion controllers

    Martin De Lasa, Igor Mordatch, and Aaron Hertzmann. Feature-based locomotion controllers. ACM Transactions on Graphics (TOG) , 2010. 3

  12. [12]

    Dynamic locomotion in the mit cheetah 3 through convex model-predictive control

    Jared Di Carlo, Patrick M Wensing, Benjamin Katz, Gerardo Bledt, and Sangbae Kim. Dynamic locomotion in the mit cheetah 3 through convex model-predictive control. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) . IEEE, 2018. 3

  13. [13]

    Model- agnostic meta-learning for fast adaptation of deep net- works

    Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model- agnostic meta-learning for fast adaptation of deep net- works. In International Conference on Machine Learning . PMLR, 2017. 4

  14. [14]

    Address- ing function approximation error in actor-critic methods

    Scott Fujimoto, Herke Hoof, and David Meger. Address- ing function approximation error in actor-critic methods. In International Conference on Machine Learning. PMLR,

  15. [15]

    Practice makes perfect: An optimization-based approach to controlling agile motions for a quadruped robot

    Christian Gehring, Stelian Coros, Marco Hutter, Carmine Dario Bellicoso, Huub Heijnen, Remo Diethelm, Michael Bloesch, P ´eter Fankhauser, Jemin Hwangbo, Mark Hoepflinger, et al. Practice makes perfect: An optimization-based approach to controlling agile motions for a quadruped robot. IEEE Robotics & Automation Magazine, 2016. 3

  16. [16]

    Positive force feedback in bouncing gaits? Proceedings of the Royal Society of London

    Hartmut Geyer, Andre Seyfarth, and Reinhard Blickhan. Positive force feedback in bouncing gaits? Proceedings of the Royal Society of London. Series B: Biological Sciences, 2003. 1, 3

  17. [17]

    Convolutional neural networks for steady flow approximation

    Xiaoxiao Guo, Wei Li, and Francesco Iorio. Convolutional neural networks for steady flow approximation. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, 2016. 3

  18. [18]

    Learning to walk via deep reinforcement learning

    Tuomas Haarnoja, Sehoon Ha, Aurick Zhou, Jie Tan, George Tucker, and Sergey Levine. Learning to walk via deep reinforcement learning. In Robotics: Science and Systems, 2019. 1, 3

  19. [19]

    Grounded action transfor- mation for robot learning in simulation

    Josiah Hanna and Peter Stone. Grounded action transfor- mation for robot learning in simulation. In Proceedings of the AAAI Conference on Artificial Intelligence , 2017. 4

  20. [20]

    Anymal-a highly mobile and dynamic quadrupedal robot

    Marco Hutter, Christian Gehring, Dominic Jud, Andreas Lauber, C Dario Bellicoso, Vassilios Tsounis, Jemin Hwangbo, Karen Bodie, Peter Fankhauser, Michael Bloesch, et al. Anymal-a highly mobile and dynamic quadrupedal robot. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) . IEEE, 2016. 3

  21. [21]

    RaisimGymTorch

    Jemin Hwangbo. RaisimGymTorch. https://raisim.com/ sections/RaisimGymTorch.html, 2020-2021. 12

  22. [22]

    Per- contact iteration method for solving contact dynamics

    Jemin Hwangbo, Joonho Lee, and Marco Hutter. Per- contact iteration method for solving contact dynamics. IEEE Robotics and Automation Letters , 2018. URL www. raisim.com. 5

  23. [23]

    Learning agile and dynamic motor skills for legged robots

    Jemin Hwangbo, Joonho Lee, Alexey Dosovitskiy, Dario Bellicoso, Vassilios Tsounis, Vladlen Koltun, and Marco Hutter. Learning agile and dynamic motor skills for legged robots. Science Robotics, 2019. 1, 3, 4, 8, 12

  24. [24]

    Implementation of trot-to-gallop transition and subsequent gallop on the mit cheetah i

    Dong Jin Hyun, Jongwoo Lee, SangIn Park, and Sangbae Kim. Implementation of trot-to-gallop transition and subsequent gallop on the mit cheetah i. The International Journal of Robotics Research , 2016. 1, 3

  25. [25]

    Policies modulating trajectory generators

    Atil Iscen, Ken Caluwaerts, Jie Tan, Tingnan Zhang, Er- win Coumans, Vikas Sindhwani, and Vincent Vanhoucke. Policies modulating trajectory generators. In Conference on Robot Learning . PMLR, 2018. 3

  26. [26]

    Tail assisted dynamic self righting

    Aaron M Johnson, Thomas Libby, Evan Chang-Siu, Masayoshi Tomizuka, Robert J Full, and Daniel E Koditschek. Tail assisted dynamic self righting. In Adaptive Mobile Robotics . World Scientific, 2012. 1, 3

  27. [27]

    Fast, robust quadruped locomotion over challenging terrain

    Mrinal Kalakrishnan, Jonas Buchli, Peter Pastor, Michael Mistry, and Stefan Schaal. Fast, robust quadruped locomotion over challenging terrain. In 2010 IEEE International Conference on Robotics and Automation . IEEE, 2010. 3

  28. [28]

    Piecewise linear spine for speed–energy efficiency trade-off in quadruped robots

    Mahdi Khoramshahi, Hamed Jalaly Bidgoly, Soroosh Shafiee, Ali Asaei, Auke Jan Ijspeert, and Majid Nili Ahmadabadi. Piecewise linear spine for speed–energy efficiency trade-off in quadruped robots. Robotics and Autonomous Systems, 2013. 1, 3

  29. [29]

    Kingma and Jimmy Ba

    Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations , 2015. 6, 12

  30. [30]

    Reinforce- ment learning in robotics: A survey

    Jens Kober, J Andrew Bagnell, and Jan Peters. Reinforce- ment learning in robotics: A survey. The International Journal of Robotics Research , 2013. 3

  31. [31]

    Crossing the reality gap in evolutionary robotics by promoting transferable controllers

    Sylvain Koos, Jean-Baptiste Mouret, and St ´ephane Don- cieux. Crossing the reality gap in evolutionary robotics by promoting transferable controllers. In Proceedings of the 12th annual conference on Genetic and evolutionary computation, 2010. 3

  32. [32]

    Learning quadrupedal loco- motion over challenging terrain

    Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, and Marco Hutter. Learning quadrupedal loco- motion over challenging terrain. Science robotics, 2020. 1, 3, 4

  33. [33]

    Lillicrap, Jonathan J

    Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. In ICLR, 2016. 3

  34. [34]

    Robust trajectory opti- mization under frictional contact with iterative learning

    Jingru Luo and Kris Hauser. Robust trajectory opti- mization under frictional contact with iterative learning. Autonomous Robots, 2017. 4

  35. [35]

    Gaze and the control of foot placement when walking in natural terrain

    Jonathan Samir Matthis, Jacob L Yates, and Mary M Hayhoe. Gaze and the control of foot placement when walking in natural terrain. Current Biology, 2018. 9

  36. [36]

    Dynamic walk of a biped

    Hirofumi Miura and Isao Shimoyama. Dynamic walk of a biped. The International Journal of Robotics Research ,

  37. [37]

    Asynchronous methods for deep reinforcement learning

    V olodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International conference on machine learning . PMLR, 2016. 3

  38. [38]

    Multi-agent manipulation via locomotion using hierarchical sim2real

    Ofir Nachum, Michael Ahn, Hugo Ponte, Shixiang Shane Gu, and Vikash Kumar. Multi-agent manipulation via locomotion using hierarchical sim2real. In Conference on Robot Learning . PMLR, 2020. 4

  39. [39]

    Why off-the-shelf physics simulators fail in evaluat- ing feedback controller performance-a case study for quadrupedal robots

    Michael Neunert, Thiago Boaventura, and Jonas Buchli. Why off-the-shelf physics simulators fail in evaluat- ing feedback controller performance-a case study for quadrupedal robots. In Advances in Cooperative Robotics . World Scientific, 2017. 3

  40. [40]

    Sim-to-real transfer of robotic control with dynamics randomization

    Xue Bin Peng, Marcin Andrychowicz, Wojciech Zaremba, and Pieter Abbeel. Sim-to-real transfer of robotic control with dynamics randomization. In 2018 IEEE international conference on robotics and automation (ICRA) . IEEE,

  41. [41]

    Learning agile robotic locomotion skills by imitating animals

    Xue Bin Peng, Erwin Coumans, Tingnan Zhang, Tsang- Wei Edward Lee, Jie Tan, and Sergey Levine. Learning agile robotic locomotion skills by imitating animals. In Robotics: Science and Systems , 2020. 1, 3, 4, 7, 8

  42. [42]

    An inelastic quadrupedal model discovers four-beat walking, two-beat running, and pseudo-elastic actuation as energetically optimal

    Delyle T Polet and John EA Bertram. An inelastic quadrupedal model discovers four-beat walking, two-beat running, and pseudo-elastic actuation as energetically optimal. PLoS computational biology , 2019. 4

  43. [43]

    Hopping in legged systems—modeling and simulation for the two-dimensional one-legged case

    Marc H Raibert. Hopping in legged systems—modeling and simulation for the two-dimensional one-legged case. IEEE Transactions on Systems, Man, and Cybernetics ,

  44. [44]

    Chomp: Gradient optimization techniques for efficient motion planning

    Nathan Ratliff, Matt Zucker, J Andrew Bagnell, and Siddhartha Srinivasa. Chomp: Gradient optimization techniques for efficient motion planning. In 2009 IEEE International Conference on Robotics and Automation . IEEE, 2009. 3

  45. [45]

    A reduction of imitation learning and structured prediction to no-regret online learning

    St´ephane Ross, Geoffrey Gordon, and Drew Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the fourteenth international conference on artificial intelligence and statistics, 2011. 5

  46. [46]

    Rhex: A simple and highly mobile hexapod robot

    Uluc Saranli, Martin Buehler, and Daniel E Koditschek. Rhex: A simple and highly mobile hexapod robot. The International Journal of Robotics Research , 2001. 1

  [47] John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, and Pieter Abbeel. High-dimensional continuous control using generalized advantage estimation. In 4th International Conference on Learning Representations.

  [48] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017. 3, 6, 12

  [49] Xingyou Song, Yuxiang Yang, Krzysztof Choromanski, Ken Caluwaerts, Wenbo Gao, Chelsea Finn, and Jie Tan. Rapidly adaptable legged robots via evolutionary meta-learning. In International Conference on Intelligent Robots and Systems (IROS), 2020. 4

  [50] Koushil Sreenath, Hae-Won Park, Ioannis Poulakakis, and Jessy W Grizzle. A compliant hybrid zero dynamics controller for stable, efficient and fast bipedal walking on MABEL. The International Journal of Robotics Research.

  [51] Jie Tan, Tingnan Zhang, Erwin Coumans, Atil Iscen, Yunfei Bai, Danijar Hafner, Steven Bohez, and Vincent Vanhoucke. Sim-to-real: Learning agile locomotion for quadruped robots. In Robotics: Science and Systems.

  [52] Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, and Pieter Abbeel. Domain randomization for transferring deep neural networks from simulation to the real world. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017. 1, 4, 7, 8

  [53] Xingxing Wang. Unitree Robotics. https://www.unitree.com/. 5

  [54] Zhaoming Xie, Xingye Da, Michiel van de Panne, Buck Babich, and Animesh Garg. Dynamics randomization revisited: A case study for quadrupedal locomotion. arXiv preprint arXiv:2011.02404, 2020. 4

  [55] Yuxiang Yang, Ken Caluwaerts, Atil Iscen, Tingnan Zhang, Jie Tan, and Vikas Sindhwani. Data efficient reinforcement learning for legged robots. In Conference on Robot Learning. PMLR, 2020. 1, 3

  [56] KangKang Yin, Kevin Loken, and Michiel van de Panne. SIMBICON: Simple biped locomotion control. ACM Transactions on Graphics (TOG), 2007. 1, 3

  [57] Wenhao Yu, Jie Tan, C. Karen Liu, and Greg Turk. Preparing for the unknown: Learning a universal policy with online system identification. In Robotics: Science and Systems, 2017. 3, 4, 7, 8

  [58] Wenhao Yu, C. Karen Liu, and Greg Turk. Policy transfer with strategy optimization. In International Conference on Learning Representations, 2018. 4

  [59] Wenhao Yu, Visak C. V. Kumar, Greg Turk, and C. Karen Liu. Sim-to-real transfer for biped locomotion. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2019. 4

  [60] Wenhao Yu, Jie Tan, Yunfei Bai, Erwin Coumans, and Sehoon Ha. Learning fast adaptation with meta strategy optimization. IEEE Robotics and Automation Letters.

  [61] Wenxuan Zhou, Lerrel Pinto, and Abhinav Gupta. Environment probing interaction policies. In 7th International Conference on Learning Representations, ICLR 2019.

  [62] J Zico Kolter and Andrew Y Ng. The Stanford LittleDog: A learning and rapid replanning approach to quadruped locomotion. The International Journal of Robotics Research, 2011. 3

  [63] Matt Zucker, J Andrew Bagnell, Christopher G Atkeson, and James Kuffner. An optimization approach to rough terrain locomotion. In 2010 IEEE International Conference on Robotics and Automation. IEEE, 2010. 1, 3

  [64] Matt Zucker, Nathan Ratliff, Martin Stolle, Joel Chestnutt, J Andrew Bagnell, Christopher G Atkeson, and James Kuffner. Optimization and learning for rough terrain legged locomotion. The International Journal of Robotics Research, 2011. 3

Supplementary for RMA: Rapid Motor Adaptation for Legged Robots

S1. Metrics

We use several metrics (in SI units) to e...