pith. machine review for the scientific record.

arxiv: 2107.04034 · v1 · submitted 2021-07-08 · 💻 cs.LG · cs.AI · cs.CV · cs.RO

Recognition: 2 theorem links


RMA: Rapid Motor Adaptation for Legged Robots

Authors on Pith no claims yet

Pith reviewed 2026-05-16 13:27 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CV · cs.RO
keywords rapid motor adaptation · legged robots · quadruped locomotion · sim-to-real transfer · online adaptation · reinforcement learning · robot control

The pith

A two-part algorithm lets quadruped robots adapt motor control to new terrains and dynamics in fractions of a second.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Rapid Motor Adaptation (RMA) to solve real-time adaptation for legged robots facing unseen conditions such as changing surfaces, payloads, or wear. It trains a base policy and an adaptation module entirely in simulation on a varied terrain generator, using bioenergetics-inspired rewards and no domain knowledge such as reference trajectories or predefined foot trajectory generators. The trained system deploys directly on the physical A1 robot and adjusts its actions on the fly. A reader would care because legged robots must handle unpredictable environments immediately, yet most current approaches either require slow real-world retraining or rely on hand-crafted components that limit flexibility.

Core claim

RMA consists of a base policy that maps robot states to actions and an adaptation module that processes a short history of states and actions to produce a latent representation of current terrain or dynamics. The combination is trained end-to-end in simulation and enables the robot to adapt to novel situations in fractions of a second. When deployed zero-shot on the A1 quadruped, the system achieves state-of-the-art performance across rocky, slippery, deformable, and vegetated surfaces in both simulation and real-world tests.

What carries the argument

The adaptation module, which infers a latent vector of environmental changes from recent state-action history and feeds it to the base policy for immediate adjustment.
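The two-component control loop described above can be sketched in a few lines. This is a minimal sketch with numpy, not the paper's implementation: the dimensions, the linear stand-in networks, and the history length are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- the paper's actual sizes are not given in this review.
STATE_DIM, ACTION_DIM, LATENT_DIM, HISTORY = 30, 12, 8, 50

# Stand-ins for the two trained networks: fixed random linear maps, not learned weights.
W_adapt = rng.standard_normal((LATENT_DIM, HISTORY * (STATE_DIM + ACTION_DIM))) * 0.01
W_base = rng.standard_normal((ACTION_DIM, STATE_DIM + LATENT_DIM)) * 0.01

def adaptation_module(history):
    """Infer a latent environment vector from a rolling window of (state, action) pairs."""
    flat = np.concatenate([np.concatenate(pair) for pair in history])
    return np.tanh(W_adapt @ flat)

def base_policy(state, latent):
    """Map the current state plus the inferred latent to an action."""
    return np.tanh(W_base @ np.concatenate([state, latent]))

# One control step: the latent is re-estimated from recent history every step,
# so a change in terrain or dynamics shows up in the very next action.
history = [(rng.standard_normal(STATE_DIM), np.zeros(ACTION_DIM))
           for _ in range(HISTORY)]
state = rng.standard_normal(STATE_DIM)
z = adaptation_module(history)
action = base_policy(state, z)
assert z.shape == (LATENT_DIM,) and action.shape == (ACTION_DIM,)
```

The point of the structure is visible even in this toy form: the base policy never sees privileged environment parameters, only the latent that the adaptation module recovers from proprioceptive history.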

If this is right

  • The robot adapts its motor commands in fractions of a second to previously unseen conditions.
  • Training requires no reference trajectories or predefined foot trajectory generators.
  • Direct sim-to-real deployment works without any real-world fine-tuning or domain knowledge.
  • State-of-the-art performance holds on both simulated and physical experiments with diverse terrains including stairs, sand, grass, and pebbles.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same base-plus-adaptation structure could be tested on other robot morphologies if the latent inference generalizes beyond quadrupeds.
  • Performance would likely degrade on conditions that the simulation generator never sampled, pointing to the value of expanding terrain randomization.
  • Pairing the adaptation module with onboard perception could reduce reliance on proprioception alone for even harder unseen environments.

Load-bearing premise

The distribution of terrains and dynamics in the simulation's varied terrain generator is representative enough of real-world conditions that the adaptation module transfers without further training.

What would settle it

Placing the robot on a real surface whose properties fall clearly outside the simulation generator's range, such as an extreme mud or ice patch never approximated in training, and checking whether rapid successful adaptation still occurs.
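The decisive test above amounts to a coverage check: estimate the physical parameters of the real surface and ask whether they fall inside the intervals the terrain generator ever sampled. A minimal sketch, assuming hypothetical randomization ranges and parameter names (the paper's actual ranges are not reported in this review):

```python
# Hypothetical simulation randomization ranges and a real-surface estimate.
SIM_RANGES = {"friction": (0.4, 1.25), "compliance": (0.0, 0.1)}

def out_of_distribution(estimate, ranges):
    """Return the parameters whose estimated real-world value falls outside
    the interval the simulated terrain generator sampled during training."""
    return [name for name, (lo, hi) in ranges.items()
            if not lo <= estimate[name] <= hi]

# An ice patch with friction far below anything sampled in simulation.
ice_patch = {"friction": 0.05, "compliance": 0.0}
assert out_of_distribution(ice_patch, SIM_RANGES) == ["friction"]
```

Rapid, successful adaptation on a surface this check flags would be evidence that the adaptation module extrapolates rather than merely interpolates within the training distribution.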

Original abstract

Successful real-world deployment of legged robots would require them to adapt in real-time to unseen scenarios like changing terrains, changing payloads, wear and tear. This paper presents Rapid Motor Adaptation (RMA) algorithm to solve this problem of real-time online adaptation in quadruped robots. RMA consists of two components: a base policy and an adaptation module. The combination of these components enables the robot to adapt to novel situations in fractions of a second. RMA is trained completely in simulation without using any domain knowledge like reference trajectories or predefined foot trajectory generators and is deployed on the A1 robot without any fine-tuning. We train RMA on a varied terrain generator using bioenergetics-inspired rewards and deploy it on a variety of difficult terrains including rocky, slippery, deformable surfaces in environments with grass, long vegetation, concrete, pebbles, stairs, sand, etc. RMA shows state-of-the-art performance across diverse real-world as well as simulation experiments. Video results at https://ashish-kmr.github.io/rma-legged-robots/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents Rapid Motor Adaptation (RMA) for quadruped robots, consisting of a base policy and an adaptation module. RMA is trained entirely in simulation using a varied terrain generator and bioenergetics-inspired rewards, with no reference trajectories or foot-trajectory generators. The central claim is zero-shot real-world deployment on the Unitree A1 robot, enabling adaptation to novel terrains (rocky, slippery, deformable, grass, sand, stairs) in fractions of a second and achieving state-of-the-art performance across simulation and real-world experiments.

Significance. If the adaptation module generalizes as claimed, the result would be significant for legged-robot control: it demonstrates practical, real-time online adaptation to changing dynamics and payloads without fine-tuning or domain-specific knowledge. The empirical focus on diverse real-world surfaces and the use of bioenergetics-inspired rewards are strengths that could influence future sim-to-real pipelines.

major comments (2)
  1. [§3.2 and §4] §3.2 (Varied Terrain Generator) and §4 (Experiments): the zero-shot generalization claim requires that the distribution of simulated terrains and dynamics overlaps with real-world test conditions (friction, compliance, contact parameters). No quantitative overlap metric, parameter sensitivity sweep, or out-of-distribution detection result is reported; without this, it is impossible to rule out that successful real-robot trials reflect post-hoc terrain selection rather than true rapid adaptation.
  2. [§4.2] §4.2 (Real-world results): the reported adaptation time of 'fractions of a second' is central to the contribution, yet the manuscript provides no latency breakdown (encoder inference + policy execution) or ablation showing that the adaptation module, rather than the base policy alone, is responsible for the observed robustness on slippery and deformable surfaces.
minor comments (2)
  1. [§3.1] The reward function definition in §3.1 uses several bioenergetics-inspired terms whose relative weighting is stated but not justified by an ablation; a short sensitivity table would improve reproducibility.
  2. [Figure 3] Figure 3 (real-robot snapshots) would benefit from explicit labeling of terrain type and quantitative metrics (e.g., success rate, traversal time) directly on the figure or in an accompanying table.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and indicate the revisions we will incorporate.

Point-by-point responses
  1. Referee: [§3.2 and §4] §3.2 (Varied Terrain Generator) and §4 (Experiments): the zero-shot generalization claim requires that the distribution of simulated terrains and dynamics overlaps with real-world test conditions (friction, compliance, contact parameters). No quantitative overlap metric, parameter sensitivity sweep, or out-of-distribution detection result is reported; without this, it is impossible to rule out that successful real-robot trials reflect post-hoc terrain selection rather than true rapid adaptation.

    Authors: We agree that a quantitative overlap analysis would strengthen the zero-shot claim. In the revision we will add a parameter sensitivity sweep over friction, compliance, and contact stiffness, reporting the simulation ranges against values estimated from the real-world test surfaces. We will also clarify the design rationale of the varied terrain generator to show it was constructed to cover the diversity of conditions later tested on hardware. While the breadth of successful real-world deployments (rocky, slippery, sand, stairs) supports generalization beyond post-hoc selection, we acknowledge the referee's point and will include the requested metrics. revision: partial

  2. Referee: [§4.2] §4.2 (Real-world results): the reported adaptation time of 'fractions of a second' is central to the contribution, yet the manuscript provides no latency breakdown (encoder inference + policy execution) or ablation showing that the adaptation module, rather than the base policy alone, is responsible for the observed robustness on slippery and deformable surfaces.

    Authors: We will revise §4.2 to include a hardware latency breakdown separating adaptation-module encoder inference from base-policy execution. We will also add an ablation that directly compares the base policy alone against the full RMA system on the same real-world slippery and deformable surfaces, quantifying the improvement attributable to the adaptation module. revision: yes
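The latency breakdown the referee asks for is cheap to instrument. A minimal sketch, assuming stand-in linear networks of hypothetical size in place of the real adaptation encoder and base policy:

```python
import time
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a larger encoder over the state-action history
# and a small base policy over state plus latent. Sizes are assumptions.
encoder = rng.standard_normal((8, 2100))
policy = rng.standard_normal((12, 38))

def mean_latency(fn, arg, reps=1000):
    """Average wall-clock time per call over `reps` repetitions."""
    t0 = time.perf_counter()
    for _ in range(reps):
        fn(arg)
    return (time.perf_counter() - t0) / reps

hist = rng.standard_normal(2100)
obs = rng.standard_normal(38)
t_enc = mean_latency(lambda h: np.tanh(encoder @ h), hist)
t_pol = mean_latency(lambda o: np.tanh(policy @ o), obs)
print(f"encoder: {t_enc * 1e6:.1f} us/call, policy: {t_pol * 1e6:.1f} us/call")
```

Reporting these two numbers separately, on the robot's onboard compute, is what would substantiate the fractions-of-a-second claim against the control-loop rate.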

Circularity Check

0 steps flagged

No circularity: empirical training and external validation

Full rationale

The paper trains a base policy and adaptation module end-to-end in simulation using a terrain generator and bioenergetics rewards, then measures zero-shot real-robot performance on external test terrains. No equations reduce reported adaptation speed or success rates back to fitted parameters by construction, and no load-bearing self-citations or uniqueness theorems are invoked. The derivation chain consists of standard RL components whose outputs are falsifiable against real hardware.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that simulation-trained policies plus a learned adapter can cover the distribution shift to real terrains without explicit modeling of contact dynamics or sensor noise; no new physical entities are introduced.

free parameters (2)
  • adaptation module output dimension and update rate
    Chosen to allow rapid adjustment; value not stated in abstract but required for the fractions-of-a-second claim.
  • bioenergetics-inspired reward weights
    Multiple scalar weights used to shape the base policy training; these are fitted or tuned to produce stable gaits.
axioms (1)
  • domain assumption Simulation dynamics are close enough to reality that zero-shot transfer is possible after adaptation
    Invoked when claiming deployment on A1 without fine-tuning.

pith-pipeline@v0.9.0 · 5491 in / 1355 out tokens · 41931 ms · 2026-05-16T13:27:23.460384+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 18 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Neuromorphic Reinforcement Learning for Quadruped Locomotion Control on Uneven Terrain

    cs.NE 2026-05 unverdicted novelty 7.0

    An equilibrium-propagation-based PPO controller for a 12-DoF quadruped achieves locomotion performance comparable to backpropagation-trained PPO on uneven terrain while using 4.3 times less GPU memory.

  2. Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift

    cs.LG 2026-05 unverdicted novelty 7.0

    SeqRejectron builds a stopping rule from a small set of validator policies to achieve horizon-free sample-complexity guarantees for selective imitation learning under arbitrary train-test dynamics shifts.

  3. Mitigating Error Accumulation in Continuous Navigation via Memory-Augmented Kalman Filtering

    cs.RO 2026-01 unverdicted novelty 7.0

    NeuroKalman mitigates state drift in vision-language UAV navigation by using memory-augmented Kalman filtering where attention retrieves historical anchors to correct predictions without gradient updates.

  4. Offline Reinforcement Learning for Rotation Profile Control in Tokamaks

    cs.LG 2026-05 unverdicted novelty 6.0

    Offline RL policies trained solely on DIII-D historical data were deployed on the tokamak and produced promising real-world control of the plasma rotation profile.

  5. SixthSense: Task-Agnostic Proprioception-Only Whole-Body Wrench Estimation for Humanoids

    cs.RO 2026-05 unverdicted novelty 6.0

    SixthSense infers whole-body contact events and wrenches in humanoids from proprioception and IMU data alone by tokenizing histories and estimating a sparse contact-event flow with conditional flow matching.

  6. GS-Playground: A High-Throughput Photorealistic Simulator for Vision-Informed Robot Learning

    cs.RO 2026-04 unverdicted novelty 6.0

    GS-Playground delivers a high-throughput photorealistic simulator for vision-informed robot learning via parallel physics integrated with batch 3D Gaussian Splatting at 10^4 FPS and an automated Real2Sim workflow for ...

  7. Wiggle and Go! System Identification for Zero-Shot Dynamic Rope Manipulation

    cs.RO 2026-04 unverdicted novelty 6.0

    Wiggle and Go! uses system identification from rope motion observations to predict parameters that enable zero-shot goal-conditioned dynamic manipulation, achieving 3.55 cm accuracy on 3D target striking versus 15.34 ...

  8. Abstract Sim2Real through Approximate Information States

    cs.RO 2026-04 unverdicted novelty 6.0

    Abstract simulators can be grounded to real tasks by making their dynamics history-dependent and correcting them with real data, enabling RL policy transfer.

  9. FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control

    cs.LG 2026-04 unverdicted novelty 6.0

    FlashSAC scales up Soft Actor-Critic with fewer updates, larger models, higher data throughput, and norm bounds to deliver faster, more stable training than PPO on high-dimensional robot control tasks across dozens of...

  10. Sim2Real-AD: A Modular Sim-to-Real Framework for Deploying VLM-Guided Reinforcement Learning in Real-World Autonomous Driving

    cs.RO 2026-04 unverdicted novelty 6.0

    Sim2Real-AD enables zero-shot transfer of CARLA-trained VLM-guided RL policies to full-scale vehicles, reporting 75-90% success rates in car-following, obstacle avoidance, and stop-sign scenarios without real-world RL...

  11. Learning Task-Invariant Properties via Dreamer: Enabling Efficient Policy Transfer for Quadruped Robots

    cs.RO 2026-04 unverdicted novelty 6.0

    DreamTIP adds LLM-identified task-invariant properties as auxiliary targets in Dreamer's world model plus a mixed-replay adaptation step, delivering 28.1% average simulated transfer gains and 100% real-world climb suc...

  12. Simulation Distillation: Pretraining World Models in Simulation for Rapid Real-World Adaptation

    cs.RO 2026-03 unverdicted novelty 6.0

    SimDist pretrains world models in simulation and adapts them to real-world robots by updating only the latent dynamics model, enabling rapid improvement on contact-rich tasks where prior methods fail.

  13. PTLD: Sim-to-real Privileged Tactile Latent Distillation for Dexterous Manipulation

    cs.RO 2026-03 unverdicted novelty 6.0

    PTLD distills real privileged tactile data into a state estimator to boost sim-to-real performance of proprioceptive dexterous manipulation policies, yielding 182% improvement on in-hand rotation and 57% on reorientat...

  14. Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching

    cs.RO 2026-02 unverdicted novelty 6.0

    A modular system uses motion matching to compose long-horizon human skill chains, trains RL experts, and distills them into a depth-based policy that lets a Unitree G1 humanoid autonomously climb, vault, and roll over...

  15. MUJICA: Multi-skill Unified Joint Integration of Control Architecture for Wheeled-Legged Robots

    cs.RO 2026-05 unverdicted novelty 5.0

    A single reinforcement learning policy jointly trains multiple locomotion skills for wheeled-legged robots with DC-motor constraints and learns a proprioceptive skill selector for adaptive behavior.

  16. Agentic AIs Are the Missing Paradigm for Out-of-Distribution Generalization in Foundation Models

    cs.LG 2026-05 unverdicted novelty 5.0

    Agentic AI systems are required to overcome the parameter coverage ceiling that prevents foundation models from handling certain out-of-distribution cases.

  17. UniCon: A Unified System for Efficient Robot Learning Transfers

    cs.RO 2026-01 unverdicted novelty 5.0

    UniCon standardizes states and control logic into modular execution graphs for efficient transfer of learning controllers across heterogeneous robots, with lower latency than ROS.

  18. Quadruped Parkour Learning: Sparsely Gated Mixture of Experts with Visual Input

    cs.RO 2026-04 unverdicted novelty 4.0

    Sparsely gated MoE policies double the success rate of a real Unitree Go2 quadruped on large-obstacle parkour versus matched-active-parameter MLP baselines while cutting inference time compared with a scaled-up MLP.

Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages · cited by 18 Pith papers · 1 internal anchor

  1. [1]

    Surrogate-based aerodynamic design optimization: Use of surrogates in aerodynamic design optimization

    MYM Ahmed and N Qin. Surrogate-based aerodynamic design optimization: Use of surrogates in aerodynamic design optimization. In International Conference on Aerospace Sciences and Aviation Technology , 2009. 3

  2. [2]

    Rapidly exponentially stabilizing control lyapunov functions and hybrid zero dynamics

    Aaron D Ames, Kevin Galloway, Koushil Sreenath, and Jessy W Grizzle. Rapidly exponentially stabilizing control lyapunov functions and hybrid zero dynamics. IEEE Transactions on Automatic Control , 2014. 1, 3

  3. [3]

    Fast online trajectory optimization for the bipedal robot cassie

    Taylor Apgar, Patrick Clary, Kevin Green, Alan Fern, and Jonathan W Hurst. Fast online trajectory optimization for the bipedal robot cassie. In Robotics: Science and Systems, 2018. 3

  4. [4]

    Design and Control of Small Legged Robots

    Monica Barragan, Nikolai Flowers, and Aaron M. Johnson. MiniRHex: A small, open-source, fully programmable walking hexapod. In Robotics: Science and Systems Workshop on “Design and Control of Small Legged Robots”, 2018. 3

  5. [5]

    Mit chee- tah 3: Design and control of a robust, dynamic quadruped robot

    Gerardo Bledt, Matthew J Powell, Benjamin Katz, Jared Di Carlo, Patrick M Wensing, and Sangbae Kim. Mit chee- tah 3: Design and control of a robust, dynamic quadruped robot. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) . IEEE, 2018. 3

  6. [6]

    Leveraging multiple simulators for crossing the reality gap

    Adrian Boeing and Thomas Br ¨aunl. Leveraging multiple simulators for crossing the reality gap. In 2012 12th International Conference on Control Automation Robotics & Vision (ICARCV) . IEEE, 2012. 3

  7. [7]

    Nonlinear system identification using coevolution of models and tests

    Josh C Bongard and Hod Lipson. Nonlinear system identification using coevolution of models and tests. IEEE Transactions on Evolutionary Computation , 2005. 3

  8. [8]

    Bayesian optimization for learning gaits under uncertainty

    Roberto Calandra, Andr ´e Seyfarth, Jan Peters, and Marc Peter Deisenroth. Bayesian optimization for learning gaits under uncertainty. Annals of Mathematics and Artificial Intelligence, 2016. 3

  9. [9]

    Optimizing simulations with noise-tolerant structured exploration

    Krzysztof Choromanski, Atil Iscen, Vikas Sindhwani, Jie Tan, and Erwin Coumans. Optimizing simulations with noise-tolerant structured exploration. In 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018. 3

  10. [10]

    Fearing, Pieter Abbeel, Sergey Levine, and Chelsea Finn

    Ignasi Clavera, Anusha Nagabandi, Simin Liu, Ronald S. Fearing, Pieter Abbeel, Sergey Levine, and Chelsea Finn. Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In International Conference on Learning Representations , 2019. 4

  11. [11]

    Feature-based locomotion controllers

    Martin De Lasa, Igor Mordatch, and Aaron Hertzmann. Feature-based locomotion controllers. ACM Transactions on Graphics (TOG) , 2010. 3

  12. [12]

    Dynamic locomotion in the mit cheetah 3 through convex model-predictive control

    Jared Di Carlo, Patrick M Wensing, Benjamin Katz, Gerardo Bledt, and Sangbae Kim. Dynamic locomotion in the mit cheetah 3 through convex model-predictive control. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) . IEEE, 2018. 3

  13. [13]

    Model- agnostic meta-learning for fast adaptation of deep net- works

    Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model- agnostic meta-learning for fast adaptation of deep net- works. In International Conference on Machine Learning . PMLR, 2017. 4

  14. [14]

    Address- ing function approximation error in actor-critic methods

    Scott Fujimoto, Herke Hoof, and David Meger. Address- ing function approximation error in actor-critic methods. In International Conference on Machine Learning. PMLR,

  15. [15]

    Practice makes perfect: An optimization-based approach to controlling agile motions for a quadruped robot

    Christian Gehring, Stelian Coros, Marco Hutter, Carmine Dario Bellicoso, Huub Heijnen, Remo Diethelm, Michael Bloesch, P ´eter Fankhauser, Jemin Hwangbo, Mark Hoepflinger, et al. Practice makes perfect: An optimization-based approach to controlling agile motions for a quadruped robot. IEEE Robotics & Automation Magazine, 2016. 3

  16. [16]

    Positive force feedback in bouncing gaits? Proceedings of the Royal Society of London

    Hartmut Geyer, Andre Seyfarth, and Reinhard Blickhan. Positive force feedback in bouncing gaits? Proceedings of the Royal Society of London. Series B: Biological Sciences, 2003. 1, 3

  17. [17]

    Convolutional neural networks for steady flow approximation

    Xiaoxiao Guo, Wei Li, and Francesco Iorio. Convolutional neural networks for steady flow approximation. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, 2016. 3

  18. [18]

    Learning to walk via deep reinforcement learning

    Tuomas Haarnoja, Sehoon Ha, Aurick Zhou, Jie Tan, George Tucker, and Sergey Levine. Learning to walk via deep reinforcement learning. In Robotics: Science and Systems, 2019. 1, 3

  19. [19]

    Grounded action transfor- mation for robot learning in simulation

    Josiah Hanna and Peter Stone. Grounded action transfor- mation for robot learning in simulation. In Proceedings of the AAAI Conference on Artificial Intelligence , 2017. 4

  20. [20]

    Anymal-a highly mobile and dynamic quadrupedal robot

    Marco Hutter, Christian Gehring, Dominic Jud, Andreas Lauber, C Dario Bellicoso, Vassilios Tsounis, Jemin Hwangbo, Karen Bodie, Peter Fankhauser, Michael Bloesch, et al. Anymal-a highly mobile and dynamic quadrupedal robot. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) . IEEE, 2016. 3

  21. [21]

    RaisimGymTorch

    Jemin Hwangbo. RaisimGymTorch. https://raisim.com/ sections/RaisimGymTorch.html, 2020-2021. 12

  22. [22]

    Per- contact iteration method for solving contact dynamics

    Jemin Hwangbo, Joonho Lee, and Marco Hutter. Per- contact iteration method for solving contact dynamics. IEEE Robotics and Automation Letters , 2018. URL www. raisim.com. 5

  23. [23]

    Learning agile and dynamic motor skills for legged robots

    Jemin Hwangbo, Joonho Lee, Alexey Dosovitskiy, Dario Bellicoso, Vassilios Tsounis, Vladlen Koltun, and Marco Hutter. Learning agile and dynamic motor skills for legged robots. Science Robotics, 2019. 1, 3, 4, 8, 12

  24. [24]

    Implementation of trot-to-gallop transition and subsequent gallop on the mit cheetah i

    Dong Jin Hyun, Jongwoo Lee, SangIn Park, and Sangbae Kim. Implementation of trot-to-gallop transition and subsequent gallop on the mit cheetah i. The International Journal of Robotics Research , 2016. 1, 3

  25. [25]

    Policies modulating trajectory generators

    Atil Iscen, Ken Caluwaerts, Jie Tan, Tingnan Zhang, Er- win Coumans, Vikas Sindhwani, and Vincent Vanhoucke. Policies modulating trajectory generators. In Conference on Robot Learning . PMLR, 2018. 3

  26. [26]

    Tail assisted dynamic self righting

    Aaron M Johnson, Thomas Libby, Evan Chang-Siu, Masayoshi Tomizuka, Robert J Full, and Daniel E Koditschek. Tail assisted dynamic self righting. In Adaptive Mobile Robotics . World Scientific, 2012. 1, 3

  27. [27]

    Fast, robust quadruped locomotion over challenging terrain

    Mrinal Kalakrishnan, Jonas Buchli, Peter Pastor, Michael Mistry, and Stefan Schaal. Fast, robust quadruped locomotion over challenging terrain. In 2010 IEEE International Conference on Robotics and Automation . IEEE, 2010. 3

  28. [28]

    Piecewise linear spine for speed–energy efficiency trade-off in quadruped robots

    Mahdi Khoramshahi, Hamed Jalaly Bidgoly, Soroosh Shafiee, Ali Asaei, Auke Jan Ijspeert, and Majid Nili Ahmadabadi. Piecewise linear spine for speed–energy efficiency trade-off in quadruped robots. Robotics and Autonomous Systems, 2013. 1, 3

  29. [29]

    Kingma and Jimmy Ba

    Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations , 2015. 6, 12

  30. [30]

    Reinforce- ment learning in robotics: A survey

    Jens Kober, J Andrew Bagnell, and Jan Peters. Reinforce- ment learning in robotics: A survey. The International Journal of Robotics Research , 2013. 3

  31. [31]

    Crossing the reality gap in evolutionary robotics by promoting transferable controllers

    Sylvain Koos, Jean-Baptiste Mouret, and St ´ephane Don- cieux. Crossing the reality gap in evolutionary robotics by promoting transferable controllers. In Proceedings of the 12th annual conference on Genetic and evolutionary computation, 2010. 3

  32. [32]

    Learning quadrupedal loco- motion over challenging terrain

    Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, and Marco Hutter. Learning quadrupedal loco- motion over challenging terrain. Science robotics, 2020. 1, 3, 4

  33. [33]

    Lillicrap, Jonathan J

    Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. In ICLR, 2016. 3

  34. [34]

    Robust trajectory opti- mization under frictional contact with iterative learning

    Jingru Luo and Kris Hauser. Robust trajectory opti- mization under frictional contact with iterative learning. Autonomous Robots, 2017. 4

  35. [35]

    Gaze and the control of foot placement when walking in natural terrain

    Jonathan Samir Matthis, Jacob L Yates, and Mary M Hayhoe. Gaze and the control of foot placement when walking in natural terrain. Current Biology, 2018. 9

  36. [36]

    Dynamic walk of a biped

    Hirofumi Miura and Isao Shimoyama. Dynamic walk of a biped. The International Journal of Robotics Research ,

  37. [37]

    Asynchronous methods for deep reinforcement learning

    V olodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International conference on machine learning . PMLR, 2016. 3

  38. [38]

    Multi-agent manipulation via locomotion using hierarchical sim2real

    Ofir Nachum, Michael Ahn, Hugo Ponte, Shixiang Shane Gu, and Vikash Kumar. Multi-agent manipulation via locomotion using hierarchical sim2real. In Conference on Robot Learning . PMLR, 2020. 4

  39. [39]

    Why off-the-shelf physics simulators fail in evaluat- ing feedback controller performance-a case study for quadrupedal robots

    Michael Neunert, Thiago Boaventura, and Jonas Buchli. Why off-the-shelf physics simulators fail in evaluat- ing feedback controller performance-a case study for quadrupedal robots. In Advances in Cooperative Robotics . World Scientific, 2017. 3

  40. [40]

    Sim-to-real transfer of robotic control with dynamics randomization

    Xue Bin Peng, Marcin Andrychowicz, Wojciech Zaremba, and Pieter Abbeel. Sim-to-real transfer of robotic control with dynamics randomization. In 2018 IEEE international conference on robotics and automation (ICRA) . IEEE,

  41. [41]

    Learning agile robotic locomotion skills by imitating animals

    Xue Bin Peng, Erwin Coumans, Tingnan Zhang, Tsang- Wei Edward Lee, Jie Tan, and Sergey Levine. Learning agile robotic locomotion skills by imitating animals. In Robotics: Science and Systems , 2020. 1, 3, 4, 7, 8

  42. [42]

    An inelastic quadrupedal model discovers four-beat walking, two-beat running, and pseudo-elastic actuation as energetically optimal

    Delyle T Polet and John EA Bertram. An inelastic quadrupedal model discovers four-beat walking, two-beat running, and pseudo-elastic actuation as energetically optimal. PLoS computational biology , 2019. 4

  43. [43]

    Hopping in legged systems—modeling and simulation for the two-dimensional one-legged case

    Marc H Raibert. Hopping in legged systems—modeling and simulation for the two-dimensional one-legged case. IEEE Transactions on Systems, Man, and Cybernetics ,

  44. [44]

    Chomp: Gradient optimization techniques for efficient motion planning

    Nathan Ratliff, Matt Zucker, J Andrew Bagnell, and Siddhartha Srinivasa. Chomp: Gradient optimization techniques for efficient motion planning. In 2009 IEEE International Conference on Robotics and Automation . IEEE, 2009. 3

  45. [45]

    A reduction of imitation learning and structured prediction to no-regret online learning

    St´ephane Ross, Geoffrey Gordon, and Drew Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the fourteenth international conference on artificial intelligence and statistics, 2011. 5

  46. [46]

    Rhex: A simple and highly mobile hexapod robot

    Uluc Saranli, Martin Buehler, and Daniel E Koditschek. Rhex: A simple and highly mobile hexapod robot. The International Journal of Robotics Research , 2001. 1

  [47] John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, and Pieter Abbeel. High-dimensional continuous control using generalized advantage estimation. In 4th International Conference on Learning Representations.

  [48] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017. 3, 6, 12

  [49] Xingyou Song, Yuxiang Yang, Krzysztof Choromanski, Ken Caluwaerts, Wenbo Gao, Chelsea Finn, and Jie Tan. Rapidly adaptable legged robots via evolutionary meta-learning. In International Conference on Intelligent Robots and Systems (IROS), 2020. 4

  [50] Koushil Sreenath, Hae-Won Park, Ioannis Poulakakis, and Jessy W Grizzle. A compliant hybrid zero dynamics controller for stable, efficient and fast bipedal walking on MABEL. The International Journal of Robotics Research.

  [51] Jie Tan, Tingnan Zhang, Erwin Coumans, Atil Iscen, Yunfei Bai, Danijar Hafner, Steven Bohez, and Vincent Vanhoucke. Sim-to-real: Learning agile locomotion for quadruped robots. In Robotics: Science and Systems.

  [52] Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, and Pieter Abbeel. Domain randomization for transferring deep neural networks from simulation to the real world. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017. 1, 4, 7, 8

  [53] Xingxing Wang. Unitree Robotics. https://www.unitree.com/. 5

  [54] Zhaoming Xie, Xingye Da, Michiel van de Panne, Buck Babich, and Animesh Garg. Dynamics randomization revisited: A case study for quadrupedal locomotion. arXiv preprint arXiv:2011.02404, 2020. 4

  [55] Yuxiang Yang, Ken Caluwaerts, Atil Iscen, Tingnan Zhang, Jie Tan, and Vikas Sindhwani. Data efficient reinforcement learning for legged robots. In Conference on Robot Learning. PMLR, 2020. 1, 3

  [56] KangKang Yin, Kevin Loken, and Michiel van de Panne. SIMBICON: Simple biped locomotion control. ACM Transactions on Graphics (TOG), 2007. 1, 3

  [57] Wenhao Yu, Jie Tan, C. Karen Liu, and Greg Turk. Preparing for the unknown: Learning a universal policy with online system identification. In Robotics: Science and Systems, 2017. 3, 4, 7, 8

  [58] Wenhao Yu, C. Karen Liu, and Greg Turk. Policy transfer with strategy optimization. In International Conference on Learning Representations, 2018. 4

  [59] Wenhao Yu, Visak C. V. Kumar, Greg Turk, and C. Karen Liu. Sim-to-real transfer for biped locomotion. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2019. 4

  [60] Wenhao Yu, Jie Tan, Yunfei Bai, Erwin Coumans, and Sehoon Ha. Learning fast adaptation with meta strategy optimization. IEEE Robotics and Automation Letters.

  [61] Wenxuan Zhou, Lerrel Pinto, and Abhinav Gupta. Environment probing interaction policies. In 7th International Conference on Learning Representations, ICLR 2019.

  [62] J Zico Kolter and Andrew Y Ng. The Stanford LittleDog: A learning and rapid replanning approach to quadruped locomotion. The International Journal of Robotics Research, 2011. 3

  [63] Matt Zucker, J Andrew Bagnell, Christopher G Atkeson, and James Kuffner. An optimization approach to rough terrain locomotion. In 2010 IEEE International Conference on Robotics and Automation. IEEE, 2010. 1, 3

  [64] Matt Zucker, Nathan Ratliff, Martin Stolle, Joel Chestnutt, J Andrew Bagnell, Christopher G Atkeson, and James Kuffner. Optimization and learning for rough terrain legged locomotion. The International Journal of Robotics Research, 2011. 3

Supplementary for RMA: Rapid Motor Adaptation for Legged Robots

S1. Metrics

We use several metrics (in SI units) to e...