Sim-to-Real: Learning Agile Locomotion For Quadruped Robots

Jie Tan, Tingnan Zhang, Erwin Coumans, Atil Iscen, Yunfei Bai, Danijar Hafner, Steven Bohez, Vincent Vanhoucke

classification cs.RO, cs.AI

keywords learning, locomotion, quadruped, system, agile, policies, real, robots

Designing agile locomotion for quadruped robots often requires extensive expertise and tedious manual tuning. In this paper, we present a system to automate this process by leveraging deep reinforcement learning techniques. Our system can learn quadruped locomotion from scratch using simple reward signals. In addition, users can provide an open loop reference to guide the learning process when more control over the learned gait is needed. The control policies are learned in a physics simulator and then deployed on real robots. In robotics, policies trained in simulation often do not transfer to the real world. We narrow this reality gap by improving the physics simulator and learning robust policies. We improve the simulation using system identification, developing an accurate actuator model and simulating latency. We learn robust controllers by randomizing the physical environments, adding perturbations and designing a compact observation space. We evaluate our system on two agile locomotion gaits: trotting and galloping. After learning in simulation, a quadruped robot can successfully perform both gaits in the real world.
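
The two ideas named in the abstract, randomizing the physical environment and simulating latency, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the parameter names, the numeric ranges, and the helpers `sample_episode_params` and `DelayedObservation` are all hypothetical and are not taken from the paper.

```python
import random
from collections import deque

# Hypothetical parameter ranges, chosen only for illustration
# (they are NOT the values used in the paper).
PARAM_RANGES = {
    "mass_scale": (0.8, 1.2),
    "motor_strength_scale": (0.8, 1.2),
    "contact_friction": (0.5, 1.25),
    "latency_steps": (0, 4),
}


def sample_episode_params(rng):
    """Draw one set of physical parameters at the start of an episode."""
    params = {}
    for name, (low, high) in PARAM_RANGES.items():
        if name == "latency_steps":
            params[name] = rng.randint(low, high)  # inclusive integer range
        else:
            params[name] = rng.uniform(low, high)
    return params


class DelayedObservation:
    """Hand the policy the observation from `latency_steps` steps ago."""

    def __init__(self, latency_steps):
        # The buffer holds the current observation plus the delayed ones.
        self._buffer = deque(maxlen=latency_steps + 1)

    def push(self, observation):
        self._buffer.append(observation)
        # Oldest buffered entry; before the buffer fills, this is the
        # first observation of the episode.
        return self._buffer[0]
```

In a training loop one would presumably call `sample_episode_params` once per episode, apply the sampled values to the simulator, and route each raw observation through `DelayedObservation.push` before it reaches the policy, so that the policy never trains on a single fixed set of dynamics.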

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Neuromorphic Reinforcement Learning for Quadruped Locomotion Control on Uneven Terrain
cs.NE 2026-05 unverdicted novelty 7.0

An equilibrium-propagation-based PPO controller for a 12-DoF quadruped achieves locomotion performance comparable to backpropagation-trained PPO on uneven terrain while using 4.3 times less GPU memory.

Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift
cs.LG 2026-05 unverdicted novelty 7.0

SeqRejectron builds a stopping rule from a small set of validator policies to achieve horizon-free sample-complexity guarantees for selective imitation learning under arbitrary train-test dynamics shifts.

Towards Real-time Control of a CartPole System on a Quantum Computer
quant-ph 2026-05 unverdicted novelty 6.0

A single-qubit quantum reinforcement learning agent solves CartPole faster than classical networks and quantifies shot-count versus control-frequency requirements for real-time closed-loop control on NISQ hardware, in...

Abstract Sim2Real through Approximate Information States
cs.RO 2026-04 unverdicted novelty 6.0

Abstract simulators can be grounded to real tasks by making their dynamics history-dependent and correcting them with real data, enabling RL policy transfer.

MoRI: Mixture of RL and IL Experts for Long-Horizon Manipulation Tasks
cs.RO 2026-04 unverdicted novelty 6.0

MoRI dynamically mixes RL and IL experts with variance-based switching and IL regularization to reach 97.5% success in four real-world robotic tasks while cutting human intervention by 85.8%.

Precise Aggressive Aerial Maneuvers with Sensorimotor Policies
cs.RO 2026-04 unverdicted novelty 6.0

Reinforcement learning sensorimotor policies enable quadrotors to traverse narrow gaps at extreme tilts with 5 cm clearance using only vision and proprioception, including reactive traversal of moving gaps.

MUJICA: Multi-skill Unified Joint Integration of Control Architecture for Wheeled-Legged Robots
cs.RO 2026-05 unverdicted novelty 5.0

A single reinforcement learning policy jointly trains multiple locomotion skills for wheeled-legged robots with DC-motor constraints and learns a proprioceptive skill selector for adaptive behavior.

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
cs.LG 2020-05 unverdicted novelty 2.0

Offline RL promises to extract high-utility policies from static datasets but faces fundamental challenges that current methods only partially address.