Learning to walk in minutes using massively parallel deep reinforcement learning
6 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- Stability of Control Lyapunov Function Guided Reinforcement Learning
CLF-guided RL yields exponentially stable optimal controllers, with proofs in continuous and discrete time, numerical checks on double integrator and cart-pole, and implementation on a walking humanoid.
- MUJICA: Multi-skill Unified Joint Integration of Control Architecture for Wheeled-Legged Robots
A single reinforcement learning policy jointly trains multiple locomotion skills for wheeled-legged robots with DC-motor constraints and learns a proprioceptive skill selector for adaptive behavior.
- Reset-Free Reinforcement Learning for Real-World Agile Driving: An Empirical Study
Empirical comparison shows a clear sim-to-real gap in reset-free RL for agile driving: TD-MPC2 outperforms the MPPI baseline in the real world while SAC excels in simulation, and residual learning benefits simulation but does not transfer.
- Learning Agile Striker Skills for Humanoid Soccer Robots from Noisy Sensory Input
A four-stage RL system with teacher-student distillation and online constrained adaptation enables humanoid robots to achieve robust ball-kicking accuracy under noisy perception in simulation and on physical hardware.
- Quadruped Parkour Learning: Sparsely Gated Mixture of Experts with Visual Input
Sparsely gated MoE policies double the success rate of a real Unitree Go2 quadruped on large-obstacle parkour versus matched-active-parameter MLP baselines while cutting inference time compared with a scaled-up MLP.
- CART: Context-Aware Terrain Adaptation using Temporal Sequence Selection for Legged Robots