Learning robust perceptive locomotion for quadrupedal robots in the wild. Science Robotics, 7(62):eabk2822, 2022.
6 Pith papers cite this work.
Citing papers by year: 2026 (6)
Citing papers
- Bounded Ratio Reinforcement Learning
  BRRL derives an analytic optimal policy for regularized constrained RL that guarantees monotonic improvement and yields the BPO algorithm that matches or exceeds PPO.
- FLASH: Fast Learning via GPU-Accelerated Simulation for High-Fidelity Deformable Manipulation in Minutes
  A new GPU-accelerated deformable simulation framework trains manipulation policies in minutes using only synthetic data, achieving robust zero-shot transfer to physical robots.
- Precise Aggressive Aerial Maneuvers with Sensorimotor Policies
  Reinforcement learning sensorimotor policies enable quadrotors to traverse narrow gaps at extreme tilts with 5 cm clearance using only vision and proprioception, including reactive traversal of moving gaps.
- Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching
  A modular system uses motion matching to compose long-horizon human skill chains, trains RL experts, and distills them into a depth-based policy that lets a Unitree G1 humanoid autonomously climb, vault, and roll over obstacles up to 1.25 m tall.
- SERNF: Sample-Efficient Real-World Dexterous Policy Fine-Tuning via Action-Chunked Critics and Normalizing Flows
  SERNF achieves sample-efficient real-world fine-tuning of multimodal dexterous policies by pairing exact-likelihood normalizing flow policies with action-chunked value critics.
- Toward Reliable Sim-to-Real Predictability for MoE-based Robust Quadrupedal Locomotion
  MoE-based locomotion policy with RoboGauge metrics achieves reliable sim-to-real transfer, enabling robust quadrupedal walking on challenging unseen terrains up to 4 m/s.