CoP tactile representation with differentiable calibration enables zero-shot sim-to-real transfer and outperforms binary and raw-taxel baselines on peg-in-hole insertion and ball balancing with a multi-fingered hand.
RSL-RL: A learning library for robotics research
16 Pith papers cite this work.
representative citing papers
Betting mechanisms can yield provably more accurate and efficient estimates of real-world robot behavior than Monte Carlo sampling under specified conditions, with practical approximations demonstrated on synthetic data and a robotic manipulator task.
HALO learns latent reduced-order models with Poincaré maps for hybrid locomotion dynamics, allowing Lyapunov-based regions of attraction to be lifted from latent space to the full-order system.
BRRL derives an analytic optimal policy for regularized constrained RL that guarantees monotonic improvement and yields the BPO algorithm that matches or exceeds PPO.
A quadruped robot with a three-degree-of-freedom active spine reaches 6.9 m/s top speed and 7.2 rad/s turning rate via an RL framework that rewards spine engagement and gallop gaits.
A framework using 3D Gaussian Splatting for visual domain randomization enables robust monocular RGB-based dexterous in-hand reorientation on real hardware for multiple objects under varied lighting.
PriPG-RL trains RL policies for POMDPs by distilling knowledge from a privileged anytime-feasible MPC planner into a P2P-SAC policy, improving sample efficiency and performance in partially observable robotic navigation.
FlashSAC improves training speed and final performance of off-policy RL on high-dimensional robot tasks by reducing update frequency, increasing model scale, and bounding norms to limit critic error accumulation.
Isaac Lab is a unified GPU-native platform combining high-fidelity physics, photorealistic rendering, multi-frequency sensors, domain randomization, and learning pipelines for scalable multi-modal robot policy training.
RANDPOL achieves effective quadruped locomotion by training only the final linear readout of a randomly initialized and fixed neural network policy, matching PPO results with reduced parameters and enabling zero-shot sim-to-real transfer on Unitree Go2.
PPO-EAL integrates exact augmented Lagrangian optimization into PPO for safe robotic control, with claimed theoretical guarantees and better empirical safety-performance tradeoffs on several robot benchmarks including sim-to-real gear assembly.
Terrain-consistent reference modulation during RL training yields SE(2)-controllable humanoid locomotion policies that improve tracking in simulation and enable over 70 m closed-loop autonomous navigation on rough terrain and stairs on the Unitree G1 with onboard computation.
SDPG is a new on-policy visual RL algorithm that estimates gradients via stochastic perturbations of rollouts, achieving faster training and lower memory use than baselines on visual MuJoCo tasks while adding new robotics benchmarks and sim-to-real results.
A modified YOLO segmentation model plus sim-trained PPO control yields 84.3% overall success harvesting 281 strawberries in greenhouse trials on a real UR10e manipulator.
An open-sourced Unified Autonomy Stack fuses LiDAR, radar, vision and inertial data with sampling-based planning and control barrier functions to deliver resilient autonomy on aerial and ground robots in challenging real-world settings.
citing papers explorer
- Bounded Ratio Reinforcement Learning
BRRL derives an analytic optimal policy for regularized constrained RL that guarantees monotonic improvement and yields the BPO algorithm that matches or exceeds PPO.
- PriPG-RL: Privileged Planner-Guided Reinforcement Learning for Partially Observable Systems with Anytime-Feasible MPC
PriPG-RL trains RL policies for POMDPs by distilling knowledge from a privileged anytime-feasible MPC planner into a P2P-SAC policy, improving sample efficiency and performance in partially observable robotic navigation.
- FlashSAC: Fast and Stable Off-Policy Reinforcement Learning for High-Dimensional Robot Control
FlashSAC improves training speed and final performance of off-policy RL on high-dimensional robot tasks by reducing update frequency, increasing model scale, and bounding norms to limit critic error accumulation.
- RANDPOL: Parameter-Efficient End-to-End Quadruped Locomotion via Randomized Policy Learning
RANDPOL achieves effective quadruped locomotion by training only the final linear readout of a randomly initialized and fixed neural network policy, matching PPO results with reduced parameters and enabling zero-shot sim-to-real transfer on Unitree Go2.