Uncertainty-Aware Reinforcement Learning for Collision Avoidance

Gregory Kahn , Adam Villaflor , Vitchyr Pong , Pieter Abbeel , Sergey Levine

Authors on Pith no claims yet

classification 💻 cs.LG cs.RO

keywords learningcollisionsrobottrainingavoidancecollisionmustreinforcement

read the original abstract

Reinforcement learning can enable complex, adaptive behavior to be learned automatically for autonomous robotic platforms. However, practical deployment of reinforcement learning methods must contend with the fact that the training process itself can be unsafe for the robot. In this paper, we consider the specific case of a mobile robot learning to navigate an a priori unknown environment while avoiding collisions. In order to learn collision avoidance, the robot must experience collisions at training time. However, high-speed collisions, even at training time, could damage the robot. A successful learning method must therefore proceed cautiously, experiencing only low-speed collisions until it gains confidence. To this end, we present an uncertainty-aware model-based learning algorithm that estimates the probability of collision together with a statistical estimate of uncertainty. By formulating an uncertainty-dependent cost function, we show that the algorithm naturally chooses to proceed cautiously in unfamiliar environments, and increases the velocity of the robot in settings where it has high confidence. Our predictive model is based on bootstrapped neural networks using dropout, allowing it to process raw sensory inputs from high-bandwidth sensors such as cameras. Our experimental evaluation demonstrates that our method effectively minimizes dangerous collisions at training time in an obstacle avoidance task for a simulated and real-world quadrotor, and a real-world RC car. Videos of the experiments can be found at https://sites.google.com/site/probcoll.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Sample-Mean Anchored Thompson Sampling for Offline-to-Online Learning with Distribution Shift
cs.LG 2026-05 unverdicted novelty 7.0

Anchor-TS corrects bias from distribution shift in offline-to-online bandits by taking the median of an online posterior sample, a hybrid posterior sample, and the online sample mean.
Sample-Mean Anchored Thompson Sampling for Offline-to-Online Learning with Distribution Shift
cs.LG 2026-05 unverdicted novelty 7.0

Anchor-TS defines arm indices as the median of an online posterior sample, a hybrid posterior sample, and the online sample mean to correct distribution-shift bias and safely accelerate online learning with offline data.
Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift
cs.LG 2026-05 unverdicted novelty 7.0

SeqRejectron builds a stopping rule from a small set of validator policies to achieve horizon-free sample-complexity guarantees for selective imitation learning under arbitrary train-test dynamics shifts.