Uncertainty-Aware Reinforcement Learning for Collision Avoidance

Adam Villaflor; Gregory Kahn; Pieter Abbeel; Sergey Levine; Vitchyr Pong

arxiv: 1702.01182 · v1 · pith:MPD2SVSYnew · submitted 2017-02-03 · 💻 cs.LG · cs.RO

Gregory Kahn, Adam Villaflor, Vitchyr Pong, Pieter Abbeel, Sergey Levine

keywords: learning, collisions, robot, training, avoidance, collision, must, reinforcement

Abstract

Reinforcement learning can enable complex, adaptive behavior to be learned automatically for autonomous robotic platforms. However, practical deployment of reinforcement learning methods must contend with the fact that the training process itself can be unsafe for the robot. In this paper, we consider the specific case of a mobile robot learning to navigate an a priori unknown environment while avoiding collisions. In order to learn collision avoidance, the robot must experience collisions at training time. However, high-speed collisions, even at training time, could damage the robot. A successful learning method must therefore proceed cautiously, experiencing only low-speed collisions until it gains confidence. To this end, we present an uncertainty-aware model-based learning algorithm that estimates the probability of collision together with a statistical estimate of uncertainty. By formulating an uncertainty-dependent cost function, we show that the algorithm naturally chooses to proceed cautiously in unfamiliar environments, and increases the velocity of the robot in settings where it has high confidence. Our predictive model is based on bootstrapped neural networks using dropout, allowing it to process raw sensory inputs from high-bandwidth sensors such as cameras. Our experimental evaluation demonstrates that our method effectively minimizes dangerous collisions at training time in an obstacle avoidance task for a simulated and real-world quadrotor, and a real-world RC car. Videos of the experiments can be found at https://sites.google.com/site/probcoll.
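
The uncertainty-dependent cost described in the abstract can be illustrated with a minimal sketch. It assumes each candidate speed comes with a set of collision-probability predictions, one per bootstrapped network and dropout mask, and scores each speed with a hypothetical cost: a pessimistic collision estimate (mean plus a multiple of the standard deviation), scaled by speed squared so that fast collisions are penalized most, minus a reward for speed. The functional form, the weights, and the names `collision_stats`, `action_cost`, and `choose_speed` are illustrative assumptions, not the paper's exact formulation.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def collision_stats(samples: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard deviation of collision-probability predictions.

    Assumption: `samples` holds one prediction per (bootstrap member,
    dropout mask) pair for a single candidate action.
    """
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)
    return mean, std


def action_cost(speed: float, samples: Sequence[float],
                std_weight: float = 2.0,
                collision_weight: float = 10.0,
                speed_weight: float = 1.0) -> float:
    """Hypothetical uncertainty-aware cost; all weights are assumptions.

    The pessimistic collision estimate (mean + std_weight * std, capped at 1)
    is scaled by speed**2, so an uncertain, high-speed collision costs the
    most; the linear speed term rewards making progress.
    """
    mean, std = collision_stats(samples)
    pessimistic = min(1.0, mean + std_weight * std)
    return collision_weight * pessimistic * speed ** 2 - speed_weight * speed


def choose_speed(candidates: Dict[float, List[float]], **weights: float) -> float:
    """Return the candidate speed with the lowest hypothetical cost."""
    return min(candidates, key=lambda s: action_cost(s, candidates[s], **weights))
```

When every candidate has the same mean predicted collision probability, the set whose ensemble members disagree more (larger standard deviation) leads `choose_speed` to pick a slower speed. This mirrors the behavior the abstract describes: cautious in unfamiliar settings, faster where the model is confident. Using mean plus a multiple of the standard deviation is one simple way to make the cost uncertainty-dependent; other risk measures could be substituted.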

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Sample-Mean Anchored Thompson Sampling for Offline-to-Online Learning with Distribution Shift
cs.LG 2026-05 unverdicted novelty 7.0

Anchor-TS corrects bias from distribution shift in offline-to-online bandits by taking the median of an online posterior sample, a hybrid posterior sample, and the online sample mean.
Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift
cs.LG 2026-05 unverdicted novelty 7.0

SeqRejectron builds a stopping rule from a small set of validator policies to achieve horizon-free sample-complexity guarantees for selective imitation learning under arbitrary train-test dynamics shifts.
InFeR: Informed Failure Resilience in Learned Visual Navigation Control
cs.RO 2025-10 unverdicted novelty 6.0

InFeR retrains imitation learning policies with a VIB loss for OOD failure detection and applies Grad-CAM to localize failure sources, enabling heuristic recovery in visual navigation without additional demonstrations.
Generalizing from a few environments in safety-critical reinforcement learning
cs.LG 2019-07 unverdicted novelty 6.0

RL agents fail dangerously on unseen environments; ensembles reduce catastrophes in gridworld but not CoinRun, with uncertainty enabling intervention prediction.
Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
cs.LG 2019-06 unverdicted novelty 6.0

Develops Way Off-Policy batch RL algorithms with pre-trained model priors, KL-control, and dropout uncertainty estimates to learn implicit rewards from offline human dialog data, reporting live deployment gains over p...
SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning
cs.RO 2025-03 unverdicted novelty 5.0

SafeVLA applies constrained reinforcement learning via CMDP min-max optimization to VLAs, cutting safety violation costs by 83.58% while preserving task success on long-horizon mobile manipulation tasks.