pith. the verified trust layer for science. sign in

arxiv: 1606.04753 · v2 · pith:JDYAJTFSnew · submitted 2016-06-15 · 💻 cs.LG · cs.AI· cs.RO· stat.ML

Safe Exploration in Finite Markov Decision Processes with Gaussian Processes

classification 💻 cs.LG cs.AIcs.ROstat.ML
keywords safetyexploringconstraintprocessessafeunknownactionsalgorithm
0
0 comments X p. Extension
Add this Pith Number to your LaTeX paper What is a Pith Number?
\usepackage{pith}
\pithnumber{JDYAJTFS}

Prints a linked pith:JDYAJTFS badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more

read the original abstract

In classical reinforcement learning, when exploring an environment, agents accept arbitrary short term loss for long term gain. This is infeasible for safety critical applications, such as robotics, where even a single unsafe action may cause system failure. In this paper, we address the problem of safely exploring finite Markov decision processes (MDP). We define safety in terms of an, a priori unknown, safety constraint that depends on states and actions. We aim to explore the MDP under this constraint, assuming that the unknown function satisfies regularity conditions expressed via a Gaussian process prior. We develop a novel algorithm for this task and prove that it is able to completely explore the safely reachable part of the MDP without violating the safety constraint. To achieve this, it cautiously explores safe states and actions in order to gain statistical confidence about the safety of unvisited state-action pairs from noisy observations collected while navigating the environment. Moreover, the algorithm explicitly considers reachability when exploring the MDP, ensuring that it does not get stuck in any state with no safe way out. We demonstrate our method on digital terrain models for the task of exploring an unknown map with a rover.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Concrete Problems in AI Safety

    cs.AI 2016-06 accept novelty 7.0

    The paper categorizes five concrete AI safety problems arising from flawed objectives, costly evaluation, and learning dynamics.