
arXiv: 1903.01004 · v3 · pith:4KHGNROV · submitted 2019-03-03 · cs.LG · cs.AI · stat.ML

Budgeted Reinforcement Learning in Continuous State Space

keywords: budgeted, applications, bmdp, bmdps, continuous, decision, dynamics, learning
Abstract

A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented in the shape of a cost signal constrained to lie below an adjustable threshold. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state of the art to environments with continuous state spaces and unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs. We validate our approach on two simulated applications: spoken dialogue and autonomous driving.
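To make "a cost signal constrained to lie below an adjustable threshold" concrete, the sketch below shows one ingredient a budgeted greedy step could use: given per-action estimates of expected reward and expected cost (hypothetical inputs `q_r` and `q_c`) and a budget `beta`, it picks the randomized choice over actions that maximizes expected reward while keeping expected cost at or below `beta`. This is a minimal illustration under those assumptions, not the paper's Budgeted Bellman Optimality operator, and the function name `budgeted_greedy` is hypothetical. It relies on a standard linear-programming fact: with a single cost constraint, an optimal mixture needs at most two actions.

```python
from itertools import combinations


def budgeted_greedy(q_r, q_c, beta):
    """Best mixture of at most two actions with expected cost <= beta.

    q_r, q_c: per-action expected reward and expected cost (hypothetical
    value estimates). Returns (expected_reward, expected_cost, probs),
    where probs maps action index -> probability, or None if no action
    fits within the budget.
    """
    best = None
    # Case 1: a single action that already satisfies the budget.
    for a in range(len(q_r)):
        if q_c[a] <= beta:
            cand = (q_r[a], q_c[a], {a: 1.0})
            if best is None or cand[0] > best[0]:
                best = cand
    # Case 2: mix a cheaper and a costlier action so that the expected
    # cost is exactly beta (the cost constraint is active).
    for a, b in combinations(range(len(q_r)), 2):
        lo, hi = (a, b) if q_c[a] <= q_c[b] else (b, a)
        if q_c[lo] <= beta < q_c[hi]:
            p_hi = (beta - q_c[lo]) / (q_c[hi] - q_c[lo])
            reward = (1 - p_hi) * q_r[lo] + p_hi * q_r[hi]
            cand = (reward, beta, {lo: 1 - p_hi, hi: p_hi})
            if best is None or cand[0] > best[0]:
                best = cand
    return best


if __name__ == "__main__":
    # Three actions: safe/no reward, moderate, risky/high reward.
    print(budgeted_greedy([0.0, 1.0, 3.0], [0.0, 0.5, 2.0], beta=1.0))
```

With a budget of 1.0, the risky action alone is infeasible, but mixing the moderate action (probability 2/3) with the risky one (probability 1/3) spends the whole budget and yields an expected reward of 5/3, more than any single feasible action. Raising or lowering `beta` shifts this trade-off, which is the sense in which the threshold is adjustable.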
