Budgeted Reinforcement Learning in Continuous State Space

Edouard Leurent; Nicolas Carrara; Odalric-Ambrym Maillard; Olivier Pietquin; Romain Laroche; Tanguy Urvoy

arxiv: 1903.01004 · v3 · pith:4KHGNROVnew · submitted 2019-03-03 · cs.LG · cs.AI · stat.ML



keywords: budgeted, applications, bmdp, bmdps, continuous, decision, dynamics, learning


A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented as a cost signal constrained to lie below an adjustable threshold. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state of the art to environments with continuous state spaces and unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs. We validate our approach on two simulated applications: spoken dialogue and autonomous driving.
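
As a rough illustration of the constraint described in the abstract, the budgeted problem can be sketched as a constrained optimization over policies. The notation here is an assumption and is not taken from the abstract: $\pi$ is a policy, $r_t$ and $c_t$ are the reward and cost signals at step $t$, $\gamma \in [0, 1)$ is a discount factor, and $\beta$ is the adjustable cost threshold (the budget). The use of discounted sums is likewise an assumption.

```latex
% Sketch only: symbols and the discounted-sum form are assumptions,
% not notation quoted from the abstract.
\begin{equation*}
  \max_{\pi} \;
  \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_t \;\middle|\; \pi \right]
  \quad \text{subject to} \quad
  \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} c_t \;\middle|\; \pi \right]
  \le \beta
\end{equation*}
```

Under this reading, "adjustable" means that $\beta$ is an input to the problem rather than a fixed constant, so a solution is expected to cover a range of budgets rather than a single one.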
