Quantile Reinforcement Learning

Hugo Gilbert; Paul Weng

arxiv: 1611.00862 · v1 · pith:OAGXOCKFnew · submitted 2016-11-03 · 💻 cs.LG · cs.AI



keywords: criterion, learning, reinforcement, evaluate, algorithm, alternative, always, approximation


Abstract

In reinforcement learning, the standard criterion for evaluating policies in a state is the expectation of the (discounted) sum of rewards. However, this criterion may not always be suitable, so we consider an alternative criterion based on the notion of quantiles. In the case of episodic reinforcement learning problems, we propose an algorithm based on stochastic approximation with two timescales. We evaluate our proposal on a simple model of the TV show Who Wants to Be a Millionaire.
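To make the quantile criterion concrete, the sketch below estimates a quantile of a stream of episode returns with a standard Robbins-Monro style stochastic approximation update. It is an illustration only and not the authors' two-timescale algorithm: it runs on a single timescale, does no policy learning, and the function name `estimate_quantile`, the step-size schedule, and the Gaussian returns are all assumptions made for this example.

```python
import random


def estimate_quantile(samples, tau, q0=0.0, step=lambda n: n ** -0.75):
    """Estimate the tau-quantile of a sample stream by stochastic approximation.

    Hypothetical helper for illustration; the step-size schedule is an
    assumed choice, not one taken from the paper.
    """
    q = q0
    for n, x in enumerate(samples, start=1):
        # Indicator that the observed return falls at or below the estimate.
        below = 1.0 if x <= q else 0.0
        # Move up by tau * step when x > q, down by (1 - tau) * step otherwise.
        # The update is stationary where P(x <= q) = tau, i.e. at the quantile.
        q += step(n) * (tau - below)
    return q


# Synthetic episode returns (assumed Gaussian purely for the demonstration).
rng = random.Random(0)
returns = [rng.gauss(10.0, 2.0) for _ in range(200_000)]

median_estimate = estimate_quantile(returns, tau=0.5)
lower_estimate = estimate_quantile(returns, tau=0.1)
```

For this symmetric distribution the median estimate lands near the mean, while the 0.1-quantile estimate sits well below it. That gap is what a quantile-based criterion exposes and an expectation-based one hides.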



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Scheme for Dynamic Risk-Sensitive Sequential Decision Making
cs.AI · 2019-07 · unverdicted · novelty 3.0

A neural network scheme approximates risk and policies for dynamic risk-sensitive MDPs using synthetic data based on mean-variance risk estimation.