
arxiv: 1611.00862 · v1 · pith:OAGXOCKFnew · submitted 2016-11-03 · 💻 cs.LG · cs.AI

Quantile Reinforcement Learning

classification 💻 cs.LG cs.AI
keywords: criterion, learning, reinforcement, evaluate, algorithm, alternative, always, approximation

In reinforcement learning, the standard criterion for evaluating policies in a state is the expectation of the (discounted) sum of rewards. However, this criterion may not always be suitable, so we consider an alternative criterion based on the notion of quantiles. For episodic reinforcement learning problems, we propose an algorithm based on stochastic approximation with two timescales. We evaluate our proposal on a simple model of the TV show Who Wants to Be a Millionaire.
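To make the quantile criterion concrete, the sketch below estimates a quantile of a stream of episode returns with a standard Robbins-Monro style stochastic approximation update. It is an illustration only, not the algorithm proposed in the paper: it tracks a quantile for a fixed return distribution on a single timescale and does no policy learning. The function name `estimate_quantile`, the step-size schedule, and the Gaussian stand-in for returns are all assumptions made for this example.

```python
import random


def estimate_quantile(samples, tau, q0=0.0, exponent=0.75):
    """Track the tau-quantile of a sample stream by stochastic approximation.

    Hypothetical helper for illustration. After each sample x the estimate q
    moves up by step * tau if x > q, and down by step * (1 - tau) otherwise,
    so it settles where P(X <= q) is approximately tau.
    """
    q = q0
    for n, x in enumerate(samples, start=1):
        step = 1.0 / n ** exponent  # assumed decreasing step-size schedule
        indicator = 1.0 if x <= q else 0.0
        q += step * (tau - indicator)
    return q


# Assumed stand-in for episode returns: standard normal draws.
rng = random.Random(0)
returns = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

median_estimate = estimate_quantile(returns, tau=0.5)
upper_estimate = estimate_quantile(returns, tau=0.9)
print(median_estimate, upper_estimate)
```

For a standard normal distribution the true median is 0 and the 0.9-quantile is about 1.28, so both estimates should land near those values. A two-timescale scheme of the kind the abstract mentions would presumably run an update like this alongside a second update that uses a different step-size schedule, but the details here are not taken from the paper.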


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. A Scheme for Dynamic Risk-Sensitive Sequential Decision Making

    cs.AI 2019-07 unverdicted novelty 3.0

    A neural network scheme approximates risk and policies for dynamic risk-sensitive MDPs using synthetic data based on mean-variance risk estimation.