Quantile Reinforcement Learning
classification
💻 cs.LG
cs.AI
keywords
criterion, learning, reinforcement, evaluate, algorithm, alternative, always, approximation
In reinforcement learning, the standard criterion for evaluating a policy in a state is the expectation of the (discounted) sum of rewards. However, this criterion may not always be suitable, so we consider an alternative criterion based on the notion of quantiles. For episodic reinforcement learning problems, we propose an algorithm based on stochastic approximation with two timescales. We evaluate our proposal on a simple model of the TV show Who Wants to Be a Millionaire.
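To make the quantile criterion concrete, here is a minimal sketch of the classical stochastic-approximation (Robbins-Monro) update for estimating a quantile from a stream of samples. It is not the paper's two-timescale algorithm: there is no policy update, and the function name `estimate_quantile`, the step-size schedule, and the uniform stand-in for episode returns are all assumptions made for illustration.

```python
import random


def estimate_quantile(samples, tau, q0=0.0, step=lambda n: 1.0 / n ** 0.75):
    """Track the tau-quantile of a sample stream with a Robbins-Monro update.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    q = q0
    for n, x in enumerate(samples, start=1):
        # Move the estimate up by tau * step when the sample lies above it,
        # and down by (1 - tau) * step when the sample lies at or below it.
        indicator = 1.0 if x <= q else 0.0
        q += step(n) * (tau - indicator)
    return q


# Stand-in for episode returns: uniform draws on [0, 1], whose true
# 0.25-quantile is 0.25 (an assumption chosen so the target is known).
rng = random.Random(0)
returns = [rng.uniform(0.0, 1.0) for _ in range(200_000)]
q_hat = estimate_quantile(returns, tau=0.25)
print(f"estimated 0.25-quantile: {q_hat:.3f}")
```

In a two-timescale scheme, an estimate of this kind would typically be updated at a different rate from the policy parameters; how the paper couples the two is not stated in the abstract.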
Forward citations
Cited by 1 Pith paper
A Scheme for Dynamic Risk-Sensitive Sequential Decision Making
A neural network scheme approximates risk and policies for dynamic risk-sensitive MDPs using synthetic data based on mean-variance risk estimation.