Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control

Cheng Jie; Csaba Szepesvári; Michael Fu; Prashanth L.A.; Steve Marcus

arxiv: 1506.02632 · v3 · pith:IJDQISY3 · submitted 2015-06-08 · cs.LG · math.OC

keywords: control · algorithms · cpt-value · cumulative · distribution · empirical · estimation · idea

Cumulative prospect theory (CPT) is known to model human decisions well, with substantial empirical evidence supporting this claim. CPT works by distorting probabilities and is more general than the classic expected utility and coherent risk measures. We bring this idea to a risk-sensitive reinforcement learning (RL) setting and design algorithms for both estimation and control. The RL setting presents two particular challenges when CPT is applied: estimating the CPT objective requires estimating the entire distribution of the value function, and control requires finding a randomized optimal policy. The estimation scheme that we propose uses the empirical distribution to estimate the CPT-value of a random variable. We then use this scheme in the inner loop of a CPT-value optimization procedure that is based on the well-known simulation optimization idea of simultaneous perturbation stochastic approximation (SPSA). We provide theoretical convergence guarantees for all the proposed algorithms and also illustrate the usefulness of CPT-based criteria in a traffic signal control application.
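The empirical estimation scheme and its use inside an SPSA loop can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation. Sorting n samples X_[1] <= ... <= X_[n] and plugging the empirical distribution into the CPT integrals gives C_n = Σ_i u+(X_[i])·(w+((n+1-i)/n) - w+((n-i)/n)) - Σ_i u-(X_[i])·(w-(i/n) - w-((i-1)/n)), which `cpt_value` computes. The utility and weight functions use the Tversky-Kahneman (1992) functional forms with their published parameters (σ = 0.88, λ = 2.25, η = 0.61 for gains and 0.69 for losses), chosen here only for illustration. The names `cpt_value`, `spsa_gradient` and `cpt_spsa`, the fixed inner-loop batch size `m`, the gain sequences and the box projection are all hypothetical choices made for this sketch.

```python
import random

# Illustrative Tversky-Kahneman (1992) parameters; an assumption of this sketch.
SIGMA, LAMBDA, ETA_GAIN, ETA_LOSS = 0.88, 2.25, 0.61, 0.69


def tk_weight(p, eta):
    """Inverse-S probability weighting w(p) = p^eta / (p^eta + (1-p)^eta)^(1/eta)."""
    return p ** eta / (p ** eta + (1.0 - p) ** eta) ** (1.0 / eta)


def u_plus_tk(x):
    """Utility of gains; zero for losses."""
    return x ** SIGMA if x > 0 else 0.0


def u_minus_tk(x):
    """Disutility of losses, scaled by loss aversion; zero for gains."""
    return LAMBDA * (-x) ** SIGMA if x < 0 else 0.0


def w_gain(p):
    return tk_weight(p, ETA_GAIN)


def w_loss(p):
    return tk_weight(p, ETA_LOSS)


def cpt_value(samples, u_plus=u_plus_tk, u_minus=u_minus_tk,
              w_plus=w_gain, w_minus=w_loss):
    """Empirical CPT-value estimate from i.i.d. samples of a random variable."""
    xs = sorted(samples)  # X_[1] <= X_[2] <= ... <= X_[n]
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    # Gains: the i-th smallest sample gets weight w+((n+1-i)/n) - w+((n-i)/n).
    gains = sum(u_plus(x) * (w_plus((n + 1 - i) / n) - w_plus((n - i) / n))
                for i, x in enumerate(xs, start=1))
    # Losses: the i-th smallest sample (the i-th largest loss) gets w-(i/n) - w-((i-1)/n).
    losses = sum(u_minus(x) * (w_minus(i / n) - w_minus((i - 1) / n))
                 for i, x in enumerate(xs, start=1))
    return gains - losses


def spsa_gradient(objective, theta, delta, rng):
    """Two-evaluation SPSA gradient estimate with Rademacher (+/-1) perturbations."""
    signs = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + delta * s for t, s in zip(theta, signs)]
    minus = [t - delta * s for t, s in zip(theta, signs)]
    diff = objective(plus) - objective(minus)
    return [diff / (2.0 * delta * s) for s in signs]


def cpt_spsa(sample_fn, theta0, iters=200, m=500, a=0.5, c=0.2,
             lo=-1.0, hi=1.0, seed=0):
    """Projected SPSA ascent on the estimated CPT-value of sample_fn(theta, m, rng)."""
    rng = random.Random(seed)
    theta = list(theta0)

    def objective(th):
        # Inner loop: a fresh batch of m simulated outcomes, then the empirical estimate.
        return cpt_value(sample_fn(th, m, rng))

    for k in range(iters):
        a_k = a / (k + 1)            # step size
        c_k = c / (k + 1) ** 0.101   # perturbation size
        g = spsa_gradient(objective, theta, c_k, rng)
        # Ascent step, then clip each coordinate back into [lo, hi].
        theta = [min(hi, max(lo, t + a_k * gi)) for t, gi in zip(theta, g)]
    return theta
```

With identity utilities and identity weights, `cpt_value` reduces to the sample mean, which makes a quick sanity check; with the default Tversky-Kahneman forms, a loss counts more heavily than a gain of the same size. As a toy usage example, `cpt_spsa(lambda th, m, rng: [1.0 - (th[0] - 0.3) ** 2 + 0.1 * rng.gauss(0.0, 1.0) for _ in range(m)], [0.0])` uses a hypothetical outcome family whose distribution shifts upward as θ approaches 0.3, so the iterate should drift toward 0.3. Keeping `m` fixed is a simplification; a growing inner-loop batch size reduces the estimation noise that SPSA sees in later iterations.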

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Reinforcement Learning for Exponential Utility: Algorithms and Convergence in Discounted MDPs
cs.LG 2026-05 unverdicted novelty 7.0

Derives contraction-based Q-value extensions for exponential utility and proves almost-sure convergence of two-timescale and one-timescale model-free algorithms in discounted MDPs.