Risk-Averse Multi-Armed Bandit Problems under Mean-Variance Measure

Qing Zhao; Sattar Vakili

arxiv: 1604.05257 · v3 · pith:5ICT36MXnew · submitted 2016-04-18 · 💻 cs.LG

keywords: measure, bandit, mean-variance, multi-armed, problems, under, lower, omega

Multi-armed bandit (MAB) problems have been studied mainly under the measure of expected total reward accrued over a horizon of length $T$. In this paper, we address the issue of risk in MAB problems and develop parallel results under the measure of mean-variance, a commonly adopted risk measure in economics and mathematical finance. We show that the model-specific regret and the model-independent regret, both defined in terms of the mean-variance of the reward process, are lower bounded by $\Omega(\log T)$ and $\Omega(T^{2/3})$, respectively. We then show that variations of the Upper Confidence Bound (UCB) policy and the Deterministic Sequencing of Exploration and Exploitation (DSEE) policy, both developed for the classic risk-neutral MAB problem, achieve these lower bounds.
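The abstract names the mean-variance measure and a UCB-style policy but gives no formulas. The following is a minimal sketch under stated assumptions, not the paper's policy. It assumes the common form MV = σ² − ρμ, where smaller is better and ρ is a risk-tolerance coefficient; the paper's exact normalization may differ. It also ranks arms by a lower confidence bound on each arm's empirical MV, using an illustrative width c·sqrt(log t / n). The names `mean_variance`, `mv_lcb`, `rho`, and `c` are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def mean_variance(rewards: Sequence[float], rho: float) -> float:
    """Empirical mean-variance: sample variance minus rho times sample mean.

    ASSUMPTION: MV = sigma^2 - rho * mu (lower is better); the paper's
    exact definition and normalization may differ.
    """
    n = len(rewards)
    mu = sum(rewards) / n
    var = sum((r - mu) ** 2 for r in rewards) / n
    return var - rho * mu


def mv_lcb(arms: List[Callable[[], float]], horizon: int, rho: float,
           c: float = 1.0) -> Tuple[List[int], List[float]]:
    """Hypothetical UCB-style policy for mean-variance.

    Because a smaller MV is better, the optimistic index is a LOWER
    confidence bound. The width c * sqrt(log t / n) is an illustrative
    choice, not the constant used in the paper.
    """
    k = len(arms)
    counts = [0] * k
    sums = [0.0] * k
    sq_sums = [0.0] * k
    choices: List[int] = []
    observed: List[float] = []

    def index(j: int, t: int) -> float:
        mu = sums[j] / counts[j]
        var = max(sq_sums[j] / counts[j] - mu * mu, 0.0)  # clamp rounding error
        bonus = c * math.sqrt(math.log(t) / counts[j])  # exploration width
        return var - rho * mu - bonus  # optimistic (low) MV estimate

    for t in range(1, horizon + 1):
        if t <= k:
            i = t - 1  # play every arm once to initialize statistics
        else:
            i = min(range(k), key=lambda j: index(j, t))
        r = arms[i]()
        counts[i] += 1
        sums[i] += r
        sq_sums[i] += r * r
        choices.append(i)
        observed.append(r)
    return choices, observed


# Usage: two Gaussian arms with equal mean but very different variance.
rng = random.Random(0)
arms = [lambda: rng.gauss(1.0, 0.1), lambda: rng.gauss(1.0, 2.0)]
choices, observed = mv_lcb(arms, horizon=2000, rho=1.0)
print("pulls per arm:", choices.count(0), choices.count(1))
print("MV of collected rewards:", mean_variance(observed, rho=1.0))
```

Under this assumed measure the two arms tie on expected reward. A risk-neutral policy would have no reason to prefer either one, but the low-variance arm has the smaller MV, so the sketch should pull it far more often. The last line evaluates the mean-variance of the whole collected reward sequence. That quantity is closer in spirit to the process-level measure the abstract refers to than the per-arm indices used for selection.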
