Finite-time Regret Bound of a Bandit Algorithm for the Semi-bounded Support Model

Akimichi Takemura; Junya Honda

arxiv: 1202.2277 · v2 · pith:6NU55XKTnew · submitted 2012-02-10 · 🧮 math.ST · math.PR· stat.TH

Finite-time Regret Bound of a Bandit Algorithm for the Semi-bounded Support Model

Junya Honda , Akimichi Takemura This is my paper

classification 🧮 math.ST math.PRstat.TH

keywords boundregretasymptoticbanditfinitefinite-timeformmodel

0 comments

read the original abstract

In this paper we consider stochastic multiarmed bandit problems. Recently a policy, DMED, is proposed and proved to achieve the asymptotic bound for the model that each reward distribution is supported in a known bounded interval, e.g. [0,1]. However, the derived regret bound is described in an asymptotic form and the performance in finite time has been unknown. We inspect this policy and derive a finite-time regret bound by refining large deviation probabilities to a simple finite form. Further, this observation reveals that the assumption on the lower-boundedness of the support is not essential and can be replaced with a weaker one, the existence of the moment generating function.

This paper has not been read by Pith yet.

Finite-time Regret Bound of a Bandit Algorithm for the Semi-bounded Support Model

discussion (0)