Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

Aurélien Garivier (IMT); Gilles Stoltz (GREGH); Pierre Ménard (IMT)

arxiv: 1602.07182 · v3 · pith:3YKEMNG7new · submitted 2016-02-23 · 🧮 math.ST · cs.LG · stat.TH


classification 🧮 math.ST · cs.LG · stat.TH

keywords regret · bounds · bandit · only · phase · problems · well-known · almost

We revisit lower bounds on the regret in the case of multi-armed bandit problems. We obtain non-asymptotic, distribution-dependent bounds and provide straightforward proofs based only on well-known properties of Kullback-Leibler divergences. These bounds show in particular that in an initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques get to the essence of the information-theoretic arguments used and are stripped of all unnecessary complications.
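
For orientation, the "well-known logarithmic growth" mentioned in the abstract is usually stated through the classical asymptotic (Lai-Robbins type) lower bound, whose constant for Bernoulli arms is the sum of gap / kl(mean, best mean) over suboptimal arms. The sketch below computes that constant; it is not taken from the paper and does not reproduce its non-asymptotic bounds. The function names and the restriction to Bernoulli arms are assumptions made for illustration.

```python
import math


def bernoulli_kl(p, q):
    """KL divergence (in nats) between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12  # clamp away from 0 and 1 to avoid log(0)
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def asymptotic_log_constant(means):
    """Classical asymptotic constant: sum over suboptimal arms of gap / kl(mean, best).

    Hypothetical helper, assuming Bernoulli arms with the given means.
    """
    best = max(means)
    return sum((best - mu) / bernoulli_kl(mu, best) for mu in means if mu < best)


def log_regime_curve(means, horizon):
    """Value of constant * ln(T); per the abstract, only meaningful for large T."""
    return asymptotic_log_constant(means) * math.log(horizon)


if __name__ == "__main__":
    arm_means = [0.4, 0.5]  # illustrative values, not from the paper
    print(asymptotic_log_constant(arm_means))
    print(log_regime_curve(arm_means, 10_000))
```

Small gaps make the Kullback-Leibler term tiny and the constant large, which gives some intuition for why the logarithmic description can only take over after a long initial phase.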

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Budgeted Online Influence Maximization
cs.LG 2026-04 unverdicted novelty 7.0

A new algorithm for online influence maximization under a total budget constraint using the independent cascade model and edge-level semi-bandit feedback, with improved regret bounds for both budgeted and cardinality ...