Explore First, Exploit Next: The True Shape of Regret in Bandit Problems
read the original abstract
We revisit lower bounds on the regret in the case of multi-armed bandit problems. We obtain non-asymptotic, distribution-dependent bounds and provide straightforward proofs based only on well-known properties of Kullback-Leibler divergences. These bounds show in particular that in an initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used and they are deprived of all unnecessary complications.
This paper has not been read by Pith yet.
Forward citations
Cited by 1 Pith paper
-
Budgeted Online Influence Maximization
A new algorithm for online influence maximization under a total budget constraint using the independent cascade model and edge-level semi-bandit feedback, with improved regret bounds for both budgeted and cardinality ...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.