pith. sign in

arxiv: 1602.07182 · v3 · pith:3YKEMNG7new · submitted 2016-02-23 · 🧮 math.ST · cs.LG· stat.TH

Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

classification 🧮 math.ST cs.LGstat.TH
keywords regretboundsbanditonlyphaseproblemswell-knownalmost
0
0 comments X
read the original abstract

We revisit lower bounds on the regret in the case of multi-armed bandit problems. We obtain non-asymptotic, distribution-dependent bounds and provide straightforward proofs based only on well-known properties of Kullback-Leibler divergences. These bounds show in particular that in an initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used and they are deprived of all unnecessary complications.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Budgeted Online Influence Maximization

    cs.LG 2026-04 unverdicted novelty 7.0

    A new algorithm for online influence maximization under a total budget constraint using the independent cascade model and edge-level semi-bandit feedback, with improved regret bounds for both budgeted and cardinality ...