pith. machine review for the scientific record. sign in

arxiv: 1512.04152 · v1 · submitted 2015-12-14 · 💻 cs.LG · cs.GT· stat.ML

Recognition: unknown

Fighting Bandits with a New Kind of Smoothness

Authors on Pith no claims yet
classification 💻 cs.LG cs.GTstat.ML
keywords perturbationregretsqrtachieveachievesadversarialalgorithmsanalysis
0
0 comments X
read the original abstract

We define a novel family of algorithms for the adversarial multi-armed bandit problem, and provide a simple analysis technique based on convex smoothing. We prove two main results. First, we show that regularization via the \emph{Tsallis entropy}, which includes EXP3 as a special case, achieves the $\Theta(\sqrt{TN})$ minimax regret. Second, we show that a wide class of perturbation methods achieve a near-optimal regret as low as $O(\sqrt{TN \log N})$ if the perturbation distribution has a bounded hazard rate. For example, the Gumbel, Weibull, Frechet, Pareto, and Gamma distributions all satisfy this key property.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.