Bandits with Side Observations: Bounded vs. Logarithmic Regret

Evrard Garcelon; Rémy Degenne; Vianney Perchet

arxiv: 1807.03558 · v1 · pith:GERXDLFHnew · submitted 2018-07-10 · 💻 cs.LG · stat.ML


classification 💻 cs.LG stat.ML

keywords epsilon · regret · time · agent · algorithm · bounded · prove · bandit


We consider the classical stochastic multi-armed bandit problem in which, from time to time and roughly with frequency $\epsilon$, the agent gathers an extra observation for free. We prove that, no matter how small $\epsilon$ is, the agent can ensure a regret uniformly bounded in time. More precisely, we construct an algorithm with a regret smaller than $\sum_i \frac{\log(1/\epsilon)}{\Delta_i}$, up to a multiplicative constant and loglog terms. We also prove a matching lower bound, stating that no reasonable algorithm can outperform this quantity.
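To make the setting concrete, the following is a minimal simulation sketch and not the algorithm constructed in the paper. It rests on several assumptions that the abstract does not state: rewards are Bernoulli, a free observation arrives independently with probability `eps` at each round, and the agent may choose which arm the free observation samples (here, the least-observed arm). The paid pull simply plays the empirically best arm. The function name `simulate` and all of its parameters are hypothetical.

```python
import random


def simulate(means, eps, horizon, seed=0):
    """Simulate a Bernoulli bandit with occasional free observations.

    Illustrative policy (an assumption, not the paper's algorithm):
    paid pulls are greedy, free observations go to the least-observed arm.
    Returns (pseudo-regret, paid pulls per arm, number of free observations).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # observations per arm, paid and free combined
    sums = [0.0] * k   # sum of observed rewards per arm
    pulls = [0] * k    # paid pulls only; these are what incur regret
    free_obs = 0
    best = max(means)
    regret = 0.0

    def observe(arm):
        # Draw one Bernoulli reward from the arm and record it.
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    for t in range(horizon):
        if t < k:
            arm = t  # initialisation: pull each arm once
        else:
            # Greedy paid pull on the empirical means.
            arm = max(range(k), key=lambda i: sums[i] / counts[i])
        observe(arm)
        pulls[arm] += 1
        regret += best - means[arm]

        # With probability eps, one extra observation is gathered for free:
        # it updates the estimates but adds nothing to the regret.
        if rng.random() < eps:
            observe(min(range(k), key=lambda i: counts[i]))
            free_obs += 1

    return regret, pulls, free_obs


if __name__ == "__main__":
    for eps in (0.0, 0.01, 0.1):
        regret, pulls, free_obs = simulate([0.5, 0.4, 0.3], eps, 20000, seed=1)
        print(f"eps={eps}: regret={regret:.1f}, pulls={pulls}, free={free_obs}")
```

Running the script for several values of `eps` gives a rough feel for how free observations can take over the exploration that would otherwise cost regret. A single seeded run is only an illustration and does not verify the bound stated in the abstract.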
