Stochastic Multi-armed Bandits in Constant Space

David Liau; Eric Price; Ger Yang; Zhao Song

arxiv: 1712.09007 · v2 · pith:ZTCIIDU2new · submitted 2017-12-25 · 💻 cs.DS · cs.LG · stat.ML

keywords: delta, space, arms, best, frac, record, regret, stochastic

We consider the stochastic bandit problem in the sublinear space setting, where one cannot record the win-loss record for all $K$ arms. We give an algorithm using $O(1)$ words of space with regret \[ \sum_{i=1}^{K}\frac{1}{\Delta_i}\log \frac{\Delta_i}{\Delta}\log T \] where $\Delta_i$ is the gap between the best arm and arm $i$ and $\Delta$ is the gap between the best and the second-best arms. If the rewards are bounded away from $0$ and $1$, this is within an $O(\log 1/\Delta)$ factor of the optimum regret possible without space constraints.
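
To make the shape of the bound concrete, the following is a minimal sketch that evaluates the expression from the abstract for a given vector of arm means. It is not the paper's algorithm, only arithmetic on the stated formula. Several things here are assumptions that the abstract does not spell out: the function names are hypothetical, the best arm is taken to be unique and is left out of the sum (its gap is zero), logarithms are natural, and all constant factors are ignored. The comparison quantity $\sum_i \frac{1}{\Delta_i}\log T$ is used as a stand-in for the order of the unconstrained optimum when rewards are bounded away from $0$ and $1$.

```python
import math


def suboptimal_gaps(means):
    """Return the gaps Delta_i of all arms except the best one, in ascending order.

    Assumes a unique best arm, so every returned gap is strictly positive.
    """
    best = max(means)
    gaps = sorted(best - m for m in means)[1:]  # drop the best arm's zero gap
    if not gaps or gaps[0] <= 0:
        raise ValueError("need at least two arms and a unique best arm")
    return gaps


def constant_space_bound(means, horizon):
    """Evaluate sum_i (1/Delta_i) * log(Delta_i/Delta) * log(T), constants ignored."""
    gaps = suboptimal_gaps(means)
    delta = gaps[0]  # gap between the best and the second-best arm
    return sum(math.log(d / delta) / d for d in gaps) * math.log(horizon)


def unconstrained_reference(means, horizon):
    """Evaluate sum_i (1/Delta_i) * log(T), the assumed order of the optimum."""
    gaps = suboptimal_gaps(means)
    return sum(1.0 / d for d in gaps) * math.log(horizon)


if __name__ == "__main__":
    means = [0.9, 0.8, 0.5, 0.3]  # illustrative values, not from the paper
    horizon = 10_000
    ours = constant_space_bound(means, horizon)
    ref = unconstrained_reference(means, horizon)
    print(f"constant-space expression: {ours:.2f}")
    print(f"unconstrained reference:   {ref:.2f}")
    print(f"ratio:                     {ours / ref:.3f}")
```

Because every gap satisfies $\Delta_i \le 1$ for rewards in $[0,1]$, each term's extra factor $\log\frac{\Delta_i}{\Delta}$ is at most $\log\frac{1}{\Delta}$, which is where the $O(\log 1/\Delta)$ factor in the abstract comes from. Taken literally, the second-best arm contributes zero to the expression; the abstract presumably hides lower-order terms there.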
