Distributed Bandit Learning: Near-Optimal Regret with Efficient Communication

Jiachen Hu; Liwei Wang; Xiaoyu Chen; Yuanhao Wang

arxiv: 1904.06309 · v2 · pith:OMYSCLOCnew · submitted 2019-04-12 · 💻 cs.LG · stat.ML

Distributed Bandit Learning: Near-Optimal Regret with Efficient Communication

Yuanhao Wang , Jiachen Hu , Xiaoyu Chen , Liwei Wang This is my paper

classification 💻 cs.LG stat.ML

keywords communicationregretcostdistributednear-optimalbanditslogarithmiconly

0 comments

read the original abstract

We study the problem of regret minimization for distributed bandits learning, in which $M$ agents work collaboratively to minimize their total regret under the coordination of a central server. Our goal is to design communication protocols with near-optimal regret and little communication cost, which is measured by the total amount of transmitted data. For distributed multi-armed bandits, we propose a protocol with near-optimal regret and only $O(M\log(MK))$ communication cost, where $K$ is the number of arms. The communication cost is independent of the time horizon $T$, has only logarithmic dependence on the number of arms, and matches the lower bound except for a logarithmic factor. For distributed $d$-dimensional linear bandits, we propose a protocol that achieves near-optimal regret and has communication cost of order $\tilde{O}(Md)$, which has only logarithmic dependence on $T$.

This paper has not been read by Pith yet.

Distributed Bandit Learning: Near-Optimal Regret with Efficient Communication

discussion (0)