arxiv: 1904.12233 · v2 · pith:BMR3W4KOnew · submitted 2019-04-28 · 💻 cs.LG · cs.MA · stat.ML

Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without

classification 💻 cs.LG cs.MA stat.ML
keywords players, model, collision, feedback, first, guarantee, information, multi-armed

We consider the non-stochastic version of the (cooperative) multi-player multi-armed bandit problem. The model assumes no communication at all between the players, and furthermore when two (or more) players select the same action this results in a maximal loss. We prove the first $\sqrt{T}$-type regret guarantee for this problem, under the feedback model where collisions are announced to the colliding players. Such a bound was not known even for the simpler stochastic version. We also prove the first sublinear guarantee for the feedback model where collision information is not available, namely $T^{1-\frac{1}{2m}}$ where $m$ is the number of players.
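To make the second rate concrete, the sketch below evaluates the exponent in $T^{1-\frac{1}{2m}}$ for a few values of $m$. The function names `regret_exponent` and `rate` are illustrative and not from the paper, and the computation ignores constants and any lower-order factors, which the abstract does not state. Note that the rate for $m = 1$ coincides with $\sqrt{T}$ and that the exponent approaches 1 as $m$ grows.

```python
def regret_exponent(m: int) -> float:
    """Exponent of T in the rate T^(1 - 1/(2m)) quoted in the abstract.

    m is the number of players; the name of this helper is hypothetical.
    """
    if m < 1:
        raise ValueError("m must be a positive integer")
    return 1.0 - 1.0 / (2 * m)


def rate(T: int, m: int) -> float:
    """T^(1 - 1/(2m)), ignoring constants and lower-order factors (an assumption)."""
    return T ** regret_exponent(m)


if __name__ == "__main__":
    T = 10**6  # illustrative horizon, not a value from the paper
    for m in (1, 2, 3, 5):
        print(f"m={m}: exponent={regret_exponent(m):.3f}, T^exponent={rate(T, m):.1f}")
```

For every finite $m$ the exponent stays strictly below 1, which is what makes the guarantee sublinear in $T$.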

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Near-Optimal Privacy-Preserving Learning for Max-Min Fair Multi-Agent Bandits

    cs.LG · 2023-06 · unverdicted · novelty 7.0

    A collision-only coordinated distributed algorithm for max-min fair multi-agent bandits achieves O(N^3 f(log T) log T) regret while preserving local reward privacy.