Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without

2019 · cs.LG · arXiv 1904.12233

1 Pith paper cites this work.


abstract

We consider the non-stochastic version of the (cooperative) multi-player multi-armed bandit problem. The model assumes no communication at all between the players, and furthermore when two (or more) players select the same action this results in a maximal loss. We prove the first $\sqrt{T}$-type regret guarantee for this problem, under the feedback model where collisions are announced to the colliding players. Such a bound was not known even for the simpler stochastic version. We also prove the first sublinear guarantee for the feedback model where collision information is not available, namely $T^{1-\frac{1}{2m}}$ where $m$ is the number of players.
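The feedback model in the abstract can be illustrated with a small sketch. It is not taken from the paper: the function names `play_round` and `no_collision_info_exponent` are hypothetical, and the code assumes losses lie in [0, 1] so that the "maximal loss" on a collision equals 1. It shows a single round in which every player picks an arm, colliding players receive the maximal loss, and each player is told whether it collided.

```python
from collections import Counter
from typing import List, Tuple


def play_round(choices: List[int], losses: List[float]) -> List[Tuple[float, bool]]:
    """Return (loss, collided) for each player in one round.

    choices[i] is the arm picked by player i; losses[a] is the adversary's
    loss for arm a this round, assumed to lie in [0, 1] (an assumption of
    this sketch). Players that share an arm incur the maximal loss 1.0.
    """
    counts = Counter(choices)
    feedback = []
    for arm in choices:
        collided = counts[arm] > 1
        feedback.append((1.0 if collided else losses[arm], collided))
    return feedback


def no_collision_info_exponent(m: int) -> float:
    """Exponent of T in the abstract's bound T^(1 - 1/(2m)) for m players."""
    return 1.0 - 1.0 / (2 * m)


# Three players, three arms: players 1 and 2 both pick arm 2 and collide.
result = play_round([0, 2, 2], [0.3, 0.1, 0.0])
```

In the second feedback model of the abstract, where collision information is not available, a player would observe only the first component of each pair and could not tell a collision apart from an arm whose loss happened to be 1. The helper `no_collision_info_exponent` simply evaluates the stated exponent, which moves toward 1 (linear regret) as the number of players grows.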

representative citing papers

Near-Optimal Privacy-Preserving Learning for Max-Min Fair Multi-Agent Bandits

cs.LG · 2023-06-07 · unverdicted · novelty 7.0

A collision-only coordinated distributed algorithm for max-min fair multi-agent bandits achieves O(N^3 f(log T) log T) regret while preserving local reward privacy.

