Bandit learning in concave $N$-person games

David S. Leslie; Mario Bravo; Panayotis Mertikopoulos

arxiv: 1810.01925 · v1 · pith:4MBL535Onew · submitted 2018-10-03 · 💻 cs.GT · cs.LG· math.OC

Bandit learning in concave N-person games

Mario Bravo , David S. Leslie , Panayotis Mertikopoulos This is my paper

classification 💻 cs.GT cs.LGmath.OC

keywords banditlearningno-regretagentsbehaviorconcaveevenfeedback

0 comments

read the original abstract

This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games. The bandit framework accounts for extremely low-information environments where the agents may not even know they are playing a game; as such, the agents' most sensible choice in this setting would be to employ a no-regret learning algorithm. In general, this does not mean that the players' behavior stabilizes in the long run: no-regret learning may lead to cycles, even with perfect gradient information. However, if a standard monotonicity condition is satisfied, our analysis shows that no-regret learning based on mirror descent with bandit feedback converges to Nash equilibrium with probability $1$. We also derive an upper bound for the convergence rate of the process that nearly matches the best attainable rate for single-agent bandit stochastic optimization.

This paper has not been read by Pith yet.

Bandit learning in concave N-person games

discussion (0)