Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality

Georgios Piliouras; Kelly Spendlove; Stefanos Leonardos

arxiv: 2106.12928 · v1 · pith:AV3ABTVNnew · submitted 2021-06-24 · cs.GT · cs.LG · cs.MA · econ.TH · math.DS

keywords: games, competitive, convergence, exploration, learning, multi-agent, q-learning, agents

The interplay between exploration and exploitation in competitive multi-agent learning is still far from being well understood. Motivated by this, we study smooth Q-learning, a prototypical learning model that explicitly captures the balance between game rewards and exploration costs. We show that Q-learning always converges to the unique quantal-response equilibrium (QRE), the standard solution concept for games under bounded rationality, in weighted zero-sum polymatrix games with heterogeneous learning agents using positive exploration rates. Complementing recent results about convergence in weighted potential games, we show that fast convergence of Q-learning in competitive settings is obtained regardless of the number of agents and without any need for parameter fine-tuning. As showcased by our experiments in network zero-sum games, these theoretical results provide the necessary guarantees for an algorithmic approach to the currently open problem of equilibrium selection in competitive multi-agent settings.
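
To make the convergence claim concrete, here is a minimal sketch of a discrete-time, expected-reward form of smooth Q-learning with Boltzmann (softmax) policies, run on two-player matching pennies. The abstract does not state the update rule, so this form is an assumption rather than the authors' exact dynamics. The function names, the shared temperature, the step size, the iteration count, and the initial Q-values are also illustrative assumptions.

```python
import math

def softmax(q, temperature):
    """Boltzmann (softmax) policy over Q-values at the given temperature."""
    m = max(q)  # subtract the max for numerical stability
    weights = [math.exp((v - m) / temperature) for v in q]
    total = sum(weights)
    return [w / total for w in weights]

def expected_rewards(payoff, opponent_policy):
    """Expected reward of each own action against the opponent's mixed policy."""
    return [sum(p * x for p, x in zip(row, opponent_policy)) for row in payoff]

def smooth_q_learning(payoff_1, payoff_2, temperature=1.0, step=0.1, iters=2000):
    """Assumed update: Q <- Q + step * (expected reward - Q), policy = softmax(Q / T)."""
    q1, q2 = [1.0, 0.0], [0.0, 2.0]  # arbitrary (hypothetical) initial Q-values
    for _ in range(iters):
        x1, x2 = softmax(q1, temperature), softmax(q2, temperature)
        r1 = expected_rewards(payoff_1, x2)
        r2 = expected_rewards(payoff_2, x1)
        q1 = [q + step * (r - q) for q, r in zip(q1, r1)]
        q2 = [q + step * (r - q) for q, r in zip(q2, r2)]
    return softmax(q1, temperature), softmax(q2, temperature)

# Matching pennies: rows index the player's own action, columns the opponent's.
A = [[1.0, -1.0], [-1.0, 1.0]]   # player 1 payoffs
B = [[-1.0, 1.0], [1.0, -1.0]]   # player 2 payoffs (zero-sum: B[b][a] == -A[a][b])

x1, x2 = smooth_q_learning(A, B)
print(x1, x2)
```

In matching pennies the logit QRE is the uniform strategy for any positive temperature, so both policies should approach (0.5, 0.5). A lower temperature (less exploration) may require a smaller step size for this discrete-time version to remain stable.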
