A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity
The key challenge in multiagent learning is learning a best response to the behaviour of other agents, which may be non-stationary: if the other agents adapt their strategies as well, the learning target moves. Disparate streams of research have approached non-stationarity from several angles, under a variety of implicit assumptions that make it hard to maintain an overview of the state of the art and to validate the novelty and significance of new work. This survey presents a coherent overview of work that addresses opponent-induced non-stationarity with tools from game theory, reinforcement learning, and multi-armed bandits. Further, we reflect on the principal approaches by which algorithms model and cope with this non-stationarity, arriving at a new framework with five categories (in increasing order of sophistication): ignore, forget, respond to target models, learn models, and theory of mind. A wide range of state-of-the-art algorithms is classified into a taxonomy using these categories together with key characteristics of the environment (e.g., observability) and of the opponents' adaptation behaviour (e.g., smooth or abrupt). To clarify further, we present illustrative variations of one domain, contrasting the strengths and limitations of each category. Finally, we discuss the environments in which the different approaches yield the most merit, and point to promising avenues of future research.
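To make the categories concrete, here is a minimal sketch of the "forget" strategy from the survey's taxonomy: an epsilon-greedy bandit learner that uses a constant step size, so old observations are exponentially discounted and the value estimates can track an opponent whose behaviour drifts. The function names, parameters, and the toy opponent below are illustrative assumptions, not from the paper itself.

```python
import random

def forgetful_bandit(reward_fn, n_actions=2, alpha=0.2, eps=0.1, steps=500):
    """Epsilon-greedy action-value learner with a constant step size alpha.

    A constant alpha (rather than a decaying 1/t average) exponentially
    forgets old rewards, which is the survey's 'forget' category: the
    learner never converges to a fixed estimate, so it can keep tracking
    a non-stationary opponent.
    """
    q = [0.0] * n_actions
    for t in range(steps):
        if random.random() < eps:
            a = random.randrange(n_actions)          # explore
        else:
            a = max(range(n_actions), key=q.__getitem__)  # exploit
        r = reward_fn(t, a)
        q[a] += alpha * (r - q[a])  # constant step size = exponential forgetting
    return q

# Hypothetical abruptly-adapting opponent: action 0 is rewarded for the
# first half of the interaction, action 1 afterwards.
def drifting_opponent(t, a):
    best = 0 if t < 250 else 1
    return 1.0 if a == best else 0.0
```

After the opponent's switch at step 250, the constant step size lets the estimate for action 1 overtake that of action 0 within a few dozen exploratory pulls, whereas a sample-average learner (step size 1/t) would react far more slowly. The "learn models" and "theory of mind" categories go further by explicitly predicting when and how the opponent will switch.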
Forward citations
Cited by 4 Pith papers
- MARS²: Scaling Multi-Agent Tree Search via Reinforcement Learning for Code Generation — MARS² integrates multi-agent collaboration with tree-structured search in RL to boost code generation, increasing exploratory diversity and using path-level group advantages for credit assignment.
- Train-Small Deploy-Large: Leveraging Diffusion-Based Multi-Robot Planning — A diffusion-based multi-robot planner trained on few agents generalizes to larger numbers at deployment using inter-agent attention and temporal convolution.
- Modular Reinforcement Learning For Cooperative Swarms — Modular decomposition of interaction states allows distributed RL for cooperative robot swarms to scale without combinatorial memory explosion in foraging simulations.
- A Hierarchical MARL-Based Approach for Coordinated Retail P2P Trading and Wholesale Market Participation of DERs — Hierarchical MARL with Stackelberg coordination enables DER prosumers to engage in P2P retail auctions and to aggregate for wholesale market participation, enhancing market performance.