Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning

Jakob Foerster, Nantas Nardelli, Gregory Farquhar, Triantafyllos Afouras, Philip H. S. Torr, Pushmeet Kohli, Shimon Whiteson

classification cs.AI cs.LG cs.MA

keywords multi-agent, replay, deep, experience, learning, methods, data, memory

Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems. However, existing multi-agent RL methods typically scale poorly in the problem size. Therefore, a key challenge is to translate the success of deep learning on single-agent RL to the multi-agent setting. A major stumbling block is that independent Q-learning, the most popular multi-agent RL method, introduces nonstationarity that makes it incompatible with the experience replay memory on which deep Q-learning relies. This paper proposes two methods that address this problem: 1) using a multi-agent variant of importance sampling to naturally decay obsolete data and 2) conditioning each agent's value function on a fingerprint that disambiguates the age of the data sampled from the replay memory. Results on a challenging decentralised variant of StarCraft unit micromanagement confirm that these methods enable the successful combination of experience replay with multi-agent RL.
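
The two ideas named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that the fingerprint is the pair (training iteration, exploration rate) recorded when a transition was collected, and that the importance weight is the ratio of the other agents' joint action probability under their current policies to the same probability at collection time. The function names, the `max_iter` normalisation, and the `clip` cap are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def with_fingerprint(obs: Sequence[float], train_iter: int,
                     epsilon: float, max_iter: int) -> List[float]:
    """Append an assumed fingerprint (normalised iteration, epsilon) to an observation.

    The value function is then conditioned on the augmented vector, so it can
    tell how old a replayed transition is.
    """
    return list(obs) + [train_iter / max_iter, epsilon]


def importance_weight(others_prob_now: float, others_prob_then: float,
                      clip: float = 10.0) -> float:
    """Ratio of the other agents' action probability now vs. at collection time.

    Transitions that the other agents would rarely produce under their current
    policies get a small weight, so obsolete data fades out of the loss.
    The clip is an illustrative safeguard, not something stated in the abstract.
    """
    if others_prob_then <= 0.0:
        return 0.0  # no valid ratio; ignore the sample
    return min(others_prob_now / others_prob_then, clip)


# Usage: augment an observation and weight a squared TD error.
obs_aug = with_fingerprint([0.5, 1.0], train_iter=50, epsilon=0.1, max_iter=100)
weight = importance_weight(others_prob_now=0.2, others_prob_then=0.4)
td_error = 2.0  # placeholder value for illustration
weighted_loss = weight * td_error ** 2
```

In this sketch an agent would store `train_iter`, `epsilon`, and the other agents' action probability alongside each transition in the replay memory, then apply one or both functions when sampling a minibatch.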

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Plasticity-Enhanced Multi-Agent Mixture of Experts for Dynamic Objective Adaptation in UAVs-Assisted Emergency Communication Networks
cs.MA 2026-04 unverdicted novelty 7.0

PE-MAMoE combines sparsely gated mixture-of-experts actors with a non-parametric phase controller in MAPPO to maintain plasticity under dynamic user mobility and traffic, yielding 26.3% higher normalized IQM return in...