Dealing with Non-Stationarity in Multi-Agent Deep Reinforcement Learning

Arrasy Rahman; Filippos Christianos; Georgios Papoudakis; Stefano V. Albrecht

arxiv: 1906.04737 · v1 · pith:N2EVZ3ZXnew · submitted 2019-06-11 · cs.LG · cs.AI · cs.MA · stat.ML

Georgios Papoudakis, Filippos Christianos, Arrasy Rahman, Stefano V. Albrecht

classification cs.LG · cs.AI · cs.MA · stat.ML

keywords learning · multi-agent · reinforcement · agents · deep · decision-making · non-stationarity · problems

Recent developments in deep reinforcement learning are concerned with creating decision-making agents which can perform well in various complex domains. A particular approach which has received increasing attention is multi-agent reinforcement learning, in which multiple agents learn concurrently to coordinate their actions. In such multi-agent environments, additional learning problems arise due to the continually changing decision-making policies of agents. This paper surveys recent works that address the non-stationarity problem in multi-agent deep reinforcement learning. The surveyed methods range from modifications in the training procedure, such as centralized training, to learning representations of the opponent's policy, meta-learning, communication, and decentralized learning. The survey concludes with a list of open problems and possible lines of future research.

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Plasticity-Enhanced Multi-Agent Mixture of Experts for Dynamic Objective Adaptation in UAVs-Assisted Emergency Communication Networks
cs.MA · 2026-04 · unverdicted · novelty 7.0

PE-MAMoE combines sparsely gated mixture-of-experts actors with a non-parametric phase controller in MAPPO to maintain plasticity under dynamic user mobility and traffic, yielding 26.3% higher normalized IQM return in...

Temporal Task Diversity: Inductive Biases Under Non-Stationarity in Synthetic Sequence Modelling
cs.LG · 2026-05 · unverdicted · novelty 6.0

Temporal diversity in task distribution during training increases generalization bias over memorization in transformers for in-context linear regression.

Overcoming Environmental Meta-Stationarity in MARL via Adaptive Curriculum and Counterfactual Group Advantage
cs.AI · 2025-06 · unverdicted · novelty 6.0

CL-MARL uses an adaptive curriculum scheduler called FlexDiff and Counterfactual Group Relative Policy Advantage to break static-difficulty training in MARL and achieve higher win rates on hard StarCraft maps.

ERPPO: Entropy Regularization-based Proximal Policy Optimization
cs.LG · 2026-05 · unverdicted · novelty 5.0

ERPPO adds a DSA-based ambiguity estimator to MAPPO and switches between L1 and L2 entropy regularization to improve exploration and stability in non-stationary multi-dimensional observations.

RE-SAC: Disentangling aleatoric and epistemic risks in bus fleet control: A stable and robust ensemble DRL approach
cs.LG · 2026-03 · unverdicted · novelty 5.0

RE-SAC disentangles aleatoric and epistemic risks via IPM regularization on the critic and a diversified Q-ensemble, yielding higher rewards and lower estimation error than vanilla SAC in simulated bus corridor control.

PIMbot: A Self-Adaptive Attack Framework for Adversarial Manipulation of Multi-Robot Reinforcement Learning
cs.RO · 2026-05 · unverdicted · novelty 4.0

PIMbot introduces an adaptive attack using reward-channel and policy manipulation to disrupt cooperation in multi-robot social dilemma RL, shown effective in Gazebo simulation and on NVIDIA Jetson hardware.