MEAL: A Benchmark for Continual Multi-Agent Reinforcement Learning

Andreas Bulling; Bram Grooten; Constantin Ruhdorfer; Fabrice Kusters; Luka van den Boogaard; Meng Fang; Mykola Pechenizkiy; Samuel Garcin; Tristan Tomilin; Yali Du

arxiv: 2506.14990 · v3 · pith:6OYASPGPnew · submitted 2025-06-17 · 💻 cs.AI

keywords: learning, continual, multi-agent, meal, sequences, benchmark, environments, reinforcement

Benchmarks play a central role in reinforcement learning (RL) research, yet their computational constraints often shape what is studied. Despite the motivation of lifelong learning, most continual RL papers consider only 3-10 sequential tasks, as CPU-bound environments make longer sequences impractical. Meanwhile, continual learning in cooperative multi-agent settings remains largely unexplored. To address these gaps, we introduce MEAL (Multi-agent Environments for Adaptive Learning), the first benchmark for continual multi-agent RL. By leveraging JAX and GPU acceleration, MEAL enables training on sequences of 100 tasks in a few hours on a single GPU. We find that long task sequences reveal failure modes that do not appear at smaller scales.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Offline Multi-agent Continual Cooperation via Skill Partition and Reuse
cs.AI · 2026-06 · unverdicted · novelty 7.0

COMAD discovers and reuses coordination skills from mixed offline MARL data via auto-encoders and density-based estimation to achieve continual learning with better transfer.

Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory
cs.CL · 2025-11 · unverdicted · novelty 7.0

Evo-Memory is a new benchmark for self-evolving memory in LLM agents across task streams, with baseline ExpRAG and proposed ReMem method that integrates reasoning, actions, and memory updates for continual improvement.

Stagnant Neuron: Towards Understanding the Plasticity Loss in Multi-Agent Reinforcement Learning Value Factorization Methods
cs.LG · 2026-06 · unverdicted · novelty 6.0

KNIFE targets stagnant neurons in MARL value factorization by replacing them with a composite of frozen, re-initialized, and compensating units to restore plasticity while preserving cooperation knowledge.

Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory
cs.CL · 2025-11 · unverdicted · novelty 6.0

Evo-Memory is a new streaming benchmark and evaluation framework for self-evolving memory in LLM agents, unifying over ten memory modules and introducing the ReMem pipeline for continual improvement on multi-turn and ...