Contrasting Centralized and Decentralized Critics in Multi-Agent Reinforcement Learning

Brett Daley; Christopher Amato; Xueguang Lyu; Yuchen Xiao

arxiv: 2102.04402 · v2 · pith:Y3OPU3MWnew · submitted 2021-02-08 · 💻 cs.LG · cs.AI

Xueguang Lyu, Yuchen Xiao, Brett Daley, Christopher Amato

keywords centralized · decentralized · critic · critics · choice · implications · learning · methods

Centralized Training for Decentralized Execution, where agents are trained offline using centralized information but execute in a decentralized manner online, has gained popularity in the multi-agent reinforcement learning community. In particular, actor-critic methods with a centralized critic and decentralized actors are a common instance of this idea. However, the implications of using a centralized critic in this context are not fully discussed and understood even though it is the standard choice of many algorithms. We therefore formally analyze centralized and decentralized critic approaches, providing a deeper understanding of the implications of critic choice. Because our theory makes unrealistic assumptions, we also empirically compare the centralized and decentralized critic methods over a wide set of environments to validate our theories and to provide practical advice. We show that there exist misconceptions regarding centralized critics in the current literature and show that the centralized critic design is not strictly beneficial, but rather both centralized and decentralized critics have different pros and cons that should be taken into account by algorithm designers.

Forward citations

Cited by 10 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AstroAlertBench: Evaluating the Accuracy, Reasoning, and Honesty of Multimodal LLMs in Astronomical Classification
astro-ph.IM 2026-05 unverdicted novelty 7.0

AstroAlertBench evaluates multimodal LLMs on astronomical classification accuracy, reasoning, and honesty using real ZTF alerts, revealing that high accuracy often diverges from self-assessed reasoning quality.

Fast Rates in $\alpha$-Potential Games via Regularized Mirror Descent
cs.GT 2026-04 unverdicted novelty 7.0

OPMD achieves the first fast Õ(1/n) rate for offline Nash equilibrium learning in α-potential games via a new reference-anchored coverage framework.

Fast Rates in $\alpha$-Potential Games via Regularized Mirror Descent
cs.GT 2026-04 unverdicted novelty 7.0

Proposes OPMD algorithm achieving accelerated O(1/n) rates for offline Nash equilibrium learning in alpha-potential games via reference-anchored data coverage.

Pessimism-Free Offline Learning in General-Sum Games via KL Regularization
cs.LG 2026-04 unverdicted novelty 7.0

KL regularization enables pessimism-free offline learning in general-sum games, recovering regularized Nash equilibria at accelerated rate O(1/n) via GANE and converging to coarse correlated equilibria at standard rat...

Robust Instruction Compliance in Cooperative Multi-Agent Reinforcement Learning
cs.AI 2026-05 unverdicted novelty 6.0

MAVIC corrects Bellman backups at instruction boundaries by adjusting the incoming objective and restoring continuation value, enabling consistent estimation under stochastic instruction switching in cooperative MARL.

Robust Instruction Compliance in Cooperative Multi-Agent Reinforcement Learning
cs.AI 2026-05 unverdicted novelty 6.0

MAVIC corrects Bellman backups at instruction boundaries by adjusting the incoming objective and restoring continuation value, enabling consistent estimation under stochastic instruction switching in a unified policy.

Pessimism-Free Offline Learning in General-Sum Games via KL Regularization
cs.LG 2026-04 unverdicted novelty 6.0

KL regularization enables pessimism-free offline learning in general-sum games by recovering regularized Nash equilibria at rate O(1/n) via GANE and converging to coarse correlated equilibria at O(1/sqrt(n) + 1/T) via GAMD.

Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic
cs.AI 2026-01 unverdicted novelty 6.0

Multi-agent actor-critic methods with a centralized critic improve decentralized LLM collaboration over Monte Carlo baselines in long-horizon and sparse-reward settings.

Cross-Modal Navigation with Multi-Agent Reinforcement Learning
cs.RO 2026-05 unverdicted novelty 5.0

CRONA is a MARL framework that uses modality-specialized agents with auxiliary beliefs and a centralized multi-modal critic to achieve better performance and efficiency than single-agent baselines on visual-acoustic n...

Centralized Adaptive Sampling for Reliable Co-Training of Independent Multi-Agent Policies
cs.LG 2025-08 unverdicted novelty 5.0

CoSER adaptively samples joint actions in CTDE MARL to reduce sampling error relative to the joint on-policy distribution, empirically improving reliability of independent policy gradient convergence.