
Value-Decomposition Networks For Cooperative Multi-Agent Learning

Preprint · 2017 · cs.AI · arXiv 1706.05296

28 Pith papers cite this work. Polarity classification is still indexing.


abstract

We study the problem of cooperative multi-agent reinforcement learning with a single joint reward signal. This class of learning problems is difficult because of the often large combined action and observation spaces. In the fully centralized and decentralized approaches, we find the problem of spurious rewards and a phenomenon we call the "lazy agent" problem, which arises due to partial observability. We address these problems by training individual agents with a novel value decomposition network architecture, which learns to decompose the team value function into agent-wise value functions. We perform an experimental evaluation across a range of partially-observable multi-agent domains and show that learning such value-decompositions leads to superior results, in particular when combined with weight sharing, role information and information channels.
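The abstract's central idea is that the team value function is learned as a sum of agent-wise value functions, Q_tot(h, a) = Σ_i Q_i(h_i, a_i), trained from the single joint reward. The following is a minimal sketch of that additive form, not the paper's implementation. Plain Python lists stand in for network outputs. The function names, the tabular inputs, and the use of a one-hot agent ID as the form of "role information" for a weight-shared network are all illustrative assumptions.

```python
import itertools
from typing import List, Sequence, Tuple

# Per-agent utilities for one time step: q_values[i][a] stands for Q_i(h_i, a),
# agent i's value for its own action a given its local observation history h_i.
QValues = Sequence[Sequence[float]]


def q_tot(q_values: QValues, joint_action: Sequence[int]) -> float:
    """Additive decomposition: Q_tot(h, a) = sum_i Q_i(h_i, a_i)."""
    return sum(q_i[a_i] for q_i, a_i in zip(q_values, joint_action))


def greedy_joint_action(q_values: QValues) -> Tuple[int, ...]:
    """Each agent maximizes its own Q_i. Because every term of the sum depends
    on one agent's action only, this also maximizes Q_tot (decentralized execution)."""
    return tuple(max(range(len(q_i)), key=q_i.__getitem__) for q_i in q_values)


def brute_force_greedy(q_values: QValues) -> Tuple[int, ...]:
    """Reference check: enumerate every joint action (exponential in the number of agents)."""
    joint_actions = itertools.product(*(range(len(q_i)) for q_i in q_values))
    return max(joint_actions, key=lambda a: q_tot(q_values, a))


def td_target(reward: float, next_q_values: QValues, gamma: float, done: bool) -> float:
    """Team-level one-step Q-learning target from the single joint reward:
    r + gamma * max_a' Q_tot(h', a'), where the joint max is a sum of per-agent maxima."""
    if done:
        return reward
    return reward + gamma * sum(max(q_i) for q_i in next_q_values)


def td_loss(q_values: QValues, joint_action: Sequence[int], reward: float,
            next_q_values: QValues, gamma: float = 0.99, done: bool = False) -> float:
    """Squared TD error on Q_tot. With differentiable Q_i, the gradient of this
    loss reaches every agent's value function through the sum."""
    error = td_target(reward, next_q_values, gamma, done) - q_tot(q_values, joint_action)
    return error ** 2


def shared_q_values(weights: List[List[float]], obs: Sequence[float],
                    agent_id: int, n_agents: int) -> List[float]:
    """Hypothetical weight-shared linear Q-function: all agents use the same
    `weights` (one row per action). Appending a one-hot agent ID to the observation
    is one assumed way to supply role information so shared weights can still
    produce agent-specific values."""
    x = list(obs) + [1.0 if j == agent_id else 0.0 for j in range(n_agents)]
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
```

For example, with `q_values = [[1.0, 3.0], [2.0, 0.5, 4.0]]`, both `greedy_joint_action` and `brute_force_greedy` select `(1, 2)` with Q_tot = 7.0, but the per-agent version scans 2 + 3 values rather than 2 × 3 joint actions. A practical learner would typically add components this sketch omits, such as recurrent agent networks, a replay buffer, and a target network for the next-step values.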


citation-role summary

background 3 · method 1

citation-polarity summary

background 3 · use method 1

representative citing papers

Acting on the Unseen: Communication-Free Collaborative Filtering for Decentralized Multi-Robot Task Allocation

cs.RO · 2026-05-25 · unverdicted · novelty 7.0

SwarmCF enables robots to achieve low error on unseen task pairs with per-robot sample complexity linear in rank d rather than task count n by running decentralized low-rank matrix completion on masked broadcast data in the zero-knowledge MRTA regime.

Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems

cs.AI · 2026-05-14 · unverdicted · novelty 7.0 · 2 refs

A survey that unifies prior work on multi-agent LLM systems via the LIFE framework, mapping dependencies across collaboration, failure attribution, and autonomous self-evolution while identifying cross-stage challenges.

Self-Supervised On-Policy Reinforcement Learning via Contrastive Proximal Policy Optimisation

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

CPPO is an on-policy contrastive RL method that derives advantages from contrastive Q-values for PPO optimization, outperforming prior CRL baselines in 14/18 tasks and matching or exceeding reward-based PPO in 12/18 tasks.

Metric-Gradient Projection for Stable Multi-Agent Policy Learning

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

HPML projects multi-agent update fields onto the closest metric-gradient potential flow via Hodge decomposition, yielding Lyapunov potentials and equilibrium-gap bounds.

Descent-Guided Policy Gradient for Scalable Cooperative Multi-Agent Learning

cs.MA · 2026-02-23 · unverdicted · novelty 7.0

DG-PG augments policy gradients with descent signals from analytical models to reduce estimator variance from O(N) to O(1), preserve game equilibria, and achieve agent-independent sample complexity while converging on 1500-agent tasks where baselines fail.

Adaptive TD-Lambda for Cooperative Multi-agent Reinforcement Learning

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

ATD(λ) adapts TD(λ) in MARL via a density ratio estimator on past/current replay buffers to assign λ per state-action pair, yielding competitive or better results than fixed-λ QMIX and MAPPO on SMAC and Gfootball.

Randomness is sometimes necessary for coordination

cs.AI · 2026-05-07 · conditional · novelty 7.0

Structured per-agent randomness via ranked masking in attention allows symmetric agents to break ties and coordinate, achieving perfect success on symmetric tasks where deterministic policies fail and enabling zero-shot transfer across team sizes.

Coordination Matters: Evaluation of Cooperative Multi-Agent Reinforcement Learning

cs.MA · 2026-05-07 · unverdicted · novelty 7.0

A new controlled testbed and coordination diagnostics show that multi-agent RL methods achieving similar returns can differ substantially in redundant assignments, diversity, and efficiency.

NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search

cs.LG · 2026-05-01 · unverdicted · novelty 7.0

NonZero introduces an interaction score and bandit-formalized proposal rule for local agent deviations in multi-agent MCTS, delivering a sublinear local-regret guarantee and improved sample efficiency on game benchmarks without full joint-action enumeration.

Quantum Advantage in Multi Agent Reinforcement Learning

cs.LG · 2026-05-14 · conditional · novelty 6.0

Entangled QMARL agents approach the Tsirelson bound of 0.854 in CHSH while unentangled versions match classical baselines, and hybrid quantum-classical setups outperform both in CoopNav.

SACHI: Structured Agent Coordination via Holistic Information Integration in Multi-Agent Reinforcement Learning

cs.LG · 2026-05-08 · conditional · novelty 6.0 · 2 refs

SACHI enriches agent representations via graph transformer convolutions over inter-agent graphs to enable holistic information integration, outperforming baselines across five cooperative tasks with statistical significance.

Optimistic ε-Greedy Exploration for Cooperative Multi-Agent Reinforcement Learning

cs.MA · 2025-02-05 · unverdicted · novelty 6.0

Optimistic ε-Greedy Exploration adds decoupled optimistic networks that converge in probability to maximum returns and samples from them with probability ε to increase optimal joint-action frequency in CTDE MARL.

Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning

cs.LG · 2025-02-05 · unverdicted · novelty 6.0

Wolfpack attack framework disrupts MARL cooperation by targeting initial and assisting agents; WALL trains robust policies against it with reported experimental gains.

Superminds Test: Actively Evaluating Collective Intelligence of Agent Society via Probing Agents

cs.AI · 2026-04-24 · unverdicted · novelty 6.0

Large-scale experiments on two million agents reveal that collective intelligence does not emerge from scale alone due to sparse and shallow interactions.

Do LLM-derived graph priors improve multi-agent coordination?

cs.LG · 2026-04-19 · unverdicted · novelty 6.0

LLM-generated coordination graph priors improve multi-agent reinforcement learning performance on MPE benchmarks, with models as small as 1.5B parameters proving effective.

Reflective Context Learning: Studying the Optimization Primitives of Context Space

cs.LG · 2026-04-03 · unverdicted · novelty 6.0

Reflective Context Learning unifies context optimization for agents by recasting prior methods as instances of a shared learning problem and extending them with classical primitives such as batching, failure replay, and grouped rollouts, yielding improvements on AppWorld, BrowseComp+, and RewardBene

Clarus: Coordinating Autonomous Research Agents toward Web-Scale Scientific Collaboration

cs.AI · 2026-06-29 · unverdicted · novelty 5.0

Clarus is a four-layer collaboration infrastructure with a project-agent-resource model that reformulates research as an open, traceable, multi-participant process.

Enhancing Cloud Network Resilience via a Robust LLM-Empowered Multi-Agent Reinforcement Learning Framework

cs.CR · 2026-01-12 · unverdicted · novelty 5.0

CyberOps-Bots is a hierarchical LLM-empowered multi-agent RL framework that reports 68.5% higher network availability and 34.7% better jumpstart performance in new scenarios without retraining on real cloud datasets.

Fully Decentralized Cooperative Multi-Agent Reinforcement Learning is A Context Modeling Problem

cs.LG · 2025-09-19 · unverdicted · novelty 5.0

DAC models fully decentralized cooperative MARL as a context modeling problem, using latent variables for joint policies to fix non-stationarity in value updates and relative overgeneralization in value estimation.

Centralized Adaptive Sampling for Reliable Co-Training of Independent Multi-Agent Policies

cs.LG · 2025-08-01 · unverdicted · novelty 5.0

CoSER adaptively samples joint actions in CTDE MARL to reduce sampling error relative to the joint on-policy distribution, empirically improving reliability of independent policy gradient convergence.

Reflection of Episodes: Learning to Play Game from Expert and Self Experiences

cs.AI · 2025-02-19 · unverdicted · novelty 5.0

The ROE framework lets an LLM defeat the Very Hard bot in TextStarCraft II through keyframe selection, decisions informed by expert and self experience, and post-game reflection that produces new self-experience.

Growing Action Spaces

cs.LG · 2019-06-28 · unverdicted · novelty 5.0

A curriculum of growing action spaces combined with simultaneous off-policy value estimation accelerates learning in large multi-agent action spaces.

Adaptive Punishment for Cooperation in Mixed-Motive Games

cs.MA · 2026-05-23 · unverdicted · novelty 4.0

APC adapts punishment via dynamic probability and a reward-guided defection awareness module to foster cooperation in iterated public goods games and sequential social dilemmas, outperforming baselines.

GLo-MAPPO: Multi-Agent Deep Reinforcement Learning for Energy-Efficient UAV-Assisted LoRa Networks

cs.NI · 2025-09-22 · unverdicted · novelty 4.0

GLo-MAPPO applies centralized-training decentralized-execution MAPPO with a gain-based association scheme to jointly optimize LoRa parameters and UAV paths, yielding higher weighted energy efficiency than prior MARL baselines in simulations.

citing papers explorer


Descent-Guided Policy Gradient for Scalable Cooperative Multi-Agent Learning · cs.MA · 2026-02-23 · unverdicted · none · ref 23
Coordination Matters: Evaluation of Cooperative Multi-Agent Reinforcement Learning · cs.MA · 2026-05-07 · unverdicted · none · ref 31
Optimistic ε-Greedy Exploration for Cooperative Multi-Agent Reinforcement Learning · cs.MA · 2025-02-05 · unverdicted · none · ref 24
Adaptive Punishment for Cooperation in Mixed-Motive Games · cs.MA · 2026-05-23 · unverdicted · none · ref 4
