Mixed citations

Is independent learning all you need in the StarCraft multi-agent challenge?

· 2020 · arXiv 2011.09533

Mixed citation behavior. The most common role is background (40% of classified citations).

24 Pith papers citing it

citation-role summary

background 2 · method 2 · baseline 1

citation-polarity summary

background 2 · use method 2 · baseline 1

representative citing papers

Structural Equivalence and Learning Dynamics in Delayed MARL

cs.LG · 2026-05-05 · accept · novelty 8.0

Observation and action delays are formally equivalent in cooperative Dec-POMDPs, yielding identical optimal solutions and enabling zero-shot transfer, though learning dynamics differ due to credit assignment and operational constraints.

ARMS: Automatic Reward Shaping for Sparse-Reward Multi-Agent Reinforcement Learning

cs.MA · 2026-05-22 · unverdicted · novelty 7.0

ARMS is an automatic reward-shaping framework for sparse-reward MARL that uses trajectory ranking and conditional best-response reasoning to preserve Nash equilibria while improving sampling efficiency in pathfinding tasks.

Randomness is sometimes necessary for coordination

cs.AI · 2026-05-07 · conditional · novelty 7.0

Structured per-agent randomness via ranked masking in attention allows symmetric agents to break ties and coordinate, achieving perfect success on symmetric tasks where deterministic policies fail and enabling zero-shot transfer across team sizes.

Quality-Aware Exploration Budget Allocation for Cooperative Multi-Agent Reinforcement Learning

cs.MA · 2026-05-03 · unverdicted · novelty 7.0

A quality-aware exploration method using return-conditioned sigmoid scheduling and per-agent RSQ metrics achieves top-tier returns on seven cooperative MARL benchmarks.

One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms

cs.AI · 2025-07-21 · conditional · novelty 7.0

OSPO trains optimal order dispatch policies for homogeneous AV fleets using only one-step group rewards, outperforming GRPO on a real ride-hailing dataset.

Queue-Aware Graph Reinforcement Learning for UAV-ISAC-Assisted Maritime Data Collection

eess.SY · 2026-07-01 · unverdicted · novelty 6.0

A queue-weighted graph-MARL framework with masked sequential b-matching for UAV-buoy associations improves cumulative collection utility by 106% over rate-driven baselines in maritime ISAC simulations.

One Policy, Infinite NPCs: Persona-Traceable Shared RL Policies for Scalable Game Agents

cs.AI · 2026-05-22 · unverdicted · novelty 6.0

pcsp is a shared RL policy using LLM persona embeddings, low-rank projection, and PPO+InfoNCE+KL training that delivers 17x above-chance zero-shot persona identification and 22x faster inference on a 300-persona benchmark.

GAE Falls Short in Imperfect-Information Self-Play Reinforcement Learning

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

GAE suffers from amplified variance in imperfect-info self-play RL; VRPO with Q-boosting and multi-step Expected SARSA(λ) reduces it and improves performance on mid-to-large games.

Robust Instruction Compliance in Cooperative Multi-Agent Reinforcement Learning

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

MAVIC corrects Bellman backups at instruction boundaries by adjusting the incoming objective and restoring continuation value, enabling consistent estimation under stochastic instruction switching in cooperative MARL.

Shaping Zero-Shot Coordination via State Blocking

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

SBC generates virtual environments via state blocking to expose agents to diverse suboptimal partner policies, yielding superior zero-shot coordination performance including with humans.

Rethinking Ratio-Based Trust Regions for Policy Optimization in Multi-Agent Reinforcement Learning

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

MARS replaces additive clipping and soft penalties in multi-agent trust-region methods with a symmetric geometric barrier, matching or exceeding MAPPO and MASPO performance across 47 tasks in eight environments.

SACHI: Structured Agent Coordination via Holistic Information Integration in Multi-Agent Reinforcement Learning

cs.LG · 2026-05-08 · conditional · novelty 6.0 · 2 refs

SACHI enriches agent representations via graph transformer convolutions over inter-agent graphs to enable holistic information integration, outperforming baselines across five cooperative tasks with statistical significance.

Hierarchical Multiagent Reinforcement Learning for Multi-Group Tax Game

cs.MA · 2026-05-06 · unverdicted · novelty 6.0 · 2 refs

A bilevel MARL framework with curriculum learning and closed-loop sequential updates learns stable tax policies in multi-group taxation simulations, extending effective game duration by 60.92% and reducing GDP disparities by 44.12% versus baseline.

Priority-Driven Control and Communication in Decentralized Multi-Agent Systems via Reinforcement Learning

eess.SY · 2026-05-11 · unverdicted · novelty 5.0 · 2 refs

A priority-driven RL algorithm learns joint communication priorities and control policies for decentralized multi-agent systems in a model-free way and outperforms baselines on benchmark tasks.

A Survey of Multi-Agent Deep Reinforcement Learning with Graph Neural Network-Based Communication

cs.LG · 2026-04-28 · unverdicted · novelty 5.0

A survey of MARL with GNN-based communication that proposes a generalized process to organize and clarify existing methods.

Centralized Adaptive Sampling for Reliable Co-Training of Independent Multi-Agent Policies

cs.LG · 2025-08-01 · unverdicted · novelty 5.0

CoSER adaptively samples joint actions in CTDE MARL to reduce sampling error relative to the joint on-policy distribution, empirically improving reliability of independent policy gradient convergence.

$\alpha$-fair heterogeneous agent reinforcement learning

cs.MA · 2026-06-11 · unverdicted · novelty 4.0

Introduces α-fair HATRPO and HAPPO algorithms that integrate α-fairness into HATRL via a weighted advantage function while claiming to preserve convergence to Nash equilibria.

Multi-Agent Reinforcement Learning for Safe Autonomous Driving Under Pedestrian Behavioral Uncertainty

cs.LG · 2026-05-18 · unverdicted · novelty 4.0 · 2 refs

Co-training an SDC and pedestrians with MAPPO yields 78% goal success and 14% collisions versus 35%/33% for rule-based baselines, with jaywalking causing 62% of collisions and evidence of poor anticipation via speed differentials.

Reinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traces

cs.CL · 2026-05-04 · unverdicted · novelty 4.0

This survey organizes RL for LLM multi-agent systems into reward families, credit units, and five orchestration sub-decisions, notes the absence of explicit stopping-decision training in its paper pool, and releases a tagged corpus.

GLo-MAPPO: Multi-Agent Deep Reinforcement Learning for Energy-Efficient UAV-Assisted LoRa Networks

cs.NI · 2025-09-22 · unverdicted · novelty 4.0

GLo-MAPPO applies centralized-training decentralized-execution MAPPO with a gain-based association scheme to jointly optimize LoRa parameters and UAV paths, yielding higher weighted energy efficiency than prior MARL baselines in simulations.

HiComm: Hierarchical Communication for Multi-agent Reinforcement Learning

cs.AI · 2026-06-28

Dynamics of Cognitive Heterogeneity: Investigating Behavioral Biases in Multi-Stage Supply Chains with LLM-Based Simulation

cs.MA · 2026-04-19

TABX: A High-Throughput Sandbox Battle Simulator for Multi-Agent Reinforcement Learning

cs.MA · 2026-02-02

Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic

cs.AI · 2026-01-29

citing papers explorer

Structural Equivalence and Learning Dynamics in Delayed MARL · cs.LG · 2026-05-05 · accept · none · ref 36
Observation and action delays are formally equivalent in cooperative Dec-POMDPs, yielding identical optimal solutions and enabling zero-shot transfer, though learning dynamics differ due to credit assignment and operational constraints.
ARMS: Automatic Reward Shaping for Sparse-Reward Multi-Agent Reinforcement Learning · cs.MA · 2026-05-22 · unverdicted · none · ref 27
ARMS is an automatic reward-shaping framework for sparse-reward MARL that uses trajectory ranking and conditional best-response reasoning to preserve Nash equilibria while improving sampling efficiency in pathfinding tasks.
Randomness is sometimes necessary for coordination · cs.AI · 2026-05-07 · conditional · none · ref 72
Structured per-agent randomness via ranked masking in attention allows symmetric agents to break ties and coordinate, achieving perfect success on symmetric tasks where deterministic policies fail and enabling zero-shot transfer across team sizes.
Quality-Aware Exploration Budget Allocation for Cooperative Multi-Agent Reinforcement Learning · cs.MA · 2026-05-03 · unverdicted · none · ref 63
A quality-aware exploration method using return-conditioned sigmoid scheduling and per-agent RSQ metrics achieves top-tier returns on seven cooperative MARL benchmarks.
Queue-Aware Graph Reinforcement Learning for UAV-ISAC-Assisted Maritime Data Collection · eess.SY · 2026-07-01 · unverdicted · none · ref 44
A queue-weighted graph-MARL framework with masked sequential b-matching for UAV-buoy associations improves cumulative collection utility by 106% over rate-driven baselines in maritime ISAC simulations.
One Policy, Infinite NPCs: Persona-Traceable Shared RL Policies for Scalable Game Agents · cs.AI · 2026-05-22 · unverdicted · none · ref 27
pcsp is a shared RL policy using LLM persona embeddings, low-rank projection, and PPO+InfoNCE+KL training that delivers 17x above-chance zero-shot persona identification and 22x faster inference on a 300-persona benchmark.
GAE Falls Short in Imperfect-Information Self-Play Reinforcement Learning · cs.LG · 2026-05-19 · unverdicted · none · ref 23
GAE suffers from amplified variance in imperfect-info self-play RL; VRPO with Q-boosting and multi-step Expected SARSA(λ) reduces it and improves performance on mid-to-large games.
Robust Instruction Compliance in Cooperative Multi-Agent Reinforcement Learning · cs.AI · 2026-05-12 · unverdicted · none · ref 136
MAVIC corrects Bellman backups at instruction boundaries by adjusting the incoming objective and restoring continuation value, enabling consistent estimation under stochastic instruction switching in cooperative MARL.
Shaping Zero-Shot Coordination via State Blocking · cs.LG · 2026-05-12 · unverdicted · none · ref 50
SBC generates virtual environments via state blocking to expose agents to diverse suboptimal partner policies, yielding superior zero-shot coordination performance including with humans.
Rethinking Ratio-Based Trust Regions for Policy Optimization in Multi-Agent Reinforcement Learning · cs.LG · 2026-05-09 · unverdicted · none · ref 31
MARS replaces additive clipping and soft penalties in multi-agent trust-region methods with a symmetric geometric barrier, matching or exceeding MAPPO and MASPO performance across 47 tasks in eight environments.
SACHI: Structured Agent Coordination via Holistic Information Integration in Multi-Agent Reinforcement Learning · cs.LG · 2026-05-08 · conditional · none · ref 23 · 2 links
SACHI enriches agent representations via graph transformer convolutions over inter-agent graphs to enable holistic information integration, outperforming baselines across five cooperative tasks with statistical significance.
Hierarchical Multiagent Reinforcement Learning for Multi-Group Tax Game · cs.MA · 2026-05-06 · unverdicted · none · ref 11 · 2 links
A bilevel MARL framework with curriculum learning and closed-loop sequential updates learns stable tax policies in multi-group taxation simulations, extending effective game duration by 60.92% and reducing GDP disparities by 44.12% versus baseline.
Priority-Driven Control and Communication in Decentralized Multi-Agent Systems via Reinforcement Learning · eess.SY · 2026-05-11 · unverdicted · none · ref 21 · 2 links
A priority-driven RL algorithm learns joint communication priorities and control policies for decentralized multi-agent systems in a model-free way and outperforms baselines on benchmark tasks.
A Survey of Multi-Agent Deep Reinforcement Learning with Graph Neural Network-Based Communication · cs.LG · 2026-04-28 · unverdicted · none · ref 3
A survey of MARL with GNN-based communication that proposes a generalized process to organize and clarify existing methods.
$\alpha$-fair heterogeneous agent reinforcement learning · cs.MA · 2026-06-11 · unverdicted · none · ref 28
Introduces α-fair HATRPO and HAPPO algorithms that integrate α-fairness into HATRL via a weighted advantage function while claiming to preserve convergence to Nash equilibria.
Multi-Agent Reinforcement Learning for Safe Autonomous Driving Under Pedestrian Behavioral Uncertainty · cs.LG · 2026-05-18 · unverdicted · none · ref 12 · 2 links
Co-training an SDC and pedestrians with MAPPO yields 78% goal success and 14% collisions versus 35%/33% for rule-based baselines, with jaywalking causing 62% of collisions and evidence of poor anticipation via speed differentials.
Reinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traces · cs.CL · 2026-05-04 · unverdicted · none · ref 11
This survey organizes RL for LLM multi-agent systems into reward families, credit units, and five orchestration sub-decisions, notes the absence of explicit stopping-decision training in its paper pool, and releases a tagged corpus.
HiComm: Hierarchical Communication for Multi-agent Reinforcement Learning · cs.AI · 2026-06-28 · unreviewed · ref 3
Dynamics of Cognitive Heterogeneity: Investigating Behavioral Biases in Multi-Stage Supply Chains with LLM-Based Simulation · cs.MA · 2026-04-19 · unreviewed · ref 39
TABX: A High-Throughput Sandbox Battle Simulator for Multi-Agent Reinforcement Learning · cs.MA · 2026-02-02 · unreviewed · ref 6
Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic · cs.AI · 2026-01-29 · unreviewed · ref 9
