Mixed citations

Title resolution pending

Christian Schroeder de Witt, Tarun Gupta, Denys Makoviichuk, Viktor Makoviychuk, Philip H · 2020 · arXiv 2011.09533

Citation behavior is mixed. The most common role is background (40% of classified citations).

21 Pith papers citing it


Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background: 2 · method: 2 · baseline: 1

citation-polarity summary

background: 2 · use method: 2 · baseline: 1

representative citing papers

Structural Equivalence and Learning Dynamics in Delayed MARL

cs.LG · 2026-05-05 · accept · novelty 8.0

Observation and action delays are formally equivalent in cooperative Dec-POMDPs, yielding identical optimal solutions and enabling zero-shot transfer, though learning dynamics differ due to credit assignment and operational constraints.

ARMS: Automatic Reward Shaping for Sparse-Reward Multi-Agent Reinforcement Learning

cs.MA · 2026-05-22 · unverdicted · novelty 7.0

ARMS is an automatic reward-shaping framework for sparse-reward MARL that uses trajectory ranking and conditional best-response reasoning to preserve Nash equilibria while improving sampling efficiency in pathfinding tasks.

Randomness is sometimes necessary for coordination

cs.AI · 2026-05-07 · conditional · novelty 7.0

Structured per-agent randomness via ranked masking in attention allows symmetric agents to break ties and coordinate, achieving perfect success on symmetric tasks where deterministic policies fail and enabling zero-shot transfer across team sizes.

Quality-Aware Exploration Budget Allocation for Cooperative Multi-Agent Reinforcement Learning

cs.MA · 2026-05-03 · unverdicted · novelty 7.0

A quality-aware exploration method using return-conditioned sigmoid scheduling and per-agent RSQ metrics achieves top-tier returns on seven cooperative MARL benchmarks.

One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms

cs.AI · 2025-07-21 · conditional · novelty 7.0

OSPO trains optimal order dispatch policies for homogeneous AV fleets using only one-step group rewards, outperforming GRPO on a real ride-hailing dataset.

One Policy, Infinite NPCs: Persona-Traceable Shared RL Policies for Scalable Game Agents

cs.AI · 2026-05-22 · unverdicted · novelty 6.0

pcsp is a shared RL policy using LLM persona embeddings, low-rank projection, and PPO+InfoNCE+KL training that delivers 17x above-chance zero-shot persona identification and 22x faster inference on a 300-persona benchmark.

GAE Falls Short in Imperfect-Information Self-Play Reinforcement Learning

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

GAE suffers from amplified variance in imperfect-info self-play RL; VRPO with Q-boosting and multi-step Expected SARSA(λ) reduces it and improves performance on mid-to-large games.

Shaping Zero-Shot Coordination via State Blocking

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

SBC generates virtual environments via state blocking to expose agents to diverse suboptimal partner policies, yielding superior zero-shot coordination performance including with humans.

Rethinking Ratio-Based Trust Regions for Policy Optimization in Multi-Agent Reinforcement Learning

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

MARS replaces additive clipping and soft penalties in multi-agent trust-region methods with a symmetric geometric barrier, matching or exceeding MAPPO and MASPO performance across 47 tasks in eight environments.

SACHI: Structured Agent Coordination via Holistic Information Integration in Multi-Agent Reinforcement Learning

cs.LG · 2026-05-08 · conditional · novelty 6.0 · 2 refs

SACHI enriches agent representations via graph transformer convolutions over inter-agent graphs to enable holistic information integration, outperforming baselines across five cooperative tasks with statistical significance.

Hierarchical Multiagent Reinforcement Learning for Multi-Group Tax Game

cs.MA · 2026-05-06 · unverdicted · novelty 6.0 · 2 refs

A bilevel MARL framework with curriculum learning and closed-loop sequential updates learns stable tax policies in multi-group taxation simulations, extending effective game duration by 60.92% and reducing GDP disparities by 44.12% versus baseline.

Priority-Driven Control and Communication in Decentralized Multi-Agent Systems via Reinforcement Learning

eess.SY · 2026-05-11 · unverdicted · novelty 5.0 · 2 refs

A priority-driven RL algorithm learns joint communication priorities and control policies for decentralized multi-agent systems in a model-free way and outperforms baselines on benchmark tasks.

A Survey of Multi-Agent Deep Reinforcement Learning with Graph Neural Network-Based Communication

cs.LG · 2026-04-28 · unverdicted · novelty 5.0

A survey of MARL with GNN-based communication that proposes a generalized process to organize and clarify existing methods.

Centralized Adaptive Sampling for Reliable Co-Training of Independent Multi-Agent Policies

cs.LG · 2025-08-01 · unverdicted · novelty 5.0

CoSER adaptively samples joint actions in CTDE MARL to reduce sampling error relative to the joint on-policy distribution, empirically improving reliability of independent policy gradient convergence.

α-fair heterogeneous agent reinforcement learning

cs.MA · 2026-06-11 · unverdicted · novelty 4.0

Introduces α-fair HATRPO and HAPPO algorithms that integrate α-fairness into HATRL via a weighted advantage function while claiming to preserve convergence to Nash equilibria.

Reinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traces

cs.CL · 2026-05-04 · unverdicted · novelty 4.0

This survey organizes RL for LLM multi-agent systems into reward families, credit units, and five orchestration sub-decisions, notes the absence of explicit stopping-decision training in its paper pool, and releases a tagged corpus.

GLo-MAPPO: Multi-Agent Deep Reinforcement Learning for Energy-Efficient UAV-Assisted LoRa Networks

cs.NI · 2025-09-22 · unverdicted · novelty 4.0

GLo-MAPPO applies centralized-training decentralized-execution MAPPO with a gain-based association scheme to jointly optimize LoRa parameters and UAV paths, yielding higher weighted energy efficiency than prior MARL baselines in simulations.

Multi-Agent Reinforcement Learning for Safe Autonomous Driving Under Pedestrian Behavioral Uncertainty

cs.LG · 2026-05-18

Dynamics of Cognitive Heterogeneity: Investigating Behavioral Biases in Multi-Stage Supply Chains with LLM-Based Simulation

cs.MA · 2026-04-19

TABX: A High-Throughput Sandbox Battle Simulator for Multi-Agent Reinforcement Learning

cs.MA · 2026-02-02

Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic

cs.AI · 2026-01-29