The starcraft multi-agent challenge,

· 2019 · arXiv 1902.04043

24 Pith papers cite this work. Polarity classification is still indexing.

24 Pith papers citing it

read on arXiv browse 24 citing papers

citation-role summary

background 2 method 1 other 1

citation-polarity summary

background 2 unclear 1 use method 1

representative citing papers

Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.

Interaction-Breaking Adversarial Learning Framework for Robust Multi-Agent Reinforcement Learning

cs.LG · 2026-05-18 · unverdicted · novelty 7.0 · 2 refs

IBAL framework constructs information-theoretic adversarial attacks on agent observations and actions to train MARL agents that remain robust to interaction disruptions and agent-missing scenarios.

Beyond the All-in-One Agent: Benchmarking Role-Specialized Multi-Agent Collaboration in Enterprise Workflows

cs.MA · 2026-05-09 · unverdicted · novelty 7.0

EntCollabBench shows that today's LLM agents still struggle with delegation, context transfer, parameter grounding, workflow closure, and decision commitment when tested in a simulated enterprise with 11 role-specialized agents.

CoFlow: Coordinated Few-Step Flow for Offline Multi-Agent Decision Making

cs.AI · 2026-05-02 · unverdicted · novelty 7.0 · 3 refs

CoFlow achieves state-of-the-art coordination in offline MARL using single-pass joint velocity fields with Coordinated Velocity Attention and Adaptive Coordination Gating.

Wireless Communication Enhanced Value Decomposition for Multi-Agent Reinforcement Learning

cs.LG · 2026-04-09 · unverdicted · novelty 7.0

CLOVER augments value decomposition with a GNN mixer whose weights depend on the realized wireless communication graph, proving permutation invariance, monotonicity, and greater expressiveness than QMIX while showing gains on Predator-Prey and Lumberjacks under p-CSMA channels.

Play Like Champions: Counterfactual Feedback Generation in Latent Space

cs.LG · 2026-06-30 · unverdicted · novelty 6.0

A guided VAE trained on pro StarCraft replays enables four latent-space traversal strategies to produce counterfactual improvement trajectories for amateur players.

Beyond Partner Diversity: An Influence-Based Team Steering Framework for Zero-Shot Human-Machine Teaming

cs.AI · 2026-05-14 · unverdicted · novelty 6.0

IBTS framework uses influence shaping to improve zero-shot human-machine teaming beyond partner diversity alone, with gains shown in Overcooked-AI simulations and a 30-subject human study.

SACHI: Structured Agent Coordination via Holistic Information Integration in Multi-Agent Reinforcement Learning

cs.LG · 2026-05-08 · conditional · novelty 6.0 · 2 refs

SACHI enriches agent representations via graph transformer convolutions over inter-agent graphs to enable holistic information integration, outperforming baselines across five cooperative tasks with statistical significance.

Interactive Inverse Reinforcement Learning of Interaction Scenarios via Bi-level Optimization

cs.LG · 2026-05-01 · unverdicted · novelty 6.0

Interactive IRL is cast as bi-level optimization with an inner loop learning expert rewards and an outer loop learning interaction policies, solved by the convergent BISIRL algorithm.

Superminds Test: Actively Evaluating Collective Intelligence of Agent Society via Probing Agents

cs.AI · 2026-04-24 · unverdicted · novelty 6.0

Large-scale experiments on two million agents reveal that collective intelligence does not emerge from scale alone due to sparse and shallow interactions.

Value-Guidance MeanFlow for Offline Multi-Agent Reinforcement Learning

cs.LG · 2026-04-09 · unverdicted · novelty 6.0

VGM²P achieves SOTA-comparable performance in offline MARL via value-guided conditional behavior cloning with MeanFlow, enabling efficient single-step action generation insensitive to regularization coefficients.

Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory

cs.CL · 2025-11-25 · unverdicted · novelty 6.0

Evo-Memory is a new streaming benchmark and evaluation framework for self-evolving memory in LLM agents, unifying over ten memory modules and introducing the ReMem pipeline for continual improvement on multi-turn and reasoning datasets.

Overcoming Environmental Meta-Stationarity in MARL via Adaptive Curriculum and Counterfactual Group Advantage

cs.AI · 2025-06-09 · unverdicted · novelty 6.0

CL-MARL uses an adaptive curriculum scheduler called FlexDiff and Counterfactual Group Relative Policy Advantage to break static-difficulty training in MARL and achieve higher win rates on hard StarCraft maps.

VS-Bench: Evaluating VLMs for Strategic Abilities in Multi-Agent Environments

cs.AI · 2025-06-03 · unverdicted · novelty 6.0

VS-Bench is a new benchmark of ten visual multi-agent environments that measures VLMs on element recognition, next-action prediction, and normalized episode return, showing strong perception but large gaps in reasoning and decision-making with the best model at 46.6% prediction accuracy and 31.4% of

Optimistic {\epsilon}-Greedy Exploration for Cooperative Multi-Agent Reinforcement Learning

cs.MA · 2025-02-05 · unverdicted · novelty 6.0

Optimistic ε-Greedy Exploration adds decoupled optimistic networks that converge in probability to maximum returns and samples from them with probability ε to increase optimal joint-action frequency in CTDE MARL.

Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning

cs.LG · 2025-02-05 · unverdicted · novelty 6.0

Wolfpack attack framework disrupts MARL cooperation by targeting initial and assisting agents; WALL trains robust policies against it with reported experimental gains.

Group-Aware Coordination Graph for Multi-Agent Reinforcement Learning

cs.LG · 2024-04-17 · unverdicted · novelty 6.0

GACG infers a coordination graph capturing both pair and group dependencies for information exchange in MARL, adds a group distance loss for consistency, and reports superior performance on StarCraft II micromanagement tasks.

Arena: a toolkit for Multi-Agent Reinforcement Learning

cs.LG · 2019-07-20 · accept · novelty 6.0

Arena introduces a modular Interface design that extends OpenAI Gym wrappers to support complex multi-agent RL scenarios including self-play and cooperative-competitive interactions.

A High-Throughput Compute-Efficient POMDP Hide-And-Seek-Engine (HASE) for Multi-Agent Operations

cs.MA · 2026-04-29 · unverdicted · novelty 5.0

A C++ Dec-POMDP simulator using data-oriented design and zero-copy PyTorch integration achieves up to 33 million steps per second on a 16-core CPU, enabling multi-agent policy training in minutes with PPO, DQN, and SAC.

Fully Decentralized Cooperative Multi-Agent Reinforcement Learning is A Context Modeling Problem

cs.LG · 2025-09-19 · unverdicted · novelty 5.0

DAC models fully decentralized cooperative MARL as a context modeling problem, using latent variables for joint policies to fix non-stationarity in value updates and relative overgeneralization in value estimation.

Focusing Influence Mechanism for Multi-Agent Reinforcement Learning

cs.LG · 2025-06-24 · unverdicted · novelty 5.0

The Focusing Influence Mechanism (FIM) uses an entropy-based criterion and eligibility traces to help multiple agents in reinforcement learning focus and maintain their influence on under-explored parts of the state space, improving coordinated exploration and performance under sparse rewards.

Episodic Memory Temporal Consistency for Cooperative Multi-Agent Reinforcement Learning

cs.LG · 2026-06-03 · unverdicted · novelty 4.0

EMTC adds temporal consistency to episodic memory in MARL via contrastive time-conditioned embeddings and dynamic gating, backed by an error bound and yielding up to 24% win-rate gains on hard SMAC and 28% on GRF.

PIMbot: A Self-Adaptive Attack Framework for Adversarial Manipulation of Multi-Robot Reinforcement Learning

cs.RO · 2026-05-21 · unverdicted · novelty 4.0

PIMbot introduces an adaptive attack using reward-channel and policy manipulation to disrupt cooperation in multi-robot social dilemma RL, shown effective in Gazebo simulation and on NVIDIA Jetson hardware.

TABX: A High-Throughput Sandbox Battle Simulator for Multi-Agent Reinforcement Learning

cs.MA · 2026-02-02

citing papers explorer

Showing 15 of 15 citing papers after filters.

Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation cs.LG · 2026-05-18 · unverdicted · none · ref 128
RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.
Interaction-Breaking Adversarial Learning Framework for Robust Multi-Agent Reinforcement Learning cs.LG · 2026-05-18 · unverdicted · none · ref 11 · 2 links
IBAL framework constructs information-theoretic adversarial attacks on agent observations and actions to train MARL agents that remain robust to interaction disruptions and agent-missing scenarios.
Beyond the All-in-One Agent: Benchmarking Role-Specialized Multi-Agent Collaboration in Enterprise Workflows cs.MA · 2026-05-09 · unverdicted · none · ref 8
EntCollabBench shows that today's LLM agents still struggle with delegation, context transfer, parameter grounding, workflow closure, and decision commitment when tested in a simulated enterprise with 11 role-specialized agents.
CoFlow: Coordinated Few-Step Flow for Offline Multi-Agent Decision Making cs.AI · 2026-05-02 · unverdicted · none · ref 41 · 3 links
CoFlow achieves state-of-the-art coordination in offline MARL using single-pass joint velocity fields with Coordinated Velocity Attention and Adaptive Coordination Gating.
Wireless Communication Enhanced Value Decomposition for Multi-Agent Reinforcement Learning cs.LG · 2026-04-09 · unverdicted · none · ref 50
CLOVER augments value decomposition with a GNN mixer whose weights depend on the realized wireless communication graph, proving permutation invariance, monotonicity, and greater expressiveness than QMIX while showing gains on Predator-Prey and Lumberjacks under p-CSMA channels.
Play Like Champions: Counterfactual Feedback Generation in Latent Space cs.LG · 2026-06-30 · unverdicted · none · ref 34
A guided VAE trained on pro StarCraft replays enables four latent-space traversal strategies to produce counterfactual improvement trajectories for amateur players.
Beyond Partner Diversity: An Influence-Based Team Steering Framework for Zero-Shot Human-Machine Teaming cs.AI · 2026-05-14 · unverdicted · none · ref 35
IBTS framework uses influence shaping to improve zero-shot human-machine teaming beyond partner diversity alone, with gains shown in Overcooked-AI simulations and a 30-subject human study.
SACHI: Structured Agent Coordination via Holistic Information Integration in Multi-Agent Reinforcement Learning cs.LG · 2026-05-08 · conditional · none · ref 52 · 2 links
SACHI enriches agent representations via graph transformer convolutions over inter-agent graphs to enable holistic information integration, outperforming baselines across five cooperative tasks with statistical significance.
Interactive Inverse Reinforcement Learning of Interaction Scenarios via Bi-level Optimization cs.LG · 2026-05-01 · unverdicted · none · ref 31
Interactive IRL is cast as bi-level optimization with an inner loop learning expert rewards and an outer loop learning interaction policies, solved by the convergent BISIRL algorithm.
Superminds Test: Actively Evaluating Collective Intelligence of Agent Society via Probing Agents cs.AI · 2026-04-24 · unverdicted · none · ref 40
Large-scale experiments on two million agents reveal that collective intelligence does not emerge from scale alone due to sparse and shallow interactions.
Value-Guidance MeanFlow for Offline Multi-Agent Reinforcement Learning cs.LG · 2026-04-09 · unverdicted · none · ref 37
VGM²P achieves SOTA-comparable performance in offline MARL via value-guided conditional behavior cloning with MeanFlow, enabling efficient single-step action generation insensitive to regularization coefficients.
A High-Throughput Compute-Efficient POMDP Hide-And-Seek-Engine (HASE) for Multi-Agent Operations cs.MA · 2026-04-29 · unverdicted · none · ref 25
A C++ Dec-POMDP simulator using data-oriented design and zero-copy PyTorch integration achieves up to 33 million steps per second on a 16-core CPU, enabling multi-agent policy training in minutes with PPO, DQN, and SAC.
Episodic Memory Temporal Consistency for Cooperative Multi-Agent Reinforcement Learning cs.LG · 2026-06-03 · unverdicted · none · ref 5
EMTC adds temporal consistency to episodic memory in MARL via contrastive time-conditioned embeddings and dynamic gating, backed by an error bound and yielding up to 24% win-rate gains on hard SMAC and 28% on GRF.
PIMbot: A Self-Adaptive Attack Framework for Adversarial Manipulation of Multi-Robot Reinforcement Learning cs.RO · 2026-05-21 · unverdicted · none · ref 54
PIMbot introduces an adaptive attack using reward-channel and policy manipulation to disrupt cooperation in multi-robot social dilemma RL, shown effective in Gazebo simulation and on NVIDIA Jetson hardware.
TABX: A High-Throughput Sandbox Battle Simulator for Multi-Agent Reinforcement Learning cs.MA · 2026-02-02 · unreviewed · ref 14

The starcraft multi-agent challenge,

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer