DICE formalizes multi-agent LLM coordination as discounted incomplete-information Markov games and introduces Heterogeneous Quantal Response Equilibrium (HQRE) to achieve unique stable equilibria with bounded regret, demonstrated via prompt-control and fine-tuning algorithms on eleven benchmarks.
Qplex: Duplex dueling multi-agent q-learning
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
representative citing papers
IBAL framework constructs information-theoretic adversarial attacks on agent observations and actions to train MARL agents that remain robust to interaction disruptions and agent-missing scenarios.
ATD(λ) adapts TD(λ) in MARL via a density ratio estimator on past/current replay buffers to assign λ per state-action pair, yielding competitive or better results than fixed-λ QMIX and MAPPO on SMAC and Gfootball.
SACHI enriches agent representations via graph transformer convolutions over inter-agent graphs to enable holistic information integration, outperforming baselines across five cooperative tasks with statistical significance.
LLM-generated coordination graph priors improve multi-agent reinforcement learning performance on MPE benchmarks, with models as small as 1.5B parameters proving effective.
Optimistic ε-Greedy Exploration adds decoupled optimistic networks that converge in probability to maximum returns and samples from them with probability ε to increase optimal joint-action frequency in CTDE MARL.
Wolfpack attack framework disrupts MARL cooperation by targeting initial and assisting agents; WALL trains robust policies against it with reported experimental gains.
DAC models fully decentralized cooperative MARL as a context modeling problem, using latent variables for joint policies to fix non-stationarity in value updates and relative overgeneralization in value estimation.
EMTC adds temporal consistency to episodic memory in MARL via contrastive time-conditioned embeddings and dynamic gating, backed by an error bound and yielding up to 24% win-rate gains on hard SMAC and 28% on GRF.
citing papers explorer
-
Optimistic {\epsilon}-Greedy Exploration for Cooperative Multi-Agent Reinforcement Learning
Optimistic ε-Greedy Exploration adds decoupled optimistic networks that converge in probability to maximum returns and samples from them with probability ε to increase optimal joint-action frequency in CTDE MARL.
-
Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning
Wolfpack attack framework disrupts MARL cooperation by targeting initial and assisting agents; WALL trains robust policies against it with reported experimental gains.
-
Fully Decentralized Cooperative Multi-Agent Reinforcement Learning is A Context Modeling Problem
DAC models fully decentralized cooperative MARL as a context modeling problem, using latent variables for joint policies to fix non-stationarity in value updates and relative overgeneralization in value estimation.