Trust region policy optimisation in multi-agent reinforcement learning
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3). Verdicts: UNVERDICTED (3).
citing papers explorer
- Descent-Guided Policy Gradient for Scalable Cooperative Multi-Agent Learning
  DG-PG augments policy gradients with descent signals from analytical models to reduce estimator variance from O(N) to O(1), preserve game equilibria, and achieve agent-independent sample complexity while converging on 1500-agent tasks where baselines fail.
- Beyond Partner Diversity: An Influence-Based Team Steering Framework for Zero-Shot Human-Machine Teaming
  IBTS framework uses influence shaping to improve zero-shot human-machine teaming beyond partner diversity alone, with gains shown in Overcooked-AI simulations and a 30-subject human study.
- Decoupling Communication from Policy: Robust MARL under Bandwidth Constraints
  SLIM decouples inter-agent communication from policy execution in MARL via a dedicated pathway and a normalized bandwidth budget β, yielding robust performance under tight communication limits on standard benchmarks.