Trust region policy optimisation in multi-agent reinforcement learning

Jakub Grudzien Kuba, Ruiqing Chen, Muning Wen, Ying Wen, Fanglei Sun, Jun Wang, Yaodong Yang · 2022

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Descent-Guided Policy Gradient for Scalable Cooperative Multi-Agent Learning

cs.MA · 2026-02-23 · unverdicted · novelty 7.0

DG-PG augments policy gradients with descent signals from analytical models to reduce estimator variance from O(N) to O(1), preserve game equilibria, and achieve agent-independent sample complexity while converging on 1500-agent tasks where baselines fail.

Beyond Partner Diversity: An Influence-Based Team Steering Framework for Zero-Shot Human-Machine Teaming

cs.AI · 2026-05-14 · unverdicted · novelty 6.0

The IBTS framework uses influence shaping to improve zero-shot human-machine teaming beyond what partner diversity alone achieves, with gains shown in Overcooked-AI simulations and a 30-subject human study.

Decoupling Communication from Policy: Robust MARL under Bandwidth Constraints

cs.MA · 2026-05-20 · unverdicted · novelty 5.0

SLIM decouples inter-agent communication from policy execution in MARL via a dedicated pathway and a normalized bandwidth budget β, yielding robust performance under tight communication limits on standard benchmarks.

citing papers explorer

Showing 3 of 3 citing papers.

Descent-Guided Policy Gradient for Scalable Cooperative Multi-Agent Learning · cs.MA · 2026-02-23 · unverdicted · none · ref 12
Beyond Partner Diversity: An Influence-Based Team Steering Framework for Zero-Shot Human-Machine Teaming · cs.AI · 2026-05-14 · unverdicted · none · ref 17
Decoupling Communication from Policy: Robust MARL under Bandwidth Constraints · cs.MA · 2026-05-20 · unverdicted · none · ref 19
