Mean field multi-agent reinforcement learning
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts: UNVERDICTED 3
roles: background 1
polarities: background 1
representative citing papers
DG-PG augments policy gradients with descent signals from analytical models to reduce estimator variance from O(N) to O(1), preserve game equilibria, and achieve agent-independent sample complexity while converging on 1500-agent tasks where baselines fail.
Proposes HAD-MFC framework that decouples upper-level vulnerable agent selection from lower-level adversarial policy learning in large-scale MARL using Fenchel-Rockafellar transform and MDP reformulation with provable optimality preservation.
citing papers explorer
- Towards Model-Free Learning in Dynamic Population Games: An Application to Karma Economies
  Model-free DQN learning achieves suboptimality bounds of O(1/sqrt(Ns)) + O(1/N) in Karma DPGs at equilibrium, and deep RL combined with fictitious play empirically reaches near-Stationary Nash Equilibrium from scratch.
- Descent-Guided Policy Gradient for Scalable Cooperative Multi-Agent Learning
  DG-PG augments policy gradients with descent signals from analytical models to reduce estimator variance from O(N) to O(1), preserve game equilibria, and achieve agent-independent sample complexity while converging on 1500-agent tasks where baselines fail.
- Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning
  Proposes HAD-MFC framework that decouples upper-level vulnerable agent selection from lower-level adversarial policy learning in large-scale MARL using Fenchel-Rockafellar transform and MDP reformulation with provable optimality preservation.