Mean field games. Japanese Journal of Mathematics, 2(1):229–260
3 Pith papers cite this work.
Citation summary. Verdicts: unverdicted (3). Roles: background (1). Polarities: background (1).
Citing papers
- Towards Model-Free Learning in Dynamic Population Games: An Application to Karma Economies
  Model-free DQN learning achieves suboptimality bounds of O(1/sqrt(Ns)) + O(1/N) in Karma DPGs at equilibrium, and deep RL combined with fictitious play empirically reaches a near-stationary Nash equilibrium from scratch.
- Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning
  Proposes the HAD-MFC framework, which decouples upper-level vulnerable-agent selection from lower-level adversarial policy learning in large-scale MARL, using a Fenchel-Rockafellar transform and an MDP reformulation with provable optimality preservation.
- Optimal investment and consumption under forward utilities with relative performance concerns
  Forward relative performance processes with CRRA wealth utility and separable time-space dependence necessarily induce matching CRRA consumption utility, yielding closed-form Nash equilibria for n-player and mean-field games with consumption.