Efficient model-based multi-agent mean-field reinforcement learning
2 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED: 2
citing papers explorer
- Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning
Proposes the HAD-MFC framework, which decouples upper-level vulnerable agent selection from lower-level adversarial policy learning in large-scale MARL, using a Fenchel-Rockafellar transform and an MDP reformulation with provable optimality preservation.
- Continuous-time q-learning for mean-field control with common noise, part-II: q-learning algorithms
The authors propose actor-critic q-learning algorithms for mean-field control with common noise based on martingale orthogonality conditions and relaxed controls, establish convergence of inner iterations in the linear-quadratic case, and demonstrate performance on examples.