Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning

Bo An; Chengdong Ma; Jie Luo; Linhao Wang; Ruixiao Xu; Simin Li; Weifeng Lv; Xianglong Liu; Xin Wang; Xin Yu

arxiv: 2509.15103 · v3 · submitted 2025-09-18 · 💻 cs.MA · cs.AI


Simin Li, Zihao Mao, Zheng Yuwei, Linhao Wang, Ruixiao Xu, Chengdong Ma, Zhiqian Liu, Xin Yu, Yuqing Ma, Xin Wang, Jie Luo, Bo An, Yaodong Yang, Weifeng Lv, Xianglong Liu


Pith reviewed 2026-05-18 15:47 UTC · model grok-4.3

classification 💻 cs.MA cs.AI

keywords vulnerable agent identification · multi-agent reinforcement learning · hierarchical adversarial control · Fenchel-Rockafellar transform · mean-field control · MDP reformulation · large-scale MARL


The pith

Decomposing the vulnerable agent identification problem in large-scale MARL via Fenchel-Rockafellar transform and MDP reformulation preserves the optimal solution set while enabling independent learning at each hierarchical level.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper frames the task of finding agents whose failure causes the worst system performance drops as a hierarchical adversarial decentralized mean-field control problem. An upper level chooses which agents to target in an NP-hard selection task, while a lower level learns adversarial policies against them. The authors apply the Fenchel-Rockafellar transform to separate the two levels into a regularized mean-field Bellman operator, then recast the selection step as a Markov decision process with dense rewards. This decomposition lets each level train independently and reduces complexity, yet the paper states that the set of optimal vulnerable agents remains exactly the same as in the original coupled formulation.

Core claim

We frame VAI as a Hierarchical Adversarial Decentralized Mean Field Control (HAD-MFC), where the upper level selects vulnerable agents as an NP-hard task and the lower level learns their worst-case adversarial policies via mean-field MARL. By first applying the Fenchel-Rockafellar transform we obtain a regularized mean-field Bellman operator that decouples the hierarchy and permits independent learning at each level. We then reformulate the upper-level selection as an MDP with dense rewards, which admits sequential identification through greedy and RL algorithms. This decomposition provably preserves the optimal solution.
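The sequential, dense-reward view of the upper-level selection can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `toy_damage` is a hypothetical additive stand-in for the performance drop that the lower-level adversary would supply, `TOY_WEIGHTS` is invented data, and `greedy_select` is a plain greedy rule, not the authors' implementation. The state is the set of agents chosen so far, the action is the next agent, and the dense reward is the marginal increase in damage.

```python
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical per-agent damage weights; integers keep the arithmetic exact.
TOY_WEIGHTS = [1, 9, 3, 7]


def toy_damage(compromised: FrozenSet[int]) -> int:
    """Stand-in for the system performance drop when `compromised` agents fail.

    Assumption: additive damage. In the paper this value would come from the
    lower-level adversarial policy, which is not modelled here.
    """
    return sum(TOY_WEIGHTS[i] for i in compromised)


def greedy_select(
    n_agents: int,
    k: int,
    damage: Callable[[FrozenSet[int]], float],
) -> Tuple[List[int], List[float]]:
    """Pick k agents one at a time, recording a dense reward at every step."""
    if k > n_agents:
        raise ValueError("cannot select more agents than exist")
    selected: FrozenSet[int] = frozenset()  # MDP state: agents chosen so far
    rewards: List[float] = []
    for _ in range(k):
        best_agent, best_gain = -1, float("-inf")
        for agent in range(n_agents):  # MDP action: the next agent to add
            if agent in selected:
                continue
            # Dense reward: marginal damage gained by adding this agent.
            gain = damage(selected | {agent}) - damage(selected)
            if gain > best_gain:
                best_agent, best_gain = agent, gain
        selected = selected | {best_agent}
        rewards.append(best_gain)
    return sorted(selected), rewards


agents, rewards = greedy_select(n_agents=4, k=2, damage=toy_damage)
print(agents, rewards)
```

Because the per-step rewards telescope, their sum equals the damage of the final set minus the damage of the empty set, which is one way a dense reward can stay consistent with the original set-level objective.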

What carries the argument

The Fenchel-Rockafellar transform, which decouples the hierarchical adversarial process into a regularized mean-field Bellman operator for the upper level, combined with an MDP reformulation of the NP-hard agent selection.

If this is right

The method identifies a larger number of vulnerable agents than prior approaches in large-scale MARL environments.
Identified agents, when removed, cause measurably worse system failures than randomly chosen agents.
The approach reveals per-agent vulnerability rankings across entire large systems.
Computational cost drops because each level can now be solved independently rather than jointly.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same decoupling pattern could be tested on other hierarchical adversarial problems that combine discrete selection with continuous policy learning.
In deployed systems the per-agent vulnerability scores might serve as a diagnostic tool to prioritize redundancy or monitoring resources.
Because the MDP reformulation admits any RL algorithm, replacing the current solver with a more sample-efficient method would constitute a direct extension.

Load-bearing premise

The Fenchel-Rockafellar transform decouples the upper and lower levels without introducing approximation error that changes which agents belong to the optimal vulnerable set.

What would settle it

On a small-scale instance where exhaustive enumeration of all agent subsets is feasible, check whether the agents returned by the decomposed method exactly match the subset that produces the globally worst performance when those agents are removed.
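A minimal Python sketch of this check, under stated assumptions: `toy_damage` and `WEIGHTS` are hypothetical stand-ins for the measured performance drop, and the candidate subsets are supplied by hand rather than produced by the paper's method. The sketch enumerates every size-k subset and asks whether a candidate attains the globally worst outcome.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Sequence, Tuple

# Hypothetical damage model: additive weights plus a synergy between agents 1 and 2.
WEIGHTS = [5, 4, 4, 0]


def toy_damage(subset: FrozenSet[int]) -> int:
    """Stand-in for the performance drop when the agents in `subset` fail."""
    base = sum(WEIGHTS[i] for i in subset)
    synergy = 3 if {1, 2} <= subset else 0
    return base + synergy


def exhaustive_best(
    n_agents: int, k: int, damage: Callable[[FrozenSet[int]], float]
) -> Tuple[Tuple[int, ...], float]:
    """Return the size-k subset with the largest damage, and that damage."""
    best_subset: Tuple[int, ...] = ()
    best_value = float("-inf")
    for subset in combinations(range(n_agents), k):
        value = damage(frozenset(subset))
        if value > best_value:
            best_subset, best_value = subset, value
    return best_subset, best_value


def is_globally_worst(
    candidate: Sequence[int],
    n_agents: int,
    damage: Callable[[FrozenSet[int]], float],
) -> bool:
    """True if `candidate` ties the best damage among subsets of the same size."""
    _, best_value = exhaustive_best(n_agents, len(candidate), damage)
    return damage(frozenset(candidate)) == best_value


best_subset, best_value = exhaustive_best(n_agents=4, k=2, damage=toy_damage)
print(best_subset, best_value)
print(is_globally_worst((1, 2), 4, toy_damage))
print(is_globally_worst((0, 1), 4, toy_damage))
```

Comparing damage values rather than subset identities treats ties as matches. Enumeration cost grows combinatorially with the number of agents, so the check is only practical on small instances.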

Figures

Figures reproduced from arXiv: 2509.15103 by Bo An, Chengdong Ma, Jie Luo, Linhao Wang, Ruixiao Xu, Simin Li, Weifeng Lv, Xianglong Liu, Xin Wang, Xin Yu, Yaodong Yang, Yuqing Ma, Zheng Yuwei, Zhiqian Liu, Zihao Mao.

**Figure 2.** (a,b) Some agents contribute more to the overall system when compromised, reflecting the [PITH_FULL_IMAGE:figures/full_fig_p009_2.png]

**Figure 3.** Environments used in our experiments. The task Battle in Magent is proposed in [ [PITH_FULL_IMAGE:figures/full_fig_p023_3.png]

Original abstract

Partial agent failure becomes inevitable when systems scale up, making it crucial to identify the subset of agents whose failure causes worst-case system performance degradations. We study this Vulnerable Agent Identification (VAI) problem in large-scale multi-agent reinforcement learning (MARL). We frame VAI as a Hierarchical Adversarial Decentralized Mean Field Control (HAD-MFC), where the upper level selects vulnerable agents as an NP-hard task and the lower level learns their worst-case adversarial policies via mean-field MARL. The two problems are coupled together, making HAD-MFC difficult to solve. To handle this, we first decouple the hierarchical process with the Fenchel-Rockafellar transform, resulting in a regularized mean-field Bellman operator for the upper level that enables independent learning at each level, thus reducing computational complexity. We next reformulate the upper-level NP-hard problem as an MDP with dense rewards, allowing sequential identification of vulnerable agents via greedy and RL algorithms. This decomposition provably preserves the optimal solution. Experiments show that our method effectively identifies more vulnerable agents in large-scale MARL and the rule-based system, fooling the system into worse failures, and reveals the vulnerability of each agent in large systems. Code is available at https://github.com/Waken-dream/VAI

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper introduces a hierarchical adversarial mean-field framing for vulnerable agent identification in large MARL, using Fenchel-Rockafellar decoupling and an MDP reformulation, but the exact optimality preservation claim looks shaky on non-convex ground.


This paper tackles the practical problem of spotting which agents matter most when systems scale and partial failures start to hurt overall performance. The authors cast vulnerable agent identification as a two-level adversarial mean-field control task and apply the Fenchel-Rockafellar transform to split the levels so each can be learned separately, then turn the top-level selection into a dense-reward MDP that greedy or RL methods can handle sequentially. The abstract states that this decomposition preserves optimality and that experiments confirm stronger identification than baselines, with code released on GitHub. That combination of framing, decoupling, and reformulation does not appear in the cited prior MARL work, so the technical move is new.

The experiments on large-scale MARL and a rule-based system show the method finds agents whose adversarial policies produce worse system outcomes, which is the kind of concrete diagnostic that safety work needs. The code availability helps too.

The soft spot sits in the central guarantee. The transform is said to produce a regularized mean-field Bellman operator that keeps the optimal vulnerable-agent set intact while allowing independent learning. Yet the lower-level policy optimization and upper-level selection involve policy gradients and stochastic policies, which are non-convex. Standard Fenchel-Rockafellar duality gives zero gap only under convex-concave conditions, so the regularized operator is likely an approximation whose fixed point can differ from the original optimum and therefore change the ranking of vulnerable agents. No derivations, error bounds, or ablations are visible in the abstract, and the stress-test concern lands directly on this point. The empirical gains could still hold under approximation, but the provable preservation claim is the load-bearing part and currently rests on unshown steps.
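For reference, the textbook form of the duality that the concern above appeals to can be written as follows. The symbols $f$, $g$, and $A$ are generic placeholders, not the paper's notation, and nothing here reproduces the paper's regularized operator.

```latex
% Fenchel-Rockafellar duality (generic statement; f, g, A are placeholders).
% Convex conjugate:
%   f^*(y) = \sup_x \, \langle x, y \rangle - f(x)
\[
  \inf_{x} \; f(x) + g(Ax)
  \;=\;
  \sup_{y} \; -f^{*}(-A^{\top} y) - g^{*}(y)
\]
% Equality (zero duality gap) is guaranteed when f and g are proper, convex,
% and lower semicontinuous, and a constraint qualification holds, for example
% some x in dom f at which g is continuous at Ax.
% Without convexity only weak duality (inf >= sup) holds in general.
```

The letter's question is whether the paper's lower-level objective meets these convexity conditions, or whether the equality is being used where only the inequality is available.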
This work is for researchers focused on robustness and diagnostics in multi-agent systems rather than core algorithm design. A reader who needs scalable tools to audit failure modes in growing MARL setups will get usable ideas and results from it. The topic is relevant, the experiments are present, and the framing is inventive enough that it deserves a serious referee even if the math requires tightening. I would send it to peer review.

Referee Report

2 major / 2 minor

Summary. The paper frames Vulnerable Agent Identification (VAI) in large-scale MARL as a Hierarchical Adversarial Decentralized Mean Field Control (HAD-MFC) problem. It decouples the hierarchy via the Fenchel-Rockafellar transform to obtain a regularized mean-field Bellman operator that permits independent learning at each level, then reformulates the upper-level NP-hard selection task as an MDP with dense rewards solvable by greedy and RL algorithms. The authors assert that this decomposition provably preserves the optimal solution set and report experimental results showing improved identification of vulnerable agents that lead to worse system failures.

Significance. If the optimality-preservation claim holds, the approach would offer a computationally tractable method for identifying critical agents in large MARL systems, with direct relevance to robustness and safety analysis. The public code release supports reproducibility. The significance is limited by the absence of explicit verification that the transform maintains the exact optimal vulnerable-agent set under the non-convex policy optimization present in the lower level.

major comments (2)

[Abstract] Abstract (paragraph on decoupling): The claim that the Fenchel-Rockafellar transform 'provably preserves the optimal solution' is load-bearing for the central contribution, yet no derivation, duality-gap bound, or theorem is referenced showing that the fixed point of the resulting regularized mean-field Bellman operator coincides with the original HAD-MFC optimum when the lower-level adversarial policy optimization is non-convex. Standard Fenchel-Rockafellar duality requires convex-concave structure; without explicit conditions or a proof that the set of optimal vulnerable agents is unchanged, the preservation statement cannot be evaluated.
[MDP reformulation] Section on MDP reformulation (upper-level problem): The reformulation of the NP-hard agent-selection task as an MDP with dense rewards is presented as enabling sequential identification, but no analysis is given of how the dense-reward construction interacts with the regularized Bellman operator to guarantee that the greedy/RL solutions recover the same vulnerable-agent ranking as the original hierarchical objective.

minor comments (2)

[Abstract] The abstract mentions 'the rule-based system' without defining the specific rule-based environment or providing its parameters, making it difficult to assess the generality of the reported failure results.
[Method] Notation for the regularization coefficient in the Bellman operator is introduced but its dependence on problem scale or agent count is not discussed, which affects reproducibility of the reported identification performance.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We appreciate the referee's thorough review and valuable feedback on our manuscript. We address the major comments point by point below. We will revise the paper to provide additional theoretical details and clarifications as suggested.


Referee: [Abstract] Abstract (paragraph on decoupling): The claim that the Fenchel-Rockafellar transform 'provably preserves the optimal solution' is load-bearing for the central contribution, yet no derivation, duality-gap bound, or theorem is referenced showing that the fixed point of the resulting regularized mean-field Bellman operator coincides with the original HAD-MFC optimum when the lower-level adversarial policy optimization is non-convex. Standard Fenchel-Rockafellar duality requires convex-concave structure; without explicit conditions or a proof that the set of optimal vulnerable agents is unchanged, the preservation statement cannot be evaluated.

Authors: We thank the referee for highlighting this important point. The Fenchel-Rockafellar transform is applied in the context of the mean-field approximation, where the lower-level problem is formulated as a convex optimization over distributions in the mean-field limit. We will add a new theorem in the revised manuscript that explicitly derives the duality gap bound and shows that under the mean-field regime, the optimal solution set for vulnerable agent selection is preserved even when individual policies are non-convex, as the transform operates on the aggregate mean-field quantities. A detailed proof will be included in the appendix.
revision: yes
Referee: [MDP reformulation] Section on MDP reformulation (upper-level problem): The reformulation of the NP-hard agent-selection task as an MDP with dense rewards is presented as enabling sequential identification, but no analysis is given of how the dense-reward construction interacts with the regularized Bellman operator to guarantee that the greedy/RL solutions recover the same vulnerable-agent ranking as the original hierarchical objective.

Authors: We agree that the interaction between the dense-reward MDP and the regularized Bellman operator requires further elaboration. In the revised version, we will include an analysis showing that the dense rewards are constructed directly from the value function of the regularized operator, ensuring that the optimal policy in the MDP corresponds to the selection that maximizes the hierarchical objective. This will include a proposition demonstrating equivalence in the recovered ranking for the greedy algorithm.
revision: yes

Circularity Check

0 steps flagged

No circularity: derivation uses external transform and reformulation without self-reduction


The paper applies the Fenchel-Rockafellar transform to decouple the HAD-MFC hierarchy into a regularized mean-field Bellman operator and then reformulates the upper-level selection as an MDP with dense rewards. It explicitly claims this 'decomposition provably preserves the optimal solution' and enables independent learning. No quoted step defines the vulnerable-agent set or optimality in terms of the output itself, renames a fitted quantity as a prediction, or relies on a self-citation chain for the preservation result. The transform is invoked as a standard decoupling tool with independent mathematical content, and the MDP reformulation is presented as a separate algorithmic step. The central claim therefore remains non-tautological relative to the inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The approach rests on standard mean-field approximations for large agent populations and on the applicability of the Fenchel-Rockafellar transform to the adversarial control objective; no new physical entities are postulated.

free parameters (1)

regularization coefficient in Bellman operator
Introduced to enable independent upper-level learning after the transform; value not specified in abstract.

axioms (2)

domain assumption: Mean-field approximation is valid for the large-scale agent population.
Invoked for the lower-level adversarial policy learning.
domain assumption: The Fenchel-Rockafellar transform yields an equivalent regularized operator for the hierarchical problem.
Central step that permits decoupling.



Reference graph

Works this paper leans on

69 extracted references · 69 canonical work pages · 1 internal anchor

[1] Yaodong Yang, Rui Luo, Minne Li, Ming Zhou, Weinan Zhang, and Jun Wang. Mean field multi-agent reinforcement learning. In International conference on machine learning, pages 5571–5580. PMLR, 2018.
[2] Sriram Ganapathi Subramanian, Matthew E Taylor, Mark Crowley, and Pascal Poupart. Decentralized mean field games. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 9439–9447, 2022.
[3] Barna Pasztor, Ilija Bogunovic, and Andreas Krause. Efficient model-based multi-agent mean-field reinforcement learning. arXiv preprint arXiv:2107.04050, 2021.
[4] Mathieu Laurière, Sarah Perrin, Sertan Girgin, Paul Muller, Ayush Jain, Theophile Cabannes, Georgios Piliouras, Julien Pérolat, Romuald Elie, Olivier Pietquin, et al. Scalable deep reinforcement learning algorithms for mean field games. In International Conference on Machine Learning, pages 12078–12095. PMLR, 2022.
[5] Maximilian Hüttenrauch, Sosic Adrian, Gerhard Neumann, et al. Deep reinforcement learning for swarm systems. Journal of Machine Learning Research, 20(54):1–31, 2019.
[6] Lianmin Zheng, Jiacheng Yang, Han Cai, Ming Zhou, Weinan Zhang, Jun Wang, and Yong Yu. Magent: A many-agent reinforcement learning platform for artificial collective intelligence. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[7] Jianhong Wang, Wangkun Xu, Yunjie Gu, Wenbin Song, and Tim C Green. Multi-agent reinforcement learning for active voltage control on power distribution networks. Advances in Neural Information Processing Systems, 34:3271–3284, 2021.
[8] Duc Thien Nguyen, Akshat Kumar, and Hoong Chuin Lau. Credit assignment for collective multiagent rl with global rewards. Advances in neural information processing systems, 31, 2018.
[9] Chen Tessler, Yonathan Efroni, and Shie Mannor. Action robust reinforcement learning and applications in continuous control. In International Conference on Machine Learning, pages 6215–6224. PMLR, 2019.
[10] Eliahu Khalastchi and Meir Kalech. Fault detection and diagnosis in multi-robot systems: A survey. Sensors, 19(18):4019, 2019.
[11] Xinge Huang, Farshad Arvin, Craig West, Simon Watson, and Barry Lennox. Exploration in extreme environments with swarm robotic system. In 2019 IEEE international conference on mechatronics (ICM), volume 1, pages 193–198. IEEE, 2019.
[12] Sait Murat Giray. Anatomy of unmanned aerial vehicle hijacking with signal spoofing. In 2013 6th International Conference on Recent Advances in Space Technologies (RAST), pages 795–800. IEEE, 2013.
[13] Bora Ly and Romny Ly. Cybersecurity in unmanned aerial vehicles (uavs). Journal of Cyber Security Technology, 5(2):120–137, 2021.
[14] Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, and Stuart Russell. Adversarial policies: Attacking deep reinforcement learning. arXiv preprint arXiv:1905.10615, 2019.
[15] Jieyu Lin, Kristina Dzeparoska, Sai Qian Zhang, Alberto Leon-Garcia, and Nicolas Papernot. On the robustness of cooperative multi-agent reinforcement learning. In 2020 IEEE Security and Privacy Workshops (SPW), pages 62–68. IEEE, 2020.
[16] Le Cong Dinh, David Henry Mguni, Long Tran-Thanh, Jun Wang, and Yaodong Yang. Online markov decision processes with non-oblivious strategic adversary. Autonomous Agents and Multi-Agent Systems, 37(1):15, 2023.
[17] Jean-Michel Lasry and Pierre-Louis Lions. Mean field games. Japanese journal of mathematics, 2(1):229–260, 2007.
[18] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 137–146, 2003.
[19] Suman Banerjee, Mamata Jenamani, and Dilip Kumar Pratihar. A survey on influence maximization in a social network. Knowledge and Information Systems, 62:3417–3455, 2020.
[20] Yandi Li, Haobo Gao, Yunxuan Gao, Jianxiong Guo, and Weili Wu. A survey on influence maximization: From an ml-based combinatorial optimization. ACM Transactions on Knowledge Discovery from Data, 17(9):1–50, 2023.
[21] Nhan H Pham, Lam M Nguyen, Jie Chen, Hoang Thanh Lam, Subhro Das, and Tsui-Wei Weng. Evaluating robustness of cooperative marl: A model-based approach. arXiv preprint arXiv:2202.03558, 2022.
[22] Lixia Zan, Xiangbin Zhu, and Zhao-Long Hu. Adversarial attacks on cooperative multi-agent deep reinforcement learning: a dynamic group-based adversarial example transferability method. Complex & Intelligent Systems, 9(6):7439–7450, 2023.
[23] Ziyuan Zhou and Guanjun Liu. Robustness testing for multi-agent reinforcement learning: State perturbations on critical agents. arXiv preprint arXiv:2306.06136, 2023.
[24] Reuven Cohen and Liran Katzir. The generalized maximum coverage problem. Information Processing Letters, 108(1):15–22, 2008.
[25] Ralph Tyrell Rockafellar. Convex analysis, 1970.
[26] Ofir Nachum and Bo Dai. Reinforcement learning via fenchel-rockafellar duality. arXiv preprint arXiv:2001.01866, 2020.
[27] Yaodong Yang and Jun Wang. An overview of multi-agent reinforcement learning from game theoretical perspective. arXiv preprint arXiv:2011.00583, 2020.
[28] Minyi Huang, Roland P Malhamé, and Peter E Caines. Large population stochastic dynamic games: closed-loop mckean-vlasov systems and the nash certainty equivalence principle. Communications in Information and Systems, 2006.
[29] Xin Guo, Anran Hu, Renyuan Xu, and Junzi Zhang. Learning mean-field games. Advances in neural information processing systems, 32, 2019.
[30] Julien Perolat, Sarah Perrin, Romuald Elie, Mathieu Laurière, Georgios Piliouras, Matthieu Geist, Karl Tuyls, and Olivier Pietquin. Scaling up mean field games with online mirror descent. arXiv preprint arXiv:2103.00623, 2021.
[31] Paul Muller, Romuald Elie, Mark Rowland, Mathieu Lauriere, Julien Perolat, Sarah Perrin, Matthieu Geist, Georgios Piliouras, Olivier Pietquin, and Karl Tuyls. Learning correlated equilibria in mean-field games. arXiv preprint arXiv:2208.10138, 2022.
[32] René Carmona, Mathieu Laurière, and Zongjun Tan. Model-free mean-field reinforcement learning: mean-field mdp and mean-field q-learning. The Annals of Applied Probability, 33(6B):5334–5381, 2023.
[33] Haotian Gu, Xin Guo, Xiaoli Wei, and Renyuan Xu. Mean-field controls with q-learning for cooperative marl: convergence and complexity analysis. SIAM Journal on Mathematics of Data Science, 3(4):1168–1196, 2021.
[34] Washim Uddin Mondal, Mridul Agarwal, Vaneet Aggarwal, and Satish V Ukkusuri. On the approximation of cooperative heterogeneous multi-agent reinforcement learning (marl) using mean field control (mfc). Journal of Machine Learning Research, 23(129):1–46, 2022.
[35] Andrea Angiuli, Jean-Pierre Fouque, and Mathieu Laurière. Unified reinforcement q-learning for mean field game and control problems. Mathematics of Control, Signals, and Systems, 34(2):217–271, 2022.
[36] Jayakumar Subramanian and Aditya Mahajan. Reinforcement learning in stationary mean-field games. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, pages 251–259, 2019.
[37] Sriram Ganapathi Subramanian, Pascal Poupart, Matthew E Taylor, and Nidhi Hegde. Multi type mean field reinforcement learning. arXiv preprint arXiv:2002.02513, 2020.
[38] Sriram Ganapathi Subramanian, Matthew E Taylor, Mark Crowley, and Pascal Poupart. Partially observable mean field reinforcement learning. arXiv preprint arXiv:2012.15791, 2020.
[39] Pier Giuseppe Sessa, Maryam Kamgarpour, and Andreas Krause. Efficient model-based multi-agent reinforcement learning via optimistic equilibrium computation. In International Conference on Machine Learning, pages 19580–19597. PMLR, 2022.
[40] Kai Cui, Sascha Hauck, Christian Fabian, and Heinz Koeppl. Learning decentralized partially observable mean field control for artificial collective behavior. arXiv preprint arXiv:2307.06175, 2023.
[41]

Major-minor mean field multi-agent reinforcement learning

Kai Cui, Christian Fabian, Anam Tahir, and Heinz Koeppl. Major-minor mean field multi-agent reinforcement learning. InForty-first International Conference on Machine Learning, 2024

work page 2024
[42]

Towards comprehen- sive testing on the robustness of cooperative multi-agent reinforcement learning

Jun Guo, Yonghong Chen, Yihang Hao, Zixin Yin, Yin Yu, and Simin Li. Towards comprehen- sive testing on the robustness of cooperative multi-agent reinforcement learning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 115–122, 2022

work page 2022
[43]

Attacking cooperative multi-agent reinforcement learning by adversarial minority influence

Simin Li, Jun Guo, Jingqiao Xiu, Pu Feng, Xin Yu, Aishan Liu, Wenjun Wu, and Xianglong Liu. Attacking cooperative multi-agent reinforcement learning by adversarial minority influence. arXiv preprint arXiv:2302.03322, 2023

work page arXiv 2023
[44]

Robust multi-agent reinforcement learning with model uncertainty.Advances in neural information processing systems, 33:10571–10583, 2020

Kaiqing Zhang, Tao Sun, Yunzhe Tao, Sahika Genc, Sunil Mallya, and Tamer Basar. Robust multi-agent reinforcement learning with model uncertainty.Advances in neural information processing systems, 33:10571–10583, 2020

work page 2020
[45]

Sample-efficient robust multi- agent reinforcement learning in the face of environmental uncertainty.arXiv preprint arXiv:2404.18909, 2024

Laixi Shi, Eric Mazumdar, Yuejie Chi, and Adam Wierman. Sample-efficient robust multi- agent reinforcement learning in the face of environmental uncertainty.arXiv preprint arXiv:2404.18909, 2024

work page arXiv 2024
[46]

Learning and testing resilience in cooperative multi-agent systems

Thomy Phan, Thomas Gabor, Andreas Sedlmeier, Fabian Ritz, Bernhard Kempter, Cornel Klein, Horst Sauer, Reiner Schmid, Jan Wieghardt, Marc Zeller, et al. Learning and testing resilience in cooperative multi-agent systems. InProceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems, pages 1055–1063, 2020

work page 2020
[47]

Efficient influence maximization in social networks

Wei Chen, Yajun Wang, and Siyu Yang. Efficient influence maximization in social networks. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 199–208, 2009

work page 2009
[48]

User interactions in social networks and their implications

Christo Wilson, Bryce Boe, Alessandra Sala, Krishna PN Puttaswamy, and Ben Y Zhao. User interactions in social networks and their implications. InProceedings of the 4th ACM European conference on Computer systems, pages 205–218, 2009

work page 2009
[49] Wei Chen, Yifei Yuan, and Li Zhang. Scalable influence maximization in social networks under the linear threshold model. In 2010 IEEE International Conference on Data Mining, pages 88–97. IEEE, 2010.
[50] Gennaro Cordasco, Luisa Gargano, Marco Mecchia, Adele A Rescigno, and Ugo Vaccaro. A fast and effective heuristic for discovering small target sets in social networks. In Combinatorial Optimization and Applications: 9th International Conference, COCOA 2015, Houston, TX, USA, December 18-20, 2015, Proceedings, pages 193–208. Springer, 2015.
[51] Chun-Wei Tsai, Yo-Chung Yang, and Ming-Chao Chiang. A genetic newgreedy algorithm for influence maximization in social network. In 2015 IEEE International Conference on Systems, Man, and Cybernetics, pages 2549–2554. IEEE, 2015.
[52] Doina Bucur and Giovanni Iacca. Influence maximization in social networks with genetic algorithms. In Applications of Evolutionary Computation: 19th European Conference, EvoApplications 2016, Porto, Portugal, March 30–April 1, 2016, Proceedings, Part I 19, pages 379–392. Springer, 2016.
[53] Yu Wang, Gao Cong, Guojie Song, and Kunqing Xie. Community-based greedy algorithm for mining top-k influential nodes in mobile social networks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1039–1048, 2010.
[54] Yi-Cheng Chen, Wen-Yuan Zhu, Wen-Chih Peng, Wang-Chien Lee, and Suh-Yin Lee. CIM: Community-based influence maximization in social networks. ACM Transactions on Intelligent Systems and Technology (TIST), 5(2):1–31, 2014.
[55] Eli Meirom, Haggai Maron, Shie Mannor, and Gal Chechik. Controlling graph dynamics with reinforcement learning and graph neural networks. In International Conference on Machine Learning, pages 7565–7577. PMLR, 2021.
[56] Hui Li, Mengting Xu, Sourav S Bhowmick, Joty Shafiq Rayhan, Changsheng Sun, and Jiangtao Cui. PIANO: Influence maximization meets deep reinforcement learning. IEEE Transactions on Computational Social Systems, 10(3):1288–1300, 2022.
[57] Tiantian Chen, Siwen Yan, Jianxiong Guo, and Weili Wu. ToupleGDD: A fine-designed solution of influence maximization by deep reinforcement learning. IEEE Transactions on Computational Social Systems, 11(2):2210–2221, 2023.
[58] Chen Ling, Junji Jiang, Junxiang Wang, My T Thai, Renhao Xue, James Song, Meikang Qiu, and Liang Zhao. Deep graph representation learning and optimization for influence maximization. In International Conference on Machine Learning, pages 21350–21361. PMLR, 2023.
[59] Simin Li, Jun Guo, Jingqiao Xiu, Ruixiao Xu, Xin Yu, Jiakai Wang, Aishan Liu, Yaodong Yang, and Xianglong Liu. Byzantine robust cooperative multi-agent reinforcement learning as a Bayesian game. arXiv preprint arXiv:2305.12872, 2023.
[60] Garud N Iyengar. Robust dynamic programming. Mathematics of Operations Research, 30(2):257–280, 2005.
[61] Li An, Volker Grimm, Abigail Sullivan, BL Turner II, Nicolas Malleson, Alison Heppenstall, Christian Vincenot, Derek Robinson, Xinyue Ye, Jianguo Liu, et al. Challenges, tasks, and opportunities in modeling agent-based complex systems. Ecological Modelling, 457:109685, 2021.
[63] Tamás Vicsek, András Czirók, Eshel Ben-Jacob, Inon Cohen, and Ofer Shochet. Novel type of phase transition in a system of self-driven particles. Physical Review Letters, 75(6):1226, 1995.
[64] Marcel Salathé and James H Jones. Dynamics and control of diseases in networks with community structure. PLoS Computational Biology, 6(4):e1000736, 2010.
[65] Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, and Koray Kavukcuoglu. Feudal networks for hierarchical reinforcement learning. In International Conference on Machine Learning, pages 3540–3549. PMLR, 2017.
[66] René Carmona, François Delarue, et al. Probabilistic theory of mean field games with applications I-II. Springer, 2018.
[67] Richard Bellman. Dynamic programming. Science, 153(3731):34–37, 1966.
[68] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
[69] Gabriel Dulac-Arnold, Richard Evans, Hado van Hasselt, Peter Sunehag, Timothy Lillicrap, Jonathan Hunt, Timothy Mann, Theophane Weber, Thomas Degris, and Ben Coppin. Deep reinforcement learning in large discrete action spaces. arXiv preprint arXiv:1512.07679, 2015.
