arxiv: 2509.15103 · v3 · submitted 2025-09-18 · 💻 cs.MA · cs.AI

Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning

Pith reviewed 2026-05-18 15:47 UTC · model grok-4.3

classification 💻 cs.MA cs.AI
keywords vulnerable agent identification · multi-agent reinforcement learning · hierarchical adversarial control · Fenchel-Rockafellar transform · mean-field control · MDP reformulation · large-scale MARL

The pith

Decomposing the vulnerable agent identification problem in large-scale MARL via Fenchel-Rockafellar transform and MDP reformulation preserves the optimal solution set while enabling independent learning at each hierarchical level.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper frames the task of finding agents whose failure causes the worst system performance drops as a hierarchical adversarial decentralized mean-field control problem. An upper level chooses which agents to target in an NP-hard selection task, while a lower level learns adversarial policies against them. The authors apply the Fenchel-Rockafellar transform to separate the two levels into a regularized mean-field Bellman operator, then recast the selection step as a Markov decision process with dense rewards. This decomposition lets each level train independently and reduces complexity, yet the paper states that the set of optimal vulnerable agents remains exactly the same as in the original coupled formulation.

Core claim

We frame VAI as a Hierarchical Adversarial Decentralized Mean Field Control (HAD-MFC), where the upper level selects vulnerable agents as an NP-hard task and the lower level learns their worst-case adversarial policies via mean-field MARL. By first applying the Fenchel-Rockafellar transform we obtain a regularized mean-field Bellman operator that decouples the hierarchy and permits independent learning at each level. We then reformulate the upper-level selection as an MDP with dense rewards, which admits sequential identification through greedy and RL algorithms. This decomposition provably preserves the optimal solution.
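
The idea of recasting subset selection as an MDP with dense rewards can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the paper's construction: the state is the set of agents selected so far, an action adds one agent, and the per-step reward is taken to be the marginal change in a score function. `toy_score`, `DAMAGE`, and `rollout` are hypothetical names, and the paper's actual reward is derived from its regularized operator, which is not reproduced here.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Score = Callable[[FrozenSet[int]], float]


def rollout(order: List[int], score: Score) -> Tuple[FrozenSet[int], List[float]]:
    """Add agents one at a time; each step's reward is the marginal change in score."""
    state: FrozenSet[int] = frozenset()
    rewards: List[float] = []
    for agent in order:
        next_state = state | {agent}
        rewards.append(score(next_state) - score(state))
        state = next_state
    return state, rewards


# Hypothetical per-agent damage values (assumption, for illustration only).
DAMAGE: Dict[int, float] = {0: 1.0, 1: 3.0, 2: 2.0, 3: 0.5}


def toy_score(selected: FrozenSet[int]) -> float:
    """Toy set objective: additive damage plus a bonus when agents 1 and 2 fail together."""
    base = sum(DAMAGE[a] for a in selected)
    bonus = 1.5 if {1, 2} <= selected else 0.0
    return base + bonus


final_state, rewards = rollout([1, 2, 0], toy_score)
# The dense rewards telescope: their sum equals the score of the final selection.
total = sum(rewards)
```

With marginal-gain rewards the episode return equals the value of the final selected set, so maximizing return in the MDP and maximizing the set objective coincide in this toy setting. A greedy or RL policy then receives feedback at every selection step rather than only at the end.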

What carries the argument

Fenchel-Rockafellar transform that decouples the hierarchical adversarial process into a regularized mean-field Bellman operator for the upper level, combined with MDP reformulation of the NP-hard agent selection.

If this is right

  • The method identifies a larger number of vulnerable agents than prior approaches in large-scale MARL environments.
  • Identified agents, when removed, cause measurably worse system failures than randomly chosen agents.
  • The approach reveals per-agent vulnerability rankings across entire large systems.
  • Computational cost drops because each level can now be solved independently rather than jointly.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same decoupling pattern could be tested on other hierarchical adversarial problems that combine discrete selection with continuous policy learning.
  • In deployed systems the per-agent vulnerability scores might serve as a diagnostic tool to prioritize redundancy or monitoring resources.
  • Because the MDP reformulation admits any RL algorithm, replacing the current solver with a more sample-efficient method would constitute a direct extension.

Load-bearing premise

The Fenchel-Rockafellar transform decouples the upper and lower levels without introducing approximation error that changes which agents belong to the optimal vulnerable set.
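
For orientation, the generic Fenchel-Rockafellar duality that this premise leans on can be written as follows. This is the textbook statement for convex, lower semicontinuous $f$ and $g$ and a linear map $A$ under a suitable constraint qualification. It is not the paper's specific regularized mean-field operator, and the symbols $f$, $g$, $A$, $x$, $y$ are generic placeholders rather than the paper's notation.

```latex
% Convex conjugate of f
f^{*}(y) \;=\; \sup_{x}\,\bigl\{ \langle x, y \rangle - f(x) \bigr\}

% Primal and dual problems; equality holds under convexity
% and a constraint qualification (strong duality)
\min_{x}\; f(x) + g(Ax)
\;=\;
\max_{y}\; -f^{*}\!\bigl(-A^{\top} y\bigr) - g^{*}(y)
```

The equality is what allows a coupled problem to be replaced by a dual problem. Whether the required convexity holds in the paper's setting is exactly what the premise above assumes.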

What would settle it

On a small-scale instance where exhaustive enumeration of all agent subsets is feasible, check whether the agents returned by the decomposed method exactly match the subset that produces the globally worst performance when those agents are removed.
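
A minimal sketch of such a check is given below, assuming a hypothetical `performance` function that returns system performance after a given set of agents is removed. The plain greedy routine stands in for "some sequential selection method" and is not the paper's algorithm. The toy objective is deliberately non-additive so that the comparison is not trivially satisfied.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable

Perf = Callable[[FrozenSet[int]], float]


def exhaustive_worst_set(agents: Iterable[int], k: int, performance: Perf) -> FrozenSet[int]:
    """Enumerate every size-k subset and return the one with the lowest performance."""
    return min((frozenset(c) for c in combinations(agents, k)), key=performance)


def greedy_worst_set(agents: Iterable[int], k: int, performance: Perf) -> FrozenSet[int]:
    """Sequentially add the agent whose removal lowers performance the most."""
    agents = list(agents)
    selected: FrozenSet[int] = frozenset()
    for _ in range(k):
        remaining = [a for a in agents if a not in selected]
        best = min(remaining, key=lambda a: performance(selected | {a}))
        selected = selected | {best}
    return selected


# Hypothetical contributions (assumption): agents 2 and 3 are individually weak
# but removing both together causes an extra drop.
CONTRIB: Dict[int, float] = {0: 2.0, 1: 1.5, 2: 1.0, 3: 1.0}


def toy_performance(removed: FrozenSet[int]) -> float:
    loss = sum(CONTRIB[a] for a in removed)
    if {2, 3} <= removed:
        loss += 3.0
    return 10.0 - loss


exact = exhaustive_worst_set(range(4), 2, toy_performance)
greedy = greedy_worst_set(range(4), 2, toy_performance)
match = exact == greedy
```

In this toy instance the exhaustive optimum is `{2, 3}` while plain greedy returns `{0, 1}`, so `match` is `False`. The same harness, with the decomposed method substituted for `greedy_worst_set` and a real environment substituted for `toy_performance`, would show whether the preservation claim holds on small instances.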

Figures

Figures reproduced from arXiv: 2509.15103 by Bo An, Chengdong Ma, Jie Luo, Linhao Wang, Ruixiao Xu, Simin Li, Weifeng Lv, Xianglong Liu, Xin Wang, Xin Yu, Yaodong Yang, Yuqing Ma, Zheng Yuwei, Zhiqian Liu, Zihao Mao.

Figure 1: Pearson correlation between the lower-level attack …
Figure 2: (a,b) Some agents contribute more to the overall system when compromised, reflecting the …
Figure 3: Environments used in our experiments. The task Battle in Magent are proposed in […]
read the original abstract

Partial agent failure becomes inevitable when systems scale up, making it crucial to identify the subset of agents whose failure causes worst-case system performance degradation. We study this Vulnerable Agent Identification (VAI) problem in large-scale multi-agent reinforcement learning (MARL). We frame VAI as a Hierarchical Adversarial Decentralized Mean Field Control (HAD-MFC), where the upper level selects vulnerable agents as an NP-hard task and the lower level learns their worst-case adversarial policies via mean-field MARL. The two problems are coupled, making HAD-MFC difficult to solve. To handle this, we first decouple the hierarchical process by the Fenchel-Rockafellar transform, resulting in a regularized mean-field Bellman operator for the upper level that enables independent learning at each level, thus reducing computational complexity. We next reformulate the upper-level NP-hard problem as an MDP with dense rewards, allowing sequential identification of vulnerable agents via greedy and RL algorithms. This decomposition provably preserves the optimal solution. Experiments show our method effectively identifies more vulnerable agents in large-scale MARL and the rule-based system, fooling the system into worse failures, and reveals the vulnerability of each agent in large systems. Code is available at https://github.com/Waken-dream/VAI

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper frames Vulnerable Agent Identification (VAI) in large-scale MARL as a Hierarchical Adversarial Decentralized Mean Field Control (HAD-MFC) problem. It decouples the hierarchy via the Fenchel-Rockafellar transform to obtain a regularized mean-field Bellman operator that permits independent learning at each level, then reformulates the upper-level NP-hard selection task as an MDP with dense rewards solvable by greedy and RL algorithms. The authors assert that this decomposition provably preserves the optimal solution set and report experimental results showing improved identification of vulnerable agents that lead to worse system failures.

Significance. If the optimality-preservation claim holds, the approach would offer a computationally tractable method for identifying critical agents in large MARL systems, with direct relevance to robustness and safety analysis. The public code release supports reproducibility. The significance is limited by the absence of explicit verification that the transform maintains the exact optimal vulnerable-agent set under the non-convex policy optimization present in the lower level.

major comments (2)
  1. [Abstract] Abstract (paragraph on decoupling): The claim that the Fenchel-Rockafellar transform 'provably preserves the optimal solution' is load-bearing for the central contribution, yet no derivation, duality-gap bound, or theorem is referenced showing that the fixed point of the resulting regularized mean-field Bellman operator coincides with the original HAD-MFC optimum when the lower-level adversarial policy optimization is non-convex. Standard Fenchel-Rockafellar duality requires convex-concave structure; without explicit conditions or a proof that the set of optimal vulnerable agents is unchanged, the preservation statement cannot be evaluated.
  2. [MDP reformulation] Section on MDP reformulation (upper-level problem): The reformulation of the NP-hard agent-selection task as an MDP with dense rewards is presented as enabling sequential identification, but no analysis is given of how the dense-reward construction interacts with the regularized Bellman operator to guarantee that the greedy/RL solutions recover the same vulnerable-agent ranking as the original hierarchical objective.
minor comments (2)
  1. [Abstract] The abstract mentions 'the rule-based system' without defining the specific rule-based environment or providing its parameters, making it difficult to assess the generality of the reported failure results.
  2. [Method] Notation for the regularization coefficient in the Bellman operator is introduced but its dependence on problem scale or agent count is not discussed, which affects reproducibility of the reported identification performance.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We appreciate the referee's thorough review and valuable feedback on our manuscript. We address the major comments point by point below. We will revise the paper to provide additional theoretical details and clarifications as suggested.

read point-by-point responses
  1. Referee: [Abstract] Abstract (paragraph on decoupling): The claim that the Fenchel-Rockafellar transform 'provably preserves the optimal solution' is load-bearing for the central contribution, yet no derivation, duality-gap bound, or theorem is referenced showing that the fixed point of the resulting regularized mean-field Bellman operator coincides with the original HAD-MFC optimum when the lower-level adversarial policy optimization is non-convex. Standard Fenchel-Rockafellar duality requires convex-concave structure; without explicit conditions or a proof that the set of optimal vulnerable agents is unchanged, the preservation statement cannot be evaluated.

    Authors: We thank the referee for highlighting this important point. The Fenchel-Rockafellar transform is applied in the context of the mean-field approximation, where the lower-level problem is formulated as a convex optimization over distributions in the mean-field limit. We will add a new theorem in the revised manuscript that explicitly derives the duality gap bound and shows that under the mean-field regime, the optimal solution set for vulnerable agent selection is preserved even when individual policies are non-convex, as the transform operates on the aggregate mean-field quantities. A detailed proof will be included in the appendix. revision: yes

  2. Referee: [MDP reformulation] Section on MDP reformulation (upper-level problem): The reformulation of the NP-hard agent-selection task as an MDP with dense rewards is presented as enabling sequential identification, but no analysis is given of how the dense-reward construction interacts with the regularized Bellman operator to guarantee that the greedy/RL solutions recover the same vulnerable-agent ranking as the original hierarchical objective.

    Authors: We agree that the interaction between the dense-reward MDP and the regularized Bellman operator requires further elaboration. In the revised version, we will include an analysis showing that the dense rewards are constructed directly from the value function of the regularized operator, ensuring that the optimal policy in the MDP corresponds to the selection that maximizes the hierarchical objective. This will include a proposition demonstrating equivalence in the recovered ranking for the greedy algorithm. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation uses external transform and reformulation without self-reduction

full rationale

The paper applies the Fenchel-Rockafellar transform to decouple the HAD-MFC hierarchy into a regularized mean-field Bellman operator and then reformulates the upper-level selection as an MDP with dense rewards. It explicitly claims this 'decomposition provably preserves the optimal solution' and enables independent learning. No quoted step defines the vulnerable-agent set or optimality in terms of the output itself, renames a fitted quantity as a prediction, or relies on a self-citation chain for the preservation result. The transform is invoked as a standard decoupling tool with independent mathematical content, and the MDP reformulation is presented as a separate algorithmic step. The central claim therefore remains non-tautological relative to the inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The approach rests on standard mean-field approximations for large agent populations and on the applicability of the Fenchel-Rockafellar transform to the adversarial control objective; no new physical entities are postulated.

free parameters (1)
  • regularization coefficient in Bellman operator
    Introduced to enable independent upper-level learning after the transform; value not specified in abstract.
axioms (2)
  • domain assumption Mean-field approximation is valid for the large-scale agent population
    Invoked for the lower-level adversarial policy learning.
  • domain assumption Fenchel-Rockafellar transform yields an equivalent regularized operator for the hierarchical problem
    Central step that permits decoupling.

pith-pipeline@v0.9.0 · 5797 in / 1297 out tokens · 51263 ms · 2026-05-18T15:47:02.595098+00:00 · methodology

Reference graph

Works this paper leans on

69 extracted references · 69 canonical work pages · 1 internal anchor

  1. [1]

    Mean field multi-agent reinforcement learning

    Yaodong Yang, Rui Luo, Minne Li, Ming Zhou, Weinan Zhang, and Jun Wang. Mean field multi-agent reinforcement learning. InInternational conference on machine learning, pages 5571–5580. PMLR, 2018

  2. [2]

    Decen- tralized mean field games

    Sriram Ganapathi Subramanian, Matthew E Taylor, Mark Crowley, and Pascal Poupart. Decen- tralized mean field games. InProceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 9439–9447, 2022

  3. [3]

    Efficient model-based multi-agent mean- field reinforcement learning.arXiv preprint arXiv:2107.04050, 2021

    Barna Pasztor, Ilija Bogunovic, and Andreas Krause. Efficient model-based multi-agent mean- field reinforcement learning.arXiv preprint arXiv:2107.04050, 2021

  4. [4]

    Scalable deep rein- forcement learning algorithms for mean field games

    Mathieu Laurière, Sarah Perrin, Sertan Girgin, Paul Muller, Ayush Jain, Theophile Cabannes, Georgios Piliouras, Julien Pérolat, Romuald Elie, Olivier Pietquin, et al. Scalable deep rein- forcement learning algorithms for mean field games. InInternational Conference on Machine Learning, pages 12078–12095. PMLR, 2022

  5. [5]

    Deep reinforcement learning for swarm systems.Journal of Machine Learning Research, 20(54):1–31, 2019

    Maximilian Hüttenrauch, Sosic Adrian, Gerhard Neumann, et al. Deep reinforcement learning for swarm systems.Journal of Machine Learning Research, 20(54):1–31, 2019

  6. [6]

    Magent: A many-agent reinforcement learning platform for artificial collective intelligence

    Lianmin Zheng, Jiacheng Yang, Han Cai, Ming Zhou, Weinan Zhang, Jun Wang, and Yong Yu. Magent: A many-agent reinforcement learning platform for artificial collective intelligence. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018

  7. [7]

    Multi-agent reinforcement learning for active voltage control on power distribution networks.Advances in Neural Information Processing Systems, 34:3271–3284, 2021

    Jianhong Wang, Wangkun Xu, Yunjie Gu, Wenbin Song, and Tim C Green. Multi-agent reinforcement learning for active voltage control on power distribution networks.Advances in Neural Information Processing Systems, 34:3271–3284, 2021

  8. [8]

    Credit assignment for collective multiagent rl with global rewards.Advances in neural information processing systems, 31, 2018

    Duc Thien Nguyen, Akshat Kumar, and Hoong Chuin Lau. Credit assignment for collective multiagent rl with global rewards.Advances in neural information processing systems, 31, 2018

  9. [9]

    Action robust reinforcement learning and applications in continuous control

    Chen Tessler, Yonathan Efroni, and Shie Mannor. Action robust reinforcement learning and applications in continuous control. InInternational Conference on Machine Learning, pages 6215–6224. PMLR, 2019

  10. [10]

    Fault detection and diagnosis in multi-robot systems: A survey.Sensors, 19(18):4019, 2019

    Eliahu Khalastchi and Meir Kalech. Fault detection and diagnosis in multi-robot systems: A survey.Sensors, 19(18):4019, 2019

  11. [11]

    Exploration in extreme environments with swarm robotic system

    Xinge Huang, Farshad Arvin, Craig West, Simon Watson, and Barry Lennox. Exploration in extreme environments with swarm robotic system. In2019 IEEE international conference on mechatronics (ICM), volume 1, pages 193–198. IEEE, 2019

  12. [12]

    Anatomy of unmanned aerial vehicle hijacking with signal spoofing

    Sait Murat Giray. Anatomy of unmanned aerial vehicle hijacking with signal spoofing. In 2013 6th International Conference on Recent Advances in Space Technologies (RAST), pages 795–800. IEEE, 2013

  13. [13]

    Cybersecurity in unmanned aerial vehicles (uavs).Journal of Cyber Security Technology, 5(2):120–137, 2021

    Bora Ly and Romny Ly. Cybersecurity in unmanned aerial vehicles (uavs).Journal of Cyber Security Technology, 5(2):120–137, 2021

  14. [14]

    Adversarial policies: Attacking deep reinforcement learning.arXiv preprint arXiv:1905.10615, 2019

    Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, and Stuart Russell. Adversarial policies: Attacking deep reinforcement learning.arXiv preprint arXiv:1905.10615, 2019. 10

  15. [15]

    On the robustness of cooperative multi-agent reinforcement learning

    Jieyu Lin, Kristina Dzeparoska, Sai Qian Zhang, Alberto Leon-Garcia, and Nicolas Papernot. On the robustness of cooperative multi-agent reinforcement learning. In2020 IEEE Security and Privacy Workshops (SPW), pages 62–68. IEEE, 2020

  16. [16]

    Online markov decision processes with non-oblivious strategic adversary.Autonomous Agents and Multi-Agent Systems, 37(1):15, 2023

    Le Cong Dinh, David Henry Mguni, Long Tran-Thanh, Jun Wang, and Yaodong Yang. Online markov decision processes with non-oblivious strategic adversary.Autonomous Agents and Multi-Agent Systems, 37(1):15, 2023

  17. [17]

    Mean field games.Japanese journal of mathematics, 2(1):229–260, 2007

    Jean-Michel Lasry and Pierre-Louis Lions. Mean field games.Japanese journal of mathematics, 2(1):229–260, 2007

  18. [18]

    Maximizing the spread of influence through a social network

    David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. InProceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 137–146, 2003

  19. [19]

    A survey on influence maxi- mization in a social network.Knowledge and Information Systems, 62:3417–3455, 2020

    Suman Banerjee, Mamata Jenamani, and Dilip Kumar Pratihar. A survey on influence maxi- mization in a social network.Knowledge and Information Systems, 62:3417–3455, 2020

  20. [20]

    A survey on influence maximization: From an ml-based combinatorial optimization.ACM Transactions on Knowledge Discovery from Data, 17(9):1–50, 2023

    Yandi Li, Haobo Gao, Yunxuan Gao, Jianxiong Guo, and Weili Wu. A survey on influence maximization: From an ml-based combinatorial optimization.ACM Transactions on Knowledge Discovery from Data, 17(9):1–50, 2023

  21. [21]

    Evaluating robustness of cooperative marl: A model-based approach.arXiv preprint arXiv:2202.03558, 2022

    Nhan H Pham, Lam M Nguyen, Jie Chen, Hoang Thanh Lam, Subhro Das, and Tsui-Wei Weng. Evaluating robustness of cooperative marl: A model-based approach.arXiv preprint arXiv:2202.03558, 2022

  22. [22]

    Adversarial attacks on cooperative multi-agent deep reinforcement learning: a dynamic group-based adversarial example transferability method

    Lixia Zan, Xiangbin Zhu, and Zhao-Long Hu. Adversarial attacks on cooperative multi-agent deep reinforcement learning: a dynamic group-based adversarial example transferability method. Complex & Intelligent Systems, 9(6):7439–7450, 2023

  23. [23]

    Robustness testing for multi-agent reinforcement learning: State perturbations on critical agents.arXiv preprint arXiv:2306.06136, 2023

    Ziyuan Zhou and Guanjun Liu. Robustness testing for multi-agent reinforcement learning: State perturbations on critical agents.arXiv preprint arXiv:2306.06136, 2023

  24. [24]

    The generalized maximum coverage problem.Information Processing Letters, 108(1):15–22, 2008

    Reuven Cohen and Liran Katzir. The generalized maximum coverage problem.Information Processing Letters, 108(1):15–22, 2008

  25. [25]

    Convex analysis, 1970

    Ralph Tyrell Rockafellar. Convex analysis, 1970

  26. [26]

    Reinforcement learning via fenchel-rockafellar duality.arXiv preprint arXiv:2001.01866, 2020

    Ofir Nachum and Bo Dai. Reinforcement learning via fenchel-rockafellar duality.arXiv preprint arXiv:2001.01866, 2020

  27. [27]

    arXiv preprint arXiv:2011.00583 , year=

    Yaodong Yang and Jun Wang. An overview of multi-agent reinforcement learning from game theoretical perspective.arXiv preprint arXiv:2011.00583, 2020

  28. [28]

    Large population stochastic dynamic games: closed-loop mckean-vlasov systems and the nash certainty equivalence principle.COM- MUNICATIONS IN INFORMATION AND SYSTEMS, 2006

    Minyi Huang, Roland P Malhamé, and Peter E Caines. Large population stochastic dynamic games: closed-loop mckean-vlasov systems and the nash certainty equivalence principle.COM- MUNICATIONS IN INFORMATION AND SYSTEMS, 2006

  29. [29]

    Learning mean-field games.Advances in neural information processing systems, 32, 2019

    Xin Guo, Anran Hu, Renyuan Xu, and Junzi Zhang. Learning mean-field games.Advances in neural information processing systems, 32, 2019

  30. [30]

    Scaling up mean field games with online mirror descent

    Julien Perolat, Sarah Perrin, Romuald Elie, Mathieu Laurière, Georgios Piliouras, Matthieu Geist, Karl Tuyls, and Olivier Pietquin. Scaling up mean field games with online mirror descent. arXiv preprint arXiv:2103.00623, 2021

  31. [31]

    Learning correlated equilibria in mean-field games.arXiv preprint arXiv:2208.10138, 2022

    Paul Muller, Romuald Elie, Mark Rowland, Mathieu Lauriere, Julien Perolat, Sarah Perrin, Matthieu Geist, Georgios Piliouras, Olivier Pietquin, and Karl Tuyls. Learning correlated equilibria in mean-field games.arXiv preprint arXiv:2208.10138, 2022

  32. [32]

    Model-free mean-field reinforcement learning: mean-field mdp and mean-field q-learning.The Annals of Applied Probability, 33(6B):5334–5381, 2023

    René Carmona, Mathieu Laurière, and Zongjun Tan. Model-free mean-field reinforcement learning: mean-field mdp and mean-field q-learning.The Annals of Applied Probability, 33(6B):5334–5381, 2023. 11

  33. [33]

    Mean-field controls with q-learning for cooperative marl: convergence and complexity analysis.SIAM Journal on Mathematics of Data Science, 3(4):1168–1196, 2021

    Haotian Gu, Xin Guo, Xiaoli Wei, and Renyuan Xu. Mean-field controls with q-learning for cooperative marl: convergence and complexity analysis.SIAM Journal on Mathematics of Data Science, 3(4):1168–1196, 2021

  34. [34]

    On the approximation of cooperative heterogeneous multi-agent reinforcement learning (marl) using mean field control (mfc).Journal of Machine Learning Research, 23(129):1–46, 2022

    Washim Uddin Mondal, Mridul Agarwal, Vaneet Aggarwal, and Satish V Ukkusuri. On the approximation of cooperative heterogeneous multi-agent reinforcement learning (marl) using mean field control (mfc).Journal of Machine Learning Research, 23(129):1–46, 2022

  35. [35]

    Unified reinforcement q-learning for mean field game and control problems.Mathematics of Control, Signals, and Systems, 34(2):217–271, 2022

    Andrea Angiuli, Jean-Pierre Fouque, and Mathieu Laurière. Unified reinforcement q-learning for mean field game and control problems.Mathematics of Control, Signals, and Systems, 34(2):217–271, 2022

  36. [36]

    Reinforcement learning in stationary mean- field games

    Jayakumar Subramanian and Aditya Mahajan. Reinforcement learning in stationary mean- field games. InProceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, pages 251–259, 2019

  37. [37]

    Multi type mean field reinforcement learning.arXiv preprint arXiv:2002.02513, 2020

    Sriram Ganapathi Subramanian, Pascal Poupart, Matthew E Taylor, and Nidhi Hegde. Multi type mean field reinforcement learning.arXiv preprint arXiv:2002.02513, 2020

  38. [38]

    Partially observable mean field reinforcement learning.arXiv preprint arXiv:2012.15791, 2020

    Sriram Ganapathi Subramanian, Matthew E Taylor, Mark Crowley, and Pascal Poupart. Partially observable mean field reinforcement learning.arXiv preprint arXiv:2012.15791, 2020

  39. [39]

    Efficient model-based multi-agent reinforcement learning via optimistic equilibrium computation

    Pier Giuseppe Sessa, Maryam Kamgarpour, and Andreas Krause. Efficient model-based multi-agent reinforcement learning via optimistic equilibrium computation. InInternational Conference on Machine Learning, pages 19580–19597. PMLR, 2022

  40. [40]

    Learning decentralized partially observable mean field control for artificial collective behavior.arXiv preprint arXiv:2307.06175, 2023

    Kai Cui, Sascha Hauck, Christian Fabian, and Heinz Koeppl. Learning decentralized partially observable mean field control for artificial collective behavior.arXiv preprint arXiv:2307.06175, 2023

  41. [41]

    Major-minor mean field multi-agent reinforcement learning

    Kai Cui, Christian Fabian, Anam Tahir, and Heinz Koeppl. Major-minor mean field multi-agent reinforcement learning. InForty-first International Conference on Machine Learning, 2024

  42. [42]

    Towards comprehen- sive testing on the robustness of cooperative multi-agent reinforcement learning

    Jun Guo, Yonghong Chen, Yihang Hao, Zixin Yin, Yin Yu, and Simin Li. Towards comprehen- sive testing on the robustness of cooperative multi-agent reinforcement learning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 115–122, 2022

  43. [43]

    Attacking cooperative multi-agent reinforcement learning by adversarial minority influence

    Simin Li, Jun Guo, Jingqiao Xiu, Pu Feng, Xin Yu, Aishan Liu, Wenjun Wu, and Xianglong Liu. Attacking cooperative multi-agent reinforcement learning by adversarial minority influence. arXiv preprint arXiv:2302.03322, 2023

  44. [44]

    Robust multi-agent reinforcement learning with model uncertainty.Advances in neural information processing systems, 33:10571–10583, 2020

    Kaiqing Zhang, Tao Sun, Yunzhe Tao, Sahika Genc, Sunil Mallya, and Tamer Basar. Robust multi-agent reinforcement learning with model uncertainty.Advances in neural information processing systems, 33:10571–10583, 2020

  45. [45]

    Sample-efficient robust multi- agent reinforcement learning in the face of environmental uncertainty.arXiv preprint arXiv:2404.18909, 2024

    Laixi Shi, Eric Mazumdar, Yuejie Chi, and Adam Wierman. Sample-efficient robust multi- agent reinforcement learning in the face of environmental uncertainty.arXiv preprint arXiv:2404.18909, 2024

  46. [46]

    Learning and testing resilience in cooperative multi-agent systems

    Thomy Phan, Thomas Gabor, Andreas Sedlmeier, Fabian Ritz, Bernhard Kempter, Cornel Klein, Horst Sauer, Reiner Schmid, Jan Wieghardt, Marc Zeller, et al. Learning and testing resilience in cooperative multi-agent systems. InProceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems, pages 1055–1063, 2020

  47. [47]

    Efficient influence maximization in social networks

    Wei Chen, Yajun Wang, and Siyu Yang. Efficient influence maximization in social networks. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 199–208, 2009

  48. [48]

    User interactions in social networks and their implications

    Christo Wilson, Bryce Boe, Alessandra Sala, Krishna PN Puttaswamy, and Ben Y Zhao. User interactions in social networks and their implications. InProceedings of the 4th ACM European conference on Computer systems, pages 205–218, 2009

  49. [49]

    Scalable influence maximization in social networks under the linear threshold model

    Wei Chen, Yifei Yuan, and Li Zhang. Scalable influence maximization in social networks under the linear threshold model. In2010 IEEE international conference on data mining, pages 88–97. IEEE, 2010. 12

  50. [50]

    A fast and effective heuristic for discovering small target sets in social networks

    Gennaro Cordasco, Luisa Gargano, Marco Mecchia, Adele A Rescigno, and Ugo Vaccaro. A fast and effective heuristic for discovering small target sets in social networks. InCombinatorial Optimization and Applications: 9th International Conference, COCOA 2015, Houston, TX, USA, December 18-20, 2015, Proceedings, pages 193–208. Springer, 2015

  51. [51]

    A genetic newgreedy algorithm for influence maximization in social network

    Chun-Wei Tsai, Yo-Chung Yang, and Ming-Chao Chiang. A genetic newgreedy algorithm for influence maximization in social network. In2015 IEEE International Conference on Systems, Man, and Cybernetics, pages 2549–2554. IEEE, 2015

  52. [52]

    Influence maximization in social networks with genetic algorithms

    Doina Bucur and Giovanni Iacca. Influence maximization in social networks with genetic algorithms. InApplications of Evolutionary Computation: 19th European Conference, EvoAp- plications 2016, Porto, Portugal, March 30–April 1, 2016, Proceedings, Part I 19, pages 379–392. Springer, 2016

  53. [53]

    Community-based greedy algorithm for mining top-k influential nodes in mobile social networks

    Yu Wang, Gao Cong, Guojie Song, and Kunqing Xie. Community-based greedy algorithm for mining top-k influential nodes in mobile social networks. InProceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1039–1048, 2010

  54. [54]

    Cim: Community-based influence maximization in social networks.ACM Transactions on Intelligent Systems and Technology (TIST), 5(2):1–31, 2014

    Yi-Cheng Chen, Wen-Yuan Zhu, Wen-Chih Peng, Wang-Chien Lee, and Suh-Yin Lee. Cim: Community-based influence maximization in social networks.ACM Transactions on Intelligent Systems and Technology (TIST), 5(2):1–31, 2014

  55. [55]

    Controlling graph dynamics with reinforcement learning and graph neural networks

    Eli Meirom, Haggai Maron, Shie Mannor, and Gal Chechik. Controlling graph dynamics with reinforcement learning and graph neural networks. InInternational Conference on Machine Learning, pages 7565–7577. PMLR, 2021

  56. [56]

    Piano: Influence maximization meets deep reinforcement learning.IEEE Transactions on Computational Social Systems, 10(3):1288–1300, 2022

    Hui Li, Mengting Xu, Sourav S Bhowmick, Joty Shafiq Rayhan, Changsheng Sun, and Jiangtao Cui. Piano: Influence maximization meets deep reinforcement learning.IEEE Transactions on Computational Social Systems, 10(3):1288–1300, 2022

  57. [57]

    Touplegdd: A fine-designed solution of influence maximization by deep reinforcement learning.IEEE Transactions on Computational Social Systems, 11(2):2210–2221, 2023

    Tiantian Chen, Siwen Yan, Jianxiong Guo, and Weili Wu. Touplegdd: A fine-designed solution of influence maximization by deep reinforcement learning. IEEE Transactions on Computational Social Systems, 11(2):2210–2221, 2023

  58. [58]

    Deep graph representation learning and optimization for influence maximization

    Chen Ling, Junji Jiang, Junxiang Wang, My T Thai, Renhao Xue, James Song, Meikang Qiu, and Liang Zhao. Deep graph representation learning and optimization for influence maximization. In International Conference on Machine Learning, pages 21350–21361. PMLR, 2023

  59. [59]

    Byzantine robust cooperative multi-agent reinforcement learning as a bayesian game

    Simin Li, Jun Guo, Jingqiao Xiu, Ruixiao Xu, Xin Yu, Jiakai Wang, Aishan Liu, Yaodong Yang, and Xianglong Liu. Byzantine robust cooperative multi-agent reinforcement learning as a bayesian game. arXiv preprint arXiv:2305.12872, 2023

  60. [60]

    Robust dynamic programming

    Garud N Iyengar. Robust dynamic programming. Mathematics of Operations Research, 30(2):257–280, 2005

  61. [61]

    Challenges, tasks, and opportunities in modeling agent-based complex systems

    Li An, Volker Grimm, Abigail Sullivan, BL Turner II, Nicolas Malleson, Alison Heppenstall, Christian Vincenot, Derek Robinson, Xinyue Ye, Jianguo Liu, et al. Challenges, tasks, and opportunities in modeling agent-based complex systems. Ecological Modelling, 457:109685, 2021

  62. [63]

    Novel type of phase transition in a system of self-driven particles

    Tamás Vicsek, András Czirók, Eshel Ben-Jacob, Inon Cohen, and Ofer Shochet. Novel type of phase transition in a system of self-driven particles. Physical review letters, 75(6):1226, 1995

  63. [64]

    Dynamics and control of diseases in networks with community structure

    Marcel Salathé and James H Jones. Dynamics and control of diseases in networks with community structure. PLoS computational biology, 6(4):e1000736, 2010

  64. [65]

    Feudal networks for hierarchical reinforcement learning

    Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, and Koray Kavukcuoglu. Feudal networks for hierarchical reinforcement learning. In International conference on machine learning, pages 3540–3549. PMLR, 2017

  65. [66]

    Probabilistic theory of mean field games with applications I-II

    René Carmona, François Delarue, et al. Probabilistic theory of mean field games with applications I-II. Springer, 2018

  66. [67]

    Dynamic programming

    Richard Bellman. Dynamic programming. science, 153(3731):34–37, 1966

  67. [68]

    Human-level control through deep reinforcement learning

    Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. nature, 518(7540):529–533, 2015

  68. [69]

    Deep Reinforcement Learning in Large Discrete Action Spaces

    Gabriel Dulac-Arnold, Richard Evans, Hado van Hasselt, Peter Sunehag, Timothy Lillicrap, Jonathan Hunt, Timothy Mann, Theophane Weber, Thomas Degris, and Ben Coppin. Deep reinforcement learning in large discrete action spaces. arXiv preprint arXiv:1512.07679, 2015
