Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning
Pith reviewed 2026-05-18 15:47 UTC · model grok-4.3
The pith
Decomposing the vulnerable agent identification problem in large-scale MARL via Fenchel-Rockafellar transform and MDP reformulation preserves the optimal solution set while enabling independent learning at each hierarchical level.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We frame VAI as a Hierarchical Adversarial Decentralized Mean Field Control (HAD-MFC) problem, in which the upper level selects the vulnerable agents (an NP-hard task) and the lower level learns their worst-case adversarial policies via mean-field MARL. Applying the Fenchel-Rockafellar transform first yields a regularized mean-field Bellman operator that decouples the hierarchy and permits independent learning at each level. We then reformulate the upper-level selection as an MDP with dense rewards, which admits sequential identification through greedy and RL algorithms. This decomposition provably preserves the optimal solution.
What carries the argument
The Fenchel-Rockafellar transform, which decouples the hierarchical adversarial process into a regularized mean-field Bellman operator for the upper level, combined with an MDP reformulation of the NP-hard agent selection.
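As a rough illustration of how a Fenchel conjugate can turn a hard maximization into a regularized Bellman backup, the sketch below compares a standard max backup with an entropy-regularized (log-sum-exp) backup on a tiny tabular MDP. This is a generic single-agent example, not the paper's mean-field operator: the names `q_values`, `hard_backup`, `soft_backup`, the temperature `tau`, and the toy transition and reward tables are all assumptions made for illustration. The paper's operator may differ in sign conventions (for example, an adversary that minimizes) and in its regularizer.

```python
import math

def q_values(P, R, V, gamma):
    """Q[s][a] = R[s][a] + gamma * sum over s' of P[s][a][s'] * V[s']."""
    return [[R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
             for a in range(len(R[s]))]
            for s in range(len(R))]

def hard_backup(P, R, V, gamma):
    """Unregularized Bellman backup: maximize over actions."""
    return [max(row) for row in q_values(P, R, V, gamma)]

def soft_backup(P, R, V, gamma, tau):
    """Entropy-regularized backup: tau * log(sum_a exp(Q / tau)).

    This is the closed form of max over policies of <pi, Q> + tau * H(pi),
    i.e. the Fenchel conjugate of the scaled negative entropy.
    """
    out = []
    for row in q_values(P, R, V, gamma):
        m = max(row)  # subtract the max before exponentiating, for stability
        out.append(m + tau * math.log(sum(math.exp((q - m) / tau) for q in row)))
    return out

# Hypothetical 2-state, 2-action MDP used only to exercise the functions.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]
V0 = [0.0, 0.0]
```

For any `tau > 0`, the soft backup lies between the hard backup and the hard backup plus `tau * log(num_actions)`, so the regularization coefficient directly controls how far the regularized operator can drift from the unregularized one in a single step.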
If this is right
- The method identifies a larger number of vulnerable agents than prior approaches in large-scale MARL environments.
- Identified agents, when removed, cause measurably worse system failures than randomly chosen agents.
- The approach reveals per-agent vulnerability rankings across entire large systems.
- Computational cost drops because each level can now be solved independently rather than jointly.
Where Pith is reading between the lines
- The same decoupling pattern could be tested on other hierarchical adversarial problems that combine discrete selection with continuous policy learning.
- In deployed systems the per-agent vulnerability scores might serve as a diagnostic tool to prioritize redundancy or monitoring resources.
- Because the MDP reformulation admits any RL algorithm, replacing the current solver with a more sample-efficient method would constitute a direct extension.
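The idea of casting subset selection as an MDP with dense rewards can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's construction: the class `SelectionMDP`, the helpers `greedy_policy` and `rollout`, and the toy objective `toy_value` are hypothetical. The state is the set of agents chosen so far, an action picks one more agent, and the reward is the marginal change in the objective.

```python
WEIGHTS = {0: 3.0, 1: 2.0, 2: 2.0, 3: 1.0}  # hypothetical per-agent impact

def toy_value(subset):
    """Hypothetical objective: additive impact plus a bonus if agents 1 and 2 both fail."""
    bonus = 3.0 if {1, 2} <= set(subset) else 0.0
    return sum(WEIGHTS[a] for a in subset) + bonus

class SelectionMDP:
    """Sequential selection: state = agents picked so far, action = next agent."""

    def __init__(self, agents, budget, value_fn):
        self.agents = tuple(agents)
        if not 1 <= budget <= len(self.agents):
            raise ValueError("budget must be between 1 and the number of agents")
        self.budget = budget
        self.value_fn = value_fn
        self.selected = frozenset()

    def reset(self):
        self.selected = frozenset()
        return self.selected

    def legal_actions(self):
        return [a for a in self.agents if a not in self.selected]

    def step(self, agent):
        if agent not in self.legal_actions():
            raise ValueError(f"agent {agent!r} is not selectable")
        before = self.value_fn(self.selected)
        self.selected = self.selected | {agent}
        reward = self.value_fn(self.selected) - before  # dense, per-step reward
        done = len(self.selected) >= self.budget
        return self.selected, reward, done

def greedy_policy(env):
    """Pick the agent with the largest immediate (marginal) reward."""
    return max(env.legal_actions(),
               key=lambda a: env.value_fn(env.selected | {a}))

def rollout(env, policy):
    """Run one episode and return the final subset and the reward sequence."""
    env.reset()
    rewards, done = [], False
    while not done:
        _, reward, done = env.step(policy(env))
        rewards.append(reward)
    return env.selected, rewards
```

Because the per-step rewards telescope, the undiscounted return of any episode equals the objective of the final subset minus the objective of the empty set. Any RL algorithm that maximizes return in this environment is therefore optimizing the original selection objective, while the greedy policy only maximizes the immediate reward.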
Load-bearing premise
The Fenchel-Rockafellar transform decouples the upper and lower levels without introducing approximation error that changes which agents belong to the optimal vulnerable set.
What would settle it
On a small-scale instance where exhaustive enumeration of all agent subsets is feasible, check whether the agents returned by the decomposed method exactly match the subset that produces the globally worst performance when those agents are removed.
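The check described above can be sketched on a toy instance. Everything here is an assumption made for illustration: `damage` is a hypothetical stand-in for the measured system degradation, and `greedy_select` is a placeholder for the decomposed method, not the paper's algorithm. The point is only to show the shape of the comparison against exhaustive enumeration.

```python
from itertools import combinations

# Hypothetical toy objective: damage caused when the agents in `subset` fail.
BASE = {0: 3.0, 1: 2.0, 2: 2.0, 3: 1.0}
SYNERGY = {frozenset({1, 2}): 3.0}  # agents 1 and 2 are worse together

def damage(subset):
    s = frozenset(subset)
    total = sum(BASE[a] for a in s)
    total += sum(bonus for pair, bonus in SYNERGY.items() if pair <= s)
    return total

def exhaustive_best(agents, k):
    """Ground truth: enumerate every size-k subset and keep the most damaging one."""
    return max(combinations(sorted(agents), k), key=damage)

def greedy_select(agents, k):
    """Placeholder for the method under test: add the best single agent k times."""
    chosen = []
    for _ in range(k):
        remaining = [a for a in sorted(agents) if a not in chosen]
        chosen.append(max(remaining, key=lambda a: damage(chosen + [a])))
    return chosen

def matches_ground_truth(agents, k):
    """True if the tested method recovers a subset with the globally worst damage."""
    return damage(greedy_select(agents, k)) == damage(exhaustive_best(agents, k))
```

On this toy instance with `k = 2`, the exhaustive search returns agents 1 and 2 (damage 7.0), while the greedy placeholder returns agents 0 and 1 (damage 5.0), so `matches_ground_truth(BASE, 2)` is false. A method that really preserves the optimal set should pass this kind of check on instances with interaction effects, not only on additive ones.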
Original abstract
Partial agent failure becomes inevitable when systems scale up, making it crucial to identify the subset of agents whose failure causes worst-case system performance degradation. We study this Vulnerable Agent Identification (VAI) problem in large-scale multi-agent reinforcement learning (MARL). We frame VAI as a Hierarchical Adversarial Decentralized Mean Field Control (HAD-MFC), where the upper level selects vulnerable agents as an NP-hard task and the lower level learns their worst-case adversarial policies via mean-field MARL. The two problems are coupled, making HAD-MFC difficult to solve. To handle this, we first decouple the hierarchical process with the Fenchel-Rockafellar transform, resulting in a regularized mean-field Bellman operator for the upper level that enables independent learning at each level, thus reducing computational complexity. We next reformulate the upper-level NP-hard problem as an MDP with dense rewards, allowing sequential identification of vulnerable agents via greedy and RL algorithms. This decomposition provably preserves the optimal solution. Experiments show that our method effectively identifies more vulnerable agents in large-scale MARL and the rule-based system, fooling the system into worse failures, and reveals the vulnerability of each agent in large systems. Code is available at https://github.com/Waken-dream/VAI
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper frames Vulnerable Agent Identification (VAI) in large-scale MARL as a Hierarchical Adversarial Decentralized Mean Field Control (HAD-MFC) problem. It decouples the hierarchy via the Fenchel-Rockafellar transform to obtain a regularized mean-field Bellman operator that permits independent learning at each level, then reformulates the upper-level NP-hard selection task as an MDP with dense rewards solvable by greedy and RL algorithms. The authors assert that this decomposition provably preserves the optimal solution set and report experimental results showing improved identification of vulnerable agents that lead to worse system failures.
Significance. If the optimality-preservation claim holds, the approach would offer a computationally tractable method for identifying critical agents in large MARL systems, with direct relevance to robustness and safety analysis. The public code release supports reproducibility. The significance is limited by the absence of explicit verification that the transform maintains the exact optimal vulnerable-agent set under the non-convex policy optimization present in the lower level.
major comments (2)
- [Abstract] Abstract (paragraph on decoupling): The claim that the Fenchel-Rockafellar transform 'provably preserves the optimal solution' is load-bearing for the central contribution, yet no derivation, duality-gap bound, or theorem is referenced showing that the fixed point of the resulting regularized mean-field Bellman operator coincides with the original HAD-MFC optimum when the lower-level adversarial policy optimization is non-convex. Standard Fenchel-Rockafellar duality requires convex-concave structure; without explicit conditions or a proof that the set of optimal vulnerable agents is unchanged, the preservation statement cannot be evaluated.
- [MDP reformulation] Section on MDP reformulation (upper-level problem): The reformulation of the NP-hard agent-selection task as an MDP with dense rewards is presented as enabling sequential identification, but no analysis is given of how the dense-reward construction interacts with the regularized Bellman operator to guarantee that the greedy/RL solutions recover the same vulnerable-agent ranking as the original hierarchical objective.
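For reference, the first major comment turns on the conditions of standard Fenchel-Rockafellar duality. The block below states the textbook form; the symbols f, g, A, x, and y are generic and are not the paper's notation. The equality is the usual statement for convex, lower semicontinuous f and g with a linear map A under a suitable constraint qualification, which is exactly the structure the referee asks the authors to establish for the lower-level problem.

```latex
% Fenchel conjugate of a function f
\[
  f^{*}(y) \;=\; \sup_{x} \; \langle x, y \rangle - f(x)
\]

% Fenchel-Rockafellar duality (f, g convex and lower semicontinuous,
% A linear, under a suitable constraint qualification)
\[
  \min_{x} \; f(x) + g(Ax)
  \;=\;
  \max_{y} \; -f^{*}(-A^{\top} y) - g^{*}(y)
\]
```

Without convexity the right-hand side is in general only a lower bound on the left-hand side (weak duality), which is why a nonzero duality gap could change which agents are optimal.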
minor comments (2)
- [Abstract] The abstract mentions 'the rule-based system' without defining the specific rule-based environment or providing its parameters, making it difficult to assess the generality of the reported failure results.
- [Method] Notation for the regularization coefficient in the Bellman operator is introduced but its dependence on problem scale or agent count is not discussed, which affects reproducibility of the reported identification performance.
Simulated Author's Rebuttal
We appreciate the referee's thorough review and valuable feedback on our manuscript. We address the major comments point by point below. We will revise the paper to provide additional theoretical details and clarifications as suggested.
Point-by-point responses
Referee: [Abstract] Abstract (paragraph on decoupling): The claim that the Fenchel-Rockafellar transform 'provably preserves the optimal solution' is load-bearing for the central contribution, yet no derivation, duality-gap bound, or theorem is referenced showing that the fixed point of the resulting regularized mean-field Bellman operator coincides with the original HAD-MFC optimum when the lower-level adversarial policy optimization is non-convex. Standard Fenchel-Rockafellar duality requires convex-concave structure; without explicit conditions or a proof that the set of optimal vulnerable agents is unchanged, the preservation statement cannot be evaluated.
Authors: We thank the referee for highlighting this important point. The Fenchel-Rockafellar transform is applied in the context of the mean-field approximation, where the lower-level problem is formulated as a convex optimization over distributions in the mean-field limit. We will add a new theorem in the revised manuscript that explicitly derives the duality gap bound and shows that under the mean-field regime, the optimal solution set for vulnerable agent selection is preserved even when individual policies are non-convex, as the transform operates on the aggregate mean-field quantities. A detailed proof will be included in the appendix.
Revision: yes
Referee: [MDP reformulation] Section on MDP reformulation (upper-level problem): The reformulation of the NP-hard agent-selection task as an MDP with dense rewards is presented as enabling sequential identification, but no analysis is given of how the dense-reward construction interacts with the regularized Bellman operator to guarantee that the greedy/RL solutions recover the same vulnerable-agent ranking as the original hierarchical objective.
Authors: We agree that the interaction between the dense-reward MDP and the regularized Bellman operator requires further elaboration. In the revised version, we will include an analysis showing that the dense rewards are constructed directly from the value function of the regularized operator, ensuring that the optimal policy in the MDP corresponds to the selection that maximizes the hierarchical objective. This will include a proposition demonstrating equivalence in the recovered ranking for the greedy algorithm.
Revision: yes
Circularity Check
No circularity: derivation uses external transform and reformulation without self-reduction
The paper applies the Fenchel-Rockafellar transform to decouple the HAD-MFC hierarchy into a regularized mean-field Bellman operator and then reformulates the upper-level selection as an MDP with dense rewards. It explicitly claims this 'decomposition provably preserves the optimal solution' and enables independent learning. No quoted step defines the vulnerable-agent set or optimality in terms of the output itself, renames a fitted quantity as a prediction, or relies on a self-citation chain for the preservation result. The transform is invoked as a standard decoupling tool with independent mathematical content, and the MDP reformulation is presented as a separate algorithmic step. The central claim therefore remains non-tautological relative to the inputs.
Axiom & Free-Parameter Ledger
free parameters (1)
- regularization coefficient in Bellman operator
axioms (2)
- domain assumption: Mean-field approximation is valid for the large-scale agent population
- domain assumption: Fenchel-Rockafellar transform yields an equivalent regularized operator for the hierarchical problem