Enhancing Cloud Network Resilience via a Robust LLM-Empowered Multi-Agent Reinforcement Learning Framework
Pith reviewed 2026-05-21 16:24 UTC · model grok-4.3
The pith
A two-layer LLM-RL framework defends cloud networks by adapting to new structures and attacks without retraining.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors present CyberOps-Bots, a hierarchical multi-agent system with two layers. An LLM agent equipped with ReAct planning, IPDRR-based perception, memory, and tool integration handles global awareness and high-level defense tactics modeled on the MITRE ATT&CK Tactics-Techniques structure. Separate pre-trained RL agents handle atomic resource deployment and isolation tasks in local regions. In experiments on real cloud data, the authors report 68.5 percent higher maintained network availability and a 34.7 percent jumpstart gain across scenario shifts, compared with prior methods.
What carries the argument
The two-layer hierarchy that pairs an LLM agent for strategic planning and human-in-the-loop oversight with lower-level RL agents for reliable local execution.
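The division of labor described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Tactic`, `plan_tactics`, and `execute`, the observation keys, and the thresholds are all hypothetical, and the planner is a fixed rule standing in for the LLM agent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# All names here are hypothetical; the review does not give the paper's interfaces.
Observation = Dict[str, float]
LocalPolicy = Callable[[Observation], str]  # local observation -> atomic action label


@dataclass(frozen=True)
class Tactic:
    region: str  # local network region the tactic targets
    kind: str    # high-level tactic, e.g. "deploy" or "isolate"


def plan_tactics(global_state: Dict[str, Observation]) -> List[Tactic]:
    """Upper layer. A fixed rule stands in for the LLM planner (assumption)."""
    tactics: List[Tactic] = []
    for region in sorted(global_state):
        obs = global_state[region]
        if obs["compromised_ratio"] > 0.5:
            tactics.append(Tactic(region, "isolate"))
        elif obs["availability"] < 0.8:
            tactics.append(Tactic(region, "deploy"))
    return tactics


def execute(
    tactics: List[Tactic],
    policies: Dict[str, LocalPolicy],
    global_state: Dict[str, Observation],
) -> Dict[str, str]:
    """Lower layer. Route each tactic to the policy registered for its kind."""
    return {t.region: policies[t.kind](global_state[t.region]) for t in tactics}


# Toy stand-ins for pre-trained RL policies (assumption: one policy per tactic kind).
policies: Dict[str, LocalPolicy] = {
    "isolate": lambda obs: "isolate_node",
    "deploy": lambda obs: "add_replica",
}
```

The point of the sketch is the interface: the planner only names a region and a tactic kind, so the lower-level policies can in principle be reused when regions are added or removed.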
If this is right
- Defense policies remain effective when network size or topology changes without any model retraining.
- Operators can inject intent through the LLM layer to steer responses while RL agents still execute reliably.
- Pre-trained RL components can be reused across different regions, reducing the cost of scaling defenses.
- Overall system resilience rises because global planning adapts faster than pure RL approaches while local actions are still executed reliably by the RL agents.
Where Pith is reading between the lines
- The same separation of language-based planning from action execution could be tested in other dynamic control settings such as power-grid load balancing under shifting demand.
- Over time the framework might allow security teams to update high-level goals in natural language rather than rewriting reward functions for each new threat class.
- Deployment logs from the LLM planning layer could supply traceable records that help auditors verify compliance during incidents.
Load-bearing premise
The real cloud datasets used for testing accurately capture the range of network structures, scales, attack strategies, and intensities that occur in live production environments, and LLM outputs can be converted into RL actions without introducing new errors or vulnerabilities.
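One way to probe the second half of this premise is to put a strict validation step between the LLM's text output and the RL action space. The sketch below assumes the LLM is asked to emit a JSON object with `action` and `region` fields. That message format, the action vocabulary, and the region names are hypothetical and are not taken from the paper.

```python
import json
from typing import Optional, Tuple

# Hypothetical vocabulary and region names, for illustration only.
ALLOWED_ACTIONS = {"deploy", "isolate", "restore"}
KNOWN_REGIONS = {"region-a", "region-b"}


def parse_llm_action(raw: str) -> Optional[Tuple[str, str]]:
    """Return (action, region) if the LLM text is a valid command, else None."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return None  # free text or malformed JSON is rejected, never guessed at
    if not isinstance(msg, dict):
        return None
    action = msg.get("action")
    region = msg.get("region")
    if not isinstance(action, str) or not isinstance(region, str):
        return None
    if action not in ALLOWED_ACTIONS or region not in KNOWN_REGIONS:
        return None  # unknown action or region never reaches an RL agent
    return action, region
```

Counting how often this function returns `None` during an evaluation run would give a direct measure of the execution failures that the premise assumes away.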
What would settle it
Running the framework on a production-scale cloud that encounters an attack intensity or network configuration absent from the study datasets and observing whether availability falls below the reported levels or LLM-generated actions produce execution failures.
Original abstract
While virtualization and resource pooling empower cloud networks with structural flexibility and elastic scalability, they inevitably expand the attack surface and challenge cyber resilience. Reinforcement Learning (RL)-based defense strategies have been developed to optimize resource deployment and isolation policies under adversarial conditions, aiming to enhance system resilience by maintaining and restoring network availability. However, existing approaches lack robustness as they require retraining to adapt to dynamic changes in network structure, node scale, attack strategies, and attack intensity. Furthermore, the lack of Human-in-the-Loop (HITL) support limits interpretability and flexibility. To address these limitations, we propose CyberOps-Bots, a hierarchical multi-agent reinforcement learning framework empowered by Large Language Models (LLMs). Inspired by MITRE ATT&CK's Tactics-Techniques model, CyberOps-Bots features a two-layer architecture: (1) An upper-level LLM agent with four modules--ReAct planning, IPDRR-based perception, long-short term memory, and action/tool integration--performs global awareness, human intent recognition, and tactical planning; (2) Lower-level RL agents, developed via heterogeneous separated pre-training, execute atomic defense actions within localized network regions. This synergy preserves LLM adaptability and interpretability while ensuring reliable RL execution. Experiments on real cloud datasets show that, compared to state-of-the-art algorithms, CyberOps-Bots maintains network availability 68.5% higher and achieves a 34.7% jumpstart performance gain when shifting the scenarios without retraining. To our knowledge, this is the first study to establish a robust LLM-RL framework with HITL support for cloud defense.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes CyberOps-Bots, a hierarchical multi-agent reinforcement learning framework that integrates Large Language Models (LLMs) to improve cloud network resilience against adversarial attacks. The architecture consists of an upper-level LLM agent (with ReAct planning, IPDRR-based perception, long-short term memory, and action/tool integration) for global awareness and tactical planning, paired with lower-level RL agents trained via heterogeneous separated pre-training for localized defense actions. Experiments on real cloud datasets are reported to show that CyberOps-Bots maintains 68.5% higher network availability and achieves a 34.7% jumpstart performance gain under scenario shifts without retraining, relative to state-of-the-art algorithms; the work positions itself as the first LLM-RL framework with Human-in-the-Loop support for this domain.
Significance. If the empirical claims are substantiated with full experimental details, the work could meaningfully advance hybrid LLM-RL approaches for cyber defense by mitigating the retraining requirement of conventional RL methods while adding interpretability via HITL. This would be relevant for dynamic cloud environments where network structure, scale, and attack patterns evolve.
major comments (2)
- [Abstract] Abstract: The headline claims of 68.5% higher network availability and 34.7% jumpstart gain without retraining are presented without any description of the baselines, statistical tests performed, dataset characteristics (size, topology, attack distributions), or the precise formulas/metrics used to compute these percentages, rendering the central empirical result unverifiable from the provided text.
- [Experiments] Experiments section: The scenario-shift protocol is not quantitatively specified (e.g., no reported deltas in node count, attack-vector entropy, intensity distributions, or structural changes between train and test regimes), which directly undermines the load-bearing claim that the observed gains demonstrate genuine cross-scenario generalization rather than proximity to the training distribution.
minor comments (1)
- The abstract and architecture description would benefit from explicit citation of the specific real cloud datasets employed and a short reproducibility note on how LLM outputs are mapped to RL action spaces.
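To make the request for precise formulas concrete, here is a minimal sketch of two candidate definitions. Both are assumptions: the text available here does not say whether "68.5% higher" is a relative gain or a difference in percentage points, and the jumpstart function follows the usual transfer-learning usage of the term (improvement in initial performance on the new scenario), which may differ from the paper's exact definition.

```python
from statistics import mean
from typing import Sequence


def relative_gain_pct(ours: float, baseline: float) -> float:
    """Relative improvement of `ours` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero for a relative gain")
    return 100.0 * (ours - baseline) / abs(baseline)


def jumpstart(transferred: Sequence[float], scratch: Sequence[float], k: int = 1) -> float:
    """Gap in mean performance over the first k episodes after a scenario shift.

    `transferred` is the agent carried over without retraining, and
    `scratch` is the comparison agent on the same new scenario.
    """
    if k < 1 or len(transferred) < k or len(scratch) < k:
        raise ValueError("need at least k episodes from both agents")
    return mean(transferred[:k]) - mean(scratch[:k])
```

For example, an availability of 0.75 against a baseline of 0.5 is a 50 percent relative gain but only 25 percentage points, which is why the definition matters for reading the headline numbers.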
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We agree that the abstract and experiments section require additional quantitative details to make the empirical claims fully verifiable. We will incorporate these clarifications in the revised manuscript. Our responses to the major comments are provided below.
- Referee: [Abstract] Abstract: The headline claims of 68.5% higher network availability and 34.7% jumpstart gain without retraining are presented without any description of the baselines, statistical tests performed, dataset characteristics (size, topology, attack distributions), or the precise formulas/metrics used to compute these percentages, rendering the central empirical result unverifiable from the provided text.
  Authors: We agree that the abstract does not currently supply enough context to allow direct verification of the headline performance numbers. In the revised version we will expand the abstract to name the primary state-of-the-art baselines, note the statistical tests used to establish significance, summarize the real-cloud dataset properties (scale, topology family, and attack-type distribution), and state the exact definitions of network availability and jumpstart gain. These additions will be kept concise while rendering the central claims traceable from the abstract alone.
  Revision: yes
- Referee: [Experiments] Experiments section: The scenario-shift protocol is not quantitatively specified (e.g., no reported deltas in node count, attack-vector entropy, intensity distributions, or structural changes between train and test regimes), which directly undermines the load-bearing claim that the observed gains demonstrate genuine cross-scenario generalization rather than proximity to the training distribution.
  Authors: The referee correctly notes that the scenario-shift protocol must be described quantitatively to substantiate the generalization claim. We will revise the Experiments section to report the concrete differences between training and test regimes, including measured deltas in node count, attack-vector entropy, attack-intensity distributions, and any topological alterations. These additions will allow readers to assess whether the reported gains reflect genuine cross-scenario robustness.
  Revision: yes
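The quantities named in this exchange are cheap to compute once train and test scenarios are logged. The sketch below shows one possible shape for such a report, assuming each scenario is summarized by a node count and a list of attack-type labels. The dictionary keys and the function names are hypothetical, and Shannon entropy in bits is one reasonable reading of "attack-vector entropy", not a definition confirmed by the paper.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Union

Scenario = Dict[str, Union[int, Sequence[str]]]


def shannon_entropy_bits(labels: Sequence[str]) -> float:
    """Shannon entropy (base 2) of the empirical label distribution."""
    if not labels:
        raise ValueError("need at least one label")
    n = len(labels)
    entropy = 0.0
    for count in Counter(labels).values():
        p = count / n
        entropy -= p * math.log2(p)
    return entropy


def shift_report(train: Scenario, test: Scenario) -> Dict[str, float]:
    """Deltas between a training scenario and a test scenario (test minus train)."""
    return {
        "node_count_delta": float(test["nodes"] - train["nodes"]),
        "attack_entropy_delta_bits": (
            shannon_entropy_bits(test["attacks"]) - shannon_entropy_bits(train["attacks"])
        ),
    }
```

A report with near-zero deltas would support the referee's concern that the test regime sits close to the training distribution, while large deltas would make the generalization claim easier to assess.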
Circularity Check
No circularity: empirical results presented as direct experimental outcomes
The paper describes a hierarchical LLM-RL framework for cloud defense and reports performance gains from experiments on real cloud datasets. No equations, derivations, or first-principles predictions appear that reduce by construction to fitted inputs, self-definitions, or self-citation chains. The 68.5% availability and 34.7% jumpstart claims are framed as measured outcomes compared to baselines, not quantities defined in terms of the model's own parameters or prior self-referential results. The architecture draws on external MITRE ATT&CK inspiration and heterogeneous pre-training, but these are design elements validated experimentally rather than circularly justified. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: LLM agents can perform reliable ReAct planning, perception, and tool integration for cyber defense tasks.
- domain assumption: Heterogeneous separated pre-training produces RL agents that execute atomic defense actions reliably in localized regions.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction, reality_from_one_distinction (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "hierarchical multi-agent reinforcement learning framework empowered by Large Language Models... ReAct planning, IPDRR-based perception, long-short term memory"
- IndisputableMonolith/Cost/FunctionalEquation, washburn_uniqueness_aczel (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "heterogeneous separated pre-training... specialized reward functions"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.