Building Better Environments for Autonomous Cyber Defence

Chris Hicks, Elizabeth Bates, Shae McFadden, Isaac Symes Thompson, Myles Foley, Ed Chapman, Nickolas Espinosa Dice, Ankita Samaddar, Joshua Sylvester, Himanshu Neema, Nicholas Butts, Nate Foster, Ahmad Ridley, Zoe M, Paul Jones

arXiv: 2604.08805 · v1 · submitted 2026-04-09 · cs.CR · cs.AI

Pith reviewed 2026-05-10 16:43 UTC · model grok-4.3


keywords: reinforcement learning, autonomous cyber defence, RL environments, cybersecurity, best practices, agent evaluation, network defence, simulation to real


The pith

A framework decomposes the interface between RL cyber environments and real systems to guide better agent training and evaluation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports findings from a November 2025 workshop attended by experts from academia, industry, and government who design RL environments for autonomous cyber defence. It puts forward a structured way to separate how simulated training setups connect to actual network operations, along with practical guidelines for building those setups and testing the agents inside them. A reader would care because many current RL efforts in cybersecurity fail to produce agents that perform reliably when moved from simulation to the defence of real government or critical infrastructure networks, and the workshop aims to surface the tradecraft that addresses recurring design problems.

Core claim

The authors present two main contributions: a framework that decomposes the interface between RL cyber environments and real systems, and a set of guidelines for environment development and agent evaluation drawn directly from the shared experience of workshop participants who have hands-on work in this area.

What carries the argument

The framework for decomposing the interface between RL cyber environments and real systems, which isolates connection points so that design choices in simulation can be made more deliberately and transferred more reliably.

If this is right

Environment designers gain a clearer separation of concerns when linking simulation to real systems, reducing repeated trial-and-error.
Agent evaluation becomes more consistent across different projects because the guidelines supply shared criteria.
Training runs are more likely to produce agents whose behaviours transfer to operational networks in sensitive sectors.
Common hazards in environment construction are identified and can be avoided from the start of new projects.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Adoption of the interface framework could make it easier to compare results across separate RL ACD research groups by standardizing how environments are described.
The guidelines implicitly highlight the need for future benchmarks that test generalization from simulation to live network conditions.
Extending the decomposition approach to other RL security tasks, such as intrusion response, could follow the same structure.
A practical next step would be to release example environment templates built according to the workshop practices for others to adapt.

Load-bearing premise

The collective tradecraft and domain knowledge from the workshop participants forms a comprehensive and transferable set of best practices that will produce RL environments able to generalize to government and critical infrastructure networks.

What would settle it

A side-by-side test in which multiple teams build RL cyber environments either with or without the guidelines, then check whether agents trained in the guideline-compliant environments achieve measurably higher defence success rates against realistic attack sequences on a held-out network simulation.
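
One way the outcome of such a comparison might be analysed is sketched below in Python. This is an illustrative assumption, not a procedure taken from the paper: the function name `bootstrap_diff_ci`, the choice of a percentile bootstrap, and the per-team success rates are all hypothetical placeholders, and the numbers are not results of any kind.

```python
import random
from statistics import mean


def bootstrap_diff_ci(treated, control, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for mean(treated) - mean(control).

    Hypothetical helper for illustration only; not part of the paper.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping the group sizes fixed.
        t = [rng.choice(treated) for _ in treated]
        c = [rng.choice(control) for _ in control]
        diffs.append(mean(t) - mean(c))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return mean(treated) - mean(control), lo, hi


# Placeholder per-team defence success rates on the held-out simulation.
# These values are invented purely so the sketch runs.
guideline_teams = [0.72, 0.68, 0.75, 0.70, 0.66]
baseline_teams = [0.61, 0.64, 0.58, 0.66, 0.60]

point, lo, hi = bootstrap_diff_ci(guideline_teams, baseline_teams)
print(f"difference in mean success rate: {point:.3f} "
      f"(95% CI {lo:.3f} to {hi:.3f})")
```

Reporting an interval rather than a single difference keeps the comparison honest when only a handful of teams take part; an interval that spans zero would mean the test has not settled the question.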

Figures

Figures reproduced from arXiv: 2604.08805 by Ahmad Ridley, Ankita Samaddar, Chris Hicks, Ed Chapman, Elizabeth Bates, Himanshu Neema, Isaac Symes Thompson, Joshua Sylvester, Myles Foley, Nate Foster, Nicholas Butts, Nickolas Espinosa Dice, Paul Jones, Shae McFadden, Zoe M.

**Figure 2.** The breakdown of Sim-to-Real Gap components.

**Figure 3.** The virtualisation vs. modelling gap taxonomy.

**Figure 4.** Sequence Modelling: the relationship between con

Original abstract

In November 2025, the authors ran a workshop on the topic of what makes a good reinforcement learning (RL) environment for autonomous cyber defence (ACD). This paper details the knowledge shared by participants both during the workshop and shortly afterwards by contributing herein. The workshop participants come from academia, industry, and government, and have extensive hands-on experience designing and working with RL and cyber environments. While there is now a sizeable body of literature describing work in RL for ACD, there is nevertheless a great deal of tradecraft, domain knowledge, and common hazards which are not detailed comprehensively in a single resource. With a specific focus on building better environments to train and evaluate autonomous RL agents in network defence scenarios, including government and critical infrastructure networks, the contributions of this work are twofold: (1) a framework for decomposing the interface between RL cyber environments and real systems, and (2) guidelines on current best practice for RL-based ACD environment development and agent evaluation, based on the key findings from our workshop.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a workshop synthesis that compiles practical tradecraft into a decomposition framework and guidelines for RL environments in autonomous cyber defence, but it offers no new experiments or validation.


The core of the paper is the output from their November 2025 workshop: a two-part contribution that breaks down the RL-to-real-system interface and lists best-practice guidelines for building and evaluating ACD environments. They pull in voices from academia, industry, and government, which is the main strength. The decomposition framework looks like a reasonable way to organise the messy boundary between simulation and live networks, and the guidelines cover common hazards that scattered papers often skip. That kind of collation can save time for teams starting out on this work.

Referee Report

0 major / 2 minor

Summary. The manuscript details outcomes from a November 2025 workshop on reinforcement learning (RL) environments for autonomous cyber defence (ACD). Drawing on input from participants across academia, industry, and government, it presents a framework for decomposing the interface between RL cyber environments and real systems, together with guidelines for environment development and agent evaluation. The work positions these as a synthesis of tradecraft and domain knowledge to address gaps in the literature, with particular attention to applications in government and critical infrastructure networks.

Significance. If the framework and guidelines accurately reflect transferable practitioner insights, the paper could help standardize RL environment design for cyber defence and reduce common implementation hazards. The interdisciplinary synthesis is a clear strength and fills a documented gap in comprehensive resources. However, the absence of quantitative validation, formal proofs, or comparative experiments limits the result to a consolidation of expert opinion rather than a demonstrably superior methodology.

minor comments (2)

The abstract announces the twofold contributions but provides no high-level outline of the framework's decomposition steps; adding one sentence would improve immediate accessibility for readers.
Guidelines are presented as synthesized best practice; including at least one concrete workshop-derived example per major guideline would make the advice more actionable without altering the synthesis nature of the work.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive review and recommendation of minor revision. The manuscript is a synthesis of practitioner insights from the November 2025 workshop rather than an empirical study, and we have clarified this scope in the revision to better align with the referee's observations on the nature of the contribution.


Referee: However, the absence of quantitative validation, formal proofs, or comparative experiments limits the result to a consolidation of expert opinion rather than a demonstrably superior methodology.

Authors: We agree that the work does not include new quantitative validation, formal proofs, or comparative experiments, as its purpose is to consolidate tradecraft and domain knowledge shared by workshop participants from academia, industry, and government. This is explicitly stated in the abstract and introduction as a synthesis addressing gaps in the literature. The framework and guidelines are presented as best practices distilled from expert input rather than a validated superior approach. In the revised manuscript, we have added a short clarifying paragraph in Section 1 (Introduction) to emphasize the scope as expert consensus synthesis and note the absence of empirical benchmarking, which directly addresses this point without changing the core contributions.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity


The paper is a workshop synthesis report whose contributions consist of a framework for decomposing RL-to-real-system interfaces and collated practitioner guidelines drawn from participant tradecraft. No mathematical derivations, equations, fitted parameters, or predictions appear in the provided text. The central claims rest on accurate reporting of external workshop outputs rather than any self-referential reduction, self-citation chain, or renaming of known results. The argument is therefore self-contained against external benchmarks with no load-bearing steps that reduce to the paper's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper draws on domain knowledge from workshop participants rather than formal axioms or new entities; no free parameters or invented constructs are introduced.

axioms (1)

Domain assumption: Expert consensus from a multi-sector workshop accurately captures transferable best practices for RL environment design in cyber defence.
Invoked when presenting the guidelines as current best practice without additional empirical validation.



Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages

[1]

Disrupting the first reported AI-orchestrated cyber espionage cam- paign. Tech. rep., Anthropic (2025), (Online, Accessed 21st January 2026) https://assets.anthropic.com/m/ec212e6566a0d47/original/Disrupting-the-first- reported-AI-orchestrated-cyber-espionage-campaign.pdf

work page 2025
[2]

In: Proceedings of the 35th International Conference on Neural Information Processing Systems

Agarwal, R., Schwarzer, M., Castro, P.S., Courville, A., Bellemare, M.G.: Deep reinforcement learning at the edge of the statistical precipice. In: Proceedings of the 35th International Conference on Neural Information Processing Systems. NIPS ’21 (2021)

work page 2021
[3]

In: 32nd USENIX Security Symposium (USENIX Security 23) (2023)

Al Wahaibi, S., Foley, M., Maffeis, S.: SQIRL: Grey-box detection of sql injec- tion vulnerabilities using reinforcement learning. In: 32nd USENIX Security Symposium (USENIX Security 23) (2023)

work page 2023
[4]

science 314(5799), 610–613 (2006)

Anderson, R., Moore, T.: The economics of information security. science 314(5799), 610–613 (2006)

work page 2006
[5]

In: Workshop on Machine Learning for Cybersecurity (ML4Cyber) (07 2022)

Andrew, A., Spillard, S., Collyer, J., Dhir, N.: Developing optimal causal cyber- defence agents via cyber security simulation. In: Workshop on Machine Learning for Cybersecurity (ML4Cyber) (07 2022)

work page 2022
[6]

Ashton, H.: Causal campbell-goodhart’s law and reinforcement learning (2021), https://arxiv.org/abs/2011.01010

work page arXiv 2021
[7]

In: Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security (2023)

Bates, E., Mavroudis, V., Hicks, C.: Reward shaping for happier autonomous cyber security agents. In: Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security (2023)

work page 2023
[8]

arXiv preprint arXiv:2503.03245 (2025)

Bates, E., Hicks, C., Mavroudis, V.: Less is more? rewards in rl for cyber defence. arXiv preprint arXiv:2503.03245 (2025)

work page arXiv 2025
[9]

Bates, E., Hicks, C., Mavroudis, V.: Beyond rewards in reinforcement learning for cyber defence (2026), https://arxiv.org/abs/2602.04809

work page arXiv 2026
[10]

Bates, E., Mavroudis, V., Hicks, C.: Reward shaping for happier autonomous cyber security agents (2023), https://arxiv.org/abs/2310.13565

work page arXiv 2023
[11]

https://doi.org/10.5281/zenodo.15147271, https://github.com/alan-turing- institute/r3ace

Chapman, E., Hicks, C., Mavroudis, V.: r3ace. https://doi.org/10.5281/zenodo.15147271, https://github.com/alan-turing- institute/r3ace

work page doi:10.5281/zenodo.15147271
[12]

Computers & Security (2023)

Chen, J., Hu, S., Zheng, H., Xing, C., Zhang, G.: GAIL-PT: An intelligent penetra- tion testing framework with generative adversarial imitation learning. Computers & Security (2023)

work page 2023
[13]

Advances in Neural Information Processing Systems (2024)

Chen, X., Nie, Y., Guo, W., Zhang, X.: When llm meets drl: Advancing jailbreaking efficiency via drl-guided search. Advances in Neural Information Processing Systems (2024)

work page 2024
[14]

In: 33rd USENIX Security Symposium (USENIX Security 24) (2024)

De Silva, R., Guo, W., Ruaro, N., Grishchenko, I., Kruegel, C., Vigna, G.: {GuideEnricher}: Protecting the anonymity of ethereum mixing service users with deep reinforcement learning. In: 33rd USENIX Security Symposium (USENIX Security 24) (2024)

work page 2024
[15]

Accessed 2026-02-19

Defence Science and Technology Laboratory UK: Primaite (primary-level ai train- ing environment), https://github.com/Autonomous-Resilient-Cyber-Defence/ PrimAITE, gitHub repository (tag: v4.0.0). Accessed 2026-02-19

work page 2026
[16]

Dudman, T., Bull, M.: Towards a generalisable cyber defence agent for real-world computer networks (2025), https://arxiv.org/abs/2511.09114

work page arXiv 2025
[17]

Emerson, H., Bates, L., Hicks, C., Mavroudis, V.: Cyborg++: An enhanced gym for the development of autonomous cyber agents (2024), https://arxiv.org/abs/ 2410.16324

work page arXiv 2024
[18]

Autonomous network defence using reinforcement learning,

Foley, M., Hicks, C., Highnam, K., Mavroudis, V.: Autonomous Network De- fence Using Reinforcement Learning. In: Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security. ASIA CCS ’22 (2022), https://doi.org/10.1145/3488932.3527286

work page doi:10.1145/3488932.3527286 2022
[19]

In: Proceedings of the AAAI Conference on Artificial Intelligence (2025)

Foley, M., Maffeis, S.: Apirl: Deep reinforcement learning for rest api fuzzing. In: Proceedings of the AAAI Conference on Artificial Intelligence (2025)

work page 2025
[20]

In: Conference on Applied Machine Learning in Information Security (CAMLIS) (2022)

Foley, M., Wang, M., M, Z., Hicks, C., Mavroudis, V.: Inroads into Autonomous Network Defence using Explained Reinforcement Learning. In: Conference on Applied Machine Learning in Information Security (CAMLIS) (2022)

work page 2022
[21]

In: 2022 IEEE International Conference on Omni-layer Intelligent Systems (COINS)

Gangupantulu, R., Cody, T., Park, P., Rahman, A., Eisenbeiser, L., Radke, D., Clark, R., Redino, C.: Using cyber terrain in reinforcement learning for penetration testing. In: 2022 IEEE International Conference on Omni-layer Intelligent Systems (COINS). IEEE (2022)

work page 2022
[22]

In: 2020 25th IEEE International Conference on Emerging Tech- nologies and Factory Automation (ETFA)

Geiger, M., Bauer, J., Masuch, M., Franke, J.: An analysis of black energy 3, crashoverride, and trisis, three malware approaches targeting operational tech- nology systems. In: 2020 25th IEEE International Conference on Emerging Tech- nologies and Factory Automation (ETFA). vol. 1, pp. 1537–1543. IEEE (2020)

work page 2020
[23]

In: European Symposium on Research in Computer Security

Goel, D., Moore, K., Guo, M., Wang, D., Kim, M., Camtepe, S.: Optimizing cy- ber defense in dynamic active directories through reinforcement learning. In: European Symposium on Research in Computer Security. Springer (2024)

work page 2024
[24]

In: Proceedings of the 2022 ACM SIGSAC conference on computer and communications security (2022)

Gohil, V., Guo, H., Patnaik, S., Rajendran, J.: Attrition: Attacking static hardware trojan detection techniques using reinforcement learning. In: Proceedings of the 2022 ACM SIGSAC conference on computer and communications security (2022)

work page 2022
[25]

Artificial Intelligence Review55(2), 895–943 (2022)

Gronauer, S., Diepold, K.: Multi-agent deep reinforcement learning: a survey. Artificial Intelligence Review55(2), 895–943 (2022)

work page 2022
[26]

https://github.com/cage-challenge/cage- challenge-3 (2022)

Group, T.C.W.: Ttcp cage challenge 3. https://github.com/cage-challenge/cage- challenge-3 (2022)

work page 2022
[27]

IEEE Transactions on Network and Service Management19(3), 2333–2348 (2022)

Hammar, K., Stadler, R.: Intrusion prevention through optimal stopping. IEEE Transactions on Network and Service Management19(3), 2333–2348 (2022). https://doi.org/10.1109/TNSM.2022.3176781

work page doi:10.1109/tnsm.2022.3176781 2022
[28]

In: International conference on decision and game theory for security

Han, Y., Rubinstein, B.I., Abraham, T., Alpcan, T., De Vel, O., Erfani, S., Hubczenko, D., Leckie, C., Montague, P.: Reinforcement learning for autonomous defence in software-defined networking. In: International conference on decision and game theory for security. Springer (2018)

work page 2018
[29]

In: Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security

Hicks, C., Mavroudis, V., Foley, M., Davies, T., Highnam, K., Watson, T.: Canaries and Whistles: Resilient Drone Communication Networks with (or without) Deep Reinforcement Learning. In: Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security. AISec ’23 (2023)

work page 2023
[30]

ACM Transactions on Privacy and Security (2025)

Hore, S., Ghadermazi, J., Paudel, D., Shah, A., Das, T., Bastian, N.: Deep pack- gen: A deep reinforcement learning framework for adversarial network packet generation. ACM Transactions on Privacy and Security (2025)

work page 2025
[31]

arXiv preprint arXiv:1912.01798 (2019)

Hou, C., Zhou, M., Ji, Y., Daian, P., Tramer, F., Fanti, G., Juels, A.: Squirrl: Automat- ing attack analysis on blockchain incentive mechanisms with deep reinforcement learning. arXiv preprint arXiv:1912.01798 (2019)

work page arXiv 1912
[32]

In: 2020 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW)

Hu, Z., Beuran, R., Tan, Y.: Automated penetration testing using deep reinforce- ment learning. In: 2020 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW). IEEE (2020)

work page 2020
[33]

Multi-agent reinforcement learning: A comprehensive survey

Huh, D., Mohapatra, P.: Multi-agent reinforcement learning: A comprehensive survey. arXiv preprint arXiv:2312.10256 (2023)

work page arXiv 2023
[34]

In: European Symposium on Research in Computer Security

Janisch, J., Pevn`y, T., Lis`y, V.: Nasimemu: Network attack simulator & emulator for training agents generalizing to novel scenarios. In: European Symposium on Research in Computer Security. pp. 589–608. Springer (2023)

work page 2023
[35]

Kaloroumakis, P., Smith, M.: Toward a knowledge graph of cybersecurity coun- termeasures (2020)

work page 2020
[36]

Karwowski, J., Hayman, O., Bai, X., Kiendlhofer, K., Griffin, C., Skalse, J.: Good- hart’s law in reinforcement learning (2023), https://arxiv.org/abs/2310.09144

work page arXiv 2023
[37]

In: Proceedings of the AAAI Conference on Artificial Intelligence

Kiely, M., Ahiskali, M., Borde, E., Bowman, B., Bowman, D., van Bruggen, D., Cowan, K., Dasgupta, P., Devendorf, E., Edwards, B., et al.: Exploring the efficacy of multi-agent reinforcement learning for autonomous cyber defence: A cage challenge 4 perspective. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 28907–28913 (2025)

work page 2025
[38]

AI Magazine46(3), e70021 (2025)

Kiely, M., Ahiskali, M., Borde, E., Bowman, B., Bowman, D., Van Bruggen, D., Cowan, K., Dasgupta, P., Devendorf, E., Edwards, B., et al.: Cage challenge 4: A scalable multi-agent reinforcement learning gym for autonomous cyber defence. AI Magazine46(3), e70021 (2025)

work page 2025
[39]

Kiely, M., Bowman, D., Standen, M., Moir, C.: On autonomous agents in a cyber defence environment (2023), https://arxiv.org/abs/2309.07388

work page arXiv 2023
[40]

King, I.J., Bowman, B., Huang, H.H.: Automated cyber defense with generalizable graph-based reinforcement learning agents (2025), https://arxiv.org/abs/2509. 16151

work page 2025
[41]

ieee Spectrum50(3), 48–53 (2013)

Kushner, D.: The real story of stuxnet. ieee Spectrum50(3), 48–53 (2013)

work page 2013
[42]

In: European Symposium on Research in Computer Security

Kvasov, A., Sahin, M., Hebert, C., De Oliveira, A.S.: Simulating deception for web applications using reinforcement learning. In: European Symposium on Research in Computer Security. Springer (2023)

work page 2023
[43]

Applied Intelligence 53, 27110–27127 (2023)

Li, Q., Hu, M., Hao, H., Zhang, M., Li, Y.: INNES: An intelligent network penetra- tion testing model based on deep reinforcement learning. Applied Intelligence 53, 27110–27127 (2023)

work page 2023
[44]

In: 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA)

Luo, M., Xiong, W., Lee, G., Li, Y., Yang, X., Zhang, A., Tian, Y., Lee, H.H.S., Suh, G.E.: Autocat: Reinforcement learning for automated exploration of cache-timing attacks. In: 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE (2023)

work page 2023
[45]

Computers & Security (2021)

Maeda, R., Mimura, M.: Automating post-exploitation with deep reinforcement learning. Computers & Security (2021)

work page 2021
[46]

In: Proc

McFadden, S., Foley, M., D’Onghia, M., Hicks, C., Mavroudis, V., Paoletti, N., Pierazzi, F.: Drmd: Deep reinforcement learning for malware detection under concept drift. In: Proc. of the AAAI Conference on Artificial Intelligence (2026)

work page 2026
[47]

In: IEEE Workshop on Deep Learning Security and Privacy (DLSP) (2024)

McFadden, S., Maugeri, M., Hicks, C., Mavroudis, V., Pierazzi, F.: Wendigo: Deep reinforcement learning for denial-of-service query discovery in graphql. In: IEEE Workshop on Deep Learning Security and Privacy (DLSP) (2024)

work page 2024
[48]

arXiv preprint arXiv:2602.08690 (2026)

McFadden, S., Foley, M., Bates, E., Tsingenopoulos, I., Vyas, S., Mavroudis, V., Hicks, C., Pierazzi, F.: Sok: The pitfalls of deep reinforcement learning for cyber- security. arXiv preprint arXiv:2602.08690 (2026)

work page arXiv 2026
[49]

org/abs/2111.02445

Mern, J., Hatch, K., Silva, R., Hickert, C., Sookoor, T., Kochenderfer, M.J.: Au- tonomous attack mitigation for industrial control systems (2021), https://arxiv. org/abs/2111.02445

work page arXiv 2021
[50]

Mern, J., Sadigh, D., Kochenderfer, M.J.: Exchangeable input representations for reinforcement learning (2020), https://arxiv.org/abs/2003.09022

work page arXiv 2020
[51]

Miles, I., Farmer, S., Foster, D., Harrold, D., Palmer, G., Parry, C., Willis, C., Casassa Mont, M., Gralewski, L., Menzies, R., Morarji, N., Turkbeyler, E., Wilson, A., Beard, A., Marques, P., Francis Roscoe, J., Bailey, S., Cheah, M., Dorn, M., Haubrick, P., Lacey, M., Rimmer, D., Stone, J., Till, D., Heartfield, R., Harrison, A., Short, J., Wilson, T.,...

work page 2024
[52]

Molina-Markham, A., Miniter, C., Powell, B., Ridley, A.: Network environment design for autonomous cyberdefense (2021), https://arxiv.org/abs/2103.07583

work page arXiv 2021
[53]

arXiv preprint arXiv:2505.22531 (2025)

Molina-Markham, A., Robaina, L., Steinle, S., Trivedi, A., Tsui, D., Potteiger, N., Brandt, L., Winder, R., Ridley, A.: Training rl agents for multi-objective network defense tasks. arXiv preprint arXiv:2505.22531 (2025)

work page arXiv 2025
[54]

Nyberg, J., Johnson, P.: Structural generalization in autonomous cyber incident response with message-passing neural networks and reinforcement learning (2024), https://arxiv.org/abs/2407.05775

work page arXiv 2024
[55]

In: Proceedings of the 17th Cyber Security Experimenta- tion and Test Workshop

Oesch, S., Chaulagain, A., Weber, B., Dixson, M., Sadovnik, A., Roberson, B., Wat- son, C., Austria, P.: Towards a high fidelity training environment for autonomous cyber defense agents. In: Proceedings of the 17th Cyber Security Experimenta- tion and Test Workshop. p. 91–99. CSET ’24, Association for Computing Ma- chinery, New York, NY, USA (2024). https...

work page doi:10.1145/3675741.3675752 2024
[56]

Packer, C., Gao, K., Kos, J., Krähenbühl, P., Koltun, V., Song, D.: Assessing gener- alization in deep reinforcement learning (2018)

work page 2018
[57]

Palmer, G., Parry, C., Harrold, D.J.B., Willis, C.: Deep reinforcement learning for autonomous cyber defence: A survey (2024), https://arxiv.org/abs/2310.07745

work page arXiv 2024
[58]

Journal of Machine Learning Research (2024)

Patterson, A., Neumann, S., White, M., White, A.: Empirical design in reinforce- ment learning. Journal of Machine Learning Research (2024)

work page 2024
[59]

Com- puters, Materials & Continua (2022)

Praveena, V., V., A., Chinnasamy, P., Ali, I., Alroobaea, R., Alyahyan, S.Y., Raza, M.A.: Optimal deep reinforcement learning for intrusion detection in uavs. Com- puters, Materials & Continua (2022)

work page 2022
[60]

Operations Research70(6), 3601–3628 (2022)

Qu, G., Wierman, A., Li, N.: Scalable reinforcement learning for multiagent networked systems. Operations Research70(6), 3601–3628 (2022)

work page 2022
[61]

Applying communication privacy management theory to youth privacy management in AI contexts,

Samaddar, A., Potteiger, N., Koutsoukos, X.: Out-of-distribution detec- tion for neurosymbolic autonomous cyber agents. In: 2025 IEEE 4th In- ternational Conference on AI in Cybersecurity (ICAIC). pp. 1–9 (2025). https://doi.org/10.1109/ICAIC63015.2025.10849024

work page doi:10.1109/icaic63015.2025.10849024 2025
[62]

https:// networkattacksimulator.readthedocs.io/ (2019)

Schwartz, J., Kurniawatti, H.: Nasim: Network attack simulator. https:// networkattacksimulator.readthedocs.io/ (2019)

work page 2019
[63]

In: Force Readiness for Multi-Domain Operations through Modelling and Simulation: NATO Modelling and Simulation Group (MSG) Symposium (MSG-229)

Short, J.: The essential role of modelling and simulation in helping ai fight cyber-attacks. In: Force Readiness for Multi-Domain Operations through Modelling and Simulation: NATO Modelling and Simulation Group (MSG) Symposium (MSG-229). No. STO-MP-MSG-229 in STO Meeting Proceedings, NATO Science and Technology Organization (STO) (2025), https://publicati...

work page 2025
[64]

(2022), https://docs.oasis-open.org/openc2/oc2arch/v1.0/oc2arch- v1.0.html

Sparrell, D.: Open command and control (openc2) architecture specification version 1.0. (2022), https://docs.oasis-open.org/openc2/oc2arch/v1.0/oc2arch- v1.0.html

work page 2022
[65]

In: IJCAI-21 1st Interna- tional Workshop on Adaptive Cyber Defense (2021)

Standen, M., Lucas, M., Bowman, D., Richer, T., Kim, J., Marriott, D.: CybORG: A gym for the development of autonomous cyber agents. In: IJCAI-21 1st Interna- tional Workshop on Adaptive Cyber Defense (2021)

work page 2021
[66]

MIT Press, second edn

Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, second edn. (2018), http://incompleteideas.net/book/the-book-2nd.html

work page 2018
[67]

Symes Thompson, I., Caron, A., Hicks, C., Mavroudis, V.: Entity-based reinforce- ment learning for autonomous cyber defence (2025), https://arxiv.org/abs/2410. 17647

work page 2025
[68]

Team., M.D.R.: Cyberbattlesim. https://github.com/microsoft/ cyberbattlesim (2021), created by Christian Seifert, Michael Betser, William Blum, James Bono, Kate Farris, Emily Goren, Justin Grana, Kris- tian Holsheimer, Brandon Marken, Joshua Neil, Nicole Nichols, Jugal Parikh, Haoran Wei

work page 2021
[69]

In: Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses (2024)

Terranova, F., Lahmadi, A., Chrisment, I.: Leveraging deep reinforcement learning for cyber-attack paths prediction: Formulation, generalization, and evaluation. In: Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses (2024)

work page 2024
[70]

In: Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses (2024)

Tsingenopoulos, I., Cortellazzi, J., Bosansk`y, B., Aonzo, S., Preuveneers, D., Joosen, W., Pierazzi, F., Cavallaro, L.: How to train your antivirus: Rl-based hardening through the problem space. In: Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses (2024)

work page 2024
[71]

ACM Comput

Vyas, S., Mavroudis, V., Burnap, P.: Towards the Deployment of Realistic Au- tonomous Cyber Network Defence: A Systematic Review. ACM Comput. Surv. 58(1) (Aug 2025). https://doi.org/10.1145/3729213

work page doi:10.1145/3729213 2025
[72]

Frontiers of Computer Science (2025)

Yang, Y., Chen, L., Liu, S., Wang, L., Fu, H., Liu, X., Chen, Z.: Behaviour-diverse automatic penetration testing: a coverage-based deep reinforcement learning approach. Frontiers of Computer Science (2025)

work page 2025
[73]

A survey on self-play methods in reinforcement learning

Zhang, R., Xu, Z., Ma, C., Yu, C., Tu, W.W., Tang, W., Huang, S., Ye, D., Ding, W., Yang, Y., et al.: A survey on self-play methods in reinforcement learning. arXiv preprint arXiv:2408.01072 (2024)

work page arXiv 2024

[1] [1]

Disrupting the first reported AI-orchestrated cyber espionage cam- paign. Tech. rep., Anthropic (2025), (Online, Accessed 21st January 2026) https://assets.anthropic.com/m/ec212e6566a0d47/original/Disrupting-the-first- reported-AI-orchestrated-cyber-espionage-campaign.pdf

work page 2025

[2] Agarwal, R., Schwarzer, M., Castro, P.S., Courville, A., Bellemare, M.G.: Deep reinforcement learning at the edge of the statistical precipice. In: Proceedings of the 35th International Conference on Neural Information Processing Systems. NIPS ’21 (2021)

[3] Al Wahaibi, S., Foley, M., Maffeis, S.: SQIRL: Grey-box detection of SQL injection vulnerabilities using reinforcement learning. In: 32nd USENIX Security Symposium (USENIX Security 23) (2023)

[4] Anderson, R., Moore, T.: The economics of information security. Science 314(5799), 610–613 (2006)

[5] Andrew, A., Spillard, S., Collyer, J., Dhir, N.: Developing optimal causal cyber-defence agents via cyber security simulation. In: Workshop on Machine Learning for Cybersecurity (ML4Cyber) (07 2022)

[6] Ashton, H.: Causal Campbell-Goodhart’s law and reinforcement learning (2021), https://arxiv.org/abs/2011.01010

[7] Bates, E., Mavroudis, V., Hicks, C.: Reward shaping for happier autonomous cyber security agents. In: Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security (2023)

[8] Bates, E., Hicks, C., Mavroudis, V.: Less is more? Rewards in RL for cyber defence. arXiv preprint arXiv:2503.03245 (2025)

[9] Bates, E., Hicks, C., Mavroudis, V.: Beyond rewards in reinforcement learning for cyber defence (2026), https://arxiv.org/abs/2602.04809

[10] Bates, E., Mavroudis, V., Hicks, C.: Reward shaping for happier autonomous cyber security agents (2023), https://arxiv.org/abs/2310.13565

[11] Chapman, E., Hicks, C., Mavroudis, V.: r3ace. https://doi.org/10.5281/zenodo.15147271, https://github.com/alan-turing-institute/r3ace

[12] Chen, J., Hu, S., Zheng, H., Xing, C., Zhang, G.: GAIL-PT: An intelligent penetration testing framework with generative adversarial imitation learning. Computers & Security (2023)

[13] Chen, X., Nie, Y., Guo, W., Zhang, X.: When LLM meets DRL: Advancing jailbreaking efficiency via DRL-guided search. Advances in Neural Information Processing Systems (2024)

[14] De Silva, R., Guo, W., Ruaro, N., Grishchenko, I., Kruegel, C., Vigna, G.: GuideEnricher: Protecting the anonymity of Ethereum mixing service users with deep reinforcement learning. In: 33rd USENIX Security Symposium (USENIX Security 24) (2024)

[15] Defence Science and Technology Laboratory UK: PrimAITE (primary-level AI training environment), https://github.com/Autonomous-Resilient-Cyber-Defence/PrimAITE, GitHub repository (tag: v4.0.0). Accessed 2026-02-19

[16] Dudman, T., Bull, M.: Towards a generalisable cyber defence agent for real-world computer networks (2025), https://arxiv.org/abs/2511.09114

[17] Emerson, H., Bates, L., Hicks, C., Mavroudis, V.: CybORG++: An enhanced gym for the development of autonomous cyber agents (2024), https://arxiv.org/abs/2410.16324

[18] Foley, M., Hicks, C., Highnam, K., Mavroudis, V.: Autonomous Network Defence Using Reinforcement Learning. In: Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security. ASIA CCS ’22 (2022), https://doi.org/10.1145/3488932.3527286

[19] Foley, M., Maffeis, S.: APIRL: Deep reinforcement learning for REST API fuzzing. In: Proceedings of the AAAI Conference on Artificial Intelligence (2025)

[20] Foley, M., Wang, M., M, Z., Hicks, C., Mavroudis, V.: Inroads into Autonomous Network Defence using Explained Reinforcement Learning. In: Conference on Applied Machine Learning in Information Security (CAMLIS) (2022)

[21] Gangupantulu, R., Cody, T., Park, P., Rahman, A., Eisenbeiser, L., Radke, D., Clark, R., Redino, C.: Using cyber terrain in reinforcement learning for penetration testing. In: 2022 IEEE International Conference on Omni-layer Intelligent Systems (COINS). IEEE (2022)

[22] Geiger, M., Bauer, J., Masuch, M., Franke, J.: An analysis of BlackEnergy 3, CrashOverride, and Trisis, three malware approaches targeting operational technology systems. In: 2020 25th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA). vol. 1, pp. 1537–1543. IEEE (2020)

[23] Goel, D., Moore, K., Guo, M., Wang, D., Kim, M., Camtepe, S.: Optimizing cyber defense in dynamic active directories through reinforcement learning. In: European Symposium on Research in Computer Security. Springer (2024)

[24] Gohil, V., Guo, H., Patnaik, S., Rajendran, J.: Attrition: Attacking static hardware trojan detection techniques using reinforcement learning. In: Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security (2022)

[25] Gronauer, S., Diepold, K.: Multi-agent deep reinforcement learning: a survey. Artificial Intelligence Review 55(2), 895–943 (2022)

[26] Group, T.C.W.: TTCP CAGE Challenge 3. https://github.com/cage-challenge/cage-challenge-3 (2022)

[27] Hammar, K., Stadler, R.: Intrusion prevention through optimal stopping. IEEE Transactions on Network and Service Management 19(3), 2333–2348 (2022). https://doi.org/10.1109/TNSM.2022.3176781

[28] Han, Y., Rubinstein, B.I., Abraham, T., Alpcan, T., De Vel, O., Erfani, S., Hubczenko, D., Leckie, C., Montague, P.: Reinforcement learning for autonomous defence in software-defined networking. In: International Conference on Decision and Game Theory for Security. Springer (2018)

[29] Hicks, C., Mavroudis, V., Foley, M., Davies, T., Highnam, K., Watson, T.: Canaries and Whistles: Resilient Drone Communication Networks with (or without) Deep Reinforcement Learning. In: Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security. AISec ’23 (2023)

[30] Hore, S., Ghadermazi, J., Paudel, D., Shah, A., Das, T., Bastian, N.: Deep PackGen: A deep reinforcement learning framework for adversarial network packet generation. ACM Transactions on Privacy and Security (2025)

[31] Hou, C., Zhou, M., Ji, Y., Daian, P., Tramer, F., Fanti, G., Juels, A.: SquirRL: Automating attack analysis on blockchain incentive mechanisms with deep reinforcement learning. arXiv preprint arXiv:1912.01798 (2019)

[32] Hu, Z., Beuran, R., Tan, Y.: Automated penetration testing using deep reinforcement learning. In: 2020 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW). IEEE (2020)

[33] Huh, D., Mohapatra, P.: Multi-agent reinforcement learning: A comprehensive survey. arXiv preprint arXiv:2312.10256 (2023)

[34] Janisch, J., Pevný, T., Lisý, V.: NASimEmu: Network attack simulator & emulator for training agents generalizing to novel scenarios. In: European Symposium on Research in Computer Security. pp. 589–608. Springer (2023)

[35] Kaloroumakis, P., Smith, M.: Toward a knowledge graph of cybersecurity countermeasures (2020)

[36] Karwowski, J., Hayman, O., Bai, X., Kiendlhofer, K., Griffin, C., Skalse, J.: Goodhart’s law in reinforcement learning (2023), https://arxiv.org/abs/2310.09144

[37] Kiely, M., Ahiskali, M., Borde, E., Bowman, B., Bowman, D., van Bruggen, D., Cowan, K., Dasgupta, P., Devendorf, E., Edwards, B., et al.: Exploring the efficacy of multi-agent reinforcement learning for autonomous cyber defence: A CAGE Challenge 4 perspective. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 28907–28913 (2025)

[38] Kiely, M., Ahiskali, M., Borde, E., Bowman, B., Bowman, D., Van Bruggen, D., Cowan, K., Dasgupta, P., Devendorf, E., Edwards, B., et al.: CAGE Challenge 4: A scalable multi-agent reinforcement learning gym for autonomous cyber defence. AI Magazine 46(3), e70021 (2025)

[39] Kiely, M., Bowman, D., Standen, M., Moir, C.: On autonomous agents in a cyber defence environment (2023), https://arxiv.org/abs/2309.07388

[40] King, I.J., Bowman, B., Huang, H.H.: Automated cyber defense with generalizable graph-based reinforcement learning agents (2025), https://arxiv.org/abs/2509.16151

[41] Kushner, D.: The real story of Stuxnet. IEEE Spectrum 50(3), 48–53 (2013)

[42] Kvasov, A., Sahin, M., Hebert, C., De Oliveira, A.S.: Simulating deception for web applications using reinforcement learning. In: European Symposium on Research in Computer Security. Springer (2023)

[43] Li, Q., Hu, M., Hao, H., Zhang, M., Li, Y.: INNES: An intelligent network penetration testing model based on deep reinforcement learning. Applied Intelligence 53, 27110–27127 (2023)

[44] Luo, M., Xiong, W., Lee, G., Li, Y., Yang, X., Zhang, A., Tian, Y., Lee, H.H.S., Suh, G.E.: AutoCAT: Reinforcement learning for automated exploration of cache-timing attacks. In: 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE (2023)

[45] Maeda, R., Mimura, M.: Automating post-exploitation with deep reinforcement learning. Computers & Security (2021)

[46] McFadden, S., Foley, M., D’Onghia, M., Hicks, C., Mavroudis, V., Paoletti, N., Pierazzi, F.: DRMD: Deep reinforcement learning for malware detection under concept drift. In: Proc. of the AAAI Conference on Artificial Intelligence (2026)

[47] McFadden, S., Maugeri, M., Hicks, C., Mavroudis, V., Pierazzi, F.: Wendigo: Deep reinforcement learning for denial-of-service query discovery in GraphQL. In: IEEE Workshop on Deep Learning Security and Privacy (DLSP) (2024)

[48] McFadden, S., Foley, M., Bates, E., Tsingenopoulos, I., Vyas, S., Mavroudis, V., Hicks, C., Pierazzi, F.: SoK: The pitfalls of deep reinforcement learning for cybersecurity. arXiv preprint arXiv:2602.08690 (2026)

[49] Mern, J., Hatch, K., Silva, R., Hickert, C., Sookoor, T., Kochenderfer, M.J.: Autonomous attack mitigation for industrial control systems (2021), https://arxiv.org/abs/2111.02445

[50] Mern, J., Sadigh, D., Kochenderfer, M.J.: Exchangeable input representations for reinforcement learning (2020), https://arxiv.org/abs/2003.09022

[51] Miles, I., Farmer, S., Foster, D., Harrold, D., Palmer, G., Parry, C., Willis, C., Casassa Mont, M., Gralewski, L., Menzies, R., Morarji, N., Turkbeyler, E., Wilson, A., Beard, A., Marques, P., Francis Roscoe, J., Bailey, S., Cheah, M., Dorn, M., Haubrick, P., Lacey, M., Rimmer, D., Stone, J., Till, D., Heartfield, R., Harrison, A., Short, J., Wilson, T.,... (2024)

[52] Molina-Markham, A., Miniter, C., Powell, B., Ridley, A.: Network environment design for autonomous cyberdefense (2021), https://arxiv.org/abs/2103.07583

[53] Molina-Markham, A., Robaina, L., Steinle, S., Trivedi, A., Tsui, D., Potteiger, N., Brandt, L., Winder, R., Ridley, A.: Training RL agents for multi-objective network defense tasks. arXiv preprint arXiv:2505.22531 (2025)

[54] Nyberg, J., Johnson, P.: Structural generalization in autonomous cyber incident response with message-passing neural networks and reinforcement learning (2024), https://arxiv.org/abs/2407.05775

[55] Oesch, S., Chaulagain, A., Weber, B., Dixson, M., Sadovnik, A., Roberson, B., Watson, C., Austria, P.: Towards a high fidelity training environment for autonomous cyber defense agents. In: Proceedings of the 17th Cyber Security Experimentation and Test Workshop. pp. 91–99. CSET ’24, Association for Computing Machinery, New York, NY, USA (2024). https... doi:10.1145/3675741.3675752

[56] Packer, C., Gao, K., Kos, J., Krähenbühl, P., Koltun, V., Song, D.: Assessing generalization in deep reinforcement learning (2018)

[57] Palmer, G., Parry, C., Harrold, D.J.B., Willis, C.: Deep reinforcement learning for autonomous cyber defence: A survey (2024), https://arxiv.org/abs/2310.07745

[58] Patterson, A., Neumann, S., White, M., White, A.: Empirical design in reinforcement learning. Journal of Machine Learning Research (2024)

[59] Praveena, V., V., A., Chinnasamy, P., Ali, I., Alroobaea, R., Alyahyan, S.Y., Raza, M.A.: Optimal deep reinforcement learning for intrusion detection in UAVs. Computers, Materials & Continua (2022)

[60] Qu, G., Wierman, A., Li, N.: Scalable reinforcement learning for multiagent networked systems. Operations Research 70(6), 3601–3628 (2022)

[61] Samaddar, A., Potteiger, N., Koutsoukos, X.: Out-of-distribution detection for neurosymbolic autonomous cyber agents. In: 2025 IEEE 4th International Conference on AI in Cybersecurity (ICAIC). pp. 1–9 (2025). https://doi.org/10.1109/ICAIC63015.2025.10849024

[62] Schwartz, J., Kurniawati, H.: NASim: Network attack simulator. https://networkattacksimulator.readthedocs.io/ (2019)

[63] Short, J.: The essential role of modelling and simulation in helping AI fight cyber-attacks. In: Force Readiness for Multi-Domain Operations through Modelling and Simulation: NATO Modelling and Simulation Group (MSG) Symposium (MSG-229). No. STO-MP-MSG-229 in STO Meeting Proceedings, NATO Science and Technology Organization (STO) (2025), https://publicati...

[64] Sparrell, D.: Open command and control (OpenC2) architecture specification version 1.0 (2022), https://docs.oasis-open.org/openc2/oc2arch/v1.0/oc2arch-v1.0.html

[65] Standen, M., Lucas, M., Bowman, D., Richer, T., Kim, J., Marriott, D.: CybORG: A gym for the development of autonomous cyber agents. In: IJCAI-21 1st International Workshop on Adaptive Cyber Defense (2021)

[66] Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, second edn. (2018), http://incompleteideas.net/book/the-book-2nd.html

[67] Symes Thompson, I., Caron, A., Hicks, C., Mavroudis, V.: Entity-based reinforcement learning for autonomous cyber defence (2025), https://arxiv.org/abs/2410.17647

[68] Team., M.D.R.: CyberBattleSim. https://github.com/microsoft/cyberbattlesim (2021), created by Christian Seifert, Michael Betser, William Blum, James Bono, Kate Farris, Emily Goren, Justin Grana, Kristian Holsheimer, Brandon Marken, Joshua Neil, Nicole Nichols, Jugal Parikh, Haoran Wei

[69] Terranova, F., Lahmadi, A., Chrisment, I.: Leveraging deep reinforcement learning for cyber-attack paths prediction: Formulation, generalization, and evaluation. In: Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses (2024)

[70] Tsingenopoulos, I., Cortellazzi, J., Bošanský, B., Aonzo, S., Preuveneers, D., Joosen, W., Pierazzi, F., Cavallaro, L.: How to train your antivirus: RL-based hardening through the problem space. In: Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses (2024)

[71] Vyas, S., Mavroudis, V., Burnap, P.: Towards the Deployment of Realistic Autonomous Cyber Network Defence: A Systematic Review. ACM Comput. Surv. 58(1) (Aug 2025). https://doi.org/10.1145/3729213

[72] Yang, Y., Chen, L., Liu, S., Wang, L., Fu, H., Liu, X., Chen, Z.: Behaviour-diverse automatic penetration testing: a coverage-based deep reinforcement learning approach. Frontiers of Computer Science (2025)

[73] Zhang, R., Xu, Z., Ma, C., Yu, C., Tu, W.W., Tang, W., Huang, S., Ye, D., Ding, W., Yang, Y., et al.: A survey on self-play methods in reinforcement learning. arXiv preprint arXiv:2408.01072 (2024)