LLM agents security duality: a comprehensive survey of self-security and empowered cybersecurity
Pith reviewed 2026-06-30 01:29 UTC · model grok-4.3
The pith
LLM agents create mutual reinforcement between their own security and their use to strengthen cybersecurity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By systematically surveying threats to LLM agents and their mitigations on one side and the application of agent capabilities across the full cyber offense-defense lifecycle on the other, the work identifies a positive feedback synergy between LLM agents self-security and empowered cybersecurity. It presents the first agent-empowerment framework aligned with that lifecycle and outlines limitations plus future research directions to advance both areas in tandem.
What carries the argument
The agent-empowerment framework that aligns LLM agent capabilities with the full cyber offense-defense lifecycle.
If this is right
- Coordinated development of LLM agents self-security and agent empowered cybersecurity becomes feasible.
- More capable and robust agent applications result from the synergy.
- New insights emerge for advancing both self-security and empowered cybersecurity.
- Current limitations are identified and promising directions for future research are outlined.
Where Pith is reading between the lines
- Integrated research approaches that treat agent protection and agent-assisted cyber operations as linked rather than separate tracks could accelerate progress.
- Vulnerabilities discovered in agent self-security may directly affect the reliability of agent-based cyber defense systems.
- Testing the framework on deployed LLM agents in operational environments would reveal whether the full lifecycle alignment holds in practice.
Load-bearing premise
The surveyed literature is comprehensive and representative, and the taxonomy by threat sources plus the agent-empowerment framework organize the field without major omissions or selection bias.
What would settle it
Discovery of a substantial body of LLM agent security research that cannot be fit into the proposed threat-source taxonomy or that falls outside the agent-empowerment framework's coverage of the offense-defense lifecycle.
Original abstract
Large language model (LLM) agents are rapidly being integrated into real-world systems. Their autonomy and tool-use capabilities generate substantial value while simultaneously expanding the security attack surface. This survey provides a comprehensive overview of the opportunities and challenges of LLM agents in security, focusing on two core areas: (1) threats to LLM agents themselves and corresponding mitigation strategies (LLM agents self-security), and (2) the role of LLM agents in empowering the cybersecurity lifecycle across offense and defense (LLM agents empowered cybersecurity). We first examine the internal and external attack surfaces of agents, propose a taxonomy organized by threat sources, and analyze associated mitigations and evaluation frameworks. We then investigate how agent capabilities are applied in cybersecurity practice and present, to our knowledge, the first agent-empowerment framework aligned with the full cyber offense-defense lifecycle. By systematically surveying these two areas, we are the first to highlight a positive feedback synergy between LLM agents self-security and empowered cybersecurity, offering new insights for the advancement of both. We further identify current limitations and outline promising directions for future research. The insights provided aim to catalyze the coordinated development of LLM agents self-security and agent empowered cybersecurity, paving the way for more capable and robust agent applications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper surveys security issues for LLM agents in two directions: (1) threats to LLM agents themselves along with mitigations (self-security), organized via a taxonomy by threat sources, and (2) the application of LLM agents to the full cybersecurity offense-defense lifecycle (empowered cybersecurity). It proposes an agent-empowerment framework, claims to be the first to identify a positive feedback synergy between the two areas, and outlines limitations and future directions.
Significance. A well-executed survey that rigorously documents literature coverage could provide a useful organizing lens and identify actionable synergies for LLM-agent security research. The proposed taxonomy and lifecycle-aligned framework would be valuable contributions if shown to be comprehensive and free of major omissions. However, the absence of any documented search protocol prevents assessment of whether the coverage supports the 'first' and 'comprehensive' claims.
major comments (2)
- [Abstract / Introduction] Abstract and Introduction: The manuscript asserts that it provides a 'comprehensive overview,' is 'the first to highlight a positive feedback synergy,' and presents 'to our knowledge, the first agent-empowerment framework aligned with the full cyber offense-defense lifecycle.' No search strategy, databases queried, keywords, inclusion/exclusion criteria, screening process, or date range is described anywhere in the text. This omission leaves the completeness, representativeness, and novelty assertions unverifiable, even though the central contribution rests directly on them.
- [Taxonomy / Framework sections] Taxonomy and framework sections: The taxonomy organized by threat sources and the agent-empowerment framework are presented as novel syntheses. Without an explicit account of how prior work was identified and filtered, it is impossible to determine whether relevant literature on LLM-agent attack surfaces, red-teaming frameworks, or cyber-lifecycle automation was omitted, undermining the claim that these structures accurately organize the field.
minor comments (1)
- [Abstract] The abstract and introduction repeat the phrase 'to our knowledge' for the framework claim; a single, precise statement of novelty would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The absence of a documented search protocol is a valid concern that affects the verifiability of our comprehensiveness and novelty claims. We will add a dedicated methodology section in the revised manuscript to address both major comments.
Point-by-point responses
- Referee: [Abstract / Introduction] Abstract and Introduction: The manuscript asserts that it provides a 'comprehensive overview,' is 'the first to highlight a positive feedback synergy,' and presents 'to our knowledge, the first agent-empowerment framework aligned with the full cyber offense-defense lifecycle.' No search strategy, databases queried, keywords, inclusion/exclusion criteria, screening process, or date range is described anywhere in the text. This omission leaves the completeness, representativeness, and novelty assertions unverifiable, even though the central contribution rests directly on them.
Authors: We acknowledge that the current manuscript does not include an explicit description of the literature search process. In the revision we will insert a new 'Survey Methodology' subsection (placed after the Introduction) that details the databases queried (arXiv, Google Scholar, IEEE Xplore, ACM Digital Library), the keyword combinations and Boolean strings used separately for self-security and empowered-cybersecurity topics, the publication date range (primarily 2022 onward), inclusion/exclusion criteria (relevance to LLM agents, peer-reviewed or pre-print status, exclusion of non-technical works), and the two-stage screening process. This addition will make the coverage claims verifiable while preserving the 'to our knowledge' qualifier on novelty.
Revision: yes
- Referee: [Taxonomy / Framework sections] Taxonomy and framework sections: The taxonomy organized by threat sources and the agent-empowerment framework are presented as novel syntheses. Without an explicit account of how prior work was identified and filtered, it is impossible to determine whether relevant literature on LLM-agent attack surfaces, red-teaming frameworks, or cyber-lifecycle automation was omitted, undermining the claim that these structures accurately organize the field.
Authors: We agree that transparency regarding literature selection is required to substantiate the taxonomy and framework. The new methodology section will explicitly map the search results onto the threat-source taxonomy categories and the offense-defense lifecycle stages of the empowerment framework, including the criteria used for categorization and any iterative refinement steps. We will also add a short limitations paragraph noting that, despite the broad search, rapidly emerging works may still be missed and that the structures reflect the literature available at the time of the survey.
Revision: yes
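The screening and mapping steps promised in the responses above could look roughly like the following. This is a minimal sketch under stated assumptions, not the authors' protocol: the rebuttal names a two-stage screening process and a date range of primarily 2022 onward but does not say what each stage checks, so the title screen, the keyword lists, the `Record` fields, and the two category labels here are all hypothetical. Records that pass screening but match no category are collected separately, since an unmapped body of work is exactly what would challenge the taxonomy.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical category keywords; the paper's actual taxonomy labels
# and search strings are not given in the rebuttal.
TAXONOMY_KEYWORDS = {
    "self-security": ("prompt injection", "jailbreak", "guardrail"),
    "empowered-cybersecurity": ("penetration testing", "threat detection"),
}


@dataclass(frozen=True)
class Record:
    title: str
    year: int
    abstract: str
    technical: bool = True  # assumed flag for the "non-technical" exclusion


def passes_title_screen(rec: Record, min_year: int = 2022) -> bool:
    """Assumed stage 1: date range, technical work, LLM/agent term in title."""
    title = rec.title.lower()
    return rec.year >= min_year and rec.technical and (
        "llm" in title or "agent" in title
    )


def categorize(rec: Record) -> Optional[str]:
    """Assumed stage 2: first category with a keyword in the abstract, else None."""
    text = rec.abstract.lower()
    for category, keywords in TAXONOMY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return None


def screen(records):
    """Return (category -> titles, titles that passed stage 1 but fit no category)."""
    mapped = {category: [] for category in TAXONOMY_KEYWORDS}
    unmapped = []
    for rec in records:
        if not passes_title_screen(rec):
            continue
        category = categorize(rec)
        if category is None:
            unmapped.append(rec.title)
        else:
            mapped[category].append(rec.title)
    return mapped, unmapped
```

Calling `screen(records)` on a list of `Record` objects returns the per-category titles and the unmapped remainder; reporting the size of that remainder alongside the taxonomy would give readers a direct handle on coverage.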
Circularity Check
No circularity: survey synthesis with external literature base
Full rationale
This is a literature survey paper with no derivation chain, equations, parameter fitting, or self-referential reductions. Claims of being 'first' to highlight synergy or present a framework are novelty assertions resting on the completeness of the surveyed external literature, not on any step that reduces by construction to the paper's own inputs or self-citations. No load-bearing self-citation chains, ansatzes, or renamings of known results are present. The work is self-contained as an organizational synthesis against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLM agents are rapidly being integrated into real-world systems with autonomy and tool-use capabilities that generate value while expanding the security attack surface.