
arxiv: 2606.28450 · v1 · pith:SD3ZMK4Gnew · submitted 2026-06-26 · 💻 cs.CR · cs.AI

LLM agents security duality: a comprehensive survey of self-security and empowered cybersecurity

Pith reviewed 2026-06-30 01:29 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords LLM agents · self-security · empowered cybersecurity · cybersecurity survey · threat taxonomy · agent-empowerment framework · offense-defense lifecycle

The pith

LLM agents create mutual reinforcement between their own security and their use to strengthen cybersecurity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper surveys threats to LLM agents along with mitigation strategies and examines how these agents can empower both offensive and defensive cybersecurity operations. It organizes the threats into a taxonomy by source and introduces an agent-empowerment framework that covers the complete offense-defense lifecycle. The central point is that these two domains exhibit a positive feedback synergy, where progress in securing agents aids their cybersecurity applications and vice versa, which supports development of more effective agent systems.

Core claim

By systematically surveying threats to LLM agents and their mitigations on one side and the application of agent capabilities across the full cyber offense-defense lifecycle on the other, the work identifies a positive feedback synergy between LLM agents self-security and empowered cybersecurity. It presents the first agent-empowerment framework aligned with that lifecycle and outlines limitations plus future research directions to advance both areas in tandem.

What carries the argument

The agent-empowerment framework that aligns LLM agent capabilities with the full cyber offense-defense lifecycle.

If this is right

  • Coordinated development of LLM agents self-security and agent empowered cybersecurity becomes feasible.
  • More capable and robust agent applications result from the synergy.
  • New insights emerge for advancing both self-security and empowered cybersecurity.
  • Current limitations are identified and promising directions for future research are outlined.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Integrated research approaches that treat agent protection and agent-assisted cyber operations as linked rather than separate tracks could accelerate progress.
  • Vulnerabilities discovered in agent self-security may directly affect the reliability of agent-based cyber defense systems.
  • Testing the framework on deployed LLM agents in operational environments would reveal whether the full lifecycle alignment holds in practice.

Load-bearing premise

The surveyed literature is comprehensive and representative, and the taxonomy by threat sources plus the agent-empowerment framework organize the field without major omissions or selection bias.

What would settle it

Discovery of a substantial body of LLM agent security research that cannot be fit into the proposed threat-source taxonomy or that falls outside the agent-empowerment framework's coverage of the offense-defense lifecycle.
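
The test described above can be sketched as a simple coverage check: label each paper in a corpus, then measure how many fit none of the taxonomy categories. This is a minimal sketch, not the paper's method. The category names in `TAXONOMY`, the `Paper` type, and the sample corpus are all hypothetical placeholders, since this page does not list the paper's actual category labels.

```python
from dataclasses import dataclass, field

# Placeholder category names (assumption): the paper's real taxonomy
# labels are not given on this page.
TAXONOMY = {"internal_source", "external_source"}


@dataclass
class Paper:
    title: str
    labels: set = field(default_factory=set)  # categories assigned by a reviewer


def unplaced(papers, taxonomy=TAXONOMY):
    """Return the papers whose labels share nothing with the taxonomy."""
    return [p for p in papers if not (p.labels & taxonomy)]


def unplaced_fraction(papers, taxonomy=TAXONOMY):
    """Fraction of the corpus that the taxonomy fails to place."""
    if not papers:
        return 0.0
    return len(unplaced(papers, taxonomy)) / len(papers)


# Hypothetical corpus, for illustration only.
corpus = [
    Paper("hypothetical paper A", {"internal_source"}),
    Paper("hypothetical paper B", {"external_source"}),
    Paper("hypothetical paper C", {"supply_chain"}),  # label outside the taxonomy
    Paper("hypothetical paper D"),                    # no label could be assigned
]

if __name__ == "__main__":
    for paper in unplaced(corpus):
        print("does not fit:", paper.title)
    print("unplaced fraction:", unplaced_fraction(corpus))
```

A large unplaced fraction on a representative corpus would count against the taxonomy. What threshold qualifies as "substantial" is a judgment call that the page leaves open.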

Original abstract

Large language model (LLM) agents are rapidly being integrated into real-world systems. Their autonomy and tool-use capabilities generate substantial value while simultaneously expanding the security attack surface. This survey provides a comprehensive overview of the opportunities and challenges of LLM agents in security, focusing on two core areas: (1) threats to LLM agents themselves and corresponding mitigation strategies (LLM agents self-security), and (2) the role of LLM agents in empowering the cybersecurity lifecycle across offense and defense (LLM agents empowered cybersecurity). We first examine the internal and external attack surfaces of agents, propose a taxonomy organized by threat sources, and analyze associated mitigations and evaluation frameworks. We then investigate how agent capabilities are applied in cybersecurity practice and present, to our knowledge, the first agent-empowerment framework aligned with the full cyber offense-defense lifecycle. By systematically surveying these two areas, we are the first to highlight a positive feedback synergy between LLM agents self-security and empowered cybersecurity, offering new insights for the advancement of both. We further identify current limitations and outline promising directions for future research. The insights provided aim to catalyze the coordinated development of LLM agents self-security and agent empowered cybersecurity, paving the way for more capable and robust agent applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper surveys security issues for LLM agents in two directions: (1) threats to LLM agents themselves along with mitigations (self-security), organized via a taxonomy by threat sources, and (2) the application of LLM agents to the full cybersecurity offense-defense lifecycle (empowered cybersecurity). It proposes an agent-empowerment framework, claims to be the first to identify a positive feedback synergy between the two areas, and outlines limitations and future directions.

Significance. A well-executed survey that rigorously documents literature coverage could provide a useful organizing lens and identify actionable synergies for LLM-agent security research. The proposed taxonomy and lifecycle-aligned framework would be valuable contributions if shown to be comprehensive and free of major omissions. However, the absence of any documented search protocol prevents assessment of whether the coverage supports the 'first' and 'comprehensive' claims.

major comments (2)
  1. [Abstract / Introduction] Abstract and Introduction: The manuscript asserts that it provides a 'comprehensive overview,' is 'the first to highlight a positive feedback synergy,' and presents 'to our knowledge, the first agent-empowerment framework aligned with the full cyber offense-defense lifecycle.' No search strategy, databases queried, keywords, inclusion/exclusion criteria, screening process, or date range is described anywhere in the text. This omission leaves the completeness, representativeness, and novelty assertions unverifiable, even though those assertions are directly load-bearing for the central contribution.
  2. [Taxonomy / Framework sections] Taxonomy and framework sections: The taxonomy organized by threat sources and the agent-empowerment framework are presented as novel syntheses. Without an explicit account of how prior work was identified and filtered, it is impossible to determine whether relevant literature on LLM-agent attack surfaces, red-teaming frameworks, or cyber-lifecycle automation was omitted, undermining the claim that these structures accurately organize the field.
minor comments (1)
  1. [Abstract] The abstract and introduction repeat the phrase 'to our knowledge' for the framework claim; a single, precise statement of novelty would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The absence of a documented search protocol is a valid concern that affects the verifiability of our comprehensiveness and novelty claims. We will add a dedicated methodology section in the revised manuscript to address both major comments.

Point-by-point responses
  1. Referee: [Abstract / Introduction] Abstract and Introduction: The manuscript asserts that it provides a 'comprehensive overview,' is 'the first to highlight a positive feedback synergy,' and presents 'to our knowledge, the first agent-empowerment framework aligned with the full cyber offense-defense lifecycle.' No search strategy, databases queried, keywords, inclusion/exclusion criteria, screening process, or date range is described anywhere in the text. This omission leaves the completeness, representativeness, and novelty assertions unverifiable, even though those assertions are directly load-bearing for the central contribution.

    Authors: We acknowledge that the current manuscript does not include an explicit description of the literature search process. In the revision we will insert a new 'Survey Methodology' subsection (placed after the Introduction) that details the databases queried (arXiv, Google Scholar, IEEE Xplore, ACM Digital Library), the keyword combinations and Boolean strings used separately for self-security and empowered-cybersecurity topics, the publication date range (primarily 2022 onward), inclusion/exclusion criteria (relevance to LLM agents, peer-reviewed or pre-print status, exclusion of non-technical works), and the two-stage screening process. This addition will make the coverage claims verifiable while preserving the 'to our knowledge' qualifier on novelty. revision: yes

  2. Referee: [Taxonomy / Framework sections] Taxonomy and framework sections: The taxonomy organized by threat sources and the agent-empowerment framework are presented as novel syntheses. Without an explicit account of how prior work was identified and filtered, it is impossible to determine whether relevant literature on LLM-agent attack surfaces, red-teaming frameworks, or cyber-lifecycle automation was omitted, undermining the claim that these structures accurately organize the field.

    Authors: We agree that transparency regarding literature selection is required to substantiate the taxonomy and framework. The new methodology section will explicitly map the search results onto the threat-source taxonomy categories and the offense-defense lifecycle stages of the empowerment framework, including the criteria used for categorization and any iterative refinement steps. We will also add a short limitations paragraph noting that, despite the broad search, rapidly emerging works may still be missed and that the structures reflect the literature available at the time of the survey. revision: yes
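
The search protocol promised in the first response (Boolean keyword strings, a date range of primarily 2022 onward, and two-stage screening) could look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' procedure: the rebuttal is simulated and gives no concrete strings, so the search terms, the `Record` type, the helper names, and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


def boolean_query(*groups):
    """AND together groups of OR-ed, quoted search terms."""
    return " AND ".join(
        "(" + " OR ".join(f'"{term}"' for term in group) + ")" for group in groups
    )


@dataclass
class Record:
    title: str
    abstract: str
    year: int
    # Set by a human reviewer during full-text screening (stage 2).
    full_text_relevant: Optional[bool] = None


def stage_one(records, required_terms, min_year=2022):
    """Title/abstract screening: date range plus at least one keyword hit."""
    kept = []
    for rec in records:
        text = f"{rec.title} {rec.abstract}".lower()
        if rec.year >= min_year and any(t.lower() in text for t in required_terms):
            kept.append(rec)
    return kept


def stage_two(records):
    """Full-text screening: keep only records a reviewer marked relevant."""
    return [rec for rec in records if rec.full_text_relevant is True]


# Hypothetical search terms and records, for illustration only.
query = boolean_query(
    ["LLM agent", "language model agent"],
    ["prompt injection", "jailbreak"],
)
records = [
    Record("Prompt injection against an LLM agent", "attack study", 2024, True),
    Record("LLM agent tool use", "benchmark", 2021, True),              # too old
    Record("Network firewalls", "packet filtering", 2023, True),        # off topic
    Record("LLM agent survey notes", "non-technical commentary", 2023, False),
]

if __name__ == "__main__":
    print(query)
    shortlisted = stage_one(records, ["LLM agent"])
    included = stage_two(shortlisted)
    print(len(records), "found,", len(shortlisted), "shortlisted,", len(included), "included")
```

Reporting the counts at each stage, as the last line does, is what would let a reader audit the coverage claims. The substring match in `stage_one` is deliberately crude, and a real protocol would also record which database produced each hit.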

Circularity Check

0 steps flagged

No circularity: survey synthesis with external literature base

Full rationale

This is a literature survey paper with no derivation chain, equations, parameter fitting, or self-referential reductions. Claims of being 'first' to highlight synergy or present a framework are novelty assertions resting on the completeness of the surveyed external literature, not on any step that reduces by construction to the paper's own inputs or self-citations. No load-bearing self-citation chains, ansatzes, or renamings of known results are present. The work is self-contained as an organizational synthesis against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The survey rests on the existing body of LLM agent and cybersecurity literature without introducing fitted parameters or new entities; the foundational premise is a domain assumption about agent capabilities.

axioms (1)
  • domain assumption: LLM agents are rapidly being integrated into real-world systems with autonomy and tool-use capabilities that generate value while expanding the security attack surface.
    This premise is stated directly in the abstract and underpins both the self-security and empowered cybersecurity sections.

pith-pipeline@v0.9.1-grok · 5766 in / 1293 out tokens · 51950 ms · 2026-06-30T01:29:52.940178+00:00 · methodology

