PocketAgents: A Manifest-Driven Library of Autonomous Defense Agents

Ágney Lopes Roth Ferraz; Lourenço Alves Pereira Júnior; Sidnei Barbieri

arxiv: 2605.21694 · v1 · pith:I4ZS7GSTnew · submitted 2026-05-20 · 💻 cs.CR · cs.AI

Pith reviewed 2026-05-22 09:15 UTC · model grok-4.3

classification 💻 cs.CR cs.AI

keywords: autonomous defense agents · LLM-driven security · manifest-driven library · typed reports · schema validation · cyber defense · attack containment

The pith

A typed boundary around LLM agents makes defensive actions measurable, extensible, and attributable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a library in which each defense agent is defined by three data files: a manifest listing allowed actions, a prompt, and runtime context. The runtime limits what the agent can see and requires every output to be a typed report that matches an entry in the manifest. Experiments running two such agents against a simulated attack produced validated blocking actions in thirteen of eighteen trials, with four outputs rejected by schema checks and one valid decision to take no action. A reader would care because the structure converts open-ended model responses into trackable decisions that can be audited, extended, or improved without rewriting code.

Core claim

PocketAgents installs each autonomous defense agent as three data files: a manifest, a prompt, and a runtime context. The shared runtime gives the agent bounded telemetry access and accepts only typed reports whose requested action appears in the manifest. In eighteen closed-loop trials of a DarkSide-inspired attack on a small enterprise topology, thirteen trials produced validated network-block actions that contained the attack, four failed schema validation, and one produced a valid no-action decision.
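
The boundary described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the manifest layout, the field names (`agent`, `action`, `target`, `rationale`), and the action names `block_network` and `no_action` are all assumptions made for the example, since the summary does not show the real schema.

```python
import json
from dataclasses import dataclass

# Hypothetical manifest; the real PocketAgents manifest format is not shown here.
MANIFEST = {"agent": "exfiltration", "allowed_actions": ["block_network", "no_action"]}

EXPECTED_FIELDS = {"agent", "action", "target", "rationale"}  # assumed report schema


@dataclass(frozen=True)
class Report:
    """A typed report: the only shape of output the runtime will accept."""
    agent: str
    action: str
    target: str
    rationale: str


def validate(raw: str, manifest: dict) -> Report:
    """Return a typed Report, or raise ValueError if the output is rejected."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("report must be a JSON object")
    if set(data) != EXPECTED_FIELDS:
        raise ValueError(f"unexpected or missing fields: {sorted(set(data) ^ EXPECTED_FIELDS)}")
    if not all(isinstance(value, str) for value in data.values()):
        raise ValueError("all fields must be strings")
    # The requested action must appear in the manifest, otherwise it is rejected.
    if data["action"] not in manifest["allowed_actions"]:
        raise ValueError(f"action not in manifest: {data['action']}")
    return Report(**data)
```

Under these assumptions, a well-formed report requesting `block_network` comes back as a `Report`, while a request for an unlisted action or a non-JSON reply raises `ValueError` before anything reaches enforcement.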

What carries the argument

The manifest, which enumerates the permitted actions, together with schema validation on every agent output. The two enforce the boundary that turns LLM decisions into measurable and attributable events.

If this is right

Defense successes and failures become countable because every output is either accepted as a listed action or rejected by schema check.
New agents can be added by supplying new manifest, prompt, and context files without altering the underlying runtime.
Each validated action can be traced to the specific manifest entry that permitted it.
Schema failures are logged separately, creating a clear record of where the model did not meet the required format.
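
Because every output lands in exactly one bucket, the reported results reduce to a simple tally. The sketch below uses the counts reported for the 18 trials; the outcome labels are illustrative names chosen for this example, not identifiers from the paper.

```python
from collections import Counter

# Counts are those reported for the 18 trials; the labels are assumed names.
outcomes = ["valid_block"] * 13 + ["schema_fail"] * 4 + ["valid_no_action"] * 1

tally = Counter(outcomes)
total = sum(tally.values())

# Print each outcome class with its share of all trials.
for label, count in tally.most_common():
    print(f"{label:16s} {count:2d}/{total} ({count / total:.1%})")
```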

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same manifest-plus-validation pattern could be reused for LLM agents that control non-security tasks such as automated configuration changes.
Over time, shared manifests might allow different organizations to compare how well their agents perform on identical action lists.
Extending the manifest to include notification or logging actions would let the library handle a wider range of defensive responses.

Load-bearing premise

That the language model will reliably emit outputs that pass schema validation, and that the chosen testbed and attack scenario capture the essential difficulties of real defensive work.

What would settle it

A new set of trials in which most model outputs fail schema validation, or in which the agents fail to contain the attack once the network topology or attack sequence is changed, would show that the typed boundary does not reliably deliver measurable defense.

Figures

Figures reproduced from arXiv: 2605.21694 by Ágney Lopes Roth Ferraz, Lourenço Alves Pereira Júnior, Sidnei Barbieri.

**Figure 1.** PocketAgents runtime. Data-only agents cross a typed boundary before any action is subject to enforcement.

Original abstract

Connecting large language models (LLMs) to defensive enforcement requires more than asking a model whether an attack is happening. A defender must decide which model outputs may change the system state, which outputs must be rejected, and how failures should be recorded. We present PocketAgents, a manifest-driven library of autonomous defense agents. Each agent is installed as three data files: a manifest, a prompt, and a runtime context. The shared runtime gives the agent bounded telemetry access and accepts only typed reports whose requested action appears in the manifest. We implemented PocketAgents on top of a cyber arena (Perry), a cyber-deception testbed, and evaluated two agents, Command and Control and Exfiltration, in 18 closed-loop trials of a DarkSide-inspired attack on a small enterprise topology. Thirteen trials produced validated network-block actions and contained the attack; four failed schema validation; one produced a valid no-action decision. The experiments show that a typed boundary makes LLM-driven defense measurable, extensible, and attributable.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces PocketAgents, a manifest-driven library for autonomous LLM-based defense agents. Each agent is defined by three data files (manifest, prompt, runtime context) that enforce a typed boundary: the runtime provides bounded telemetry and accepts only schema-validated reports whose requested actions are explicitly listed in the manifest. The authors implement two agents (Command and Control, Exfiltration) on the Perry cyber arena and evaluate them in 18 closed-loop trials against a DarkSide-inspired attack on a small enterprise topology, reporting 13 validated network-block actions that contained the attack, 4 schema-validation failures, and 1 valid no-action decision. They conclude that the typed boundary renders LLM-driven defense measurable, extensible, and attributable.

Significance. If the central claim holds, the work offers a concrete mechanism for safely integrating LLMs into defensive enforcement by making actions auditable and attributable through explicit manifests and schema validation. The closed-loop trials on the Perry testbed constitute a strength, providing an end-to-end demonstration rather than isolated prompt evaluations. The approach could support extensible agent libraries if the reliability issues are resolved.

major comments (2)

[Abstract] Abstract: The central claim that the typed boundary 'makes LLM-driven defense measurable, extensible, and attributable' is load-bearing on reliable schema adherence, yet 4 of 18 trials (22%) failed schema validation with no accompanying error analysis, failure-mode breakdown, or discussion of how rejected outputs affect attribution and measurability.
[Evaluation] Evaluation section (inferred from abstract description of 18 trials): The results lack baselines (e.g., rule-based or non-LLM agents), statistical details, or error analysis, leaving the 13 successful blocking actions without context on whether the manifest-driven approach outperforms simpler alternatives or how often the LLM would have produced unsafe actions without the boundary.
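
To make the request for statistical detail concrete, a Wilson score interval for the 13-of-18 containment rate can be computed as below. This is an illustration added to the report, not an analysis from the paper; the choice of the Wilson interval and the 95% level are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# 13 of the 18 reported trials ended in a validated, containing block action.
low, high = wilson_interval(13, 18)
print(f"containment rate 13/18 = {13 / 18:.1%}, 95% Wilson interval [{low:.1%}, {high:.1%}]")
```

With only 18 trials the interval is wide, which is the practical point behind asking for more statistical context.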

minor comments (2)

[Abstract] The abstract would benefit from a brief statement of the manifest schema structure or example action types to clarify what 'typed reports' entail.
[Abstract] Clarify whether the single 'valid no-action decision' was counted as a success for containment or treated separately in the success metric.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for highlighting areas where additional analysis would improve the manuscript. We respond to each major comment below and indicate planned revisions.

Referee: The central claim that the typed boundary 'makes LLM-driven defense measurable, extensible, and attributable' is load-bearing on reliable schema adherence, yet 4 of 18 trials (22%) failed schema validation with no accompanying error analysis, failure-mode breakdown, or discussion of how rejected outputs affect attribution and measurability.

Authors: We agree that the manuscript would benefit from explicit discussion of the schema-validation failures. In the revised version we will add a short failure-mode subsection to the Evaluation section that enumerates the four cases (e.g., malformed JSON versus out-of-manifest action requests), shows the logged validation errors, and explains that rejected outputs remain fully attributable because they are recorded with the precise schema violation. This addition will directly support the claim that the typed boundary preserves measurability and attribution even when the LLM produces invalid reports. revision: yes
Referee: The results lack baselines (e.g., rule-based or non-LLM agents), statistical details, or error analysis, leaving the 13 successful blocking actions without context on whether the manifest-driven approach outperforms simpler alternatives or how often the LLM would have produced unsafe actions without the boundary.

Authors: The present evaluation is intentionally scoped to an end-to-end demonstration of closed-loop behavior on the Perry testbed rather than a comparative study. We will insert a limitations paragraph that acknowledges the absence of rule-based baselines and statistical power calculations, and we will outline future work that could include such comparisons. We did not run the LLM without the manifest boundary because doing so would have risked executing unvetted actions inside the test environment; the design rationale for the boundary is therefore presented as a safety property rather than a quantified reduction in unsafe outputs. revision: partial
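
The failure-mode breakdown promised in the first response could take roughly the following shape. This is a hedged sketch, not the authors' logging code: the violation labels, the `ALLOWED` action names, and the record fields are hypothetical, chosen only to show how a rejected output can stay attributable by being stored with its specific violation.

```python
import json
from typing import Optional

ALLOWED = ("block_network", "no_action")  # hypothetical manifest entries


def rejection_reason(raw: str) -> Optional[str]:
    """Return None if the output is acceptable, else a failure-mode label."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return "malformed_json"
    action = data.get("action") if isinstance(data, dict) else None
    if not isinstance(action, str):
        return "missing_or_untyped_action"
    if action not in ALLOWED:
        return "out_of_manifest_action"
    return None


def audit_record(trial: int, agent: str, raw: str) -> dict:
    """Build one log entry; rejected outputs keep their raw text and violation."""
    reason = rejection_reason(raw)
    return {
        "trial": trial,
        "agent": agent,
        "accepted": reason is None,
        "violation": reason,
        "raw_output": raw,
    }
```

Grouping such records by `violation` would give the per-failure-mode counts the referee asks for.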

Circularity Check

0 steps flagged

No circularity; evaluation relies on independent external testbed trials

The paper presents PocketAgents as a manifest-driven library and reports results from 18 closed-loop trials on the Perry cyber arena using a DarkSide-inspired attack. Thirteen trials yielded validated network-block actions, four failed schema validation, and one produced a valid no-action decision. These outcomes are direct observations from the external testbed rather than any derivation, fitted parameter, or self-citation that reduces the claims of measurability, extensibility, and attributability to the inputs by construction. No equations, uniqueness theorems, or ansatzes are invoked; the central claim is supported by observable experimental data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that LLMs can be reliably constrained through manifests and prompts in a simulated environment; no free parameters or invented physical entities are introduced.

axioms (1)

domain assumption Large language models can be prompted to generate schema-valid typed reports that align with a provided manifest in defense contexts.
The evaluation success rate depends on this LLM behavior without additional enforcement mechanisms beyond the prompt and runtime.

pith-pipeline@v0.9.0 · 5713 in / 1339 out tokens · 44213 ms · 2026-05-22T09:15:53.429163+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "Each agent is installed as three data files: a manifest, a prompt, and a runtime context. The shared runtime gives the agent bounded telemetry access and accepts only typed reports whose requested action appears in the manifest."

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear
Paper passage: "The boundary produces six outcome classes... valid block, schema fail, no action..."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages · 2 internal anchors

[1] B. A. Alahmadi, L. Axon, I. Martinovic, “99% false positives: A qualitative study of SOC analysts’ perspectives on security alarms,” in Proceedings of the 31st USENIX Security Symposium (USENIX Security). USENIX Association, 2022, pp. 2783–2800

[2] L. Yang, Z. Chen, C. Wang, Z. Zhang, S. Booma, P. Cao, C. Adam, A. Withers, Z. Kalbarczyk, R. K. Iyer, G. Wang, “True attacks, attack attempts, or benign triggers? an empirical measurement of network alerts in a security operations center,” in Proceedings of the 33rd USENIX Security Symposium (USENIX Security). USENIX Association, 2024, pp. 1525–1542

[3] F. B. Kokulu, A. Soneji, T. Bao, Y. Shoshitaishvili, Z. Zhao, A. Doupé, G.-J. Ahn, “Matched and mismatched SOCs: A qualitative study on security operations center issues,” in Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (CCS). ACM, 2019, pp. 1955–1970

[4] D. Schlette, P. Empl, M. Caselli, T. Schreck, G. Pernul, “Do you play it by the books? a study on incident response playbooks,” in Proceedings of the 2024 IEEE Symposium on Security and Privacy (SP). IEEE, 2024, pp. 3625–3643

[5] Majority Staff Report, 115th Congress, “The Equifax data breach,” U.S. House of Representatives, Committee on Oversight and Government Reform, Tech. Rep., 2018

[6] Cybersecurity and Infrastructure Security Agency (CISA) and Federal Bureau of Investigation (FBI), “DarkSide ransomware: Best practices for preventing business disruption from ransomware attacks,” U.S. Department of Homeland Security and U.S. Department of Justice, Tech. Rep. AA21-131A, 2021

[7] B. Singer, Y. Saquib, L. Bauer, V. Sekar, “Perry: A high-level framework for accelerating cyber deception experimentation,” in 2025 28th International Symposium on Research in Attacks, Intrusions and Defenses (RAID), 2025, pp. 158–173

[8] B. E. Strom, A. Applebaum, D. P. Miller, K. C. Nickels, A. G. Pennington, C. B. Thomas, “MITRE ATT&CK: Design and philosophy,” The MITRE Corporation, Tech. Rep. MP180360R1, 2020

[9] M. Casado, T. Garfinkel, A. Akella, M. J. Freedman, D. Boneh, N. McKeown, S. Shenker, “SANE: A protection architecture for enterprise networks,” in Proceedings of the 15th USENIX Security Symposium, 2006

[10] H. Kim, J. Reich, A. Gupta, M. Shahbaz, N. Feamster, R. Clark, “Kinetic: Verifiable dynamic network control,” in Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2015

[11] T. Yu, S. K. Fayaz, M. J. Collier, V. Sekar, S. Seshan, “PSI: Precise security instrumentation for enterprise networks,” in Proceedings of the 24th Annual Network and Distributed System Security Symposium (NDSS), 2017

[12] X. Han, T. Pasquier, A. Bates, J. Mickens, M. Seltzer, “UNICORN: Runtime provenance-based detector for advanced persistent threats,” in Proceedings of the 27th Annual Network and Distributed System Security Symposium (NDSS). Internet Society, 2020

[13] W. U. Hassan, S. Guo, D. Li, Z. Chen, K. Jee, Z. Li, A. Bates, “NoDoze: Combatting threat alert fatigue with automated provenance triage,” in Proceedings of the 26th Annual Network and Distributed System Security Symposium (NDSS). Internet Society, 2019

[14] M. Vermeer, N. Kadenko, C. Gañán, M. van Eeten, S. Parkin, “Alert alchemy: SOC workflows and decisions in the management of NIDS rules,” in Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (CCS). ACM, 2023

[15] S. Barbieri, L. V. d. Meneses, Á. L. Roth Ferraz, L. A. Pereira Júnior, “SOCpilot: Verifying policy compliance for LLM-assisted incident response,” arXiv preprint arXiv:2605.05501, 2026

[16] G. Deng, Y. Liu, V. Mayoral-Vilches, P. Liu, Y. Li, Y. Xu, T. Zhang, Y. Liu, M. Pinzger, S. Rass, “PentestGPT: Evaluating and harnessing large language models for automated penetration testing,” in Proceedings of the 33rd USENIX Security Symposium (USENIX Security), 2024

[17] J. Xu, J. W. Stokes, G. McDonald, X. Bai, D. Marshall, S. Wang, A. Swaminathan, Z. Li, “AUTOATTACKER: A large language model guided system to implement automatic cyber-attacks,” arXiv preprint arXiv:2403.01038, 2024

[18] B. Singer, K. Lucas, L. Adiga, M. Jain, L. Bauer, V. Sekar, “On the feasibility of using LLMs to autonomously execute multi-host network attacks,” arXiv preprint arXiv:2501.16466, 2025

[19] H. Wang, C. M. Poskitt, J. Sun, “AgentSpec: Customizable runtime enforcement for safe and reliable LLM agents,” in Proceedings of the 2026 IEEE/ACM 48th International Conference on Software Engineering (ICSE). ACM, 2026

[20] Y. Wu, F. Roesner, T. Kohno, N. Zhang, U. Iqbal, “IsolateGPT: An execution isolation architecture for LLM-based agentic systems,” in Proceedings of the 32nd Annual Network and Distributed System Security Symposium (NDSS). Internet Society, 2025

[21] T. Shi, J. He, Z. Wang, H. Li, L. Wu, W. Guo, D. Song, “Progent: Securing AI agents with privilege control,” arXiv preprint arXiv:2504.11703, 2025, UC Berkeley

[22] S. Rabanser, S. Kapoor, P. Kirgis, K. Liu, S. Utpala, A. Narayanan, “Towards a science of AI agent reliability,” arXiv preprint arXiv:2602.16666, 2026, Princeton University

[23] Y. Cheng, O. Bajaber, S. A. Tsegai, D. Song, P. Gao, “CTINexus: Automatic cyber threat intelligence knowledge graph construction using large language models,” in Proceedings of the 2025 IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, 2025

[24] R. Meng, M. Mirchev, M. Böhme, A. Roychoudhury, “Large language model guided protocol fuzzing,” in Proceedings of the 31st Annual Network and Distributed System Security Symposium (NDSS). Internet Society, 2024

[25] Y. Kim, S. Shin, H. Kim, J. Yoon, “Logs in, patches out: Automated vulnerability repair via tree-of-thought LLM analysis,” in Proceedings of the 34th USENIX Security Symposium (USENIX Security). USENIX Association, 2025

[26] D. Ayzenshteyn, R. Weiss, Y. Mirsky, “Cloak, honey, trap: Proactive defenses against LLM agents,” in Proceedings of the 34th USENIX Security Symposium (USENIX Security). USENIX Association, 2025
