PocketAgents: A Manifest-Driven Library of Autonomous Defense Agents
Pith reviewed 2026-05-22 09:15 UTC · model grok-4.3
The pith
A typed boundary around LLM agents makes defensive actions measurable, extensible, and attributable.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PocketAgents installs each autonomous defense agent as three data files: a manifest, a prompt, and a runtime context. The shared runtime gives the agent bounded telemetry access and accepts only typed reports whose requested action appears in the manifest. In eighteen closed-loop trials of a DarkSide-inspired attack on a small enterprise topology, thirteen trials produced validated network-block actions that contained the attack, four failed schema validation, and one produced a valid no-action decision.
What carries the argument
The manifest, which enumerates the permitted actions, together with schema validation on every agent output. Together they enforce the boundary that turns LLM decisions into measurable, attributable events.
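A minimal sketch of such a boundary, assuming a hypothetical manifest layout and report fields (the abstract does not show the paper's actual schema). The names `MANIFEST`, `REQUIRED_FIELDS`, `Decision`, and `validate_report`, as well as the action labels, are illustrative assumptions rather than PocketAgents identifiers.

```python
from dataclasses import dataclass

# Hypothetical manifest: field names and action labels are assumptions,
# not the schema used by PocketAgents.
MANIFEST = {
    "agent": "exfiltration",
    "allowed_actions": ["block_network", "no_action"],
}

# Hypothetical report schema: every field must be present with this type.
REQUIRED_FIELDS = {"agent": str, "action": str, "target": str, "rationale": str}


@dataclass(frozen=True)
class Decision:
    accepted: bool
    reason: str


def validate_report(report: object, manifest: dict) -> Decision:
    """Accept a report only if it is well typed and its action is in the manifest."""
    if not isinstance(report, dict):
        return Decision(False, "report is not an object")
    for name, expected in REQUIRED_FIELDS.items():
        if not isinstance(report.get(name), expected):
            return Decision(False, f"field '{name}' missing or not {expected.__name__}")
    if report["action"] not in manifest["allowed_actions"]:
        return Decision(False, f"action '{report['action']}' not in manifest")
    return Decision(True, f"permitted by manifest entry '{report['action']}'")
```

In this sketch every output ends in exactly one `Decision`, so a rejected report is recorded with its reason instead of being silently dropped.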
If this is right
- Defense successes and failures become countable because every output is either accepted as a listed action or rejected by schema check.
- New agents can be added by supplying new manifest, prompt, and context files without altering the underlying runtime.
- Each validated action can be traced to the specific manifest entry that permitted it.
- Schema failures are logged separately, creating a clear record of where the model did not meet the required format.
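The counting and tracing described above can be sketched as a small audit tally. The counts mirror the 18 trials reported in the abstract; the record fields and outcome labels are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical audit records. Counts come from the reported trials;
# field names and labels are assumptions, not the paper's log format.
records = (
    [{"outcome": "valid_block", "manifest_entry": "block_network"}] * 13
    + [{"outcome": "schema_fail", "manifest_entry": None}] * 4
    + [{"outcome": "valid_no_action", "manifest_entry": "no_action"}] * 1
)

tally = Counter(r["outcome"] for r in records)
total = len(records)
for outcome, count in tally.most_common():
    print(f"{outcome:<16} {count:>2}  {count / total:.1%}")

# Accepted outputs stay traceable to the manifest entry that permitted them;
# rejected outputs are kept in a separate list instead of being discarded.
accepted = [r for r in records if r["manifest_entry"] is not None]
rejected = [r for r in records if r["manifest_entry"] is None]
```

With these counts the three classes sum to 18, and the schema-failure share is 4/18, the 22% figure the referee report cites.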
Where Pith is reading between the lines
- The same manifest-plus-validation pattern could be reused for LLM agents that control non-security tasks such as automated configuration changes.
- Over time, shared manifests might allow different organizations to compare how well their agents perform on identical action lists.
- Extending the manifest to include notification or logging actions would let the library handle a wider range of defensive responses.
Load-bearing premise
That the language model reliably emits outputs that pass schema validation, and that the chosen testbed and attack scenario capture the essential difficulties of real defensive work.
What would settle it
A new set of trials in which most model outputs fail schema validation, or in which the agents fail to contain the attack once the network topology or attack sequence is changed, would show that the typed boundary does not reliably deliver measurable defense.
Original abstract
Connecting large language models (LLMs) to defensive enforcement requires more than asking a model whether an attack is happening. A defender must decide which model outputs may change the system state, which outputs must be rejected, and how failures should be recorded. We present PocketAgents, a manifest-driven library of autonomous defense agents. Each agent is installed as three data files: a manifest, a prompt, and a runtime context. The shared runtime gives the agent bounded telemetry access and accepts only typed reports whose requested action appears in the manifest. We implemented PocketAgents on top of a cyber arena (Perry), a cyber-deception testbed, and evaluated two agents, Command and Control and Exfiltration, in 18 closed-loop trials of a DarkSide-inspired attack on a small enterprise topology. Thirteen trials produced validated network-block actions and contained the attack; four failed schema validation; one produced a valid no-action decision. The experiments show that a typed boundary makes LLM-driven defense measurable, extensible, and attributable.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces PocketAgents, a manifest-driven library for autonomous LLM-based defense agents. Each agent is defined by three data files (manifest, prompt, runtime context) that enforce a typed boundary: the runtime provides bounded telemetry and accepts only schema-validated reports whose requested actions are explicitly listed in the manifest. The authors implement two agents (Command and Control, Exfiltration) on the Perry cyber arena and evaluate them in 18 closed-loop trials against a DarkSide-inspired attack on a small enterprise topology, reporting 13 validated network-block actions that contained the attack, 4 schema-validation failures, and 1 valid no-action decision. They conclude that the typed boundary renders LLM-driven defense measurable, extensible, and attributable.
Significance. If the central claim holds, the work offers a concrete mechanism for safely integrating LLMs into defensive enforcement by making actions auditable and attributable through explicit manifests and schema validation. The closed-loop trials on the Perry testbed constitute a strength, providing an end-to-end demonstration rather than isolated prompt evaluations. The approach could support extensible agent libraries if the reliability issues are resolved.
major comments (2)
- [Abstract] The central claim that the typed boundary 'makes LLM-driven defense measurable, extensible, and attributable' is load-bearing on reliable schema adherence, yet 4 of 18 trials (22%) failed schema validation with no accompanying error analysis, failure-mode breakdown, or discussion of how rejected outputs affect attribution and measurability.
- [Evaluation] Evaluation section (inferred from abstract description of 18 trials): The results lack baselines (e.g., rule-based or non-LLM agents), statistical details, or error analysis, leaving the 13 successful blocking actions without context on whether the manifest-driven approach outperforms simpler alternatives or how often the LLM would have produced unsafe actions without the boundary.
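To illustrate the kind of statistical detail the second comment asks for, one option is a Wilson score interval on the reported proportions. This choice of interval is an assumption made here for illustration; neither the paper nor the referee specifies a method. The function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


# Counts taken from the reported trials: 13 containments and 4 schema
# failures out of 18. The labels are illustrative.
for label, k in [("contained", 13), ("schema_fail", 4)]:
    low, high = wilson_interval(k, 18)
    print(f"{label:<12} {k}/18  interval approx. [{low:.2f}, {high:.2f}]")
```

With only 18 trials the interval is wide, which is one way to make the referee's concern about statistical context concrete.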
minor comments (2)
- [Abstract] The abstract would benefit from a brief statement of the manifest schema structure or example action types to clarify what 'typed reports' entail.
- [Abstract] Clarify whether the single 'valid no-action decision' was counted as a success for containment or treated separately in the success metric.
Simulated Author's Rebuttal
We thank the referee for the constructive review and for highlighting areas where additional analysis would improve the manuscript. We respond to each major comment below and indicate planned revisions.
- Referee: The central claim that the typed boundary 'makes LLM-driven defense measurable, extensible, and attributable' is load-bearing on reliable schema adherence, yet 4 of 18 trials (22%) failed schema validation with no accompanying error analysis, failure-mode breakdown, or discussion of how rejected outputs affect attribution and measurability.
  Authors: We agree that the manuscript would benefit from explicit discussion of the schema-validation failures. In the revised version we will add a short failure-mode subsection to the Evaluation section that enumerates the four cases (e.g., malformed JSON versus out-of-manifest action requests), shows the logged validation errors, and explains that rejected outputs remain fully attributable because they are recorded with the precise schema violation. This addition will directly support the claim that the typed boundary preserves measurability and attribution even when the LLM produces invalid reports.
  Revision: yes
- Referee: The results lack baselines (e.g., rule-based or non-LLM agents), statistical details, or error analysis, leaving the 13 successful blocking actions without context on whether the manifest-driven approach outperforms simpler alternatives or how often the LLM would have produced unsafe actions without the boundary.
  Authors: The present evaluation is intentionally scoped to an end-to-end demonstration of closed-loop behavior on the Perry testbed rather than a comparative study. We will insert a limitations paragraph that acknowledges the absence of rule-based baselines and statistical power calculations, and we will outline future work that could include such comparisons. We did not run the LLM without the manifest boundary because doing so would have risked executing unvetted actions inside the test environment; the design rationale for the boundary is therefore presented as a safety property rather than a quantified reduction in unsafe outputs.
  Revision: partial
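The failure-mode breakdown promised in the first response (malformed JSON versus out-of-manifest action requests) could be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: `ALLOWED_ACTIONS`, `classify_output`, the category labels, and the log record layout are all hypothetical.

```python
import json

# Hypothetical manifest entries; the real action names are not given in the abstract.
ALLOWED_ACTIONS = {"block_network", "no_action"}


def classify_output(raw: str) -> dict:
    """Return a log record naming the failure mode, or 'accepted'."""
    record = {"raw": raw, "category": "accepted", "detail": ""}
    try:
        report = json.loads(raw)
    except json.JSONDecodeError as err:
        record.update(category="malformed_json", detail=err.msg)
        return record
    if not isinstance(report, dict) or not isinstance(report.get("action"), str):
        record.update(category="schema_violation", detail="missing string field 'action'")
        return record
    if report["action"] not in ALLOWED_ACTIONS:
        record.update(category="out_of_manifest_action", detail=report["action"])
    return record
```

Keeping the raw output alongside the category means a rejected report stays attributable: the log shows what the model emitted and which check it failed.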
Circularity Check
No circularity; evaluation relies on independent external testbed trials
The paper presents PocketAgents as a manifest-driven library and reports results from 18 closed-loop trials on the Perry cyber arena using a DarkSide-inspired attack. Thirteen trials yielded validated network-block actions, four failed schema validation, and one produced a valid no-action decision. These outcomes are direct observations from the external testbed rather than any derivation, fitted parameter, or self-citation that reduces the claims of measurability, extensibility, and attributability to the inputs by construction. No equations, uniqueness theorems, or ansatzes are invoked; the central claim is supported by observable experimental data.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Large language models can be prompted to generate schema-valid typed reports that align with a provided manifest in defense contexts.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: Each agent is installed as three data files: a manifest, a prompt, and a runtime context. The shared runtime gives the agent bounded telemetry access and accepts only typed reports whose requested action appears in the manifest.
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: The boundary produces six outcome classes... valid block, schema fail, no action...
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.