arxiv: 2605.21694 · v1 · pith:I4ZS7GSTnew · submitted 2026-05-20 · 💻 cs.CR · cs.AI

PocketAgents: A Manifest-Driven Library of Autonomous Defense Agents

Pith reviewed 2026-05-22 09:15 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords autonomous defense agents · LLM-driven security · manifest-driven library · typed reports · schema validation · cyber defense · attack containment

The pith

A typed boundary around LLM agents makes defensive actions measurable, extensible, and attributable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a library in which each defense agent is defined by three data files: a manifest listing allowed actions, a prompt, and runtime context. The runtime limits what the agent can see and requires every output to be a typed report that matches an entry in the manifest. Experiments running two such agents against a simulated attack produced validated blocking actions in thirteen of eighteen trials, with four outputs rejected by schema checks and one valid decision to take no action. A reader would care because the structure converts open-ended model responses into trackable decisions that can be audited, extended, or improved without rewriting code.

Core claim

PocketAgents installs each autonomous defense agent as three data files: a manifest, a prompt, and a runtime context. The shared runtime gives the agent bounded telemetry access and accepts only typed reports whose requested action appears in the manifest. In eighteen closed-loop trials of a DarkSide-inspired attack on a small enterprise topology, thirteen trials produced validated network-block actions that contained the attack, four failed schema validation, and one produced a valid no-action decision.
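
The acceptance rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the manifest layout, the `Report` fields, and the action names `block_network` and `no_action` are all assumptions chosen to mirror the outcomes the paper reports.

```python
import json
from dataclasses import dataclass

# Hypothetical manifest for one agent; the field names are illustrative only.
MANIFEST = {
    "agent": "exfiltration",
    "allowed_actions": ["block_network", "no_action"],
}

# Hypothetical report schema: every field is required and must be a string.
REQUIRED_FIELDS = {"agent": str, "action": str, "target": str, "rationale": str}


@dataclass(frozen=True)
class Report:
    agent: str
    action: str
    target: str
    rationale: str


def validate_report(raw: str, manifest: dict) -> Report:
    """Parse raw model output into a typed Report, or raise ValueError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("report must be a JSON object")
    for name, typ in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), typ):
            raise ValueError(f"missing or mistyped field: {name}")
    extra = set(data) - set(REQUIRED_FIELDS)
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    if data["agent"] != manifest["agent"]:
        raise ValueError("report agent does not match manifest")
    # The manifest check: only listed actions may cross the boundary.
    if data["action"] not in manifest["allowed_actions"]:
        raise ValueError(f"action not in manifest: {data['action']}")
    return Report(**data)
```

In this sketch a report that requests an unlisted action is rejected in the same way as one that is not valid JSON, so nothing reaches enforcement unless it is both well-typed and listed in the manifest.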

What carries the argument

The argument rests on two mechanisms: the manifest, which enumerates permitted actions, and schema validation on every agent output. Together they enforce the boundary that turns LLM decisions into measurable and attributable events.

If this is right

  • Defense successes and failures become countable because every output is either accepted as a listed action or rejected by schema check.
  • New agents can be added by supplying new manifest, prompt, and context files without altering the underlying runtime.
  • Each validated action can be traced to the specific manifest entry that permitted it.
  • Schema failures are logged separately, creating a clear record of where the model did not meet the required format.
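
Because every output lands in exactly one bucket, the reported results reduce to simple arithmetic. The sketch below uses the counts stated in the paper (13, 4, and 1 out of 18); the bucket labels and the `summarize` helper are assumptions introduced here for illustration.

```python
# Outcome counts as reported for the 18 trials; the labels are our own.
REPORTED = {
    "validated_block": 13,
    "schema_rejected": 4,
    "valid_no_action": 1,
}


def summarize(counts: dict) -> dict:
    """Return each outcome's share of all trials, in percent (one decimal)."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


shares = summarize(REPORTED)
for label, pct in shares.items():
    print(f"{label}: {REPORTED[label]}/18 = {pct}%")
```

The shares come to about 72.2%, 22.2%, and 5.6%, which matches the 22% schema-failure figure quoted in the referee report.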

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same manifest-plus-validation pattern could be reused for LLM agents that control non-security tasks such as automated configuration changes.
  • Over time, shared manifests might allow different organizations to compare how well their agents perform on identical action lists.
  • Extending the manifest to include notification or logging actions would let the library handle a wider range of defensive responses.
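
The last extension can be pictured as a data-only change. The sketch below is hypothetical: the manifest layout and the action names `notify_analyst` and `log_event` are invented for illustration, and the paper does not describe such actions. It shows only that the permission check stays the same while the manifest text grows.

```python
import json

# Hypothetical manifest text as it might sit in a data file.
BASE_MANIFEST = (
    '{"agent": "command_and_control", '
    '"allowed_actions": ["block_network", "no_action"]}'
)


def is_permitted(action: str, manifest_text: str) -> bool:
    """Unchanging runtime check: membership in the manifest's action list."""
    return action in json.loads(manifest_text)["allowed_actions"]


def extend_manifest(manifest_text: str, new_actions: list) -> str:
    """Return new manifest text with extra actions appended, no duplicates."""
    manifest = json.loads(manifest_text)
    for action in new_actions:
        if action not in manifest["allowed_actions"]:
            manifest["allowed_actions"].append(action)
    return json.dumps(manifest)


extended = extend_manifest(BASE_MANIFEST, ["notify_analyst", "log_event"])
```

Here `is_permitted("notify_analyst", BASE_MANIFEST)` is false and `is_permitted("notify_analyst", extended)` is true, with no edit to `is_permitted` itself.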

Load-bearing premise

The claim assumes that the language model will reliably emit outputs that pass schema validation and that the chosen testbed and attack scenario capture the essential difficulties of real defensive work.

What would settle it

A new set of trials in which most model outputs fail schema validation or the agents fail to contain the attack when the network topology or attack sequence is changed would show that the typed boundary does not reliably deliver measurable defense.

Figures

Figures reproduced from arXiv: 2605.21694 by Ágney Lopes Roth Ferraz, Lourenço Alves Pereira Júnior, Sidnei Barbieri.

Figure 1: PocketAgents runtime. Data-only agents cross a typed boundary before any action is subject to enforcement.

Original abstract

Connecting large language models (LLMs) to defensive enforcement requires more than asking a model whether an attack is happening. A defender must decide which model outputs may change the system state, which outputs must be rejected, and how failures should be recorded. We present PocketAgents, a manifest-driven library of autonomous defense agents. Each agent is installed as three data files: a manifest, a prompt, and a runtime context. The shared runtime gives the agent bounded telemetry access and accepts only typed reports whose requested action appears in the manifest. We implemented PocketAgents on top of a cyber arena (Perry), a cyber-deception testbed, and evaluated two agents, Command and Control and Exfiltration, in 18 closed-loop trials of a DarkSide-inspired attack on a small enterprise topology. Thirteen trials produced validated network-block actions and contained the attack; four failed schema validation; one produced a valid no-action decision. The experiments show that a typed boundary makes LLM-driven defense measurable, extensible, and attributable.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces PocketAgents, a manifest-driven library for autonomous LLM-based defense agents. Each agent is defined by three data files (manifest, prompt, runtime context) that enforce a typed boundary: the runtime provides bounded telemetry and accepts only schema-validated reports whose requested actions are explicitly listed in the manifest. The authors implement two agents (Command and Control, Exfiltration) on the Perry cyber arena and evaluate them in 18 closed-loop trials against a DarkSide-inspired attack on a small enterprise topology, reporting 13 validated network-block actions that contained the attack, 4 schema-validation failures, and 1 valid no-action decision. They conclude that the typed boundary renders LLM-driven defense measurable, extensible, and attributable.

Significance. If the central claim holds, the work offers a concrete mechanism for safely integrating LLMs into defensive enforcement by making actions auditable and attributable through explicit manifests and schema validation. The closed-loop trials on the Perry testbed constitute a strength, providing an end-to-end demonstration rather than isolated prompt evaluations. The approach could support extensible agent libraries if the reliability issues are resolved.

major comments (2)
  1. [Abstract] Abstract: The central claim that the typed boundary 'makes LLM-driven defense measurable, extensible, and attributable' is load-bearing on reliable schema adherence, yet 4 of 18 trials (22%) failed schema validation with no accompanying error analysis, failure-mode breakdown, or discussion of how rejected outputs affect attribution and measurability.
  2. [Evaluation] Evaluation section (inferred from abstract description of 18 trials): The results lack baselines (e.g., rule-based or non-LLM agents), statistical details, or error analysis, leaving the 13 successful blocking actions without context on whether the manifest-driven approach outperforms simpler alternatives or how often the LLM would have produced unsafe actions without the boundary.
minor comments (2)
  1. [Abstract] The abstract would benefit from a brief statement of the manifest schema structure or example action types to clarify what 'typed reports' entail.
  2. [Abstract] Clarify whether the single 'valid no-action decision' was counted as a success for containment or treated separately in the success metric.
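
One way to supply the statistical detail the second major comment asks for is a confidence interval on the containment rate. The sketch below computes a Wilson score interval for 13 successes in 18 trials. The choice of the Wilson method and the 95% level (z = 1.96) are assumptions made here, not something the paper or the referee specifies.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Return (low, high) bounds of the Wilson score interval for a proportion."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


# Counts reported in the paper: 13 contained attacks out of 18 trials.
low, high = wilson_interval(13, 18)
print(f"containment rate 13/18, 95% Wilson interval: {low:.3f} to {high:.3f}")
```

With only 18 trials the interval is wide, on the order of 0.5 to 0.9, which is the quantitative form of the referee's concern about missing statistical context.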

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for highlighting areas where additional analysis would improve the manuscript. We respond to each major comment below and indicate planned revisions.

  1. Referee: The central claim that the typed boundary 'makes LLM-driven defense measurable, extensible, and attributable' is load-bearing on reliable schema adherence, yet 4 of 18 trials (22%) failed schema validation with no accompanying error analysis, failure-mode breakdown, or discussion of how rejected outputs affect attribution and measurability.

    Authors: We agree that the manuscript would benefit from explicit discussion of the schema-validation failures. In the revised version we will add a short failure-mode subsection to the Evaluation section that enumerates the four cases (e.g., malformed JSON versus out-of-manifest action requests), shows the logged validation errors, and explains that rejected outputs remain fully attributable because they are recorded with the precise schema violation. This addition will directly support the claim that the typed boundary preserves measurability and attribution even when the LLM produces invalid reports. revision: yes

  2. Referee: The results lack baselines (e.g., rule-based or non-LLM agents), statistical details, or error analysis, leaving the 13 successful blocking actions without context on whether the manifest-driven approach outperforms simpler alternatives or how often the LLM would have produced unsafe actions without the boundary.

    Authors: The present evaluation is intentionally scoped to an end-to-end demonstration of closed-loop behavior on the Perry testbed rather than a comparative study. We will insert a limitations paragraph that acknowledges the absence of rule-based baselines and statistical power calculations, and we will outline future work that could include such comparisons. We did not run the LLM without the manifest boundary because doing so would have risked executing unvetted actions inside the test environment; the design rationale for the boundary is therefore presented as a safety property rather than a quantified reduction in unsafe outputs. revision: partial
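
The failure-mode breakdown promised in the first response could look like the sketch below. It is an illustration under stated assumptions: the category labels, the `ALLOWED_ACTIONS` set, and the sample outputs are invented here and are not data from the paper.

```python
import json
from collections import Counter

# Hypothetical manifest entries for one agent.
ALLOWED_ACTIONS = {"block_network", "no_action"}


def classify(raw: str) -> str:
    """Return 'accepted' or a failure-mode label for one raw model output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return "malformed_json"
    if not isinstance(data, dict) or not isinstance(data.get("action"), str):
        return "missing_action_field"
    if data["action"] not in ALLOWED_ACTIONS:
        return "out_of_manifest_action"
    return "accepted"


# Invented sample outputs, used only to exercise each branch.
samples = [
    '{"action": "block_network", "target": "10.0.0.7"}',
    '{"action": "shutdown_host", "target": "10.0.0.7"}',
    "I think we should block the host",
    '{"target": "10.0.0.7"}',
]
breakdown = Counter(classify(s) for s in samples)
print(dict(breakdown))
```

Recording the label next to each rejected output is what would keep a rejection attributable: the log says not only that a report failed but which rule it violated.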

Circularity Check

0 steps flagged

No circularity; evaluation relies on independent external testbed trials

The paper presents PocketAgents as a manifest-driven library and reports results from 18 closed-loop trials on the Perry cyber arena using a DarkSide-inspired attack. Thirteen trials yielded validated network-block actions, four failed schema validation, and one produced a valid no-action decision. These outcomes are direct observations from the external testbed rather than any derivation, fitted parameter, or self-citation that reduces the claims of measurability, extensibility, and attributability to the inputs by construction. No equations, uniqueness theorems, or ansatzes are invoked; the central claim is supported by observable experimental data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that LLMs can be reliably constrained through manifests and prompts in a simulated environment; no free parameters or invented physical entities are introduced.

axioms (1)
  • Domain assumption: Large language models can be prompted to generate schema-valid typed reports that align with a provided manifest in defense contexts.
    The evaluation success rate depends on this LLM behavior without additional enforcement mechanisms beyond the prompt and runtime.


