pith. machine review for the scientific record. sign in

arxiv: 2604.26274 · v1 · submitted 2026-04-29 · 💻 cs.CR · cs.AI

Recognition: unknown

Enforcing Benign Trajectories: A Behavioral Firewall for Structured-Workflow AI Agents

Authors on Pith no claims yet

Pith reviewed 2026-05-07 13:11 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords behavioral firewallAI agentstool callsanomaly detectionstructured workflowsparameter boundsattack mitigationtelemetry
0
0 comments X

The pith

Compiling verified benign tool-call data into a parameterized automaton lets a runtime gateway enforce safe sequences and parameter bounds for structured AI agents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to record only approved tool-use patterns from safe runs of an AI agent and turn those patterns into a compact model of allowed behavior. At runtime the model checks each new tool call to ensure it follows a permitted sequence, stays in the right context, and uses parameters within pre-set limits. If this holds, many attacks that require stepping outside normal patterns get stopped before they reach sensitive tools. A reader would care because agents often work in environments where one bad tool call can leak data or cause harm, yet the checks add almost no delay once the model exists. The heavy work happens offline, leaving only a fast lookup during operation.

Core claim

By compiling verified benign tool-call telemetry into a parameterized deterministic finite automaton that encodes permitted tool sequences, sequential contexts, and parameter bounds, the system allows a lightweight gateway to enforce these boundaries via constant-time state transitions, thereby collapsing the attack surface for structured-workflow agents while introducing minimal latency.

What carries the argument

A parameterized deterministic finite automaton built from benign telemetry that records allowed tool sequences, contexts, and bounds for fast runtime enforcement.

If this is right

  • Within structured workflows the fraction of successful attacks falls to 2.2 percent, compared with 12.8 percent for a stateless scanner.
  • Multi-step and context-sequential attacks reach zero success in the tested structured settings.
  • Of one thousand spliced exfiltration attempts, only 1.4 percent even reach a valid structural path, and none of those succeed after the final parameter checks.
  • Each tool call adds 2.2 milliseconds of latency while benign tasks still complete with a 2 percent failure rate.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same offline-modeling step could be applied to other agents whose normal operation can be profiled in advance.
  • The model would need periodic rebuilding whenever the set of legitimate tasks grows or changes.
  • Pairing the sequence checks with separate defenses for word-level substitutions would address the remaining 18 percent evasion rate noted in the evaluation.

Load-bearing premise

That every legitimate behavior and its exact parameter limits can be captured completely from a finite set of verified safe examples without leaving gaps that attackers can reach.

What would settle it

An attack that reaches a sensitive tool by using a sequence absent from the benign data yet still passes every parameter check would show the model does not fully close the attack surface.

read the original abstract

Structured-workflow agents driven by large language models execute tool calls against sensitive external environments. We propose \codename, a telemetry-driven behavioral anomaly detection firewall. Drawing on sequence-based intrusion detection, \codename\ compiles verified benign tool-call telemetry into a parameterized deterministic finite automaton (pDFA). The model defines permitted tool sequences, sequential contexts, and parameter bounds. At runtime, a lightweight gateway enforces these boundaries via an $O(1)$ state-transition structural lookup, shifting computationally expensive analysis entirely offline. Evaluated on the Agent Security Bench (ASB), \codename\ achieves a 5.6\% macro-averaged attack success rate (ASR) across five scenarios. Within three structured workflows, ASR drops to 2.2\%, outperforming Aegis, a state-of-the-art stateless scanner, at 12.8\%. \codename\ achieves 0\% ASR on multi-step and context-sequential attacks in structured settings. Furthermore, against 1,000 algorithmically spliced exfiltration payloads, only 1.4\% matched valid structural paths, all of which failed end-to-end string parameter guards (0 successes out of 14 surviving paths, 95\% CI [0\%, 23.2\%]). \codename\ introduces just 2.2~ms of per-call latency (a 3.7$\times$ speedup over \textsc{Aegis}) while maintaining a 2.0\% benign task failure rate (BTFR) on benign workloads. Modeling the behavioral trajectory effectively collapses the available attack surface, but unmaintained continuous parameter bounds remain vulnerable to synonym-substitution attacks (18\% evasion rate). Thus, exact-match whitelisting of sensitive parameters ultimately bears the final defensive load against execution.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces codename, a telemetry-driven behavioral firewall for structured-workflow LLM agents. It compiles verified benign tool-call sequences, contexts, and parameter bounds into a parameterized deterministic finite automaton (pDFA) that is enforced at runtime via O(1) state transitions after offline compilation. On the Agent Security Bench (ASB), it reports a 5.6% macro-averaged ASR across five scenarios (2.2% in three structured workflows), 0% ASR on multi-step and context-sequential attacks in structured settings, 1.4% structural matches on 1,000 spliced exfiltration payloads (all failing parameter guards), 2.2 ms per-call latency (3.7× faster than Aegis), and 2.0% BTFR, while noting that continuous parameter bounds remain vulnerable to synonym-substitution attacks (18% evasion) and that exact-match whitelisting bears the final load.

Significance. If the pDFA construction from benign telemetry proves sound and complete, the work offers a practical, low-overhead defense that shifts expensive analysis offline and demonstrably shrinks the attack surface for structured agents. The concrete latency and ASR numbers, the explicit comparison to Aegis, the 0% result on multi-step attacks, and the honest acknowledgment of parameter-bound limitations are strengths; the approach aligns with sequence-based IDS traditions while addressing agent-specific tool-call trajectories.

major comments (2)
  1. [§4] §4 (Evaluation) and abstract: the reported ASR figures (5.6% macro, 2.2% structured, 0% on multi-step/context-sequential) and the 1.4% structural-match rate on spliced payloads rest on the unstated claim that the collected benign telemetry is both exhaustive for legitimate behaviors/contexts and sufficient to set tight parameter bounds. No description is given of telemetry collection protocol, verification method, or how the pDFA states and bounds are derived from it; without this, the completeness assumption cannot be assessed and the low ASR numbers are not yet reproducible.
  2. [Abstract, §5] Abstract and §5 (Discussion): the paper states that 'unmaintained continuous parameter bounds remain vulnerable to synonym-substitution attacks (18% evasion rate)' and that 'exact-match whitelisting of sensitive parameters ultimately bears the final defensive load.' This directly qualifies the headline 0% ASR claim on the 14 surviving structural paths; the evaluation therefore does not demonstrate end-to-end robustness of the pDFA+guards combination beyond the specific test distribution, which is load-bearing for the central security claim.
minor comments (2)
  1. [Abstract] The 95% CI reported for 0/14 successes ([0%, 23.2%]) is correctly wide; the text should explicitly note that this interval does not strongly support a zero-success claim and should be discussed in the context of the small surviving-path count.
  2. [§3] Notation for the pDFA (states, transition function, parameterization of bounds) is introduced but not formalized with equations or pseudocode; adding a compact definition would improve clarity without lengthening the paper.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and constructive comments on our paper. We address each of the major comments point by point below, providing clarifications and indicating revisions where necessary to enhance the manuscript's reproducibility and clarity.

read point-by-point responses
  1. Referee: [§4] §4 (Evaluation) and abstract: the reported ASR figures (5.6% macro, 2.2% structured, 0% on multi-step/context-sequential) and the 1.4% structural-match rate on spliced payloads rest on the unstated claim that the collected benign telemetry is both exhaustive for legitimate behaviors/contexts and sufficient to set tight parameter bounds. No description is given of telemetry collection protocol, verification method, or how the pDFA states and bounds are derived from it; without this, the completeness assumption cannot be assessed and the low ASR numbers are not yet reproducible.

    Authors: We agree that additional details on the telemetry collection are necessary for full reproducibility and to allow assessment of the completeness of the benign traces. The current manuscript focuses on the pDFA construction and runtime enforcement but omits a detailed protocol description. In the revised version, we will add a dedicated subsection under §3 (System Design) or §4 (Evaluation) that specifies: (1) the telemetry collection protocol, including the environments and tasks used to gather benign tool-call sequences; (2) the verification method employed to confirm that traces represent only legitimate behaviors; and (3) the exact procedure for deriving pDFA states, transitions, and parameter bounds from the collected data. This will strengthen the claims regarding the low ASR figures. revision: yes

  2. Referee: [Abstract, §5] Abstract and §5 (Discussion): the paper states that 'unmaintained continuous parameter bounds remain vulnerable to synonym-substitution attacks (18% evasion rate)' and that 'exact-match whitelisting of sensitive parameters ultimately bears the final defensive load.' This directly qualifies the headline 0% ASR claim on the 14 surviving structural paths; the evaluation therefore does not demonstrate end-to-end robustness of the pDFA+guards combination beyond the specific test distribution, which is load-bearing for the central security claim.

    Authors: The manuscript already explicitly qualifies the 0% ASR result by noting the limitations of continuous parameter bounds and the reliance on exact-match whitelisting for sensitive parameters. The 0% figure applies to the 14 structural paths that survived the pDFA check but were then blocked by the parameter guards in our specific test set of spliced payloads. The synonym-substitution evaluation is presented separately to highlight a known limitation of unmaintained bounds, which is why we emphasize the need for exact-match whitelisting. We believe this does not undermine the central claim for the tested distribution, as the full system (pDFA + guards) achieved 0 successes on those paths. However, we acknowledge that broader robustness would require maintained bounds or additional mechanisms. We will consider adding a brief clarification in §5 to emphasize that the 0% result is under the full enforcement model. revision: partial

Circularity Check

0 steps flagged

No circularity in pDFA compilation or enforcement claims

full rationale

The paper describes an offline compilation of verified benign tool-call telemetry into a parameterized deterministic finite automaton (pDFA) followed by runtime O(1) structural enforcement. No equations, fitted parameters presented as predictions, self-definitional loops, or load-bearing self-citations appear in the derivation. The reported ASR, BTFR, and evasion rates are empirical measurements on the Agent Security Bench and spliced payloads; they do not reduce to the inputs by construction. The completeness assumption for the telemetry set is stated explicitly as a modeling premise rather than derived from the results themselves.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides insufficient detail to identify concrete free parameters, axioms, or invented entities; the pDFA construction and parameter bounds are described at a conceptual level without explicit fitting procedures or background assumptions listed.

pith-pipeline@v0.9.0 · 5615 in / 1201 out tokens · 86203 ms · 2026-05-07T13:11:11.296015+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

24 extracted references · 11 canonical work pages · 5 internal anchors

  1. [1]

    npj Digital Medicine9(51) (2026)

    Tian, J., Fard, P., Cagan, C., Rezaii, N., Rocha, R.B., Wang, L., Junior, V.M., Blacker, D., Haas, J.S., Patel, C.J., Murphy, S.N., Moura, L., Estiri, H.,: An autonomous agentic workflow for clinical detection of cognitive concerns using large language models. npj Digital Medicine9(51) (2026)

  2. [2]

    Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions

    Hou, X., Zhao, Y., Wang, S., Wang, H.: Model context protocol (MCP): Landscape, security threats, and future research directions. arXiv preprint arXiv:2503.23278 (2025) 27

  3. [3]

    Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

    Chhabra, A., Datta, S., Nahin, S.K., Mohapatra, P.: Agentic AI security: Threats, defenses, evaluation, and open challenges. arXiv preprint arXiv:2510.23883 (2025)

  4. [4]

    ICT Express (2025)

    Ferrag, M.A., Tihanyi, N., Hamouda, D., Maglaras, L., Lakas, A., Debbah, M.: From prompt injections to protocol exploits: Threats in LLM-powered AI agents workflows. ICT Express (2025)

  5. [5]

    Aegis: No tool call left unchecked – a pre-execution firewall and audit layer for ai agents,

    Yuan, A., Su, Z., Zhao, Y.: AEGIS: No tool call left unchecked — a pre-execution firewall and audit layer for AI agents. arXiv preprint arXiv:2603.12621 (2026)

  6. [6]

    Firewalls to secure dynamic llm agentic networks,

    Abdelnabi, S., Gomaa, A., Bagdasarian, E., Kristensson, P.O., Shokri, R.: Fire- walls to secure dynamic LLM agentic networks. arXiv preprint arXiv:2502.01822 (2025)

  7. [7]

    Applied Sciences16(1), 85 (2026)

    Podpora, M., Baranowski, M., Chopcian, M., Kwasniewicz, L., Radziewicz, W.: LLM firewall using validator agent for prevention against prompt injection attacks. Applied Sciences16(1), 85 (2026)

  8. [8]

    In: Proceedings of the 1996 IEEE Symposium on Security and Privacy, pp

    Forrest, S., Hofmeyr, S.A., Somayaji, A., Longstaff, T.A.: A sense of self for Unix processes. In: Proceedings of the 1996 IEEE Symposium on Security and Privacy, pp. 120–128. IEEE, Los Alamitos, CA (1996)

  9. [9]

    Data Science and Engineering (2025)

    Xu, W., Huang, C., Gao, S., Shang, S.,: LLM-based agents for tool learning: A survey. Data Science and Engineering (2025)

  10. [10]

    PromptArmor: Simple yet Effective Prompt Injection Defenses.arXiv preprint arXiv:2507.15219, 2025.https: //arxiv.org/abs/2507.15219

    Shi, T., Zhu, K., Wang, Z., Jia, Y., Cai, W., Liang, W., Wang, H., Alzahrani, H., Lu, J., Kawaguchi, K., Alomair, B., Zhao, X., Wang, W.Y., Gong, N., Guo, W., Song, D.: PromptArmor: Simple yet effective prompt injection defenses. arXiv preprint arXiv:2507.15219 (2025)

  11. [11]

    Neural Networks (2025)

    Liao, Z., Chen, K., Lin, Y., Li, K., Liu, Y., et al.: Attack and defense techniques in large language models: A survey and new perspectives. Neural Networks (2025)

  12. [12]

    Securing AI Agents with Information-Flow Control

    Costa, M., K¨ opf, B., Kolluri, A., Paverd, A., Russinovich, M., Salem, A., Tople, S., Wutschitz, L., Zanella-B´ eguelin, S.: Securing AI agents with information-flow control. arXiv preprint arXiv:2505.23643 (2025)

  13. [13]

    arXiv preprint arXiv:2603.05031 (2026)

    Uddin, M.S., Hajira, S.: AegisUI: Behavioral anomaly detection for structured user interface protocols in AI agent systems. arXiv preprint arXiv:2603.05031 (2026)

  14. [14]

    Technical Report MTR-2547

    Bell, D.E., LaPadula, L.J.: Secure computer systems: Mathematical foundations (1973). Technical Report MTR-2547

  15. [15]

    Communications of the ACM19(5), 236–243 (1976) 28

    Denning, D.E.: A lattice model of secure information flow. Communications of the ACM19(5), 236–243 (1976) 28

  16. [16]

    In: Proceedings of the 1999 IEEE Symposium on Security and Privacy, pp

    Warrender, C., Forrest, S., Pearlmutter, B.: Detecting intrusions using system calls: Alternative data models. In: Proceedings of the 1999 IEEE Symposium on Security and Privacy, pp. 133–145. IEEE, Los Alamitos, CA (1999)

  17. [17]

    arXiv preprint arXiv:2510.23487 (2025)

    Koohestani, R., Li, Z., Podkopaev, A., Izadi, M.: Are agents probabilistic automata? A trace-based, memory-constrained theory of agentic AI. arXiv preprint arXiv:2510.23487 (2025)

  18. [18]

    Journal of Logic and Algebraic Programming78(5), 293–303 (2009)

    Leucker, M., Schallhart, C.: A brief account of runtime verification. Journal of Logic and Algebraic Programming78(5), 293–303 (2009)

  19. [19]

    In: Proceedings of the 29th International Conference on Computer Aided Verification (CAV)

    D’Antoni, L., Veanes, M.: The power of symbolic automata and transducers. In: Proceedings of the 29th International Conference on Computer Aided Verification (CAV). Lecture Notes in Computer Science, vol. 10426, pp. 47–67. Springer, Cham (2017)

  20. [20]

    Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using Siamese BERT-networks (2019)

  21. [21]

    Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents

    Zhang, H., Huang, J., Mei, K., Yao, Y., Wang, Z., Zhan, C., Wang, H., Zhang, Y.: Agent security bench (ASB): Formalizing and benchmarking attacks and defenses in LLM-based agents. In: Proceedings of the International Conference on Learning Representations (ICLR) (2025).https://arxiv.org/abs/2410.02644

  22. [22]

    ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

    Qin, Y.,et al.: Toolllm: Facilitating large language models to master 16000+ real-world APIs. In: Proceedings of the International Conference on Learning Representations (ICLR) (2024).https://arxiv.org/abs/2307.16789

  23. [23]

    Advani, Trajectory guard – a lightweight, sequence-aware model for real-time anomaly detection in agentic AI, arXiv preprint arXiv:2601.00516 (2026)

    Advani, L.: Trajectory guard–a lightweight, sequence-aware model for real-time anomaly detection in agentic ai. arXiv preprint arXiv:2601.00516 (2026)

  24. [24]

    In: Proceedings of the AAAI Conference on Artificial Intelligence, vol

    Jin, D., Jin, Z., Zhou, J.T., Szolovits, P.: Is BERT really robust? a strong baseline for natural language attack on text classification and entailment. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 8018–8025 (2020) 29