arxiv: 2604.26274 · v1 · submitted 2026-04-29 · 💻 cs.CR · cs.AI

Recognition: unknown

Enforcing Benign Trajectories: A Behavioral Firewall for Structured-Workflow AI Agents

Hung Dang

Authors on Pith no claims yet

Pith reviewed 2026-05-07 13:11 UTC · model grok-4.3

classification 💻 cs.CR cs.AI

keywords behavioral firewallAI agentstool callsanomaly detectionstructured workflowsparameter boundsattack mitigationtelemetry

0 comments

The pith

Compiling verified benign tool-call data into a parameterized automaton lets a runtime gateway enforce safe sequences and parameter bounds for structured AI agents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to record only approved tool-use patterns from safe runs of an AI agent and turn those patterns into a compact model of allowed behavior. At runtime the model checks each new tool call to ensure it follows a permitted sequence, stays in the right context, and uses parameters within pre-set limits. If this holds, many attacks that require stepping outside normal patterns get stopped before they reach sensitive tools. A reader would care because agents often work in environments where one bad tool call can leak data or cause harm, yet the checks add almost no delay once the model exists. The heavy work happens offline, leaving only a fast lookup during operation.

Core claim

By compiling verified benign tool-call telemetry into a parameterized deterministic finite automaton that encodes permitted tool sequences, sequential contexts, and parameter bounds, the system allows a lightweight gateway to enforce these boundaries via constant-time state transitions, thereby collapsing the attack surface for structured-workflow agents while introducing minimal latency.

What carries the argument

A parameterized deterministic finite automaton built from benign telemetry that records allowed tool sequences, contexts, and bounds for fast runtime enforcement.

If this is right

Within structured workflows the fraction of successful attacks falls to 2.2 percent, compared with 12.8 percent for a stateless scanner.
Multi-step and context-sequential attacks reach zero success in the tested structured settings.
Of one thousand spliced exfiltration attempts, only 1.4 percent even reach a valid structural path, and none of those succeed after the final parameter checks.
Each tool call adds 2.2 milliseconds of latency while benign tasks still complete with a 2 percent failure rate.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same offline-modeling step could be applied to other agents whose normal operation can be profiled in advance.
The model would need periodic rebuilding whenever the set of legitimate tasks grows or changes.
Pairing the sequence checks with separate defenses for word-level substitutions would address the remaining 18 percent evasion rate noted in the evaluation.

Load-bearing premise

That every legitimate behavior and its exact parameter limits can be captured completely from a finite set of verified safe examples without leaving gaps that attackers can reach.

What would settle it

An attack that reaches a sensitive tool by using a sequence absent from the benign data yet still passes every parameter check would show the model does not fully close the attack surface.

read the original abstract

Structured-workflow agents driven by large language models execute tool calls against sensitive external environments. We propose \codename, a telemetry-driven behavioral anomaly detection firewall. Drawing on sequence-based intrusion detection, \codename\ compiles verified benign tool-call telemetry into a parameterized deterministic finite automaton (pDFA). The model defines permitted tool sequences, sequential contexts, and parameter bounds. At runtime, a lightweight gateway enforces these boundaries via an $O(1)$ state-transition structural lookup, shifting computationally expensive analysis entirely offline. Evaluated on the Agent Security Bench (ASB), \codename\ achieves a 5.6\% macro-averaged attack success rate (ASR) across five scenarios. Within three structured workflows, ASR drops to 2.2\%, outperforming Aegis, a state-of-the-art stateless scanner, at 12.8\%. \codename\ achieves 0\% ASR on multi-step and context-sequential attacks in structured settings. Furthermore, against 1,000 algorithmically spliced exfiltration payloads, only 1.4\% matched valid structural paths, all of which failed end-to-end string parameter guards (0 successes out of 14 surviving paths, 95\% CI [0\%, 23.2\%]). \codename\ introduces just 2.2~ms of per-call latency (a 3.7$\times$ speedup over \textsc{Aegis}) while maintaining a 2.0\% benign task failure rate (BTFR) on benign workloads. Modeling the behavioral trajectory effectively collapses the available attack surface, but unmaintained continuous parameter bounds remain vulnerable to synonym-substitution attacks (18\% evasion rate). Thus, exact-match whitelisting of sensitive parameters ultimately bears the final defensive load against execution.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The pDFA firewall gives concrete ASR drops and low latency for structured LLM agent workflows by enforcing trajectories from benign telemetry, but its effectiveness rests on complete coverage of legitimate behaviors and exact parameter whitelisting for the rest.

read the letter

The key takeaway is that this behavioral firewall compiles verified tool-call telemetry into a pDFA to restrict sequences, contexts, and bounds at runtime with O(1) checks, delivering 5.6% macro ASR on the Agent Security Bench, 2.2% in structured workflows, zero on multi-step attacks, and 2.2 ms latency while beating Aegis. It also notes its own limit on parameter evasion. What is new is the direct adaptation of sequence-based intrusion detection and parameterized automata to LLM agent tool workflows, with the offline compilation step keeping runtime cheap. The engineering choice to shift analysis offline and the specific benchmark numbers on spliced payloads and benign failure rates are useful and grounded in the reported results. The 3.7x speedup and 2% BTFR show the approach is practical rather than theoretical. The soft spot is the assumption that benign telemetry can be made complete enough to avoid either blocking valid tasks or loosening the model to admit attacks. The abstract does not detail collection or verification methods, and the 18% synonym-substitution evasion rate means continuous bounds are not robust on their own, so exact-match whitelisting carries the final load. The 95% CI on the zero successes out of 14 is wide, which tempers how definitive the payload results are. If the full paper supplies the missing methodology on telemetry and parameterization, that would strengthen it. This is for researchers and engineers working on secure, fixed-workflow LLM agents in enterprise or tool-using settings where upfront telemetry is feasible. A reader focused on practical agent defenses would find the tradeoffs and comparisons worth examining. It deserves peer review because the mechanism is clearly motivated, the claims are specific and falsifiable, and the limitations are stated openly rather than hidden.

Referee Report

2 major / 2 minor

Summary. The paper introduces codename, a telemetry-driven behavioral firewall for structured-workflow LLM agents. It compiles verified benign tool-call sequences, contexts, and parameter bounds into a parameterized deterministic finite automaton (pDFA) that is enforced at runtime via O(1) state transitions after offline compilation. On the Agent Security Bench (ASB), it reports a 5.6% macro-averaged ASR across five scenarios (2.2% in three structured workflows), 0% ASR on multi-step and context-sequential attacks in structured settings, 1.4% structural matches on 1,000 spliced exfiltration payloads (all failing parameter guards), 2.2 ms per-call latency (3.7× faster than Aegis), and 2.0% BTFR, while noting that continuous parameter bounds remain vulnerable to synonym-substitution attacks (18% evasion) and that exact-match whitelisting bears the final load.

Significance. If the pDFA construction from benign telemetry proves sound and complete, the work offers a practical, low-overhead defense that shifts expensive analysis offline and demonstrably shrinks the attack surface for structured agents. The concrete latency and ASR numbers, the explicit comparison to Aegis, the 0% result on multi-step attacks, and the honest acknowledgment of parameter-bound limitations are strengths; the approach aligns with sequence-based IDS traditions while addressing agent-specific tool-call trajectories.

major comments (2)

[§4] §4 (Evaluation) and abstract: the reported ASR figures (5.6% macro, 2.2% structured, 0% on multi-step/context-sequential) and the 1.4% structural-match rate on spliced payloads rest on the unstated claim that the collected benign telemetry is both exhaustive for legitimate behaviors/contexts and sufficient to set tight parameter bounds. No description is given of telemetry collection protocol, verification method, or how the pDFA states and bounds are derived from it; without this, the completeness assumption cannot be assessed and the low ASR numbers are not yet reproducible.
[Abstract, §5] Abstract and §5 (Discussion): the paper states that 'unmaintained continuous parameter bounds remain vulnerable to synonym-substitution attacks (18% evasion rate)' and that 'exact-match whitelisting of sensitive parameters ultimately bears the final defensive load.' This directly qualifies the headline 0% ASR claim on the 14 surviving structural paths; the evaluation therefore does not demonstrate end-to-end robustness of the pDFA+guards combination beyond the specific test distribution, which is load-bearing for the central security claim.

minor comments (2)

[Abstract] The 95% CI reported for 0/14 successes ([0%, 23.2%]) is correctly wide; the text should explicitly note that this interval does not strongly support a zero-success claim and should be discussed in the context of the small surviving-path count.
[§3] Notation for the pDFA (states, transition function, parameterization of bounds) is introduced but not formalized with equations or pseudocode; adding a compact definition would improve clarity without lengthening the paper.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and constructive comments on our paper. We address each of the major comments point by point below, providing clarifications and indicating revisions where necessary to enhance the manuscript's reproducibility and clarity.

read point-by-point responses

Referee: [§4] §4 (Evaluation) and abstract: the reported ASR figures (5.6% macro, 2.2% structured, 0% on multi-step/context-sequential) and the 1.4% structural-match rate on spliced payloads rest on the unstated claim that the collected benign telemetry is both exhaustive for legitimate behaviors/contexts and sufficient to set tight parameter bounds. No description is given of telemetry collection protocol, verification method, or how the pDFA states and bounds are derived from it; without this, the completeness assumption cannot be assessed and the low ASR numbers are not yet reproducible.

Authors: We agree that additional details on the telemetry collection are necessary for full reproducibility and to allow assessment of the completeness of the benign traces. The current manuscript focuses on the pDFA construction and runtime enforcement but omits a detailed protocol description. In the revised version, we will add a dedicated subsection under §3 (System Design) or §4 (Evaluation) that specifies: (1) the telemetry collection protocol, including the environments and tasks used to gather benign tool-call sequences; (2) the verification method employed to confirm that traces represent only legitimate behaviors; and (3) the exact procedure for deriving pDFA states, transitions, and parameter bounds from the collected data. This will strengthen the claims regarding the low ASR figures. revision: yes
Referee: [Abstract, §5] Abstract and §5 (Discussion): the paper states that 'unmaintained continuous parameter bounds remain vulnerable to synonym-substitution attacks (18% evasion rate)' and that 'exact-match whitelisting of sensitive parameters ultimately bears the final defensive load.' This directly qualifies the headline 0% ASR claim on the 14 surviving structural paths; the evaluation therefore does not demonstrate end-to-end robustness of the pDFA+guards combination beyond the specific test distribution, which is load-bearing for the central security claim.

Authors: The manuscript already explicitly qualifies the 0% ASR result by noting the limitations of continuous parameter bounds and the reliance on exact-match whitelisting for sensitive parameters. The 0% figure applies to the 14 structural paths that survived the pDFA check but were then blocked by the parameter guards in our specific test set of spliced payloads. The synonym-substitution evaluation is presented separately to highlight a known limitation of unmaintained bounds, which is why we emphasize the need for exact-match whitelisting. We believe this does not undermine the central claim for the tested distribution, as the full system (pDFA + guards) achieved 0 successes on those paths. However, we acknowledge that broader robustness would require maintained bounds or additional mechanisms. We will consider adding a brief clarification in §5 to emphasize that the 0% result is under the full enforcement model. revision: partial

Circularity Check

0 steps flagged

No circularity in pDFA compilation or enforcement claims

full rationale

The paper describes an offline compilation of verified benign tool-call telemetry into a parameterized deterministic finite automaton (pDFA) followed by runtime O(1) structural enforcement. No equations, fitted parameters presented as predictions, self-definitional loops, or load-bearing self-citations appear in the derivation. The reported ASR, BTFR, and evasion rates are empirical measurements on the Agent Security Bench and spliced payloads; they do not reduce to the inputs by construction. The completeness assumption for the telemetry set is stated explicitly as a modeling premise rather than derived from the results themselves.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides insufficient detail to identify concrete free parameters, axioms, or invented entities; the pDFA construction and parameter bounds are described at a conceptual level without explicit fitting procedures or background assumptions listed.

pith-pipeline@v0.9.0 · 5615 in / 1201 out tokens · 86203 ms · 2026-05-07T13:11:11.296015+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

24 extracted references · 11 canonical work pages · 5 internal anchors

[1]

npj Digital Medicine9(51) (2026)

Tian, J., Fard, P., Cagan, C., Rezaii, N., Rocha, R.B., Wang, L., Junior, V.M., Blacker, D., Haas, J.S., Patel, C.J., Murphy, S.N., Moura, L., Estiri, H.,: An autonomous agentic workflow for clinical detection of cognitive concerns using large language models. npj Digital Medicine9(51) (2026)

2026
[2]

Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions

Hou, X., Zhao, Y., Wang, S., Wang, H.: Model context protocol (MCP): Landscape, security threats, and future research directions. arXiv preprint arXiv:2503.23278 (2025) 27

work page internal anchor Pith review arXiv 2025
[3]

Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

Chhabra, A., Datta, S., Nahin, S.K., Mohapatra, P.: Agentic AI security: Threats, defenses, evaluation, and open challenges. arXiv preprint arXiv:2510.23883 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[4]

ICT Express (2025)

Ferrag, M.A., Tihanyi, N., Hamouda, D., Maglaras, L., Lakas, A., Debbah, M.: From prompt injections to protocol exploits: Threats in LLM-powered AI agents workflows. ICT Express (2025)

2025
[5]

Aegis: No tool call left unchecked – a pre-execution firewall and audit layer for ai agents,

Yuan, A., Su, Z., Zhao, Y.: AEGIS: No tool call left unchecked — a pre-execution firewall and audit layer for AI agents. arXiv preprint arXiv:2603.12621 (2026)

work page arXiv 2026
[6]

Firewalls to secure dynamic llm agentic networks,

Abdelnabi, S., Gomaa, A., Bagdasarian, E., Kristensson, P.O., Shokri, R.: Fire- walls to secure dynamic LLM agentic networks. arXiv preprint arXiv:2502.01822 (2025)

work page arXiv 2025
[7]

Applied Sciences16(1), 85 (2026)

Podpora, M., Baranowski, M., Chopcian, M., Kwasniewicz, L., Radziewicz, W.: LLM firewall using validator agent for prevention against prompt injection attacks. Applied Sciences16(1), 85 (2026)

2026
[8]

In: Proceedings of the 1996 IEEE Symposium on Security and Privacy, pp

Forrest, S., Hofmeyr, S.A., Somayaji, A., Longstaff, T.A.: A sense of self for Unix processes. In: Proceedings of the 1996 IEEE Symposium on Security and Privacy, pp. 120–128. IEEE, Los Alamitos, CA (1996)

1996
[9]

Data Science and Engineering (2025)

Xu, W., Huang, C., Gao, S., Shang, S.,: LLM-based agents for tool learning: A survey. Data Science and Engineering (2025)

2025
[10]

PromptArmor: Simple yet Effective Prompt Injection Defenses.arXiv preprint arXiv:2507.15219, 2025.https: //arxiv.org/abs/2507.15219

Shi, T., Zhu, K., Wang, Z., Jia, Y., Cai, W., Liang, W., Wang, H., Alzahrani, H., Lu, J., Kawaguchi, K., Alomair, B., Zhao, X., Wang, W.Y., Gong, N., Guo, W., Song, D.: PromptArmor: Simple yet effective prompt injection defenses. arXiv preprint arXiv:2507.15219 (2025)

work page arXiv 2025
[11]

Neural Networks (2025)

Liao, Z., Chen, K., Lin, Y., Li, K., Liu, Y., et al.: Attack and defense techniques in large language models: A survey and new perspectives. Neural Networks (2025)

2025
[12]

Securing AI Agents with Information-Flow Control

Costa, M., K¨ opf, B., Kolluri, A., Paverd, A., Russinovich, M., Salem, A., Tople, S., Wutschitz, L., Zanella-B´ eguelin, S.: Securing AI agents with information-flow control. arXiv preprint arXiv:2505.23643 (2025)

work page internal anchor Pith review arXiv 2025
[13]

arXiv preprint arXiv:2603.05031 (2026)

Uddin, M.S., Hajira, S.: AegisUI: Behavioral anomaly detection for structured user interface protocols in AI agent systems. arXiv preprint arXiv:2603.05031 (2026)

work page arXiv 2026
[14]

Technical Report MTR-2547

Bell, D.E., LaPadula, L.J.: Secure computer systems: Mathematical foundations (1973). Technical Report MTR-2547

1973
[15]

Communications of the ACM19(5), 236–243 (1976) 28

Denning, D.E.: A lattice model of secure information flow. Communications of the ACM19(5), 236–243 (1976) 28

1976
[16]

In: Proceedings of the 1999 IEEE Symposium on Security and Privacy, pp

Warrender, C., Forrest, S., Pearlmutter, B.: Detecting intrusions using system calls: Alternative data models. In: Proceedings of the 1999 IEEE Symposium on Security and Privacy, pp. 133–145. IEEE, Los Alamitos, CA (1999)

1999
[17]

arXiv preprint arXiv:2510.23487 (2025)

Koohestani, R., Li, Z., Podkopaev, A., Izadi, M.: Are agents probabilistic automata? A trace-based, memory-constrained theory of agentic AI. arXiv preprint arXiv:2510.23487 (2025)

work page arXiv 2025
[18]

Journal of Logic and Algebraic Programming78(5), 293–303 (2009)

Leucker, M., Schallhart, C.: A brief account of runtime verification. Journal of Logic and Algebraic Programming78(5), 293–303 (2009)

2009
[19]

In: Proceedings of the 29th International Conference on Computer Aided Verification (CAV)

D’Antoni, L., Veanes, M.: The power of symbolic automata and transducers. In: Proceedings of the 29th International Conference on Computer Aided Verification (CAV). Lecture Notes in Computer Science, vol. 10426, pp. 47–67. Springer, Cham (2017)

2017
[20]

Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using Siamese BERT-networks (2019)

2019
[21]

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents

Zhang, H., Huang, J., Mei, K., Yao, Y., Wang, Z., Zhan, C., Wang, H., Zhang, Y.: Agent security bench (ASB): Formalizing and benchmarking attacks and defenses in LLM-based agents. In: Proceedings of the International Conference on Learning Representations (ICLR) (2025).https://arxiv.org/abs/2410.02644

work page internal anchor Pith review arXiv 2025
[22]

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

Qin, Y.,et al.: Toolllm: Facilitating large language models to master 16000+ real-world APIs. In: Proceedings of the International Conference on Learning Representations (ICLR) (2024).https://arxiv.org/abs/2307.16789

work page internal anchor Pith review arXiv 2024
[23]

Advani, Trajectory guard – a lightweight, sequence-aware model for real-time anomaly detection in agentic AI, arXiv preprint arXiv:2601.00516 (2026)

Advani, L.: Trajectory guard–a lightweight, sequence-aware model for real-time anomaly detection in agentic ai. arXiv preprint arXiv:2601.00516 (2026)

work page arXiv 2026
[24]

In: Proceedings of the AAAI Conference on Artificial Intelligence, vol

Jin, D., Jin, Z., Zhou, J.T., Szolovits, P.: Is BERT really robust? a strong baseline for natural language attack on text classification and entailment. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 8018–8025 (2020) 29

2020