pith. machine review for the scientific record.

arxiv: 2605.00081 · v1 · submitted 2026-04-30 · 💻 cs.CR · cs.LO

Recognition: unknown

Alignment Contracts for Agentic Security Systems

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 20:43 UTC · model grok-4.3

classification 💻 cs.CR cs.LO
keywords alignment contracts · agentic security · effect traces · safety properties · soundness theorem · LLM agents · formal verification · enforcement

The pith

Alignment contracts turn observable effect traces into enforceable safety properties for agentic security systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces alignment contracts to specify scope, allowed and forbidden effects, resource budgets, and disclosure policies for systems that combine LLM planners with vulnerability tools. It equips the contracts with finite-trace semantics, shows that satisfaction is a safety property with finite violation witnesses, and proves that admissibility checking is decidable along with refinement and composition rules. Under the explicit assumption that every relevant effect is mediated and therefore observable, a soundness theorem quantifies over agent models and guarantees enforcement soundness for monitor-realized traces. This approach addresses the asymmetric control problem by making the authorized-engagement boundary a first-class formal object rather than an informal guardrail.
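The contract shape described above can be sketched concretely. The following is an illustrative Python model, not the paper's formalization: the effect kinds, fields, and admissibility check are invented for exposition. It shows why satisfaction is a safety property: a violation is witnessed by a finite prefix, and no extension of a violating trace can repair it.

```python
from dataclasses import dataclass

# Hypothetical sketch of an alignment contract over finite effect traces.
# Effect kinds, fields, and the contract structure are illustrative only.

@dataclass(frozen=True)
class Effect:
    kind: str      # e.g. "http_get", "exploit", "disclose"
    target: str    # e.g. a hostname inside or outside the engagement
    cost: int = 1  # abstract resource units

@dataclass
class Contract:
    scope: set      # targets inside the authorized engagement
    forbidden: set  # effect kinds that are never allowed
    budget: int     # total resource budget

    def violation_witness(self, trace):
        """Return the shortest violating prefix (a finite witness), or None.

        Safety-property structure: once a prefix violates the contract,
        every extension of that prefix violates it as well.
        """
        spent = 0
        for i, eff in enumerate(trace):
            spent += eff.cost
            if (eff.kind in self.forbidden
                    or eff.target not in self.scope
                    or spent > self.budget):
                return trace[: i + 1]
        return None

c = Contract(scope={"app.example.test"}, forbidden={"disclose"}, budget=3)
ok = [Effect("http_get", "app.example.test"), Effect("exploit", "app.example.test")]
bad = ok + [Effect("http_get", "other.example.test")]
print(c.violation_witness(ok))        # None: the trace satisfies the contract
print(len(c.violation_witness(bad)))  # 3: the violating prefix is the witness
```

For contracts of this finite shape, admissibility checking amounts to searching for any satisfying trace, which is mechanical — a toy analogue of the paper's decidability claim.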

Core claim

The authors define alignment contracts over observable effect traces, give them finite-trace semantics, characterize satisfaction as a safety property, develop refinement and one-way composition rules, and prove that admissibility checking is decidable. Under the Effect Observability Assumption that all ΣEff-effects are mediated, they establish a soundness theorem that quantifies over the agent model and yields enforcement guarantees, including for monitor-realized traces, together with an assumption-lifted adaptation result and limits expressed via undecidability transfer and observability-boundary theorems.

What carries the argument

Alignment contracts, which specify scope, allowed and forbidden effects, resource budgets, and disclosure policies over finite observable effect traces.

If this is right

  • Refinement and one-way composition rules support modular contract engineering.
  • Admissibility checking is decidable for any contract.
  • Soundness holds for all mediated ΣEff-effects when the observability assumption is met.
  • The same contract structure extends from web-focused workflows to other effect profiles.
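A hedged sketch of the refinement bullet: for contracts of a simple scope/forbidden/budget form, one sufficient syntactic condition for refinement is narrower scope, a larger forbidden set, and a smaller budget, since each change can only rule out more traces. The condition and all names here are illustrative, not the paper's actual rules.

```python
# Hypothetical sufficient condition for refinement: every trace admitted
# by the refined contract c1 is also admitted by the original c2.

def refines(c1, c2):
    return (c1["scope"] <= c2["scope"]          # narrower scope
            and c1["forbidden"] >= c2["forbidden"]  # more forbidden effects
            and c1["budget"] <= c2["budget"])   # smaller budget

engagement = {"scope": {"a.test", "b.test"}, "forbidden": {"disclose"}, "budget": 10}
web_only   = {"scope": {"a.test"}, "forbidden": {"disclose", "exploit"}, "budget": 5}
print(refines(web_only, engagement))  # True: web_only validly refines engagement
print(refines(engagement, web_only))  # False: refinement is one-directional
```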

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The contracts could be layered on top of existing LLM guardrails to add formal trace-level guarantees.
  • The undecidability transfer result marks a hard limit once any effect escapes mediation.
  • Real-world implementations would need to verify that the mediation layer truly captures every side-effect the agent can produce.

Load-bearing premise

Every relevant effect must be mediated so that the enforcement mechanism can observe it.
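One way to picture this premise is a monitor that mediates every tool call: the effect is appended to the observed trace and checked before the real side effect runs. This is an invented minimal sketch, not the paper's enforcement mechanism; the guarantee it gives covers only effects routed through the wrapper, which is exactly what the Effect Observability Assumption asserts.

```python
# Minimal mediation sketch: each tool call passes through the monitor,
# which observes the effect and blocks the call if the extended trace
# would violate the contract. Effects outside this wrapper are invisible.

class Monitor:
    def __init__(self, admissible):
        self.admissible = admissible  # predicate over finite traces
        self.trace = []

    def mediate(self, effect, action):
        candidate = self.trace + [effect]
        if not self.admissible(candidate):
            raise PermissionError(f"contract violation on {effect}")
        self.trace = candidate  # the effect is observed...
        return action()         # ...before the real side effect runs

allowed = {"scan", "report"}
m = Monitor(lambda tr: all(e in allowed for e in tr))
m.mediate("scan", lambda: "scanned")
try:
    m.mediate("exfiltrate", lambda: "leaked")  # blocked: never executed
except PermissionError:
    pass
print(m.trace)  # ['scan']: the violating effect was not realized
```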

What would settle it

A concrete agent execution that produces a non-mediated effect violating the contract yet remains undetected by the monitor would falsify the soundness theorem.
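A hedged toy version of that falsification scenario (all names invented): if an effect can bypass the mediation layer, the monitor-realized trace stays contract-clean while the real trace violates the contract, which is precisely the gap the observability-boundary theorems delimit.

```python
# Sketch of a non-mediated effect escaping observation. An effect produced
# outside the mediation layer never reaches the monitor's trace, so the
# observed trace satisfies the contract while the real trace does not.

observed = []  # what the monitor sees
real = []      # what actually happened

def mediated(effect):
    observed.append(effect)
    real.append(effect)

def side_channel(effect):  # an unmediated call, invisible to the monitor
    real.append(effect)

mediated("scan")
side_channel("disclose")   # violates the contract, but goes unobserved

forbidden = {"disclose"}
print(any(e in forbidden for e in observed))  # False: monitor sees no violation
print(any(e in forbidden for e in real))      # True: the real trace violates
```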

read the original abstract

Agentic security systems increasingly combine LLM planners with tools that can discover, validate, and report vulnerabilities. This creates an asymmetric control problem: the system should retain strong offensive capability inside an authorized engagement, while the same capabilities must be denied outside scope. Existing guardrails provide useful policy controls, but they do not make this boundary a first-class formal contract over observable effects. We introduce alignment contracts, a framework for specifying and enforcing behavioral constraints over observable effect traces. A contract defines scope, allowed and forbidden effects, resource budgets, and disclosure policies. We give the language finite-trace semantics, characterize satisfaction as a safety property with finite violation witnesses, develop refinement and one-way composition rules for modular contract engineering, and show that admissibility checking is decidable. We instantiate the framework for web-focused agentic security workflows and show how the same structure extends to other effect profiles. Under an explicit Effect Observability Assumption, where all ΣEff-effects are mediated, the soundness theorem quantifies over the agent model and gives guarantees for mediated ΣEff-effects, including enforcement soundness for monitor-realized traces. We also state an assumption-lifted adaptation result and formalize limits through undecidability transfer and observability-boundary theorems. A Lean 4 artifact checks the formal core theorems used by the paper.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces alignment contracts as a framework for specifying and enforcing behavioral constraints over observable effect traces in agentic security systems that combine LLM planners with tools. It defines a language with finite-trace semantics, characterizes satisfaction as a safety property with finite violation witnesses, develops refinement and one-way composition rules, shows that admissibility checking is decidable, and states a soundness theorem (under the explicit Effect Observability Assumption that all ΣEff-effects are mediated) that quantifies over agent models and gives guarantees including enforcement soundness for monitor-realized traces. The core theorems are mechanically checked in a Lean 4 artifact, with an assumption-lifted adaptation result and limits via undecidability transfer and observability-boundary theorems.

Significance. If the results hold, the work provides a formal, machine-checked foundation for making behavioral boundaries first-class contracts in LLM-based agentic security workflows. The explicit finite-trace semantics, decidability result, and Lean artifact are clear strengths that enable modular contract engineering and verifiable enforcement. The paper's transparent treatment of the Effect Observability Assumption and the adaptation result further strengthen the contribution by delineating the precise scope of the guarantees.

major comments (2)
  1. [Soundness theorem (abstract and §4)] The soundness theorem (as stated in the abstract and developed in the formal core) is load-bearing on the Effect Observability Assumption that all ΣEff-effects are mediated and observable by the enforcement mechanism. The manuscript does not provide a concrete argument or coverage analysis showing that this assumption holds for typical LLM tool invocations (e.g., direct side-channel outputs or unlogged calls), which directly limits the quantification over agent models and the enforcement soundness claim for monitor-realized traces.
  2. [Adaptation result and observability-boundary theorems] The adaptation result for lifting the assumption is stated but lacks a detailed proof sketch or Lean-checked statement showing how the guarantees transfer when the observability boundary is relaxed; this is central to the claim that the framework extends beyond the strict mediation case.
minor comments (2)
  1. [Refinement and composition rules] The abstract refers to 'one-way composition rules' without clarifying in the main text whether these rules are strictly one-directional or admit limited bidirectional use under the refinement relation.
  2. [Instantiation section] Notation for ΣEff and the effect profiles in the web-focused instantiation should be cross-referenced to the general semantics definition to avoid ambiguity when extending to other domains.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review, and for recognizing the strengths of the finite-trace semantics, decidability result, and Lean 4 artifact. We respond to each major comment below, proposing targeted revisions where they strengthen the connection between the formal results and practical agentic security workflows.

read point-by-point responses
  1. Referee: [Soundness theorem (abstract and §4)] The soundness theorem (as stated in the abstract and developed in the formal core) is load-bearing on the Effect Observability Assumption that all ΣEff-effects are mediated and observable by the enforcement mechanism. The manuscript does not provide a concrete argument or coverage analysis showing that this assumption holds for typical LLM tool invocations (e.g., direct side-channel outputs or unlogged calls), which directly limits the quantification over agent models and the enforcement soundness claim for monitor-realized traces.

    Authors: We agree that the soundness theorem is conditional on the Effect Observability Assumption, which is stated explicitly throughout the manuscript (including in the abstract and §4). The theorem quantifies over agent models under this assumption, and the observability-boundary theorems already delineate the precise limits when mediation does not hold. However, we acknowledge that the manuscript would benefit from a more concrete discussion of applicability to common LLM tool invocations. In the revised version we will add a short subsection in §4 that (i) describes typical mediation patterns for web-focused agentic security tools (e.g., API-wrapped outputs that are logged by the monitor), (ii) notes that direct side-channel or unlogged calls fall outside the assumption, and (iii) cross-references the boundary theorems. This addition clarifies scope without altering the formal statements. revision: yes

  2. Referee: [Adaptation result and observability-boundary theorems] The adaptation result for lifting the assumption is stated but lacks a detailed proof sketch or Lean-checked statement showing how the guarantees transfer when the observability boundary is relaxed; this is central to the claim that the framework extends beyond the strict mediation case.

    Authors: The adaptation result is stated in the manuscript as an assumption-lifted theorem, and the core theorems (including the boundary theorems) are mechanically checked in the Lean 4 artifact. We recognize that a high-level proof sketch in the main text would improve readability and make the transfer of guarantees more transparent. In the revision we will (i) expand the presentation of the adaptation result with a concise proof sketch in §4 and (ii) verify that the Lean artifact contains the adaptation and boundary statements; if any additional formalization is required we will extend the artifact accordingly. The existing boundary theorems already formalize the limits of the relaxed case. revision: partial

Circularity Check

0 steps flagged

No circularity; formal semantics and Lean-verified theorems are self-contained.

full rationale

The paper defines alignment contracts via explicit finite-trace semantics, safety properties, refinement rules, and decidable admissibility. The soundness theorem is stated under an explicit Effect Observability Assumption and is mechanically checked in Lean 4, with an assumption-lifted adaptation result. No step reduces a claimed prediction or theorem to a fitted parameter, self-definition, or unverified self-citation chain; the derivation chain consists of standard formal definitions and verified implications that do not presuppose their own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claims rest on the Effect Observability Assumption and the finite-trace semantics introduced by the framework. No numerical free parameters appear. The new concept of alignment contracts is the primary invented entity.

axioms (1)
  • domain assumption Effect Observability Assumption: all ΣEff-effects are mediated
    Invoked to obtain the soundness theorem for monitor-realized traces.
invented entities (1)
  • Alignment contract no independent evidence
    purpose: Specifying scope, allowed/forbidden effects, resource budgets, and disclosure policies over observable traces
    New framework element introduced to make behavioral boundaries first-class formal objects.

pith-pipeline@v0.9.0 · 5528 in / 1392 out tokens · 41576 ms · 2026-05-09T20:43:58.317673+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

49 extracted references · 31 canonical work pages · 3 internal anchors

  1. [1]

    Sahar Abdelnabi, Amr Gomaa, Eugene Bagdasarian, Per Ola Kristensson, and Reza Shokri. 2025. Firewalls to Secure Dynamic LLM Agentic Networks. arXiv preprint arXiv:2502.01822 (2025)

  2. [2]

    Ehab S. Al-Shaer and Hazem H. Hamed. 2004. Discovery of Policy Anomalies in Distributed Firewalls. In Proceedings IEEE INFOCOM. 2605–2616. doi:10.1109/INFCOM.2004.1354680

  3. [3]

    Carolyn Jane Anderson, Nate Foster, Arjun Guha, Jean-Baptiste Jeannin, Dexter Kozen, Cole Schlesinger, and David Walker. 2014. NetKAT: Semantic Foundations for Networks. In Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL). 113–126. doi:10.1145/2535838.2535862

  4. [4]

    James P. Anderson. 1972. Computer Security Technology Planning Study, Volume 2. Technical Report ESD-TR-73-51. Air Force Electronic Systems Division. Introduced the reference monitor concept

  5. [5]

    Anthropic. 2025. AI agents find $4.6M in blockchain smart contract exploits. https://red.anthropic.com/2025/smart-contracts/ December 1, 2025

  6. [6]

    Anthropic. 2025. Disrupting the First Reported AI-Orchestrated Cyber Espionage Campaign. https://www.anthropic.com/news/disrupting-AI-espionage November 13, 2025

  7. [7]

    Yair Bartal, Alain Mayer, Kobbi Nissim, and Avishai Wool. 1999. Firmato: A Novel Firewall Management Toolkit. In Proceedings of the IEEE Symposium on Security and Privacy (S&P). 17–31. doi:10.1109/SECPRI.1999.766714

  8. [8]

    David A. Basin, Vincent Jugé, Felix Klaedtke, and Eugen Zălinescu. 2013. Enforceable Security Policies Revisited. ACM Transactions on Information and System Security (TISSEC) 16, 1 (2013), 1–26. doi:10.1145/2487222.2487225

  9. [9]

    Luca Beurer-Kellner, Beat Buesser, Ana-Maria Creţu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, Florian Tramèr, and Václav Volhejn. 2025. Design Patterns for Securing LLM Agents against Prompt Injections. arXiv preprint arXiv:2506.08837 (2025)

  10. [10]

    Manish Bhatt, Sahana Chennabasappa, Yue Li, Cyrus Nikolaidis, Daniel Song, Shengye Wan, Faizan Ahmad, Cornelius Aschermann, Yaohui Chen, Dhaval Kapil, David Molnar, Spencer Whitman, and Joshua Saxe. 2024. CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models. arXiv preprint arXiv:2404.13161 (2024)

  11. [11]

    Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-Béguelin. 2025. Securing AI Agents with Information-Flow Control. arXiv preprint arXiv:2505.23643 (2025)

  13. [13]

    Joseph W. Cutler, Craig Disselkoen, Aaron Eline, Shaobo He, Kyle Headley, Michael Hicks, Kesha Hietala, Eleftherios Ioannidis, John H. Kastner, Anwar Mamat, Darin McAdams, Matt McCutchen, Neha Rungta, Emina Torlak, and Andrew Wells. 2024. Cedar: A New Language for Expressive, Fast, Safe, and Analyzable Authorization. Proceedings of the ACM on Programming L...

  14. [14]

    Isaac David and Arthur Gervais. 2025. MAPTA: Multi-Agent Penetration Testing AI for the Web. arXiv preprint arXiv:2508.20816 (2025)

  15. [15]

    Luca de Alfaro and Thomas A. Henzinger. 2001. Interface Automata. In Proceedings of the 8th European Software Engineering Conference / 9th ACM SIGSOFT Symposium on Foundations of Software Engineering (ESEC/FSE). ACM, 109–120. doi:10.1145/503209.503226

  16. [16]

    Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. 2024. AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents. In Advances in Neural Information Processing Systems (NeurIPS). Datasets and Benchmarks Track; preprint: arXiv:2406.13352

  17. [17]

    Gelei Deng, Yi Liu, Víctor Mayoral-Vilches, Peng Liu, Yuekang Li, Yuan Xu, Tianwei Zhang, Yang Liu, Martin Pinzger, and Stefan Rass. 2024. PentestGPT: Evaluating and Harnessing Large Language Models for Automated Penetration Testing. In 33rd USENIX Security Symposium (USENIX Security 24). 847–864

  18. [18]

    Craig Disselkoen, Aaron Eline, Shaobo He, Kyle Headley, Michael Hicks, Kesha Hietala, John Kastner, Anwar Mamat, Matt McCutchen, Neha Rungta, Bhakti Shah, Emina Torlak, and Andrew Wells. 2024. How We Built Cedar: A Verification-Guided Approach. In Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering (FS...

  19. [19]

    Úlfar Erlingsson and Fred B. Schneider. 2000. IRM Enforcement of Java Stack Inspection. In IEEE Symposium on Security and Privacy (S&P). 246–255

  20. [20]

    Arthur Gervais and Liyi Zhou. 2026. AI Agent Smart Contract Exploit Generation. In Financial Cryptography and Data Security (FC 2026)

  21. [21]

    Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. 2023. Not What You’ve Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security

  22. [22]

    Michael A. Harrison, Walter L. Ruzzo, and Jeffrey D. Ullman. 1976. Protection in Operating Systems. Commun. ACM 19, 8 (1976), 461–471. doi:10.1145/360303.360333

  23. [23]

    Vincent C. Hu, David Ferraiolo, Rick Kuhn, Adam Schnitzer, Kenneth Sandlin, Robert Miller, and Karen Scarfone. 2014. NIST SP 800-162: Guide to Attribute Based Access Control (ABAC) Definition and Considerations. Technical Report. National Institute of Standards and Technology. doi:10.6028/NIST.SP.800-162

  24. [24]

    Ruochen Jiao, Shaoyuan Xie, Justin Yue, Takami Sato, Lixu Wang, Yixuan Wang, Qi Alfred Chen, and Qi Zhu. 2024. Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-based Decision-Making Systems. arXiv preprint arXiv:2405.20774 (2024)

  25. [25]

    Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. 2024. SWE-bench: Can Language Models Resolve Real-World GitHub Issues?. In International Conference on Learning Representations (ICLR)

  26. [26]

    Adharsh Kamath, Sishen Zhang, Calvin Xu, Shubham Ugare, Gagandeep Singh, and Sasa Misailovic. 2025. Enforcing Temporal Constraints for LLM Agents. arXiv preprint arXiv:2512.23738 (2025)

  27. [27]

    Peyman Kazemian, Michael Chan, Hongyi Zeng, George Varghese, Nick McKeown, and Scott Whyte. 2013. Real Time Network Policy Checking Using Header Space Analysis. In Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI). 99–111. https://www.usenix.org/conference/nsdi13/technical-sessions/presentation/kazemian

  28. [28]

    Peyman Kazemian, George Varghese, and Nick McKeown. 2012. Header Space Analysis: Static Checking for Networks. In Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI). 113–126. https://www.usenix.org/conference/nsdi12/technical-sessions/presentation/kazemian

  29. [29]

    Ahmed Khurshid, Wenxuan Zhou, Matthew Caesar, and P. Brighten Godfrey. 2012. VeriFlow: Verifying Network-Wide Invariants in Real Time. In Proceedings of the First Workshop on Hot Topics in Software Defined Networks (HotSDN). 49–54. doi:10.1145/2342441.2342452

  31. [31]

    Roham Koohestani. 2025. AgentGuard: Runtime Verification of AI Agents. arXiv preprint arXiv:2509.23864 (2025)

  32. [32]

    Jay Ligatti, Lujo Bauer, and David Walker. 2005. Edit Automata: Enforcement Mechanisms for Run-time Security Policies. International Journal of Information Security 4, 1–2 (2005), 2–16. doi:10.1007/s10207-004-0046-8

  33. [33]

    Jay Ligatti, Lujo Bauer, and David Walker. 2009. Run-Time Enforcement of Nonsafety Policies. ACM Transactions on Information and System Security 12, 3, Article 19 (2009), 41 pages. doi:10.1145/1455526.1455532

  34. [34]

    Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, Shudan Zhang, Xiang Deng, Aohan Zeng, Zhengxiao Du, Chenhui Zhang, Sheng Shen, Tianjun Zhang, Yu Su, Huan Sun, Minlie Huang, Yuxiao Dong, and Jie Tang. 2024. AgentBench: Evaluating LLMs as Agents. In International Conference on Learning Represe...

  35. [35]

    Microsoft Security Response Center. 2026. Strengthening Secure Software at Global Scale: How MSRC is Evolving with AI. https://www.microsoft.com/en-us/msrc/blog/2026/04/strengthening-secure-software-global-scale-how-msrc-is-evolving-with-ai April 7, 2026

  36. [36]

    Milad Nasr, Nicholas Carlini, Chawin Sitawarin, Sander V. Schulhoff, Jamie Hayes, Michael Ilie, Juliette Pluto, Shuang Song, Harsh Chaudhari, Ilia Shumailov, Abhradeep Thakurta, Kai Yuanqing Xiao, Andreas Terzis, and Florian Tramèr. 2025. The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks and Prompt Injections. arXiv preprint arXiv:2510.09023 (2025)

  38. [38]

    Ravi S. Sandhu, Edward J. Coyne, Hal L. Feinstein, and Charles E. Youman. 1996. Role-Based Access Control Models. IEEE Computer 29, 2 (1996), 38–47. doi:10.1109/2.485845

  40. [40]

    Fred B. Schneider. 2000. Enforceable Security Policies. ACM Transactions on Information and System Security (TISSEC) 3, 1 (2000), 30–50. doi:10.1145/353323.353382

  41. [41]

    Xiangmin Shen, Lingzhi Wang, Zhenyuan Li, Yan Chen, Wencheng Zhao, Dawei Sun, Jiashui Wang, and Wei Ruan. 2025. PentestAgent: Incorporating LLM Agents to Automated Penetration Testing. In Proceedings of the 20th ACM Asia Conference on Computer and Communications Security. doi:10.1145/3708821.3733882 Preprint: arXiv:2411.05185

  42. [42]

    Shengye Wan, Cyrus Nikolaidis, Daniel Song, David Molnar, James Crnkovich, Jayson Grace, Manish Bhatt, Sahana Chennabasappa, Spencer Whitman, Stephanie Ding, Vlad Ionescu, Yue Li, and Joshua Saxe. 2024. CyberSecEval 3: Advancing the Evaluation of Cybersecurity Risks and Capabilities in Large Language Models. arXiv preprint arXiv:2408.01605 (2024)

  43. [43]

    Haoyu Wang, Christopher M. Poskitt, and Jun Sun. 2025. AgentSpec: Customizable Runtime Enforcement for Safe and Reliable LLM Agents. arXiv preprint arXiv:2503.18666 (2025)

  44. [44]

    Justin Wang, Andreas Bigger, Xiaohai Xu, Justin W. Lin, Andy Applebaum, Tejal Patwardhan, Alpin Yukseloglu, and Olivia Watkins. 2026. EVMbench: Evaluating AI Agents on Smart Contract Security. doi:10.48550/arXiv.2603.04915 arXiv:2603.04915 [cs.LG] OpenAI, Paradigm, and OtterSec technical report

  45. [45]

    Jingwei Yi, Yueqi Xie, Bin Zhu, Emre Kiciman, Guangzhong Sun, Xing Xie, and Fangzhao Wu. 2025. Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining. doi:10.1145/3690624.3709179 P...

  46. [46]

    Qiusi Zhan, Richard Fang, Henil Shalin Panchal, and Daniel Kang. 2025. Adaptive Attacks Break Defenses Against Indirect Prompt Injection Attacks on LLM Agents. In Findings of the Association for Computational Linguistics: NAACL 2025. 7116–7132. doi:10.18653/v1/2025.findings-naacl.395

  47. [47]

    Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. 2024. InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents. In Findings of the Association for Computational Linguistics: ACL 2024. 10471–10506. doi:10.18653/v1/2024.findings-acl.624

  49. [49]

    Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, and Graham Neubig. 2024. WebArena: A Realistic Web Environment for Building Autonomous Agents. In International Conference on Learning Representations (ICLR). https://openreview.net/forum?id=oKn9c6ytLx