Alignment Contracts for Agentic Security Systems
Pith reviewed 2026-05-09 20:43 UTC · model grok-4.3
The pith
Alignment contracts turn observable effect traces into enforceable safety properties for agentic security systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors define alignment contracts over observable effect traces, give them finite-trace semantics, characterize satisfaction as a safety property, develop refinement and one-way composition rules, and prove that admissibility checking is decidable. Under the Effect Observability Assumption that all ΣEff-effects are mediated, they establish a soundness theorem that quantifies over the agent model and yields enforcement guarantees, including for monitor-realized traces, together with an assumption-lifted adaptation result and limits expressed via undecidability transfer and observability-boundary theorems.
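The conditional shape of that soundness claim can be rendered as a Lean-style sketch. All identifiers below (`Agent`, `produces`, `Mediated`, `Satisfies`) are illustrative placeholders, not the paper's artifact: once every produced effect is assumed mediated, the guarantee discharges by applying the enforcement lemma to the observability hypothesis.

```lean
-- Illustrative statement shape only: identifiers are hypothetical,
-- not taken from the paper's Lean 4 artifact.
theorem soundness_shape {Effect : Type} (Agent : Type)
    (produces : Agent → List Effect)
    (Mediated : Effect → Prop)
    (Satisfies : List Effect → Prop)
    -- Effect Observability Assumption: every produced effect is mediated
    (observability : ∀ a : Agent, ∀ e ∈ produces a, Mediated e)
    -- enforcement lemma: a fully mediated trace satisfies the contract
    (enforce : ∀ a : Agent,
      (∀ e ∈ produces a, Mediated e) → Satisfies (produces a)) :
    ∀ a : Agent, Satisfies (produces a) :=
  fun a => enforce a (observability a)
```

The point of the sketch is that the quantification over the agent model is genuine, but the conclusion never detaches from the observability hypothesis.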
What carries the argument
Alignment contracts, which specify scope, allowed and forbidden effects, resource budgets, and disclosure policies over finite observable effect traces.
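As a rough executable sketch of that machinery (the `Effect` and `Contract` shapes, field names, and the `check` function are assumptions for illustration, not the paper's formal syntax): a contract fixes scope, forbidden effect kinds, and per-kind budgets, and satisfaction is checked over a finite trace. Because violation is detected at a step, any violation comes with a finite prefix witness, which is the safety-property characterization.

```python
from dataclasses import dataclass

# Hypothetical minimal model of an alignment contract over finite
# observable effect traces; names are illustrative, not the paper's.

@dataclass(frozen=True)
class Effect:
    kind: str      # e.g. "http_get", "report", "exfiltrate"
    target: str    # e.g. a hostname

@dataclass
class Contract:
    scope: set       # targets inside the authorized engagement
    forbidden: set   # effect kinds never allowed
    budget: dict     # per-kind resource budgets

def check(contract, trace):
    """Return None if the finite trace satisfies the contract,
    else the shortest violating prefix (a finite violation witness)."""
    used = {}
    for i, e in enumerate(trace):
        used[e.kind] = used.get(e.kind, 0) + 1
        out_of_scope = e.target not in contract.scope
        forbidden = e.kind in contract.forbidden
        over_budget = used[e.kind] > contract.budget.get(e.kind, float("inf"))
        if out_of_scope or forbidden or over_budget:
            return trace[: i + 1]  # safety: a finite prefix witnesses failure
    return None

c = Contract(scope={"app.example.com"},
             forbidden={"exfiltrate"},
             budget={"http_get": 2})
trace = [Effect("http_get", "app.example.com"),
         Effect("http_get", "other.example.org")]
witness = check(c, trace)   # out-of-scope target at step 2
```

Checking each step against a finite structure like this is also what makes admissibility checking plausibly decidable: there is no unbounded search, only membership and counting.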
If this is right
- Refinement and one-way composition rules support modular contract engineering.
- Admissibility checking is decidable for any contract.
- Soundness holds for all mediated ΣEff-effects when the observability assumption is met.
- The same contract structure extends from web-focused workflows to other effect profiles.
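The enforcement-soundness reading of these points can be sketched as a minimal truncation-style monitor (the scope, forbidden set, and budget below are invented for illustration): every mediated effect passes through a single choke point that refuses any step whose addition would violate the contract, so the monitor-realized trace satisfies the contract by construction.

```python
# Hypothetical truncation-style monitor. Under the Effect Observability
# Assumption, every effect reaches emit(), so the realized trace is
# contract-satisfying by construction. All names are illustrative.
ALLOWED_SCOPE = {"app.example.com"}
FORBIDDEN_KINDS = {"exfiltrate"}
BUDGET = {"http_get": 2}

class Monitor:
    def __init__(self):
        self.trace = []   # the monitor-realized trace
        self.used = {}
    def emit(self, kind, target):
        n = self.used.get(kind, 0) + 1
        if (target in ALLOWED_SCOPE
                and kind not in FORBIDDEN_KINDS
                and n <= BUDGET.get(kind, float("inf"))):
            self.used[kind] = n
            self.trace.append((kind, target))
            return True
        return False   # effect suppressed at the boundary

m = Monitor()
m.emit("http_get", "app.example.com")     # admitted
m.emit("exfiltrate", "app.example.com")   # refused: forbidden kind
m.emit("http_get", "app.example.com")     # admitted (budget: 2)
m.emit("http_get", "app.example.com")     # refused: budget exhausted
```

The design choice worth noting is that soundness here is vacuous for any effect that never reaches `emit`, which is exactly why the observability assumption is load-bearing.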
Where Pith is reading between the lines
- The contracts could be layered on top of existing LLM guardrails to add formal trace-level guarantees.
- The undecidability transfer result marks a hard limit once any effect escapes mediation.
- Real-world implementations would need to verify that the mediation layer truly captures every side-effect the agent can produce.
Load-bearing premise
Every relevant effect must be mediated so that the enforcement mechanism can observe it.
What would settle it
A concrete agent execution that produces a non-mediated effect violating the contract yet remains undetected by the monitor would falsify the soundness theorem.
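The falsifier can be made concrete with a toy two-channel model (the channel functions and effect names are invented for illustration): a forbidden effect routed around the mediation layer never enters the monitor-realized trace, so the monitor reports a clean trace even though the actual effect trace violates the contract.

```python
# Toy model of the observability gap. The mediated channel enforces
# the contract; the hypothetical side channel bypasses it entirely.
FORBIDDEN = {"exfiltrate"}

monitor_trace = []   # what the enforcement mechanism observes
actual_trace = []    # every effect the agent really produces

def mediated(effect):
    if effect in FORBIDDEN:
        return False                 # blocked at the mediation layer
    monitor_trace.append(effect)
    actual_trace.append(effect)
    return True

def side_channel(effect):
    actual_trace.append(effect)      # invisible to the monitor

mediated("http_get")
side_channel("exfiltrate")           # escapes mediation entirely

# The monitor sees no violation, yet the actual trace contains one:
# exactly the execution that would break the soundness premise.
```

Strictly speaking, such an execution does not contradict the theorem, whose premise fails here; it contradicts any deployment that claims the premise holds.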
Original abstract
Agentic security systems increasingly combine LLM planners with tools that can discover, validate, and report vulnerabilities. This creates an asymmetric control problem: the system should retain strong offensive capability inside an authorized engagement, while the same capabilities must be denied outside scope. Existing guardrails provide useful policy controls, but they do not make this boundary a first-class formal contract over observable effects. We introduce alignment contracts, a framework for specifying and enforcing behavioral constraints over observable effect traces. A contract defines scope, allowed and forbidden effects, resource budgets, and disclosure policies. We give the language finite-trace semantics, characterize satisfaction as a safety property with finite violation witnesses, develop refinement and one-way composition rules for modular contract engineering, and show that admissibility checking is decidable. We instantiate the framework for web-focused agentic security workflows and show how the same structure extends to other effect profiles. Under an explicit Effect Observability Assumption, where all ΣEff-effects are mediated, the soundness theorem quantifies over the agent model and gives guarantees for mediated ΣEff-effects, including enforcement soundness for monitor-realized traces. We also state an assumption-lifted adaptation result and formalize limits through undecidability transfer and observability-boundary theorems. A Lean 4 artifact checks the formal core theorems used by the paper.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces alignment contracts as a framework for specifying and enforcing behavioral constraints over observable effect traces in agentic security systems that combine LLM planners with tools. It defines a language with finite-trace semantics, characterizes satisfaction as a safety property with finite violation witnesses, develops refinement and one-way composition rules, shows that admissibility checking is decidable, and states a soundness theorem (under the explicit Effect Observability Assumption that all ΣEff-effects are mediated) that quantifies over agent models and gives guarantees including enforcement soundness for monitor-realized traces. The core theorems are mechanically checked in a Lean 4 artifact, with an assumption-lifted adaptation result and limits via undecidability transfer and observability-boundary theorems.
Significance. If the results hold, the work provides a formal, machine-checked foundation for making behavioral boundaries first-class contracts in LLM-based agentic security workflows. The explicit finite-trace semantics, decidability result, and Lean artifact are clear strengths that enable modular contract engineering and verifiable enforcement. The paper's transparent treatment of the Effect Observability Assumption and the adaptation result further strengthen the contribution by delineating the precise scope of the guarantees.
major comments (2)
- [Soundness theorem (abstract and §4)] The soundness theorem (as stated in the abstract and developed in the formal core) is load-bearing on the Effect Observability Assumption that all ΣEff-effects are mediated and observable by the enforcement mechanism. The manuscript does not provide a concrete argument or coverage analysis showing that this assumption holds for typical LLM tool invocations (e.g., direct side-channel outputs or unlogged calls), which directly limits the quantification over agent models and the enforcement soundness claim for monitor-realized traces.
- [Adaptation result and observability-boundary theorems] The adaptation result for lifting the assumption is stated but lacks a detailed proof sketch or Lean-checked statement showing how the guarantees transfer when the observability boundary is relaxed; this is central to the claim that the framework extends beyond the strict mediation case.
minor comments (2)
- [Refinement and composition rules] The abstract refers to 'one-way composition rules' without clarifying in the main text whether these rules are strictly one-directional or admit limited bidirectional use under the refinement relation.
- [Instantiation section] Notation for ΣEff and the effect profiles in the web-focused instantiation should be cross-referenced to the general semantics definition to avoid ambiguity when extending to other domains.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review, and for recognizing the strengths of the finite-trace semantics, decidability result, and Lean 4 artifact. We respond to each major comment below, proposing targeted revisions where they strengthen the connection between the formal results and practical agentic security workflows.
Point-by-point responses
Referee: [Soundness theorem (abstract and §4)] The soundness theorem (as stated in the abstract and developed in the formal core) is load-bearing on the Effect Observability Assumption that all ΣEff-effects are mediated and observable by the enforcement mechanism. The manuscript does not provide a concrete argument or coverage analysis showing that this assumption holds for typical LLM tool invocations (e.g., direct side-channel outputs or unlogged calls), which directly limits the quantification over agent models and the enforcement soundness claim for monitor-realized traces.
Authors: We agree that the soundness theorem is conditional on the Effect Observability Assumption, which is stated explicitly throughout the manuscript (including in the abstract and §4). The theorem quantifies over agent models under this assumption, and the observability-boundary theorems already delineate the precise limits when mediation does not hold. However, we acknowledge that the manuscript would benefit from a more concrete discussion of applicability to common LLM tool invocations. In the revised version we will add a short subsection in §4 that (i) describes typical mediation patterns for web-focused agentic security tools (e.g., API-wrapped outputs that are logged by the monitor), (ii) notes that direct side-channel or unlogged calls fall outside the assumption, and (iii) cross-references the boundary theorems. This addition clarifies scope without altering the formal statements. revision: yes
Referee: [Adaptation result and observability-boundary theorems] The adaptation result for lifting the assumption is stated but lacks a detailed proof sketch or Lean-checked statement showing how the guarantees transfer when the observability boundary is relaxed; this is central to the claim that the framework extends beyond the strict mediation case.
Authors: The adaptation result is stated in the manuscript as an assumption-lifted theorem, and the core theorems (including the boundary theorems) are mechanically checked in the Lean 4 artifact. We recognize that a high-level proof sketch in the main text would improve readability and make the transfer of guarantees more transparent. In the revision we will (i) expand the presentation of the adaptation result with a concise proof sketch in §4 and (ii) verify that the Lean artifact contains the adaptation and boundary statements; if any additional formalization is required we will extend the artifact accordingly. The existing boundary theorems already formalize the limits of the relaxed case. revision: partial
Circularity Check
No circularity; formal semantics and Lean-verified theorems are self-contained.
Full rationale
The paper defines alignment contracts via explicit finite-trace semantics, safety properties, refinement rules, and decidable admissibility. The soundness theorem is stated under an explicit Effect Observability Assumption and is mechanically checked in Lean 4, with an assumption-lifted adaptation result. No step reduces a claimed prediction or theorem to a fitted parameter, self-definition, or unverified self-citation chain; the derivation chain consists of standard formal definitions and verified implications that do not presuppose their own outputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Effect Observability Assumption (domain assumption): all ΣEff-effects are mediated
invented entities (1)
- Alignment contract: no independent evidence