Lingering Authority: Revocable Resource-and-Effect Capabilities for Coding Agents

Igor Santos-Grueiro

arxiv: 2606.22504 · v1 · pith:2XNHXILKnew · submitted 2026-06-21 · 💻 cs.CR · cs.SE

Lingering Authority: Revocable Resource-and-Effect Capabilities for Coding Agents

Igor Santos-Grueiro This is my paper

Pith reviewed 2026-06-26 10:10 UTC · model grok-4.3

classification 💻 cs.CR cs.SE

keywords coding agentsrevocable capabilitiesreference monitorlingering authoritytask contractsaccess controlAI securityresource revocation

0 comments

The pith

PORTICO prevents coding agents from reusing resources or causing forbidden effects after their justifying task episode closes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PORTICO to close the gap of lingering authority, in which coding agents keep broad tool access even after the specific subgoal that required it has ended. It works by compiling an explicit task contract into initial capabilities, grant rules, trusted closure predicates, and global deny rules, then issuing opaque epoch-bound handles that are removed from the planner interface on closure. The system rejects any stale replay before side effects can occur. In controlled evaluations, PORTICO records zero executed contract-forbidden effects and rejects every post-closure reuse attempt, while an otherwise identical non-revoking comparator allows all of them. This separation lets agents receive the grants they need for productive work without leaving permanent exposure.

Core claim

PORTICO is a reference monitor that materializes capability expansions as opaque, epoch-bound handles through a request-grant-invoke lifecycle. Closure removes those handles from the next planner interface and rejects stale replay before side effects. In controlled coding-agent tasks, PORTICO records no executed contract-forbidden effects, rejects 10/10 post-closure reuses, and shows 0/6 forbidden effects in a stale-write audit, while a non-revoking comparator permits 10/10 reuses and 6/6 forbidden effects; both systems match on task success, scope compliance, and all pre-closure decisions. Scripted traces and live model traces over file writes, git mutation, and network egress reproduce the

What carries the argument

The PORTICO reference monitor, which compiles an explicit task contract into initial capabilities, grant rules, trusted closure predicates, and global deny rules, then enforces revocation via epoch-bound handles.

If this is right

Controlled grants recover boundary work that a fixed narrow envelope would block.
On the closure slice, task success, scope compliance, and pre-closure decisions remain identical to the non-revoking comparator.
PORTICO rejects all 10/10 post-closure reuses while the comparator permits all 10/10.
A deterministic stale-write audit records 0/6 executed forbidden effects versus 6/6 for the comparator.
Broad request exposure in a four-episode diagnostic preserves zero executed forbidden effects while raising blocked proposals from 67 to 84.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same contract-driven revocation pattern could be tested on agents that interact with external services beyond file and git operations.
Extending the closure predicates to handle concurrent subtasks might reduce the observed increase in blocked proposals.
Real-repository runs suggest the lifecycle can be exercised on live project layouts without requiring changes to the underlying language model.
The approach opens the possibility of policy evolution that tracks task progress rather than relying on static initial envelopes.

Load-bearing premise

The monitor assumes mediated tools and a sound typed catalog.

What would settle it

A controlled run in which PORTICO allows even one contract-forbidden effect to execute after the relevant closure would falsify the reported safety result.

Figures

Figures reproduced from arXiv: 2606.22504 by Igor Santos-Grueiro.

**Figure 2.** Figure 2: Contract excerpt with compiler-derived lifetime for [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Semantic meaning of authority closure. Both rows [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗

read the original abstract

Coding agents often receive broad tool access for an entire task, even when a resource is needed only for one subgoal. We call this gap lingering authority: a temporary resource/effect capability remains exposed after the episode that justified it has closed. PORTICO is a reference monitor for revocable capabilities exposed to the planner. It compiles an explicit task contract into initial capabilities, grant rules, trusted closure predicates, and global deny rules. A request-grant-invoke lifecycle materializes expansions as opaque, epoch-bound handles. Closure removes those handles from the next planner interface and rejects stale replay before side effects. The monitor assumes mediated tools and a sound typed catalog. In controlled coding-agent tasks, PORTICO records no executed contract-forbidden effects in the evaluated runs, while controlled grants recover boundary work blocked by a fixed narrow envelope. A non-revoking comparator receives the same initial envelope and the same grants at the same turns. On the closure slice, both systems match task success, scope compliance, and all pre-closure decisions; PORTICO then rejects 10/10 post-closure reuses, while the comparator permits 10/10. A deterministic stale-write audit records 0/6 versus 6/6 executed forbidden effects. Scripted traces and six live model traces over file writes, git mutation, and network egress show the same split. In a four-episode same-policy diagnostic, broad request exposure preserves zero executed forbidden effects but raises blocked proposals from 67 to 84. Frozen real-repository runs, with commits and traces recorded, exercise the same lifecycle on real project layouts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

PORTICO gives a clean experimental split on revocable capabilities for coding agents but the zero-forbidden claim only follows if the unverified mediation and catalog assumptions actually hold.

read the letter

The main thing to know is that PORTICO blocks post-closure reuses and forbidden effects in the reported runs while the non-revoking comparator does not, yet the security result depends on two assumptions the paper states but does not test.

What is new is the PORTICO reference monitor itself: it turns a task contract into initial capabilities, grant rules, closure predicates, and deny rules, then uses epoch-bound opaque handles in a request-grant-invoke-closure cycle. The evaluation runs the same initial envelope and grants on both systems and shows matching task success and pre-closure decisions, followed by 10/10 post-closure rejections for PORTICO versus 10/10 permits for the baseline, plus 0/6 versus 6/6 executed forbidden effects in the stale-write audit. The same pattern appears in scripted traces and six live model traces over file writes, git mutations, and network egress, plus frozen real-repository runs.

The paper does well at keeping the comparison controlled and at including both scripted and live traces plus recorded real-repo commits. That makes the split easy to inspect.

The soft spot is exactly the one the stress-test note flags. The monitor only guarantees no forbidden effects if every tool is mediated and the typed catalog classifies every side-effect of the file, git, and network operations that appear. The abstract lists those assumptions but supplies no coverage argument, proof, or audit that the catalog actually covers the operations used in the traces. If either assumption fails for even one call, the observed difference cannot be attributed to the revocation lifecycle. The paper does not address whether the catalog was built or verified for completeness.

This work is for researchers building secure LLM coding agents or exploring capability systems in practice. A reader who wants concrete numbers on revocation in agent planners will get value from the controlled comparison.

It deserves peer review. The experimental evidence is direct enough to be worth referee time, even though the authors will need to tighten the argument around the catalog and mediation assumptions.

Referee Report

2 major / 1 minor

Summary. The paper introduces PORTICO, a reference monitor that compiles task contracts into revocable, epoch-bound capabilities for coding agents. It enforces a request-grant-invoke lifecycle with closure predicates and deny rules to eliminate lingering authority after subgoals close. In controlled experiments on scripted traces and live model runs involving file writes, git mutations, and network egress, PORTICO records zero executed contract-forbidden effects post-closure (rejecting 10/10 reuses and 0/6 stale writes), while a non-revoking comparator permits all 10/10 reuses and 6/6 forbidden effects; task success and pre-closure compliance remain matched.

Significance. If the results hold under the stated assumptions, the work supplies a practical, contract-driven mechanism for limiting authority exposure in AI coding agents, directly addressing a security gap in tool-using planners. The controlled comparisons, including the deterministic stale-write audit and frozen real-repository runs with recorded commits and traces, provide concrete, falsifiable evidence of the revocation lifecycle's effect; the four-episode diagnostic further isolates the impact of broad request exposure on blocked proposals.

major comments (2)

[Abstract] Abstract and experimental evaluation: the central claim that PORTICO produces zero executed forbidden effects (0/6 in the stale-write audit, 10/10 post-closure rejections) is conditioned on the unverified assumptions of complete tool mediation and a sound typed catalog. No coverage argument, proof, or audit is supplied showing that the catalog classifies every side-effect of the file, git, and network tools appearing in the scripted and live traces; if either assumption fails for a single operation, the observed split versus the comparator cannot be attributed to the revocation mechanism.
[Experimental evaluation] Experimental setup (controlled tasks and live model traces): the methodology reports that both systems receive identical initial envelopes and grants, yet supplies no verification step or coverage metric confirming that all tool invocations in the evaluated runs were in fact mediated through the reference monitor. This verification is load-bearing for the claim that the zero forbidden effects result from PORTICO rather than from incomplete tool exposure in the test harness.

minor comments (1)

[Abstract] The abstract states that 'controlled grants recover boundary work blocked by a fixed narrow envelope' but does not define the envelope construction or report the quantitative recovery rate; a brief clarification or table entry would improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and for identifying the importance of verifying the mediation assumptions in our experiments. We respond to each major comment below.

read point-by-point responses

Referee: [Abstract] Abstract and experimental evaluation: the central claim that PORTICO produces zero executed forbidden effects (0/6 in the stale-write audit, 10/10 post-closure rejections) is conditioned on the unverified assumptions of complete tool mediation and a sound typed catalog. No coverage argument, proof, or audit is supplied showing that the catalog classifies every side-effect of the file, git, and network tools appearing in the scripted and live traces; if either assumption fails for a single operation, the observed split versus the comparator cannot be attributed to the revocation mechanism.

Authors: The manuscript explicitly states that the monitor assumes mediated tools and a sound typed catalog. The experimental results, including the zero forbidden effects, are presented under these assumptions. We do not supply a coverage argument or proof because the contribution centers on the revocable capability lifecycle rather than exhaustive tool verification. In the test harness, tool invocations are mediated by construction for the evaluated operations. Since the comparator uses the identical harness, the performance split isolates the effect of revocation. We will revise the manuscript to expand the discussion of the catalog and mediation assumptions in the abstract and evaluation sections. revision: partial
Referee: [Experimental evaluation] Experimental setup (controlled tasks and live model traces): the methodology reports that both systems receive identical initial envelopes and grants, yet supplies no verification step or coverage metric confirming that all tool invocations in the evaluated runs were in fact mediated through the reference monitor. This verification is load-bearing for the claim that the zero forbidden effects result from PORTICO rather than from incomplete tool exposure in the test harness.

Authors: We agree that an explicit verification step would strengthen the attribution. The current description indicates that the harness provides the same envelopes and that tools are accessed via the monitor, but does not include a separate coverage metric or audit log. In revision, we will add details on the mediation layer in the experimental setup to confirm that all invocations in the scripted and live traces were routed through PORTICO, thereby supporting that the results stem from the revocation mechanism. revision: partial

Circularity Check

0 steps flagged

No circularity; experimental claims rest on direct comparisons without self-referential derivations or load-bearing self-citations

full rationale

The paper describes a reference monitor (PORTICO) and reports results from scripted traces, live model runs, and audits comparing it to a non-revoking comparator. No equations, fitted parameters, or derivation chains appear. The central claims (zero forbidden effects under PORTICO vs. 6/6 under comparator) are produced by the experimental setup itself rather than by any reduction to prior self-citations or definitional identities. The stated assumptions (mediated tools, sound catalog) are explicit preconditions but are not used to derive the observed split; they are simply the conditions under which the monitor operates. This matches the default expectation of no significant circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Review limited to abstract; no free parameters identified; one explicit domain assumption stated.

axioms (1)

domain assumption mediated tools and a sound typed catalog
Explicitly listed as required for the reference monitor to operate.

invented entities (1)

PORTICO no independent evidence
purpose: Reference monitor implementing revocable resource-and-effect capabilities for coding agent planners
Newly introduced system whose design and evaluation form the core of the paper.

pith-pipeline@v0.9.1-grok · 5818 in / 1294 out tokens · 31885 ms · 2026-06-26T10:10:35.231757+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

63 extracted references · 10 linked inside Pith

[1]

Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt in- jection

Sahar Abdelnabi, Kai Greshake, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt in- jection. InProceedings of the 16th ACM Workshop on Artificial Intelligence and Security, AISec ’23, pages 79–90, Copenhagen, Denmark, 2023. Association...

2023
[2]

Zico Kolter, Matt Fredrik- son, Yarin Gal, and Xander Davies

Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian, Derek Duenas, Maxwell Lin, Justin Wang, Dan Hendrycks, Andy Zou, J. Zico Kolter, Matt Fredrik- son, Yarin Gal, and Xander Davies. AgentHarm: A benchmark for measuring harmfulness of LLM agents. InThe Thirteenth International Conference on Learning Representations, Singapore, 2025. OpenReview.net

2025
[3]

Con- tract2Tool: Learning preconditions and effects for reli- able tool-augmented LLM agents

Rahul Suresh Babu and Laxmipriya Ganesh Iyer. Con- tract2Tool: Learning preconditions and effects for reli- able tool-augmented LLM agents. arXiv:2606.07904, 2026

Pith/arXiv arXiv 2026
[4]

Tool- ChoiceConfusion: Causal minimal tool filtering for reli- able LLM agents

Rahul Suresh Babu and Laxmipriya Ganesh Iyer. Tool- ChoiceConfusion: Causal minimal tool filtering for reli- able LLM agents. arXiv:2606.06284, 2026

Pith/arXiv arXiv 2026
[5]

Design patterns for securing LLM agents against prompt injections

Luca Beurer-Kellner, Beat Buesser, Ana-Maria Cre¸ tu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, Florian Tramèr, and Václav V olhejn. Design patterns for securing LLM agents against prompt injections. arXiv:2506.08837, 2025

arXiv 2025
[6]

Taylor, Krishnamurthy Dj Dvijotham, and Alexandre Lacoste

Rishika Bhagwatkar, Kevin Kasa, Abhay Puri, Gabriel Huang, Irina Rish, Graham W. Taylor, Krishnamurthy Dj Dvijotham, and Alexandre Lacoste. Indirect prompt injections: Are firewalls all you need, or stronger bench- marks? arXiv:2510.05244, 2025

arXiv 2025
[7]

AgentBound: Securing execution boundaries of AI agents.Proceedings of the ACM on Software Engineering, 3(FSE), 2026

Christoph Bühler, Matteo Biagiola, Luca Di Grazia, and Guido Salvaneschi. AgentBound: Securing execution boundaries of AI agents.Proceedings of the ACM on Software Engineering, 3(FSE), 2026

2026
[8]

LlamaFire- wall: An open source guardrail system for building se- cure AI agents

Sahana Chennabasappa, Cyrus Nikolaidis, Daniel Song, David Molnar, Stephanie Ding, Shengye Wan, Spencer Whitman, Lauren Deason, Nicholas Doucette, Abraham Montilla, Alekhya Gampa, Beto de Paola, Dominik Gabi, James Crnkovich, Jean-Christophe Testud, Kat He, Rash- nil Chaturvedi, Wu Zhou, and Joshua Saxe. LlamaFire- wall: An open source guardrail system fo...

arXiv 2025
[9]

Maris: A formally verifiable privacy policy enforce- ment paradigm for multi-agent collaboration systems

Jian Cui, Zichuan Li, Luyi Xing, and Xiaojing Liao. Maris: A formally verifiable privacy policy enforce- ment paradigm for multi-agent collaboration systems. arXiv:2505.04799, 2025

Pith/arXiv arXiv 2025
[10]

Defeating prompt injections by design

Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, and Florian Tramèr. Defeating prompt injections by design. InIEEE Con- ference on Secure and Trustworthy Machine Learning (SaTML), Munich, Germany, 2026

2026
[11]

AgentDojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents

Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. AgentDojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents. InAd- vances in Neural Information Processing Systems, vol- ume 37, pages 82895–82920, Vancouver, BC, Canada,
[12]

Curran Associates, Inc
[13]

Mind2Web: Towards a generalist agent for the web

Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Sam Stevens, Boshi Wang, Huan Sun, and Yu Su. Mind2Web: Towards a generalist agent for the web. InAdvances in Neural Information Processing Systems, volume 36, pages 28091–28114, New Orleans, LA, USA, 2023. Cur- ran Associates, Inc

2023
[14]

Laradji, Manuel Del Verme, Tom Marty, David Vázquez, Nicolas Chapados, and Alexandre Lacoste

Alexandre Drouin, Maxime Gasse, Massimo Caccia, Is- sam H. Laradji, Manuel Del Verme, Tom Marty, David Vázquez, Nicolas Chapados, and Alexandre Lacoste. WorkArena: How capable are web agents at solving common knowledge work tasks? InForty-first Interna- tional Conference on Machine Learning, volume 235 ofProceedings of Machine Learning Research, pages 116...

2024
[15]

Gray and David R

Cary G. Gray and David R. Cheriton. Leases: An effi- cient fault-tolerant mechanism for distributed file cache 15 consistency. InProceedings of the Twelfth ACM Sympo- sium on Operating Systems Principles, pages 202–210, 1989

1989
[16]

MCPXKIT: The unified toolkit for analyzing model context protocol security

Yongjian Guo, Puzhuo Liu, Wanlun Ma, Zehang Deng, Xiaogang Zhu, Peng Di, Xi Xiao, and Sheng Wen. MCPXKIT: The unified toolkit for analyzing model context protocol security. arXiv:2508.12538, 2025

Pith/arXiv arXiv 2025
[17]

SMCP: Secure model context protocol

Xinyi Hou, Shenao Wang, Yifan Zhang, Ziluo Xue, Yan- jie Zhao, Cai Fu, and Haoyu Wang. SMCP: Secure model context protocol. arXiv:2602.01129, 2026

arXiv 2026
[18]

Model context protocol (MCP): Land- scape, security threats, and future research directions

Xinyi Hou, Yanjie Zhao, Shenao Wang, and Haoyu Wang. Model context protocol (MCP): Land- scape, security threats, and future research directions. arXiv:2503.23278, 2025

Pith/arXiv arXiv 2025
[19]

MalTool: Malicious tool attacks on LLM agents

Yuepeng Hu, Yuqi Jia, Mengyuan Li, Dawn Song, and Neil Gong. MalTool: Malicious tool attacks on LLM agents. arXiv:2602.12194, 2026

Pith/arXiv arXiv 2026
[20]

Ca- pability minimization as a safety primitive: Risk- aware causal gating for least-privilege LLM agents

Laxmipriya Ganesh Iyer and Rahul Suresh Babu. Ca- pability minimization as a safety primitive: Risk- aware causal gating for least-privilege LLM agents. arXiv:2606.13884, 2026

arXiv 2026
[21]

The gate is only as honest as its contracts: Contract- Guard for the contract layer of risk-aware causal gating

Laxmipriya Ganesh Iyer and Rahul Suresh Babu. The gate is only as honest as its contracts: Contract- Guard for the contract layer of risk-aware causal gating. arXiv:2606.18550, 2026

Pith/arXiv arXiv 2026
[22]

Se- mantic attacks on tool-augmented LLMs: Securing the model context protocol against descriptor-level manipulation

Saeid Jamshidi, Arghavan Moradi Dakhel, Kawser Wazed Nafi, and Foutse Khomh. Se- mantic attacks on tool-augmented LLMs: Securing the model context protocol against descriptor-level manipulation. arXiv:2512.06556, 2025

Pith/arXiv arXiv 2025
[23]

Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik R

Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik R. Narasimhan. SWE-bench: Can language models resolve real-world GitHub issues? InThe Twelfth International Conference on Learning Representations, Vienna, Aus- tria, 2024. OpenReview.net

2024
[24]

Prompt flow integrity to prevent privilege escalation in LLM agents

Juhee Kim, Woohyuk Choi, and Byoungyoung Lee. Prompt flow integrity to prevent privilege escalation in LLM agents. arXiv:2503.15547, 2025

arXiv 2025
[25]

Le, and Tomas Pfister

Minbeom Kim, Mihir Parmar, Phillip Wallis, Lesly Mi- culicich, Kyomin Jung, Krishnamurthy Dj Dvijotham, Long T. Le, and Tomas Pfister. CausalArmor: Efficient indirect prompt injection guardrails via causal attribu- tion. arXiv:2602.07918, 2026

arXiv 2026
[26]

VisualWebArena: Evaluating multimodal agents on re- alistic visual web tasks

Jing Yu Koh, Robert Lo, Lawrence Jang, Vikram Duvvur, Ming Chong Lim, Po-Yu Huang, Graham Neu- big, Shuyan Zhou, Russ Salakhutdinov, and Daniel Fried. VisualWebArena: Evaluating multimodal agents on re- alistic visual web tasks. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 881–905...

2024
[27]

MobileWorld: Benchmarking au- tonomous mobile agents in agent-user interactive and MCP-augmented environments

Quyu Kong, Xu Zhang, Zhenyu Yang, Nolan Gao, Chen Liu, Panrong Tong, Chenglin Cai, Hanzhang Zhou, Jianan Zhang, Liangyu Chen, Zhidan Liu, Steven Hoi, and Yue Wang. MobileWorld: Benchmarking au- tonomous mobile agents in agent-user interactive and MCP-augmented environments. arXiv:2512.19432, 2025

arXiv 2025
[28]

Benchmarking mobile device control agents across di- verse configurations

Juyong Lee, Taywon Min, Minyong An, Dongyoon Hahm, Haeone Lee, Changyeon Kim, and Kimin Lee. Benchmarking mobile device control agents across di- verse configurations. arXiv:2404.16660, 2024

arXiv 2024
[29]

ACE: A security architecture for LLM-integrated app systems

Evan Li, Tushin Mallick, Evan Rose, William Robert- son, Alina Oprea, and Cristina Nita-Rotaru. ACE: A security architecture for LLM-integrated app systems. InNetwork and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 2026. The Internet Soci- ety

2026
[30]

MCP-ITP: An automated framework for implicit tool poisoning in MCP

Ruiqi Li, Zhiqiang Wang, Yunhao Yao, and Xiang-Yang Li. MCP-ITP: An automated framework for implicit tool poisoning in MCP. arXiv:2601.07395, 2026

arXiv 2026
[31]

Linux man-pages project, 2025

Linux man-pages project.seccomp(2) — Linux manual page. Linux man-pages project, 2025

2025
[32]

AgentBench: Evaluating LLMs as agents

Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, Shudan Zhang, Xiang Deng, Aohan Zeng, Zhengxiao Du, Chenhui Zhang, Sheng Shen, Tianjun Zhang, Yu Su, Huan Sun, Minlie Huang, Yuxiao Dong, and Jie Tang. AgentBench: Evaluating LLMs as agents. InThe Twelfth International Conference on Learn- ing...

2024
[33]

AgentBoard: An analytical evaluation board of multi-turn LLM agents

Chang Ma, Junlei Zhang, Zhihao Zhu, Cheng Yang, Yu- jiu Yang, Yaohui Jin, Zhenzhong Lan, Lingpeng Kong, and Junxian He. AgentBoard: An analytical evaluation board of multi-turn LLM agents. InAdvances in Neu- ral Information Processing Systems, volume 37, pages 74325–74362, Vancouver, BC, Canada, 2024. Curran Associates, Inc

2024
[34]

Breaking the pro- tocol: Security analysis of the model context protocol specification and prompt injection vulnerabilities in tool- integrated LLM agents

Narek Maloyan and Dmitry Namiot. Breaking the pro- tocol: Security analysis of the model context protocol specification and prompt injection vulnerabilities in tool- integrated LLM agents. arXiv:2601.17549, 2026. 16

arXiv 2026
[35]

Quantifying frontier LLM capabilities for container sandbox escape

Rahul Marchand, Art O Cathain, Jerome Wynne, Philip- pos Maximos Giavridis, Sam Deverett, John Wilkin- son, Jason Gwartz, and Harry Coppock. Quantifying frontier LLM capabilities for container sandbox escape. arXiv:2603.02277, 2026

arXiv 2026
[36]

ceLLMate: Sandboxing browser AI agents

Luoxi Meng, Henry Feng, Ilia Shumailov, and Earlence Fernandes. ceLLMate: Sandboxing browser AI agents. arXiv:2512.12594, 2025

arXiv 2025
[37]

GAIA: A bench- mark for general AI assistants

Grégoire Mialon, Clémentine Fourrier, Thomas Wolf, Yann LeCun, and Thomas Scialom. GAIA: A bench- mark for general AI assistants. InThe Twelfth Interna- tional Conference on Learning Representations, Vienna, Austria, 2024. OpenReview.net

2024
[38]

Attractive metadata attack: Inducing LLM agents to invoke malicious tools

Kanghua Mo, Li Hu, Yucheng Long, and Zhihao Li. Attractive metadata attack: Inducing LLM agents to invoke malicious tools. arXiv:2508.02110, 2025

arXiv 2025
[39]

Formal policy enforcement for real-world agentic systems

Nils Palumbo, Sarthak Choudhary, Jihye Choi, Guy Amir, Prasad Chalasani, and Somesh Jha. Formal policy enforcement for real-world agentic systems. arXiv:2602.16708, 2026

Pith/arXiv arXiv 2026
[40]

The UCONABC usage control model.ACM Transactions on Information and System Security, 7(1):128–174, 2004

Jaehong Park and Ravi Sandhu. The UCONABC usage control model.ACM Transactions on Information and System Security, 7(1):128–174, 2004

2004
[41]

SWE-PolyBench: A multi-language benchmark for repository-level evaluation of coding agents

Muhammad Shihab Rashid, Christian Bock, Yuan Zhuang, Alexander Buccholz, Tim Esler, Simon Valentin, Luca Franceschi, Martin Wistuba, Prabhu Teja Sivaprasad, Woo Jung Kim, et al. SWE-PolyBench: A multi-language benchmark for repository-level evaluation of coding agents. arXiv:2504.08703, 2025

arXiv 2025
[42]

AndroidWorld: A dynamic benchmarking environment for autonomous agents

Chris Rawles, Sarah Clinckemaillie, Yifan Chang, Jonathan Waltz, Gabrielle Lau, Marybeth Fair, Alice Li, William Bishop, Wei Li, Folawiyo Campbell-Ajala, Daniel Toyama, Robert Berry, Divya Tyamagundlu, Tim- othy Lillicrap, and Oriana Riva. AndroidWorld: A dynamic benchmarking environment for autonomous agents. InThe Thirteenth International Conference on ...

2025
[43]

Maddison, and Tatsunori Hashimoto

Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, Yongchao Zhou, Jimmy Ba, Yann Dubois, Chris J. Maddison, and Tatsunori Hashimoto. ToolEmu: Iden- tifying the risks of LM agents with an LM-emulated sandbox. InThe Twelfth International Conference on Learning Representations, Vienna, Austria, 2024. Open- Review.net

2024
[44]

Saltzer and Michael D

Jerome H. Saltzer and Michael D. Schroeder. The pro- tection of information in computer systems.Proceed- ings of the IEEE, 63(9):1278–1308, 1975

1975
[45]

Prompt injec- tion attack to tool selection in LLM agents

Jiawen Shi, Zenghui Yuan, Guiyao Tie, Pan Zhou, Neil Zhenqiang Gong, and Lichao Sun. Prompt injec- tion attack to tool selection in LLM agents. InNetwork and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 2026. The Internet Society

2026
[46]

Pro- gent: Securing AI agents with privilege control

Tianneng Shi, Jingxuan He, Zhun Wang, Linyu Wu, Hongwei Li, Wenbo Guo, and Dawn Song. Pro- gent: Securing AI agents with privilege control. arXiv:2504.11703, 2025

Pith/arXiv arXiv 2025
[47]

PromptArmor: Simple yet effective prompt injection defenses

Tianneng Shi, Kaijie Zhu, Zhun Wang, Yuqi Jia, Will Cai, Weida Liang, Haonan Wang, Hend Alzahrani, Joshua Lu, Kenji Kawaguchi, Basel Alomair, Xuandong Zhao, William Yang Wang, Neil Gong, Wenbo Guo, and Dawn Song. PromptArmor: Simple yet effective prompt injection defenses. arXiv:2507.15219, 2025

arXiv 2025
[48]

SAGA: A security archi- tecture for governing AI agentic systems

Georgios Syros, Anshuman Suri, Jacob Ginesin, Cristina Nita-Rotaru, and Alina Oprea. SAGA: A security archi- tecture for governing AI agentic systems. InNetwork and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 2026. The Internet Society

2026
[49]

Poskitt, and Jun Sun

Haoyu Wang, Christopher M. Poskitt, and Jun Sun. AgentSpec: Customizable runtime enforcement for safe and reliable LLM agents. InProceedings of the IEEE/ACM International Conference on Software Engi- neering (ICSE 2026), pages 1–12, Rio de Janeiro, Brazil,

2026
[50]

Association for Computing Machinery
[51]

MCPTox: A benchmark for tool poisoning attack on real-world MCP servers

Zhiqiang Wang, Yichao Gao, Yanting Wang, Suyuan Liu, Haifeng Sun, Haoran Cheng, Guanquan Shi, Hao- hua Du, and Xiangyang Li. MCPTox: A benchmark for tool poisoning attack on real-world MCP servers. arXiv:2508.14925, 2025

arXiv 2025
[52]

Robert N. M. Watson, Jonathan Anderson, Ben Laurie, and Kris Kennaway. Capsicum: Practical capabilities for UNIX. In19th USENIX Security Symposium (USENIX Security 2010), pages 29–44, Washington, DC, USA,

2010
[53]

Robert N. M. Watson, Peter G. Neumann, Jonathan Woodruff, Simon W. Moore, Jonathan Anderson, David Chisnall, Nirav Dave, Brooks Davis, Khilan Gudka, and Ben Laurie. CHERI: A hybrid capability-system archi- tecture for scalable software compartmentalization. In 2015 IEEE Symposium on Security and Privacy, pages 20–37, San Jose, CA, USA, 2015. IEEE Computer...

2015
[54]

IsolateGPT: An execution iso- lation architecture for LLM-based systems

Yuhao Wu, Franziska Roesner, Tadayoshi Kohno, Ning Zhang, and Umar Iqbal. IsolateGPT: An execution iso- lation architecture for LLM-based systems. InNetwork and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 2025. The Internet Society. 17

2025
[55]

OSWorld: Benchmarking multimodal agents for open-ended tasks in real com- puter environments

Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhou- jun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yi- heng Xu, Shuyan Zhou, Silvio Savarese, Caiming Xiong, Victor Zhong, and Tao Yu. OSWorld: Benchmarking multimodal agents for open-ended tasks in real com- puter environments. InAdvances in Neural Informa...

2024
[56]

MCPSecBench: A sys- tematic security benchmark and playground for testing model context protocols

Yixuan Yang, Cuifeng Gao, Daoyuan Wu, Yufan Chen, Yingjiu Li, and Shuai Wang. MCPSecBench: A sys- tematic security benchmark and playground for testing model context protocols. arXiv:2508.13220, 2025

arXiv 2025
[57]

Multi-agent memory from a computer ar- chitecture perspective: Visions and challenges ahead

Zhongming Yu, Naicheng Yu, Hejia Zhang, Wentao Ni, Mingrui Yin, Jiaying Yang, Yujie Zhao, and Jishen Zhao. Multi-agent memory from a computer ar- chitecture perspective: Visions and challenges ahead. arXiv:2603.10062, 2026

arXiv 2026
[58]

InjecAgent: Benchmarking indirect prompt injec- tions in tool-integrated large language model agents

Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. InjecAgent: Benchmarking indirect prompt injec- tions in tool-integrated large language model agents. In Findings of the Association for Computational Linguis- tics: ACL 2024, pages 10471–10506, Bangkok, Thai- land, 2024. Association for Computational Linguistics

2024
[59]

Agent security bench (ASB): For- malizing and benchmarking attacks and defenses in LLM-based agents

Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, and Yongfeng Zhang. Agent security bench (ASB): For- malizing and benchmarking attacks and defenses in LLM-based agents. InThe Thirteenth International Conference on Learning Representations, Singapore,
[60]

AgentAlign: Navigating safety alignment in the shift from informative to agentic large language models

Jinchuan Zhang, Lu Yin, Yan Zhou, and Songlin Hu. AgentAlign: Navigating safety alignment in the shift from informative to agentic large language models. arXiv:2505.23020, 2025

arXiv 2025
[61]

AgentCgroup: Under- standing and controlling OS resources of AI agents

Yusheng Zheng, Jiakun Fan, Quanzhi Fu, Yiwei Yang, Wei Zhang, and Andi Quinn. AgentCgroup: Under- standing and controlling OS resources of AI agents. arXiv:2602.09345, 2026

arXiv 2026
[62]

Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, and Graham Neu- big

Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, and Graham Neu- big. WebArena: A realistic web environment for build- ing autonomous agents. InThe Twelfth International Conference on Learning Representations, Vienna, Aus- tria, 2024. OpenReview.net

2024
[63]

Patil, Vivian Fang, and Raluca Ada Popa

Jinhao Zhu, Kevin Tseng, Gil Vernik, Xiao Huang, Shishir G. Patil, Vivian Fang, and Raluca Ada Popa. MiniScope: A least privilege framework for authorizing tool calling agents. arXiv:2512.11147, 2025. A Representative Trace Witnesses The main text uses compact running examples. This appendix records the corresponding checked-in witnesses and the tran- sit...

arXiv 2025

[1] [1]

Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt in- jection

Sahar Abdelnabi, Kai Greshake, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt in- jection. InProceedings of the 16th ACM Workshop on Artificial Intelligence and Security, AISec ’23, pages 79–90, Copenhagen, Denmark, 2023. Association...

2023

[2] [2]

Zico Kolter, Matt Fredrik- son, Yarin Gal, and Xander Davies

Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian, Derek Duenas, Maxwell Lin, Justin Wang, Dan Hendrycks, Andy Zou, J. Zico Kolter, Matt Fredrik- son, Yarin Gal, and Xander Davies. AgentHarm: A benchmark for measuring harmfulness of LLM agents. InThe Thirteenth International Conference on Learning Representations, Singapore, 2025. OpenReview.net

2025

[3] [3]

Con- tract2Tool: Learning preconditions and effects for reli- able tool-augmented LLM agents

Rahul Suresh Babu and Laxmipriya Ganesh Iyer. Con- tract2Tool: Learning preconditions and effects for reli- able tool-augmented LLM agents. arXiv:2606.07904, 2026

Pith/arXiv arXiv 2026

[4] [4]

Tool- ChoiceConfusion: Causal minimal tool filtering for reli- able LLM agents

Rahul Suresh Babu and Laxmipriya Ganesh Iyer. Tool- ChoiceConfusion: Causal minimal tool filtering for reli- able LLM agents. arXiv:2606.06284, 2026

Pith/arXiv arXiv 2026

[5] [5]

Design patterns for securing LLM agents against prompt injections

Luca Beurer-Kellner, Beat Buesser, Ana-Maria Cre¸ tu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, Florian Tramèr, and Václav V olhejn. Design patterns for securing LLM agents against prompt injections. arXiv:2506.08837, 2025

arXiv 2025

[6] [6]

Taylor, Krishnamurthy Dj Dvijotham, and Alexandre Lacoste

Rishika Bhagwatkar, Kevin Kasa, Abhay Puri, Gabriel Huang, Irina Rish, Graham W. Taylor, Krishnamurthy Dj Dvijotham, and Alexandre Lacoste. Indirect prompt injections: Are firewalls all you need, or stronger bench- marks? arXiv:2510.05244, 2025

arXiv 2025

[7] [7]

AgentBound: Securing execution boundaries of AI agents.Proceedings of the ACM on Software Engineering, 3(FSE), 2026

Christoph Bühler, Matteo Biagiola, Luca Di Grazia, and Guido Salvaneschi. AgentBound: Securing execution boundaries of AI agents.Proceedings of the ACM on Software Engineering, 3(FSE), 2026

2026

[8] [8]

LlamaFire- wall: An open source guardrail system for building se- cure AI agents

Sahana Chennabasappa, Cyrus Nikolaidis, Daniel Song, David Molnar, Stephanie Ding, Shengye Wan, Spencer Whitman, Lauren Deason, Nicholas Doucette, Abraham Montilla, Alekhya Gampa, Beto de Paola, Dominik Gabi, James Crnkovich, Jean-Christophe Testud, Kat He, Rash- nil Chaturvedi, Wu Zhou, and Joshua Saxe. LlamaFire- wall: An open source guardrail system fo...

arXiv 2025

[9] [9]

Maris: A formally verifiable privacy policy enforce- ment paradigm for multi-agent collaboration systems

Jian Cui, Zichuan Li, Luyi Xing, and Xiaojing Liao. Maris: A formally verifiable privacy policy enforce- ment paradigm for multi-agent collaboration systems. arXiv:2505.04799, 2025

Pith/arXiv arXiv 2025

[10] [10]

Defeating prompt injections by design

Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, and Florian Tramèr. Defeating prompt injections by design. InIEEE Con- ference on Secure and Trustworthy Machine Learning (SaTML), Munich, Germany, 2026

2026

[11] [11]

AgentDojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents

Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. AgentDojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents. InAd- vances in Neural Information Processing Systems, vol- ume 37, pages 82895–82920, Vancouver, BC, Canada,

[12] [12]

Curran Associates, Inc

[13] [13]

Mind2Web: Towards a generalist agent for the web

Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Sam Stevens, Boshi Wang, Huan Sun, and Yu Su. Mind2Web: Towards a generalist agent for the web. InAdvances in Neural Information Processing Systems, volume 36, pages 28091–28114, New Orleans, LA, USA, 2023. Cur- ran Associates, Inc

2023

[14] [14]

Laradji, Manuel Del Verme, Tom Marty, David Vázquez, Nicolas Chapados, and Alexandre Lacoste

Alexandre Drouin, Maxime Gasse, Massimo Caccia, Is- sam H. Laradji, Manuel Del Verme, Tom Marty, David Vázquez, Nicolas Chapados, and Alexandre Lacoste. WorkArena: How capable are web agents at solving common knowledge work tasks? InForty-first Interna- tional Conference on Machine Learning, volume 235 ofProceedings of Machine Learning Research, pages 116...

2024

[15] [15]

Gray and David R

Cary G. Gray and David R. Cheriton. Leases: An effi- cient fault-tolerant mechanism for distributed file cache 15 consistency. InProceedings of the Twelfth ACM Sympo- sium on Operating Systems Principles, pages 202–210, 1989

1989

[16] [16]

MCPXKIT: The unified toolkit for analyzing model context protocol security

Yongjian Guo, Puzhuo Liu, Wanlun Ma, Zehang Deng, Xiaogang Zhu, Peng Di, Xi Xiao, and Sheng Wen. MCPXKIT: The unified toolkit for analyzing model context protocol security. arXiv:2508.12538, 2025

Pith/arXiv arXiv 2025

[17] [17]

SMCP: Secure model context protocol

Xinyi Hou, Shenao Wang, Yifan Zhang, Ziluo Xue, Yan- jie Zhao, Cai Fu, and Haoyu Wang. SMCP: Secure model context protocol. arXiv:2602.01129, 2026

arXiv 2026

[18] [18]

Model context protocol (MCP): Land- scape, security threats, and future research directions

Xinyi Hou, Yanjie Zhao, Shenao Wang, and Haoyu Wang. Model context protocol (MCP): Land- scape, security threats, and future research directions. arXiv:2503.23278, 2025

Pith/arXiv arXiv 2025

[19] [19]

MalTool: Malicious tool attacks on LLM agents

Yuepeng Hu, Yuqi Jia, Mengyuan Li, Dawn Song, and Neil Gong. MalTool: Malicious tool attacks on LLM agents. arXiv:2602.12194, 2026

Pith/arXiv arXiv 2026

[20] [20]

Ca- pability minimization as a safety primitive: Risk- aware causal gating for least-privilege LLM agents

Laxmipriya Ganesh Iyer and Rahul Suresh Babu. Ca- pability minimization as a safety primitive: Risk- aware causal gating for least-privilege LLM agents. arXiv:2606.13884, 2026

arXiv 2026

[21] [21]

The gate is only as honest as its contracts: Contract- Guard for the contract layer of risk-aware causal gating

Laxmipriya Ganesh Iyer and Rahul Suresh Babu. The gate is only as honest as its contracts: Contract- Guard for the contract layer of risk-aware causal gating. arXiv:2606.18550, 2026

Pith/arXiv arXiv 2026

[22] [22]

Se- mantic attacks on tool-augmented LLMs: Securing the model context protocol against descriptor-level manipulation

Saeid Jamshidi, Arghavan Moradi Dakhel, Kawser Wazed Nafi, and Foutse Khomh. Se- mantic attacks on tool-augmented LLMs: Securing the model context protocol against descriptor-level manipulation. arXiv:2512.06556, 2025

Pith/arXiv arXiv 2025

[23] [23]

Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik R

Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik R. Narasimhan. SWE-bench: Can language models resolve real-world GitHub issues? InThe Twelfth International Conference on Learning Representations, Vienna, Aus- tria, 2024. OpenReview.net

2024

[24] [24]

Prompt flow integrity to prevent privilege escalation in LLM agents

Juhee Kim, Woohyuk Choi, and Byoungyoung Lee. Prompt flow integrity to prevent privilege escalation in LLM agents. arXiv:2503.15547, 2025

arXiv 2025

[25] [25]

Le, and Tomas Pfister

Minbeom Kim, Mihir Parmar, Phillip Wallis, Lesly Mi- culicich, Kyomin Jung, Krishnamurthy Dj Dvijotham, Long T. Le, and Tomas Pfister. CausalArmor: Efficient indirect prompt injection guardrails via causal attribu- tion. arXiv:2602.07918, 2026

arXiv 2026

[26] [26]

VisualWebArena: Evaluating multimodal agents on re- alistic visual web tasks

Jing Yu Koh, Robert Lo, Lawrence Jang, Vikram Duvvur, Ming Chong Lim, Po-Yu Huang, Graham Neu- big, Shuyan Zhou, Russ Salakhutdinov, and Daniel Fried. VisualWebArena: Evaluating multimodal agents on re- alistic visual web tasks. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 881–905...

2024

[27] [27]

MobileWorld: Benchmarking au- tonomous mobile agents in agent-user interactive and MCP-augmented environments

Quyu Kong, Xu Zhang, Zhenyu Yang, Nolan Gao, Chen Liu, Panrong Tong, Chenglin Cai, Hanzhang Zhou, Jianan Zhang, Liangyu Chen, Zhidan Liu, Steven Hoi, and Yue Wang. MobileWorld: Benchmarking au- tonomous mobile agents in agent-user interactive and MCP-augmented environments. arXiv:2512.19432, 2025

arXiv 2025

[28] [28]

Benchmarking mobile device control agents across di- verse configurations

Juyong Lee, Taywon Min, Minyong An, Dongyoon Hahm, Haeone Lee, Changyeon Kim, and Kimin Lee. Benchmarking mobile device control agents across di- verse configurations. arXiv:2404.16660, 2024

arXiv 2024

[29] [29]

ACE: A security architecture for LLM-integrated app systems

Evan Li, Tushin Mallick, Evan Rose, William Robert- son, Alina Oprea, and Cristina Nita-Rotaru. ACE: A security architecture for LLM-integrated app systems. InNetwork and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 2026. The Internet Soci- ety

2026

[30] [30]

MCP-ITP: An automated framework for implicit tool poisoning in MCP

Ruiqi Li, Zhiqiang Wang, Yunhao Yao, and Xiang-Yang Li. MCP-ITP: An automated framework for implicit tool poisoning in MCP. arXiv:2601.07395, 2026

arXiv 2026

[31] [31]

Linux man-pages project, 2025

Linux man-pages project.seccomp(2) — Linux manual page. Linux man-pages project, 2025

2025

[32] [32]

AgentBench: Evaluating LLMs as agents

Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, Shudan Zhang, Xiang Deng, Aohan Zeng, Zhengxiao Du, Chenhui Zhang, Sheng Shen, Tianjun Zhang, Yu Su, Huan Sun, Minlie Huang, Yuxiao Dong, and Jie Tang. AgentBench: Evaluating LLMs as agents. InThe Twelfth International Conference on Learn- ing...

2024

[33] [33]

AgentBoard: An analytical evaluation board of multi-turn LLM agents

Chang Ma, Junlei Zhang, Zhihao Zhu, Cheng Yang, Yu- jiu Yang, Yaohui Jin, Zhenzhong Lan, Lingpeng Kong, and Junxian He. AgentBoard: An analytical evaluation board of multi-turn LLM agents. InAdvances in Neu- ral Information Processing Systems, volume 37, pages 74325–74362, Vancouver, BC, Canada, 2024. Curran Associates, Inc

2024

[34] [34]

Breaking the pro- tocol: Security analysis of the model context protocol specification and prompt injection vulnerabilities in tool- integrated LLM agents

Narek Maloyan and Dmitry Namiot. Breaking the pro- tocol: Security analysis of the model context protocol specification and prompt injection vulnerabilities in tool- integrated LLM agents. arXiv:2601.17549, 2026. 16

arXiv 2026

[35] [35]

Quantifying frontier LLM capabilities for container sandbox escape

Rahul Marchand, Art O Cathain, Jerome Wynne, Philip- pos Maximos Giavridis, Sam Deverett, John Wilkin- son, Jason Gwartz, and Harry Coppock. Quantifying frontier LLM capabilities for container sandbox escape. arXiv:2603.02277, 2026

arXiv 2026

[36] [36]

ceLLMate: Sandboxing browser AI agents

Luoxi Meng, Henry Feng, Ilia Shumailov, and Earlence Fernandes. ceLLMate: Sandboxing browser AI agents. arXiv:2512.12594, 2025

arXiv 2025

[37] [37]

GAIA: A bench- mark for general AI assistants

Grégoire Mialon, Clémentine Fourrier, Thomas Wolf, Yann LeCun, and Thomas Scialom. GAIA: A bench- mark for general AI assistants. InThe Twelfth Interna- tional Conference on Learning Representations, Vienna, Austria, 2024. OpenReview.net

2024

[38] [38]

Attractive metadata attack: Inducing LLM agents to invoke malicious tools

Kanghua Mo, Li Hu, Yucheng Long, and Zhihao Li. Attractive metadata attack: Inducing LLM agents to invoke malicious tools. arXiv:2508.02110, 2025

arXiv 2025

[39] [39]

Formal policy enforcement for real-world agentic systems

Nils Palumbo, Sarthak Choudhary, Jihye Choi, Guy Amir, Prasad Chalasani, and Somesh Jha. Formal policy enforcement for real-world agentic systems. arXiv:2602.16708, 2026

Pith/arXiv arXiv 2026

[40] [40]

The UCONABC usage control model.ACM Transactions on Information and System Security, 7(1):128–174, 2004

Jaehong Park and Ravi Sandhu. The UCONABC usage control model.ACM Transactions on Information and System Security, 7(1):128–174, 2004

2004

[41] [41]

SWE-PolyBench: A multi-language benchmark for repository-level evaluation of coding agents

Muhammad Shihab Rashid, Christian Bock, Yuan Zhuang, Alexander Buccholz, Tim Esler, Simon Valentin, Luca Franceschi, Martin Wistuba, Prabhu Teja Sivaprasad, Woo Jung Kim, et al. SWE-PolyBench: A multi-language benchmark for repository-level evaluation of coding agents. arXiv:2504.08703, 2025

arXiv 2025

[42] [42]

AndroidWorld: A dynamic benchmarking environment for autonomous agents

Chris Rawles, Sarah Clinckemaillie, Yifan Chang, Jonathan Waltz, Gabrielle Lau, Marybeth Fair, Alice Li, William Bishop, Wei Li, Folawiyo Campbell-Ajala, Daniel Toyama, Robert Berry, Divya Tyamagundlu, Tim- othy Lillicrap, and Oriana Riva. AndroidWorld: A dynamic benchmarking environment for autonomous agents. InThe Thirteenth International Conference on ...

2025

[43] [43]

Maddison, and Tatsunori Hashimoto

Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, Yongchao Zhou, Jimmy Ba, Yann Dubois, Chris J. Maddison, and Tatsunori Hashimoto. ToolEmu: Iden- tifying the risks of LM agents with an LM-emulated sandbox. InThe Twelfth International Conference on Learning Representations, Vienna, Austria, 2024. Open- Review.net

2024

[44] [44]

Saltzer and Michael D

Jerome H. Saltzer and Michael D. Schroeder. The pro- tection of information in computer systems.Proceed- ings of the IEEE, 63(9):1278–1308, 1975

1975

[45] [45]

Prompt injec- tion attack to tool selection in LLM agents

Jiawen Shi, Zenghui Yuan, Guiyao Tie, Pan Zhou, Neil Zhenqiang Gong, and Lichao Sun. Prompt injec- tion attack to tool selection in LLM agents. InNetwork and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 2026. The Internet Society

2026

[46] [46]

Pro- gent: Securing AI agents with privilege control

Tianneng Shi, Jingxuan He, Zhun Wang, Linyu Wu, Hongwei Li, Wenbo Guo, and Dawn Song. Pro- gent: Securing AI agents with privilege control. arXiv:2504.11703, 2025

Pith/arXiv arXiv 2025

[47] [47]

PromptArmor: Simple yet effective prompt injection defenses

Tianneng Shi, Kaijie Zhu, Zhun Wang, Yuqi Jia, Will Cai, Weida Liang, Haonan Wang, Hend Alzahrani, Joshua Lu, Kenji Kawaguchi, Basel Alomair, Xuandong Zhao, William Yang Wang, Neil Gong, Wenbo Guo, and Dawn Song. PromptArmor: Simple yet effective prompt injection defenses. arXiv:2507.15219, 2025

arXiv 2025

[48] [48]

SAGA: A security archi- tecture for governing AI agentic systems

Georgios Syros, Anshuman Suri, Jacob Ginesin, Cristina Nita-Rotaru, and Alina Oprea. SAGA: A security archi- tecture for governing AI agentic systems. InNetwork and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 2026. The Internet Society

2026

[49] [49]

Poskitt, and Jun Sun

Haoyu Wang, Christopher M. Poskitt, and Jun Sun. AgentSpec: Customizable runtime enforcement for safe and reliable LLM agents. InProceedings of the IEEE/ACM International Conference on Software Engi- neering (ICSE 2026), pages 1–12, Rio de Janeiro, Brazil,

2026

[50] [50]

Association for Computing Machinery

[51] [51]

MCPTox: A benchmark for tool poisoning attack on real-world MCP servers

Zhiqiang Wang, Yichao Gao, Yanting Wang, Suyuan Liu, Haifeng Sun, Haoran Cheng, Guanquan Shi, Hao- hua Du, and Xiangyang Li. MCPTox: A benchmark for tool poisoning attack on real-world MCP servers. arXiv:2508.14925, 2025

arXiv 2025

[52] [52]

Robert N. M. Watson, Jonathan Anderson, Ben Laurie, and Kris Kennaway. Capsicum: Practical capabilities for UNIX. In19th USENIX Security Symposium (USENIX Security 2010), pages 29–44, Washington, DC, USA,

2010

[53] [53]

Robert N. M. Watson, Peter G. Neumann, Jonathan Woodruff, Simon W. Moore, Jonathan Anderson, David Chisnall, Nirav Dave, Brooks Davis, Khilan Gudka, and Ben Laurie. CHERI: A hybrid capability-system archi- tecture for scalable software compartmentalization. In 2015 IEEE Symposium on Security and Privacy, pages 20–37, San Jose, CA, USA, 2015. IEEE Computer...

2015

[54] [54]

IsolateGPT: An execution iso- lation architecture for LLM-based systems

Yuhao Wu, Franziska Roesner, Tadayoshi Kohno, Ning Zhang, and Umar Iqbal. IsolateGPT: An execution iso- lation architecture for LLM-based systems. InNetwork and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 2025. The Internet Society. 17

2025

[55] [55]

OSWorld: Benchmarking multimodal agents for open-ended tasks in real com- puter environments

Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhou- jun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yi- heng Xu, Shuyan Zhou, Silvio Savarese, Caiming Xiong, Victor Zhong, and Tao Yu. OSWorld: Benchmarking multimodal agents for open-ended tasks in real com- puter environments. InAdvances in Neural Informa...

2024

[56] [56]

MCPSecBench: A sys- tematic security benchmark and playground for testing model context protocols

Yixuan Yang, Cuifeng Gao, Daoyuan Wu, Yufan Chen, Yingjiu Li, and Shuai Wang. MCPSecBench: A sys- tematic security benchmark and playground for testing model context protocols. arXiv:2508.13220, 2025

arXiv 2025

[57] [57]

Multi-agent memory from a computer ar- chitecture perspective: Visions and challenges ahead

Zhongming Yu, Naicheng Yu, Hejia Zhang, Wentao Ni, Mingrui Yin, Jiaying Yang, Yujie Zhao, and Jishen Zhao. Multi-agent memory from a computer ar- chitecture perspective: Visions and challenges ahead. arXiv:2603.10062, 2026

arXiv 2026

[58] [58]

InjecAgent: Benchmarking indirect prompt injec- tions in tool-integrated large language model agents

Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. InjecAgent: Benchmarking indirect prompt injec- tions in tool-integrated large language model agents. In Findings of the Association for Computational Linguis- tics: ACL 2024, pages 10471–10506, Bangkok, Thai- land, 2024. Association for Computational Linguistics

2024

[59] [59]

Agent security bench (ASB): For- malizing and benchmarking attacks and defenses in LLM-based agents

Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, and Yongfeng Zhang. Agent security bench (ASB): For- malizing and benchmarking attacks and defenses in LLM-based agents. InThe Thirteenth International Conference on Learning Representations, Singapore,

[60] [60]

AgentAlign: Navigating safety alignment in the shift from informative to agentic large language models

Jinchuan Zhang, Lu Yin, Yan Zhou, and Songlin Hu. AgentAlign: Navigating safety alignment in the shift from informative to agentic large language models. arXiv:2505.23020, 2025

arXiv 2025

[61] [61]

AgentCgroup: Under- standing and controlling OS resources of AI agents

Yusheng Zheng, Jiakun Fan, Quanzhi Fu, Yiwei Yang, Wei Zhang, and Andi Quinn. AgentCgroup: Under- standing and controlling OS resources of AI agents. arXiv:2602.09345, 2026

arXiv 2026

[62] [62]

Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, and Graham Neu- big

Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, and Graham Neu- big. WebArena: A realistic web environment for build- ing autonomous agents. InThe Twelfth International Conference on Learning Representations, Vienna, Aus- tria, 2024. OpenReview.net

2024

[63] [63]

Patil, Vivian Fang, and Raluca Ada Popa

Jinhao Zhu, Kevin Tseng, Gil Vernik, Xiao Huang, Shishir G. Patil, Vivian Fang, and Raluca Ada Popa. MiniScope: A least privilege framework for authorizing tool calling agents. arXiv:2512.11147, 2025. A Representative Trace Witnesses The main text uses compact running examples. This appendix records the corresponding checked-in witnesses and the tran- sit...

arXiv 2025