pith. sign in

arxiv: 2606.22504 · v1 · pith:2XNHXILKnew · submitted 2026-06-21 · 💻 cs.CR · cs.SE

Lingering Authority: Revocable Resource-and-Effect Capabilities for Coding Agents

Pith reviewed 2026-06-26 10:10 UTC · model grok-4.3

classification 💻 cs.CR cs.SE
keywords coding agentsrevocable capabilitiesreference monitorlingering authoritytask contractsaccess controlAI securityresource revocation
0
0 comments X

The pith

PORTICO prevents coding agents from reusing resources or causing forbidden effects after their justifying task episode closes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PORTICO to close the gap of lingering authority, in which coding agents keep broad tool access even after the specific subgoal that required it has ended. It works by compiling an explicit task contract into initial capabilities, grant rules, trusted closure predicates, and global deny rules, then issuing opaque epoch-bound handles that are removed from the planner interface on closure. The system rejects any stale replay before side effects can occur. In controlled evaluations, PORTICO records zero executed contract-forbidden effects and rejects every post-closure reuse attempt, while an otherwise identical non-revoking comparator allows all of them. This separation lets agents receive the grants they need for productive work without leaving permanent exposure.

Core claim

PORTICO is a reference monitor that materializes capability expansions as opaque, epoch-bound handles through a request-grant-invoke lifecycle. Closure removes those handles from the next planner interface and rejects stale replay before side effects. In controlled coding-agent tasks, PORTICO records no executed contract-forbidden effects, rejects 10/10 post-closure reuses, and shows 0/6 forbidden effects in a stale-write audit, while a non-revoking comparator permits 10/10 reuses and 6/6 forbidden effects; both systems match on task success, scope compliance, and all pre-closure decisions. Scripted traces and live model traces over file writes, git mutation, and network egress reproduce the

What carries the argument

The PORTICO reference monitor, which compiles an explicit task contract into initial capabilities, grant rules, trusted closure predicates, and global deny rules, then enforces revocation via epoch-bound handles.

If this is right

  • Controlled grants recover boundary work that a fixed narrow envelope would block.
  • On the closure slice, task success, scope compliance, and pre-closure decisions remain identical to the non-revoking comparator.
  • PORTICO rejects all 10/10 post-closure reuses while the comparator permits all 10/10.
  • A deterministic stale-write audit records 0/6 executed forbidden effects versus 6/6 for the comparator.
  • Broad request exposure in a four-episode diagnostic preserves zero executed forbidden effects while raising blocked proposals from 67 to 84.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same contract-driven revocation pattern could be tested on agents that interact with external services beyond file and git operations.
  • Extending the closure predicates to handle concurrent subtasks might reduce the observed increase in blocked proposals.
  • Real-repository runs suggest the lifecycle can be exercised on live project layouts without requiring changes to the underlying language model.
  • The approach opens the possibility of policy evolution that tracks task progress rather than relying on static initial envelopes.

Load-bearing premise

The monitor assumes mediated tools and a sound typed catalog.

What would settle it

A controlled run in which PORTICO allows even one contract-forbidden effect to execute after the relevant closure would falsify the reported safety result.

Figures

Figures reproduced from arXiv: 2606.22504 by Igor Santos-Grueiro.

Figure 1
Figure 1. Figure 1: Four nested but non-equivalent descriptor sets. In the [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Contract excerpt with compiler-derived lifetime for [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Semantic meaning of authority closure. Both rows [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗
read the original abstract

Coding agents often receive broad tool access for an entire task, even when a resource is needed only for one subgoal. We call this gap lingering authority: a temporary resource/effect capability remains exposed after the episode that justified it has closed. PORTICO is a reference monitor for revocable capabilities exposed to the planner. It compiles an explicit task contract into initial capabilities, grant rules, trusted closure predicates, and global deny rules. A request-grant-invoke lifecycle materializes expansions as opaque, epoch-bound handles. Closure removes those handles from the next planner interface and rejects stale replay before side effects. The monitor assumes mediated tools and a sound typed catalog. In controlled coding-agent tasks, PORTICO records no executed contract-forbidden effects in the evaluated runs, while controlled grants recover boundary work blocked by a fixed narrow envelope. A non-revoking comparator receives the same initial envelope and the same grants at the same turns. On the closure slice, both systems match task success, scope compliance, and all pre-closure decisions; PORTICO then rejects 10/10 post-closure reuses, while the comparator permits 10/10. A deterministic stale-write audit records 0/6 versus 6/6 executed forbidden effects. Scripted traces and six live model traces over file writes, git mutation, and network egress show the same split. In a four-episode same-policy diagnostic, broad request exposure preserves zero executed forbidden effects but raises blocked proposals from 67 to 84. Frozen real-repository runs, with commits and traces recorded, exercise the same lifecycle on real project layouts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces PORTICO, a reference monitor that compiles task contracts into revocable, epoch-bound capabilities for coding agents. It enforces a request-grant-invoke lifecycle with closure predicates and deny rules to eliminate lingering authority after subgoals close. In controlled experiments on scripted traces and live model runs involving file writes, git mutations, and network egress, PORTICO records zero executed contract-forbidden effects post-closure (rejecting 10/10 reuses and 0/6 stale writes), while a non-revoking comparator permits all 10/10 reuses and 6/6 forbidden effects; task success and pre-closure compliance remain matched.

Significance. If the results hold under the stated assumptions, the work supplies a practical, contract-driven mechanism for limiting authority exposure in AI coding agents, directly addressing a security gap in tool-using planners. The controlled comparisons, including the deterministic stale-write audit and frozen real-repository runs with recorded commits and traces, provide concrete, falsifiable evidence of the revocation lifecycle's effect; the four-episode diagnostic further isolates the impact of broad request exposure on blocked proposals.

major comments (2)
  1. [Abstract] Abstract and experimental evaluation: the central claim that PORTICO produces zero executed forbidden effects (0/6 in the stale-write audit, 10/10 post-closure rejections) is conditioned on the unverified assumptions of complete tool mediation and a sound typed catalog. No coverage argument, proof, or audit is supplied showing that the catalog classifies every side-effect of the file, git, and network tools appearing in the scripted and live traces; if either assumption fails for a single operation, the observed split versus the comparator cannot be attributed to the revocation mechanism.
  2. [Experimental evaluation] Experimental setup (controlled tasks and live model traces): the methodology reports that both systems receive identical initial envelopes and grants, yet supplies no verification step or coverage metric confirming that all tool invocations in the evaluated runs were in fact mediated through the reference monitor. This verification is load-bearing for the claim that the zero forbidden effects result from PORTICO rather than from incomplete tool exposure in the test harness.
minor comments (1)
  1. [Abstract] The abstract states that 'controlled grants recover boundary work blocked by a fixed narrow envelope' but does not define the envelope construction or report the quantitative recovery rate; a brief clarification or table entry would improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and for identifying the importance of verifying the mediation assumptions in our experiments. We respond to each major comment below.

read point-by-point responses
  1. Referee: [Abstract] Abstract and experimental evaluation: the central claim that PORTICO produces zero executed forbidden effects (0/6 in the stale-write audit, 10/10 post-closure rejections) is conditioned on the unverified assumptions of complete tool mediation and a sound typed catalog. No coverage argument, proof, or audit is supplied showing that the catalog classifies every side-effect of the file, git, and network tools appearing in the scripted and live traces; if either assumption fails for a single operation, the observed split versus the comparator cannot be attributed to the revocation mechanism.

    Authors: The manuscript explicitly states that the monitor assumes mediated tools and a sound typed catalog. The experimental results, including the zero forbidden effects, are presented under these assumptions. We do not supply a coverage argument or proof because the contribution centers on the revocable capability lifecycle rather than exhaustive tool verification. In the test harness, tool invocations are mediated by construction for the evaluated operations. Since the comparator uses the identical harness, the performance split isolates the effect of revocation. We will revise the manuscript to expand the discussion of the catalog and mediation assumptions in the abstract and evaluation sections. revision: partial

  2. Referee: [Experimental evaluation] Experimental setup (controlled tasks and live model traces): the methodology reports that both systems receive identical initial envelopes and grants, yet supplies no verification step or coverage metric confirming that all tool invocations in the evaluated runs were in fact mediated through the reference monitor. This verification is load-bearing for the claim that the zero forbidden effects result from PORTICO rather than from incomplete tool exposure in the test harness.

    Authors: We agree that an explicit verification step would strengthen the attribution. The current description indicates that the harness provides the same envelopes and that tools are accessed via the monitor, but does not include a separate coverage metric or audit log. In revision, we will add details on the mediation layer in the experimental setup to confirm that all invocations in the scripted and live traces were routed through PORTICO, thereby supporting that the results stem from the revocation mechanism. revision: partial

Circularity Check

0 steps flagged

No circularity; experimental claims rest on direct comparisons without self-referential derivations or load-bearing self-citations

full rationale

The paper describes a reference monitor (PORTICO) and reports results from scripted traces, live model runs, and audits comparing it to a non-revoking comparator. No equations, fitted parameters, or derivation chains appear. The central claims (zero forbidden effects under PORTICO vs. 6/6 under comparator) are produced by the experimental setup itself rather than by any reduction to prior self-citations or definitional identities. The stated assumptions (mediated tools, sound catalog) are explicit preconditions but are not used to derive the observed split; they are simply the conditions under which the monitor operates. This matches the default expectation of no significant circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Review limited to abstract; no free parameters identified; one explicit domain assumption stated.

axioms (1)
  • domain assumption mediated tools and a sound typed catalog
    Explicitly listed as required for the reference monitor to operate.
invented entities (1)
  • PORTICO no independent evidence
    purpose: Reference monitor implementing revocable resource-and-effect capabilities for coding agent planners
    Newly introduced system whose design and evaluation form the core of the paper.

pith-pipeline@v0.9.1-grok · 5818 in / 1294 out tokens · 31885 ms · 2026-06-26T10:10:35.231757+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

63 extracted references · 10 linked inside Pith

  1. [1]

    Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt in- jection

    Sahar Abdelnabi, Kai Greshake, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt in- jection. InProceedings of the 16th ACM Workshop on Artificial Intelligence and Security, AISec ’23, pages 79–90, Copenhagen, Denmark, 2023. Association...

  2. [2]

    Zico Kolter, Matt Fredrik- son, Yarin Gal, and Xander Davies

    Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian, Derek Duenas, Maxwell Lin, Justin Wang, Dan Hendrycks, Andy Zou, J. Zico Kolter, Matt Fredrik- son, Yarin Gal, and Xander Davies. AgentHarm: A benchmark for measuring harmfulness of LLM agents. InThe Thirteenth International Conference on Learning Representations, Singapore, 2025. OpenReview.net

  3. [3]

    Con- tract2Tool: Learning preconditions and effects for reli- able tool-augmented LLM agents

    Rahul Suresh Babu and Laxmipriya Ganesh Iyer. Con- tract2Tool: Learning preconditions and effects for reli- able tool-augmented LLM agents. arXiv:2606.07904, 2026

  4. [4]

    Tool- ChoiceConfusion: Causal minimal tool filtering for reli- able LLM agents

    Rahul Suresh Babu and Laxmipriya Ganesh Iyer. Tool- ChoiceConfusion: Causal minimal tool filtering for reli- able LLM agents. arXiv:2606.06284, 2026

  5. [5]

    Design patterns for securing LLM agents against prompt injections

    Luca Beurer-Kellner, Beat Buesser, Ana-Maria Cre¸ tu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, Florian Tramèr, and Václav V olhejn. Design patterns for securing LLM agents against prompt injections. arXiv:2506.08837, 2025

  6. [6]

    Taylor, Krishnamurthy Dj Dvijotham, and Alexandre Lacoste

    Rishika Bhagwatkar, Kevin Kasa, Abhay Puri, Gabriel Huang, Irina Rish, Graham W. Taylor, Krishnamurthy Dj Dvijotham, and Alexandre Lacoste. Indirect prompt injections: Are firewalls all you need, or stronger bench- marks? arXiv:2510.05244, 2025

  7. [7]

    AgentBound: Securing execution boundaries of AI agents.Proceedings of the ACM on Software Engineering, 3(FSE), 2026

    Christoph Bühler, Matteo Biagiola, Luca Di Grazia, and Guido Salvaneschi. AgentBound: Securing execution boundaries of AI agents.Proceedings of the ACM on Software Engineering, 3(FSE), 2026

  8. [8]

    LlamaFire- wall: An open source guardrail system for building se- cure AI agents

    Sahana Chennabasappa, Cyrus Nikolaidis, Daniel Song, David Molnar, Stephanie Ding, Shengye Wan, Spencer Whitman, Lauren Deason, Nicholas Doucette, Abraham Montilla, Alekhya Gampa, Beto de Paola, Dominik Gabi, James Crnkovich, Jean-Christophe Testud, Kat He, Rash- nil Chaturvedi, Wu Zhou, and Joshua Saxe. LlamaFire- wall: An open source guardrail system fo...

  9. [9]

    Maris: A formally verifiable privacy policy enforce- ment paradigm for multi-agent collaboration systems

    Jian Cui, Zichuan Li, Luyi Xing, and Xiaojing Liao. Maris: A formally verifiable privacy policy enforce- ment paradigm for multi-agent collaboration systems. arXiv:2505.04799, 2025

  10. [10]

    Defeating prompt injections by design

    Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, and Florian Tramèr. Defeating prompt injections by design. InIEEE Con- ference on Secure and Trustworthy Machine Learning (SaTML), Munich, Germany, 2026

  11. [11]

    AgentDojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents

    Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. AgentDojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents. InAd- vances in Neural Information Processing Systems, vol- ume 37, pages 82895–82920, Vancouver, BC, Canada,

  12. [12]

    Curran Associates, Inc

  13. [13]

    Mind2Web: Towards a generalist agent for the web

    Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Sam Stevens, Boshi Wang, Huan Sun, and Yu Su. Mind2Web: Towards a generalist agent for the web. InAdvances in Neural Information Processing Systems, volume 36, pages 28091–28114, New Orleans, LA, USA, 2023. Cur- ran Associates, Inc

  14. [14]

    Laradji, Manuel Del Verme, Tom Marty, David Vázquez, Nicolas Chapados, and Alexandre Lacoste

    Alexandre Drouin, Maxime Gasse, Massimo Caccia, Is- sam H. Laradji, Manuel Del Verme, Tom Marty, David Vázquez, Nicolas Chapados, and Alexandre Lacoste. WorkArena: How capable are web agents at solving common knowledge work tasks? InForty-first Interna- tional Conference on Machine Learning, volume 235 ofProceedings of Machine Learning Research, pages 116...

  15. [15]

    Gray and David R

    Cary G. Gray and David R. Cheriton. Leases: An effi- cient fault-tolerant mechanism for distributed file cache 15 consistency. InProceedings of the Twelfth ACM Sympo- sium on Operating Systems Principles, pages 202–210, 1989

  16. [16]

    MCPXKIT: The unified toolkit for analyzing model context protocol security

    Yongjian Guo, Puzhuo Liu, Wanlun Ma, Zehang Deng, Xiaogang Zhu, Peng Di, Xi Xiao, and Sheng Wen. MCPXKIT: The unified toolkit for analyzing model context protocol security. arXiv:2508.12538, 2025

  17. [17]

    SMCP: Secure model context protocol

    Xinyi Hou, Shenao Wang, Yifan Zhang, Ziluo Xue, Yan- jie Zhao, Cai Fu, and Haoyu Wang. SMCP: Secure model context protocol. arXiv:2602.01129, 2026

  18. [18]

    Model context protocol (MCP): Land- scape, security threats, and future research directions

    Xinyi Hou, Yanjie Zhao, Shenao Wang, and Haoyu Wang. Model context protocol (MCP): Land- scape, security threats, and future research directions. arXiv:2503.23278, 2025

  19. [19]

    MalTool: Malicious tool attacks on LLM agents

    Yuepeng Hu, Yuqi Jia, Mengyuan Li, Dawn Song, and Neil Gong. MalTool: Malicious tool attacks on LLM agents. arXiv:2602.12194, 2026

  20. [20]

    Ca- pability minimization as a safety primitive: Risk- aware causal gating for least-privilege LLM agents

    Laxmipriya Ganesh Iyer and Rahul Suresh Babu. Ca- pability minimization as a safety primitive: Risk- aware causal gating for least-privilege LLM agents. arXiv:2606.13884, 2026

  21. [21]

    The gate is only as honest as its contracts: Contract- Guard for the contract layer of risk-aware causal gating

    Laxmipriya Ganesh Iyer and Rahul Suresh Babu. The gate is only as honest as its contracts: Contract- Guard for the contract layer of risk-aware causal gating. arXiv:2606.18550, 2026

  22. [22]

    Se- mantic attacks on tool-augmented LLMs: Securing the model context protocol against descriptor-level manipulation

    Saeid Jamshidi, Arghavan Moradi Dakhel, Kawser Wazed Nafi, and Foutse Khomh. Se- mantic attacks on tool-augmented LLMs: Securing the model context protocol against descriptor-level manipulation. arXiv:2512.06556, 2025

  23. [23]

    Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik R

    Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik R. Narasimhan. SWE-bench: Can language models resolve real-world GitHub issues? InThe Twelfth International Conference on Learning Representations, Vienna, Aus- tria, 2024. OpenReview.net

  24. [24]

    Prompt flow integrity to prevent privilege escalation in LLM agents

    Juhee Kim, Woohyuk Choi, and Byoungyoung Lee. Prompt flow integrity to prevent privilege escalation in LLM agents. arXiv:2503.15547, 2025

  25. [25]

    Le, and Tomas Pfister

    Minbeom Kim, Mihir Parmar, Phillip Wallis, Lesly Mi- culicich, Kyomin Jung, Krishnamurthy Dj Dvijotham, Long T. Le, and Tomas Pfister. CausalArmor: Efficient indirect prompt injection guardrails via causal attribu- tion. arXiv:2602.07918, 2026

  26. [26]

    VisualWebArena: Evaluating multimodal agents on re- alistic visual web tasks

    Jing Yu Koh, Robert Lo, Lawrence Jang, Vikram Duvvur, Ming Chong Lim, Po-Yu Huang, Graham Neu- big, Shuyan Zhou, Russ Salakhutdinov, and Daniel Fried. VisualWebArena: Evaluating multimodal agents on re- alistic visual web tasks. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 881–905...

  27. [27]

    MobileWorld: Benchmarking au- tonomous mobile agents in agent-user interactive and MCP-augmented environments

    Quyu Kong, Xu Zhang, Zhenyu Yang, Nolan Gao, Chen Liu, Panrong Tong, Chenglin Cai, Hanzhang Zhou, Jianan Zhang, Liangyu Chen, Zhidan Liu, Steven Hoi, and Yue Wang. MobileWorld: Benchmarking au- tonomous mobile agents in agent-user interactive and MCP-augmented environments. arXiv:2512.19432, 2025

  28. [28]

    Benchmarking mobile device control agents across di- verse configurations

    Juyong Lee, Taywon Min, Minyong An, Dongyoon Hahm, Haeone Lee, Changyeon Kim, and Kimin Lee. Benchmarking mobile device control agents across di- verse configurations. arXiv:2404.16660, 2024

  29. [29]

    ACE: A security architecture for LLM-integrated app systems

    Evan Li, Tushin Mallick, Evan Rose, William Robert- son, Alina Oprea, and Cristina Nita-Rotaru. ACE: A security architecture for LLM-integrated app systems. InNetwork and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 2026. The Internet Soci- ety

  30. [30]

    MCP-ITP: An automated framework for implicit tool poisoning in MCP

    Ruiqi Li, Zhiqiang Wang, Yunhao Yao, and Xiang-Yang Li. MCP-ITP: An automated framework for implicit tool poisoning in MCP. arXiv:2601.07395, 2026

  31. [31]

    Linux man-pages project, 2025

    Linux man-pages project.seccomp(2) — Linux manual page. Linux man-pages project, 2025

  32. [32]

    AgentBench: Evaluating LLMs as agents

    Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, Shudan Zhang, Xiang Deng, Aohan Zeng, Zhengxiao Du, Chenhui Zhang, Sheng Shen, Tianjun Zhang, Yu Su, Huan Sun, Minlie Huang, Yuxiao Dong, and Jie Tang. AgentBench: Evaluating LLMs as agents. InThe Twelfth International Conference on Learn- ing...

  33. [33]

    AgentBoard: An analytical evaluation board of multi-turn LLM agents

    Chang Ma, Junlei Zhang, Zhihao Zhu, Cheng Yang, Yu- jiu Yang, Yaohui Jin, Zhenzhong Lan, Lingpeng Kong, and Junxian He. AgentBoard: An analytical evaluation board of multi-turn LLM agents. InAdvances in Neu- ral Information Processing Systems, volume 37, pages 74325–74362, Vancouver, BC, Canada, 2024. Curran Associates, Inc

  34. [34]

    Breaking the pro- tocol: Security analysis of the model context protocol specification and prompt injection vulnerabilities in tool- integrated LLM agents

    Narek Maloyan and Dmitry Namiot. Breaking the pro- tocol: Security analysis of the model context protocol specification and prompt injection vulnerabilities in tool- integrated LLM agents. arXiv:2601.17549, 2026. 16

  35. [35]

    Quantifying frontier LLM capabilities for container sandbox escape

    Rahul Marchand, Art O Cathain, Jerome Wynne, Philip- pos Maximos Giavridis, Sam Deverett, John Wilkin- son, Jason Gwartz, and Harry Coppock. Quantifying frontier LLM capabilities for container sandbox escape. arXiv:2603.02277, 2026

  36. [36]

    ceLLMate: Sandboxing browser AI agents

    Luoxi Meng, Henry Feng, Ilia Shumailov, and Earlence Fernandes. ceLLMate: Sandboxing browser AI agents. arXiv:2512.12594, 2025

  37. [37]

    GAIA: A bench- mark for general AI assistants

    Grégoire Mialon, Clémentine Fourrier, Thomas Wolf, Yann LeCun, and Thomas Scialom. GAIA: A bench- mark for general AI assistants. InThe Twelfth Interna- tional Conference on Learning Representations, Vienna, Austria, 2024. OpenReview.net

  38. [38]

    Attractive metadata attack: Inducing LLM agents to invoke malicious tools

    Kanghua Mo, Li Hu, Yucheng Long, and Zhihao Li. Attractive metadata attack: Inducing LLM agents to invoke malicious tools. arXiv:2508.02110, 2025

  39. [39]

    Formal policy enforcement for real-world agentic systems

    Nils Palumbo, Sarthak Choudhary, Jihye Choi, Guy Amir, Prasad Chalasani, and Somesh Jha. Formal policy enforcement for real-world agentic systems. arXiv:2602.16708, 2026

  40. [40]

    The UCONABC usage control model.ACM Transactions on Information and System Security, 7(1):128–174, 2004

    Jaehong Park and Ravi Sandhu. The UCONABC usage control model.ACM Transactions on Information and System Security, 7(1):128–174, 2004

  41. [41]

    SWE-PolyBench: A multi-language benchmark for repository-level evaluation of coding agents

    Muhammad Shihab Rashid, Christian Bock, Yuan Zhuang, Alexander Buccholz, Tim Esler, Simon Valentin, Luca Franceschi, Martin Wistuba, Prabhu Teja Sivaprasad, Woo Jung Kim, et al. SWE-PolyBench: A multi-language benchmark for repository-level evaluation of coding agents. arXiv:2504.08703, 2025

  42. [42]

    AndroidWorld: A dynamic benchmarking environment for autonomous agents

    Chris Rawles, Sarah Clinckemaillie, Yifan Chang, Jonathan Waltz, Gabrielle Lau, Marybeth Fair, Alice Li, William Bishop, Wei Li, Folawiyo Campbell-Ajala, Daniel Toyama, Robert Berry, Divya Tyamagundlu, Tim- othy Lillicrap, and Oriana Riva. AndroidWorld: A dynamic benchmarking environment for autonomous agents. InThe Thirteenth International Conference on ...

  43. [43]

    Maddison, and Tatsunori Hashimoto

    Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, Yongchao Zhou, Jimmy Ba, Yann Dubois, Chris J. Maddison, and Tatsunori Hashimoto. ToolEmu: Iden- tifying the risks of LM agents with an LM-emulated sandbox. InThe Twelfth International Conference on Learning Representations, Vienna, Austria, 2024. Open- Review.net

  44. [44]

    Saltzer and Michael D

    Jerome H. Saltzer and Michael D. Schroeder. The pro- tection of information in computer systems.Proceed- ings of the IEEE, 63(9):1278–1308, 1975

  45. [45]

    Prompt injec- tion attack to tool selection in LLM agents

    Jiawen Shi, Zenghui Yuan, Guiyao Tie, Pan Zhou, Neil Zhenqiang Gong, and Lichao Sun. Prompt injec- tion attack to tool selection in LLM agents. InNetwork and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 2026. The Internet Society

  46. [46]

    Pro- gent: Securing AI agents with privilege control

    Tianneng Shi, Jingxuan He, Zhun Wang, Linyu Wu, Hongwei Li, Wenbo Guo, and Dawn Song. Pro- gent: Securing AI agents with privilege control. arXiv:2504.11703, 2025

  47. [47]

    PromptArmor: Simple yet effective prompt injection defenses

    Tianneng Shi, Kaijie Zhu, Zhun Wang, Yuqi Jia, Will Cai, Weida Liang, Haonan Wang, Hend Alzahrani, Joshua Lu, Kenji Kawaguchi, Basel Alomair, Xuandong Zhao, William Yang Wang, Neil Gong, Wenbo Guo, and Dawn Song. PromptArmor: Simple yet effective prompt injection defenses. arXiv:2507.15219, 2025

  48. [48]

    SAGA: A security archi- tecture for governing AI agentic systems

    Georgios Syros, Anshuman Suri, Jacob Ginesin, Cristina Nita-Rotaru, and Alina Oprea. SAGA: A security archi- tecture for governing AI agentic systems. InNetwork and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 2026. The Internet Society

  49. [49]

    Poskitt, and Jun Sun

    Haoyu Wang, Christopher M. Poskitt, and Jun Sun. AgentSpec: Customizable runtime enforcement for safe and reliable LLM agents. InProceedings of the IEEE/ACM International Conference on Software Engi- neering (ICSE 2026), pages 1–12, Rio de Janeiro, Brazil,

  50. [50]

    Association for Computing Machinery

  51. [51]

    MCPTox: A benchmark for tool poisoning attack on real-world MCP servers

    Zhiqiang Wang, Yichao Gao, Yanting Wang, Suyuan Liu, Haifeng Sun, Haoran Cheng, Guanquan Shi, Hao- hua Du, and Xiangyang Li. MCPTox: A benchmark for tool poisoning attack on real-world MCP servers. arXiv:2508.14925, 2025

  52. [52]

    Robert N. M. Watson, Jonathan Anderson, Ben Laurie, and Kris Kennaway. Capsicum: Practical capabilities for UNIX. In19th USENIX Security Symposium (USENIX Security 2010), pages 29–44, Washington, DC, USA,

  53. [53]

    Robert N. M. Watson, Peter G. Neumann, Jonathan Woodruff, Simon W. Moore, Jonathan Anderson, David Chisnall, Nirav Dave, Brooks Davis, Khilan Gudka, and Ben Laurie. CHERI: A hybrid capability-system archi- tecture for scalable software compartmentalization. In 2015 IEEE Symposium on Security and Privacy, pages 20–37, San Jose, CA, USA, 2015. IEEE Computer...

  54. [54]

    IsolateGPT: An execution iso- lation architecture for LLM-based systems

    Yuhao Wu, Franziska Roesner, Tadayoshi Kohno, Ning Zhang, and Umar Iqbal. IsolateGPT: An execution iso- lation architecture for LLM-based systems. InNetwork and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 2025. The Internet Society. 17

  55. [55]

    OSWorld: Benchmarking multimodal agents for open-ended tasks in real com- puter environments

    Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhou- jun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yi- heng Xu, Shuyan Zhou, Silvio Savarese, Caiming Xiong, Victor Zhong, and Tao Yu. OSWorld: Benchmarking multimodal agents for open-ended tasks in real com- puter environments. InAdvances in Neural Informa...

  56. [56]

    MCPSecBench: A sys- tematic security benchmark and playground for testing model context protocols

    Yixuan Yang, Cuifeng Gao, Daoyuan Wu, Yufan Chen, Yingjiu Li, and Shuai Wang. MCPSecBench: A sys- tematic security benchmark and playground for testing model context protocols. arXiv:2508.13220, 2025

  57. [57]

    Multi-agent memory from a computer ar- chitecture perspective: Visions and challenges ahead

    Zhongming Yu, Naicheng Yu, Hejia Zhang, Wentao Ni, Mingrui Yin, Jiaying Yang, Yujie Zhao, and Jishen Zhao. Multi-agent memory from a computer ar- chitecture perspective: Visions and challenges ahead. arXiv:2603.10062, 2026

  58. [58]

    InjecAgent: Benchmarking indirect prompt injec- tions in tool-integrated large language model agents

    Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. InjecAgent: Benchmarking indirect prompt injec- tions in tool-integrated large language model agents. In Findings of the Association for Computational Linguis- tics: ACL 2024, pages 10471–10506, Bangkok, Thai- land, 2024. Association for Computational Linguistics

  59. [59]

    Agent security bench (ASB): For- malizing and benchmarking attacks and defenses in LLM-based agents

    Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, and Yongfeng Zhang. Agent security bench (ASB): For- malizing and benchmarking attacks and defenses in LLM-based agents. InThe Thirteenth International Conference on Learning Representations, Singapore,

  60. [60]

    AgentAlign: Navigating safety alignment in the shift from informative to agentic large language models

    Jinchuan Zhang, Lu Yin, Yan Zhou, and Songlin Hu. AgentAlign: Navigating safety alignment in the shift from informative to agentic large language models. arXiv:2505.23020, 2025

  61. [61]

    AgentCgroup: Under- standing and controlling OS resources of AI agents

    Yusheng Zheng, Jiakun Fan, Quanzhi Fu, Yiwei Yang, Wei Zhang, and Andi Quinn. AgentCgroup: Under- standing and controlling OS resources of AI agents. arXiv:2602.09345, 2026

  62. [62]

    Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, and Graham Neu- big

    Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, and Graham Neu- big. WebArena: A realistic web environment for build- ing autonomous agents. InThe Twelfth International Conference on Learning Representations, Vienna, Aus- tria, 2024. OpenReview.net

  63. [63]

    Patil, Vivian Fang, and Raluca Ada Popa

    Jinhao Zhu, Kevin Tseng, Gil Vernik, Xiao Huang, Shishir G. Patil, Vivian Fang, and Raluca Ada Popa. MiniScope: A least privilege framework for authorizing tool calling agents. arXiv:2512.11147, 2025. A Representative Trace Witnesses The main text uses compact running examples. This appendix records the corresponding checked-in witnesses and the tran- sit...