Securing LLM Agents Need Intent-to-Execution Integrity

Dawn Song; Jiaheng Zhang; Ming Xu; Peiran Wang; Shengfang Zhai; Wenjie Qu

arxiv: 2605.16976 · v1 · pith:GRUVDJFPnew · submitted 2026-05-16 · 💻 cs.CR

Wenjie Qu, Ming Xu, Peiran Wang, Shengfang Zhai, Jiaheng Zhang, Dawn Song

Pith reviewed 2026-05-19 20:12 UTC · model grok-4.3

classification 💻 cs.CR

keywords: LLM agents, security, integrity properties, intent-to-execution, untrusted tools, compiler analogy, agentic systems

The pith

Securing LLM agents requires intent-to-execution integrity so executions faithfully match user intent even with untrusted tools.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This position paper argues that LLM agents need a precise end-to-end correctness property to guarantee that their concrete actions preserve the user's natural-language intent. It defines this property, called intent-to-execution integrity, as the simultaneous satisfaction of four integrity conditions: Tool Integrity, Instruction Integrity, Judgment Integrity, and Data Flow Integrity. The conditions are obtained by treating the agent's translation from instructions to tool calls and API operations as a compiler-like process that must avoid mis-execution. Existing defenses are shown to cover only subsets of these conditions and to lack compositional guarantees, especially once third-party skills gain direct access to user environments. A reader would care because the trusted-tools assumption that underpins most prior work no longer holds in open agent ecosystems.

Core claim

The paper claims that LLM agents operate over an intent-to-execution pipeline analogous to compilation, with two root sources of failure—untrusted data ingestion and untrusted tool execution—and that four integrity properties must hold together for executions to preserve user intent. Their conjunction is termed intent-to-execution integrity. Analysis of current defenses shows they supply only partial, non-compositional coverage of these properties.

What carries the argument

Intent-to-execution integrity, the conjunction of Tool Integrity, Instruction Integrity, Judgment Integrity, and Data Flow Integrity, obtained via the compiler analogy applied to the pipeline from natural-language instructions to concrete system operations.
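The conjunction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `IntegrityReport`, its field names, and its methods are hypothetical, and the booleans stand in for checks whose mechanics this summary does not specify.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class IntegrityReport:
    """Outcome of four hypothetical checks on one agent execution.

    Each flag is a placeholder: how a property would actually be
    verified is not described in the summary above.
    """
    tool_integrity: bool
    instruction_integrity: bool
    judgment_integrity: bool
    data_flow_integrity: bool

    def violated(self) -> list[str]:
        """Names of the properties that did not hold."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def intent_to_execution_integrity(self) -> bool:
        """True only when all four properties hold together."""
        return not self.violated()


report = IntegrityReport(
    tool_integrity=True,
    instruction_integrity=True,
    judgment_integrity=True,
    data_flow_integrity=False,
)
print(report.intent_to_execution_integrity())  # False
print(report.violated())  # ['data_flow_integrity']
```

The point of the sketch is that a single failed property is enough to fail the end-to-end property, which is why partial coverage matters in the paper's argument.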

If this is right

Agents in open ecosystems with third-party skills cannot be secured by tool-call constraints alone and require simultaneous enforcement of all four properties.
Defense mechanisms must be evaluated and composed against the full set of properties rather than against isolated threat models.
Violations of Data Flow Integrity or Judgment Integrity can bypass existing defenses even when tool calls appear syntactically valid.
New agent architectures should be designed from the start to make each integrity property independently verifiable.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same four-property decomposition could be used to audit and harden other autonomous systems that translate high-level goals into low-level actions.
Empirical measurement of how often deployed agents violate each property in isolation would help prioritize which gaps to close first.
Formal verification techniques developed for compiler correctness might be adapted to prove that an agent implementation meets intent-to-execution integrity.

Load-bearing premise

The structural analogy between LLM agents and compilers is close enough that the four integrity properties are both necessary and jointly sufficient for end-to-end correctness.

What would settle it

An agent that satisfies all four integrity properties yet produces an execution that diverges from the user's stated intent, or an agent that violates one property while still executing the intent correctly.
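The two settling cases can be stated as a small decision rule. The sketch below is hypothetical: `Observation`, `classify`, and the returned labels are illustrative names, and it reads "violates one property" as "at least one property fails", which is an assumption rather than something the page states.

```python
from typing import NamedTuple


class Observation(NamedTuple):
    """One observed execution (hypothetical record format)."""
    properties_held: dict  # property name -> bool, one entry per property
    matches_intent: bool   # did the execution match the user's stated intent?


def classify(obs: Observation) -> str:
    """Label an observation by what it would say about the core claim."""
    all_hold = all(obs.properties_held.values())
    if all_hold and not obs.matches_intent:
        # All four properties held, yet the execution diverged from intent.
        return "sufficiency counterexample"
    if not all_hold and obs.matches_intent:
        # Some property failed, yet the intent was still executed correctly.
        return "necessity counterexample"
    return "consistent with the claim"


held = dict(tool=True, instruction=True, judgment=True, data_flow=True)
print(classify(Observation(held, matches_intent=False)))
# sufficiency counterexample
print(classify(Observation({**held, "judgment": False}, matches_intent=True)))
# necessity counterexample
```

Deciding `matches_intent` is the hard part in practice, since intent is expressed in natural language; the sketch simply takes that judgment as an input.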

Original abstract

This position paper argues that securing LLM agents requires first defining an end-to-end correctness property that specifies when an agent's execution faithfully reflects the user's intent. Modern LLM agents operate over an intent-to-execution pipeline, where natural-language instructions are translated into concrete system operations such as tool calls, API requests, and code execution. While recent defenses have made progress in constraining how agents construct tool calls, most existing formulations implicitly assume that tools are trusted. The emergence of systems such as OpenClaw, with open ecosystems of third-party skills and direct access to user environments, breaks this assumption and exposes new failure modes, including malicious or over-privileged components in the execution pipeline. Despite rapid progress in defense mechanisms, there is no adequate correctness property that defines what "secure" means for LLM agents, nor a principled way to evaluate the coverage of existing defenses. We observe that LLM agents are structurally analogous to compilers, where security violations correspond to mis-executions that do not preserve user intent. Drawing on this analogy, we identify two fundamental problem sources, untrusted data ingestion and untrusted tool execution, and derive four integrity properties that must hold simultaneously: Tool Integrity, Instruction Integrity, Judgment Integrity, and Data Flow Integrity. We call their conjunction intent-to-execution integrity. Analyzing existing agentic defenses against these properties reveals that current systems provide only partial and non-compositional coverage, leaving fundamental gaps in securing modern LLM agents.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This position paper defines intent-to-execution integrity via four properties drawn from a compiler analogy and uses it to critique gaps in current LLM agent defenses, but the analogy does not establish that the properties are jointly sufficient.

The main point is that the authors want a clear end-to-end correctness property for LLM agents that use tools. They draw an analogy to compilers and name four properties—Tool Integrity, Instruction Integrity, Judgment Integrity, and Data Flow Integrity—whose conjunction they call intent-to-execution integrity. They then argue that existing defenses only cover some of these and leave real gaps, especially once agents start using untrusted third-party skills and direct environment access.

Referee Report

2 major / 2 minor

Summary. This position paper argues that securing LLM agents requires defining an end-to-end 'intent-to-execution integrity' property. It posits a structural analogy between LLM agents and compilers, identifies untrusted data ingestion and untrusted tool execution as fundamental problem sources, and derives four necessary integrity properties (Tool Integrity, Instruction Integrity, Judgment Integrity, and Data Flow Integrity) whose conjunction defines the desired correctness notion. The paper then evaluates existing agentic defenses against these properties and concludes that current systems offer only partial, non-compositional coverage, leaving fundamental gaps especially in open ecosystems such as OpenClaw.

Significance. If the compiler analogy can be made rigorous and the four properties shown to be jointly sufficient for intent preservation, the framework would offer a principled way to evaluate defense coverage and guide future designs for LLM agents operating with untrusted tools and third-party skills. The position paper usefully highlights the shift from trusted-tool assumptions to open execution environments, but its conceptual nature means the significance depends on whether the properties can be operationalized or validated.

major comments (2)

[Derivation of the four integrity properties] The central derivation of the four properties from the compiler analogy (abstract and the section introducing the properties) does not include a formal mapping or proof that the conjunction is sufficient for end-to-end intent preservation. Because user intent is expressed in natural language and can be underspecified, it remains possible for an execution to satisfy Tool Integrity, Instruction Integrity, Judgment Integrity, and Data Flow Integrity yet still diverge from the original intent (e.g., via legitimate but unintended tool behavior on ambiguous instructions). This undercuts the subsequent claim that existing defenses leave fundamental gaps.
[Evaluation of existing agentic defenses] The analysis of existing defenses (section evaluating current systems) treats the four properties as an exhaustive checklist, but without a demonstration of necessity or completeness, the conclusion that coverage is 'partial and non-compositional' rests on an unverified framing rather than a load-bearing argument.

minor comments (2)

[Introduction] The abstract and introduction use 'intent-to-execution pipeline' without an early diagram or formal definition of the pipeline stages; adding one would clarify how the four properties map onto specific stages.
[Motivation] The paper cites OpenClaw as an example of open ecosystems but provides limited detail on its architecture; a short description or reference to its threat model would strengthen the motivation for untrusted tool execution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our position paper. We address each major comment below, clarifying the conceptual scope of the framework while indicating revisions to improve precision and acknowledge limitations.

Referee: [Derivation of the four integrity properties] The central derivation of the four properties from the compiler analogy (abstract and the section introducing the properties) does not include a formal mapping or proof that the conjunction is sufficient for end-to-end intent preservation. Because user intent is expressed in natural language and can be underspecified, it remains possible for an execution to satisfy Tool Integrity, Instruction Integrity, Judgment Integrity, and Data Flow Integrity yet still diverge from the original intent (e.g., via legitimate but unintended tool behavior on ambiguous instructions). This undercuts the subsequent claim that existing defenses leave fundamental gaps.

Authors: We agree that the derivation is analogical rather than a formal proof of sufficiency, which is consistent with the position-paper format. The four properties are obtained by decomposing the intent-to-execution pipeline at the two fundamental sources of failure (untrusted ingestion and untrusted execution) and identifying the integrity requirements at each stage. The referee correctly identifies that natural-language underspecification can produce executions that satisfy the properties yet still diverge from intent; we will revise the manuscript to state explicitly that the properties are necessary conditions for preventing the concrete failure modes we enumerate, while noting that additional mechanisms may be required to handle ambiguity. This refinement does not undercut the gap analysis, because existing defenses fail to enforce even these necessary properties in open ecosystems.
revision: partial
Referee: [Evaluation of existing agentic defenses] The analysis of existing defenses (section evaluating current systems) treats the four properties as an exhaustive checklist, but without a demonstration of necessity or completeness, the conclusion that coverage is 'partial and non-compositional' rests on an unverified framing rather than a load-bearing argument.

Authors: The evaluation applies the four properties as an analytical lens rather than a proven exhaustive checklist. By mapping each defense to the subset of properties it addresses, we demonstrate that no current system enforces the full conjunction, which supports our claim of partial and non-compositional coverage. We accept that a formal necessity argument would be stronger; we will revise the section to describe the properties as derived directly from the identified problem sources and therefore necessary by construction for the open-tool setting, while clarifying that the evaluation is intended to expose practical gaps rather than to constitute a completeness proof.
revision: partial
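The mapping exercise described in this exchange (each defense to the subset of properties it addresses) can be illustrated with plain set arithmetic. Everything in the sketch is hypothetical: `defense_a`, `defense_b`, and `defense_c` are placeholders, not real systems, and the coverage sets are invented for illustration rather than taken from the paper.

```python
# The four properties, abbreviated. The names are labels only.
PROPERTIES = frozenset({"tool", "instruction", "judgment", "data_flow"})

# Hypothetical defenses and the properties each is ASSUMED to address.
# These assignments are made up; they are not the paper's findings.
COVERAGE = {
    "defense_a": {"instruction"},
    "defense_b": {"instruction", "data_flow"},
    "defense_c": {"tool"},
}


def uncovered(addressed):
    """Properties that a set of addressed properties leaves out."""
    return sorted(PROPERTIES - set(addressed))


def enforces_full_conjunction(addressed):
    """True when every one of the four properties is addressed."""
    return PROPERTIES <= set(addressed)


for name, addressed in COVERAGE.items():
    print(name, "missing:", uncovered(addressed))

# Defenses that cover all four properties on their own.
full = [n for n, a in COVERAGE.items() if enforces_full_conjunction(a)]

# What remains uncovered even if every defense is deployed together.
union_gap = uncovered(set().union(*COVERAGE.values()))

print(full)       # []
print(union_gap)  # ['judgment']
```

An empty `union_gap` would still not establish a composed guarantee: the sketch only counts which properties are addressed, whereas the paper's non-compositionality claim concerns whether defenses keep their guarantees when combined.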

Circularity Check

0 steps flagged

No significant circularity in definitional derivation from compiler analogy

The paper introduces the four integrity properties by drawing an explicit structural analogy between LLM agents and compilers, identifying untrusted data ingestion and untrusted tool execution as sources, and defining Tool Integrity, Instruction Integrity, Judgment Integrity, and Data Flow Integrity whose conjunction constitutes intent-to-execution integrity. This is a constructive, definitional step rather than a reduction of any output to fitted parameters, self-referential equations, or load-bearing prior results. The subsequent claim that existing defenses provide only partial coverage follows directly from evaluating those defenses against the newly defined properties, without any loop that assumes the conclusion in the premises. No self-citations, uniqueness theorems, or ansatzes are invoked to force the framework. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper relies on one domain assumption (the compiler analogy) and introduces one invented conceptual entity (intent-to-execution integrity). No free parameters or additional invented entities are used.

axioms (1)

Domain assumption: LLM agents operate over an intent-to-execution pipeline that is structurally analogous to compilers, where security violations correspond to mis-executions that do not preserve user intent.
This assumption is used to identify the two problem sources and derive the four integrity properties.

invented entities (1)

Intent-to-execution integrity (no independent evidence)
Purpose: To serve as the end-to-end correctness property that defines when an agent's execution faithfully reflects the user's intent.
Newly coined term whose necessity is argued via the compiler analogy and analysis of existing defenses.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Passage: "We observe that LLM agents are structurally analogous to compilers, where security violations correspond to mis-executions that do not preserve user intent. Drawing on this analogy, we identify two fundamental problem sources, untrusted data ingestion and untrusted tool execution, and derive four integrity properties..."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Passage: "We call their conjunction intent-to-execution integrity."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · 3 internal anchors

[1] OpenClaw. OpenClaw: An open-source framework for AI agents. https://github.com/openclaw/openclaw, 2025.

[2] NVIDIA. NemoClaw: Hardened OpenClaw runtime with Landlock and seccomp sandboxing. https://github.com/NVIDIA/NemoClaw, 2026.

[3] IronClaw. IronClaw: Agent OS focused on privacy, security, and extensibility. https://github.com/nearai/ironclaw, 2026.

[4] Fangzhou Wu, Ethan Cecchetti, and Chaowei Xiao. System-Level Defense against Indirect Prompt Injection Attacks: An Information Flow Control Perspective. arXiv preprint arXiv:2409.19091, 2024.

[5] SaFo-Lab. SeClaw: The security armored personal AI assistant. https://github.com/SaFo-Lab/seclaw, 2026.

[6] SafeClaw-R Authors. SafeClaw-R: Risk analysis and runtime enforcement for OpenClaw skills. arXiv preprint arXiv:2603.28807, 2026.

[7] Ruian Duan, Omar Alrawi, Ranjita Pai Kasturi, Ryan Elder, Brendan Saltaformaggio, and Wenke Lee. Towards Measuring Supply Chain Attacks on Package Managers for Interpreted Languages. In Network and Distributed System Security Symposium (NDSS), 2021.

[8] Sizhe Chen, Julien Piet, Chawin Sitawarin, and David Wagner. StruQ: Defending Against Prompt Injection with Structured Queries. arXiv preprint arXiv:2402.06363, 2024.

[9] Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-Béguelin. Securing AI Agents with Information-Flow Control. arXiv preprint arXiv:2505.23643, 2025.

[10] Yulin Chen, Haoran Li, Yuexin Li, Yue Liu, Yangqiu Song, and Bryan Hooi. TopicAttack: An Indirect Prompt Injection Attack via Topic Transition. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 7327–7345, 2025.

[11] Adversa AI. SecureClaw: Security plugin and skill for OpenClaw. https://github.com/adversa-ai/secureclaw, 2026.

[12] Feiran Jia, Tong Wu, Xin Qin, and Anna Squicciarini. The task shield: Enforcing task alignment to defend against indirect prompt injection in LLM agents. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 29680–29697, 2025.

[13] Vincent Siu, Jingxuan He, Kyle Montgomery, Zhun Wang, Neil Gong, Chenguang Wang, and Dawn Song. A framework for formalizing LLM agent security. arXiv preprint arXiv:2603.19469, 2026.

[14] Hao Li, Xiaogeng Liu, Hung-Chun Chiu, Dianqi Li, Ning Zhang, and Chaowei Xiao. DRIFT: Dynamic rule-based defense with injection isolation for securing LLM agents. In Advances in Neural Information Processing Systems, 2025.

[15] Koi Security. ClawHavoc: Analyzing a coordinated supply-chain attack on OpenClaw. Security report, 2026.

[16] Yulin Chen, Haoran Li, Yuan Sui, Yufei He, Yue Liu, Yangqiu Song, and Bryan Hooi. Can indirect prompt injection attacks be detected and removed? In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 18189–18206, 2025.

[17] Liu et al. Large-scale security analysis of AI agent skills. arXiv preprint, 2026.

[18] Snyk. OpenClaw skill security: Credential exposure in the ClawHub registry. Security advisory, 2026.

[19] Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, and Florian Tramèr. Defeating prompt injections by design. arXiv preprint arXiv:2503.18813, 2025.

[20] Tianneng Shi, Kaijie Zhu, Zhun Wang, Yuqi Jia, Will Cai, Weida Liang, Haonan Wang, Hend Alzahrani, Joshua Lu, Kenji Kawaguchi, Basel Alomair, Xuandong Zhao, William Yang Wang, Neil Gong, Wenbo Guo, and Dawn Song. PromptArmor: Simple yet effective prompt injection defenses. In ICLR, 2026.

[21] Sizhe Chen, Arman Zharmagambetov, Saeed Mahloujifar, Kamalika Chaudhuri, David Wagner, and Chuan Guo. SecAlign: Defending against prompt injection with preference optimization. In ACM CCS, 2025.

[22] Qiusi Zhan, Richard Fang, Henil Shalin Panchal, and Daniel Kang. Adaptive Attacks Break Defenses Against Indirect Prompt Injection Attacks on LLM Agents. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 7116–7132, 2025.

[23] Alfredo Oliveira, Buddy Tancio, David Fiser, Philippe Lin, and Roel Reyes. Malicious OpenClaw skills used to distribute Atomic macOS Stealer. Trend Micro Research, 2026. https://www.trendmicro.com/en_us/research/26/b/openclaw-skills-used-to-distribute-atomic-macos-stealer.html

[24] Ravie Lakshmanan. Researchers find 341 malicious ClawHub skills stealing data from OpenClaw users. The Hacker News, 2026. https://thehackernews.com/2026/02/researchers-find-341-malicious-clawhub.html

[25] Tianneng Shi et al. Progent: Programmable privilege control for LLM agents. arXiv preprint, 2025.

[26] Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection. In AISec, 2023.

[27] Edoardo Debenedetti, Jie Zhang, Mislav Balunović, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. AgentDojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents. arXiv preprint arXiv:2406.13352, 2024.

[28] Xavier Leroy. Formal verification of a realistic compiler. Communications of the ACM, 52(7):107–116, 2009.

[29] Yingqiang Ge, Yujie Ren, Wenyue Hua, Shuyuan Xu, Juntao Tan, and Yongfeng Zhang. LLM as interpreter for natural language programming, pseudo-code programming, and flow programming of AI agents. arXiv preprint arXiv:2405.06907, 2024.

[30] Lvmin Zhang and Maneesh Agrawala. Agint: Agentic graph compilation for software engineering agents. arXiv preprint arXiv:2511.19635, 2025.
