Securing LLM Agents Needs Intent-to-Execution Integrity
Pith reviewed 2026-05-19 20:12 UTC · model grok-4.3
The pith
Securing LLM agents requires intent-to-execution integrity so executions faithfully match user intent even with untrusted tools.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that LLM agents operate over an intent-to-execution pipeline analogous to compilation, with two root sources of failure—untrusted data ingestion and untrusted tool execution—and that four integrity properties must hold together for executions to preserve user intent. Their conjunction is termed intent-to-execution integrity. Analysis of current defenses shows they supply only partial, non-compositional coverage of these properties.
What carries the argument
Intent-to-execution integrity, the conjunction of Tool Integrity, Instruction Integrity, Judgment Integrity, and Data Flow Integrity, obtained via the compiler analogy applied to the pipeline from natural-language instructions to concrete system operations.
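The conjunction can be illustrated with a minimal sketch. The class and field names below are hypothetical and do not come from the paper. The sketch assumes each property can be reduced to a per-execution boolean verdict, which the paper does not claim to have operationalized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegrityReport:
    """Hypothetical per-execution verdicts for the four properties."""
    tool_integrity: bool
    instruction_integrity: bool
    judgment_integrity: bool
    data_flow_integrity: bool

    def violated(self) -> list[str]:
        """Names of the properties that failed, in declaration order."""
        return [name for name, ok in vars(self).items() if not ok]

    def intent_to_execution_integrity(self) -> bool:
        """The conjunction: holds only when no property is violated."""
        return not self.violated()


report = IntegrityReport(True, True, True, False)
print(report.intent_to_execution_integrity())  # False
print(report.violated())  # ['data_flow_integrity']
```

A single failed property is enough to break the conjunction, which is why the summary above stresses that the four properties must hold together.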
If this is right
- Agents in open ecosystems with third-party skills cannot be secured by tool-call constraints alone and require simultaneous enforcement of all four properties.
- Defense mechanisms must be evaluated and composed against the full set of properties rather than against isolated threat models.
- Violations of Data Flow Integrity or Judgment Integrity can bypass existing defenses even when tool calls appear syntactically valid.
- New agent architectures should be designed from the start to make each integrity property independently verifiable.
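The third implication, a syntactically valid tool call that still violates Data Flow Integrity, can be sketched as follows. This is an illustration under stated assumptions, not the paper's definition. It reads Data Flow Integrity as "untrusted data must not determine a security-sensitive argument", and the tool schema, the `Value` type, and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """A value tagged with whether it came from an untrusted source."""
    data: str
    untrusted: bool = False


# Hypothetical tool schema: tool name -> expected argument names.
ALLOWED_TOOLS = {"send_email": {"to", "body"}}


def syntactically_valid(tool: str, args: dict[str, Value]) -> bool:
    """Tool-call constraint only: known tool, exactly the expected arguments."""
    return tool in ALLOWED_TOOLS and set(args) == ALLOWED_TOOLS[tool]


def data_flow_ok(args: dict[str, Value], sensitive: set[str]) -> bool:
    """Reject calls where a sensitive argument is derived from untrusted data."""
    return not any(args[name].untrusted for name in sensitive if name in args)


call_args = {
    # Assumed scenario: the recipient was copied from a fetched web page.
    "to": Value("attacker@example.com", untrusted=True),
    "body": Value("Quarterly report attached."),
}
print(syntactically_valid("send_email", call_args))  # True
print(data_flow_ok(call_args, sensitive={"to"}))  # False
```

The call passes the schema check and fails the provenance check, so a defense that only constrains tool-call shape would let it through.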
Where Pith is reading between the lines
- The same four-property decomposition could be used to audit and harden other autonomous systems that translate high-level goals into low-level actions.
- Empirical measurement of how often deployed agents violate each property in isolation would help prioritize which gaps to close first.
- Formal verification techniques developed for compiler correctness might be adapted to prove that an agent implementation meets intent-to-execution integrity.
Load-bearing premise
The structural analogy between LLM agents and compilers is close enough that the four integrity properties are both necessary and jointly sufficient for end-to-end correctness.
What would settle it
An agent that satisfies all four integrity properties yet produces an execution that diverges from the user's stated intent, or an agent that violates one property while still executing the intent correctly.
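The two falsifying cases can be written as a short logical sketch. The notation is introduced here for illustration and is not the paper's: for a user intent u and an execution e, T, I, J, and D stand for the four integrity properties, and F(u, e) means that e faithfully reflects u.

```latex
\begin{align*}
P(u, e) &\;:=\; T(u, e) \land I(u, e) \land J(u, e) \land D(u, e) \\
\text{sufficiency:}\quad & \forall u, e.\; P(u, e) \Rightarrow F(u, e) \\
\text{necessity:}\quad & \forall u, e.\; F(u, e) \Rightarrow P(u, e) \\
\text{refutes sufficiency:}\quad & \exists u, e.\; P(u, e) \land \lnot F(u, e) \\
\text{refutes necessity:}\quad & \exists u, e.\; F(u, e) \land \lnot P(u, e)
\end{align*}
```

The first case above (all four properties hold, yet the execution diverges) is a witness against sufficiency. The second (one property is violated, yet the intent is executed correctly) is a witness against necessity.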
Original abstract
This position paper argues that securing LLM agents requires first defining an end-to-end correctness property that specifies when an agent's execution faithfully reflects the user's intent. Modern LLM agents operate over an intent-to-execution pipeline, where natural-language instructions are translated into concrete system operations such as tool calls, API requests, and code execution. While recent defenses have made progress in constraining how agents construct tool calls, most existing formulations implicitly assume that tools are trusted. The emergence of systems such as OpenClaw, with open ecosystems of third-party skills and direct access to user environments, breaks this assumption and exposes new failure modes, including malicious or over-privileged components in the execution pipeline. Despite rapid progress in defense mechanisms, there is no adequate correctness property that defines what "secure" means for LLM agents, nor a principled way to evaluate the coverage of existing defenses. We observe that LLM agents are structurally analogous to compilers, where security violations correspond to mis-executions that do not preserve user intent. Drawing on this analogy, we identify two fundamental problem sources (untrusted data ingestion and untrusted tool execution) and derive four integrity properties that must hold simultaneously: Tool Integrity, Instruction Integrity, Judgment Integrity, and Data Flow Integrity. We call their conjunction intent-to-execution integrity. Analyzing existing agentic defenses against these properties reveals that current systems provide only partial and non-compositional coverage, leaving fundamental gaps in securing modern LLM agents.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This position paper argues that securing LLM agents requires defining an end-to-end 'intent-to-execution integrity' property. It posits a structural analogy between LLM agents and compilers, identifies untrusted data ingestion and untrusted tool execution as fundamental problem sources, and derives four necessary integrity properties (Tool Integrity, Instruction Integrity, Judgment Integrity, and Data Flow Integrity) whose conjunction defines the desired correctness notion. The paper then evaluates existing agentic defenses against these properties and concludes that current systems offer only partial, non-compositional coverage, leaving fundamental gaps especially in open ecosystems such as OpenClaw.
Significance. If the compiler analogy can be made rigorous and the four properties shown to be jointly sufficient for intent preservation, the framework would offer a principled way to evaluate defense coverage and guide future designs for LLM agents operating with untrusted tools and third-party skills. The position paper usefully highlights the shift from trusted-tool assumptions to open execution environments, but its conceptual nature means the significance depends on whether the properties can be operationalized or validated.
major comments (2)
- [Derivation of the four integrity properties] The central derivation of the four properties from the compiler analogy (abstract and the section introducing the properties) does not include a formal mapping or proof that the conjunction is sufficient for end-to-end intent preservation. Because user intent is expressed in natural language and can be underspecified, it remains possible for an execution to satisfy Tool Integrity, Instruction Integrity, Judgment Integrity, and Data Flow Integrity yet still diverge from the original intent (e.g., via legitimate but unintended tool behavior on ambiguous instructions). This undercuts the subsequent claim that existing defenses leave fundamental gaps.
- [Evaluation of existing agentic defenses] The analysis of existing defenses (section evaluating current systems) treats the four properties as an exhaustive checklist, but without a demonstration of necessity or completeness, the conclusion that coverage is 'partial and non-compositional' rests on an unverified framing rather than a load-bearing argument.
minor comments (2)
- [Introduction] The abstract and introduction use 'intent-to-execution pipeline' without an early diagram or formal definition of the pipeline stages; adding one would clarify how the four properties map onto specific stages.
- [Motivation] The paper cites OpenClaw as an example of open ecosystems but provides limited detail on its architecture; a short description or reference to its threat model would strengthen the motivation for untrusted tool execution.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our position paper. We address each major comment below, clarifying the conceptual scope of the framework while indicating revisions to improve precision and acknowledge limitations.
- Referee: [Derivation of the four integrity properties] The central derivation of the four properties from the compiler analogy (abstract and the section introducing the properties) does not include a formal mapping or proof that the conjunction is sufficient for end-to-end intent preservation. Because user intent is expressed in natural language and can be underspecified, it remains possible for an execution to satisfy Tool Integrity, Instruction Integrity, Judgment Integrity, and Data Flow Integrity yet still diverge from the original intent (e.g., via legitimate but unintended tool behavior on ambiguous instructions). This undercuts the subsequent claim that existing defenses leave fundamental gaps.
Authors: We agree that the derivation is analogical rather than a formal proof of sufficiency, which is consistent with the position-paper format. The four properties are obtained by decomposing the intent-to-execution pipeline at the two fundamental sources of failure (untrusted ingestion and untrusted execution) and identifying the integrity requirements at each stage. The referee correctly identifies that natural-language underspecification can produce executions that satisfy the properties yet still diverge from intent; we will revise the manuscript to state explicitly that the properties are necessary conditions for preventing the concrete failure modes we enumerate, while noting that additional mechanisms may be required to handle ambiguity. This refinement does not undercut the gap analysis, because existing defenses fail to enforce even these necessary properties in open ecosystems.
revision: partial
- Referee: [Evaluation of existing agentic defenses] The analysis of existing defenses (section evaluating current systems) treats the four properties as an exhaustive checklist, but without a demonstration of necessity or completeness, the conclusion that coverage is 'partial and non-compositional' rests on an unverified framing rather than a load-bearing argument.
Authors: The evaluation applies the four properties as an analytical lens rather than a proven exhaustive checklist. By mapping each defense to the subset of properties it addresses, we demonstrate that no current system enforces the full conjunction, which supports our claim of partial and non-compositional coverage. We accept that a formal necessity argument would be stronger; we will revise the section to describe the properties as derived directly from the identified problem sources and therefore necessary by construction for the open-tool setting, while clarifying that the evaluation is intended to expose practical gaps rather than to constitute a completeness proof.
revision: partial
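The mapping the authors describe, from each defense to the subset of properties it addresses, can be sketched as a small coverage table. The defense names and their coverage sets below are hypothetical placeholders chosen for illustration. They are not the paper's findings about any real system.

```python
PROPERTIES = frozenset({"tool", "instruction", "judgment", "data_flow"})

# Hypothetical defenses and the properties each one addresses (illustrative only).
coverage = {
    "defense_a": {"instruction", "data_flow"},
    "defense_b": {"tool"},
    "defense_c": {"instruction", "judgment"},
}


def missing(covered: set[str]) -> list[str]:
    """Properties a defense leaves unaddressed, sorted for stable output."""
    return sorted(PROPERTIES - covered)


def full_coverage(table: dict[str, set[str]]) -> list[str]:
    """Defenses that address all four properties on their own."""
    return [name for name, covered in table.items() if covered >= PROPERTIES]


for name, covered in coverage.items():
    print(name, "misses", missing(covered))
print("single defense with full coverage:", full_coverage(coverage))
```

In this toy table the union of the three defenses touches all four properties, but that alone would not show that stacking them enforces the conjunction. That is the sense in which the authors call current coverage non-compositional.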
Circularity Check
No significant circularity in definitional derivation from compiler analogy
The paper introduces the four integrity properties by drawing an explicit structural analogy between LLM agents and compilers, identifying untrusted data ingestion and untrusted tool execution as sources, and defining Tool Integrity, Instruction Integrity, Judgment Integrity, and Data Flow Integrity whose conjunction constitutes intent-to-execution integrity. This is a constructive, definitional step rather than a reduction of any output to fitted parameters, self-referential equations, or load-bearing prior results. The subsequent claim that existing defenses provide only partial coverage follows directly from evaluating those defenses against the newly defined properties, without any loop that assumes the conclusion in the premises. No self-citations, uniqueness theorems, or ansatzes are invoked to force the framework. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLM agents operate over an intent-to-execution pipeline that is structurally analogous to compilers, where security violations correspond to mis-executions that do not preserve user intent.
invented entities (1)
- Intent-to-execution integrity: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
unclear: Relation between the paper passage and the cited Recognition theorem.
"We observe that LLM agents are structurally analogous to compilers, where security violations correspond to mis-executions that do not preserve user intent. Drawing on this analogy, we identify two fundamental problem sources—untrusted data ingestion and untrusted tool execution—and derive four integrity properties..."
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
unclear: Relation between the paper passage and the cited Recognition theorem.
"We call their conjunction intent-to-execution integrity."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.