
arxiv: 2605.16976 · v1 · pith:GRUVDJFPnew · submitted 2026-05-16 · 💻 cs.CR

Securing LLM Agents Need Intent-to-Execution Integrity

Pith reviewed 2026-05-19 20:12 UTC · model grok-4.3

classification 💻 cs.CR
keywords LLM agents · security · integrity properties · intent-to-execution · untrusted tools · compiler analogy · agentic systems

The pith

Securing LLM agents requires intent-to-execution integrity so executions faithfully match user intent even with untrusted tools.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This position paper argues that LLM agents need a precise end-to-end correctness property to guarantee that their concrete actions preserve the user's natural-language intent. It defines this property, called intent-to-execution integrity, as the simultaneous satisfaction of four integrity conditions: Tool Integrity, Instruction Integrity, Judgment Integrity, and Data Flow Integrity. The conditions are obtained by treating the agent's translation from instructions to tool calls and API operations as a compiler-like process that must avoid mis-execution. Existing defenses are shown to cover only subsets of these conditions and to lack compositional guarantees, especially once third-party skills gain direct access to user environments. A reader would care because the trusted-tools assumption that underpins most prior work no longer holds in open agent ecosystems.

Core claim

The paper claims that LLM agents operate over an intent-to-execution pipeline analogous to compilation, with two root sources of failure—untrusted data ingestion and untrusted tool execution—and that four integrity properties must hold together for executions to preserve user intent. Their conjunction is termed intent-to-execution integrity. Analysis of current defenses shows they supply only partial, non-compositional coverage of these properties.

What carries the argument

Intent-to-execution integrity, the conjunction of Tool Integrity, Instruction Integrity, Judgment Integrity, and Data Flow Integrity, obtained via the compiler analogy applied to the pipeline from natural-language instructions to concrete system operations.
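As a minimal sketch of that conjunction, the Python below models the four properties as boolean verdicts on a single execution. The class name `IntegrityReport`, its field names, and the `violated` helper are hypothetical; the summary names the properties but does not say how each verdict would be computed.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class IntegrityReport:
    """Hypothetical per-execution verdicts for the four named properties."""
    tool_integrity: bool
    instruction_integrity: bool
    judgment_integrity: bool
    data_flow_integrity: bool

    def intent_to_execution_integrity(self) -> bool:
        # The end-to-end property is the conjunction of all four verdicts.
        return all(getattr(self, f.name) for f in fields(self))

    def violated(self) -> list[str]:
        # Names of the properties that failed, in declaration order.
        return [f.name for f in fields(self) if not getattr(self, f.name)]


report = IntegrityReport(True, True, False, True)
print(report.intent_to_execution_integrity())  # False
print(report.violated())  # ['judgment_integrity']
```

Keeping one verdict per property, rather than a single pass/fail flag, makes it possible to say which property an execution violated.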

If this is right

  • Agents in open ecosystems with third-party skills cannot be secured by tool-call constraints alone and require simultaneous enforcement of all four properties.
  • Defense mechanisms must be evaluated and composed against the full set of properties rather than against isolated threat models.
  • Violations of Data Flow Integrity or Judgment Integrity can bypass existing defenses even when tool calls appear syntactically valid.
  • New agent architectures should be designed from the start to make each integrity property independently verifiable.
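The coverage argument behind these points can be sketched as a set computation. The defense names and the property subsets assigned to them below are invented placeholders for illustration, not the paper's findings about any real system.

```python
PROPERTIES = frozenset({"tool", "instruction", "judgment", "data_flow"})

# Hypothetical defenses mapped to the properties each one enforces.
coverage = {
    "defense_a": frozenset({"instruction"}),
    "defense_b": frozenset({"instruction", "data_flow"}),
    "defense_c": frozenset({"tool"}),
}


def full_coverage(defenses: dict[str, frozenset[str]]) -> list[str]:
    """Return the defenses that enforce every property on their own."""
    return sorted(name for name, props in defenses.items() if props >= PROPERTIES)


def uncovered(defenses: dict[str, frozenset[str]]) -> frozenset[str]:
    """Return the properties that no listed defense enforces."""
    covered = frozenset().union(*defenses.values())
    return PROPERTIES - covered


print(full_coverage(coverage))      # []
print(sorted(uncovered(coverage)))  # ['judgment']
```

This sketch only checks per-defense and union coverage. Even a union that covered all four properties would not by itself amount to a compositional guarantee, which is the gap the paper points to.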

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same four-property decomposition could be used to audit and harden other autonomous systems that translate high-level goals into low-level actions.
  • Empirical measurement of how often deployed agents violate each property in isolation would help prioritize which gaps to close first.
  • Formal verification techniques developed for compiler correctness might be adapted to prove that an agent implementation meets intent-to-execution integrity.

Load-bearing premise

The structural analogy between LLM agents and compilers is close enough that the four integrity properties are both necessary and jointly sufficient for end-to-end correctness.

What would settle it

An agent that satisfies all four integrity properties yet produces an execution that diverges from the user's stated intent, or an agent that violates one property while still executing the intent correctly.
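One way to picture this test is as a search over audited executions for the two kinds of counterexample. The `Trace` record, its fields, and the sample data are hypothetical; the sketch assumes each property verdict and the intent judgment have already been obtained by some external audit.

```python
from typing import NamedTuple


class Trace(NamedTuple):
    """Hypothetical record of one audited agent execution."""
    name: str
    # Verdicts in the order: tool, instruction, judgment, data flow.
    properties_hold: tuple[bool, bool, bool, bool]
    intent_preserved: bool


def counterexamples(traces: list[Trace]) -> dict[str, list[str]]:
    """Split traces into evidence against sufficiency and against necessity."""
    return {
        # All four properties hold, yet the execution diverges from intent.
        "sufficiency": [t.name for t in traces
                        if all(t.properties_hold) and not t.intent_preserved],
        # Some property is violated, yet the intent is still carried out.
        "necessity": [t.name for t in traces
                      if not all(t.properties_hold) and t.intent_preserved],
    }


traces = [
    Trace("t1", (True, True, True, True), True),
    Trace("t2", (True, True, True, True), False),
    Trace("t3", (True, False, True, True), True),
]
print(counterexamples(traces))
# {'sufficiency': ['t2'], 'necessity': ['t3']}
```

An empty result on a sample of traces would be consistent with the claim without proving it, since the check is only as strong as the traces and verdicts fed into it.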

Original abstract

This position paper argues that securing LLM agents requires first defining an end-to-end correctness property that specifies when an agent's execution faithfully reflects the user's intent. Modern LLM agents operate over an intent-to-execution pipeline, where natural-language instructions are translated into concrete system operations such as tool calls, API requests, and code execution. While recent defenses have made progress in constraining how agents construct tool calls, most existing formulations implicitly assume that tools are trusted. The emergence of systems such as OpenClaw, with open ecosystems of third-party skills and direct access to user environments, breaks this assumption and exposes new failure modes, including malicious or over-privileged components in the execution pipeline. Despite rapid progress in defense mechanisms, there is no adequate correctness property that defines what "secure" means for LLM agents, nor a principled way to evaluate the coverage of existing defenses. We observe that LLM agents are structurally analogous to compilers, where security violations correspond to mis-executions that do not preserve user intent. Drawing on this analogy, we identify two fundamental problem sources -- untrusted data ingestion and untrusted tool execution -- and derive four integrity properties that must hold simultaneously: Tool Integrity, Instruction Integrity, Judgment Integrity, and Data Flow Integrity. We call their conjunction intent-to-execution integrity. Analyzing existing agentic defenses against these properties reveals that current systems provide only partial and non-compositional coverage, leaving fundamental gaps in securing modern LLM agents.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. This position paper argues that securing LLM agents requires defining an end-to-end 'intent-to-execution integrity' property. It posits a structural analogy between LLM agents and compilers, identifies untrusted data ingestion and untrusted tool execution as fundamental problem sources, and derives four necessary integrity properties (Tool Integrity, Instruction Integrity, Judgment Integrity, and Data Flow Integrity) whose conjunction defines the desired correctness notion. The paper then evaluates existing agentic defenses against these properties and concludes that current systems offer only partial, non-compositional coverage, leaving fundamental gaps especially in open ecosystems such as OpenClaw.

Significance. If the compiler analogy can be made rigorous and the four properties shown to be jointly sufficient for intent preservation, the framework would offer a principled way to evaluate defense coverage and guide future designs for LLM agents operating with untrusted tools and third-party skills. The position paper usefully highlights the shift from trusted-tool assumptions to open execution environments, but its conceptual nature means the significance depends on whether the properties can be operationalized or validated.

major comments (2)
  1. [Derivation of the four integrity properties] The central derivation of the four properties from the compiler analogy (abstract and the section introducing the properties) does not include a formal mapping or proof that the conjunction is sufficient for end-to-end intent preservation. Because user intent is expressed in natural language and can be underspecified, it remains possible for an execution to satisfy Tool Integrity, Instruction Integrity, Judgment Integrity, and Data Flow Integrity yet still diverge from the original intent (e.g., via legitimate but unintended tool behavior on ambiguous instructions). This undercuts the subsequent claim that existing defenses leave fundamental gaps.
  2. [Evaluation of existing agentic defenses] The analysis of existing defenses (section evaluating current systems) treats the four properties as an exhaustive checklist, but without a demonstration of necessity or completeness, the conclusion that coverage is 'partial and non-compositional' rests on an unverified framing rather than a load-bearing argument.
minor comments (2)
  1. [Introduction] The abstract and introduction use 'intent-to-execution pipeline' without an early diagram or formal definition of the pipeline stages; adding one would clarify how the four properties map onto specific stages.
  2. [Motivation] The paper cites OpenClaw as an example of open ecosystems but provides limited detail on its architecture; a short description or reference to its threat model would strengthen the motivation for untrusted tool execution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our position paper. We address each major comment below, clarifying the conceptual scope of the framework while indicating revisions to improve precision and acknowledge limitations.

Point-by-point responses
  1. Referee: [Derivation of the four integrity properties] The central derivation of the four properties from the compiler analogy (abstract and the section introducing the properties) does not include a formal mapping or proof that the conjunction is sufficient for end-to-end intent preservation. Because user intent is expressed in natural language and can be underspecified, it remains possible for an execution to satisfy Tool Integrity, Instruction Integrity, Judgment Integrity, and Data Flow Integrity yet still diverge from the original intent (e.g., via legitimate but unintended tool behavior on ambiguous instructions). This undercuts the subsequent claim that existing defenses leave fundamental gaps.

    Authors: We agree that the derivation is analogical rather than a formal proof of sufficiency, which is consistent with the position-paper format. The four properties are obtained by decomposing the intent-to-execution pipeline at the two fundamental sources of failure (untrusted ingestion and untrusted execution) and identifying the integrity requirements at each stage. The referee correctly identifies that natural-language underspecification can produce executions that satisfy the properties yet still diverge from intent; we will revise the manuscript to state explicitly that the properties are necessary conditions for preventing the concrete failure modes we enumerate, while noting that additional mechanisms may be required to handle ambiguity. This refinement does not undercut the gap analysis, because existing defenses fail to enforce even these necessary properties in open ecosystems. revision: partial

  2. Referee: [Evaluation of existing agentic defenses] The analysis of existing defenses (section evaluating current systems) treats the four properties as an exhaustive checklist, but without a demonstration of necessity or completeness, the conclusion that coverage is 'partial and non-compositional' rests on an unverified framing rather than a load-bearing argument.

    Authors: The evaluation applies the four properties as an analytical lens rather than a proven exhaustive checklist. By mapping each defense to the subset of properties it addresses, we demonstrate that no current system enforces the full conjunction, which supports our claim of partial and non-compositional coverage. We accept that a formal necessity argument would be stronger; we will revise the section to describe the properties as derived directly from the identified problem sources and therefore necessary by construction for the open-tool setting, while clarifying that the evaluation is intended to expose practical gaps rather than to constitute a completeness proof. revision: partial

Circularity Check

0 steps flagged

No significant circularity in definitional derivation from compiler analogy

Full rationale

The paper introduces the four integrity properties by drawing an explicit structural analogy between LLM agents and compilers, identifying untrusted data ingestion and untrusted tool execution as sources, and defining Tool Integrity, Instruction Integrity, Judgment Integrity, and Data Flow Integrity whose conjunction constitutes intent-to-execution integrity. This is a constructive, definitional step rather than a reduction of any output to fitted parameters, self-referential equations, or load-bearing prior results. The subsequent claim that existing defenses provide only partial coverage follows directly from evaluating those defenses against the newly defined properties, without any loop that assumes the conclusion in the premises. No self-citations, uniqueness theorems, or ansatzes are invoked to force the framework. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper relies on one domain assumption (the compiler analogy) and introduces one invented conceptual entity (intent-to-execution integrity). No free parameters or additional invented entities are used.

axioms (1)
  • domain assumption: LLM agents operate over an intent-to-execution pipeline that is structurally analogous to compilers, where security violations correspond to mis-executions that do not preserve user intent.
    This assumption is used to identify the two problem sources and derive the four integrity properties.
invented entities (1)
  • Intent-to-execution integrity (no independent evidence)
    purpose: To serve as the end-to-end correctness property that defines when an agent's execution faithfully reflects the user's intent.
    Newly coined term whose necessity is argued via the compiler analogy and analysis of existing defenses.

pith-pipeline@v0.9.0 · 5813 in / 1319 out tokens · 34749 ms · 2026-05-19T20:12:41.744001+00:00 · methodology


