Recognition: unknown
Sleeper Channels and Provenance Gates: Persistent Prompt Injection in Always-on Autonomous AI Agents
Pith reviewed 2026-05-14 18:19 UTC · model grok-4.3
The pith
Always-on AI agents let untrusted inputs persist across surfaces as sleeper channels and fire later without the attacker present.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Sleeper channels are defined along two independent axes, persistence substrate and firing-separation. The D2 gate defeats them by requiring every action instance to carry a canonical digest attested exactly once by the owner, eliminating reuse and delayed-execution paths even when the original input is no longer present.
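To make the digest half of this claim concrete, here is a minimal TypeScript sketch of a canonical action-instance digest, assuming (as the pith summarizes) that D2 hashes a normalized tool, arguments, and firing-context tuple. The `ActionInstance` shape and the `canonicalDigest` helper are illustrative assumptions, not the artifact's actual API.

```ts
// Hypothetical sketch only; field names are assumptions, not the paper's schema.
import { createHash } from "node:crypto";

interface ActionInstance {
  tool: string;                             // e.g. "shell.exec" or "cron.schedule"
  args: Record<string, unknown>;            // tool arguments as supplied
  firing: { surface: string; at: string };  // where and when the action fires
}

// Deterministic serialization: keys are sorted recursively so two renderings
// of the same instance cannot yield different digests.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => a.localeCompare(b))
      .map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

export function canonicalDigest(instance: ActionInstance): string {
  return createHash("sha256").update(canonicalize(instance)).digest("hex");
}
```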
What carries the argument
D2 provenance gate: a canonical action-instance digest combined with one-shot owner attestations, enforcing mediation at ten hooks and proven sound against seven named deployment invariants.
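The attestation half can be sketched just as briefly. Assuming "one-shot" means a grant is spent on first use, a consume-once store keyed by the canonical digest conveys the mechanism; `AttestationStore` is a hypothetical name, not the shipped gate.

```ts
// Minimal sketch of one-shot owner attestations keyed on the canonical
// digest; this is an assumed reading of the design, not the artifact's code.
export class AttestationStore {
  private granted = new Set<string>();

  // Called when the owner approves one specific action instance.
  attest(digest: string): void {
    this.granted.add(digest);
  }

  // Called by a mediation hook at fire time. Set.delete() both checks and
  // removes the grant, so a second firing of the same digest fails closed;
  // that single-use property is what blocks replay and grant reuse.
  consume(digest: string): boolean {
    return this.granted.delete(digest);
  }
}
```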
If this is right
- D2 blocks paraphrase laundering, multi-input grant reuse, and replay of scheduled or stored actions.
- Five of the ten mediation hooks (H1, H2, H3, H6, H9) can be realized as a runtime adapter around the cron path while preserving Node ≥ 20 compatibility; a sketch of such an adapter follows this list.
- A static audit of vendored source can verify the presence of the required provenance checks.
- The same digest-plus-attestation pattern applies to other persistent surfaces such as memory stores and self-authored skills.
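As a rough illustration of the adapter bullet above, a wrapper that computes the digest when a job is persisted and consumes the attestation when it fires might look like the sketch below. `JobSpec`, `scheduleJob`, and the exact hook placement are assumptions for illustration; the artifact's H1/H2/H3/H6/H9 hooks are only described abstractly here.

```ts
// Hedged sketch of a cron-path adapter; names and hook placement are assumed.
type JobSpec = { cron: string; command: string };

interface Gate {
  digestOf(spec: JobSpec): string;   // canonical action-instance digest
  consume(digest: string): boolean;  // one-shot attestation check
}

function withProvenanceGate(
  scheduleJob: (cron: string, run: () => Promise<void>) => void,
  execute: (spec: JobSpec) => Promise<void>,
  gate: Gate,
) {
  return (spec: JobSpec): void => {
    // The digest is computed from the job as persisted; an altered or
    // re-worded job yields a different digest and needs its own attestation.
    const digest = gate.digestOf(spec);
    scheduleJob(spec.cron, async () => {
      // Fire-time mediation on the cron path: an instance whose one-shot
      // attestation is missing or already spent does not execute.
      if (!gate.consume(digest)) {
        console.warn(`blocked unattested or replayed job ${digest.slice(0, 12)}`);
        return;
      }
      await execute(spec);
    });
  };
}
```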
Where Pith is reading between the lines
- Agent platforms that collapse all surfaces under one identity will need explicit provenance tracking at every persistence boundary to avoid delayed execution risks.
- If the seven invariants turn out to be unrealistic in production, lighter-weight alternatives such as D1 or D3 may still provide partial protection.
- The approach suggests that future agent runtimes should separate owner attestation from input surfaces by design rather than by ad-hoc patches.
Load-bearing premise
That the seven deployment invariants actually hold for real always-on agents, and that one-shot attestations can be added without creating new attack surfaces or usability problems.
What would settle it
A working end-to-end attack on an OpenClaw-style agent that reuses or replays a persisted action after D2 mediation is installed, or an implementation of one-shot attestations that itself introduces a new injection vector.
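The replay half of that falsification test reduces, under D2's stated design, to a two-line property: the first firing of an attested instance is allowed and a literal replay is refused. A self-contained toy check (placeholder digest, minimal one-shot store as sketched earlier):

```ts
// Toy replay property; the digest string is a placeholder, not a real value.
import assert from "node:assert";

const granted = new Set<string>();
const digest = "sha256-of-canonical-action-instance"; // stand-in digest

granted.add(digest);                         // owner attests exactly once
assert.equal(granted.delete(digest), true);  // first firing consumes the grant
assert.equal(granted.delete(digest), false); // replaying the persisted action fails
```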
Read the original abstract
Always-on AI agents (OpenClaw, Hermes Agent) run as a single persistent process under the owner's identity, folding messaging, memory, self-authored skills, scheduling, and shell into one authority boundary. This configuration opens what we call *sleeper channels*: an untrusted input to one surface persists as a memory, skill, scheduled job, or filesystem patch, then fires later through a different surface with no attacker present. Two independent axes define the class: persistence substrate and firing-separation. We walk a confused-deputy cron attack end-to-end through OpenClaw at a pinned commit. The defense is tiered (D1, D2, D3), and D2 carries a soundness theorem against seven named deployment invariants. D2 keys on a canonical action-instance digest with one-shot owner attestations, defeating paraphrase laundering, multi-input grant reuse, and replay. A companion artifact ships the gate, a static audit over the vendored source, and a runtime adapter realising five of the ten mediation hooks (H1, H2, H3, H6, H9) around the cron path (42 tests, Node ≥ 20, at github.com/maloyan/sleeper-channels). Empirical evaluation is preregistered as follow-on.
Editorial analysis
A structured set of objections, weighed in public.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Always-on AI agents run as a single persistent process folding messaging, memory, self-authored skills, scheduling, and shell into one authority boundary.
invented entities (2)
- sleeper channels (no independent evidence)
- provenance gates (no independent evidence)
Reference graph
Works this paper leans on
- [1] “OpenClaw: Personal AI assistant runtime,” https://github.com/openclaw/openclaw, commit 3120401f53e789caf565e60ba29cb9751829b1b6, 2026-04-27.
- [2] Nous Research, “Hermes Agent,” https://github.com/nousresearch/hermes-agent, commit 98d75dea5a86aec599b1e081f8bbe9170bd3f964, 2026-04-27; release v0.11.0, 2026-04-23.
- [3] K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection,” in Proc. 16th ACM Workshop on Artificial Intelligence and Security (AISec), 2023, arXiv:2302.12173.
- [4] E. Debenedetti, J. Zhang, M. Balunović, L. Beurer-Kellner, M. Fischer, and F. Tramèr, “AgentDojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents,” in Advances in Neural Information Processing Systems, Datasets and Benchmarks Track, 2024, arXiv:2406.13352.
- [5] S. S. Srivastava, “MemoryGraft: Persistent compromise of LLM agents via poisoned experience retrieval,” arXiv:2512.16962, Dec. 2025.
- [6] E. Hubinger et al., “Sleeper agents: Training deceptive LLMs that persist through safety training,” arXiv:2401.05566, Jan. 2024.
- [7] N. Hardy, “The confused deputy (or why capabilities might have been invented),” ACM SIGOPS Operating Systems Review, vol. 22, no. 4, pp. 36–38, 1988.
- [8] M. S. Miller, “Robust composition: Towards a unified approach to access control and concurrency control,” Ph.D. dissertation, Johns Hopkins University, 2006.
- [9] OpenClaw maintainers, “OpenClaw threat model v1.0 (MITRE ATLAS),” docs/security/THREAT-MODEL-ATLAS.md, OpenClaw repository at commit 3120401f53e789caf565e60ba29cb9751829b1b6, last updated 2026-02-04.
- [10] Anonymous community contributor, “Feature: Runtime prompt injection defenses,” upstream issue (date, handle, and number anonymized for double-blind review), declined upstream, 2026.
- [11] F. Perez and I. Ribeiro, “Ignore previous prompt: Attack techniques for language models,” arXiv:2211.09527, 2022.
- [12] S. Toyer, O. Watkins, E. A. Mendes, J. Svegliato, L. Bailey, T. Wang, I. Ong, K. Elmaaroufi, P. Abbeel, T. Darrell, A. Ritter, and S. Russell, “Tensor Trust: Interpretable prompt injection attacks from an online game,” arXiv:2311.01011, 2023.
- [13] H. Zhang, J. Huang, K. Mei, Y. Yao, Z. Wang, C. Zhan, H. Wang, and Y. Zhang, “Agent Security Bench (ASB): Formalizing and benchmarking attacks and defenses in LLM-based agents,” arXiv:2410.02644, 2024.
- [14] Q. Zhan, Z. Liang, Z. Ying, and D. Kang, “InjecAgent: Benchmarking indirect prompt injections in tool-integrated large language model agents,” Findings of ACL, 2024.
- [15] Z. Chen, Z. Xiang, C. Xiao, D. Song, and B. Li, “AgentPoison: Red-teaming LLM agents via poisoning memory or knowledge bases,” Proc. NeurIPS, 2024.
- [16] W. Zou, R. Geng, B. Wang, and J. Jia, “PoisonedRAG: Knowledge corruption attacks to retrieval-augmented generation of large language models,” Proc. USENIX Security Symposium, 2024.
- [17] M. Nasr et al., “The attacker moves second: Stronger adaptive attacks bypass defenses against LLM jailbreaks and prompt injections,” arXiv:2510.09023, Oct. 2025.
- [18] N. Carlini, M. Nasr, C. A. Choquette-Choo, M. Jagielski, I. Gao, A. Awadalla, P. W. Koh, D. Ippolito, K. Lee, F. Tramèr, and L. Schmidt, “Are aligned neural networks adversarially aligned?” Proc. NeurIPS, 2024.
- [19] J. H. Saltzer and M. D. Schroeder, “The protection of information in computer systems,” Proceedings of the IEEE, vol. 63, no. 9, pp. 1278–1308, 1975.
- [20] M. S. Miller, K.-P. Yee, and J. Shapiro, “Capability myths demolished,” Tech. Rep. SRL2003-02, Johns Hopkins University Systems Research Laboratory, 2003.
- [21] H. M. Levy, Capability-Based Computer Systems. Digital Press, 1984.
- [22] Meta AI Security, “Agents rule of two: A practical approach to AI agent security,” tech. blog, Oct. 2025.
- [23] D. E. Denning, “A lattice model of secure information flow,” Communications of the ACM, vol. 19, no. 5, pp. 236–243, 1976.
- [24] L. Wall, T. Christiansen, and J. Orwant, Programming Perl, 3rd ed. O’Reilly, 2000.
- [25] G. E. Suh, J. W. Lee, D. Zhang, and S. Devadas, “Secure program execution via dynamic information flow tracking,” in Proc. 11th Int. Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2004, pp. 85–96.
- [26] W. Enck, P. Gilbert, S. Han, V. Tendulkar, B.-G. Chun, L. P. Cox, J. Jung, P. McDaniel, and A. N. Sheth, “TaintDroid: An information-flow tracking system for realtime privacy monitoring on smartphones,” in Proc. 9th USENIX Symp. Operating Systems Design and Implementation (OSDI), 2010, pp. 393–407.
- [27] E. J. Schwartz, T. Avgerinos, and D. Brumley, “All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask),” in Proc. IEEE Symp. Security and Privacy (S&P), 2010, pp. 317–331.
- [28] M. Costa et al., “Securing AI agents with information-flow control,” arXiv:2505.23643, 2025.
- [29] E. Debenedetti, I. Shumailov, T. Fan, J. Hayes, N. Carlini, D. Fabian, C. Kern, C. Shi, F. Tramèr, and A. Terzis, “Defeating prompt injections by design,” arXiv:2503.18813, 2025.
- [30] M. Surbatovich, J. Aljuraidan, L. Bauer, A. Das, and L. Jia, “Some recipes can do more than spoil your appetite: Analyzing the security and privacy risks of IFTTT recipes,” in Proc. 26th Int. Conf. World Wide Web (WWW), 2017, pp. 1501–1510.
- [31] Q. Wang, W. U. Hassan, A. Bates, and C. A. Gunter, “Fear and logging in the Internet of Things,” in Proc. NDSS, 2018.
- [32] OWASP Foundation, “Agentic Security Initiative,” https://genai.owasp.org/initiatives/agentic-security-initiative/, accessed Apr. 2026.
- [33] Z. Deng, Y. Guo, C. Han, W. Ma, J. Xiong, S. Wen, and Y. Xiang, “AI agents under threat: A survey of key security challenges and future pathways,” arXiv:2406.02630, 2025.
- [34] F. He, T. Zhu, D. Ye, B. Liu, W. Zhou, and P. S. Yu, “The emerged security and privacy of LLM agent: A survey with case studies,” arXiv:2407.19354, 2024.
- [35] Z. Zhong, Z. Huang, A. Wettig, and D. Chen, “Poisoning retrieval corpora by injecting adversarial passages,” in Proc. EMNLP, 2023.
- [36] Trusted Computing Group, “TPM 2.0 library specification, part 1: Architecture,” Specification Version 1.59, 2019.
- [37] Y. Liu, Y. Jia, R. Geng, J. Jia, and N. Z. Gong, “Formalizing and benchmarking prompt injection attacks and defenses,” in Proc. 33rd USENIX Security Symposium, 2024, arXiv:2310.12815.
- [38] C. Collberg and T. A. Proebsting, “Repeatability in computer systems research,” Communications of the ACM, vol. 59, no. 3, pp. 62–69, 2016.