Recognition: unknown
Sleeper Channels and Provenance Gates: Persistent Prompt Injection in Always-on Autonomous AI Agents
Pith reviewed 2026-05-14 18:19 UTC · model grok-4.3
The pith
Always-on AI agents let untrusted inputs persist across surfaces as sleeper channels and fire later without the attacker present.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Sleeper channels are defined along two independent axes, persistence substrate and firing-separation. The D2 gate defeats them by requiring every action instance to carry a canonical digest attested exactly once by the owner, eliminating reuse and delayed-execution paths even when the original input is no longer present.
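To make the digest half of this claim concrete, here is a minimal TypeScript sketch of a canonical action-instance digest, assuming (as the pith summarizes) that D2 hashes a normalized tool, arguments, and firing-context tuple. The `ActionInstance` shape and the `canonicalDigest` helper are illustrative assumptions, not the artifact's actual API.

```ts
// Hypothetical sketch only; field names are assumptions, not the paper's schema.
import { createHash } from "node:crypto";

interface ActionInstance {
  tool: string;                             // e.g. "shell.exec" or "cron.schedule"
  args: Record<string, unknown>;            // tool arguments as supplied
  firing: { surface: string; at: string };  // where and when the action fires
}

// Deterministic serialization: keys are sorted recursively so two renderings
// of the same instance cannot yield different digests.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => a.localeCompare(b))
      .map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

export function canonicalDigest(instance: ActionInstance): string {
  return createHash("sha256").update(canonicalize(instance)).digest("hex");
}
```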
What carries the argument
D2 provenance gate: a canonical action-instance digest combined with one-shot owner attestations, enforcing mediation at ten hooks and proven sound against seven named deployment invariants.
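The attestation half can be sketched just as briefly. Assuming "one-shot" means a grant is spent on first use, a consume-once store keyed by the canonical digest conveys the mechanism; `AttestationStore` is a hypothetical name, not the shipped gate.

```ts
// Minimal sketch of one-shot owner attestations keyed on the canonical
// digest; this is an assumed reading of the design, not the artifact's code.
export class AttestationStore {
  private granted = new Set<string>();

  // Called when the owner approves one specific action instance.
  attest(digest: string): void {
    this.granted.add(digest);
  }

  // Called by a mediation hook at fire time. Set.delete() both checks and
  // removes the grant, so a second firing of the same digest fails closed;
  // that single-use property is what blocks replay and grant reuse.
  consume(digest: string): boolean {
    return this.granted.delete(digest);
  }
}
```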
If this is right
- D2 blocks paraphrase laundering, multi-input grant reuse, and replay of scheduled or stored actions.
- Five of the ten mediation hooks (H1, H2, H3, H6, H9) can be realized as a runtime adapter around the cron path while preserving Node ≥ 20 compatibility; a sketch of such an adapter follows this list.
- A static audit of vendored source can verify the presence of the required provenance checks.
- The same digest-plus-attestation pattern applies to other persistent surfaces such as memory stores and self-authored skills.
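As a rough illustration of the adapter bullet above, a wrapper that computes the digest when a job is persisted and consumes the attestation when it fires might look like the sketch below. `JobSpec`, `scheduleJob`, and the exact hook placement are assumptions for illustration; the artifact's H1/H2/H3/H6/H9 hooks are only described abstractly here.

```ts
// Hedged sketch of a cron-path adapter; names and hook placement are assumed.
type JobSpec = { cron: string; command: string };

interface Gate {
  digestOf(spec: JobSpec): string;   // canonical action-instance digest
  consume(digest: string): boolean;  // one-shot attestation check
}

function withProvenanceGate(
  scheduleJob: (cron: string, run: () => Promise<void>) => void,
  execute: (spec: JobSpec) => Promise<void>,
  gate: Gate,
) {
  return (spec: JobSpec): void => {
    // The digest is computed from the job as persisted; an altered or
    // re-worded job yields a different digest and needs its own attestation.
    const digest = gate.digestOf(spec);
    scheduleJob(spec.cron, async () => {
      // Fire-time mediation on the cron path: an instance whose one-shot
      // attestation is missing or already spent does not execute.
      if (!gate.consume(digest)) {
        console.warn(`blocked unattested or replayed job ${digest.slice(0, 12)}`);
        return;
      }
      await execute(spec);
    });
  };
}
```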
Where Pith is reading between the lines
- Agent platforms that collapse all surfaces under one identity will need explicit provenance tracking at every persistence boundary to avoid delayed execution risks.
- If the seven invariants turn out to be unrealistic in production, lighter-weight alternatives such as D1 or D3 may still provide partial protection.
- The approach suggests that future agent runtimes should separate owner attestation from input surfaces by design rather than by ad-hoc patches.
Load-bearing premise
That the seven deployment invariants actually hold for real always-on agents, and that one-shot attestations can be added without creating new attack surfaces or usability problems.
What would settle it
A working end-to-end attack on an OpenClaw-style agent that reuses or replays a persisted action after D2 mediation is installed, or an implementation of one-shot attestations that itself introduces a new injection vector.
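The replay half of that falsification test reduces, under D2's stated design, to a two-line property: the first firing of an attested instance is allowed and a literal replay is refused. A self-contained toy check (placeholder digest, minimal one-shot store as sketched earlier):

```ts
// Toy replay property; the digest string is a placeholder, not a real value.
import assert from "node:assert";

const granted = new Set<string>();
const digest = "sha256-of-canonical-action-instance"; // stand-in digest

granted.add(digest);                         // owner attests exactly once
assert.equal(granted.delete(digest), true);  // first firing consumes the grant
assert.equal(granted.delete(digest), false); // replaying the persisted action fails
```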
Read the original abstract
Always-on AI agents (OpenClaw, Hermes Agent) run as a single persistent process under the owner's identity, folding messaging, memory, self-authored skills, scheduling, and shell into one authority boundary. This configuration opens what we call *sleeper channels*: an untrusted input to one surface persists as a memory, skill, scheduled job, or filesystem patch, then fires later through a different surface with no attacker present. Two independent axes define the class: persistence substrate and firing-separation. We walk a confused-deputy cron attack end-to-end through OpenClaw at a pinned commit. The defense is tiered (D1, D2, D3), and D2 carries a soundness theorem against seven named deployment invariants. D2 keys on a canonical action-instance digest with one-shot owner attestations, defeating paraphrase laundering, multi-input grant reuse, and replay. A companion artifact ships the gate, a static audit over the vendored source, and a runtime adapter realising five of the ten mediation hooks (H1, H2, H3, H6, H9) around the cron path (42 tests, Node ≥ 20, at github.com/maloyan/sleeper-channels). Empirical evaluation is preregistered as follow-on.
Editorial analysis
A structured set of objections, weighed in public.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Always-on AI agents run as a single persistent process folding messaging, memory, self-authored skills, scheduling, and shell into one authority boundary.
invented entities (2)
- sleeper channels (no independent evidence)
- provenance gates (no independent evidence)
Reference graph
Works this paper leans on
- [1] “OpenClaw: Personal AI assistant runtime,” https://github.com/openclaw/openclaw, commit 3120401f53e789caf565e60ba29cb9751829b1b6, 2026-04-27.
- [2] Nous Research, “Hermes Agent,” https://github.com/nousresearch/hermes-agent, commit 98d75dea5a86aec599b1e081f8bbe9170bd3f964, 2026-04-27; release v0.11.0, 2026-04-23.
- [3] K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection,” in Proc. 16th ACM Workshop on Artificial Intelligence and Security (AISec), 2023, arXiv:2302.12173.
- [4] E. Debenedetti, J. Zhang, M. Balunović, L. Beurer-Kellner, M. Fischer, and F. Tramèr, “AgentDojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents,” in Advances in Neural Information Processing Systems, Datasets and Benchmarks Track, 2024, arXiv:2406.13352.
- [5] S. S. Srivastava, “MemoryGraft: Persistent compromise of LLM agents via poisoned experience retrieval,” arXiv:2512.16962, Dec. 2025.
- [6] E. Hubinger et al., “Sleeper agents: Training deceptive LLMs that persist through safety training,” arXiv:2401.05566, Jan. 2024.
- [7] N. Hardy, “The confused deputy (or why capabilities might have been invented),” ACM SIGOPS Operating Systems Review, vol. 22, no. 4, pp. 36–38, 1988.
- [8] M. S. Miller, “Robust composition: Towards a unified approach to access control and concurrency control,” Ph.D. dissertation, Johns Hopkins University, 2006.
- [9] OpenClaw maintainers, “OpenClaw threat model v1.0 (MITRE ATLAS),” docs/security/THREAT-MODEL-ATLAS.md, OpenClaw repository at commit 3120401f53e789caf565e60ba29cb9751829b1b6, last updated 2026-02-04.
- [10] Anonymous community contributor, “Feature: Runtime prompt injection defenses,” upstream issue (date, handle, and number anonymized for double-blind review), declined upstream, 2026.
- [11] F. Perez and I. Ribeiro, “Ignore previous prompt: Attack techniques for language models,” arXiv:2211.09527, 2022.
- [12] S. Toyer, O. Watkins, E. A. Mendes, J. Svegliato, L. Bailey, T. Wang, I. Ong, K. Elmaaroufi, P. Abbeel, T. Darrell, A. Ritter, and S. Russell, “Tensor Trust: Interpretable prompt injection attacks from an online game,” arXiv:2311.01011, 2023.
- [13] H. Zhang, J. Huang, K. Mei, Y. Yao, Z. Wang, C. Zhan, H. Wang, and Y. Zhang, “Agent Security Bench (ASB): Formalizing and benchmarking attacks and defenses in LLM-based agents,” arXiv:2410.02644, 2024.
- [14] Q. Zhan, Z. Liang, Z. Ying, and D. Kang, “InjecAgent: Benchmarking indirect prompt injections in tool-integrated large language model agents,” Findings of ACL, 2024.
- [15] Z. Chen, Z. Xiang, C. Xiao, D. Song, and B. Li, “AgentPoison: Red-teaming LLM agents via poisoning memory or knowledge bases,” Proc. NeurIPS, 2024.
- [16] W. Zou, R. Geng, B. Wang, and J. Jia, “PoisonedRAG: Knowledge corruption attacks to retrieval-augmented generation of large language models,” Proc. USENIX Security Symposium, 2024.
- [17] M. Nasr et al., “The attacker moves second: Stronger adaptive attacks bypass defenses against LLM jailbreaks and prompt injections,” arXiv:2510.09023, Oct. 2025.
- [18] N. Carlini, M. Nasr, C. A. Choquette-Choo, M. Jagielski, I. Gao, A. Awadalla, P. W. Koh, D. Ippolito, K. Lee, F. Tramèr, and L. Schmidt, “Are aligned neural networks adversarially aligned?” Proc. NeurIPS, 2024.
- [19] J. H. Saltzer and M. D. Schroeder, “The protection of information in computer systems,” Proceedings of the IEEE, vol. 63, no. 9, pp. 1278–1308, 1975.
- [20] M. S. Miller, K.-P. Yee, and J. Shapiro, “Capability myths demolished,” Tech. Rep. SRL2003-02, Johns Hopkins University Systems Research Laboratory, 2003.
- [21] H. M. Levy, Capability-Based Computer Systems. Digital Press, 1984.
- [22] Meta AI Security, “Agents rule of two: A practical approach to AI agent security,” tech. blog, Oct. 2025.
- [23] D. E. Denning, “A lattice model of secure information flow,” Communications of the ACM, vol. 19, no. 5, pp. 236–243, 1976.
- [24] L. Wall, T. Christiansen, and J. Orwant, Programming Perl, 3rd ed. O’Reilly, 2000.
- [25] G. E. Suh, J. W. Lee, D. Zhang, and S. Devadas, “Secure program execution via dynamic information flow tracking,” in Proc. 11th Int. Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2004, pp. 85–96.
- [26] W. Enck, P. Gilbert, S. Han, V. Tendulkar, B.-G. Chun, L. P. Cox, J. Jung, P. McDaniel, and A. N. Sheth, “TaintDroid: An information-flow tracking system for realtime privacy monitoring on smartphones,” in Proc. 9th USENIX Symp. Operating Systems Design and Implementation (OSDI), 2010, pp. 393–407.
- [27] E. J. Schwartz, T. Avgerinos, and D. Brumley, “All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask),” in Proc. IEEE Symp. Security and Privacy (S&P), 2010, pp. 317–331.
- [28] M. Costa et al., “Securing AI agents with information-flow control,” arXiv:2505.23643, 2025.
- [29] E. Debenedetti, I. Shumailov, T. Fan, J. Hayes, N. Carlini, D. Fabian, C. Kern, C. Shi, F. Tramèr, and A. Terzis, “Defeating prompt injections by design,” arXiv:2503.18813, 2025.
- [30] M. Surbatovich, J. Aljuraidan, L. Bauer, A. Das, and L. Jia, “Some recipes can do more than spoil your appetite: Analyzing the security and privacy risks of IFTTT recipes,” in Proc. 26th Int. Conf. World Wide Web (WWW), 2017, pp. 1501–1510.
- [31] Q. Wang, W. U. Hassan, A. Bates, and C. A. Gunter, “Fear and logging in the Internet of Things,” in Proc. NDSS, 2018.
- [32] OWASP Foundation, “Agentic Security Initiative,” https://genai.owasp.org/initiatives/agentic-security-initiative/, accessed Apr. 2026.
- [33] Z. Deng, Y. Guo, C. Han, W. Ma, J. Xiong, S. Wen, and Y. Xiang, “AI agents under threat: A survey of key security challenges and future pathways,” arXiv:2406.02630, 2025.
- [34] F. He, T. Zhu, D. Ye, B. Liu, W. Zhou, and P. S. Yu, “The emerged security and privacy of LLM agent: A survey with case studies,” arXiv:2407.19354, 2024.
- [35] Z. Zhong, Z. Huang, A. Wettig, and D. Chen, “Poisoning retrieval corpora by injecting adversarial passages,” in Proc. EMNLP, 2023.
- [36] Trusted Computing Group, “TPM 2.0 library specification, part 1: Architecture,” Specification Version 1.59, 2019.
- [37] Y. Liu, Y. Jia, R. Geng, J. Jia, and N. Z. Gong, “Formalizing and benchmarking prompt injection attacks and defenses,” in Proc. 33rd USENIX Security Symposium, 2024, arXiv:2310.12815.
- [38] C. Collberg and T. A. Proebsting, “Repeatability in computer systems research,” Communications of the ACM, vol. 59, no. 3, pp. 62–69, 2016.