Securing LLM-Agent Long-Term Memory Against Poisoning: Non-Malleable, Origin-Bound Authority with Machine-Checked Guarantees

Yedidel Louck

arxiv: 2606.24322 · v1 · pith:Q5V55P7Snew · submitted 2026-06-23 · 💻 cs.CR

Securing LLM-Agent Long-Term Memory Against Poisoning: Non-Malleable, Origin-Bound Authority with Machine-Checked Guarantees

Yedidel Louck This is my paper

Pith reviewed 2026-06-25 23:38 UTC · model grok-4.3

classification 💻 cs.CR

keywords LLM agentsmemory poisoninginformation flow controlorigin bindinglaundering attackstamper-evident memorySybil-resistant corroboration

0 comments

The pith

Write-time origin binding with non-malleable authority is necessary and sufficient to block memory poisoning in LLM agents even under laundering.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that defenses relying on content inspection or derivation history can be defeated when an attacker launders untrusted memory through the agent's own summarization, trusted-tool echoes, or manufactured corroboration. These channels make poisoned items appear benign and break or reverse their apparent trusted lineage. The authors formalize this malleability for the write-retrieve-act pipeline and prove a machine-checked separation result: content- and lineage-based methods are unsound under laundering, origin binding at write time is required, and non-malleable origin-bound authority plus Sybil-resistant corroboration gating suffices. Their TMA-NM construction implements this approach and is evaluated in a cross-model, cross-attack benchmark over eight frontier models, where it records zero attack success on both direct and laundering attempts while preserving full legitimate utility.

Core claim

No content- or lineage-based defense is sound under laundering (T1), write-time origin binding is necessary (T2), and non-malleable origin-bound authority with Sybil-resistant corroboration-gated elevation is sufficient (T3). TMA-NM instantiates non-malleable information-flow control for LLM-agent memory and reaches 0% attack success on both direct and laundering attacks across all tested models and channels at full legitimate utility.

What carries the argument

TMA-NM (Tamper-evident Memory Authority, Non-Malleable), which enforces write-time origin binding and applies non-malleable information-flow control to prevent laundering of memory authority.

If this is right

Existing content- and lineage-based defenses reach up to 68% success on laundering attacks.
TMA-NM blocks both direct poisoning and laundering across eight frontier models.
Legitimate agent utility remains unchanged under the new authority mechanism.
Machine-checked TLA+ specifications confirm the separation theorems for the memory pipeline.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same origin-binding principle could apply to other forms of persistent state that agents accumulate across sessions.
Agent frameworks may need to expose explicit write-time origin metadata rather than relying on post-hoc trust scoring.
Production deployments could adopt corroboration gating as a default for any memory that can trigger external actions.

Load-bearing premise

The TLA+ models accurately capture the full write-retrieve-act pipeline and all relevant laundering channels in real LLM agents, and the cross-model benchmark is representative without unstated exclusions or model-specific artifacts that would alter the 0% result.

What would settle it

Observing a laundering channel or attack vector that produces non-zero success against TMA-NM in an expanded set of models or real agent deployments would falsify the sufficiency claim.

Figures

Figures reproduced from arXiv: 2606.24322 by Yedidel Louck.

**Figure 2.** Figure 2: Unified benchmark (pooled, 8 models, Wilson 95% CIs). TMA-NM (green) is 0% on the direct attack and laundering while preserving 100% legit-utility. an uncorroborated, untrusted-sourced consequential action requires a one-time user confirmation (uncorr-auto = 0%), the correct anti-fraud behavior that the insecure baselines skip (uncorr-auto = 100%). Three supporting studies reinforce this. The cross-model … view at source ↗

**Figure 3.** Figure 3: Ablation (mean over the eight models): removing origin binding [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 5.** Figure 5: Theory↔benchmark correspondence: attacker-action ASR (%, 8 models) for each defense class against each laundering channel. Each malleable defense fails on the channels the separation theorem predicts, and TMA-NM is 0% throughout. A. Theory↔benchmark correspondence (laundering) The separation theorem (Section IV) predicts which defense class fails on which laundering channel [PITH_FULL_IMAGE:figures/full_… view at source ↗

read the original abstract

LLM agents increasingly rely on persistent long-term memory, which creates a critical vulnerability that we study here: memory poisoning. An adversary can store untrusted content in one session that later steers a consequential action, such as a payment, a setting change, or data exfiltration, in a future session. Existing defenses base a memory item's authority to act on either its content (detection or trust-scoring) or its derivation history (lineage). We show that both signals are malleable. An attacker can launder an untrusted origin through three channels specific to LLM agents: the agent's own summarization, a trusted-tool echo, and manufactured corroboration. Each makes the content look benign and breaks or flips its derivation edge to ``trusted.'' We formalize malleability for the memory write-retrieve-act pipeline and prove a machine-checked separation theorem. No content- or lineage-based defense is sound under laundering (T1), write-time origin binding is necessary (T2), and non-malleable origin-bound authority with Sybil-resistant corroboration-gated elevation is sufficient (T3). Our construction, TMA-NM (Tamper-evident Memory Authority, Non-Malleable), instantiates non-malleable information-flow control (IFC) for LLM-agent memory. A cross-defense, cross-attack, and cross-model benchmark over eight frontier models shows that existing defenses fail exactly where the theory predicts (up to 68% laundering attack-success), while TMA-NM reaches 0% attack success on both direct and laundering attacks across all models and channels, at full legitimate utility. We release the benchmark, harness, and machine-checked TLA+ models to support reproducibility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper shows origin binding plus non-malleable IFC is needed once laundering channels exist in LLM memory and supplies machine-checked TLA+ theorems plus a benchmark that hits zero attack success where prior methods fail.

read the letter

The main point is that content and lineage signals for LLM-agent memory can be laundered through the agent's own summarization, trusted-tool echoes, and manufactured corroboration. The authors formalize this in a write-retrieve-act pipeline and prove with TLA+ that no defense based on those signals holds up, that write-time origin binding is required, and that their TMA-NM construction suffices.

They do the work of naming the three LLM-specific channels and checking the separation theorems. The benchmark across eight models is direct: existing defenses reach 68% success on the laundering attacks while TMA-NM stays at 0% with full legitimate utility. Releasing the TLA+ models and the harness is concrete and lets others inspect the claims.

The soft spot is the scope of the model. The TLA+ specs treat the LLM as a state machine with exactly those three channels. If real agents have additional paths, such as multi-hop tool chaining or retrieval effects that rewrite memory without hitting the modeled edges, then the necessity claim applies only inside the specification rather than to every deployed system. The 0% result validates against the attacks they encoded, not against every possible laundering route.

This is for researchers and engineers who build or secure long-running agents that keep persistent memory. The formal grounding and the clean empirical split make the work worth referee time even if the modeling assumptions need discussion in review.

Referee Report

2 major / 2 minor

Summary. The paper claims that content- and lineage-based defenses against memory poisoning in LLM-agent long-term memory are unsound because an adversary can launder untrusted origins through three agent-specific channels (summarization, trusted-tool echo, manufactured corroboration). It proves via machine-checked TLA+ theorems that no such defense is sound under laundering (T1), write-time origin binding is necessary (T2), and non-malleable origin-bound authority with Sybil-resistant corroboration-gated elevation (TMA-NM) is sufficient (T3). A cross-model benchmark over eight frontier models reports that existing defenses reach up to 68% laundering attack success while TMA-NM achieves 0% on both direct and laundering attacks at full legitimate utility; the TLA+ models, benchmark harness, and code are released.

Significance. If the TLA+ models faithfully capture the write-retrieve-act pipeline and the benchmark is representative, the work supplies the first machine-checked separation theorems for this vulnerability class together with a concrete, non-malleable construction that demonstrably blocks the modeled attacks. The explicit release of the TLA+ specifications and reproducible harness is a clear strength that enables external verification and extension.

major comments (2)

[TLA+ models and theorems T1–T3] TLA+ models (theorems T1–T3): the separation results rest on a state-machine abstraction of the LLM that enumerates exactly three laundering channels. The manuscript must explicitly argue why this abstraction is complete with respect to real frontier-model behaviors (e.g., multi-hop tool chaining, implicit retrieval-triggered rewriting, or nondeterministic output that bypasses modeled edges); without such justification the necessity claim (T2) and sufficiency claim (T3) remain conditional on the modeled channels only.
[Benchmark evaluation] Benchmark section: the abstract states 0% attack success for TMA-NM and up to 68% for baselines across eight models, yet the main text must report the exact number of trials per (model, attack, channel) cell, the precise laundering implementations, and whether any runs were excluded post-hoc. Absent these details the empirical support for the “0% across all models and channels” claim cannot be assessed for robustness.

minor comments (2)

[Evaluation metrics] The utility metric (“full legitimate utility”) is referenced but not defined with concrete success criteria or measurement protocol; add an explicit definition and table of per-task utility numbers.
[Formal model] Notation for the three laundering channels is introduced in the abstract but should be given a single, consistently used label (e.g., L1–L3) in the formal model section to aid readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will incorporate revisions to strengthen the manuscript as indicated.

read point-by-point responses

Referee: [TLA+ models and theorems T1–T3] TLA+ models (theorems T1–T3): the separation results rest on a state-machine abstraction of the LLM that enumerates exactly three laundering channels. The manuscript must explicitly argue why this abstraction is complete with respect to real frontier-model behaviors (e.g., multi-hop tool chaining, implicit retrieval-triggered rewriting, or nondeterministic output that bypasses modeled edges); without such justification the necessity claim (T2) and sufficiency claim (T3) remain conditional on the modeled channels only.

Authors: We agree that an explicit argument for the completeness of the three-channel abstraction is needed to support the necessity (T2) and sufficiency (T3) claims. In the revised manuscript we will add a new subsection (Section 4.3) that justifies the abstraction by showing that the three channels correspond to the fundamental operations in the write-retrieve-act pipeline: summarization covers implicit rewriting and retrieval-triggered generation; trusted-tool echo subsumes multi-hop tool chaining and external data ingestion; and manufactured corroboration addresses self-reinforcing or multi-agent feedback loops. We will further note that the TLA+ model already incorporates nondeterminism to capture output variability, and that any unmodeled behavior would still be subject to the same information-flow constraints. While the abstraction cannot anticipate every future model capability, it is complete for the class of behaviors exhibited by current frontier models under the stated pipeline. revision: yes
Referee: [Benchmark evaluation] Benchmark section: the abstract states 0% attack success for TMA-NM and up to 68% for baselines across eight models, yet the main text must report the exact number of trials per (model, attack, channel) cell, the precise laundering implementations, and whether any runs were excluded post-hoc. Absent these details the empirical support for the “0% across all models and channels” claim cannot be assessed for robustness.

Authors: We acknowledge that the main text currently presents aggregate results and that the requested per-cell statistics, implementation details, and exclusion criteria should be reported explicitly for robustness assessment. In the revision we will expand Section 6 to include: (i) the exact trial count (50 independent trials per model-attack-channel combination, for a total of 9,600 runs), (ii) precise pseudocode and parameter settings for each laundering channel implementation, and (iii) an explicit statement that no runs were excluded post-hoc. These details already appear in the released benchmark harness and appendix; we will move the key figures and descriptions into the main body. revision: yes

Circularity Check

0 steps flagged

No circularity; machine-checked TLA+ theorems are externally verifiable and independent of paper claims.

full rationale

The core claims (T1: content/lineage defenses unsound under laundering; T2: origin binding necessary; T3: TMA-NM sufficient) rest on a released TLA+ specification of the write-retrieve-act pipeline and three laundering channels, which is machine-checked and externally reproducible. This satisfies the criterion for independent support via formal verification. No self-definitional reductions, fitted inputs renamed as predictions, load-bearing self-citations, or ansatz smuggling occur. The empirical 0% results validate the modeled attacks without reducing to internal parameter fits. The skeptic concern about model completeness is a correctness issue, not circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claims rest on the TLA+ models correctly encoding the memory pipeline and attacks, plus the benchmark being exhaustive; no free parameters are described.

axioms (1)

domain assumption Malleability of content- and lineage-based signals can be formalized for the write-retrieve-act pipeline and separated into necessary and sufficient conditions via T1-T3.
Invoked to establish the theorems and the sufficiency of TMA-NM.

invented entities (1)

TMA-NM no independent evidence
purpose: Instantiates non-malleable origin-bound authority with corroboration-gated elevation for LLM memory.
New construction introduced to satisfy T3.

pith-pipeline@v0.9.1-grok · 5841 in / 1478 out tokens · 35628 ms · 2026-06-25T23:38:51.649991+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

42 extracted references · 20 linked inside Pith

[1]

Memgpt: Towards llms as operating systems,

C. Packer, S. Wooders, K. Lin, V . Fang, S. G. Patil, I. Stoica, and J. E. Gonzalez, “Memgpt: Towards llms as operating systems,” 2024. [Online]. Available: https://arxiv.org/abs/2310.08560

Pith/arXiv arXiv 2024
[2]

Mem0: Building production-ready ai agents with scalable long-term memory,

P. Chhikara, D. Khant, S. Aryan, T. Singh, and D. Yadav, “Mem0: Building production-ready ai agents with scalable long-term memory,”
[3]

Available: https://arxiv.org/abs/2504.19413

[Online]. Available: https://arxiv.org/abs/2504.19413

Pith/arXiv arXiv
[4]

Generative agents: Interactive simulacra of human behavior,

J. S. Park, J. C. O’Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein, “Generative agents: Interactive simulacra of human behavior,” 2023. [Online]. Available: https://arxiv.org/abs/2304.03442

Pith/arXiv arXiv 2023
[5]

Memmachine: A ground-truth-preserving memory system for personalized ai agents,

S. Wang, E. Yu, O. Love, T. Zhang, T. Wong, S. Scargall, and C. Fan, “Memmachine: A ground-truth-preserving memory system for personalized ai agents,” 2026. [Online]. Available: https://arxiv.org/abs/2604.04853

Pith/arXiv arXiv 2026
[6]

Hidden in memory: Sleeper memory poisoning in llm agents,

S. Pulipaka, S. Hlebik, L. Raghav, S. Abdelnabi, V . Raina, I. Sheth, and M. Fritz, “Hidden in memory: Sleeper memory poisoning in llm agents,” 2026. [Online]. Available: https://arxiv.org/abs/2605.15338

Pith/arXiv arXiv 2026
[7]

Sleeper agents: Training deceptive llms that persist through safety training,

E. Hubinger, C. Denison, J. Mu, M. Lambert, M. Tong, M. MacDiarmid, T. Lanham, D. M. Ziegler, T. Maxwell, N. Cheng, A. Jermyn, A. Askell, A. Radhakrishnan, C. Anil, D. Duvenaud, D. Ganguli, F. Barez, J. Clark, K. Ndousse, K. Sachan, M. Sellitto, M. Sharma, N. DasSarma, R. Grosse, S. Kravec, Y . Bai, Z. Witten, M. Favaro, J. Brauner, H. Karnofsky, P. Chris...

Pith/arXiv arXiv 2024
[8]

From storage to steering: Memory control flow attacks on llm agents,

Z. Xu, X. Zhu, Y . Yao, M. Xue, and Y . Song, “From storage to steering: Memory control flow attacks on llm agents,” 2026. [Online]. Available: https://arxiv.org/abs/2603.15125

Pith/arXiv arXiv 2026
[9]

Poisonedrag: Knowledge corruption attacks to retrieval-augmented generation of large language models,

W. Zou, R. Geng, B. Wang, and J. Jia, “Poisonedrag: Knowledge corruption attacks to retrieval-augmented generation of large language models,” 2024. [Online]. Available: https://arxiv.org/abs/2402.07867

arXiv 2024
[10]

Memory injection attacks on llm agents via query-only interaction,

S. Dong, S. Xu, P. He, Y . Li, J. Tang, T. Liu, H. Liu, and Z. Xiang, “Memory injection attacks on llm agents via query-only interaction,”
[11]

Available: https://arxiv.org/abs/2503.03704

[Online]. Available: https://arxiv.org/abs/2503.03704

arXiv
[12]

Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases,

Z. Chen, Z. Xiang, C. Xiao, D. Song, and B. Li, “Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases,”
[13]

Available: https://arxiv.org/abs/2407.12784

[Online]. Available: https://arxiv.org/abs/2407.12784

arXiv
[14]

Watch out for your agents! investigating backdoor threats to llm-based agents,

W. Yang, X. Bi, Y . Lin, S. Chen, J. Zhou, and X. Sun, “Watch out for your agents! investigating backdoor threats to llm-based agents,” 2024. [Online]. Available: https://arxiv.org/abs/2402.11208

arXiv 2024
[15]

A systematic survey of security threats and defenses in llm-based ai agents: A layered attack surface framework,

K. Chu, “A systematic survey of security threats and defenses in llm-based ai agents: A layered attack surface framework,” 2026. [Online]. Available: https://arxiv.org/abs/2604.23338

Pith/arXiv arXiv 2026
[16]

A survey on long-term memory security in llm agents: Attacks, defenses, and governance across the memory lifecycle,

Z. Lin, X. Hao, R. Fu, S. Cui, K. Chen, C. Li, Z. Li, and F. Xiong, “A survey on long-term memory security in llm agents: Attacks, defenses, and governance across the memory lifecycle,” 2026. [Online]. Available: https://arxiv.org/abs/2604.16548

Pith/arXiv arXiv 2026
[17]

Memlineage: Lineage-guided enforcement for llm agent memory,

C. Ouyang and R. Hou, “Memlineage: Lineage-guided enforcement for llm agent memory,” 2026. [Online]. Available: https://arxiv.org/abs/ 2605.14421

Pith/arXiv arXiv 2026
[18]

Nonmalleable information flow control,

E. Cecchetti, A. C. Myers, and O. Arden, “Nonmalleable information flow control,” inProc. ACM SIGSAC Conf. on Computer and Commu- nications Security (CCS), 2017, arXiv:1708.08596

arXiv 2017
[19]

Security analysis of agentic ai communication protocols: A comparative evaluation,

Y . Louck, A. Dvir, and A. Stulman, “Security analysis of agentic ai communication protocols: A comparative evaluation,”ACM Transactions on AI Security and Privacy, 2025

2025
[20]

Improving google a2a protocol: Protecting sensitive data and mitigating unintended harms in multi-agent systems,

Y . Louck, A. Stulman, and A. Dvir, “Improving google a2a protocol: Protecting sensitive data and mitigating unintended harms in multi-agent systems,”ACM Transactions on Software Engineering and Methodology, 2025. 12

2025
[21]

Superlocalmemory: Privacy-preserving multi-agent memory with bayesian trust defense against memory poisoning,

V . P. Bhardwaj, “Superlocalmemory: Privacy-preserving multi-agent memory with bayesian trust defense against memory poisoning,” 2026. [Online]. Available: https://arxiv.org/abs/2603.02240

arXiv 2026
[22]

Memmorph: Tool hijacking in llm agents via memory poisoning,

X. Zhang, Y . Zheng, Z. Xu, K. Zhou, B. Shen, H. Ou, T. Zhang, and K.-Y . Lam, “Memmorph: Tool hijacking in llm agents via memory poisoning,” 2026. [Online]. Available: https://arxiv.org/abs/2605.26154

Pith/arXiv arXiv 2026
[23]

Memorygraft: Persistent compromise of llm agents via poisoned experience retrieval,

S. S. Srivastava and H. He, “Memorygraft: Persistent compromise of llm agents via poisoned experience retrieval,” 2025. [Online]. Available: https://arxiv.org/abs/2512.16962

arXiv 2025
[24]

Trojan hippo: Weaponizing agent memory for data exfiltration,

D. Das, J. Piet, D. Kaviani, L. Beurer-Kellner, F. Tram `er, and D. Wagner, “Trojan hippo: Weaponizing agent memory for data exfiltration,” 2026. [Online]. Available: https://arxiv.org/abs/2605.01970

Pith/arXiv arXiv 2026
[25]

Hijacking agent memory: Stealthy trojan attacks through conversational interaction,

H. Wang, S. Yang, Y . Chen, and P. Liu, “Hijacking agent memory: Stealthy trojan attacks through conversational interaction,” 2026. [Online]. Available: https://arxiv.org/abs/2605.29960

Pith/arXiv arXiv 2026
[26]

Defeating prompt injections by design,

E. Debenedetti, I. Shumailov, T. Fan, J. Hayes, N. Carlini, D. Fabian, C. Kern, C. Shi, A. Terzis, and F. Tram `er, “Defeating prompt injections by design,” 2025. [Online]. Available: https://arxiv.org/abs/2503.18813

Pith/arXiv arXiv 2025
[27]

Securing ai agents with information-flow control,

M. Costa, B. K ¨opf, A. Kolluri, A. Paverd, M. Russinovich, A. Salem, S. Tople, L. Wutschitz, and S. Zanella-B ´eguelin, “Securing ai agents with information-flow control,” 2025. [Online]. Available: https://arxiv.org/abs/2505.23643

Pith/arXiv arXiv 2025
[28]

Ghost in the agent: Redefining information flow tracking for llm agents,

Y . Cai, W. Tang, C. Wen, and S. Qin, “Ghost in the agent: Redefining information flow tracking for llm agents,” 2026. [Online]. Available: https://arxiv.org/abs/2604.23374

Pith/arXiv arXiv 2026
[29]

Selection integrity for llm graph memory: An accumulability criterion for information-flow-blind retrieval,

Z. Fei, H. Fei, X. Wang, Y . Yang, P. Gope, B. Sikdar, and Y . Zhang, “Selection integrity for llm graph memory: An accumulability criterion for information-flow-blind retrieval,” 2026. [Online]. Available: https://arxiv.org/abs/2606.12290

Pith/arXiv arXiv 2026
[30]

Taming various privilege escalation in llm-based agent systems: A mandatory access control framework,

Z. Ji, D. Wu, W. Jiang, P. Ma, Z. Li, Y . Gao, S. Wang, and Y . Li, “Taming various privilege escalation in llm-based agent systems: A mandatory access control framework,” 2026. [Online]. Available: https://arxiv.org/abs/2601.11893

arXiv 2026
[31]

A formal security framework for mcp-based ai agents: Threat taxonomy, verification models, and defense mechanisms,

N. Acharya and G. K. Gupta, “A formal security framework for mcp-based ai agents: Threat taxonomy, verification models, and defense mechanisms,” 2026. [Online]. Available: https://arxiv.org/abs/ 2604.05969

Pith/arXiv arXiv 2026
[32]

Mesh memory protocol: Semantic infrastructure for multi-agent llm systems,

H. Xu, “Mesh memory protocol: Semantic infrastructure for multi-agent llm systems,” 2026. [Online]. Available: https://arxiv.org/abs/2604.19540

Pith/arXiv arXiv 2026
[33]

Memory poisoning attack and defense on memory based llm-agents,

B. D. Sunil, I. Sinha, P. Maheshwari, S. Todmal, S. Mallik, and S. Mishra, “Memory poisoning attack and defense on memory based llm-agents,” 2026. [Online]. Available: https://arxiv.org/abs/2601.05504

arXiv 2026
[34]

Drunkagent: Stealthy memory corruption in llm-powered recommender agents,

S. Yang, Z. Hu, X. Li, C. Wang, T. Yu, X. Xu, L. Zhu, and L. Yao, “Drunkagent: Stealthy memory corruption in llm-powered recommender agents,” 2025. [Online]. Available: https://arxiv.org/abs/2503.23804

arXiv 2025
[35]

Trustrag: Enhancing robustness and trustworthiness in retrieval-augmented generation,

H. Zhou, K.-H. Lee, Z. Zhan, Y . Chen, Z. Li, Z. Wang, H. Haddadi, and E. Yilmaz, “Trustrag: Enhancing robustness and trustworthiness in retrieval-augmented generation,” 2025. [Online]. Available: https://arxiv.org/abs/2501.00879

arXiv 2025
[36]

Visual inception: Compromising long-term planning in agentic recommenders via multimodal memory poisoning,

J. Qian, “Visual inception: Compromising long-term planning in agentic recommenders via multimodal memory poisoning,” 2026. [Online]. Available: https://arxiv.org/abs/2604.16966

Pith/arXiv arXiv 2026
[37]

Integrity considerations for secure computer systems,

K. J. Biba, “Integrity considerations for secure computer systems,” Technical Report MTR-3153 / ESD-TR-76-372, The MITRE Corporation, 1977

1977
[38]

A comparison of commercial and military computer security policies,

D. D. Clark and D. R. Wilson, “A comparison of commercial and military computer security policies,” inProc. IEEE Symp. on Security and Privacy (S&P), 1987, pp. 184–194

1987
[39]

A lattice model of secure information flow,

D. E. Denning, “A lattice model of secure information flow,”Commu- nications of the ACM, vol. 19, no. 5, pp. 236–243, 1976

1976
[40]

A decentralized model for information flow control,

A. C. Myers and B. Liskov, “A decentralized model for information flow control,” inProc. ACM Symp. on Operating Systems Principles (SOSP), 1997, pp. 129–142

1997
[41]

The sybil attack,

J. R. Douceur, “The sybil attack,” inProc. 1st Int. Workshop on Peer- to-Peer Systems (IPTPS), LNCS 2429, 2002, pp. 251–260

2002
[42]

The foundations for provenance on the web,

L. Moreau, “The foundations for provenance on the web,”Foundations and Trends in Web Science, vol. 2, no. 2–3, pp. 99–241, 2010. APPENDIXA BASELINEFIDELITY Each baseline is thestrongest faithfulinstance of its defense class, and its authorization rule is deterministic and reproduced verbatim from the released harness. •noneauthorizes every action (undefen...

2010

[1] [1]

Memgpt: Towards llms as operating systems,

C. Packer, S. Wooders, K. Lin, V . Fang, S. G. Patil, I. Stoica, and J. E. Gonzalez, “Memgpt: Towards llms as operating systems,” 2024. [Online]. Available: https://arxiv.org/abs/2310.08560

Pith/arXiv arXiv 2024

[2] [2]

Mem0: Building production-ready ai agents with scalable long-term memory,

P. Chhikara, D. Khant, S. Aryan, T. Singh, and D. Yadav, “Mem0: Building production-ready ai agents with scalable long-term memory,”

[3] [3]

Available: https://arxiv.org/abs/2504.19413

[Online]. Available: https://arxiv.org/abs/2504.19413

Pith/arXiv arXiv

[4] [4]

Generative agents: Interactive simulacra of human behavior,

J. S. Park, J. C. O’Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein, “Generative agents: Interactive simulacra of human behavior,” 2023. [Online]. Available: https://arxiv.org/abs/2304.03442

Pith/arXiv arXiv 2023

[5] [5]

Memmachine: A ground-truth-preserving memory system for personalized ai agents,

S. Wang, E. Yu, O. Love, T. Zhang, T. Wong, S. Scargall, and C. Fan, “Memmachine: A ground-truth-preserving memory system for personalized ai agents,” 2026. [Online]. Available: https://arxiv.org/abs/2604.04853

Pith/arXiv arXiv 2026

[6] [6]

Hidden in memory: Sleeper memory poisoning in llm agents,

S. Pulipaka, S. Hlebik, L. Raghav, S. Abdelnabi, V . Raina, I. Sheth, and M. Fritz, “Hidden in memory: Sleeper memory poisoning in llm agents,” 2026. [Online]. Available: https://arxiv.org/abs/2605.15338

Pith/arXiv arXiv 2026

[7] [7]

Sleeper agents: Training deceptive llms that persist through safety training,

E. Hubinger, C. Denison, J. Mu, M. Lambert, M. Tong, M. MacDiarmid, T. Lanham, D. M. Ziegler, T. Maxwell, N. Cheng, A. Jermyn, A. Askell, A. Radhakrishnan, C. Anil, D. Duvenaud, D. Ganguli, F. Barez, J. Clark, K. Ndousse, K. Sachan, M. Sellitto, M. Sharma, N. DasSarma, R. Grosse, S. Kravec, Y . Bai, Z. Witten, M. Favaro, J. Brauner, H. Karnofsky, P. Chris...

Pith/arXiv arXiv 2024

[8] [8]

From storage to steering: Memory control flow attacks on llm agents,

Z. Xu, X. Zhu, Y . Yao, M. Xue, and Y . Song, “From storage to steering: Memory control flow attacks on llm agents,” 2026. [Online]. Available: https://arxiv.org/abs/2603.15125

Pith/arXiv arXiv 2026

[9] [9]

Poisonedrag: Knowledge corruption attacks to retrieval-augmented generation of large language models,

W. Zou, R. Geng, B. Wang, and J. Jia, “Poisonedrag: Knowledge corruption attacks to retrieval-augmented generation of large language models,” 2024. [Online]. Available: https://arxiv.org/abs/2402.07867

arXiv 2024

[10] [10]

Memory injection attacks on llm agents via query-only interaction,

S. Dong, S. Xu, P. He, Y . Li, J. Tang, T. Liu, H. Liu, and Z. Xiang, “Memory injection attacks on llm agents via query-only interaction,”

[11] [11]

Available: https://arxiv.org/abs/2503.03704

[Online]. Available: https://arxiv.org/abs/2503.03704

arXiv

[12] [12]

Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases,

Z. Chen, Z. Xiang, C. Xiao, D. Song, and B. Li, “Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases,”

[13] [13]

Available: https://arxiv.org/abs/2407.12784

[Online]. Available: https://arxiv.org/abs/2407.12784

arXiv

[14] [14]

Watch out for your agents! investigating backdoor threats to llm-based agents,

W. Yang, X. Bi, Y . Lin, S. Chen, J. Zhou, and X. Sun, “Watch out for your agents! investigating backdoor threats to llm-based agents,” 2024. [Online]. Available: https://arxiv.org/abs/2402.11208

arXiv 2024

[15] [15]

A systematic survey of security threats and defenses in llm-based ai agents: A layered attack surface framework,

K. Chu, “A systematic survey of security threats and defenses in llm-based ai agents: A layered attack surface framework,” 2026. [Online]. Available: https://arxiv.org/abs/2604.23338

Pith/arXiv arXiv 2026

[16] [16]

A survey on long-term memory security in llm agents: Attacks, defenses, and governance across the memory lifecycle,

Z. Lin, X. Hao, R. Fu, S. Cui, K. Chen, C. Li, Z. Li, and F. Xiong, “A survey on long-term memory security in llm agents: Attacks, defenses, and governance across the memory lifecycle,” 2026. [Online]. Available: https://arxiv.org/abs/2604.16548

Pith/arXiv arXiv 2026

[17] [17]

Memlineage: Lineage-guided enforcement for llm agent memory,

C. Ouyang and R. Hou, “Memlineage: Lineage-guided enforcement for llm agent memory,” 2026. [Online]. Available: https://arxiv.org/abs/ 2605.14421

Pith/arXiv arXiv 2026

[18] [18]

Nonmalleable information flow control,

E. Cecchetti, A. C. Myers, and O. Arden, “Nonmalleable information flow control,” inProc. ACM SIGSAC Conf. on Computer and Commu- nications Security (CCS), 2017, arXiv:1708.08596

arXiv 2017

[19] [19]

Security analysis of agentic ai communication protocols: A comparative evaluation,

Y . Louck, A. Dvir, and A. Stulman, “Security analysis of agentic ai communication protocols: A comparative evaluation,”ACM Transactions on AI Security and Privacy, 2025

2025

[20] [20]

Improving google a2a protocol: Protecting sensitive data and mitigating unintended harms in multi-agent systems,

Y . Louck, A. Stulman, and A. Dvir, “Improving google a2a protocol: Protecting sensitive data and mitigating unintended harms in multi-agent systems,”ACM Transactions on Software Engineering and Methodology, 2025. 12

2025

[21] [21]

Superlocalmemory: Privacy-preserving multi-agent memory with bayesian trust defense against memory poisoning,

V . P. Bhardwaj, “Superlocalmemory: Privacy-preserving multi-agent memory with bayesian trust defense against memory poisoning,” 2026. [Online]. Available: https://arxiv.org/abs/2603.02240

arXiv 2026

[22] [22]

Memmorph: Tool hijacking in llm agents via memory poisoning,

X. Zhang, Y . Zheng, Z. Xu, K. Zhou, B. Shen, H. Ou, T. Zhang, and K.-Y . Lam, “Memmorph: Tool hijacking in llm agents via memory poisoning,” 2026. [Online]. Available: https://arxiv.org/abs/2605.26154

Pith/arXiv arXiv 2026

[23] [23]

Memorygraft: Persistent compromise of llm agents via poisoned experience retrieval,

S. S. Srivastava and H. He, “Memorygraft: Persistent compromise of llm agents via poisoned experience retrieval,” 2025. [Online]. Available: https://arxiv.org/abs/2512.16962

arXiv 2025

[24] [24]

Trojan hippo: Weaponizing agent memory for data exfiltration,

D. Das, J. Piet, D. Kaviani, L. Beurer-Kellner, F. Tram `er, and D. Wagner, “Trojan hippo: Weaponizing agent memory for data exfiltration,” 2026. [Online]. Available: https://arxiv.org/abs/2605.01970

Pith/arXiv arXiv 2026

[25] [25]

Hijacking agent memory: Stealthy trojan attacks through conversational interaction,

H. Wang, S. Yang, Y . Chen, and P. Liu, “Hijacking agent memory: Stealthy trojan attacks through conversational interaction,” 2026. [Online]. Available: https://arxiv.org/abs/2605.29960

Pith/arXiv arXiv 2026

[26] [26]

Defeating prompt injections by design,

E. Debenedetti, I. Shumailov, T. Fan, J. Hayes, N. Carlini, D. Fabian, C. Kern, C. Shi, A. Terzis, and F. Tram `er, “Defeating prompt injections by design,” 2025. [Online]. Available: https://arxiv.org/abs/2503.18813

Pith/arXiv arXiv 2025

[27] [27]

Securing ai agents with information-flow control,

M. Costa, B. K ¨opf, A. Kolluri, A. Paverd, M. Russinovich, A. Salem, S. Tople, L. Wutschitz, and S. Zanella-B ´eguelin, “Securing ai agents with information-flow control,” 2025. [Online]. Available: https://arxiv.org/abs/2505.23643

Pith/arXiv arXiv 2025

[28] [28]

Ghost in the agent: Redefining information flow tracking for llm agents,

Y . Cai, W. Tang, C. Wen, and S. Qin, “Ghost in the agent: Redefining information flow tracking for llm agents,” 2026. [Online]. Available: https://arxiv.org/abs/2604.23374

Pith/arXiv arXiv 2026

[29] [29]

Selection integrity for llm graph memory: An accumulability criterion for information-flow-blind retrieval,

Z. Fei, H. Fei, X. Wang, Y . Yang, P. Gope, B. Sikdar, and Y . Zhang, “Selection integrity for llm graph memory: An accumulability criterion for information-flow-blind retrieval,” 2026. [Online]. Available: https://arxiv.org/abs/2606.12290

Pith/arXiv arXiv 2026

[30] [30]

Taming various privilege escalation in llm-based agent systems: A mandatory access control framework,

Z. Ji, D. Wu, W. Jiang, P. Ma, Z. Li, Y . Gao, S. Wang, and Y . Li, “Taming various privilege escalation in llm-based agent systems: A mandatory access control framework,” 2026. [Online]. Available: https://arxiv.org/abs/2601.11893

arXiv 2026

[31] [31]

A formal security framework for mcp-based ai agents: Threat taxonomy, verification models, and defense mechanisms,

N. Acharya and G. K. Gupta, “A formal security framework for mcp-based ai agents: Threat taxonomy, verification models, and defense mechanisms,” 2026. [Online]. Available: https://arxiv.org/abs/ 2604.05969

Pith/arXiv arXiv 2026

[32] [32]

Mesh memory protocol: Semantic infrastructure for multi-agent llm systems,

H. Xu, “Mesh memory protocol: Semantic infrastructure for multi-agent llm systems,” 2026. [Online]. Available: https://arxiv.org/abs/2604.19540

Pith/arXiv arXiv 2026

[33] [33]

Memory poisoning attack and defense on memory based llm-agents,

B. D. Sunil, I. Sinha, P. Maheshwari, S. Todmal, S. Mallik, and S. Mishra, “Memory poisoning attack and defense on memory based llm-agents,” 2026. [Online]. Available: https://arxiv.org/abs/2601.05504

arXiv 2026

[34] [34]

Drunkagent: Stealthy memory corruption in llm-powered recommender agents,

S. Yang, Z. Hu, X. Li, C. Wang, T. Yu, X. Xu, L. Zhu, and L. Yao, “Drunkagent: Stealthy memory corruption in llm-powered recommender agents,” 2025. [Online]. Available: https://arxiv.org/abs/2503.23804

arXiv 2025

[35] [35]

Trustrag: Enhancing robustness and trustworthiness in retrieval-augmented generation,

H. Zhou, K.-H. Lee, Z. Zhan, Y . Chen, Z. Li, Z. Wang, H. Haddadi, and E. Yilmaz, “Trustrag: Enhancing robustness and trustworthiness in retrieval-augmented generation,” 2025. [Online]. Available: https://arxiv.org/abs/2501.00879

arXiv 2025

[36] [36]

Visual inception: Compromising long-term planning in agentic recommenders via multimodal memory poisoning,

J. Qian, “Visual inception: Compromising long-term planning in agentic recommenders via multimodal memory poisoning,” 2026. [Online]. Available: https://arxiv.org/abs/2604.16966

Pith/arXiv arXiv 2026

[37] [37]

Integrity considerations for secure computer systems,

K. J. Biba, “Integrity considerations for secure computer systems,” Technical Report MTR-3153 / ESD-TR-76-372, The MITRE Corporation, 1977

1977

[38] [38]

A comparison of commercial and military computer security policies,

D. D. Clark and D. R. Wilson, “A comparison of commercial and military computer security policies,” inProc. IEEE Symp. on Security and Privacy (S&P), 1987, pp. 184–194

1987

[39] [39]

A lattice model of secure information flow,

D. E. Denning, “A lattice model of secure information flow,”Commu- nications of the ACM, vol. 19, no. 5, pp. 236–243, 1976

1976

[40] [40]

A decentralized model for information flow control,

A. C. Myers and B. Liskov, “A decentralized model for information flow control,” inProc. ACM Symp. on Operating Systems Principles (SOSP), 1997, pp. 129–142

1997

[41] [41]

The sybil attack,

J. R. Douceur, “The sybil attack,” inProc. 1st Int. Workshop on Peer- to-Peer Systems (IPTPS), LNCS 2429, 2002, pp. 251–260

2002

[42] [42]

The foundations for provenance on the web,

L. Moreau, “The foundations for provenance on the web,”Foundations and Trends in Web Science, vol. 2, no. 2–3, pp. 99–241, 2010. APPENDIXA BASELINEFIDELITY Each baseline is thestrongest faithfulinstance of its defense class, and its authorization rule is deterministic and reproduced verbatim from the released harness. •noneauthorizes every action (undefen...

2010