arxiv: 2604.16548 · v1 · submitted 2026-04-17 · 💻 cs.CR · cs.AI· cs.CL

Recognition: unknown

A Survey on the Security of Long-Term Memory in LLM Agents: Toward Mnemonic Sovereignty

Zehao Lin , Chunyu Li , Kai Chen

Authors on Pith no claims yet

Pith reviewed 2026-05-10 08:49 UTC · model grok-4.3

classification 💻 cs.CR cs.AIcs.CL

keywords memorygovernancesecurityagentagentsarchitecturesavailabilityconfidentiality

0 comments

The pith

The survey maps security vulnerabilities in LLM agent memory across write-store-retrieve-execute-share-forget phases and advocates for mnemonic sovereignty to enable verifiable control over memory operations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language model agents are AI systems that can hold onto information over many interactions, not just within one chat. This persistent memory lets them learn from past experiences but also opens them up to new kinds of attacks. The survey breaks memory management into six main phases: writing new information into memory, storing it safely, retrieving it when needed, using it to guide actions, sharing it with other agents or systems, and forgetting or rolling back changes when appropriate. Against these phases, the authors look at four security goals: keeping the memory accurate and untampered (integrity), keeping it private (confidentiality), making sure it is accessible when wanted (availability), and having proper oversight and control (governance). They review existing work on attacks like poisoning the memory during writing, extracting secret information, corrupting retrieval, hijacking control through memory, spreading bad memory across agents, and issues with rollback. The key observations are that most studies focus on integrity problems at write and retrieve times, while confidentiality, availability, and the store and forget phases get less attention. Also, no current system design handles all the governance needs they list. They suggest that using the LLMs themselves to help secure memory is an underused but promising approach. Overall, they argue that future agents will stand out based on how well they govern their memory, not just how much they can remember. This concept is called mnemonic sovereignty, meaning the ability to control and verify memory operations in a trustworthy way.

Core claim

Three findings stand out: the literature concentrates on write- and retrieve-time integrity attacks, while confidentiality, availability, store/forget, and benign-persistence failures remain sparsely studied; no published architecture covers all nine governance primitives we identify; and using LLMs themselves for memory security remains sparse yet essential.

Load-bearing premise

The six-phase memory-lifecycle framework comprehensively captures all relevant security aspects of agent memory, and the identified literature gaps accurately reflect the state of the field without systematic search methodology details.

Figures

Figures reproduced from arXiv: 2604.16548 by Chunyu Li, Kai Chen, Zehao Lin.

**Figure 2.** Figure 2: Five correspondences between human-memory mechanisms (left) and LLM-agent memory-security [PITH_FULL_IMAGE:figures/full_fig_p012_2.png] view at source ↗

**Figure 3.** Figure 3: The six-phase memory lifecycle of an LLM agent. Each phase introduces distinct security questions; [PITH_FULL_IMAGE:figures/full_fig_p014_3.png] view at source ↗

**Figure 4.** Figure 4: Escalation of write-path attacks on LLM-agent memory, 2023–2026. The vertical axis shows the attacker [PITH_FULL_IMAGE:figures/full_fig_p021_4.png] view at source ↗

**Figure 5.** Figure 5: End-to-end attack chain for memory-augmented LLM agents. A manipulated web observation is [PITH_FULL_IMAGE:figures/full_fig_p027_5.png] view at source ↗

**Figure 6.** Figure 6: Coverage of the nine mnemonic-sovereignty primitives by six representative memory architectures. [PITH_FULL_IMAGE:figures/full_fig_p047_6.png] view at source ↗

**Figure 7.** Figure 7: The five primitives of mnemonic sovereignty, rendered as a dependency tower. Each layer requires all [PITH_FULL_IMAGE:figures/full_fig_p051_7.png] view at source ↗

read the original abstract

Research on large language model (LLM) security is shifting from "will the model leak training data" to a more consequential question: can an agent with persistent, long-term memory be continuously shaped, cross-session poisoned, accessed without authorization, and propagated across shared organizational state? Recent surveys cover memory architectures and agent mechanisms, but fewer center the epistemic and governance properties of persistent, writable memory as the reason memory is an independent security problem. This survey addresses that gap. Drawing on cognitive neuroscience and the philosophy of memory, we characterize agent memory as malleable, rewritable, and socially propagating, and develop a memory-lifecycle framework organized around six phases -- Write, Store, Retrieve, Execute, Share, Forget/Rollback -- cross-tabulated against four security objectives: integrity, confidentiality, availability, governance. We organize the literature on memory poisoning, extraction, retrieval corruption, control-flow hijacking, cross-agent propagation, rollback, and governance, and situate representative architectures as determinants of which phases are explicitly governable. Three findings stand out: the literature concentrates on write- and retrieve-time integrity attacks, while confidentiality, availability, store/forget, and benign-persistence failures remain sparsely studied; no published architecture covers all nine governance primitives we identify; and using LLMs themselves for memory security remains sparse yet essential. We unify these under mnemonic sovereignty -- verifiable, recoverable governance over what may be written, who may read, when updates are authorized, and which states may be forgotten -- arguing future secure agents will be differentiated not only by recall capacity, but by memory governance quality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This survey gives a clean lifecycle framework for LLM agent memory security and flags real gaps in the literature, but the gap claims rest on an undocumented search.

read the letter

The paper organizes security work on long-term memory in LLM agents into a six-phase lifecycle—Write, Store, Retrieve, Execute, Share, Forget/Rollback—crossed with integrity, confidentiality, availability, and governance. It introduces mnemonic sovereignty as the goal of verifiable control over what gets stored, read, updated, and erased. That framing is the main new piece: it pulls together poisoning, extraction, propagation, and rollback papers under one structure and notes that current agent architectures leave most of those primitives ungoverned.

Referee Report

2 major / 1 minor

Summary. The paper surveys security issues in long-term memory for LLM agents. It draws on cognitive neuroscience and philosophy of memory to characterize agent memory as malleable and socially propagating, proposes a six-phase lifecycle framework (Write, Store, Retrieve, Execute, Share, Forget/Rollback) cross-tabulated against four security objectives (integrity, confidentiality, availability, governance), reviews literature on attacks such as poisoning, extraction, and propagation as well as representative architectures, identifies concentrations and gaps in the literature, and introduces the concept of mnemonic sovereignty as verifiable governance over memory operations.

Significance. If the framework is comprehensive and the literature categorization representative, the survey could usefully organize research on an emerging security surface for persistent LLM agents and highlight under-studied areas such as confidentiality and governance primitives, potentially informing design of more secure agent systems differentiated by memory control quality.

major comments (2)

[Literature review and findings sections] The central findings that the literature concentrates on write- and retrieve-time integrity attacks while confidentiality, availability, store/forget, and benign-persistence failures remain sparsely studied, and that no published architecture covers all nine governance primitives, rest on the authors' literature categorization. The manuscript supplies no search protocol, databases, keywords, inclusion/exclusion criteria, or temporal bounds, so these gap claims cannot be distinguished from under-sampling by the review.
[Memory-lifecycle framework definition] The six-phase memory-lifecycle framework plus four objectives is used to organize the literature and to support the claim that no architecture covers all nine governance primitives. No derivation, coverage argument, or justification is provided for why these phases and objectives exhaust the relevant security surface; missing dimensions such as provenance, multi-agent consensus, or memory versioning would falsify the coverage claim.

minor comments (1)

[Abstract] The abstract refers to 'nine governance primitives' without clarifying how this number is obtained from the 6x4 framework; a brief derivation or table mapping would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review. The comments identify two areas where the manuscript can be strengthened through greater methodological transparency and explicit justification of the proposed framework. We address each point below and will incorporate the suggested revisions in the next version of the manuscript.

read point-by-point responses

Referee: [Literature review and findings sections] The central findings that the literature concentrates on write- and retrieve-time integrity attacks while confidentiality, availability, store/forget, and benign-persistence failures remain sparsely studied, and that no published architecture covers all nine governance primitives, rest on the authors' literature categorization. The manuscript supplies no search protocol, databases, keywords, inclusion/exclusion criteria, or temporal bounds, so these gap claims cannot be distinguished from under-sampling by the review.

Authors: We agree that the absence of an explicit search protocol limits the defensibility of the gap claims. In the revised manuscript we will insert a dedicated 'Literature Review Methodology' subsection. It will specify the databases queried (arXiv, Google Scholar, ACL Anthology, IEEE Xplore, and selected security conference proceedings), the keyword strings employed (including 'LLM agent long-term memory', 'memory poisoning', 'retrieval attack LLM agent', 'agent memory governance'), the inclusion criteria (works addressing persistent, cross-session memory in LLM-based agents, published or posted 2022–2024), and the exclusion criteria (short-context-only studies, non-agent systems, purely theoretical model papers without implementation). Temporal bounds will be justified by the emergence of production-grade agent frameworks after 2022. These additions will allow readers to evaluate sampling completeness and will reinforce rather than undermine the reported concentrations and gaps. revision: yes
Referee: [Memory-lifecycle framework definition] The six-phase memory-lifecycle framework plus four objectives is used to organize the literature and to support the claim that no architecture covers all nine governance primitives. No derivation, coverage argument, or justification is provided for why these phases and objectives exhaust the relevant security surface; missing dimensions such as provenance, multi-agent consensus, or memory versioning would falsify the coverage claim.

Authors: The six phases are adapted from canonical cognitive-neuroscience accounts of memory (encoding, storage, retrieval, execution, social transmission, and forgetting/rollback), while the four objectives extend the CIA triad with governance to capture control and accountability requirements specific to writable agent memory. We acknowledge that the current text provides no explicit derivation or completeness argument. In revision we will expand the framework section with a new subsection that (a) maps each phase to its cognitive and agentic counterpart, (b) justifies the four objectives as the minimal set needed to cover integrity, secrecy, liveness, and authorization, and (c) addresses the cited dimensions: provenance is subsumed under write-time integrity and governance primitives; multi-agent consensus is handled within the Share and Governance phases; versioning is treated as part of Store and Forget/Rollback. We will also note any residual gaps and, if warranted, augment the nine primitives rather than assert exhaustiveness without support. This revision will preserve the framework while making its coverage claims verifiable. revision: partial

Circularity Check

0 steps flagged

No circularity: survey framework and gap analysis are externally grounded

full rationale

This literature survey proposes a six-phase memory-lifecycle framework (Write, Store, Retrieve, Execute, Share, Forget/Rollback) cross-tabulated with four security objectives (integrity, confidentiality, availability, governance) drawn from cognitive neuroscience and philosophy of memory. The claim that no architecture covers all nine governance primitives follows directly from applying this externally motivated taxonomy to reviewed works; it does not reduce to a self-definition, fitted parameter, or self-citation chain. Gap assertions about sparsely studied areas likewise rest on the literature organization rather than any internal prediction or uniqueness theorem imported from the authors' prior work. No equations, derivations, or statistical fits appear, satisfying the default expectation of no significant circularity for a non-mathematical survey.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The paper relies on the domain assumption about the nature of memory and introduces a new conceptual entity without independent evidence.

axioms (1)

domain assumption Agent memory is malleable, rewritable, and socially propagating.
Stated in the abstract as characterization drawn from cognitive neuroscience and philosophy of memory.

invented entities (1)

mnemonic sovereignty no independent evidence
purpose: To provide verifiable, recoverable governance over memory operations in LLM agents.
Introduced as a unifying concept for future secure agents, without external validation or falsifiable prediction in the abstract.

pith-pipeline@v0.9.0 · 5595 in / 1394 out tokens · 82678 ms · 2026-05-10T08:49:11.806030+00:00 · methodology

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework
cs.CR 2026-04 unverdicted novelty 7.0

A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
Security Attack and Defense Strategies for Autonomous Agent Frameworks: A Layered Review with OpenClaw as a Case Study
cs.CR 2026-04 conditional novelty 4.0

The survey organizes security threats and defenses in autonomous LLM agents into four layers and identifies that risks can propagate across layers from inputs to ecosystem impacts.

Reference graph

Works this paper leans on

4 extracted references · 4 canonical work pages · cited by 2 Pith papers

[1]

doi:10.20944/preprints202601.0618.v2 ICLR 2026 Workshop on Memory for LLM-Based Agentic Systems (MemAgents)

From Storage to Experience: A Survey on the Evolution of LLM Agent Memory Mechanisms.Preprints.org 202601.0618(2026). doi:10.20944/preprints202601.0618.v2 ICLR 2026 Workshop on Memory for LLM-Based Agentic Systems (MemAgents). Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, and Yuwei Fang. 2024. Evaluating very long-term ...

work page doi:10.20944/preprints202601.0618.v2 2026
[2]

Helen Nissenbaum

VerificAgent: Domain-specific memory verification for scalable oversight of aligned computer-use agents.arXiv preprint arXiv:2506.02539(2025). Helen Nissenbaum. 2004. Privacy as contextual integrity.Washington Law Review79, 1 (2004), 119–157. Eric T Olson. 2023. Personal Identity. InThe Stanford Encyclopedia of Philosophy, Edward N Zalta (Ed.). OWASP Foun...

work page doi:10.1145/3708359.3712112 2025
[3]

Haoran Tan, Zeyu Zhang, Chen Ma, Xu Chen, Quanyu Dai, and Zhenhua Dong

Memory poisoning attack and defense on memory based LLM-agents.arXiv preprint arXiv:2601.05504(2026). Haoran Tan, Zeyu Zhang, Chen Ma, Xu Chen, Quanyu Dai, and Zhenhua Dong. 2025b. MemBench: Towards more comprehensive evaluation on the memory of LLM-based agents. InFindings of the Annual Meeting of the Association for Computational Linguistics (ACL). arXi...

work page doi:10.1016/j.inffus.2025.103941 2026
[4]

Ghost of the past

BenchPreS: A benchmark for context-aware personalized preference selectivity of persistent-memory LLMs.arXiv preprint arXiv:2603.16557(2026). Zhongming Yu, Naicheng Yu, Hejia Zhang, et al. 2026. Multi-agent memory from a computer architecture perspective: Visions and challenges ahead.arXiv preprint arXiv:2603.10062(2026). Yosif Zaki and Denise J Cai. 2025...

work page arXiv 2026