MRMMIA is a multi-recall-probe membership inference attack that extracts signals from chat agent memory and outperforms baselines in black-, gray-, and white-box settings.
hub Canonical reference
A Survey on the Memory Mechanism of Large Language Model based Agents
Canonical reference. 92% of citing Pith papers cite this work as background.
abstract
Large language model (LLM) based agents have recently attracted much attention from the research and industry communities. Compared with original LLMs, LLM-based agents are featured in their self-evolving capability, which is the basis for solving real-world problems that need long-term and complex agent-environment interactions. The key component to support agent-environment interactions is the memory of the agents. While previous studies have proposed many promising memory mechanisms, they are scattered in different papers, and there lacks a systematical review to summarize and compare these works from a holistic perspective, failing to abstract common and effective designing patterns for inspiring future studies. To bridge this gap, in this paper, we propose a comprehensive survey on the memory mechanism of LLM-based agents. In specific, we first discuss ''what is'' and ''why do we need'' the memory in LLM-based agents. Then, we systematically review previous studies on how to design and evaluate the memory module. In addition, we also present many agent applications, where the memory module plays an important role. At last, we analyze the limitations of existing work and show important future directions. To keep up with the latest advances in this field, we create a repository at \url{https://github.com/nuster1128/LLM_Agent_Memory_Survey}.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
Belief Engine is a configurable belief-update mechanism for multi-agent LLM systems that uses structured argument extraction and log-odds stance updates to make evidence-grounded deliberation inspectable and controllable.
All tested LLM memory systems fail at dependency reasoning in multi-entity evolving scenarios, with only an expensive file-based setup showing partial recovery.
Memory for long-horizon agents should preserve distinctions that affect decisions under a fixed budget, not descriptive features, yielding an exact forgetting boundary and a new online learner DeMem with regret guarantees.
MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.
AEL uses a fast-timescale bandit for memory policy selection and slow-timescale LLM reflection for causal insights, achieving a Sharpe ratio of 2.13 on a 208-episode portfolio benchmark while showing that added mechanisms degrade performance.
PERMA is a new benchmark using temporally ordered events, text variability, and linguistic alignment to evaluate LLM memory agents on persona consistency beyond simple retrieval.
MemEvolve jointly evolves agent experiential knowledge and memory architectures via a modular codebase, delivering up to 17% gains on agent benchmarks with cross-task and cross-model generalization.
Empirical study of open-source AI agents shows testing effort concentrates on deterministic tools and workflows (over 70%) while the FM-based plan body gets under 5% and prompts appear in only 1% of tests.
Meta Agent Search uses a meta-agent to iteratively program novel agentic systems in code, producing agents that outperform state-of-the-art hand-designed ones across coding, science, and math while transferring across domains and models.
Relabeling an identical erroneous claim from the model's own thought role to an external chat role increases explicit correction rates by 23-93 percentage points across 13 model-domain cells, indicating a chat-template artifact rather than a cognitive deficit.
RAMPART is a registry-based memory system for LLM agents with priority-aware primitives that experimentally demonstrates position-dependent performance cliffs and benefits from block grouping and relevance gating.
Introduces a benchmark with 34,560 instances for selective QA over conflicting multi-source personal memory and compares fusion methods against LLMs.
Mem-π is a framework using a dedicated model and decision-content decoupled RL to generate context-specific guidance on demand for LLM agents, outperforming retrieval baselines by over 30% on web navigation.
MedExpMem lets VLM diagnostic agents store and retrieve experience from past failures as pairwise differential notes, producing up to 7% accuracy gains on a multi-subspecialty radiology benchmark.
Solvita is an agentic evolution system using Planner, Solver, Oracle, and Hacker agents with trainable graph knowledge networks updated by reinforcement learning on pass/fail and vulnerability signals to achieve SOTA code generation performance.
A hybrid LLM-symbolic verifier maintains a dependency graph over conversation turns classified into eight formal update operations, enabling linear-time groundedness checks and precise retraction propagation with a conflict-free guarantee.
Empirical evaluation of eight memory condensation strategies on 480 DiscoveryBench tasks finds no significant impact on hypothesis quality but domain-dependent differences in token efficiency.
CHAL is a multi-agent dialectic system that performs structured belief optimization over defeasible domains using Bayesian-inspired graph representations and configurable meta-cognitive value system hyperparameters.
SkillLens organizes skills into policies-strategies-procedures-primitives layers, retrieves via degree-corrected random walk, and uses a verifier for local adaptation, yielding up to 6.31 pp gains on MuLocbench and raising ALFWorld success from 45% to 51.31%.
HiGMem combines hierarchical event-turn memory with LLM-guided selection to retrieve concise relevant evidence from long dialogues, improving F1 scores and cutting retrieved turns by an order of magnitude on the LoCoMo10 benchmark.
TSUBASA improves long-horizon personalization in LLMs via dynamic memory evolution for writing and context-distillation self-learning for reading, outperforming Mem0 and Memory-R1 on Qwen-3 benchmarks while reducing token use.
AgentSociety is a large-scale LLM agent-based social simulator validated on polarization, UBI, disasters, and sustainability issues with alignment to real experiments.
The survey organizes RAG methods via a taxonomy of query-based, logits-based, latent, and parametric fusion with comparisons on accessibility, efficiency, applications, and challenges.
citing papers explorer
-
AgentSociety: Large-Scale Simulation of LLM-Driven Generative Agents Advances Understanding of Human Behaviors and Society
AgentSociety is a large-scale LLM agent-based social simulator validated on polarization, UBI, disasters, and sustainability issues with alignment to real experiments.