Is Agent Memory a Database? Rethinking Data Foundations for Long-Term AI Agent Memory
Pith reviewed 2026-06-29 21:19 UTC · model grok-4.3
The pith
Long-term AI agent memory requires state-level operators and correctness conditions on evolving trajectories, not record-level database operations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Current agent memory systems localize correctness at records, embeddings, or edges and therefore produce unregulated growth, missing semantic revision, capacity-driven forgetting, and read-only retrieval. The paper formalizes long-term agent memory as Governed Evolving Memory (GEM), a workload whose correctness is a property of the state trajectory. GEM replaces record-level operations with four state-level operators—ingestion, revision, forgetting, and retrieval—governed by six correctness conditions. Three structural observations establish that no record-level system can satisfy these conditions, and a property-graph prototype called MemState shows the abstraction is implementable but poin
What carries the argument
Governed Evolving Memory (GEM), which replaces record-level database operations with four state-level operators (ingestion, revision, forgetting, retrieval) whose behavior is constrained by six correctness conditions on the evolution of the full memory state.
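To make the record-level versus state-level distinction concrete, here is a minimal Python sketch. It is not the paper's formalization: the `State` and `no_resurrection` names and the dictionary-based representation are assumptions made for illustration, and the check shown is a hypothetical trajectory-level property, not one of the paper's six conditions (which this page does not enumerate). The sketch models the four operators as functions from one memory state to the next, with retrieval also returning a new state rather than being read-only.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """One snapshot of agent memory (illustrative layout, not the paper's)."""
    facts: dict = field(default_factory=dict)  # key -> current value
    hits: dict = field(default_factory=dict)   # key -> retrieval count


def ingest(s, key, value):
    """Add a new fact; an existing key must go through revise instead."""
    if key in s.facts:
        raise ValueError(f"{key!r} already present; use revise")
    return State({**s.facts, key: value}, dict(s.hits))


def revise(s, key, value):
    """Replace the current value of an existing fact."""
    if key not in s.facts:
        raise KeyError(key)
    return State({**s.facts, key: value}, dict(s.hits))


def forget(s, key):
    """Drop a fact and its retrieval statistics."""
    facts = {k: v for k, v in s.facts.items() if k != key}
    hits = {k: v for k, v in s.hits.items() if k != key}
    return State(facts, hits)


def retrieve(s, key):
    """Return the value and a new state: retrieval is not read-only here."""
    value = s.facts.get(key)
    hits = dict(s.hits)
    if value is not None:
        hits[key] = hits.get(key, 0) + 1
    return value, State(dict(s.facts), hits)


def no_resurrection(trajectory):
    """Hypothetical trajectory-level check: a value that was revised away
    or forgotten never reappears for the same key. No single state or
    record contains enough information to decide this."""
    superseded = {}  # key -> values that were replaced or removed
    prev = {}
    for s in trajectory:
        for k, v in s.facts.items():
            if v in superseded.get(k, set()):
                return False
        for k, old in prev.items():
            if s.facts.get(k) != old:
                superseded.setdefault(k, set()).add(old)
        prev = s.facts
    return True


s0 = State()
s1 = ingest(s0, "city", "Paris")
s2 = revise(s1, "city", "Berlin")
value, s3 = retrieve(s2, "city")
print(value, no_resurrection([s0, s1, s2, s3]))  # prints "Berlin True"
```

The check takes the whole list of states as input. Inspecting `s3` alone cannot reveal whether "Berlin" replaced an earlier value, which is the sense in which correctness belongs to the trajectory rather than to a record.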
If this is right
- Memory correctness must be judged on properties of the state trajectory rather than on any single record.
- The four state-level operators become the only allowed ways to change agent memory.
- No existing record-level storage model can meet the six conditions for long-term agent memory.
- A working prototype on a property-graph backend is feasible but reveals the gap to a native implementation.
- Memory-centric data management emerges as a distinct workload with its own research agenda.
Where Pith is reading between the lines
- Database engines could be redesigned around explicit state-evolution rules rather than query and transaction primitives.
- Auditing and compliance features for AI agents would follow directly from the six conditions on state history.
- The same state-level framing might apply to other long-running autonomous systems that must revise or forget past observations.
Load-bearing premise
The six correctness conditions are both necessary and sufficient for long-term agent memory and cannot be satisfied by any extension of record-level database operations.
What would settle it
A concrete demonstration that some record-level database, possibly extended, can maintain an agent's memory state while satisfying all six correctness conditions without exhibiting unregulated growth, missing semantic revisions, capacity-driven forgetting, or read-only retrieval.
Original abstract
Long-running AI agents need persistent memory. Memory supports learning across sessions, reduces repeated context injection, and enables auditing of past decisions. Current agent memory systems and database paradigms treat memory as storage. They localize correctness at records, embeddings, or edges. Each supplies only some of the capabilities that long-term memory requires. The result is four recurring failure modes: unregulated growth, missing semantic revision, capacity-driven forgetting, and read-only retrieval. In our vision, long-term agent memory is a new data-management workload. Its correctness is a property of the state trajectory, not of individual records. We formalize this as Governed Evolving Memory (GEM). GEM replaces record-level database operations with four state-level operators: ingestion, revision, forgetting, and retrieval. Six correctness conditions govern how the state evolves. Three structural observations establish that no record-level system can satisfy these conditions, regardless of the storage model. We realize the abstraction in MemState, a prototype on a property-graph backend. MemState validates feasibility and exposes the gap to a native engine. We outline three research directions that define memory-centric data management as a workload.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that existing database and agent memory systems fail for long-term AI agents in four recurring ways (unregulated growth, missing semantic revision, capacity-driven forgetting, read-only retrieval) because they localize correctness at the record level. It proposes Governed Evolving Memory (GEM) as a new workload whose correctness is defined on state trajectories via four state-level operators (ingestion, revision, forgetting, retrieval) and six governing conditions. Three structural observations are said to prove that no record-level system, regardless of storage model, can satisfy them. The abstraction is realized in the MemState prototype on a property-graph backend, which is presented as validating feasibility while exposing the gap to a native engine. Three research directions are outlined.
Significance. If the structural observations are shown to be rigorous and the prototype is demonstrated to satisfy the six conditions, the work would establish memory-centric data management as a distinct workload and motivate new engine designs beyond record-oriented storage. The explicit enumeration of failure modes and the shift from record-level to state-level operators supply a concrete vocabulary that could guide subsequent systems research.
major comments (1)
- [Abstract / structural observations section] Abstract (and the section presenting the three structural observations): the claim that "no record-level system can satisfy these conditions, regardless of the storage model" is load-bearing for the central impossibility result, yet MemState is realized on a property-graph backend whose primitives remain node/edge CRUD operations on records. The manuscript supplies neither a formal definition of "record-level" that would exclude this backend nor an argument showing how an intervening governance layer evades the observations; this leaves the impossibility result and the prototype in unresolved tension.
minor comments (2)
- [Abstract] The six correctness conditions are referenced repeatedly but never enumerated in the abstract or early sections; listing them explicitly would improve readability.
- [MemState prototype section] The prototype validation is described only at a high level; a brief description of how the four operators map onto the property-graph primitives would clarify the feasibility claim.
Simulated Author's Rebuttal
We thank the referee for identifying the tension between the impossibility claim and the MemState prototype. The comment correctly notes the absence of a formal definition of 'record-level' and an explicit argument for how governance evades the structural observations. We will revise the manuscript to supply both.
Point-by-point responses
- Referee: [Abstract / structural observations section] Abstract (and the section presenting the three structural observations): the claim that "no record-level system can satisfy these conditions, regardless of the storage model" is load-bearing for the central impossibility result, yet MemState is realized on a property-graph backend whose primitives remain node/edge CRUD operations on records. The manuscript supplies neither a formal definition of "record-level" that would exclude this backend nor an argument showing how an intervening governance layer evades the observations; this leaves the impossibility result and the prototype in unresolved tension.
Authors: We agree the manuscript must define 'record-level system' precisely and show why the governance layer does not contradict the observations. We define a record-level system as one whose interface and correctness are expressed exclusively via operations on individual records (nodes, edges, tuples) without reference to an agent's state trajectory. The three structural observations demonstrate that the six GEM conditions cannot be satisfied by any such interface, because they require trajectory-level semantics (e.g., revision that may invalidate prior records, forgetting that is not capacity-driven). MemState uses the property-graph backend solely as a storage substrate; the governance layer implements the four state-level operators and enforces the conditions. This architecture does not violate the observations, as the backend alone is not the system under consideration. We will insert the formal definition into the structural observations section and add an explicit paragraph explaining the separation between substrate and governance layer.
Revision: yes
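The substrate/governance separation described in the rebuttal can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not MemState's design: `RecordStore`, `GovernedMemory`, the `status` field, and the `reason` argument are hypothetical names, and an in-memory dictionary stands in for the property-graph backend. The two rules enforced (revision supersedes the prior record instead of overwriting it, and forgetting requires an explicit reason) are modeled loosely on the examples the authors give, not on the paper's six conditions.

```python
class RecordStore:
    """Stand-in for a record-level substrate: plain CRUD on single records."""

    def __init__(self):
        self._rows = {}
        self._next_id = 0

    def create(self, fields):
        rid = self._next_id
        self._next_id += 1
        self._rows[rid] = dict(fields)
        return rid

    def update(self, rid, fields):
        self._rows[rid].update(fields)

    def delete(self, rid):
        self._rows.pop(rid, None)

    def scan(self):
        return list(self._rows.items())


class GovernedMemory:
    """Hypothetical governance layer: callers see only the four operators,
    and every call is appended to a log of the memory's evolution."""

    def __init__(self, store):
        self._store = store
        self.log = []  # (operator, key, detail) in call order

    def _active(self, key):
        for rid, row in self._store.scan():
            if row["key"] == key and row["status"] == "active":
                return rid, row
        return None

    def ingest(self, key, value):
        if self._active(key) is not None:
            raise ValueError(f"{key!r} already active; use revise")
        self._store.create({"key": key, "value": value, "status": "active"})
        self.log.append(("ingest", key, value))

    def revise(self, key, value):
        found = self._active(key)
        if found is None:
            raise KeyError(key)
        # Invalidate the prior record rather than overwriting it in place.
        self._store.update(found[0], {"status": "superseded"})
        self._store.create({"key": key, "value": value, "status": "active"})
        self.log.append(("revise", key, value))

    def forget(self, key, reason):
        if not reason:
            raise ValueError("forgetting needs an explicit reason")
        found = self._active(key)
        if found is None:
            raise KeyError(key)
        self._store.delete(found[0])
        self.log.append(("forget", key, reason))

    def retrieve(self, key):
        found = self._active(key)
        self.log.append(("retrieve", key, None))  # retrieval leaves a trace
        return found[1]["value"] if found else None


mem = GovernedMemory(RecordStore())
mem.ingest("city", "Paris")
mem.revise("city", "Berlin")
print(mem.retrieve("city"))  # prints "Berlin"
```

In this sketch the store never sees more than one record at a time, while the rules live entirely in `GovernedMemory`. Whether such a layer is what the paper counts as a non-record-level system is exactly the point the referee asks the authors to define.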
Circularity Check
No significant circularity; derivation is conceptual and self-contained.
Full rationale
The paper advances a conceptual redefinition of agent memory as a state-trajectory workload governed by six correctness conditions and four operators. No equations, fitted parameters, or self-citations appear in the abstract or described derivation. The three structural observations are presented as independent arguments against record-level systems; GEM is introduced as a new abstraction rather than derived from prior self-work. The MemState prototype on a property-graph backend is described as feasibility validation, not as a step that reduces the impossibility claim to its own inputs. No pattern from the enumerated circularity kinds is exhibited by direct quotation and reduction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: long-term agent memory correctness must be defined over state trajectories rather than individual records.
invented entities (2)
- Governed Evolving Memory (GEM): no independent evidence
- MemState: no independent evidence