
arxiv: 2605.26252 · v1 · pith:VU26E7KJ · submitted 2026-05-25 · 💻 cs.AI · cs.DB

Is Agent Memory a Database? Rethinking Data Foundations for Long-Term AI Agent Memory

Pith reviewed 2026-06-29 21:19 UTC · model grok-4.3

classification 💻 cs.AI cs.DB
keywords: AI agent memory · long-term memory · data management · Governed Evolving Memory · state-level operators · correctness conditions · memory systems

The pith

Long-term AI agent memory requires state-level operators and correctness conditions on evolving trajectories, not record-level database operations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that treating agent memory as storage, with correctness localized at individual records, produces four persistent failure modes: unregulated growth, inability to revise semantics, capacity-driven forgetting, and read-only retrieval. It proposes instead that memory correctness is a property of the entire state trajectory and formalizes this view as Governed Evolving Memory (GEM). GEM defines four state-level operators (ingestion, revision, forgetting, and retrieval) plus six conditions that must hold as the state changes over time. Three structural observations are offered to show that no record-level system, regardless of its underlying storage model, can meet those conditions. A prototype called MemState, built on a property-graph backend, demonstrates basic feasibility while exposing the gap to a purpose-built engine.

Core claim

Current agent memory systems localize correctness at records, embeddings, or edges and therefore produce unregulated growth, missing semantic revision, capacity-driven forgetting, and read-only retrieval. The paper formalizes long-term agent memory as Governed Evolving Memory (GEM), a workload whose correctness is a property of the state trajectory. GEM replaces record-level operations with four state-level operators (ingestion, revision, forgetting, and retrieval) governed by six correctness conditions. Three structural observations establish that no record-level system can satisfy these conditions, and a property-graph prototype called MemState shows that the abstraction is implementable but points to the gap between it and a native engine.
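To make one of the four failure modes concrete, here is a minimal sketch contrasting capacity-driven forgetting with a policy-driven alternative. It assumes a toy store in which each unit carries a hypothetical importance score; nothing in this summary says GEM or MemState uses such scores, and the names `capacity_driven` and `policy_driven` are illustrative only. Capacity-driven forgetting evicts whatever is oldest once the store is full; the policy-driven rule forgets according to a declarative criterion regardless of how full the store is.

```python
from collections import OrderedDict

# Hypothetical memory units: (key, text, importance). Scores are illustrative only.
FACTS = [
    ("allergy", "User is allergic to penicillin", 0.95),
    ("lunch_mon", "User had soup on Monday", 0.10),
    ("lunch_tue", "User had salad on Tuesday", 0.10),
    ("lunch_wed", "User had pasta on Wednesday", 0.10),
]


def capacity_driven(facts, capacity):
    """Evict the oldest unit whenever the store is full (FIFO), ignoring meaning."""
    store = OrderedDict()
    for key, text, _importance in facts:
        if len(store) >= capacity:
            store.popitem(last=False)  # drop the oldest entry, whatever it is
        store[key] = text
    return store


def policy_driven(facts, min_importance):
    """Forget only the units that a declarative policy marks as forgettable."""
    return OrderedDict(
        (key, text) for key, text, importance in facts if importance >= min_importance
    )


cap = capacity_driven(FACTS, capacity=3)
pol = policy_driven(FACTS, min_importance=0.5)
print("allergy" in cap)  # False: the safety-critical fact was evicted first
print("allergy" in pol)  # True
```

The point of the contrast is where the forgetting decision comes from: in the first function it is a side effect of storage limits, in the second it is an explicit, inspectable rule.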

What carries the argument

Governed Evolving Memory (GEM), which replaces record-level database operations with four state-level operators (ingestion, revision, forgetting, retrieval) whose behavior is constrained by six correctness conditions on the evolution of the full memory state.
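One way to write the trajectory-level view down, offered as a sketch rather than the paper's formalism: $M_t$ follows the notation of Figure 2, while the operator choice $o_t$, its input $x_t$, the trajectory $\tau$, and the predicates $C_1, \dots, C_6$ are our own labels, since this summary does not enumerate the six conditions.

```latex
\begin{aligned}
M_{t+1} &= o_t(M_t, x_t), \qquad o_t \in \{\mathrm{ingest},\ \mathrm{revise},\ \mathrm{forget},\ \mathrm{retrieve}\},\\
\tau &= (M_0, M_1, \dots, M_T),\\
\tau \text{ is admissible} &\iff \textstyle\bigwedge_{i=1}^{6} C_i(\tau).
\end{aligned}
```

A purely record-level check has the shape $\forall r \in D_t:\ c(r)$ and sees one record at a time, whereas a condition such as bounded growth of $|D_t|$ over $t$ depends on the whole sequence $\tau$.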

If this is right

  • Memory correctness must be judged on properties of the state trajectory rather than on any single record.
  • The four state-level operators become the only allowed ways to change agent memory.
  • No existing record-level storage model can meet the six conditions for long-term agent memory.
  • A working prototype on a property-graph backend is feasible but reveals the gap to a native implementation.
  • Memory-centric data management emerges as a distinct workload with its own research agenda.
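The first bullet can be illustrated with a toy check. This is a minimal sketch with hypothetical names (`Record`, `record_is_valid`, `growth_is_regulated`) and a bounded-growth rule of our own choosing, not one of the paper's six conditions: every record passes a record-level validity check, yet the append-only trajectory still exhibits the unregulated-growth failure mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    subject: str
    text: str


def record_is_valid(record: Record) -> bool:
    """Record-level check: inspects one record in isolation."""
    return bool(record.subject.strip()) and bool(record.text.strip())


def append_only_trajectory(observations):
    """Yield every state D_t of an append-only store (no revision, no forgetting)."""
    state = []
    for obs in observations:
        state = state + [obs]  # build a new list so earlier states are not mutated
        yield state


def growth_is_regulated(trajectory, slack: int = 0) -> bool:
    """Trajectory-level check (hypothetical rule): in every state, the record
    count may exceed the number of distinct subjects by at most `slack`."""
    return all(len(state) <= len({r.subject for r in state}) + slack for state in trajectory)


# The agent is told the same preference once per session, five sessions in a row.
obs = [Record("coffee", "User prefers oat milk")] * 5

print(all(record_is_valid(r) for r in obs))              # True
print(growth_is_regulated(append_only_trajectory(obs)))  # False
```

No single record is wrong here; the defect only becomes visible when the sequence of states is examined as a whole.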

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Database engines could be redesigned around explicit state-evolution rules rather than query and transaction primitives.
  • Auditing and compliance features for AI agents would follow directly from the six conditions on state history.
  • The same state-level framing might apply to other long-running autonomous systems that must revise or forget past observations.

Load-bearing premise

The six correctness conditions are both necessary and sufficient for long-term agent memory and cannot be satisfied by any extension of record-level database operations.

What would settle it

A concrete demonstration that some record-level database, possibly extended, can maintain an agent's memory state while satisfying all six correctness conditions without exhibiting unregulated growth, missing semantic revisions, capacity-driven forgetting, or read-only retrieval.

Figures

Figures reproduced from arXiv: 2605.26252 by Abdelghny Orogat, Essam Mansour.

Figure 1: Agent memory as an append-only record store over …

Figure 2: Our GEM abstraction. The state $M_t = (D_t, S_t, P_t)$ holds semantic units ($D_t$), their structural organization ($S_t$), and declarative evolution policies ($P_t$). These three elements must be explicit in any compliant implementation; their realization varies by backend. Four state-level operators replace record-level CRUD: ingestion, revision, forgetting, and retrieval.

Figure 3: MemState data model. (a) A self-contained topic …
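A minimal sketch of the Figure 2 state as data, assuming a plain in-memory encoding: the class `MemoryState`, the field names, and the policy key `reinforce_on_read` are hypothetical, not MemState's schema. It shows two ideas from the caption: $D_t$, $S_t$, and $P_t$ are explicit, and operators return a new state $M_{t+1}$ instead of overwriting $M_t$. Retrieval is written so that, when the declarative policy says so, answering a query also changes the state, the opposite of the read-only retrieval failure mode.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MemoryState:
    """Hypothetical encoding of M_t = (D_t, S_t, P_t); fields are never mutated."""
    units: dict      # D_t: unit id -> {"text": str, "hits": int}
    structure: dict  # S_t: topic -> tuple of unit ids
    policies: dict   # P_t: declarative evolution settings


def ingest(m: MemoryState, uid: str, text: str, topic: str) -> MemoryState:
    """Return M_{t+1}; M_t is left intact so the trajectory stays inspectable."""
    units = {**m.units, uid: {"text": text, "hits": 0}}
    structure = {**m.structure, topic: m.structure.get(topic, ()) + (uid,)}
    return replace(m, units=units, structure=structure)


def retrieve(m: MemoryState, topic: str):
    """Return (answer, M_{t+1}); whether reads change state is decided by P_t."""
    ids = m.structure.get(topic, ())
    answer = [m.units[i]["text"] for i in ids]
    if not m.policies.get("reinforce_on_read", False):
        return answer, m  # read-only retrieval: M_{t+1} == M_t
    bumped = {i: {**m.units[i], "hits": m.units[i]["hits"] + 1} for i in ids}
    return answer, replace(m, units={**m.units, **bumped})


m0 = MemoryState(units={}, structure={}, policies={"reinforce_on_read": True})
m1 = ingest(m0, "u1", "User prefers oat milk", topic="coffee")
answer, m2 = retrieve(m1, "coffee")
print(answer)                                          # ['User prefers oat milk']
print(m1.units["u1"]["hits"], m2.units["u1"]["hits"])  # 0 1
```

Keeping each $M_t$ intact is a design choice made for the sketch: it makes the trajectory available for checking, at the cost of copying the top-level dictionaries on every operation.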
Original abstract

Long-running AI agents need persistent memory. Memory supports learning across sessions, reduces repeated context injection, and enables auditing of past decisions. Current agent memory systems and database paradigms treat memory as storage. They localize correctness at records, embeddings, or edges. Each supplies only some of the capabilities that long-term memory requires. The result is four recurring failure modes: unregulated growth, missing semantic revision, capacity-driven forgetting, and read-only retrieval. In our vision, long-term agent memory is a new data-management workload. Its correctness is a property of the state trajectory, not of individual records. We formalize this as Governed Evolving Memory (GEM). GEM replaces record-level database operations with four state-level operators: ingestion, revision, forgetting, and retrieval. Six correctness conditions govern how the state evolves. Three structural observations establish that no record-level system can satisfy these conditions, regardless of the storage model. We realize the abstraction in MemState, a prototype on a property-graph backend. MemState validates feasibility and exposes the gap to a native engine. We outline three research directions that define memory-centric data management as a workload.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper claims that existing database and agent memory systems fail for long-term AI agents due to four recurring modes (unregulated growth, missing semantic revision, capacity-driven forgetting, read-only retrieval) because they localize correctness at the record level. It proposes Governed Evolving Memory (GEM) as a new workload whose correctness is defined on state trajectories via four state-level operators (ingestion, revision, forgetting, retrieval) and six governing conditions; three structural observations are said to prove that no record-level system (regardless of storage model) can satisfy them. The abstraction is realized in the MemState prototype on a property-graph backend, which is presented as validating feasibility while exposing the gap to a native engine; three research directions are outlined.

Significance. If the structural observations are shown to be rigorous and the prototype is demonstrated to satisfy the six conditions, the work would establish memory-centric data management as a distinct workload and motivate new engine designs beyond record-oriented storage. The explicit enumeration of failure modes and the shift from record-level to state-level operators supply a concrete vocabulary that could guide subsequent systems research.

major comments (1)
  1. [Abstract / structural observations section] Abstract (and the section presenting the three structural observations): the claim that "no record-level system can satisfy these conditions, regardless of the storage model" is load-bearing for the central impossibility result, yet MemState is realized on a property-graph backend whose primitives remain node/edge CRUD operations on records. The manuscript supplies neither a formal definition of "record-level" that would exclude this backend nor an argument showing how an intervening governance layer evades the observations; this leaves the impossibility result and the prototype in unresolved tension.
minor comments (2)
  1. [Abstract] The six correctness conditions are referenced repeatedly but never enumerated in the abstract or early sections; listing them explicitly would improve readability.
  2. [MemState prototype section] The prototype validation is described only at a high level; a brief description of how the four operators map onto the property-graph primitives would clarify the feasibility claim.
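The mapping this comment asks for could look roughly like the sketch below. It is hypothetical throughout: this summary does not describe how MemState maps its operators, and the toy `PropertyGraph` class is not the API of Neo4j, Kùzu, or any other engine. It also illustrates the referee's major point, since every step bottoms out in node/edge CRUD and the state-level semantics live only in the functions above the backend.

```python
import itertools


class PropertyGraph:
    """Toy property-graph backend exposing only node/edge CRUD (hypothetical API)."""

    def __init__(self):
        self.nodes = {}                 # node id -> property dict
        self.edges = []                 # (source id, label, target id)
        self._ids = itertools.count(1)  # monotonically increasing node ids

    def create_node(self, **props):
        nid = next(self._ids)
        self.nodes[nid] = dict(props)
        return nid

    def set_property(self, nid, key, value):
        self.nodes[nid][key] = value

    def create_edge(self, src, label, dst):
        self.edges.append((src, label, dst))


def ingest(g, text):
    """Ingestion: one node per semantic unit."""
    return g.create_node(text=text, active=True)


def revise(g, old, text):
    """Revision: new node + SUPERSEDES edge + deactivate the old node (history kept)."""
    new = g.create_node(text=text, active=True)
    g.create_edge(new, "SUPERSEDES", old)
    g.set_property(old, "active", False)
    return new


def retrieve(g):
    """Retrieval over active units that also records a hit on each unit it returns."""
    results = []
    for nid, props in g.nodes.items():
        if props.get("active"):
            g.set_property(nid, "hits", props.get("hits", 0) + 1)
            results.append(props["text"])
    return results


g = PropertyGraph()
u1 = ingest(g, "User prefers oat milk")
u2 = revise(g, u1, "User prefers soy milk")
print(retrieve(g))  # ['User prefers soy milk']
print(g.edges)      # [(2, 'SUPERSEDES', 1)]
```

Forgetting could map either to deleting a node or to setting an archival property; which of these MemState uses is not stated here.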

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for identifying the tension between the impossibility claim and the MemState prototype. The comment correctly notes the absence of a formal definition of 'record-level' and an explicit argument for how governance evades the structural observations. We will revise the manuscript to supply both.

Point-by-point responses
  1. Referee: [Abstract / structural observations section] Abstract (and the section presenting the three structural observations): the claim that "no record-level system can satisfy these conditions, regardless of the storage model" is load-bearing for the central impossibility result, yet MemState is realized on a property-graph backend whose primitives remain node/edge CRUD operations on records. The manuscript supplies neither a formal definition of "record-level" that would exclude this backend nor an argument showing how an intervening governance layer evades the observations; this leaves the impossibility result and the prototype in unresolved tension.

    Authors: We agree the manuscript must define 'record-level system' precisely and show why the governance layer does not contradict the observations. We define a record-level system as one whose interface and correctness are expressed exclusively via operations on individual records (nodes, edges, tuples) without reference to an agent's state trajectory. The three structural observations demonstrate that the six GEM conditions cannot be satisfied by any such interface, because they require trajectory-level semantics (e.g., revision that may invalidate prior records, forgetting that is not capacity-driven). MemState uses the property-graph backend solely as a storage substrate; the governance layer implements the four state-level operators and enforces the conditions. This architecture does not violate the observations, as the backend alone is not the system under consideration. We will insert the formal definition into the structural observations section and add an explicit paragraph explaining the separation between substrate and governance layer. revision: yes
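The substrate/governance separation described in this response can be sketched in a few lines. Everything here is hypothetical: `RecordStore`, `GovernanceLayer`, and the example condition `no_silent_overwrite` are illustrative stand-ins, not MemState's interface, and the condition is not one of the paper's six. The substrate only knows put/get/delete on individual keys; the layer exposes state-level operators (only ingestion and forgetting are shown, for brevity), keeps the trajectory, and rejects any transition that violates a condition before anything reaches the store.

```python
class RecordStore:
    """Record-level substrate: put/get/delete on individual keys, no history."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)

    def keys(self):
        return list(self._data)


class ConditionViolation(Exception):
    pass


class GovernanceLayer:
    """Exposes only state-level operators and checks every transition
    (M_t, M_{t+1}) against the supplied conditions before committing it."""

    def __init__(self, store, conditions):
        self._store = store
        self._conditions = conditions  # callables: (before, after) -> bool
        self.trajectory = [{k: store.get(k) for k in store.keys()}]

    def _commit(self, after, op):
        before = self.trajectory[-1]
        for cond in self._conditions:
            if not cond(before, after):
                raise ConditionViolation(f"{op} rejected by {cond.__name__}")
        for k in before.keys() - after.keys():  # translate the new state into record ops
            self._store.delete(k)
        for k, v in after.items():
            self._store.put(k, v)
        self.trajectory.append(after)

    def ingest(self, key, text):
        self._commit({**self.trajectory[-1], key: text}, "ingest")

    def forget(self, key):
        self._commit({k: v for k, v in self.trajectory[-1].items() if k != key}, "forget")


def no_silent_overwrite(before, after):
    """Example condition: an existing unit may not change value in place."""
    return all(after.get(k, v) == v for k, v in before.items())


store = RecordStore()
mem = GovernanceLayer(store, [no_silent_overwrite])
mem.ingest("diet", "vegetarian")
try:
    mem.ingest("diet", "vegan")  # would overwrite in place
except ConditionViolation as e:
    print(e)                     # ingest rejected by no_silent_overwrite
print(store.get("diet"), len(mem.trajectory))  # vegetarian 2
```

In this reading, the backend alone is indeed record-level; whether the observations apply depends on treating the layer plus substrate as the system under consideration, which is the definitional point the rebuttal promises to make explicit.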

Circularity Check

0 steps flagged

No significant circularity; derivation is conceptual and self-contained.

Full rationale

The paper advances a conceptual redefinition of agent memory as a state-trajectory workload governed by six correctness conditions and four operators. No equations, fitted parameters, or self-citations appear in the abstract or described derivation. The three structural observations are presented as independent arguments against record-level systems; GEM is introduced as a new abstraction rather than derived from prior self-work. The MemState prototype on a property-graph backend is described as feasibility validation, not as a step that reduces the impossibility claim to its own inputs. No pattern from the enumerated circularity kinds is exhibited by direct quotation and reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities beyond the high-level proposal are detailed.

axioms (1)
  • domain assumption: Long-term agent memory correctness must be defined over state trajectories rather than individual records
    Central premise stated in the abstract as motivation for GEM.
invented entities (2)
  • Governed Evolving Memory (GEM): no independent evidence
    purpose: New data-management workload abstraction for agent memory
    Core proposal introduced in the abstract.
  • MemState: no independent evidence
    purpose: Prototype implementation on property-graph backend
    Mentioned as realization of the GEM abstraction.

pith-pipeline@v0.9.1-grok · 5728 in / 1313 out tokens · 35932 ms · 2026-06-29T21:19:12.973154+00:00 · methodology

