Are We Ready For An Agent-Native Memory System?

· 2026 · cs.CL · arXiv 2606.24775

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

open full Pith review browse 1 citing papers arXiv PDF

abstract

Memory for large language model (LLM) agents has rapidly evolved from simple retrieval-augmented mechanisms into a data management system that supports persistent information storage, retrieval, update, consolidation, and dynamic lifecycle governance throughout agent execution. Despite this evolution, existing evaluations still benchmark agent memory mainly through end-to-end task success metrics (e.g., F1, BLEU), while treating the underlying system as a monolithic black box. As a result, critical system-level concerns, including operational costs, architectural trade-offs across memory modules, and robustness under dynamic knowledge updates, remain insufficiently explored. In this paper, we present a systematic experimental study of agent memory from a data management perspective. We propose an analytical framework that decomposes agent memory into four core modules: memory representation and storage, extraction, retrieval and routing, and maintenance. Under this framework, we evaluate 12 representative memory systems and two reference baselines across five benchmark workloads spanning 11 datasets. Our extensive end-to-end evaluation shows that no single architecture dominates across all scenarios; instead, effectiveness depends heavily on how well the memory structure aligns with the workload bottleneck. Furthermore, through fine-grained ablation studies, we quantify their individual effects on representation fidelity, retrieval precision, update correctness, and long-horizon stability. Finally, we reveal cost-performance trade-offs under realistic workloads, showing localized maintenance is more cost-efficient than global reorganization. Based on these findings, we identify promising directions towards building truly agent-native memory systems. The code is publicly available at https://github.com/OpenDataBox/MemoryData.

representative citing papers

AgenticDataBench: A Comprehensive Benchmark for Data Agents

cs.DB · 2026-07-02 · unverdicted · novelty 5.0

AgenticDataBench is a new benchmark covering realistic data science tasks across 15 domains using extracted skills and LLM-generated workflows to evaluate data agents at fine granularity.

citing papers explorer

Showing 1 of 1 citing paper.

AgenticDataBench: A Comprehensive Benchmark for Data Agents cs.DB · 2026-07-02 · unverdicted · none · ref 67 · internal anchor
AgenticDataBench is a new benchmark covering realistic data science tasks across 15 domains using extracted skills and LLM-generated workflows to evaluate data agents at fine granularity.

Are We Ready For An Agent-Native Memory System?

fields

years

verdicts

representative citing papers

citing papers explorer