MemForest: An Efficient Agent Memory System with Hierarchical Temporal Indexing

Bingsheng He; Han Chen; Hongbao Zhang; Jason Zeng; Michael Heinrich; Ming Wu; Wei Wu; Wenqi Pei; Zining Zhang

arXiv: 2605.23986 · v1 · pith:O6J76X2Mnew · submitted 2026-05-16 · 💻 cs.DB · cs.AI · cs.MA

Pith reviewed 2026-06-30 19:24 UTC · model grok-4.3

classification 💻 cs.DB · cs.AI · cs.MA

keywords: agent memory · hierarchical temporal index · LLM agents · memory management · long-context agents · temporal data management · write-efficient updates

The pith

MemForest treats agent memory as a temporal indexing problem solved by parallel chunk extraction and localized tree-path updates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that agent memory systems suffer from coarse-grained full-state rewrites and sequential update pipelines that couple tightly to LLM inference. It reformulates the problem as write-efficient temporal data management and introduces a hierarchical index to replace global summaries with time-ordered trees. This allows parallel independent chunk extraction and per-node updates confined to affected paths. On LongMemEval-S the approach reaches 79.8 percent pass@1 accuracy while delivering roughly six times the memory construction throughput of prior stateful systems. A reader would care because the design directly targets the scalability barrier that grows with accumulating interaction history.

Core claim

MemForest decouples memory construction into concurrent independent operations through parallel chunk extraction and replaces full-state rewrites with localized per-node updates on MemTree, a hierarchical temporal index that organizes memory as time-ordered trees, thereby reducing maintenance cost to the affected tree paths while naturally preserving temporally evolving states.
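A minimal sketch of what "parallel chunk extraction" can look like, assuming the extractor is a pure function of a single chunk. `Chunk`, `extract_facts`, `split_into_chunks`, and `parallel_extract` are hypothetical names, and the sentence split stands in for the LLM call the real system makes; the point is only that no extraction reads shared memory state, so all of them can run at once.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    session_id: str
    timestamp: int  # ordering key for later insertion into the time-ordered index
    text: str


def extract_facts(chunk: Chunk) -> list[str]:
    """Hypothetical stand-in for an LLM extraction call.

    It reads only its own chunk and no shared memory state, which is what
    makes the calls independent and safe to run concurrently.
    """
    return [s.strip() for s in chunk.text.split(".") if s.strip()]


def split_into_chunks(session_id: str, turns: list[tuple[int, str]], size: int) -> list[Chunk]:
    # Group consecutive (timestamp, utterance) turns into fixed-size chunks;
    # each chunk is stamped with the timestamp of its first turn.
    return [
        Chunk(session_id, turns[i][0], " ".join(text for _, text in turns[i:i + size]))
        for i in range(0, len(turns), size)
    ]


def parallel_extract(chunks: list[Chunk], workers: int = 8) -> list[tuple[int, list[str]]]:
    # Fan out one extraction per chunk. pool.map yields results in input order;
    # the final sort matters when chunks from several sessions are mixed.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(extract_facts, chunks))
    return sorted(zip((c.timestamp for c in chunks), results), key=lambda pair: pair[0])


turns = [
    (1, "Alice moved to Paris."),
    (2, "She adopted a cat."),
    (3, "Alice started a new job."),
    (4, "The cat is named Miso."),
]
facts_by_time = parallel_extract(split_into_chunks("s1", turns, size=2), workers=2)
print(facts_by_time)
```

Because each call depends only on its chunk, throughput in this sketch is bounded by worker count rather than by a sequential read-modify-write loop over one global memory state.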

What carries the argument

MemTree, the hierarchical temporal index that organizes memory as time-ordered trees and supports localized per-node updates instead of full-state rewrites.
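To make "localized per-node updates" concrete, here is an illustrative append-only, time-ordered tree; it is not the paper's MemTree. Leaves are kept in timestamp order, each internal node summarizes up to `fanout` children, and an insert recomputes only the new leaf's ancestors. `TemporalTree` and `summarize` are hypothetical, and string joining stands in for an LLM summary call.

```python
from dataclasses import dataclass, field


def summarize(texts: list[str]) -> str:
    # Hypothetical stand-in for an LLM summary call; joining keeps it offline.
    return " | ".join(texts)


@dataclass
class TemporalTree:
    """levels[0] holds leaves in timestamp order; levels[h][i] summarizes
    levels[h - 1][i * fanout : (i + 1) * fanout]."""
    fanout: int = 4
    levels: list[list[str]] = field(default_factory=lambda: [[]])
    timestamps: list[int] = field(default_factory=list)
    summary_calls: int = 0

    def insert(self, ts: int, text: str) -> int:
        """Append a leaf and refresh only its ancestors; returns nodes touched."""
        if self.timestamps and ts < self.timestamps[-1]:
            raise ValueError("this sketch assumes entries arrive in timestamp order")
        self.timestamps.append(ts)
        self.levels[0].append(text)
        idx, touched, h = len(self.levels[0]) - 1, 0, 1
        # Walk up while the level below has more than one node (i.e. is not the root).
        while len(self.levels[h - 1]) > 1:
            if h == len(self.levels):
                self.levels.append([])  # tree grows a new top level
            idx //= self.fanout
            children = self.levels[h - 1][idx * self.fanout:(idx + 1) * self.fanout]
            node = summarize(children)
            if idx < len(self.levels[h]):
                self.levels[h][idx] = node  # refresh an existing ancestor
            else:
                self.levels[h].append(node)  # first node covering this range
            self.summary_calls += 1
            touched += 1
            h += 1
        return touched

    def internal_nodes(self) -> int:
        return sum(len(level) for level in self.levels[1:])


tree = TemporalTree(fanout=4)
per_insert = [tree.insert(t, f"fact-{t}") for t in range(64)]
print(max(per_insert), tree.internal_nodes(), tree.summary_calls)
```

With 64 leaves and fanout 4, no single insert touches more than 3 of the 21 internal nodes, so per-insert work in this sketch tracks tree height (about log base `fanout` of the leaf count) instead of total memory size.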

If this is right

Memory maintenance cost is confined to affected tree paths rather than entire states.
Construction throughput scales independently of sequential LLM inference bottlenecks.
Temporally evolving states remain preserved without explicit full reconciliation.
The system attains the highest overall accuracy among stateful baselines on LongMemEval-S.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same localized-update principle could apply to other persistent state stores that accumulate over long sessions.
Further scaling of interaction length would likely show sub-linear growth in per-step latency.
The parallel extraction layer might integrate with streaming data sources beyond agent logs.

Load-bearing premise

Localized per-node updates along the hierarchical temporal index are sufficient to preserve all necessary temporally evolving agent states without information loss or full-state reconciliation.

What would settle it

Running MemForest on LongMemEval-S and observing either pass@1 accuracy substantially below 79.8 percent or memory construction throughput not reaching approximately six times that of prior stateful baselines would falsify the central performance claims.

Figures

Figures reproduced from arXiv: 2605.23986 by Bingsheng He, Han Chen, Hongbao Zhang, Jason Zeng, Michael Heinrich, Ming Wu, Wei Wu, Wenqi Pei, Zining Zhang.

**Figure 2.** A generic workflow of agent memory systems. New …

**Figure 3.** MemForest architecture. Sessions are extracted into …

**Figure 4.** MemTree materializes one temporal scope as a time …

**Figure 5.** Migration efficiency on progressively merged Long…

**Figure 6.** Write-path scalability diagnostics for MemTree. (a) Batch mark-dirty refresh reduces LLM summary calls relative to …
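Figure 6's "batch mark-dirty refresh" suggests deferring summary recomputation: mark every ancestor of each newly written leaf dirty, then refresh each dirty node once, bottom-up. The sketch below only counts summary calls, under an assumed fixed-height tree with positional node ids; `ancestors`, `eager_refresh_calls`, and `batch_refresh_plan` are hypothetical names, not the paper's API.

```python
def ancestors(leaf: int, fanout: int, height: int) -> list[tuple[int, int]]:
    """(level, index) of every internal node on the path from `leaf` to the root."""
    return [(h, leaf // fanout**h) for h in range(1, height + 1)]


def eager_refresh_calls(new_leaves: list[int], fanout: int, height: int) -> int:
    # Refresh the whole path after every single insert: one call per ancestor per leaf.
    return sum(len(ancestors(j, fanout, height)) for j in new_leaves)


def batch_refresh_plan(new_leaves: list[int], fanout: int, height: int) -> list[tuple[int, int]]:
    # Phase 1 (mark): shared ancestors collapse into a single set entry.
    dirty: set[tuple[int, int]] = set()
    for j in new_leaves:
        dirty.update(ancestors(j, fanout, height))
    # Phase 2 (refresh): sorting by (level, index) visits children before parents,
    # so each dirty node is summarized exactly once, after all of its inputs.
    return sorted(dirty)


batch = list(range(56, 64))  # 8 new leaves at the tail of a 64-leaf, fanout-4 tree
eager = eager_refresh_calls(batch, fanout=4, height=3)
plan = batch_refresh_plan(batch, fanout=4, height=3)
print(eager, len(plan), plan)
```

For this batch, eager refresh issues 24 summary calls and the batched plan issues 4, because consecutive leaves share most of their ancestors. The savings actually reported in Figure 6 depend on batch sizes and tree shapes that this excerpt does not give.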

**Figure 7.** Per-question write-path breakdown for MemForest

**Figure 9.** LoCoMo pass@k curves for k = 1, …, 8 under the shared recall budget of top-10.

…decoding budgets. On LongMemEval, MemForest remains consistently strongest across the full pass@k range. On LoCoMo, the benchmark-level ordering is closer, with EverMemOS generally leading and MemForest usually remaining a strong second-best system across most categories.
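The excerpt does not say how pass@k is computed. A common choice, assumed here rather than taken from the paper, is the unbiased estimator 1 − C(n − c, k) / C(n, k) for n sampled answers of which c are judged correct, averaged over questions. Since k runs to 8, the sketch assumes n = 8 samples per question; the (n, c) pairs are made up for illustration.

```python
from math import comb
from statistics import mean


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one question: n samples, c judged correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_at_k_curve(per_question: list[tuple[int, int]], max_k: int) -> list[float]:
    # Benchmark-level curve: average the per-question estimate at each k.
    return [mean(pass_at_k(n, c, k) for n, c in per_question) for k in range(1, max_k + 1)]


# Illustrative (n, c) pairs, not data from the paper.
questions = [(8, 2), (8, 0), (8, 8)]
curve = pass_at_k_curve(questions, max_k=8)
print([round(p, 3) for p in curve])
```

Each per-question estimate is non-decreasing in k, so benchmark-level curves like those in Figure 9 can only rise or stay flat as the decoding budget grows.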

Original abstract

Memory is a fundamental component for enabling long-context LLM agents, supporting persistent state across interactions through a continuous serve-and-update lifecycle. Despite substantial prior work, existing systems suffer from significant maintenance overhead due to two key limitations: coarse-grained state management and inherently sequential update pipelines. In particular, updates are often tightly coupled with LLM inference and require full-state rewrites, leading to poor scalability and growing latency as memory accumulates. To address these challenges, we present MemForest, a memory framework that reformulates agent memory as a write-efficient temporal data management problem. MemForest breaks the sequential bottleneck via parallel chunk extraction, decoupling memory construction into concurrent, independent operations. To further eliminate coarse-grained maintenance, we introduce MemTree, a hierarchical temporal index that organizes memory as time-ordered trees rather than flat global summaries. This design replaces full-state rewrites with localized per-node updates, reducing maintenance cost to the affected tree paths while naturally preserving temporally evolving states. We evaluate MemForest on two long-context memory benchmarks, LongMemEval-S and LoCoMo. On LongMemEval-S, MemForest achieves the best overall performance among stateful baselines, reaching 79.8% pass@1 accuracy while sustaining a memory construction throughput approximately 6x higher than state-of-the-art approaches including EverMemOS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

MemForest recasts agent memory as a temporal indexing problem and uses a hierarchical tree for localized updates, which could cut maintenance costs, but the abstract gives too little experimental detail to judge whether accuracy is truly preserved.

The main new piece is MemTree, a hierarchical temporal index that stores memory as time-ordered trees so updates touch only the affected paths rather than forcing full-state rewrites. Paired with parallel chunk extraction, this decouples the construction pipeline from sequential LLM inference. That framing turns a common agent-systems headache into a standard database maintenance question, and the reported 6x throughput lift over EverMemOS on LongMemEval-S is the kind of number that would matter if the accuracy side holds.

The paper does a clean job naming the two concrete limits (coarse global summaries and tightly coupled update-inference loops) and shows how the tree structure is meant to keep temporally evolving states queryable without touching the whole memory. The 79.8% pass@1 result, the best among stateful baselines, is presented as evidence that localized updates do not sacrifice correctness on the tested workloads.

The soft spot is the missing experimental scaffolding. The abstract supplies no baseline descriptions, no error bars, no account of how cross-branch facts are handled, and no discussion of whether the hierarchy actually captures dependencies that span multiple nodes. The stress-test worry about cross-node temporal dependencies therefore lands as a question the full paper must answer; if the design silently drops information that a full reconciliation would keep, the accuracy claim weakens. Minor issues like missing statistical tests can be fixed in revision, but the dependency question is load-bearing.

This is aimed at people working on long-context LLM agents who already think in terms of persistent state and maintenance cost. A reader who needs a concrete indexing trick for memory construction would find the design worth examining. I would bring the full paper to a reading group to walk through the tree invariants. I would not cite it yet. It deserves peer review because the underlying problem is real, the database analogy is applied honestly, and the claimed gains are large enough to justify referee time even if the experiments need tightening.

Referee Report

2 major / 1 minor

Summary. The paper presents MemForest, a memory framework for long-context LLM agents that reformulates agent memory as a write-efficient temporal data management problem. It decouples memory construction via parallel chunk extraction and introduces MemTree, a hierarchical temporal index organizing memory as time-ordered trees. This enables localized per-node updates rather than full-state rewrites. On LongMemEval-S, it reports 79.8% pass@1 accuracy (best among stateful baselines) and ~6x higher memory construction throughput than approaches including EverMemOS; LoCoMo is also evaluated, but the abstract reports no headline numbers for it.

Significance. If the performance claims and design assumptions hold, the work could meaningfully advance scalable persistent memory for LLM agents by addressing coarse-grained maintenance and sequential update bottlenecks. The parallel extraction and tree-based localized updates represent a concrete engineering advance over flat global summaries.

major comments (2)

[Evaluation] Evaluation section: The reported 79.8% pass@1 accuracy and 6x throughput gains are presented without baseline descriptions, statistical tests, error bars, data exclusion rules, or variance across runs. This directly undermines the central claim of best overall performance among stateful baselines, as the numbers cannot be assessed for significance or reproducibility.
[MemTree design] MemTree design (hierarchical temporal index): The claim that localized per-node updates on time-ordered trees preserve all temporally evolving agent states without information loss rests on the unexamined assumption that cross-branch temporal dependencies do not exist or do not affect future queries. No formal argument, invariant, or targeted experiment (e.g., on LoCoMo cases with facts updated in one chunk affecting later unrelated chunks) is supplied to support this load-bearing assumption.
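The targeted experiment requested here can be phrased as a small probe. The sketch is hypothetical (the `FactVersion` record, branch labels, and both lookup functions are illustrative, not MemForest's query path): a fact is written in one temporal branch and superseded in another, and a lookup confined to the first branch returns the stale value while a read-time reconciliation across branches returns the newest one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FactVersion:
    key: str
    value: str
    timestamp: int
    branch: str  # temporal scope (subtree) that stores this version


def branch_local_lookup(versions: list[FactVersion], key: str, branch: str) -> Optional[str]:
    # Sees only one subtree: what a query confined to a single path would return.
    local = [v for v in versions if v.key == key and v.branch == branch]
    return max(local, key=lambda v: v.timestamp).value if local else None


def cross_branch_lookup(versions: list[FactVersion], key: str) -> Optional[str]:
    # Reconciles at read time: newest version of the key across all subtrees.
    matches = [v for v in versions if v.key == key]
    return max(matches, key=lambda v: v.timestamp).value if matches else None


versions = [
    FactVersion("user.city", "Paris", timestamp=1, branch="2025-01"),
    FactVersion("user.pet", "cat", timestamp=2, branch="2025-01"),
    FactVersion("user.city", "Berlin", timestamp=9, branch="2025-03"),
]
stale = branch_local_lookup(versions, "user.city", branch="2025-01")
fresh = cross_branch_lookup(versions, "user.city")
print(stale, fresh)
```

A benchmark probe built this way would count how often the system's answers agree with `cross_branch_lookup` rather than `branch_local_lookup` on questions whose facts were superseded in a different branch.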

minor comments (1)

[Abstract] The abstract states specific accuracy and throughput numbers but supplies no experimental details; this should be expanded in the abstract or a dedicated experimental-setup subsection for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important areas for improving the clarity and rigor of our evaluation and design justification. We address each major comment below and indicate the revisions we will make.

Referee: [Evaluation] Evaluation section: The reported 79.8% pass@1 accuracy and 6x throughput gains are presented without baseline descriptions, statistical tests, error bars, data exclusion rules, or variance across runs. This directly undermines the central claim of best overall performance among stateful baselines, as the numbers cannot be assessed for significance or reproducibility.

Authors: We agree that the evaluation section requires additional details to support reproducibility and allow assessment of statistical significance. In the revised manuscript, we will expand this section to provide full descriptions of all baselines (including their configurations and implementation details), report standard deviations and confidence intervals across multiple runs (e.g., 5 independent runs), include error bars in figures, explicitly state data exclusion rules (none were applied beyond the standard protocols of LongMemEval-S and LoCoMo), and add variance metrics. These additions will directly address the concerns and strengthen the performance claims. revision: yes
Referee: [MemTree design] MemTree design (hierarchical temporal index): The claim that localized per-node updates on time-ordered trees preserve all temporally evolving agent states without information loss rests on the unexamined assumption that cross-branch temporal dependencies do not exist or do not affect future queries. No formal argument, invariant, or targeted experiment (e.g., on LoCoMo cases with facts updated in one chunk affecting later unrelated chunks) is supplied to support this load-bearing assumption.

Authors: The MemTree design structures memory as time-ordered hierarchical trees, with each node holding a temporal chunk; localized updates modify only the path to the affected node, and queries traverse from the root to collect relevant states while respecting temporal order within branches. This avoids full rewrites and preserves evolving states by design. We acknowledge that the manuscript does not provide an explicit formal invariant or targeted experiment addressing potential cross-branch dependencies. In revision, we will add a dedicated subsection formalizing the invariants (e.g., that all temporally relevant facts remain accessible via ancestor paths) and include a new experiment on LoCoMo subsets involving cross-chunk fact updates to empirically demonstrate preservation of information. revision: yes
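One candidate invariant of the kind the rebuttal promises, offered as an editorial sketch rather than the authors' formalization: let $T$ be the tree, $\operatorname{leaves}(v)$ the leaves under node $v$, $\tau_{\mathrm{ins}}(\ell)$ the time leaf $\ell$ was last written, and $\tau_{\mathrm{ref}}(v)$ the time node $v$'s summary was last recomputed.

```latex
\forall v \in \operatorname{internal}(T):\quad
  \tau_{\mathrm{ref}}(v) \;\ge\; \max_{\ell \in \operatorname{leaves}(v)} \tau_{\mathrm{ins}}(\ell)
```

Refreshing every ancestor of a written leaf after the write preserves this freshness property for all nodes, since non-ancestors have unchanged leaves. It does not by itself guarantee that a fact superseded in a different branch is reconciled at query time, which is the gap the referee's second comment targets.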

Circularity Check

0 steps flagged

No circularity: empirical system evaluation on external benchmarks

The paper introduces MemForest and MemTree as a new hierarchical temporal index for agent memory, motivated by limitations in prior systems (coarse-grained management, sequential updates, full-state rewrites). All performance claims (79.8% pass@1 on LongMemEval-S, 6x throughput vs. EverMemOS) are presented as results of implementation and benchmarking on named external datasets (LongMemEval-S, LoCoMo). No equations, first-principles derivations, fitted parameters renamed as predictions, or self-citation chains appear in the abstract or description. The design is justified by addressing stated problems rather than reducing to self-referential definitions or imported uniqueness theorems. This is a standard systems contribution whose validity rests on reproducible external benchmarks, not internal circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract provides no information on free parameters, background axioms, or formal properties of MemTree; the only new entity mentioned is the index structure itself.

invented entities (1)

MemTree: no independent evidence
purpose: Hierarchical temporal index organizing memory as time-ordered trees for localized updates
Introduced as the central new data structure; no independent evidence or formal definition supplied in abstract.

pith-pipeline@v0.9.1-grok · 5782 in / 1190 out tokens · 46155 ms · 2026-06-30T19:24:45.196356+00:00 · methodology

Reference graph

Works this paper leans on

45 extracted references · 12 canonical work pages · 6 internal anchors

[1] Shubham Agarwal, Sai Sundaresan, Subrata Mitra, Debabrata Mahapatra, Archit Gupta, Rounak Sharma, Nirmal Joshua Kapu, Tong Yu, and Shiv Saini. 2025. Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation. Proc. ACM Manag. Data 3, 3, Article 136 (June 2025), 28 pages. https://doi.org/10.1145/3725273

[2] Bruno Becker, Stephan Gschwind, Thomas Ohler, Bernhard Seeger, and Peter Widmayer. 1996. An asymptotically optimal multiversion B-tree. The VLDB Journal 5, 4 (1996), 264–275.

[3] Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. 2025. Mem0: Building production-ready ai agents with scalable long-term memory. arXiv preprint arXiv:2504.19413 (2025).

[4] Tri Dao. 2024. FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning. In International Conference on Learning Representations (ICLR).

[5] Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Dasha Metropolitansky, Robert Osazuwa Ness, and Jonathan Larson. 2024. From local to global: A graph rag approach to query-focused summarization. arXiv preprint arXiv:2404.16130 (2024).

[6] Ramez Elmasri, Gene TJ Wuu, and Yeong-Joon Kim. 1990. The time index: An access structure for temporal data. In Proceedings of the 16th International Conference on Very Large Data Bases. 1–12.

[7] Jizhan Fang, Xinle Deng, Haoming Xu, Ziyan Jiang, Yuqi Tang, Ziwen Xu, Shumin Deng, Yunzhi Yao, Mengru Wang, Shuofei Qiao, Huajun Chen, and Ningyu Zhang. LightMem: Lightweight and Efficient Memory-Augmented Generation. In The Fourteenth International Conference on Learning Representations. https://openreview.net/forum?id=dyJ0GWpjJB

[8] Pengyu Gao, Jinming Zhao, Xinyue Chen, and Long Yilin. 2025. An efficient context-dependent memory framework for llm-centric agents. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track). 1055–1069.

[9] Yubin Ge, Salvatore Romeo, Jason Cai, Raphael Shu, Yassine Benajiba, Monica Sunkara, and Yi Zhang. 2025. Tremu: Towards neuro-symbolic temporal reasoning for llm-agents with memory in multi-session dialogues. In Findings of the Association for Computational Linguistics: ACL 2025. 18974–18988.

[10] Tooraj Helmi. 2025. Decentralizing AI Memory: SHIMI, a Semantic Hierarchical Memory Index for Scalable Agent Reasoning. arXiv preprint arXiv:2504.06135 (2025).

[11] Chuanrui Hu, Xingze Gao, Zuyi Zhou, Dannong Xu, Yi Bai, Xintong Li, Hui Zhang, Tong Li, Chong Zhang, Lidong Bing, et al. 2026. EverMemOS: A Self-Organizing Memory Operating System for Structured Long-Horizon Reasoning. arXiv preprint arXiv:2601.02163 (2026).

[12] Guoyu Hu, Shaofeng Cai, Tien Tuan Anh Dinh, Zhongle Xie, Cong Yue, Gang Chen, and Beng Chin Ooi. 2025. HAKES: Scalable Vector Database for Embedding Search Service. Proceedings of the VLDB Endowment 18, 9 (2025), 3049–3062.

[13] Mengkang Hu, Tianxing Chen, Qiguang Chen, Yao Mu, Wenqi Shao, and Ping Luo. 2025. Hiagent: Hierarchical working memory management for solving long-horizon agent tasks with large language model. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 32779–32798.

[14] Paul Jackson and Jane Klobas. 2008. Transactive memory systems in organizations: Implications for knowledge directories. Decision support systems 44, 2 (2008), 409–424.

[15] Jiazheng Kang, Mingming Ji, Zhe Zhao, and Ting Bai. 2025. Memory os of ai agent. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 25972–25981.

[16] Aixin Liu, Aoxue Mei, Bangcai Lin, Bing Xue, Bingxuan Wang, Bingzheng Xu, Bochao Wu, Bowei Zhang, Chaofan Lin, Chen Dong, et al. 2025. Deepseek-v3.2: Pushing the frontier of open large language models. arXiv preprint arXiv:2512.02556 (2025).

[17] Mingfei Lu, Mengjia Wu, Feng Liu, Jiawei Xu, Weikai Li, Haoyang Wang, Zhengdong Hu, Ying Ding, Yizhou Sun, Jie Lu, et al. 2026. Choosing How to Remember: Adaptive Memory Structures for LLM Agents. arXiv preprint arXiv:2602.14038 (2026).

[18] Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, and Yuwei Fang. 2024. Evaluating very long-term conversational memory of llm agents. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 13851–13870.

[19] milla-jovovich. 2026. MemPalace. https://github.com/milla-jovovich/mempalace

[20] Patrick O’Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O’Neil. 1996. The log-structured merge-tree (LSM-tree). Acta informatica 33, 4 (1996), 351–385.

[21] Charles Packer, Vivian Fang, Shishir G. Patil, Kevin Lin, Sarah Wooders, and Joseph E. Gonzalez. 2023. MemGPT: towards LLMs as operating systems. (2023).

[22] Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Xufang Luo, Hao Cheng, Dongsheng Li, Yuqing Yang, Chin-Yew Lin, H. Vicky Zhao, Lili Qiu, and Jianfeng Gao. 2025. SeCom: On Memory Construction and Retrieval for Personalized Conversational Agents. In The Thirteenth International Conference on Learning Representations. https://openreview.net/forum?id=xKDZAW0He3

[23] Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Menglin Xia, Xufang Luo, Jue Zhang, Qingwei Lin, Victor Rühle, Yuqing Yang, Chin-Yew Lin, et al. 2024. Llmlingua-2: Data distillation for efficient and faithful task-agnostic prompt compression. In Findings of the Association for Computational Linguistics: ACL 2024. 963–981.

[24] Joon Sung Park, Joseph O’Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. 2023. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th annual acm symposium on user interface software and technology. 1–22.

[25] Preston Rasmussen, Pavlo Paliychuk, Travis Beauvais, Jack Ryan, and Daniel Chalef. 2025. Zep: a temporal knowledge graph architecture for agent memory. arXiv preprint arXiv:2501.13956 (2025).

[26] Alireza Rezazadeh, Zichao Li, Ange Lou, Yuying Zhao, Wei Wei, and Yujia Bao. 2025. Collaborative memory: Multi-user memory sharing in llm agents with dynamic access control. arXiv preprint arXiv:2505.18279 (2025).

[27] Alireza Rezazadeh, Zichao Li, Wei Wei, and Yujia Bao. 2025. From Isolated Conversations to Hierarchical Schemas: Dynamic Tree Memory Representation for LLMs. In The Thirteenth International Conference on Learning Representations. https://openreview.net/forum?id=moXtEmCleY

[28] Parth Sarthi, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, and Christopher D Manning. 2024. Raptor: Recursive abstractive processing for tree-organized retrieval. In The Twelfth International Conference on Learning Representations.

[29] Ji Sun, Guoliang Li, James Pan, Jiang Wang, Yongqing Xie, Ruicheng Liu, and Wen Nie. 2025. GaussDB-Vector: A Large-Scale Persistent Real-Time Vector Database for LLM Applications. Proceedings of the VLDB Endowment 18, 12 (2025), 4951–4963.

[30] Zhen Tan, Jun Yan, I-Hung Hsu, Rujun Han, Zifeng Wang, Long Le, Yiwen Song, Yanfei Chen, Hamid Palangi, George Lee, et al. 2025. In prospect and retrospect: Reflective memory management for long-term personalized dialogue agents. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 8416–8439.

[31] Zhenheng Tang, Xin He, Tiancheng Zhao, Fanjunduo Wei, Xiang Liu, Peijie Dong, Qian Wang, Qi Li, Huacan Wang, Ronghao Chen, et al. 2026. LLM Agent Memory: A Survey from a Unified Representation–Management Perspective. (2026).

[32] Di Wu, Hongwei Wang, Wenhao Yu, Yuwei Zhang, Kai-Wei Chang, and Dong Yu. LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory. In The Thirteenth International Conference on Learning Representations. https://openreview.net/forum?id=pZiyCaVuti

[33] Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, and Yongfeng Zhang. A-Mem: Agentic Memory for LLM Agents. In The Thirty-ninth Annual Conference on Neural Information Processing Systems. https://openreview.net/forum?id=FiM0M8gcct

[34] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. 2025. Qwen3 technical report. arXiv preprint arXiv:2505.09388 (2025).

[35] Zihe Ye, Jingyuan Huang, Weixin Chen, and Yongfeng Zhang. 2026. H-Mem: Hybrid Multi-Dimensional Memory Management for Long-Context Conversational Agents. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). 7756–7775.

[36] Runjie Yu, Weizhou Huang, Shuhan Bai, Jian Zhou, and Fei Wu. 2025. AquaPipe: A Quality-Aware Pipeline for Knowledge Retrieval and Large Language Models. Proc. ACM Manag. Data 3, 1, Article 11 (Feb. 2025), 26 pages. https://doi.org/10.1145/3709661

[37] Ningning Zhang, Xingxing Yang, Zhizhong Tan, Weiping Deng, and Wenyong Wang. 2026. HiMem: Hierarchical Long-Term Memory for LLM Long-Horizon Agents. arXiv preprint arXiv:2601.06377 (2026).

[38] Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, et al. 2025. Qwen3 embedding: Advancing text embedding and reranking through foundation models. arXiv preprint arXiv:2506.05176 (2025).

[39] Zeyu Zhang, Quanyu Dai, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen, Jieming Zhu, Zhenhua Dong, and Ji-Rong Wen. 2025. A survey on the memory mechanism of large language model-based agents. ACM Transactions on Information Systems 43, 6 (2025), 1–47.

[40] Wanjun Zhong, Lianghong Guo, Qiqi Gao, He Ye, and Yanlin Wang. 2024. Memorybank: Enhancing large language models with long-term memory. In Proceedings of the AAAI conference on artificial intelligence, Vol. 38. 19724–19731.

[1] [1]

Shubham Agarwal, Sai Sundaresan, Subrata Mitra, Debabrata Mahapatra, Ar- chit Gupta, Rounak Sharma, Nirmal Joshua Kapu, Tong Yu, and Shiv Saini

[2] [2]

ACM Manag

Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation.Proc. ACM Manag. Data3, 3, Article 136 (June 2025), 28 pages. https://doi.org/10.1145/3725273

work page doi:10.1145/3725273 2025

[3] [3]

Bruno Becker, Stephan Gschwind, Thomas Ohler, Bernhard Seeger, and Peter Widmayer. 1996. An asymptotically optimal multiversion B-tree.The VLDB Journal5, 4 (1996), 264–275

1996

[4] [4]

Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Ya- dav. 2025. Mem0: Building production-ready ai agents with scalable long-term memory.arXiv preprint arXiv:2504.19413(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[5] [5]

Tri Dao. 2024. FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning. InInternational Conference on Learning Representations (ICLR)

2024

[6] [6]

Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Dasha Metropolitansky, Robert Osazuwa Ness, and Jonathan Larson. 2024. From local to global: A graph rag approach to query-focused summarization.arXiv preprint arXiv:2404.16130(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024

[7] [7]

Ramez Elmasri, Gene TJ Wuu, and Yeong-Joon Kim. 1990. The time index: An access structure for temporal data. InProceedings of the 16th International Conference on Very Large Data Bases. 1–12

1990

[8] [8]

Jizhan Fang, Xinle Deng, Haoming Xu, Ziyan Jiang, Yuqi Tang, Ziwen Xu, Shumin Deng, Yunzhi Yao, Mengru Wang, Shuofei Qiao, Huajun Chen, and Ningyu Zhang

[9] [9]

InThe Fourteenth International Conference on Learning Representations

LightMem: Lightweight and Efficient Memory-Augmented Generation. InThe Fourteenth International Conference on Learning Representations. https: //openreview.net/forum?id=dyJ0GWpjJB

[10] [10]

Pengyu Gao, Jinming Zhao, Xinyue Chen, and Long Yilin. 2025. An efficient context-dependent memory framework for llm-centric agents. InProceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track). 1055–1069

2025

[11] [11]

Yubin Ge, Salvatore Romeo, Jason Cai, Raphael Shu, Yassine Benajiba, Monica Sunkara, and Yi Zhang. 2025. Tremu: Towards neuro-symbolic temporal reason- ing for llm-agents with memory in multi-session dialogues. InFindings of the Association for Computational Linguistics: ACL 2025. 18974–18988

2025

[12] [12]

Tooraj Helmi. 2025. Decentralizing AI Memory: SHIMI, a Semantic Hierarchical Memory Index for Scalable Agent Reasoning.arXiv preprint arXiv:2504.06135 (2025)

work page arXiv 2025

[13] [13]

Chuanrui Hu, Xingze Gao, Zuyi Zhou, Dannong Xu, Yi Bai, Xintong Li, Hui Zhang, Tong Li, Chong Zhang, Lidong Bing, et al. 2026. EverMemOS: A Self- Organizing Memory Operating System for Structured Long-Horizon Reasoning. arXiv preprint arXiv:2601.02163(2026)

work page arXiv 2026

[14] [14]

Guoyu Hu, Shaofeng Cai, Tien Tuan Anh Dinh, Zhongle Xie, Cong Yue, Gang Chen, and Beng Chin Ooi. 2025. HAKES: Scalable Vector Database for Embedding Search Service.Proceedings of the VLDB Endowment18, 9 (2025), 3049–3062

2025

[15] [15]

Mengkang Hu, Tianxing Chen, Qiguang Chen, Yao Mu, Wenqi Shao, and Ping Luo. 2025. Hiagent: Hierarchical working memory management for solving long-horizon agent tasks with large language model. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 32779–32798

2025

[16] [16]

Paul Jackson and Jane Klobas. 2008. Transactive memory systems in organi- zations: Implications for knowledge directories.Decision support systems44, 2 (2008), 409–424

2008

[17] [17]

Jiazheng Kang, Mingming Ji, Zhe Zhao, and Ting Bai. 2025. Memory os of ai agent. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 25972–25981

2025

[18] [18]

Aixin Liu, Aoxue Mei, Bangcai Lin, Bing Xue, Bingxuan Wang, Bingzheng Xu, Bochao Wu, Bowei Zhang, Chaofan Lin, Chen Dong, et al . 2025. Deepseek- v3. 2: Pushing the frontier of open large language models.arXiv preprint arXiv:2512.02556(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[19] [19]

Mingfei Lu, Mengjia Wu, Feng Liu, Jiawei Xu, Weikai Li, Haoyang Wang, Zheng- dong Hu, Ying Ding, Yizhou Sun, Jie Lu, et al. 2026. Choosing How to Remember: Adaptive Memory Structures for LLM Agents.arXiv preprint arXiv:2602.14038 (2026)

work page arXiv 2026

[20] [20]

Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, and Yuwei Fang. 2024. Evaluating very long-term conversational memory of llm agents. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 13851–13870

2024

[21] milla-jovovich. 2026. MemPalace. https://github.com/milla-jovovich/mempalace

[22] Patrick O’Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O’Neil. 1996. The log-structured merge-tree (LSM-tree). Acta Informatica 33, 4 (1996), 351–385.

[23] Charles Packer, Vivian Fang, Shishir G. Patil, Kevin Lin, Sarah Wooders, and Joseph E. Gonzalez. 2023. MemGPT: Towards LLMs as operating systems. (2023).

[24] Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Xufang Luo, Hao Cheng, Dongsheng Li, Yuqing Yang, Chin-Yew Lin, H. Vicky Zhao, Lili Qiu, and Jianfeng Gao. 2025. SeCom: On Memory Construction and Retrieval for Personalized Conversational Agents. In The Thirteenth International Conference on Learning Representations. https://openreview.net/forum?id=xKDZAW0He3

[25] Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Menglin Xia, Xufang Luo, Jue Zhang, Qingwei Lin, Victor Rühle, Yuqing Yang, Chin-Yew Lin, et al. 2024. LLMLingua-2: Data distillation for efficient and faithful task-agnostic prompt compression. In Findings of the Association for Computational Linguistics: ACL 2024. 963–981.

[26] Joon Sung Park, Joseph O’Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. 2023. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. 1–22.

[27] Preston Rasmussen, Pavlo Paliychuk, Travis Beauvais, Jack Ryan, and Daniel Chalef. 2025. Zep: A temporal knowledge graph architecture for agent memory. arXiv preprint arXiv:2501.13956 (2025).

[28] Alireza Rezazadeh, Zichao Li, Ange Lou, Yuying Zhao, Wei Wei, and Yujia Bao. 2025. Collaborative memory: Multi-user memory sharing in LLM agents with dynamic access control. arXiv preprint arXiv:2505.18279 (2025).

[30] Alireza Rezazadeh, Zichao Li, Wei Wei, and Yujia Bao. 2025. From Isolated Conversations to Hierarchical Schemas: Dynamic Tree Memory Representation for LLMs. In The Thirteenth International Conference on Learning Representations. https://openreview.net/forum?id=moXtEmCleY

[31] Parth Sarthi, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, and Christopher D. Manning. 2024. RAPTOR: Recursive abstractive processing for tree-organized retrieval. In The Twelfth International Conference on Learning Representations.

[32] Ji Sun, Guoliang Li, James Pan, Jiang Wang, Yongqing Xie, Ruicheng Liu, and Wen Nie. 2025. GaussDB-Vector: A Large-Scale Persistent Real-Time Vector Database for LLM Applications. Proceedings of the VLDB Endowment 18, 12 (2025), 4951–4963.

[33] Zhen Tan, Jun Yan, I-Hung Hsu, Rujun Han, Zifeng Wang, Long Le, Yiwen Song, Yanfei Chen, Hamid Palangi, George Lee, et al. 2025. In prospect and retrospect: Reflective memory management for long-term personalized dialogue agents. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 8416–8439.

[34] Zhenheng Tang, Xin He, Tiancheng Zhao, Fanjunduo Wei, Xiang Liu, Peijie Dong, Qian Wang, Qi Li, Huacan Wang, Ronghao Chen, et al. 2026. LLM Agent Memory: A Survey from a Unified Representation–Management Perspective. (2026).

[35] Di Wu, Hongwei Wang, Wenhao Yu, Yuwei Zhang, Kai-Wei Chang, and Dong Yu. 2025. LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory. In The Thirteenth International Conference on Learning Representations. https://openreview.net/forum?id=pZiyCaVuti

[37] Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, and Yongfeng Zhang. 2025. A-Mem: Agentic Memory for LLM Agents. In The Thirty-ninth Annual Conference on Neural Information Processing Systems. https://openreview.net/forum?id=FiM0M8gcct

[39] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. 2025. Qwen3 technical report. arXiv preprint arXiv:2505.09388 (2025).

[40] Zihe Ye, Jingyuan Huang, Weixin Chen, and Yongfeng Zhang. 2026. H-Mem: Hybrid Multi-Dimensional Memory Management for Long-Context Conversational Agents. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). 7756–7775.

[41] Runjie Yu, Weizhou Huang, Shuhan Bai, Jian Zhou, and Fei Wu. 2025. AquaPipe: A Quality-Aware Pipeline for Knowledge Retrieval and Large Language Models. Proc. ACM Manag. Data 3, 1, Article 11 (Feb. 2025), 26 pages. https://doi.org/10.1145/3709661

[42] Ningning Zhang, Xingxing Yang, Zhizhong Tan, Weiping Deng, and Wenyong Wang. 2026. HiMem: Hierarchical Long-Term Memory for LLM Long-Horizon Agents. arXiv preprint arXiv:2601.06377 (2026).

[43] Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, et al. 2025. Qwen3 embedding: Advancing text embedding and reranking through foundation models. arXiv preprint arXiv:2506.05176 (2025).

[44] Zeyu Zhang, Quanyu Dai, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen, Jieming Zhu, Zhenhua Dong, and Ji-Rong Wen. 2025. A survey on the memory mechanism of large language model-based agents. ACM Transactions on Information Systems 43, 6 (2025), 1–47.

[45] Wanjun Zhong, Lianghong Guo, Qiqi Gao, He Ye, and Yanlin Wang. 2024. MemoryBank: Enhancing large language models with long-term memory. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 19724–19731.

A PROMPTS

A.1 LLM-as-Judge Prompts

LongMemEval Judge Prompt

Your task is to label an answer to a LongMemEval question as C...