pith. machine review for the scientific record.

arxiv: 2604.01707 · v2 · submitted 2026-04-02 · 💻 cs.CL · cs.DB

Recognition: 2 theorem links

Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework


Pith reviewed 2026-05-13 22:00 UTC · model grok-4.3

classification 💻 cs.CL cs.DB
keywords LLM agents · memory mechanisms · unified framework · long-horizon tasks · modular architectures · benchmark comparison · hybrid memory · agent performance

The pith

A unified framework for LLM agent memory methods shows that recombining their modules creates a hybrid system outperforming prior state-of-the-art on standard benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language model agents require memory to sustain performance across long-horizon tasks such as extended dialogues or game sequences. The paper organizes existing memory techniques into a single high-level framework that highlights their shared modular structure. It then runs a controlled comparison of representative methods on two established benchmarks, revealing clear patterns in how different components contribute to success or failure. From those patterns the authors construct a new memory approach by selecting and combining the strongest modules, and this hybrid outperforms existing leaders. The results supply a practical basis for designing more capable agent memory in future work.

Core claim

The paper presents a unified framework that captures all existing agent memory methods at a modular level. Systematic side-by-side testing on two benchmarks identifies effective and ineffective components across methods. Exploiting this analysis, the authors assemble a new memory method from the strongest modules of prior work and demonstrate that it exceeds the performance of current state-of-the-art approaches on the same benchmarks.

What carries the argument

A unified modular framework that decomposes agent memory into interchangeable components for storage, retrieval, and updating.
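As a concrete illustration of that decomposition, the framework might look like the following minimal Python sketch. All class and method names here are hypothetical, not taken from the paper, and the retrieval and update policies are deliberately naive stand-ins (word overlap, oldest-first eviction) for whatever each surveyed method actually uses.

```python
# Hypothetical sketch of agent memory split into interchangeable modules.
# Names and policies are illustrative, not the paper's actual interfaces.

class StorageModule:
    """Holds raw memory entries; real systems may use vector stores or graphs."""
    def __init__(self):
        self.entries: list[str] = []

    def add(self, text: str) -> None:
        self.entries.append(text)

class RetrievalModule:
    """Ranks stored entries against a query (here: naive word overlap)."""
    def retrieve(self, storage: StorageModule, query: str, k: int = 3) -> list[str]:
        q = set(query.lower().split())
        ranked = sorted(storage.entries,
                        key=lambda e: len(q & set(e.lower().split())),
                        reverse=True)
        return ranked[:k]

class UpdateModule:
    """Maintains memory over time (here: drop oldest entries past a cap)."""
    def __init__(self, max_entries: int = 100):
        self.max_entries = max_entries

    def update(self, storage: StorageModule) -> None:
        overflow = len(storage.entries) - self.max_entries
        if overflow > 0:
            storage.entries = storage.entries[overflow:]

class AgentMemory:
    """Composes the three modules; swapping any one yields a new method."""
    def __init__(self, storage: StorageModule, retrieval: RetrievalModule,
                 update: UpdateModule):
        self.storage, self.retrieval, self.update = storage, retrieval, update

    def write(self, text: str) -> None:
        self.storage.add(text)
        self.update.update(self.storage)

    def read(self, query: str) -> list[str]:
        return self.retrieval.retrieve(self.storage, query)
```

Under this framing, the paper's controlled comparison amounts to holding the composition fixed while varying which concrete module fills each slot.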

If this is right

  • Memory methods share reusable modules whose individual contributions can be measured separately.
  • Selecting and combining strong modules from different methods produces measurable gains over any single original method.
  • Current methods vary widely in how they handle knowledge accumulation versus iterative reasoning.
  • Future designs can target specific task demands by swapping or weighting individual memory modules.
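The recombination step the points above describe can be sketched as a simple selection over per-module benchmark scores. The method names, slot names, and scores below are invented for illustration; the paper reports its actual module-level results in its experimental tables.

```python
# Invented example scores: (method, module slot) -> benchmark score.
# A hybrid is built by taking, for each slot, the method that scored best.
scores = {
    ("method_a", "storage"):   0.61,
    ("method_a", "retrieval"): 0.72,
    ("method_a", "update"):    0.55,
    ("method_b", "storage"):   0.68,
    ("method_b", "retrieval"): 0.64,
    ("method_b", "update"):    0.70,
}

def best_hybrid(scores: dict) -> dict:
    """For each module slot, keep the method whose module scored highest."""
    best: dict = {}
    for (method, slot), score in scores.items():
        if slot not in best or score > best[slot][1]:
            best[slot] = (method, score)
    return {slot: method for slot, (method, _) in best.items()}

print(best_hybrid(scores))
# {'storage': 'method_b', 'retrieval': 'method_a', 'update': 'method_b'}
```

Whether such a greedy per-slot selection composes well in practice is exactly what the paper's hybrid experiments test; modules may interact, so the best hybrid need not be the slot-wise argmax.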

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The modular view may extend to memory designs outside the LLM-agent setting, such as in classical planning systems.
  • Dynamic selection among modules at runtime could further improve results on mixed task types.
  • Resource cost differences among modules remain unmeasured and could limit deployment on smaller hardware.
  • The framework invites tests on longer or more open-ended tasks than the two benchmarks provide.

Load-bearing premise

The two chosen benchmarks capture the essential range of long-horizon tasks where memory determines agent success.

What would settle it

Evaluating the new hybrid memory method on an additional benchmark that involves multi-turn scientific discovery and observing that it no longer exceeds the previous best method would falsify the superiority claim.

Figures

Figures reproduced from arXiv: 2604.01707 by Fangyuan Zhang, Qintian Guo, Sibo Wang, Tenghui Lin, Xilin Liu, Xun Zhou, Yanchen Wu, Yingli Zhou, Yixiang Fang, Yuchi Ma.

Figure 1: Overview of naive long-context prompting and …
Figure 2: An overview of the unified framework for agent …
Figure 3: A sample prompt for summarization-based extrac…
Figure 4: Methods of information extraction.
Figure 5: Workflow of the memory management process.
Figure 6: Overall trade-off between performance and token …
Figure 7: Average token costs per dialogue across sessions on …
Figure 8: Robustness analysis of memory mechanisms on LONGMEMEVAL. (a) illustrates the context scalability as the input …
Figure 9: Context scalability of various memory methods …
Figure 10: Comparison of our newly designed method in …
Figure 11: The framework of our newly designed method.
Figure 13: Prompt for answer simplification.
Original abstract

Memory emerges as the core module in the large language model (LLM)-based agents for long-horizon complex tasks (e.g., multi-turn dialogue, game playing, scientific discovery), where memory can enable knowledge accumulation, iterative reasoning and self-evolution. A number of memory methods have been proposed in the literature. However, these methods have not been systematically and comprehensively compared under the same experimental settings. In this paper, we first summarize a unified framework that incorporates all the existing agent memory methods from a high-level perspective. We then extensively compare representative agent memory methods on two well-known benchmarks and examine the effectiveness of all methods, providing a thorough analysis of those methods. As a byproduct of our experimental analysis, we also design a new memory method by exploiting modules in the existing methods, which outperforms the state-of-the-art methods. Finally, based on these findings, we offer promising future research opportunities. We believe that a deeper understanding of the behavior of existing methods can provide valuable new insights for future research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper presents a unified high-level framework that subsumes existing memory methods for LLM-based agents on long-horizon tasks. It performs a systematic empirical comparison of representative methods on two benchmarks, analyzes their behavior with respect to knowledge accumulation, iterative reasoning, and self-evolution, and, as a byproduct, constructs a new composite memory method by recombining modules from prior work; this new method is reported to outperform existing SOTA approaches. The manuscript concludes with a discussion of open research opportunities.

Significance. If the reported outperformance is robust, the work is significant because it supplies the first controlled head-to-head evaluation of memory modules under identical settings and demonstrates that modular recombination can yield measurable gains. The unified framework itself offers a useful organizing lens for future agent designs, and the explicit identification of promising research directions (e.g., better handling of self-evolution) adds value beyond the empirical results.

major comments (1)
  1. [§4 (Experimental Results), Table 2] The central claim that the newly designed composite memory method outperforms SOTA rests on head-to-head results from exactly two benchmarks. The manuscript provides no coverage argument showing that these benchmarks exercise the full spectrum of memory operations (knowledge accumulation across multi-turn dialogue, long-horizon game playing, and scientific discovery) enumerated in the introduction; without such justification the observed gains may be benchmark-specific rather than evidence of a generally superior modular architecture.
minor comments (2)
  1. [Abstract] The statement that experiments were conducted and a new method outperforms SOTA is given without any quantitative deltas, baseline names, or benchmark identifiers; adding these would improve readability.
  2. [§3 (Unified Framework)] The high-level modular decomposition is described qualitatively; a concise table or diagram that maps each prior method to the specific modules it uses would make the framework easier to use as a reference.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We agree that strengthening the justification for our benchmark choices will improve the manuscript and address concerns about the generalizability of our results. We provide a point-by-point response below.

Point-by-point responses
  1. Referee: [§4 (Experimental Results), Table 2] The central claim that the newly designed composite memory method outperforms SOTA rests on head-to-head results from exactly two benchmarks. The manuscript provides no coverage argument showing that these benchmarks exercise the full spectrum of memory operations (knowledge accumulation across multi-turn dialogue, long-horizon game playing, and scientific discovery) enumerated in the introduction; without such justification the observed gains may be benchmark-specific rather than evidence of a generally superior modular architecture.

    Authors: We appreciate this observation and agree that an explicit coverage argument was missing. The two benchmarks were selected as representative long-horizon tasks that require knowledge accumulation and iterative reasoning (one focused on multi-turn dialogue-style interactions and the other on game-playing environments). However, we did not provide a detailed mapping to all operations listed in the introduction, including scientific discovery. In the revised manuscript, we will add a new paragraph in §4 that (1) justifies the benchmark selection based on their coverage of core memory operations, (2) includes a table mapping benchmark tasks to knowledge accumulation, iterative reasoning, and self-evolution, and (3) explicitly acknowledges that scientific discovery scenarios are not directly evaluated, framing this as a limitation and future direction. This revision will clarify the scope of our claims without requiring new experiments. revision: yes

Circularity Check

0 steps flagged

No significant circularity in empirical framework summary and benchmark comparison

full rationale

The paper summarizes prior memory methods into a high-level unified framework, runs direct empirical comparisons of representative methods on two fixed benchmarks, and constructs a new composite method by recombining observed modules from those comparisons. No mathematical derivation chain exists. No equations, fitted parameters presented as predictions, self-definitional constructs, or load-bearing self-citations appear. Central claims rest on the reported head-to-head results rather than reducing to inputs by construction. This matches the expected non-circular outcome for an empirical survey paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical axioms, free parameters, or invented entities are described in the abstract; the work is an empirical unification and comparison study.

pith-pipeline@v0.9.0 · 5504 in / 904 out tokens · 34484 ms · 2026-05-13T22:00:38.299773+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.


Reference graph

Works this paper leans on

134 extracted references · 134 canonical work pages · 19 internal anchors
