Recognition: 2 theorem links
Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework
Pith reviewed 2026-05-13 22:00 UTC · model grok-4.3
The pith
A unified framework for LLM agent memory methods shows that recombining their modules creates a hybrid system outperforming prior state-of-the-art on standard benchmarks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper presents a unified framework that captures all existing agent memory methods at a modular level. Systematic side-by-side testing on two benchmarks identifies effective and ineffective components across methods. Exploiting this analysis, the authors assemble a new memory method from the strongest modules of prior work and demonstrate that it exceeds the performance of current state-of-the-art approaches on the same benchmarks.
What carries the argument
A unified modular framework that decomposes agent memory into interchangeable components for storage, retrieval, and updating.
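The decomposition above can be made concrete with a minimal sketch. Everything here is illustrative: the class names (`MemoryStore`, `MemoryMethod`), the keyword retriever, and the append-only updater are assumptions for exposition, not the paper's actual API. The point is only that a "method" reduces to a choice of interchangeable modules, so a hybrid can be assembled by recombining strong modules from different methods.

```python
# Hypothetical sketch of the modular view: a memory method is a bundle of
# interchangeable storage, retrieval, and update modules. All names and
# module implementations here are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class MemoryStore:
    """Storage module: holds memory entries as plain strings."""
    entries: list[str] = field(default_factory=list)

    def add(self, entry: str) -> None:
        self.entries.append(entry)

def keyword_retriever(store: MemoryStore, query: str, k: int = 3) -> list[str]:
    """Retrieval module: rank entries by naive keyword overlap with the query."""
    words = set(query.lower().split())
    ranked = sorted(
        store.entries,
        key=lambda e: len(words & set(e.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def append_updater(store: MemoryStore, observation: str) -> None:
    """Update module: store new observations verbatim."""
    store.add(observation)

@dataclass
class MemoryMethod:
    """A memory method is a choice of modules; a hybrid method is assembled
    by recombining the strongest modules from prior methods."""
    store: MemoryStore
    retrieve: Callable[[MemoryStore, str], list[str]]
    update: Callable[[MemoryStore, str], None]

hybrid = MemoryMethod(MemoryStore(), keyword_retriever, append_updater)
hybrid.update(hybrid.store, "the key is under the mat")
hybrid.update(hybrid.store, "the door is painted red")
print(hybrid.retrieve(hybrid.store, "where is the key"))
```

Swapping in a different retriever or updater changes the method without touching the rest of the system, which is exactly the property the unified framework exploits.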
If this is right
- Memory methods share reusable modules whose individual contributions can be measured separately.
- Selecting and combining strong modules from different methods produces measurable gains over any single original method.
- Current methods vary widely in how they handle knowledge accumulation versus iterative reasoning.
- Future designs can target specific task demands by swapping or weighting individual memory modules.
Where Pith is reading between the lines
- The modular view may extend to memory designs outside the LLM-agent setting, such as in classical planning systems.
- Dynamic selection among modules at runtime could further improve results on mixed task types.
- Resource cost differences among modules remain unmeasured and could limit deployment on smaller hardware.
- The framework invites tests on longer or more open-ended tasks than the two benchmarks provide.
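The second speculation above, runtime selection among modules, can be sketched in a few lines. The routing signal here (questions look up accumulated facts, everything else continues the current reasoning thread) and both retrievers are toy assumptions, not anything the paper proposes.

```python
# Toy sketch of dynamic module selection at runtime. The routing rule and
# both retrieval modules are illustrative assumptions only.
def recency_retriever(history: list[str], query: str) -> list[str]:
    """Favor the most recent entries (suits iterative reasoning)."""
    return history[-2:][::-1]

def overlap_retriever(history: list[str], query: str) -> list[str]:
    """Favor lexical matches (suits knowledge accumulation)."""
    words = set(query.lower().split())
    return sorted(history, key=lambda e: -len(words & set(e.lower().split())))[:2]

def route(history: list[str], query: str) -> list[str]:
    # Cheap routing signal: questions query accumulated knowledge,
    # other inputs continue the current reasoning thread.
    retriever = overlap_retriever if query.endswith("?") else recency_retriever
    return retriever(history, query)

history = ["fact: water boils at 100 C", "step 1 done", "step 2 done"]
print(route(history, "what temperature does water boil at?")[0])
print(route(history, "continue")[0])
```

A real router would use a learned or heuristic task classifier rather than punctuation, but the structure (one dispatch point over interchangeable modules) is the same.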
Load-bearing premise
The two chosen benchmarks capture the essential range of long-horizon tasks where memory determines agent success.
What would settle it
Evaluating the new hybrid memory method on an additional benchmark that involves multi-turn scientific discovery and observing that it no longer exceeds the previous best method would falsify the superiority claim.
Original abstract
Memory emerges as the core module in the large language model (LLM)-based agents for long-horizon complex tasks (e.g., multi-turn dialogue, game playing, scientific discovery), where memory can enable knowledge accumulation, iterative reasoning and self-evolution. A number of memory methods have been proposed in the literature. However, these methods have not been systematically and comprehensively compared under the same experimental settings. In this paper, we first summarize a unified framework that incorporates all the existing agent memory methods from a high-level perspective. We then extensively compare representative agent memory methods on two well-known benchmarks and examine the effectiveness of all methods, providing a thorough analysis of those methods. As a byproduct of our experimental analysis, we also design a new memory method by exploiting modules in the existing methods, which outperforms the state-of-the-art methods. Finally, based on these findings, we offer promising future research opportunities. We believe that a deeper understanding of the behavior of existing methods can provide valuable new insights for future research.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a unified high-level framework that subsumes existing memory methods for LLM-based agents on long-horizon tasks. It performs a systematic empirical comparison of representative methods on two benchmarks, analyzes their behavior with respect to knowledge accumulation, iterative reasoning, and self-evolution, and, as a byproduct, constructs a new composite memory method by recombining modules from prior work; this new method is reported to outperform existing SOTA approaches. The manuscript concludes with a discussion of open research opportunities.
Significance. If the reported outperformance is robust, the work is significant because it supplies the first controlled head-to-head evaluation of memory modules under identical settings and demonstrates that modular recombination can yield measurable gains. The unified framework itself offers a useful organizing lens for future agent designs, and the explicit identification of promising research directions (e.g., better handling of self-evolution) adds value beyond the empirical results.
major comments (1)
- [§4 (Experimental Results), Table 2] The central claim that the newly designed composite memory method outperforms SOTA rests on head-to-head results from exactly two benchmarks. The manuscript provides no coverage argument showing that these benchmarks exercise the full spectrum of memory operations (knowledge accumulation across multi-turn dialogue, long-horizon game playing, and scientific discovery) enumerated in the introduction; without such justification, the observed gains may be benchmark-specific rather than evidence of a generally superior modular architecture.
minor comments (2)
- [Abstract] The statement that experiments were conducted and a new method outperforms SOTA is given without any quantitative deltas, baseline names, or benchmark identifiers; adding these would improve readability.
- [§3 (Unified Framework)] The high-level modular decomposition is described only qualitatively; a concise table or diagram mapping each prior method to the specific modules it uses would make the framework easier to use as a reference.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation for major revision. We agree that strengthening the justification for our benchmark choices will improve the manuscript and address concerns about the generalizability of our results. We provide a point-by-point response below.
Point-by-point responses
- Referee: [§4 (Experimental Results), Table 2] The central claim that the newly designed composite memory method outperforms SOTA rests on head-to-head results from exactly two benchmarks. The manuscript provides no coverage argument showing that these benchmarks exercise the full spectrum of memory operations (knowledge accumulation across multi-turn dialogue, long-horizon game playing, and scientific discovery) enumerated in the introduction; without such justification, the observed gains may be benchmark-specific rather than evidence of a generally superior modular architecture.
  Authors: We appreciate this observation and agree that an explicit coverage argument was missing. The two benchmarks were selected as representative long-horizon tasks that require knowledge accumulation and iterative reasoning (one focused on multi-turn dialogue-style interactions, the other on game-playing environments). However, we did not provide a detailed mapping to all operations listed in the introduction, including scientific discovery. In the revised manuscript, we will add a new paragraph in §4 that (1) justifies the benchmark selection based on its coverage of core memory operations, (2) includes a table mapping benchmark tasks to knowledge accumulation, iterative reasoning, and self-evolution, and (3) explicitly acknowledges that scientific discovery scenarios are not directly evaluated, framing this as a limitation and future direction. This revision clarifies the scope of our claims without requiring new experiments.
  Revision planned: yes
Circularity Check
No significant circularity in empirical framework summary and benchmark comparison
Full rationale
The paper summarizes prior memory methods into a high-level unified framework, runs direct empirical comparisons of representative methods on two fixed benchmarks, and constructs a new composite method by recombining observed modules from those comparisons. No mathematical derivation chain exists. No equations, fitted parameters presented as predictions, self-definitional constructs, or load-bearing self-citations appear. Central claims rest on the reported head-to-head results rather than reducing to inputs by construction. This matches the expected non-circular outcome for an empirical survey paper.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction (unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Paper passage: "unified framework that decomposes memory mechanisms into four stages: Information Extraction, Memory Management, Memory Storage, and Information Retrieval"
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Paper passage: "new memory method by exploiting modules in the existing methods, which outperforms the state-of-the-art methods"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.