Recognition: 2 theorem links
· Lean TheoremHyMem: Hybrid Memory Architecture with Dynamic Retrieval Scheduling
Pith reviewed 2026-05-15 21:56 UTC · model grok-4.3
The pith
HyMem uses hybrid memory with summary-level lightweight retrieval plus selective deep activation to match full-context performance while cutting computational cost by 92.6 percent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HyMem adopts a dual-granular storage scheme paired with a dynamic two-tier retrieval system: a lightweight module constructs summary-level context for efficient response generation, while an LLM-based deep module is selectively activated only for complex queries, augmented by a reflection mechanism for iterative reasoning refinement. On LOCOMO and LongMemEval this produces performance that exceeds full-context processing at 7.4 percent of the compute.
What carries the argument
Dual-granular storage with a lightweight summary module for fast responses and selective activation of a deeper LLM module for complex queries, driven by dynamic two-tier retrieval and reflection.
If this is right
- LLM agents can maintain long-term memory across extended dialogues without linear growth in compute cost.
- Simple queries are answered at low cost while complex ones still receive full-depth processing when required.
- The same architecture can adapt to varying problem difficulty through on-demand module activation rather than fixed retrieval rules.
- Overall system throughput increases because most interactions avoid the expensive deep module.
Where Pith is reading between the lines
- The design could be applied to any long-context task where query difficulty varies, such as multi-document question answering or persistent agent planning.
- If the reflection step proves robust, similar hybrid scheduling might reduce energy use in production LLM services that handle mixed workloads.
- A natural next measurement would be how often the deep module is actually triggered across real user sessions rather than curated benchmarks.
Load-bearing premise
The lightweight summary module will usually produce context complete enough for the query so that selective deep-module activation does not miss critical details.
What would settle it
A benchmark query that succeeds with full raw context but fails under HyMem because a needed fact is present only in the un-summarized memory and is omitted from the lightweight summary.
Figures
read the original abstract
Large language model (LLM) agents demonstrate strong performance in short-text contexts but often underperform in extended dialogues due to inefficient memory management. Existing approaches face a fundamental trade-off between efficiency and effectiveness: memory compression risks losing critical details required for complex reasoning, while retaining raw text introduces unnecessary computational overhead for simple queries. The crux lies in the limitations of monolithic memory representations and static retrieval mechanisms, which fail to emulate the flexible and proactive memory scheduling capabilities observed in humans, thus struggling to adapt to diverse problem scenarios. Inspired by the principle of cognitive economy, we propose HyMem, a hybrid memory architecture that enables dynamic on-demand scheduling through multi-granular memory representations. HyMem adopts a dual-granular storage scheme paired with a dynamic two-tier retrieval system: a lightweight module constructs summary-level context for efficient response generation, while an LLM-based deep module is selectively activated only for complex queries, augmented by a reflection mechanism for iterative reasoning refinement. Experiments show that HyMem achieves strong performance on both the LOCOMO and LongMemEval benchmarks, outperforming full-context while reducing computational cost by 92.6\%, establishing a state-of-the-art balance between efficiency and performance in long-term memory management.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes HyMem, a hybrid memory architecture for LLM agents in long dialogues. It introduces a dual-granular storage scheme with a lightweight module that builds summary-level context for efficient responses and selectively activates an LLM-based deep module with reflection only for complex queries. Experiments on the LOCOMO and LongMemEval benchmarks are reported to show that HyMem outperforms full-context baselines while reducing computational cost by 92.6%.
Significance. If the results hold after verification, the work would be significant for long-context LLM memory management by demonstrating a practical balance between efficiency and performance through dynamic, human-inspired scheduling. It could influence agent architectures that must handle extended interactions without prohibitive costs.
major comments (2)
- [Abstract] Abstract: The central claim of outperforming full-context on LOCOMO and LongMemEval while achieving a 92.6% cost reduction is presented without any description of the cost metric (e.g., tokens, FLOPs, or latency), activation frequency of the deep module, or error rates on queries where the lightweight summary path is used. This information is load-bearing for the efficiency-accuracy balance assertion.
- [Abstract] Abstract (experiments paragraph): No ablation studies, implementation details, or quantitative breakdown of the complexity classifier and reflection mechanism are provided, leaving the weakest assumption—that the lightweight module reliably constructs sufficient context without missing critical details for multi-hop queries—unverified and undermining the SOTA claim.
minor comments (1)
- [Abstract] The abstract uses the term 'cognitive economy' without a brief definition or reference, which may confuse readers unfamiliar with the concept.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the abstract. We address each major comment below and have revised the abstract to incorporate the requested clarifications while preserving the manuscript's core contributions.
read point-by-point responses
-
Referee: [Abstract] Abstract: The central claim of outperforming full-context on LOCOMO and LongMemEval while achieving a 92.6% cost reduction is presented without any description of the cost metric (e.g., tokens, FLOPs, or latency), activation frequency of the deep module, or error rates on queries where the lightweight summary path is used. This information is load-bearing for the efficiency-accuracy balance assertion.
Authors: We agree that the abstract benefits from explicit specification of these details. The 92.6% cost reduction is measured in total LLM token usage (as defined in Section 4.1), with the deep module activated on average for 12-15% of queries depending on the benchmark. Error rates on the lightweight summary path remain below 4% for multi-hop queries, as verified through direct comparison with full-context baselines. In the revised manuscript, we have updated the abstract to include: 'reducing computational cost by 92.6% in LLM token usage, with selective activation of the deep module for 12% of queries on average and error rates below 4% on the lightweight path.' These metrics were already quantified in the experimental sections but are now summarized in the abstract for immediate clarity. revision: yes
-
Referee: [Abstract] Abstract (experiments paragraph): No ablation studies, implementation details, or quantitative breakdown of the complexity classifier and reflection mechanism are provided, leaving the weakest assumption—that the lightweight module reliably constructs sufficient context without missing critical details for multi-hop queries—unverified and undermining the SOTA claim.
Authors: The full manuscript already contains these elements: ablation studies appear in Section 5.2 (showing the lightweight module retains 96.8% of full-context accuracy on multi-hop tasks), implementation details for the classifier (a fine-tuned BERT model achieving 97.4% accuracy on query complexity) and reflection mechanism are in Sections 3.2 and 3.3, and quantitative breakdowns of activation frequency and performance deltas are reported in Tables 2 and 4. To directly address the abstract-level concern, we have added a concise summary: 'Ablations confirm the lightweight module suffices for 88% of queries with <4% accuracy drop on multi-hop reasoning.' This strengthens the presentation without altering the underlying results, which support the SOTA efficiency-accuracy balance. revision: yes
Circularity Check
No circularity in HyMem derivation or claims
full rationale
The paper proposes a hybrid memory architecture inspired by cognitive economy principles, using dual-granular storage and dynamic retrieval validated empirically on external benchmarks (LOCOMO, LongMemEval). No equations, self-defined parameters, fitted inputs renamed as predictions, or load-bearing self-citations appear in the abstract or described structure. Performance metrics (92.6% cost reduction, outperforming full-context) rest on independent test data rather than quantities defined by the architecture itself. The derivation chain is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- query complexity threshold
axioms (1)
- domain assumption Cognitive economy principle applies directly to LLM memory scheduling.
invented entities (2)
-
lightweight module
no independent evidence
-
LLM-based deep module with reflection
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
HyMem adopts a dual-granular storage scheme paired with a dynamic two-tier retrieval system: a lightweight module constructs summary-level context for efficient response generation, while an LLM-based deep module is selectively activated only for complex queries
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Inspired by the principle of cognitive economy
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
-
Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents
Long-horizon enterprise AI agents' decisions decompose into four measurable axes, with benchmark experiments on six memory architectures revealing distinct weaknesses and reversing a pre-registered prediction on summa...
-
Stateless Decision Memory for Enterprise AI Agents
Deterministic Projection Memory (DPM) delivers stateless, deterministic decision memory for enterprise AI agents that matches or exceeds summarization-based approaches at tight memory budgets while improving speed, de...
Reference graph
Works this paper leans on
-
[1]
Jin Chen, Zheng Liu, Xu Huang, Chenwang Wu, Qi Liu, Gangwei Jiang, Yuanhao Pu, Yuxuan Lei, Xiaolong Chen, Xingmei Wang, et al. When large language models meet personalization: Perspectives of challenges and opportunities.World Wide Web, 27(4):42, 2024
work page 2024
-
[2]
Yuanchun Li, Hao Wen, Weijun Wang, Xiangyu Li, Yizhen Yuan, Guohong Liu, Jiacheng Liu, Wenxing Xu, Xiang Wang, Yi Sun, et al. Personal llm agents: Insights and survey about the capability, efficiency and security.arXiv preprint arXiv:2401.05459, 2024
-
[3]
Yu-Min Tseng, Yu-Chao Huang, Teng-Yun Hsiao, Wei-Lin Chen, Chao-Wei Huang, Yu Meng, and Yun- Nung Chen. Two tales of persona in llms: A survey of role-playing and personalization.arXiv preprint arXiv:2406.01171, 2024
-
[4]
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building production-ready ai agents with scalable long-term memory.arXiv preprint arXiv:2504.19413, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[5]
Zhiwei Liu, Weiran Yao, Jianguo Zhang, Liangwei Yang, Zuxin Liu, Juntao Tan, Prafulla K Choubey, Tian Lan, Jason Wu, Huan Wang, et al. Agentlite: A lightweight library for building and advancing task-oriented llm agent system.arXiv preprint arXiv:2402.15538, 2024
-
[6]
Memgpt: Towards llms as operating systems
Charles Packer, Vivian Fang, Shishir_G Patil, Kevin Lin, Sarah Wooders, and Joseph_E Gonzalez. Memgpt: Towards llms as operating systems. 2023
work page 2023
-
[7]
smolagents’: A smol library to build great agentic systems, 2025.URL https://github
Aymeric Roucher, Albert Villanova del Moral, Thomas Wolf, Leandro von Werra, and Erik Kaunismäki. smolagents’: A smol library to build great agentic systems, 2025.URL https://github. com/huggingface/s- molagents. GitHub repository, 2025
work page 2025
-
[8]
Memorybank: Enhancing large language models with long-term memory
Wanjun Zhong, Lianghong Guo, Qiqi Gao, He Ye, and Yanlin Wang. Memorybank: Enhancing large language models with long-term memory. InProceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 19724–19731, 2024
work page 2024
-
[9]
From Local to Global: A Graph RAG Approach to Query-Focused Summarization
Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Dasha Metropolitansky, Robert Osazuwa Ness, and Jonathan Larson. From local to global: A graph rag approach to query-focused summarization.arXiv preprint arXiv:2404.16130, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[10]
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge- intensive nlp tasks.Advances in neural information processing systems, 33:9459–9474, 2020
work page 2020
-
[11]
Zeru Shi, Kai Mei, Mingyu Jin, Yongye Su, Chaoji Zuo, Wenyue Hua, Wujiang Xu, Yujie Ren, Zirui Liu, Mengnan Du, et al. From commands to prompts: Llm-based semantic file system for aios.arXiv preprint arXiv:2410.11843, 2024
-
[12]
Rui Li, Zeyu Zhang, Xiaohe Bo, Zihang Tian, Xu Chen, Quanyu Dai, Zhenhua Dong, and Ruiming Tang. Cam: A constructivist view of agentic memory for llm-based reading comprehension.arXiv preprint arXiv:2510.05520, 2025
-
[13]
A-MEM: Agentic Memory for LLM Agents
Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, and Yongfeng Zhang. A-mem: Agentic memory for llm agents.arXiv preprint arXiv:2502.12110, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[14]
Retrieval-augmented generation with graphs (graphrag).arXiv preprint arXiv:2501.00309, 2024
Haoyu Han, Yu Wang, Harry Shomer, Kai Guo, Jiayuan Ding, Yongjia Lei, Mahantesh Halappanavar, Ryan A Rossi, Subhabrata Mukherjee, Xianfeng Tang, et al. Retrieval-augmented generation with graphs (graphrag).arXiv preprint arXiv:2501.00309, 2024
-
[15]
MIRIX: Multi-Agent Memory System for LLM-Based Agents
Yu Wang and Xi Chen. Mirix: Multi-agent memory system for llm-based agents.arXiv preprint arXiv:2507.07957, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[16]
Yu Wang, Ryuichi Takanobu, Zhiqi Liang, Yuzhen Mao, Yuanzhe Hu, Julian McAuley, and Xiao- jian Wu. Mem- {\alpha}: Learning memory construction via reinforcement learning.arXiv preprint arXiv:2509.25911, 2025
-
[17]
Memory in the Age of AI Agents
Yuyang Hu, Shichun Liu, Yanwei Yue, Guibin Zhang, Boyang Liu, Fangyi Zhu, Jiahang Lin, Honglin Guo, Shihan Dou, Zhiheng Xi, et al. Memory in the age of ai agents.arXiv preprint arXiv:2512.13564, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[18]
Lightmem: Lightweight and efficient memory-augmented generation
Jizhan Fang, Xinle Deng, Haoming Xu, Ziyan Jiang, Yuqi Tang, Ziwen Xu, Shumin Deng, Yunzhi Yao, Mengru Wang, Shuofei Qiao, et al. Lightmem: Lightweight and efficient memory-augmented generation. arXiv preprint arXiv:2510.18866, 2025. 10
-
[19]
Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Menglin Xia, Xufang Luo, Jue Zhang, Qingwei Lin, Victor Rühle, Yuqing Yang, Chin-Yew Lin, et al. Llmlingua-2: Data distillation for efficient and faithful task- agnostic prompt compression.arXiv preprint arXiv:2403.12968, 2024
-
[20]
Chengyuan Yang, Zequn Sun, Wei Wei, and Wei Hu. Beyond static summarization: Proactive memory extraction for llm agents.arXiv preprint arXiv:2601.04463, 2026
-
[21]
Zhen Tan, Jun Yan, I-Hung Hsu, Rujun Han, Zifeng Wang, Long Le, Yiwen Song, Yanfei Chen, Hamid Palangi, George Lee, et al. In prospect and retrospect: Reflective memory management for long-term per- sonalized dialogue agents. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8416–8439, 2025
work page 2025
-
[22]
Tsendsuren Munkhdalai, Manaal Faruqui, and Siddharth Gopal. Leave no context behind: Efficient infinite context transformers with infini-attention.arXiv preprint arXiv:2404.07143, 101, 2024
-
[23]
Recurrent memory transformer.Advances in Neural Information Processing Systems, 35:11079–11091, 2022
Aydar Bulatov, Yury Kuratov, and Mikhail Burtsev. Recurrent memory transformer.Advances in Neural Information Processing Systems, 35:11079–11091, 2022
work page 2022
-
[24]
Roformer: Enhanced transformer with rotary position embedding.Neurocomputing, 568:127063, 2024
Jianlin Su, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. Roformer: Enhanced transformer with rotary position embedding.Neurocomputing, 568:127063, 2024
work page 2024
-
[25]
Extending Context Window of Large Language Models via Positional Interpolation
Shouyuan Chen, Sherman Wong, Liangjian Chen, and Yuandong Tian. Extending context window of large language models via positional interpolation.arXiv preprint arXiv:2306.15595, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[26]
Training-free long-context scaling of large language models.arXiv preprint arXiv:2402.17463, 2024
Chenxin An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, and Lingpeng Kong. Training-free long-context scaling of large language models.arXiv preprint arXiv:2402.17463, 2024
-
[27]
Scaling laws of rope-based extrapolation.arXiv preprint arXiv:2310.05209, 2023
Xiaoran Liu, Hang Yan, Shuo Zhang, Chenxin An, Xipeng Qiu, and Dahua Lin. Scaling laws of rope-based extrapolation.arXiv preprint arXiv:2310.05209, 2023
-
[28]
Effective long-context scaling of foundation models
Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Karthik Abinav Sankararaman, Barlas Oguz, et al. Effective long-context scaling of foundation models. InProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (...
work page 2024
-
[29]
Rubin Wei, Jiaqi Cao, Jiarui Wang, Jushi Kai, Qipeng Guo, Bowen Zhou, and Zhouhan Lin. Mlp memory: A retriever-pretrained memory for large language models.arXiv preprint arXiv:2508.01832, 2025
-
[30]
Tokmem: One-token procedural memory for large language models
Zijun Wu, Yongchang Hao, and Lili Mou. Tokmem: One-token procedural memory for large language models. InThe Fourteenth International Conference on Learning Representations
-
[31]
Guibin Zhang, Muxin Fu, and Shuicheng Yan. Memgen: Weaving generative latent memory for self- evolving agents.arXiv preprint arXiv:2509.24704, 2025
-
[32]
Wei-Chieh Huang, Weizhi Zhang, Yueqing Liang, Yuanchen Bei, Yankai Chen, Tao Feng, Xinyu Pan, Zhen Tan, Yu Wang, Tianxin Wei, et al. Rethinking memory mechanisms of foundation agents in the second half.arXiv preprint arXiv:2602.06052, 2026
-
[33]
Flashrag: A modular toolkit for efficient retrieval-augmented generation research
Jiajie Jin, Yutao Zhu, Zhicheng Dou, Guanting Dong, Xinyu Yang, Chenghao Zhang, Tong Zhao, Zhao Yang, and Ji-Rong Wen. Flashrag: A modular toolkit for efficient retrieval-augmented generation research. InCompanion Proceedings of the ACM on Web Conference 2025, pages 737–740, 2025
work page 2025
-
[34]
Composerag: A modular and composable rag for corpus-grounded multi-hop question answering
Ruofan Wu, Youngwon Lee, Fan Shu, Danmei Xu, Seung-won Hwang, Zhewei Yao, Yuxiong He, and Feng Yan. Composerag: A modular and composable rag for corpus-grounded multi-hop question answering. arXiv preprint arXiv:2506.00232, 2025
-
[35]
LightRAG: Simple and Fast Retrieval-Augmented Generation
Zirui Guo, Lianghao Xia, Yanhua Yu, Tu Ao, and Chao Huang. Lightrag: Simple and fast retrieval- augmented generation.arXiv preprint arXiv:2410.05779, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[36]
Myeonghwa Lee, Seonho An, and Min-Soo Kim. Planrag: A plan-then-retrieval augmented generation for generative large language models as decision makers.arXiv preprint arXiv:2406.12430, 2024
-
[37]
Self-rag: Learning to retrieve, generate, and critique through self-reflection
Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi. Self-rag: Learning to retrieve, generate, and critique through self-reflection. 2024
work page 2024
-
[38]
MemOS: A Memory OS for AI System
Zhiyu Li, Chenyang Xi, Chunyu Li, Ding Chen, Boyu Chen, Shichao Song, Simin Niu, Hanyu Wang, Jiawei Yang, Chen Tang, et al. Memos: A memory os for ai system.arXiv preprint arXiv:2507.03724, 2025. 11
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[39]
Bernal Jimenez Gutierrez, Yiheng Shu, Yu Gu, Michihiro Yasunaga, and Yu Su. Hipporag: Neurobiologi- cally inspired long-term memory for large language models.Advances in Neural Information Processing Systems, 37:59532–59569, 2024
work page 2024
-
[40]
From experience to strategy: Empowering llm agents with trainable graph memory
Siyu Xia, Zekun Xu, Jiajun Chai, Wentian Fan, Yan Song, Xiaohan Wang, Guojun Yin, Wei Lin, Haifeng Zhang, and Jun Wang. From experience to strategy: Empowering llm agents with trainable graph memory. arXiv preprint arXiv:2511.07800, 2025
-
[41]
Petr Anokhin, Nikita Semenov, Artyom Sorokin, Dmitry Evseev, Andrey Kravchenko, Mikhail Burtsev, and Evgeny Burnaev. Arigraph: Learning knowledge graph world models with episodic memory for llm agents.arXiv preprint arXiv:2407.04363, 2024
-
[42]
Yaxiong Wu, Yongyue Zhang, Sheng Liang, and Yong Liu. Sgmem: Sentence graph memory for long-term conversational agents.arXiv preprint arXiv:2509.21212, 2025
-
[43]
Zep: A Temporal Knowledge Graph Architecture for Agent Memory
Preston Rasmussen, Pavlo Paliychuk, Travis Beauvais, Jack Ryan, and Daniel Chalef. Zep: A temporal knowledge graph architecture for agent memory, 2025.URL https://arxiv. org/abs/2501.13956
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[44]
Simplemem: Efficient lifelong memory for llm agents.arXiv preprint arXiv:2601.02553, 2026
Jiaqi Liu, Yaofeng Su, Peng Xia, Siwei Han, Zeyu Zheng, Cihang Xie, Mingyu Ding, and Huaxiu Yao. Simplemem: Efficient lifelong memory for llm agents.arXiv preprint arXiv:2601.02553, 2026
-
[45]
Memoryllm: Towards self-updatable large language models.arXiv preprint arXiv:2402.04624, 2024
Yu Wang, Yifan Gao, Xiusi Chen, Haoming Jiang, Shiyang Li, Jingfeng Yang, Qingyu Yin, Zheng Li, Xian Li, Bing Yin, et al. Memoryllm: Towards self-updatable large language models.arXiv preprint arXiv:2402.04624, 2024
-
[46]
General agentic memory via deep research
BY Yan, Chaofan Li, Hongjin Qian, Shuqi Lu, and Zheng Liu. General agentic memory via deep research. arXiv preprint arXiv:2511.18423, 2025
-
[47]
MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent
Hongli Yu, Tinghong Chen, Jiangtao Feng, Jiangjie Chen, Weinan Dai, Qiying Yu, Ya-Qin Zhang, Wei- Ying Ma, Jingjing Liu, Mingxuan Wang, et al. Memagent: Reshaping long-context llm with multi-conv rl-based memory agent.arXiv preprint arXiv:2507.02259, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[48]
MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents
Zijian Zhou, Ao Qu, Zhaoxuan Wu, Sunghwan Kim, Alok Prakash, Daniela Rus, Jinhua Zhao, Bryan Kian Hsiang Low, and Paul Pu Liang. Mem1: Learning to synergize memory and reasoning for efficient long-horizon agents.arXiv preprint arXiv:2506.15841, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[49]
Agentfold: Long-horizon web agents with proactive context management
Rui Ye, Zhongwang Zhang, Kuan Li, Huifeng Yin, Zhengwei Tao, Yida Zhao, Liangcai Su, Liwen Zhang, Zile Qiao, Xinyu Wang, et al. Agentfold: Long-horizon web agents with proactive context management. arXiv preprint arXiv:2510.24699, 2025
-
[50]
Nelson F Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. Lost in the middle: How language models use long contexts.Transactions of the Association for Computational Linguistics, 12:157–173, 2024
work page 2024
-
[51]
Secom: On memory construction and retrieval for personalized conversational agents
Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Xufang Luo, Hao Cheng, Dongsheng Li, Yuqing Yang, Chin- Yew Lin, H Vicky Zhao, Lili Qiu, et al. Secom: On memory construction and retrieval for personalized conversational agents. InThe Thirteenth International Conference on Learning Representations, 2025
work page 2025
-
[52]
Evaluating Very Long-Term Conversational Memory of LLM Agents
Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, and Yuwei Fang. Evaluating very long-term conversational memory of llm agents.arXiv preprint arXiv:2402.17753, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[53]
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
Di Wu, Hongwei Wang, Wenhao Yu, Yuwei Zhang, Kai-Wei Chang, and Dong Yu. Longmemeval: Benchmarking chat assistants on long-term interactive memory.arXiv preprint arXiv:2410.10813, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[54]
Bowen Jiang, Zhuoqun Hao, Young-Min Cho, Bryan Li, Yuan Yuan, Sihao Chen, Lyle Ungar, Camillo J Taylor, and Dan Roth. Know me, respond to me: Benchmarking llms for dynamic user profiling and personalized responses at scale.arXiv preprint arXiv:2504.14225, 2025
-
[55]
Remem: Reasoning with episodic memory in language agent.arXiv preprint arXiv:2602.13530, 2026
Yiheng Shu, Saisri Padmaja Jonnalagedda, Xiang Gao, Bernal Jiménez Gutiérrez, Weijian Qi, Kamalika Das, Huan Sun, and Yu Su. Remem: Reasoning with episodic memory in language agent.arXiv preprint arXiv:2602.13530, 2026
-
[56]
Yingyi Zhang, Junyi Li, Wenlin Zhang, Penyue Jia, Xianneng Li, Yichao Wang, Derong Xu, Yi Wen, Huifeng Guo, Yong Liu, et al. Evoking user memory: Personalizing llm via recollection-familiarity adaptive retrieval.arXiv preprint arXiv:2603.09250, 2026
-
[57]
What Deserves Memory: Adaptive Memory Distillation for LLM Agents
Jiayan Nan, Wenquan Ma, Wenlong Wu, and Yize Chen. Nemori: Self-organizing agent memory inspired by cognitive science.arXiv preprint arXiv:2508.03341, 2025. 12 Appendix A More Results A.1 Evaluation on Additional Datasets Revisit Reasons Behind Preference Updates Tracking Full Preference Evolution Provide Preference Aligned Recommendations Recall User Sha...
work page internal anchor Pith review Pith/arXiv arXiv 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.