
arxiv: 2605.20616 · v1 · pith:QYEVXGGJnew · submitted 2026-05-20 · 💻 cs.CL

Auto-Dreamer: Learning Offline Memory Consolidation for Language Agents

Pith reviewed 2026-05-21 05:33 UTC · model grok-4.3

classification 💻 cs.CL
keywords memory consolidation · language agents · offline learning · agent memory management · reinforcement learning · task generalization · complementary learning systems

The pith

Auto-Dreamer trains an offline consolidator that abstracts language-agent memories into smaller reusable sets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Language agents gather detailed memories during individual task sessions yet rarely convert them into compact knowledge that works across many sessions. Auto-Dreamer decouples quick per-session recording from slower cross-session review by training a consolidator model that inspects selected memory regions and their source trajectories. The model then replaces the original entries with fewer, more abstract ones. Trained only on ScienceWorld data, the system beats fixed, RL-trained, and prompted memory baselines on ScienceWorld by 7 points while using an active memory bank 12× smaller than the strongest baseline, and the same consolidator continues to lead on held-out ALFWorld and WebArena without any retraining.

Core claim

The central claim is that an offline consolidator, trained end-to-end with agent task performance as the reward, can treat a typed memory region and its provenance-linked trajectories as read-only evidence, perform limited tool calls to inspect them, and synthesize a compact replacement set that abstracts recurring patterns and improves or maintains downstream performance.
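The mechanism in this claim can be made concrete with a small sketch. It is not the authors' implementation: the names `MemoryEntry`, `EvidenceTools`, `consolidate`, the `max_calls` budget, and the hand-written `merge_policy` (a stand-in for the learned consolidator) are all hypothetical. The sketch only illustrates three properties named above: the region is read-only evidence, tool calls are bounded, and the synthesized set supersedes the selected region.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Tuple


@dataclass(frozen=True)
class MemoryEntry:
    """One typed entry; frozen so a consolidator cannot mutate its evidence."""
    kind: str                    # "semantic" or "procedural"
    name: str
    body: str
    provenance: Tuple[str, ...]  # ids of source trajectories


class BudgetExceeded(RuntimeError):
    """Raised when the consolidator asks for more tool calls than allowed."""


class EvidenceTools:
    """Read-only inspection tools with a hard cap on the number of calls."""

    def __init__(self, region, trajectories, max_calls):
        self._region = tuple(region)
        self._trajectories = MappingProxyType(dict(trajectories))
        self._calls_left = max_calls

    def _spend(self):
        if self._calls_left <= 0:
            raise BudgetExceeded("tool-call budget exhausted")
        self._calls_left -= 1

    def read_entry(self, index):
        self._spend()
        return self._region[index]

    def read_trajectory(self, trajectory_id):
        self._spend()
        return self._trajectories[trajectory_id]


def consolidate(bank, region_idx, trajectories, policy, max_calls=8):
    """Return a new bank in which the selected region is superseded."""
    selected = set(region_idx)
    region = [bank[i] for i in region_idx]
    tools = EvidenceTools(region, trajectories, max_calls)
    replacement = list(policy(tools, len(region)))
    untouched = [e for i, e in enumerate(bank) if i not in selected]
    return untouched + replacement


def merge_policy(tools, region_size):
    """Hypothetical stand-in for the learned model: merge the region into one entry."""
    entries = [tools.read_entry(i) for i in range(region_size)]
    provenance = tuple(sorted({p for e in entries for p in e.provenance}))
    body = " / ".join(e.body for e in entries)
    return [MemoryEntry(entries[0].kind, "merged", body, provenance)]


# Toy data, invented for illustration only.
bank = [
    MemoryEntry("procedural", "heat-water-a", "turn on stove, then wait", ("t1",)),
    MemoryEntry("procedural", "heat-water-b", "activate stove, then wait", ("t2",)),
    MemoryEntry("semantic", "thermometer-location", "thermometer is in the kitchen", ("t3",)),
]
trajectories = {"t1": "session 1 log", "t2": "session 2 log", "t3": "session 3 log"}
new_bank = consolidate(bank, [0, 1], trajectories, merge_policy)
```

In this toy run the two near-duplicate procedural entries collapse into one entry that keeps both provenance links, while the original `bank` list is left untouched. A learned policy would decide what to read and what to write; here that decision is hard-coded.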

What carries the argument

The learned offline consolidator that uses bounded tool-use on provenance-linked trajectories to synthesize compact replacements for selected memory-bank regions.

If this is right

  • Agents achieve higher task success with an active memory bank twelve times smaller than the strongest baseline on ScienceWorld.
  • The same consolidator transfers to ALFWorld and WebArena without retraining and uses six times less memory than the strongest baseline on ALFWorld.
  • Decoupling acquisition from consolidation gives the agent a global view across sessions for discovering shared procedures.
  • End-to-end performance reward can teach the consolidator which abstractions are worth keeping.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The approach could extend to agents that maintain memories over much longer streams of tasks by periodically running the consolidator.
  • Typed memory banks with explicit provenance links may become a standard interface for any offline memory system.
  • Biological complementary-learning-systems ideas may translate to practical gains in other sequential decision domains beyond language agents.

Load-bearing premise

The typed memory structure and linked trajectories supply enough read-only evidence for the consolidator to create smaller replacements that preserve necessary details for future tasks.

What would settle it

An experiment in which an agent using the consolidated memory bank records lower success rates on the original training tasks than an agent using the unconsolidated bank.

Figures

Figures reproduced from arXiv: 2605.20616 by Chongrui Ye, Ge Liu, Haofei Yu, Jiaxuan You, Julian McAuley, Yining Zhao, Yu Wang, Yuxiang Liu.

Figure 1: Memory primitives and operations. (A) The memory bank B holds typed entries (semantic or procedural); each entry has a short name n_i, a body s_i, and provenance links to source trajectories in the trajectory log T. (B) The read operator retrieves the top-K entries by cosine similarity between a frozen sentence encoder ϕ applied to the query and to each entry's name-body text. (C) The write operator appli…
Figure 2: Auto-Dreamer overview. (A) A frozen writer appends typed entries from each trajectory τ_t to the memory bank B. (B) Every k sessions, the consolidator C_θ rewrites a working region R into a replacement set S via tool-use rollout over memory and provenance trajectories. (C) Training: G group rollouts produce candidates {S_g}, scored on evaluation tasks V; GRPO updates θ using reward r_g = R_V(S_g) + α r_cf(S_g; V)…
Figure 3: Memory efficiency, reward ablation, and consolidator analysis. (a) Auto-Dreamer lies on the Pareto frontier of task success versus retrieval-time memory cost. (b) Auto-Dreamer maintains a compact memory bank while most baseline methods grow monotonically as the task stream lengthens. (c,d) The counterfactual utility reward preserves task performance while bounding bank growth during training. (e) Provenanc…
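The read operator described in the Figure 1 caption (top-K entries by cosine similarity over each entry's name-body text) can be sketched as follows. The paper's frozen sentence encoder ϕ is not specified on this page, so `toy_encoder` below is a deliberately crude bag-of-words stand-in, and the bank contents are invented for illustration.

```python
import math
from collections import Counter


def toy_encoder(text):
    """Hypothetical stand-in for the frozen sentence encoder: word counts."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def read(bank, query, k, encoder=toy_encoder):
    """Return the names of the top-k entries, scored on 'name body' text."""
    q = encoder(query)
    scored = [(cosine(q, encoder(f"{name} {body}")), name) for name, body in bank]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Toy (name, body) entries, invented for illustration only.
bank = [
    ("boil water", "use the stove to heat water in a pot"),
    ("find thermometer", "the thermometer is usually in the kitchen"),
    ("grow plant", "put seed in soil and add water"),
]
top2 = read(bank, "heat water on the stove", k=2)
```

Because every entry is scored at read time, retrieval cost grows with the size of the active bank, which is one reason a smaller consolidated bank matters for the memory-cost axis in Figure 3.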
Original abstract

Language agents increasingly operate over streams of related tasks, yet existing memory systems struggle to convert accumulated experience into reusable knowledge. Retrieval-augmented and structured memory methods record per-session observations effectively, but often couple acquisition and consolidation into a single online process, leaving the agent without a global view across sessions to discover recurring patterns, abstract shared procedures, or prune redundant entries. Inspired by complementary learning systems theory, we propose Auto-Dreamer, a learned offline consolidator for language-agent memory. Auto-Dreamer decouples fast per-session memory acquisition from slow cross-session consolidation. Given a selected working region of a typed memory bank, the consolidator treats the region as read-only evidence, performs bounded tool-use to inspect entries and provenance-linked source trajectories, and synthesizes a fresh compact replacement set that abstracts across sessions and supersedes the original region. We train Auto-Dreamer via GRPO, using end-to-end agent performance as the reward signal to learn how to consolidate memories acquired through fast online experience. Trained on ScienceWorld trajectories alone, Auto-Dreamer outperforms fixed, RL-trained, and prompted memory baselines on ScienceWorld by 7 points while using an active memory bank 12× smaller than the strongest baseline, and continues to lead on held-out ALFWorld and WebArena without retraining, using 6× less memory than the strongest baseline on ALFWorld.
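As a rough illustration of the training signal, the sketch below combines the two reward terms that are visible in the (truncated) Figure 2 caption, r_g = R_V(S_g) + α r_cf(S_g; V), and then applies the group-relative normalization commonly used in GRPO, where each rollout's reward is centered and scaled by its group's statistics. Any terms cut off in the caption are not modeled. The value of `alpha`, the use of the population standard deviation, the `eps` constant, and all numbers are assumptions made for the example, not values from the paper.

```python
import statistics


def combined_reward(task_success, counterfactual_utility, alpha):
    """Visible part of the caption's reward: R_V(S_g) + alpha * r_cf(S_g; V)."""
    return task_success + alpha * counterfactual_utility


def group_relative_advantages(rewards, eps=1e-8):
    """Center and scale rewards within one group of rollouts."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std is an assumption here
    return [(r - mean) / (std + eps) for r in rewards]


# Made-up scores for a group of G = 4 candidate replacement sets.
task_success = [0.2, 0.4, 0.6, 0.8]
counterfactual = [0.4, 0.0, 0.4, 0.0]
rewards = [combined_reward(s, c, alpha=0.5) for s, c in zip(task_success, counterfactual)]
advantages = group_relative_advantages(rewards)
```

Candidates that score above their group's mean receive a positive advantage and the rest a negative one, so the update depends only on relative quality within the group rather than on an absolute reward scale.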

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Auto-Dreamer, a learned offline memory consolidator for language agents that decouples fast per-session acquisition from slow cross-session consolidation. The consolidator treats selected regions of a typed memory bank as read-only evidence, uses bounded tool calls to inspect entries and provenance-linked trajectories, and synthesizes compact replacements. It is trained via GRPO with end-to-end agent performance as the reward signal. Trained solely on ScienceWorld trajectories, the method reports a 7-point improvement over fixed, RL-trained, and prompted baselines while using a 12× smaller active memory bank, and maintains leadership on held-out ALFWorld and WebArena without retraining (6× memory reduction on ALFWorld).

Significance. If the results hold under rigorous controls, the work offers a concrete mechanism for offline memory consolidation in language agents, directly addressing the limitation of online methods that lack a global view across sessions. The use of GRPO with end-to-end performance reward and the explicit separation of acquisition and consolidation phases are notable strengths that avoid circular training signals and enable potential reuse across environments.

major comments (2)
  1. Experimental evaluation (transfer results): The headline generalization claim to ALFWorld and WebArena rests on the assumption that the GRPO-trained consolidator learns an environment-agnostic abstraction procedure. However, no ablation is reported that compares the trained consolidator against a zero-shot prompted consolidation baseline on the held-out domains. Without this control, the 6–12× memory reductions and performance gains could be explained by shared procedural patterns (object manipulation, state tracking) already present in the base LLM rather than the learned policy.
  2. Experimental evaluation (ScienceWorld results): The reported 7-point gain and 12× memory reduction are presented without details on run-to-run variance, number of seeds, statistical significance testing, or the precise protocol for evaluating post-consolidation agent trajectories. These omissions make it impossible to assess whether the gains are robust or sensitive to evaluation choices.
minor comments (2)
  1. The description of the typed memory bank and provenance links would benefit from a concrete example (e.g., a short table or figure) showing an original region, the tool-use steps, and the synthesized replacement set.
  2. Notation for the working region selection and the bounded tool-use budget is introduced without a formal definition or pseudocode; adding these would improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, agreeing where the manuscript can be strengthened and outlining specific revisions.

Point-by-point responses
  1. Referee: Experimental evaluation (transfer results): The headline generalization claim to ALFWorld and WebArena rests on the assumption that the GRPO-trained consolidator learns an environment-agnostic abstraction procedure. However, no ablation is reported that compares the trained consolidator against a zero-shot prompted consolidation baseline on the held-out domains. Without this control, the 6–12× memory reductions and performance gains could be explained by shared procedural patterns (object manipulation, state tracking) already present in the base LLM rather than the learned policy.

    Authors: We agree that an explicit zero-shot prompted consolidation baseline on the held-out domains would provide stronger evidence that the performance and memory gains stem from the learned GRPO policy rather than base LLM capabilities. The current manuscript already evaluates a prompted consolidation baseline on ScienceWorld (where the trained model outperforms it) and demonstrates that the ScienceWorld-trained consolidator leads on ALFWorld and WebArena without any retraining. To directly address the concern, we will add the requested ablation in the revised manuscript, applying the identical zero-shot prompted consolidator to the held-out environments and reporting the resulting performance and memory sizes for comparison. revision: yes

  2. Referee: Experimental evaluation (ScienceWorld results): The reported 7-point gain and 12× memory reduction are presented without details on run-to-run variance, number of seeds, statistical significance testing, or the precise protocol for evaluating post-consolidation agent trajectories. These omissions make it impossible to assess whether the gains are robust or sensitive to evaluation choices.

    Authors: We acknowledge that these experimental details are important for evaluating robustness. The manuscript reports mean improvements, but we will expand the experimental section in the revision to specify the number of random seeds (5), include standard deviations, report statistical significance (paired t-tests against baselines), and provide a precise description of the post-consolidation evaluation protocol, including how working memory regions are selected, how agent trajectories are generated and scored after consolidation, and any fixed hyperparameters used during evaluation. revision: yes
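The paired t-test mentioned in this response can be sketched in a few lines. The scores below are made-up placeholders, not results from the paper; they only show the arithmetic for five seeds where each seed yields one score for the method and one for a baseline. The function name `paired_t_statistic` is hypothetical.

```python
import math
import statistics


def paired_t_statistic(method_scores, baseline_scores):
    """Return (t, degrees_of_freedom) for per-seed paired differences."""
    if len(method_scores) != len(baseline_scores):
        raise ValueError("paired samples must have equal length")
    diffs = [m - b for m, b in zip(method_scores, baseline_scores)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    standard_error = sd_diff / math.sqrt(n)
    return mean_diff / standard_error, n - 1


# Placeholder numbers for illustration only; NOT taken from the paper.
toy_method = [60.0, 62.0, 61.0, 63.0, 64.0]
toy_baseline = [55.0, 54.0, 56.0, 55.0, 55.0]
t_value, dof = paired_t_statistic(toy_method, toy_baseline)
```

The resulting statistic would be compared against a t distribution with `dof` degrees of freedom to obtain a p-value. Pairing by seed removes seed-level variation that both systems share, which is why it suits a setup where method and baseline run on the same task streams.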

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external reward signal and held-out evaluation

Full rationale

The paper trains Auto-Dreamer via GRPO using end-to-end agent performance on ScienceWorld trajectories as the explicit reward signal, then evaluates the resulting consolidator directly on task success metrics against fixed, RL-trained, and prompted baselines. Generalization claims to ALFWorld and WebArena are measured on held-out environments without retraining, using the same external performance metric rather than any internally fitted quantity. No self-definitional loops, fitted-input predictions, or load-bearing self-citations appear in the described chain; the typed memory bank and provenance links function as read-only inputs to a learned policy whose quality is validated externally.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the assumption that a typed memory bank with provenance allows bounded inspection to produce useful abstractions; no explicit free parameters or invented entities are named in the abstract.

axioms (1)
  • Domain assumption: Complementary learning systems theory provides a useful analogy for separating fast acquisition from slow consolidation in artificial agents.
    The abstract states that the method is inspired by this theory.



