arxiv: 2605.28773 · v1 · pith:ZGSJA7BHnew · submitted 2026-05-27 · 💻 cs.CL · cs.AI · cs.LG · cs.MA · cs.MM

Rethinking Memory as Continuously Evolving Connectivity

Pith reviewed 2026-06-29 12:23 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.LG · cs.MA · cs.MM
keywords LLM agents · memory augmentation · heterogeneous graphs · evolving connectivity · agentic environments · dynamic memory · topology refinement

The pith

Memory in LLM agents works better when modeled as a connectivity-evolving heterogeneous graph refined across three stages instead of a static repository.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that fixed memory representations and retrieval methods break down in dynamic agent environments because feedback and task changes continuously alter what should be remembered and how elements connect. FluxMem counters this by representing memory as a heterogeneous graph and evolving its topology in three stages: forming initial connections, refining them based on feedback, and consolidating over the long term. A single metric tracks generalizability and evolutionary maturity while the system repairs missing links, removes interfering ones, matches abstraction levels, and turns repeated successes into reusable circuits. Results on LoCoMo, Mind2Web, and GAIA show consistent gains, indicating that treating memory as evolving connectivity supports stronger adaptation without relying on preset structures.

Core claim

FluxMem models memory as a heterogeneous graph and progressively refines its topology through initial connection formation, feedback-driven refinement, and long-term consolidation. It repairs missing links, prunes interference, aligns abstraction granularity, and distills recurrent successful trajectories into reusable procedural circuits, guided by one metric for memory generalizability and evolutionary maturity. The authors credit this design for state-of-the-art performance on LoCoMo, Mind2Web, and GAIA.
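
The three stages can be made concrete with a toy sketch. The Python below is an illustration written for this page, not the authors' implementation: the class `GraphMemory`, its method names, the word-overlap similarity, and every threshold are assumptions, since the abstract specifies none of them. It shows one plausible reading of forming links, reinforcing or pruning them from feedback, and distilling repeated successful trajectories into procedure nodes.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class GraphMemory:
    """Toy heterogeneous graph: typed nodes plus weighted undirected edges.

    Hypothetical illustration only; not the FluxMem implementation.
    """
    nodes: dict = field(default_factory=dict)  # node_id -> (node_type, text)
    edges: dict = field(default_factory=dict)  # frozenset({a, b}) -> weight

    def add_node(self, node_id, node_type, text, min_overlap=0.2):
        """Stage I (assumed): link a new node to lexically similar nodes."""
        words = set(text.lower().split())
        for other_id, (_, other_text) in self.nodes.items():
            other_words = set(other_text.lower().split())
            union = words | other_words
            overlap = len(words & other_words) / len(union) if union else 0.0
            if overlap >= min_overlap:
                self.edges[frozenset((node_id, other_id))] = overlap
        self.nodes[node_id] = (node_type, text)

    def refine(self, used_edges, success, step=0.2, prune_below=0.1):
        """Stage II (assumed): strengthen edges used in a successful step,
        weaken them after a failure, and drop edges that fall too low."""
        delta = step if success else -step
        for a, b in used_edges:
            key = frozenset((a, b))
            # A missing edge counts as weight 0, so a success creates ("repairs") it.
            weight = self.edges.get(key, 0.0) + delta
            if weight < prune_below:
                self.edges.pop(key, None)
            else:
                self.edges[key] = min(weight, 1.0)

    def consolidate(self, trajectories, min_count=2):
        """Stage III (assumed): store action sequences that succeeded at least
        min_count times as 'procedure' nodes and return their ids."""
        counts = Counter(tuple(actions) for actions, ok in trajectories if ok)
        created = []
        for actions, count in counts.items():
            if count >= min_count:
                node_id = "proc:" + "->".join(actions)
                self.nodes[node_id] = ("procedure", " ".join(actions))
                created.append(node_id)
        return created
```

In this sketch, `add_node` and `refine` would run at each agent step, while `consolidate` would run between episodes over a log of `(actions, succeeded)` pairs. The single guiding metric described in the paper is deliberately left out, because its definition is not available here.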

What carries the argument

The heterogeneous graph memory representation together with its three-stage progressive topology refinement process, guided by a single metric of generalizability and evolutionary maturity.

If this is right

  • Agents gain the ability to dynamically repair and prune memory connections in response to ongoing feedback and task variation.
  • Successful trajectories become reusable procedural circuits that reduce repeated computation in similar future tasks.
  • A single guiding metric for generalizability and maturity simplifies oversight of the memory evolution process.
  • Performance remains high across fundamentally different benchmarks that test complex agentic behavior.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same graph-evolution approach could extend to non-agent LLM uses such as maintaining coherent long-context reasoning without fixed retrieval rules.
  • If the three-stage process scales without added cost, it offers a path to reduce reliance on periodic full retraining when environments shift.
  • Treating memory links as the primary object of change rather than stored content alone might generalize to other sequential decision systems.

Load-bearing premise

Progressively refining the topology of a heterogeneous graph memory through three stages guided by one metric will deliver reliable adaptation gains without instability or excessive cost.
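
Whether refinement is stable is something one can measure directly. As a hedged illustration (not a procedure from the paper), the Python sketch below computes edge churn between consecutive snapshots of a memory graph as the Jaccard distance between edge sets. The function names and the snapshot format are assumptions; each edge is assumed to be stored in one canonical orientation, for example `(min_id, max_id)`.

```python
def edge_churn(before, after):
    """Jaccard distance between two edge sets: 0.0 means identical, 1.0 means disjoint."""
    before, after = set(before), set(after)
    union = before | after
    if not union:
        return 0.0  # two empty graphs are treated as unchanged
    return 1.0 - len(before & after) / len(union)


def churn_series(snapshots):
    """Churn between each pair of consecutive edge-set snapshots."""
    return [edge_churn(a, b) for a, b in zip(snapshots, snapshots[1:])]
```

A churn series that decays toward zero would suggest the topology is settling, while persistently high values after many tasks would point to the kind of instability the premise has to rule out.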

What would settle it

If FluxMem fails to reach state-of-the-art results or shows performance instability on any of the LoCoMo, Mind2Web, or GAIA benchmarks under the described conditions, the central claim would not hold.

Figures

Figures reproduced from arXiv: 2605.28773 by Baohua Dong, Buqiang Xu, Feiyu Xiong, Gang Yu, Guozhou Zheng, Hangcheng Zhu, Haofen Wang, Haoliang Cao, Huajun Chen, Jizhan Fang, Ningyu Zhang, Ruohui Huang, Xinle Deng, Ying Wei, Zhixian Wang.

Figure 1
Figure 1: The failures of static memory systems.
… agents, memory effectiveness ultimately depends on whether the most useful memories can be accessed at each decision step, as sufficiently useful memory context substantially improves subtask success. We formalize such usefulness as a problem of memory connectivity. Drawing from cognitive science (Hebb, 2005; Frankland and Bontempi, 2005), we define memory as the lon…
Figure 2
Figure 2: The FluxMem architecture. Stages I and II operate online at step-wise granularity for immediate performance optimization; Stage III is conducted offline for long-term memory consolidation.
… its full step-by-step trajectory $\tau_q = \{(o_t, a_t)\}_{t=1}^{T}$. The three layers are linked in a bottom-up order through two types of edges in $E$. First, during task execution, the agent retrieves relevan…
Figure 3
Figure 3: Detailed analysis of FluxMem components and evolution dynamics: (a) Ablation study of different stages
Figure 4
Figure 4: Case Study. The key points have been highlighted in red.
Original abstract

Existing memory-augmented LLM agents often treat memory as a static repository with pre-defined representations and fixed retrieval pipelines, which is brittle in dynamic agentic environments where feedback, task variation, and heterogeneous signals continuously reshape what should be remembered and how it should be connected. To address this, we propose FluxMem, a connectivity-evolving memory framework that models memory as a heterogeneous graph and progressively refines its topology through three stages: initial connection formation, feedback-driven refinement, and long-term consolidation. During execution, FluxMem repairs missing links, prunes interference, aligns abstraction granularity, and distills recurrent successful trajectories into reusable procedural circuits, guided by one metric for memory generalizability and evolutionary maturity. Across three fundamentally distinct benchmarks including LoCoMo, Mind2Web, and GAIA, FluxMem achieves consistent state-of-the-art performance, demonstrating strong adaptation and generalization in complex agentic environments. The code will be open-sourced in https://github.com/zjunlp/LightMem.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The manuscript proposes FluxMem, a memory-augmented LLM agent framework that models memory as a heterogeneous graph whose topology evolves continuously via three stages—initial connection formation, feedback-driven refinement, and long-term consolidation—while repairing links, pruning interference, aligning abstraction levels, and distilling trajectories. A single (unspecified) metric for generalizability and evolutionary maturity is said to guide all operations. The central empirical claim is consistent state-of-the-art performance across three distinct benchmarks (LoCoMo, Mind2Web, GAIA) demonstrating superior adaptation and generalization in dynamic agentic settings.

Significance. If the three-stage refinement mechanism and its guiding metric can be shown to produce stable gains without ad-hoc fitting or excessive cost, the work would offer a substantive alternative to static memory repositories in agent literature. The planned open-sourcing of code is a positive step toward reproducibility.

major comments (3)
  1. [Abstract] The claim that a single metric for 'memory generalizability and evolutionary maturity' reliably controls link repair, pruning, alignment, and distillation across three stages is load-bearing for the SOTA results, yet no definition, formula, or sensitivity analysis of this metric is supplied; without it the reported performance cannot be traced to the proposed mechanism.
  2. [Abstract] No ablation isolating the contribution of each of the three stages (initial formation, feedback-driven refinement, long-term consolidation) or quantifying instability introduced by pruning is presented, leaving open the possibility that observed gains arise from other unstated factors rather than the evolving-connectivity design.
  3. [Abstract] The benchmarks (LoCoMo, Mind2Web, GAIA) are described as 'fundamentally distinct,' but no baseline comparisons, metric definitions, or statistical significance tests are referenced, so it is impossible to verify that the 'consistent state-of-the-art' claim follows from the graph-evolution procedure.
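
The third comment asks for statistical significance tests. One common option for comparing two systems on the same task set is a paired bootstrap over per-task scores; the sketch below is a generic illustration of that technique, not something taken from the paper. The function name `paired_bootstrap`, its parameters, and the score lists are all hypothetical.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Resample tasks with replacement and count how often A fails to beat B.

    Returns (observed mean difference A - B, approximate one-sided p-value).
    scores_a[i] and scores_b[i] must be the two systems' scores on the same task.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty, equally long score lists")
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    n = len(scores_a)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = sum(diffs) / n
    not_better = 0
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        if sum(sample) / n <= 0:
            not_better += 1
    return observed, not_better / n_resamples
```

Pairing matters here because both systems are scored on identical tasks, so resampling the per-task differences removes task difficulty as a source of variance. With several benchmarks and baselines, the resulting p-values would still need a multiple-comparison correction.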

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to supply the requested details and analyses.

Point-by-point responses
  1. Referee: [Abstract] The claim that a single metric for 'memory generalizability and evolutionary maturity' reliably controls link repair, pruning, alignment, and distillation across three stages is load-bearing for the SOTA results, yet no definition, formula, or sensitivity analysis of this metric is supplied; without it the reported performance cannot be traced to the proposed mechanism.

    Authors: We agree that an explicit definition, formula, and sensitivity analysis of the guiding metric are required to trace performance to the mechanism. We will add these elements, including the precise formulation and sensitivity results, to the methods and experimental sections of the revised manuscript.
    Revision: yes

  2. Referee: [Abstract] No ablation isolating the contribution of each of the three stages (initial formation, feedback-driven refinement, long-term consolidation) or quantifying instability introduced by pruning is presented, leaving open the possibility that observed gains arise from other unstated factors rather than the evolving-connectivity design.

    Authors: We concur that stage-specific ablations and pruning instability analysis are necessary. We will incorporate these ablations and the associated instability quantification into the experiments section of the revised manuscript.
    Revision: yes

  3. Referee: [Abstract] The benchmarks (LoCoMo, Mind2Web, GAIA) are described as 'fundamentally distinct,' but no baseline comparisons, metric definitions, or statistical significance tests are referenced, so it is impossible to verify that the 'consistent state-of-the-art' claim follows from the graph-evolution procedure.

    Authors: We will expand the abstract and results discussion to explicitly reference the baseline comparisons, metric definitions, and statistical significance tests already computed on these benchmarks, thereby clarifying how the gains derive from the graph-evolution procedure.
    Revision: yes

Circularity Check

0 steps flagged

No significant circularity: empirical framework with no equations or derivations

full rationale

The paper presents FluxMem as a three-stage heterogeneous graph refinement process guided by an unspecified metric for generalizability, with performance claims on external benchmarks (LoCoMo, Mind2Web, GAIA). No equations, formal derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or description. The central claims are empirical and do not reduce by construction to inputs via self-definition or ansatz smuggling; they remain open to external validation or falsification.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5756 in / 1006 out tokens · 32898 ms · 2026-06-29T12:23:17.727115+00:00 · methodology

Reference graph

Works this paper leans on

69 extracted references · 55 canonical work pages · 30 internal anchors

  3. [3]

    Aadharsh Aadhithya A, Sachin Kumar S, and Soman K. P. 2024. https://arxiv.org/abs/2406.06124 Enhancing long-term memory using hierarchical aggregate tree for retrieval augmented generation. Preprint, arXiv:2406.06124

  4. [4]

    Huan-ang Gao, Jiayi Geng, Wenyue Hua, Mengkang Hu, Xinzhe Juan, Hongzhang Liu, Shilong Liu, Jiahao Qiu, Xuan Qi, Yiran Wu, Hongru Wang, Han Xiao, Yuhang Zhou, Shaokun Zhang, Jiayi Zhang, Jinyu Xiang, Yixiong Fang, Qiwen Zhao, Dongrui Liu, and 8 others. 2026. https://arxiv.org/abs/2507.21046 A survey of self-evolving agents: What, when, how, and where to e...

  5. [5]

    Ali Behrouz, Meisam Razaviyayn, Peilin Zhong, and Vahab Mirrokni. 2025a. https://arxiv.org/abs/2504.13173 It's all connected: A journey through test-time memorization, attentional bias, retention, and online optimization. Preprint, arXiv:2504.13173

  6. [6]

    Ali Behrouz, Meisam Razaviyayn, Peilin Zhong, and Vahab Mirrokni. 2025b. https://arxiv.org/abs/2512.24695 Nested learning: The illusion of deep learning architectures. Preprint, arXiv:2512.24695

  7. [7]

    Ali Behrouz, Peilin Zhong, and Vahab Mirrokni. 2024. https://arxiv.org/abs/2501.00663 Titans: Learning to memorize at test time. Preprint, arXiv:2501.00663

  8. [8]

    Zhicheng Cai, Xinyuan Guo, Yu Pei, Jiangtao Feng, Jinsong Su, Jiangjie Chen, Ya-Qin Zhang, Wei-Ying Ma, Mingxuan Wang, and Hao Zhou. 2025. https://arxiv.org/abs/2511.06449 Flex: Continuous agent evolution via forward learning from experience. Preprint, arXiv:2511.06449

  9. [9]

    Zouying Cao, Jiaji Deng, Li Yu, Weikang Zhou, Zhaoyang Liu, Bolin Ding, and Hai Zhao. 2025. https://arxiv.org/abs/2512.10696 Remember me, refine me: A dynamic procedural memory framework for experience-driven agent evolution. Preprint, arXiv:2512.10696

  10. [10]

    Ding Chen, Simin Niu, Kehang Li, Peng Liu, Xiangping Zheng, Bo Tang, Xinchi Li, Feiyu Xiong, and Zhiyu Li. 2026a. https://arxiv.org/abs/2511.03506 Halumem: Evaluating hallucinations in memory systems of agents. Preprint, arXiv:2511.03506

  11. [11]

    Yining Chen, Jihao Zhao, Bo Tang, Haofen Wang, Yue Zhang, Fei Huang, Feiyu Xiong, and Zhiyu Li. 2026b. https://arxiv.org/abs/2605.09530 Memprivacy: Privacy-preserving personalized memory management for edge-cloud agents. Preprint, arXiv:2605.09530

  12. [12]

    Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. 2025. https://arxiv.org/abs/2504.19413 Mem0: Building production-ready ai agents with scalable long-term memory. Preprint, arXiv:2504.19413

  13. [13]

    Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Sam Stevens, Boshi Wang, Huan Sun, and Yu Su. 2023. Mind2web: Towards a generalist agent for the web. Advances in Neural Information Processing Systems, 36:28091--28114

  14. [14]

    Jinyuan Fang, Yanwen Peng, Xi Zhang, Yingxu Wang, Xinhao Yi, Guibin Zhang, Yi Xu, Bin Wu, Siwei Liu, Zihao Li, Zhaochun Ren, Nikos Aletras, Xi Wang, Han Zhou, and Zaiqiao Meng. 2025a. https://arxiv.org/abs/2508.07407 A comprehensive survey of self-evolving ai agents: A new paradigm bridging foundation models and lifelong agentic systems. Preprint, arXi...

  15. [15]

    Jizhan Fang, Xinle Deng, Haoming Xu, Ziyan Jiang, Yuqi Tang, Ziwen Xu, Shumin Deng, Yunzhi Yao, Mengru Wang, Shuofei Qiao, and 1 others. 2025b. Lightmem: Lightweight and efficient memory-augmented generation. arXiv preprint arXiv:2510.18866

  16. [16]

    Runnan Fang, Yuan Liang, Xiaobin Wang, Jialong Wu, Shuofei Qiao, Pengjun Xie, Fei Huang, Huajun Chen, and Ningyu Zhang. 2026. https://arxiv.org/abs/2508.06433 Memp: Exploring agent procedural memory. Preprint, arXiv:2508.06433

  17. [17]

    Adam Fourney, Gagan Bansal, Hussein Mozannar, Cheng Tan, Eduardo Salinas, Erkang Zhu, Friederike Niedtner, Grace Proebsting, Griffin Bassman, Jack Gerrits, Jacob Alber, Peter Chang, Ricky Loynd, Robert West, Victor Dibia, Ahmed Awadallah, Ece Kamar, Rafah Hosn, and Saleema Amershi. 2024. https://arxiv.org/abs/2411.04468 Magentic-one: A generalist multi-a...

  18. [18]

    Paul W Frankland and Bruno Bontempi. 2005. The organization of recent and remote memories. Nature reviews neuroscience, 6(2):119--130

  19. [19]

    Bernal Jiménez Gutiérrez, Yiheng Shu, Yu Gu, Michihiro Yasunaga, and Yu Su. 2025a. https://arxiv.org/abs/2405.14831 Hipporag: Neurobiologically inspired long-term memory for large language models. Preprint, arXiv:2405.14831

  20. [20]

    Bernal Jiménez Gutiérrez, Yiheng Shu, Weijian Qi, Sizhe Zhou, and Yu Su. 2025b. https://arxiv.org/abs/2502.14802 From rag to memory: Non-parametric continual learning for large language models. Preprint, arXiv:2502.14802

  21. [21]

    Haoyu Han, Yu Wang, Harry Shomer, Kai Guo, Jiayuan Ding, Yongjia Lei, Mahantesh Halappanavar, Ryan A. Rossi, Subhabrata Mukherjee, Xianfeng Tang, Qi He, Zhigang Hua, Bo Long, Tong Zhao, Neil Shah, Amin Javari, Yinglong Xia, and Jiliang Tang. 2025. https://arxiv.org/abs/2501.00309 Retrieval-augmented generation with graphs (graphrag). Preprint, arXiv:2501.00309

  22. [22]

    Donald Olding Hebb. 2005. The organization of behavior: A neuropsychological theory. Psychology press

  23. [23]

    Chuanrui Hu, Xingze Gao, Zuyi Zhou, Dannong Xu, Yi Bai, Xintong Li, Hui Zhang, Tong Li, Chong Zhang, Lidong Bing, and 1 others. 2026a. Evermemos: A self-organizing memory operating system for structured long-horizon reasoning. arXiv preprint arXiv:2601.02163

  24. [24]

    Yuyang Hu, Shichun Liu, Yanwei Yue, Guibin Zhang, Boyang Liu, Fangyi Zhu, Jiahang Lin, Honglin Guo, Shihan Dou, Zhiheng Xi, Senjie Jin, Jiejun Tan, Yanbin Yin, Jiongnan Liu, Zeyu Zhang, Zhongxiang Sun, Yutao Zhu, Hao Sun, Boci Peng, and 28 others. 2026b. https://arxiv.org/abs/2512.13564 Memory in the age of ai agents. Preprint, arXiv:2512.13564

  25. [25]

    Bowen Jiang, Yuan Yuan, Maohao Shen, Zhuoqun Hao, Zhangchen Xu, Zichen Chen, Ziyi Liu, Anvesh Rao Vijjini, Jiashu He, Hanchao Yu, Radha Poovendran, Gregory Wornell, Lyle Ungar, Dan Roth, Sihao Chen, and Camillo Jose Taylor. 2025. https://arxiv.org/abs/2512.06688 Personamem-v2: Towards personalized intelligence via learning implicit user personas and agent...

  26. [26]

    Jiazheng Kang, Mingming Ji, Zhe Zhao, and Ting Bai. 2025. Memory os of ai agent. arXiv preprint arXiv:2506.06326

  27. [27]

    AM Clare Kelly and Hugh Garavan. 2005. Human functional neuroimaging of brain changes associated with practice. Cerebral cortex, 15(8):1089--1102

  28. [28]

    Yitao Liu, Chenglei Si, Karthik Narasimhan, and Shunyu Yao. 2025. https://arxiv.org/abs/2506.06698 Contextual experience replay for self-improvement of language agents. Preprint, arXiv:2506.06698

  29. [29]

    Lin Long, Yichen He, Wentao Ye, Yiyuan Pan, Yuan Lin, Hang Li, Junbo Zhao, and Wei Li. 2025. https://arxiv.org/abs/2508.09736 Seeing, listening, remembering, and reasoning: A multimodal agent with long-term memory. Preprint, arXiv:2508.09736

  30. [30]

    Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, and Yuwei Fang. 2024. Evaluating very long-term conversational memory of llm agents. arXiv preprint arXiv:2402.17753

  31. [31]

    Lingrui Mei, Jiayu Yao, Yuyao Ge, Yiwei Wang, Baolong Bi, Yujun Cai, Jiazhi Liu, Mingyu Li, Zhong-Zhi Li, Duzhen Zhang, Chenlin Zhou, Jiayi Mao, Tianze Xia, Jiafeng Guo, and Shenghua Liu. 2025. https://arxiv.org/abs/2507.13334 A survey of context engineering for large language models. Preprint, arXiv:2507.13334

  32. [32]

    Grégoire Mialon, Clémentine Fourrier, Thomas Wolf, Yann LeCun, and Thomas Scialom. 2023. Gaia: a benchmark for general ai assistants. In The Twelfth International Conference on Learning Representations

  33. [33]

    Jiayan Nan, Wenquan Ma, Wenlong Wu, and Yize Chen. 2025. Nemori: Self-organizing agent memory inspired by cognitive science. arXiv preprint arXiv:2508.03341

  34. [34]

    OpenAI. 2024. https://openai.com/index/introducing-deep-research/ Deep research

  35. [35]

    Siru Ouyang, Jun Yan, I-Hung Hsu, Yanfei Chen, Ke Jiang, Zifeng Wang, Rujun Han, Long T. Le, Samira Daruki, Xiangru Tang, Vishy Tirumalashetty, George Lee, Mahsan Rofouei, Hangfei Lin, Jiawei Han, Chen-Yu Lee, and Tomas Pfister. 2025. https://arxiv.org/abs/2509.25140 Reasoningbank: Scaling agent self-evolving with reasoning memory. Preprint, arXiv:2509.25140

  36. [36]

    Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, and Joseph E. Gonzalez. 2024. https://arxiv.org/abs/2310.08560 Memgpt: Towards llms as operating systems. Preprint, arXiv:2310.08560

  37. [37]

    Daiyi Peng. 2023. https://github.com/google/langfun Langfun

  38. [38]

    Shihao Qi, Jie Ma, Rui Xing, Wei Guo, Xiao Huang, Zhitao Gao, Jianhao Deng, Jun Liu, Lingling Zhang, Bifan Wei, Boqian Yang, Pinghui Wang, Jianwen Sun, Jing Tao, Yaqiang Wu, Hui Liu, Yu Yao, and Tongliang Liu. 2026. https://arxiv.org/abs/2605.14892 Beyond individual intelligence: Surveying collaboration, failure attribution, and self-evolution in llm-base...

  39. [39]

    Tianrui Qin, Qianben Chen, Sinuo Wang, He Xing, King Zhu, He Zhu, Dingfeng Shi, Xinxin Liu, Ge Zhang, Jiaheng Liu, Yuchen Eleanor Jiang, Xitong Gao, and Wangchunshu Zhou. 2025. https://arxiv.org/abs/2509.25301 Flash-searcher: Fast and effective web agents via dag-based parallel execution. Preprint, arXiv:2509.25301

  40. [40]

    Jiahao Qiu, Xuan Qi, Tongcheng Zhang, Xinzhe Juan, Jiacheng Guo, Yifu Lu, Yimin Wang, Zixin Yao, Qihan Ren, Xun Jiang, Xing Zhou, Dongrui Liu, Ling Yang, Yue Wu, Kaixuan Huang, Shilong Liu, Hongru Wang, and Mengdi Wang. 2025. https://arxiv.org/abs/2505.20286 Alita: Generalist agent enabling scalable agentic reasoning with minimal predefinition and maximal...

  41. [41]

    Preston Rasmussen, Pavlo Paliychuk, Travis Beauvais, Jack Ryan, and Daniel Chalef. 2025. https://arxiv.org/abs/2501.13956 Zep: A temporal knowledge graph architecture for agent memory. Preprint, arXiv:2501.13956

  42. [42]

    Aymeric Roucher, Albert Villanova del Moral, Thomas Wolf, Leandro von Werra, and Erik Kaunismäki. 2025. `smolagents`: a smol library to build great agentic systems. https://github.com/huggingface/smolagents

  43. [43]

    Yuchen Shi, Yuzheng Cai, Siqi Cai, Zihan Xu, Lichao Chen, Yulei Qin, Zhijian Zhou, Xiang Fei, Chaofan Qiu, Xiaoyu Tan, Gang Li, Zongyi Li, Haojia Lin, Guocan Cai, Yong Mao, Yunsheng Wu, Ke Li, and Xing Sun. 2025. https://arxiv.org/abs/2512.24615 Youtu-agent: Scaling agent productivity with automated generation and hybrid policy optimization. Preprint, ar...

  44. [44]

    Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. 2023. Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems, 36:8634--8652

  45. [45]

    Mirac Suzgun, Mert Yuksekgonul, Federico Bianchi, Dan Jurafsky, and James Zou. 2026. https://doi.org/10.18653/v1/2026.eacl-long.333 Dynamic cheatsheet: Test-time learning with adaptive memory. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7080--7106, Raba...

  46. [46]

    Xiangru Tang, Tianyu Hu, Muyang Ye, Yanjun Shao, Xunjian Yin, Siru Ouyang, Wangchunshu Zhou, Pan Lu, Zhuosheng Zhang, Yilun Zhao, Arman Cohan, and Mark Gerstein. 2025a. https://arxiv.org/abs/2501.06590 Chemagent: Self-updating library in large language models improves chemical reasoning. Preprint, arXiv:2501.06590

  47. [47]

    Xiangru Tang, Tianrui Qin, Tianhao Peng, Ziyang Zhou, Daniel Shao, Tingting Du, Xinming Wei, Peng Xia, Fang Wu, He Zhu, Ge Zhang, Jiaheng Liu, Xingyao Wang, Sirui Hong, Chenglin Wu, Hao Cheng, Chi Wang, and Wangchunshu Zhou. 2025b. https://arxiv.org/abs/2507.06229 Agent kb: Leveraging cross-domain experience for agentic problem solving. Preprint, arXiv...

  48. [48]

    Chenxi Wang, Zhuoyun Yu, Xin Xie, Wuguannan Yao, Runnan Fang, Shuofei Qiao, Kexin Cao, Guozhou Zheng, Xiang Qi, Peng Zhang, and Shumin Deng. 2026. https://arxiv.org/abs/2604.04804 Skillx: Automatically constructing skill knowledge bases for agents. Preprint, arXiv:2604.04804

  49. [49]

    Peng Wang, Zexi Li, Ningyu Zhang, Ziwen Xu, Yunzhi Yao, Yong Jiang, Pengjun Xie, Fei Huang, and Huajun Chen. 2024a. Wise: Rethinking the knowledge memory for lifelong model editing of large language models. Advances in Neural Information Processing Systems, 37:53764--53797

  50. [50]

    Yu Wang and Xi Chen. 2025. Mirix: Multi-agent memory system for llm-based agents. arXiv preprint arXiv:2507.07957

  51. [51]

    Yu Wang, Yifan Gao, Xiusi Chen, Haoming Jiang, Shiyang Li, Jingfeng Yang, Qingyu Yin, Zheng Li, Xian Li, Bing Yin, Jingbo Shang, and Julian McAuley. 2024b. https://arxiv.org/abs/2402.04624 Memoryllm: Towards self-updatable large language models. Preprint, arXiv:2402.04624

  52. [52]

    Zora Zhiruo Wang, Jiayuan Mao, Daniel Fried, and Graham Neubig. 2024c. Agent workflow memory. arXiv preprint arXiv:2409.07429

  53. [53]

    Tianxin Wei, Noveen Sachdeva, Benjamin Coleman, Zhankui He, Yuanchen Bei, Xuying Ning, Mengting Ai, Yunzhe Li, Jingrui He, Ed H. Chi, Chi Wang, Shuo Chen, Fernando Pereira, Wang-Cheng Kang, and Derek Zhiyuan Cheng. 2025. https://arxiv.org/abs/2511.20857 Evo-memory: Benchmarking llm agent test-time learning with self-evolving memory. Preprint, arXiv:2511.20857

  54. [54]

    Rong Wu, Xiaoman Wang, Jianbiao Mei, Pinlong Cai, Daocheng Fu, Cheng Yang, Licheng Wen, Xuemeng Yang, Yufan Shen, Yuxin Wang, and Botian Shi. 2025. https://arxiv.org/abs/2510.16079 Evolver: Self-evolving llm agents through an experience-driven lifecycle. Preprint, arXiv:2510.16079

  55. [55]

    Peng Xia, Kaide Zeng, Jiaqi Liu, Can Qin, Fang Wu, Yiyang Zhou, Caiming Xiong, and Huaxiu Yao. 2025. https://arxiv.org/abs/2511.16043 Agent0: Unleashing self-evolving agents from zero data via tool-integrated reasoning. Preprint, arXiv:2511.16043

  56. [56]

    Buqiang Xu, Yijun Chen, Jizhan Fang, Ruobin Zhong, Yunzhi Yao, Yuqi Zhu, Lun Du, and Shumin Deng. 2026. https://doi.org/10.48550/ARXIV.2604.21748 Structmem: Structured memory for long-horizon behavior in llms. CoRR, abs/2604.21748

  57. [57]

    Wujiang Xu, Kai Mei, Hang Gao, Juntao Tan, Zujie Liang, and Yongfeng Zhang. 2025. A-mem: Agentic memory for llm agents. arXiv preprint arXiv:2502.12110

  58. [58]

    Ke Yang, Zixi Chen, Xuan He, Jize Jiang, Michel Galley, Chenglong Wang, Jianfeng Gao, Jiawei Han, and ChengXiang Zhai. 2026. https://arxiv.org/abs/2603.03296 Plugmem: A task-agnostic plugin memory module for llm agents. Preprint, arXiv:2603.03296

  59. [59]

    Chongrui Ye, Yuxiang Liu, Yu Wang, Haofei Yu, Yining Zhao, Ge Liu, Julian McAuley, and Jiaxuan You. 2026. https://arxiv.org/abs/2605.20616 Auto-dreamer: Learning offline memory consolidation for language agents. Preprint, arXiv:2605.20616

  60. [60]

    Shicheng Ye, Chao Yu, Kaiqiang Ke, Chengdong Xu, and Yinqi Wei. 2025. https://arxiv.org/abs/2509.12810 H^2R: Hierarchical hindsight reflection for multi-task llm agents. Preprint, arXiv:2509.12810

  61. [61]

    Yunpeng Zhai, Shuchang Tao, Cheng Chen, Anni Zou, Ziqian Chen, Qingxu Fu, Shinji Mai, Li Yu, Jiaji Deng, Zouying Cao, Zhaoyang Liu, Bolin Ding, and Jingren Zhou. 2025. https://arxiv.org/abs/2511.10395 Agentevolver: Towards efficient self-evolving agent system. Preprint, arXiv:2511.10395

  62. [62]

    Guibin Zhang, Muxin Fu, Guancheng Wan, Miao Yu, Kun Wang, and Shuicheng Yan. 2025a. https://arxiv.org/abs/2506.07398 G-memory: Tracing hierarchical memory for multi-agent systems. Preprint, arXiv:2506.07398

  63. [63]

    Guibin Zhang, Haotian Ren, Chong Zhan, Zhenhong Zhou, Junhao Wang, He Zhu, Wangchunshu Zhou, and Shuicheng Yan. 2025b. https://arxiv.org/abs/2512.18746 Memevolve: Meta-evolution of agent memory systems. Preprint, arXiv:2512.18746

  64. [64]

    Haozhen Zhang, Quanyu Long, Jianzhu Bao, Tao Feng, Weizhi Zhang, Haodong Yue, and Wenya Wang. 2026a. https://arxiv.org/abs/2602.02474 Memskill: Learning and evolving memory skills for self-evolving agents. Preprint, arXiv:2602.02474

  65. [65]

    Shengtao Zhang, Jiaqian Wang, Ruiwen Zhou, Junwei Liao, Yuchen Feng, Weinan Zhang, Ying Wen, Zhiyu Li, Feiyu Xiong, Yutao Qi, Bo Tang, and Muning Wen. 2026b. https://arxiv.org/abs/2601.03192 Memrl: Self-evolving agents via runtime reinforcement learning on episodic memory. Preprint, arXiv:2601.03192

  66. [66]

    Zeyu Zhang, Quanyu Dai, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen, Jieming Zhu, Zhenhua Dong, and Ji-Rong Wen. 2025c. A survey on the memory mechanism of large language model-based agents. ACM Transactions on Information Systems, 43(6):1--47

  67. [67]

    Andrew Zhao, Daniel Huang, Quentin Xu, Matthieu Lin, Yong-Jin Liu, and Gao Huang. 2024. Expel: Llm agents are experiential learners. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 19632--19642

  68. [68]

    Huichi Zhou, Yihang Chen, Siyuan Guo, Xue Yan, Kin Hei Lee, Zihan Wang, Ka Yiu Lee, Guchun Zhang, Kun Shao, Linyi Yang, and Jun Wang. 2025. https://arxiv.org/abs/2508.16153 Memento: Fine-tuning llm agents without fine-tuning llms. Preprint, arXiv:2508.16153

  69. [69]

    Adam Zweiger, Jyothish Pari, Han Guo, Ekin Akyürek, Yoon Kim, and Pulkit Agrawal. 2025. https://arxiv.org/abs/2506.10943 Self-adapting language models. Preprint, arXiv:2506.10943