{"paper":{"title":"EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"LLM agents can improve long-term memory by letting an LLM module diagnose retrieval failures and autonomously adjust the system's own configuration.","cross_cats":["cs.AI"],"primary_cat":"cs.LG","authors_text":"Cihang Xie, Huaxiu Yao, Jiaqi Liu, Mingyu Ding, Peng Xia, Xinyu Ye, Zeyu Zheng","submitted_at":"2026-05-13T17:12:44Z","abstract_excerpt":"Long-term memory is essential for LLM agents that operate across multiple sessions, yet existing memory systems treat retrieval infrastructure as fixed: stored content evolves while scoring functions, fusion strategies, and answer-generation policies remain frozen at deployment. We argue that truly adaptive memory requires co-evolution at two levels: the stored knowledge and the retrieval mechanism that queries it. We present EvolveMem, a self-evolving memory architecture that exposes its full retrieval configuration as a structured action space optimized by an LLM-powered diagnosis module. 
In"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"EvolveMem outperforms the strongest baseline by 25.7% relative on LoCoMo and 18.9% on MemBench; evolved configurations transfer positively across benchmarks, indicating capture of universal retrieval principles.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The LLM diagnosis module can reliably identify root causes of retrieval failures and propose configuration changes that improve performance without introducing new biases or regressions that the safeguards fail to catch.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"EvolveMem enables autonomous self-evolution of LLM memory retrieval configurations via LLM diagnosis and safeguards, delivering 25.7% gains over strong baselines on LoCoMo and 18.9% on MemBench with positive cross-benchmark transfer.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"LLM agents can improve long-term memory by letting an LLM module diagnose retrieval failures and autonomously adjust the system's own configuration.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"2ff3c5bb2b034fbdaed71f252fa802ea5416d47f769eef906db60f0a7a31c2e9"},"source":{"id":"2605.13941","kind":"arxiv","version":1},"verdict":{"id":"f93c35fc-a09d-4c4c-b1d7-68043ba538e9","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T04:45:05.843429Z","strongest_claim":"EvolveMem outperforms the strongest baseline by 25.7% relative on LoCoMo and 18.9% on MemBench; evolved configurations transfer positively across benchmarks, indicating capture of universal retrieval principles.","one_line_summary":"EvolveMem enables autonomous self-evolution 
of LLM memory retrieval configurations via LLM diagnosis and safeguards, delivering 25.7% gains over strong baselines on LoCoMo and 18.9% on MemBench with positive cross-benchmark transfer.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The LLM diagnosis module can reliably identify root causes of retrieval failures and propose configuration changes that improve performance without introducing new biases or regressions that the safeguards fail to catch.","pith_extraction_headline":"LLM agents can improve long-term memory by letting an LLM module diagnose retrieval failures and autonomously adjust the system's own configuration."},"references":{"count":76,"sample":[{"doi":"","year":2024,"title":"Self-rag: Learning to retrieve, generate, and critique through self-reflection","work_id":"116adb18-9402-490d-85f6-b8674bbab21d","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Self-play fine-tuning converts weak language models to strong language models","work_id":"b72620da-95af-40a5-8d2e-5c3cd254ded3","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory","work_id":"a5aed26c-a248-48b6-a59e-f7693fcb180a","ref_index":3,"cited_arxiv_id":"2504.19413","is_internal_anchor":true},{"doi":"","year":null,"title":"Über das gedächtnis: Untersuchungen zur experimentellen psychologie","work_id":"37ac3cb9-a8bc-4852-8fd1-ca3b11a7fb8a","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super 
Intelligence","work_id":"f5de9511-98bf-411c-9eba-e8b7a914ec18","ref_index":5,"cited_arxiv_id":"2507.21046","is_internal_anchor":true}],"resolved_work":76,"snapshot_sha256":"b780da0963989cf05efdcfc13e57565ffcbb288f83641bda2b3717abed87502e","internal_anchors":13},"formal_canon":{"evidence_count":2,"snapshot_sha256":"426a1c15ea1408a26a5c5467d22242b6ccbfce9fe009e08d8f0ae63de5888485"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}