{"paper":{"title":"MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"MemAgent lets LLMs handle millions of tokens by segmenting input and overwriting memory after RL training on 32K texts.","cross_cats":["cs.AI","cs.LG"],"primary_cat":"cs.CL","authors_text":"Hao Zhou, Hongli Yu, Jiangjie Chen, Jiangtao Feng, Jingjing Liu, Mingxuan Wang, Qiying Yu, Tinghong Chen, Weinan Dai, Wei-Ying Ma, Ya-Qin Zhang","submitted_at":"2025-07-03T03:11:50Z","abstract_excerpt":"Despite improvements by length extrapolation, efficient attention and memory modules, handling infinitely long documents with linear complexity without performance degradation during extrapolation remains the ultimate challenge in long-text processing. We directly optimize for long-text tasks in an end-to-end fashion and introduce a novel agent workflow, MemAgent, which reads text in segments and updates the memory using an overwrite strategy. We extend the DAPO algorithm to facilitate training via independent-context multi-conversation generation. MemAgent has demonstrated superb long-context"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"MemAgent has demonstrated superb long-context capabilities, being able to extrapolate from an 8K context trained on 32K text to a 3.5M QA task with performance loss < 5% and achieves 95%+ in 512K RULER test.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the overwrite memory strategy combined with multi-conversation RL training will continue to prevent performance degradation when scaling far beyond the 32K training length without additional mechanisms or data.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"MemAgent uses multi-conversation RL to train a memory agent that reads text in segments and overwrites memory, extrapolating from 8K training to 3.5M token QA with under 5% loss and 95%+ on 512K RULER.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"MemAgent lets LLMs handle millions of tokens by segmenting input and overwriting memory after RL training on 32K texts.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"37d6df6170b4e6d89a70129ae69baaeb847622fd5e1472e0ab5d8c9f2783130d"},"source":{"id":"2507.02259","kind":"arxiv","version":1},"verdict":{"id":"49cf8c9c-1fb9-4c13-b384-d5862d5733c3","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T11:11:50.392302Z","strongest_claim":"MemAgent has demonstrated superb long-context capabilities, being able to extrapolate from an 8K context trained on 32K text to a 3.5M QA task with performance loss < 5% and achieves 95%+ in 512K RULER test.","one_line_summary":"MemAgent uses multi-conversation RL to train a memory agent that reads text in segments and overwrites memory, extrapolating from 8K training to 3.5M token QA with under 5% loss and 95%+ on 512K RULER.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the overwrite memory strategy combined with multi-conversation RL training will continue to prevent performance degradation when scaling far beyond the 32K training length without additional mechanisms or data.","pith_extraction_headline":"MemAgent lets LLMs handle millions of tokens by segmenting input and overwriting memory after RL training on 32K texts."},"references":{"count":65,"sample":[{"doi":"","year":2024,"title":"RULER: What's the Real Context Size of Your Long-Context Language Models?","work_id":"c0bc4689-3ce8-4e3d-9442-bd74869445bb","ref_index":1,"cited_arxiv_id":"2404.06654","is_internal_anchor":true},{"doi":"","year":2018,"title":"HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering","work_id":"c87d7e5f-b81a-41c8-beca-f0b9d598aae4","ref_index":2,"cited_arxiv_id":"1809.09600","is_internal_anchor":true},{"doi":"","year":2024,"title":"Learning to reason with llms","work_id":"16f7723e-0b48-4125-885f-310de5bf2c28","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Gemini 2.0 flash thinking, 2024","work_id":"ae6b4683-11f7-4205-91f3-a116f2c49f43","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Grok 3 beta — the age of reasoning agents, 2024","work_id":"21963273-aa23-44a9-be08-9bd3fd2bc191","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":65,"snapshot_sha256":"ab37c378c53f205646ae8c52b2abcc0c15c8085d63d5d0d0c88ec0179c9d0268","internal_anchors":33},"formal_canon":{"evidence_count":2,"snapshot_sha256":"57dc516578b067252ffe60da63e7f3913ced605f33dcf1ea6b02242da768c781"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}