{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2025:RZ7367QXH25YUOKYP43PJPHK2O","short_pith_number":"pith:RZ7367QX","schema_version":"1.0","canonical_sha256":"8e7fbf7e173ebb8a39587f36f4bcead394069ddcc3d7107926de73ed48948a0c","source":{"kind":"arxiv","id":"2511.20857","version":1},"attestation_state":"computed","paper":{"title":"Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory","license":"http://creativecommons.org/licenses/by/4.0/","headline":"LLM agents achieve continual improvement on streaming tasks by using the ReMem pipeline to integrate reasoning, actions, and memory updates.","cross_cats":["cs.AI"],"primary_cat":"cs.CL","authors_text":"Benjamin Coleman, Chi Wang, Derek Zhiyuan Cheng, Ed H. Chi, Fernando Pereira, Jingrui He, Mengting Ai, Noveen Sachdeva, Shuo Chen, Tianxin Wei, Wang-Cheng Kang, Xuying Ning, Yuanchen Bei, Yunzhe Li, Zhankui He","submitted_at":"2025-11-25T21:08:07Z","abstract_excerpt":"Statefulness is essential for large language model (LLM) agents to perform long-term planning and problem-solving. This makes memory a critical component, yet its management and evolution remain largely underexplored. Existing evaluations mostly focus on static conversational settings, where memory is passively retrieved from dialogue to answer queries, overlooking the dynamic ability to accumulate and reuse experience across evolving task streams. In real-world environments such as interactive problem assistants or embodied agents, LLMs are required to handle continuous task streams, yet ofte"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2511.20857","kind":"arxiv","version":1},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.CL","submitted_at":"2025-11-25T21:08:07Z","cross_cats_sorted":["cs.AI"],"title_canon_sha256":"0d818bd916402e6653c573575779d522ab3b6ccd03e7edcf7f20c510c04d1e7e","abstract_canon_sha256":"91510619ee77a1e8bb1dbe58ace5f7ed30ee2190b6f2fa018506d5ca6f3c0544"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:39:19.897856Z","signature_b64":"N2Dh92hhcQCcpTGW7kj9eE9Vs7ANziGrEmsbXs95gEmQ/Jye9lIwXnMY7KHA4bzOMSUcZdRzSwWwjGBX/R4sDQ==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"8e7fbf7e173ebb8a39587f36f4bcead394069ddcc3d7107926de73ed48948a0c","last_reissued_at":"2026-05-17T23:39:19.896925Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:39:19.896925Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory","license":"http://creativecommons.org/licenses/by/4.0/","headline":"LLM agents achieve continual improvement on streaming tasks by using the ReMem pipeline to integrate reasoning, actions, and memory updates.","cross_cats":["cs.AI"],"primary_cat":"cs.CL","authors_text":"Benjamin Coleman, Chi Wang, Derek Zhiyuan Cheng, Ed H. Chi, Fernando Pereira, Jingrui He, Mengting Ai, Noveen Sachdeva, Shuo Chen, Tianxin Wei, Wang-Cheng Kang, Xuying Ning, Yuanchen Bei, Yunzhe Li, Zhankui He","submitted_at":"2025-11-25T21:08:07Z","abstract_excerpt":"Statefulness is essential for large language model (LLM) agents to perform long-term planning and problem-solving. This makes memory a critical component, yet its management and evolution remain largely underexplored. Existing evaluations mostly focus on static conversational settings, where memory is passively retrieved from dialogue to answer queries, overlooking the dynamic ability to accumulate and reuse experience across evolving task streams. In real-world environments such as interactive problem assistants or embodied agents, LLMs are required to handle continuous task streams, yet ofte"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"ReMem, an action-think-memory refine pipeline, tightly integrates reasoning, task actions, and memory updates to achieve continual improvement in LLM agents on streaming tasks.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the chosen sequential task streams and the implemented memory modules faithfully capture the dynamics of real-world continuous interactions where memory evolution is required, without hidden implementation biases affecting the comparisons.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Evo-Memory is a new benchmark for self-evolving memory in LLM agents across task streams, with baseline ExpRAG and proposed ReMem method that integrates reasoning, actions, and memory updates for continual improvement.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"LLM agents achieve continual improvement on streaming tasks by using the ReMem pipeline to integrate reasoning, actions, and memory updates.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"0ea7554e0285eee97172779bf5efa5883de6325821b2fa6e9aa4987c291f81aa"},"source":{"id":"2511.20857","kind":"arxiv","version":1},"verdict":{"id":"af7e0477-7a68-4bdb-a637-7436043acc6f","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T23:08:31.892284Z","strongest_claim":"ReMem, an action-think-memory refine pipeline, tightly integrates reasoning, task actions, and memory updates to achieve continual improvement in LLM agents on streaming tasks.","one_line_summary":"Evo-Memory is a new benchmark for self-evolving memory in LLM agents across task streams, with baseline ExpRAG and proposed ReMem method that integrates reasoning, actions, and memory updates for continual improvement.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the chosen sequential task streams and the implemented memory modules faithfully capture the dynamics of real-world continuous interactions where memory evolution is required, without hidden implementation biases affecting the comparisons.","pith_extraction_headline":"LLM agents achieve continual improvement on streaming tasks by using the ReMem pipeline to integrate reasoning, actions, and memory updates."},"references":{"count":299,"sample":[{"doi":"","year":2009,"title":"Measuring Massive Multitask Language Understanding","work_id":"e87ec49a-544b-4ec8-8991-75298c64ff5e","ref_index":1,"cited_arxiv_id":"2009.03300","is_internal_anchor":true},{"doi":"","year":null,"title":"International Conference on Learning Representations (ICLR) , year=","work_id":"1852f1a8-2303-4108-a8a5-0562f7716a9f","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Advances in Neural Information Processing Systems (NeurIPS) , year=","work_id":"0cb97455-c4bf-4962-a363-31b7fd9dc41b","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Advances in Neural Information Processing Systems (NeurIPS) , year=","work_id":"fda20f90-227f-46ae-9d68-9c841c704211","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"International Conference on Machine Learning (ICML) , year=","work_id":"98f812e7-24ab-4f7b-a3df-b17d84a7b2e4","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":299,"snapshot_sha256":"053a4e2e41893da11f3db45f136b055dc708cc66c3dd725bf6e47a3ff4a38303","internal_anchors":36},"formal_canon":{"evidence_count":1,"snapshot_sha256":"ab70ab64680b2b0ec733a08592583ffb6ace64537130a4bf27dc69f776abcc09"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2511.20857","created_at":"2026-05-17T23:39:19.897095+00:00"},{"alias_kind":"arxiv_version","alias_value":"2511.20857v1","created_at":"2026-05-17T23:39:19.897095+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2511.20857","created_at":"2026-05-17T23:39:19.897095+00:00"},{"alias_kind":"pith_short_12","alias_value":"RZ7367QXH25Y","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"RZ7367QXH25YUOKY","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"RZ7367QX","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":37,"internal_anchor_count":37,"sample":[{"citing_arxiv_id":"2602.06470","citing_title":"Improve Large Language Model Systems with User Logs","ref_index":40,"is_internal_anchor":true},{"citing_arxiv_id":"2604.08216","citing_title":"MemCoT: Test-Time Scaling through Memory-Driven Chain-of-Thought","ref_index":38,"is_internal_anchor":true},{"citing_arxiv_id":"2605.20616","citing_title":"Auto-Dreamer: Learning Offline Memory Consolidation for Language Agents","ref_index":33,"is_internal_anchor":true},{"citing_arxiv_id":"2605.07358","citing_title":"A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications","ref_index":148,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18421","citing_title":"EvoMemBench: Benchmarking Agent Memory from a Self-Evolving Perspective","ref_index":35,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18747","citing_title":"Code as Agent Harness","ref_index":202,"is_internal_anchor":true},{"citing_arxiv_id":"2605.17721","citing_title":"EXG: Self-Evolving Agents with Experience Graphs","ref_index":31,"is_internal_anchor":true},{"citing_arxiv_id":"2605.15384","citing_title":"Is One Score Enough? Rethinking the Evaluation of Sequentially Evolving LLM Memory","ref_index":35,"is_internal_anchor":true},{"citing_arxiv_id":"2601.12538","citing_title":"Agentic Reasoning for Large Language Models","ref_index":25,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13050","citing_title":"Context Training with Active Information Seeking","ref_index":60,"is_internal_anchor":true},{"citing_arxiv_id":"2605.07594","citing_title":"MemCompiler: Compile, Don't Inject -- State-Conditioned Memory for Embodied Agents","ref_index":40,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13941","citing_title":"EvolveMem:Self-Evolving Memory Architecture via AutoResearch for LLM Agents","ref_index":31,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14477","citing_title":"Test-Time Learning with an Evolving Library","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2604.03295","citing_title":"Scaling Teams or Scaling Time? Memory Enabled Lifelong Learning in LLM Multi-Agent Systems","ref_index":11,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13050","citing_title":"Context Training with Active Information Seeking","ref_index":60,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13542","citing_title":"RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation","ref_index":32,"is_internal_anchor":true},{"citing_arxiv_id":"2604.03512","citing_title":"ActionNex: A Virtual Outage Manager for Cloud Computing","ref_index":13,"is_internal_anchor":true},{"citing_arxiv_id":"2605.06130","citing_title":"Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning","ref_index":22,"is_internal_anchor":true},{"citing_arxiv_id":"2604.26805","citing_title":"Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations","ref_index":48,"is_internal_anchor":true},{"citing_arxiv_id":"2605.10913","citing_title":"Shepherd: A Runtime Substrate Empowering Meta-Agents with a Formalized Execution Trace","ref_index":42,"is_internal_anchor":true},{"citing_arxiv_id":"2605.09315","citing_title":"Do Self-Evolving Agents Forget? Capability Degradation and Preservation in Lifelong LLM Agent Adaptation","ref_index":19,"is_internal_anchor":true},{"citing_arxiv_id":"2605.10268","citing_title":"MemReread: Enhancing Agentic Long-Context Reasoning via Memory-Guided Rereading","ref_index":36,"is_internal_anchor":true},{"citing_arxiv_id":"2605.10064","citing_title":"MAGE: Multi-Agent Self-Evolution with Co-Evolutionary Knowledge Graphs","ref_index":20,"is_internal_anchor":true},{"citing_arxiv_id":"2604.26805","citing_title":"Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations","ref_index":48,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08704","citing_title":"AgentPSO: Evolving Agent Reasoning Skill via Multi-agent Particle Swarm Optimization","ref_index":42,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":1,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/RZ7367QXH25YUOKYP43PJPHK2O","json":"https://pith.science/pith/RZ7367QXH25YUOKYP43PJPHK2O.json","graph_json":"https://pith.science/api/pith-number/RZ7367QXH25YUOKYP43PJPHK2O/graph.json","events_json":"https://pith.science/api/pith-number/RZ7367QXH25YUOKYP43PJPHK2O/events.json","paper":"https://pith.science/paper/RZ7367QX"},"agent_actions":{"view_html":"https://pith.science/pith/RZ7367QXH25YUOKYP43PJPHK2O","download_json":"https://pith.science/pith/RZ7367QXH25YUOKYP43PJPHK2O.json","view_paper":"https://pith.science/paper/RZ7367QX","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2511.20857&json=true","fetch_graph":"https://pith.science/api/pith-number/RZ7367QXH25YUOKYP43PJPHK2O/graph.json","fetch_events":"https://pith.science/api/pith-number/RZ7367QXH25YUOKYP43PJPHK2O/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/RZ7367QXH25YUOKYP43PJPHK2O/action/timestamp_anchor","attest_storage":"https://pith.science/pith/RZ7367QXH25YUOKYP43PJPHK2O/action/storage_attestation","attest_author":"https://pith.science/pith/RZ7367QXH25YUOKYP43PJPHK2O/action/author_attestation","sign_citation":"https://pith.science/pith/RZ7367QXH25YUOKYP43PJPHK2O/action/citation_signature","submit_replication":"https://pith.science/pith/RZ7367QXH25YUOKYP43PJPHK2O/action/replication_record"}},"created_at":"2026-05-17T23:39:19.897095+00:00","updated_at":"2026-05-17T23:39:19.897095+00:00"}