{"total":23,"items":[{"citing_arxiv_id":"2605.21768","ref_index":18,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Memory-R2: Fair Credit Assignment for Long-Horizon Memory-Augmented LLM Agents","primary_cat":"cs.LG","submitted_at":"2026-05-20T22:02:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Memory-R2 proposes LoGo-GRPO to fix unfair trajectory comparisons in RL training of memory-augmented LLM agents by combining global end-to-end rewards with local rerollouts from identical memory states.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20616","ref_index":31,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Auto-Dreamer: Learning Offline Memory Consolidation for Language Agents","primary_cat":"cs.CL","submitted_at":"2026-05-20T02:03:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Auto-Dreamer trains an offline memory consolidator via GRPO on agent performance to abstract cross-session patterns, outperforming baselines by 7 points on ScienceWorld with 12x smaller memory and generalizing to ALFWorld and WebArena.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17065","ref_index":43,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"PyraVid: Hierarchical Multimodal Memory for Long-Horizon Video Reasoning","primary_cat":"cs.MA","submitted_at":"2026-05-16T16:15:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"PyraVid is a hierarchical multimodal memory system that structures long videos into pyramids to improve long-horizon reasoning and evidence 
aggregation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14133","ref_index":36,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents","primary_cat":"cs.AI","submitted_at":"2026-05-13T21:34:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ClawForge is a generator framework that creates reproducible executable benchmarks for command-line agents under state conflict, with ClawForge-Bench showing frontier models reach at most 45.3% strict accuracy and that state inspection drives most performance gaps.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12493","ref_index":95,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues","primary_cat":"cs.CL","submitted_at":"2026-05-12T17:59:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LongMemEval-V2 is a new benchmark where AgentRunbook-C reaches 72.5% accuracy on long-term agent memory tasks, beating RAG baselines at 48.5% and basic coding agents at 69.3%.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12039","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"SkillGraph: Skill-Augmented Reinforcement Learning for Agents via Evolving Skill Graphs","primary_cat":"cs.CL","submitted_at":"2026-05-12T12:21:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SkillGraph represents skills as nodes in an evolving directed graph with typed dependency edges and 
updates the graph from RL trajectories to boost compositional task performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11814","ref_index":33,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare","primary_cat":"cs.AI","submitted_at":"2026-05-12T09:06:40+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for personalized healthcare.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"agent memory is not merely a conversational enhancer; it is the critical infrastructure required to maintain evolving patient profiles, medication histories, and strict safety constraints. Recent works have made notable progress in knowledge graph enhanced memory [7, 10, 25, 29], dedicated memory frameworks [5, 19, 22, 37], and reinforcement learning-optimized memory [33, 38-40], yet these approaches have been predominantly validated in general domains such as daily conversations, and remain insufficiently examined in large-scale, high-risk medical dialogue scenarios. The medical domain imposes uniquely stringent requirements that shatter the fundamental assumptions of general-purpose memory systems. 
First, medical inquiries demand exceptional precision; semantically similar complaints,"},{"citing_arxiv_id":"2605.10488","ref_index":23,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"DeepRefine: Agent-Compiled Knowledge Refinement via Reinforcement Learning","primary_cat":"cs.CL","submitted_at":"2026-05-11T12:48:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DeepRefine refines agent-compiled knowledge bases via multi-turn abductive diagnosis and RL training with a GBD reward, yielding consistent downstream task gains.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09874","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding","primary_cat":"cs.CV","submitted_at":"2026-05-11T01:59:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"EgoMemReason is a new benchmark showing that even the best multimodal models achieve only 39.6% accuracy on reasoning tasks that require integrating sparse evidence across days in egocentric video.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.08374","ref_index":37,"ref_count":3,"confidence":0.9,"is_internal_anchor":true,"paper_title":"MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs","primary_cat":"cs.AI","submitted_at":"2026-05-08T18:30:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MemQ improves LLM agent performance by using eligibility traces over provenance DAGs to assign credit to dependent memories, achieving top success rates on six benchmarks with largest gains on 
complex multi-step tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.04811","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Tree-based Credit Assignment for Multi-Agent Memory System","primary_cat":"cs.MA","submitted_at":"2026-05-06T12:02:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TreeMem assigns credit to agents in multi-agent memory systems by expanding outputs into a tree and using Monte Carlo averaging of final rewards to optimize each agent's policy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.00702","ref_index":159,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory","primary_cat":"cs.CL","submitted_at":"2026-05-01T14:45:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.15877","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Experience Compression Spectrum: Unifying Memory, Skills, and Rules in LLM Agents","primary_cat":"cs.AI","submitted_at":"2026-04-17T09:26:25+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"The Experience Compression Spectrum unifies memory, skills, and rules in LLM agents along increasing compression levels and identifies the absence of adaptive cross-level 
compression as the missing diagonal.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.14029","ref_index":46,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"POINTS-Seeker: Towards Training a Multimodal Agentic Search Model from Scratch","primary_cat":"cs.CV","submitted_at":"2026-04-15T16:09:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"POINTS-Seeker-8B is an 8B multimodal model trained from scratch for agentic search that uses seeding and visual-space history folding to outperform prior models on six visual reasoning benchmarks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"search model from scratch, specifically investigating whether injecting agentic-related data during the model's formative stages yields superior performance. Agent memory. Continuous environment interaction often leads to context explosion, which can easily exceed the context window limitations. To address this, various memory management strategies [6,46,52] have been proposed. For example, MemAgent [58] proposes maintaining a fixed-length memory that is proactively and selectively updated by the model. AgentFold [57] introduces a folding mechanism to intelligently fold segments of context during task execution. More recently, cross-modal compression [10,35] has emerged as a potent alternative. 
DeepSeek-OCR [47] proposes that images can serve as an effective medium"},{"citing_arxiv_id":"2604.10110","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Trust Your Memory: Verifiable Control of Smart Homes through Reinforcement Learning with Multi-dimensional Rewards","primary_cat":"cs.AI","submitted_at":"2026-04-11T09:08:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Introduces MemHome benchmark and RL with multi-dimensional rewards for memory-driven smart home device control.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08256","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"HyperMem: Hypergraph Memory for Long-Term Conversations","primary_cat":"cs.CL","submitted_at":"2026-04-09T13:43:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"HyperMem is a hypergraph memory architecture that groups related conversation episodes and facts via hyperedges and reports 92.73% LLM-as-a-judge accuracy on the LoCoMo benchmark.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08224","ref_index":152,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering","primary_cat":"cs.SE","submitted_at":"2026-04-09T13:19:41+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them 
reliably.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.07894","ref_index":71,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"TSUBASA: Improving Long-Horizon Personalization via Evolving Memory and Self-Learning with Context Distillation","primary_cat":"cs.CL","submitted_at":"2026-04-09T07:04:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TSUBASA improves long-horizon personalization in LLMs via dynamic memory evolution for writing and context-distillation self-learning for reading, outperforming Mem0 and Memory-R1 on Qwen-3 benchmarks while reducing token use.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.12631","ref_index":30,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Joint Optimization of Multi-agent Memory System","primary_cat":"cs.MA","submitted_at":"2026-03-13T04:04:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CoMAM jointly optimizes agents in multi-agent LLM memory systems via end-to-end RL and adaptive credit assignment to improve collaboration and performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.13933","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"HyMem: Hybrid Memory Architecture with Dynamic Retrieval Scheduling","primary_cat":"cs.AI","submitted_at":"2026-02-15T00:06:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"HyMem introduces dual-granular memory storage with a lightweight summary module for fast responses and selective activation of a deep LLM module for complex queries, outperforming 
full-context baselines by 92.6% lower computational cost on LOCOMO and LongMemEval benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2601.21468","ref_index":24,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning","primary_cat":"cs.AI","submitted_at":"2026-01-29T09:47:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"MemOCR renders structured memory as images with adaptive visual density to improve long-horizon reasoning under tight context budgets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2601.05488","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"MemBuilder: Reinforcing LLMs for Long-Term Memory Construction via Attributed Dense Rewards","primary_cat":"cs.CL","submitted_at":"2026-01-09T02:44:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MemBuilder trains 4B-parameter models with attributed dense rewards to outperform closed-source baselines on long-term dialogue memory tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.02547","ref_index":135,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"The Landscape of Agentic Reinforcement Learning for LLMs: A Survey","primary_cat":"cs.AI","submitted_at":"2025-09-02T17:46:26+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 
papers.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Memory Token [131], both of which explicitly preserve a pool of natural-language memory representations. More frequently, works like ReSum [132], context folding [133] has also explored RL for context memory management. The second form is (II) implicit tokens, where memory is maintained in the form of latent embeddings. A representative line of work includes MemoryLLM [134] and M+ [135], in which a fixed set of latent tokens serves as \"memory tokens.\" As the context evolves, these tokens are repeatedly retrieved, integrated into the LLM's forward computation, and updated, thereby preserving contextual information and exhibiting strong resistance to forgetting. Unlike explicit tokens, these memory tokens are not tied Table 3: An overview of three classic categories of agent memory; works marked with † directly employ RL."}],"limit":50,"offset":0}