hub Canonical reference

MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent

Hongli Yu, Tinghong Chen, Jiangtao Feng, Jiangjie Chen, Weinan Dai, Qiying Yu · 2025 · cs.CL · arXiv 2507.02259

Canonical reference. 100% of citing Pith papers cite this work as background.

37 Pith papers citing it

Background 100% of classified citations

open full Pith review browse 37 citing papers arXiv PDF

abstract

Despite improvements by length extrapolation, efficient attention and memory modules, handling infinitely long documents with linear complexity without performance degradation during extrapolation remains the ultimate challenge in long-text processing. We directly optimize for long-text tasks in an end-to-end fashion and introduce a novel agent workflow, MemAgent, which reads text in segments and updates the memory using an overwrite strategy. We extend the DAPO algorithm to facilitate training via independent-context multi-conversation generation. MemAgent has demonstrated superb long-context capabilities, being able to extrapolate from an 8K context trained on 32K text to a 3.5M QA task with performance loss < 5% and achieves 95%+ in 512K RULER test.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 6

citation-polarity summary

background 6

representative citing papers

MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare

cs.AI · 2026-05-12 · conditional · novelty 8.0

MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for personalized healthcare.

Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty

cs.CL · 2026-05-12 · unverdicted · novelty 8.0

Agent-BRACE improves LLM agent performance on long-horizon partially observable tasks by 5.3-14.5% through a decoupled belief state of verbalized atomic claims with certainty labels that keeps context length constant.

Co-ReAct: Rubrics as Step-Level Collaborators for ReAct Agents

cs.AI · 2026-05-22 · unverdicted · novelty 7.0

Co-ReAct adds step-level rubric guidance to ReAct agents via a GRPO-trained generator using list-wise ranking rewards, yielding consistent gains on DeepResearchBench and SQA-CS-V2.

SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States

cs.CL · 2026-05-06 · unverdicted · novelty 7.0

SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.

Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

cs.CL · 2026-05-01 · unverdicted · novelty 7.0

MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.

PRIME: Training Free Proactive Reasoning via Iterative Memory Evolution for User-Centric Agent

cs.AI · 2026-04-08 · unverdicted · novelty 7.0

PRIME enables agents to proactively reason in user-centric tasks by iteratively evolving structured memories from interaction trajectories without gradient-based training.

PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments

cs.AI · 2026-03-24 · unverdicted · novelty 7.0

PERMA is a new benchmark using temporally ordered events, text variability, and linguistic alignment to evaluate LLM memory agents on persona consistency beyond simple retrieval.

CLAG: Adaptive Memory Organization via Agent-Driven Clustering for Small Language Model Agents

cs.CL · 2026-03-16 · unverdicted · novelty 7.0

CLAG organizes agent memory into clusters via an SLM router and uses cluster profiles for two-stage retrieval, yielding better answer quality on QA benchmarks than prior memory systems.

LMEB: Long-horizon Memory Embedding Benchmark

cs.CL · 2026-03-13 · unverdicted · novelty 7.0

LMEB benchmark shows that embedding models' performance on traditional retrieval does not transfer to long-horizon memory tasks, larger models do not always perform better, and LMEB measures capabilities orthogonal to MTEB.

PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records

cs.AI · 2026-01-14 · conditional · novelty 7.0

PersonalAlign introduces a hierarchical memory agent that uses long-term user records to resolve vague GUI instructions and provide proactive assistance, improving execution by 15.7% and proactive performance by 7.3% on the new AndroidIntent benchmark.

MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning

cs.CL · 2025-11-04 · unverdicted · novelty 7.0

MemSearcher trains LLMs to manage compact memory in multi-turn searches via multi-context GRPO for end-to-end RL, outperforming ReAct-style baselines with stable token counts.

Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions

cs.CL · 2025-07-07 · unverdicted · novelty 7.0

MemoryAgentBench is a new multi-turn benchmark assessing four memory competencies in LLM agents—accurate retrieval, test-time learning, long-range understanding, and selective forgetting—showing that existing methods fall short.

Mem-$\pi$: Adaptive Memory through Learning When and What to Generate

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

Mem-π is a framework using a dedicated model and decision-content decoupled RL to generate context-specific guidance on demand for LLM agents, outperforming retrieval baselines by over 30% on web navigation.

Auto-Dreamer: Learning Offline Memory Consolidation for Language Agents

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

Auto-Dreamer trains an offline memory consolidator via GRPO on agent performance to abstract cross-session patterns, outperforming baselines by 7 points on ScienceWorld with 12x smaller memory and generalizing to ALFWorld and WebArena.

PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents

cs.AI · 2026-05-19 · unverdicted · novelty 6.0

PEEK maintains a constant-sized context map via a programmable cache policy to give LLM agents persistent orientation knowledge about recurring external contexts, yielding 6-34% gains and lower cost than prior prompt-learning methods.

EvoMemBench: Benchmarking Agent Memory from a Self-Evolving Perspective

cs.CL · 2026-05-18 · unverdicted · novelty 6.0

EvoMemBench evaluates 15 memory methods for LLM agents and finds long-context baselines competitive with no single memory approach working consistently across settings.

LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

Context-ReAct enables agents to dynamically manage context via five atomic operations, and LongSeeker fine-tuned on 10k trajectories achieves 61.5% and 62.5% on BrowseComp benchmarks, outperforming prior agents.

An Agentic Approach to Metadata Reasoning

cs.DB · 2026-04-22 · unverdicted · novelty 6.0

Metadata Reasoner uses agentic LLM reasoning on metadata to select sufficient and minimal data sources, achieving 83.16% F1 on KramaBench and 85.5% F1 on noisy synthetic benchmarks while avoiding low-quality tables 99% of the time.

POINTS-Seeker: Towards Training a Multimodal Agentic Search Model from Scratch

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

POINTS-Seeker-8B is an 8B multimodal model trained from scratch for agentic search that uses seeding and visual-space history folding to outperform prior models on six visual reasoning benchmarks.

MEMENTO: Teaching LLMs to Manage Their Own Context

cs.AI · 2026-04-10 · unverdicted · novelty 6.0

MEMENTO trains LLMs to segment reasoning into blocks, generate mementos as dense summaries, and reason forward using only mementos and KV states, cutting peak KV cache by ~2.5x while preserving benchmark accuracy.

In-Place Test-Time Training

cs.LG · 2026-04-07 · conditional · novelty 6.0

In-Place TTT adapts LLM MLP projection matrices at test time with a next-token-aligned objective and chunk-wise updates, enabling better long-context performance as a drop-in enhancement.

Decocted Experience Improves Test-Time Inference in LLM Agents

cs.AI · 2026-04-06 · unverdicted · novelty 6.0

Decocted experience—extracting and organizing the essence from accumulated interactions—enables more effective context construction that improves test-time inference in LLM agents on math, web, and software tasks.

LightThinker++: From Reasoning Compression to Memory Management

cs.CL · 2026-04-04 · unverdicted · novelty 6.0

LightThinker++ adds explicit adaptive memory management and a trajectory synthesis pipeline to LLM reasoning, cutting peak token use by ~70% while gaining accuracy in standard and long-horizon agent tasks.

GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning

cs.AI · 2026-04-03 · unverdicted · novelty 6.0

GrandCode is the first AI system to consistently beat all human participants and place first in live Codeforces competitive programming contests.

citing papers explorer

Showing 37 of 37 citing papers.

MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare cs.AI · 2026-05-12 · conditional · none · ref 39 · internal anchor
MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for personalized healthcare.
Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty cs.CL · 2026-05-12 · unverdicted · none · ref 21 · internal anchor
Agent-BRACE improves LLM agent performance on long-horizon partially observable tasks by 5.3-14.5% through a decoupled belief state of verbalized atomic claims with certainty labels that keeps context length constant.
Co-ReAct: Rubrics as Step-Level Collaborators for ReAct Agents cs.AI · 2026-05-22 · unverdicted · none · ref 13 · internal anchor
Co-ReAct adds step-level rubric guidance to ReAct agents via a GRPO-trained generator using list-wise ranking rewards, yielding consistent gains on DeepResearchBench and SQA-CS-V2.
SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States cs.CL · 2026-05-06 · unverdicted · none · ref 42 · internal anchor
SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.
Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory cs.CL · 2026-05-01 · unverdicted · none · ref 172 · internal anchor
MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.
PRIME: Training Free Proactive Reasoning via Iterative Memory Evolution for User-Centric Agent cs.AI · 2026-04-08 · unverdicted · none · ref 23 · internal anchor
PRIME enables agents to proactively reason in user-centric tasks by iteratively evolving structured memories from interaction trajectories without gradient-based training.
PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments cs.AI · 2026-03-24 · unverdicted · none · ref 78 · internal anchor
PERMA is a new benchmark using temporally ordered events, text variability, and linguistic alignment to evaluate LLM memory agents on persona consistency beyond simple retrieval.
CLAG: Adaptive Memory Organization via Agent-Driven Clustering for Small Language Model Agents cs.CL · 2026-03-16 · unverdicted · none · ref 7 · internal anchor
CLAG organizes agent memory into clusters via an SLM router and uses cluster profiles for two-stage retrieval, yielding better answer quality on QA benchmarks than prior memory systems.
LMEB: Long-horizon Memory Embedding Benchmark cs.CL · 2026-03-13 · unverdicted · none · ref 38 · internal anchor
LMEB benchmark shows that embedding models' performance on traditional retrieval does not transfer to long-horizon memory tasks, larger models do not always perform better, and LMEB measures capabilities orthogonal to MTEB.
PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records cs.AI · 2026-01-14 · conditional · none · ref 5 · internal anchor
PersonalAlign introduces a hierarchical memory agent that uses long-term user records to resolve vague GUI instructions and provide proactive assistance, improving execution by 15.7% and proactive performance by 7.3% on the new AndroidIntent benchmark.
MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning cs.CL · 2025-11-04 · unverdicted · none · ref 36 · internal anchor
MemSearcher trains LLMs to manage compact memory in multi-turn searches via multi-context GRPO for end-to-end RL, outperforming ReAct-style baselines with stable token counts.
Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions cs.CL · 2025-07-07 · unverdicted · none · ref 46 · internal anchor
MemoryAgentBench is a new multi-turn benchmark assessing four memory competencies in LLM agents—accurate retrieval, test-time learning, long-range understanding, and selective forgetting—showing that existing methods fall short.
Mem-$\pi$: Adaptive Memory through Learning When and What to Generate cs.CL · 2026-05-20 · unverdicted · none · ref 55 · internal anchor
Mem-π is a framework using a dedicated model and decision-content decoupled RL to generate context-specific guidance on demand for LLM agents, outperforming retrieval baselines by over 30% on web navigation.
Auto-Dreamer: Learning Offline Memory Consolidation for Language Agents cs.CL · 2026-05-20 · unverdicted · none · ref 39 · internal anchor
Auto-Dreamer trains an offline memory consolidator via GRPO on agent performance to abstract cross-session patterns, outperforming baselines by 7 points on ScienceWorld with 12x smaller memory and generalizing to ALFWorld and WebArena.
PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents cs.AI · 2026-05-19 · unverdicted · none · ref 48 · internal anchor
PEEK maintains a constant-sized context map via a programmable cache policy to give LLM agents persistent orientation knowledge about recurring external contexts, yielding 6-34% gains and lower cost than prior prompt-learning methods.
EvoMemBench: Benchmarking Agent Memory from a Self-Evolving Perspective cs.CL · 2026-05-18 · unverdicted · none · ref 40 · internal anchor
EvoMemBench evaluates 15 memory methods for LLM agents and finds long-context baselines competitive with no single memory approach working consistently across settings.
LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents cs.AI · 2026-05-06 · unverdicted · none · ref 15 · internal anchor
Context-ReAct enables agents to dynamically manage context via five atomic operations, and LongSeeker fine-tuned on 10k trajectories achieves 61.5% and 62.5% on BrowseComp benchmarks, outperforming prior agents.
An Agentic Approach to Metadata Reasoning cs.DB · 2026-04-22 · unverdicted · none · ref 41 · internal anchor
Metadata Reasoner uses agentic LLM reasoning on metadata to select sufficient and minimal data sources, achieving 83.16% F1 on KramaBench and 85.5% F1 on noisy synthetic benchmarks while avoiding low-quality tables 99% of the time.
POINTS-Seeker: Towards Training a Multimodal Agentic Search Model from Scratch cs.CV · 2026-04-15 · unverdicted · none · ref 58 · internal anchor
POINTS-Seeker-8B is an 8B multimodal model trained from scratch for agentic search that uses seeding and visual-space history folding to outperform prior models on six visual reasoning benchmarks.
MEMENTO: Teaching LLMs to Manage Their Own Context cs.AI · 2026-04-10 · unverdicted · none · ref 35 · internal anchor
MEMENTO trains LLMs to segment reasoning into blocks, generate mementos as dense summaries, and reason forward using only mementos and KV states, cutting peak KV cache by ~2.5x while preserving benchmark accuracy.
In-Place Test-Time Training cs.LG · 2026-04-07 · conditional · none · ref 65 · internal anchor
In-Place TTT adapts LLM MLP projection matrices at test time with a next-token-aligned objective and chunk-wise updates, enabling better long-context performance as a drop-in enhancement.
Decocted Experience Improves Test-Time Inference in LLM Agents cs.AI · 2026-04-06 · unverdicted · none · ref 4 · internal anchor
Decocted experience—extracting and organizing the essence from accumulated interactions—enables more effective context construction that improves test-time inference in LLM agents on math, web, and software tasks.
LightThinker++: From Reasoning Compression to Memory Management cs.CL · 2026-04-04 · unverdicted · none · ref 72 · internal anchor
LightThinker++ adds explicit adaptive memory management and a trajectory synthesis pipeline to LLM reasoning, cutting peak token use by ~70% while gaining accuracy in standard and long-horizon agent tasks.
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning cs.AI · 2026-04-03 · unverdicted · none · ref 37 · internal anchor
GrandCode is the first AI system to consistently beat all human participants and place first in live Codeforces competitive programming contests.
MemFactory: Unified Inference & Training Framework for Agent Memory cs.CL · 2026-03-31 · unverdicted · none · ref 17 · internal anchor
MemFactory is a new unified modular framework for memory-augmented LLM agent inference and training that integrates GRPO and reports up to 14.8% relative gains on MemAgent evaluations.
MICA: Multi-granularity Intertemporal Credit Assignment for Long-Horizon Emotional Support Dialogue cs.CL · 2026-03-06 · unverdicted · none · ref 59 · internal anchor
MICA combines incremental per-turn distance rewards and Monte Carlo returns from a shared potential function over user support states to create a mixed advantage signal that enables stable multi-turn RL optimization for emotional support dialogues.
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens cs.CL · 2026-03-06 · unverdicted · none · ref 49 · internal anchor
MSA is an end-to-end trainable memory model using sparse attention and document-wise RoPE that scales to 100M tokens with linear complexity and less than 9% degradation.
HyMem: Hybrid Memory Architecture with Dynamic Retrieval Scheduling cs.AI · 2026-02-15 · unverdicted · none · ref 47 · internal anchor
HyMem introduces dual-granular memory storage with a lightweight summary module for fast responses and selective activation of a deep LLM module for complex queries, outperforming full-context baselines by 92.6% lower computational cost on LOCOMO and LongMemEval benchmarks.
Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory cs.CL · 2025-11-25 · unverdicted · none · ref 140 · internal anchor
Evo-Memory is a new streaming benchmark and evaluation framework for self-evolving memory in LLM agents, unifying over ten memory modules and introducing the ReMem pipeline for continual improvement on multi-turn and reasoning datasets.
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey cs.AI · 2025-09-02 · accept · none · ref 137 · internal anchor
Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.
Memory-R2: Fair Credit Assignment for Long-Horizon Memory-Augmented LLM Agents cs.LG · 2026-05-20 · unverdicted · none · ref 25 · internal anchor
Memory-R2 proposes LoGo-GRPO to fix unfair trajectory comparisons in RL training of memory-augmented LLM agents by combining global end-to-end rewards with local rerollouts from identical memory states.
MiA-Signature: Approximating Global Activation for Long-Context Understanding cs.CL · 2026-05-07 · unverdicted · none · ref 36 · internal anchor
MiA-Signature approximates the global activation state induced by a query via submodular concept selection to enable tractable long-context understanding in LLMs.
LongAct: Harnessing Intrinsic Activation Patterns for Long-Context Reinforcement Learning cs.LG · 2026-04-16 · unverdicted · none · ref 31 · internal anchor
LongAct uses saliency from high-magnitude activations to guide sparse weight updates in long-context RL, yielding about 8% gains on LongBench v2 across multiple algorithms.
MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning cs.AI · 2026-01-29 · unverdicted · none · ref 32 · internal anchor
MemOCR renders structured memory as images with adaptive visual density to improve long-horizon reasoning under tight context budgets.
xOffense: An Autonomous Multi-Agent Framework for Penetration Testing with Domain-Adapted Large Language Models cs.CR · 2025-09-16 · unverdicted · none · ref 40 · internal anchor
xOffense automates penetration testing via a fine-tuned Qwen3-32B LLM in a multi-agent setup with specialized agents for reconnaissance, vulnerability scanning, and exploitation, reporting 79.17% sub-task completion on AutoPenBench and AI-Pentest-Benchmark.
What Deserves Memory: Adaptive Memory Distillation for LLM Agents cs.AI · 2025-08-05 · unverdicted · none · ref 22 · internal anchor
NEMORI is an adaptive memory distillation framework for LLM agents that transforms raw interactions into narratives and extracts insights via prediction error to decide what deserves retention.
Understand and Accelerate Memory Processing Pipeline for Large Language Model Inference cs.DC · 2026-03-30 · unreviewed · ref 29 · internal anchor

MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer