MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory

Shengtao Zhang, Jiaqian Wang, Ruiwen Zhou, Junwei Liao, Yuchen Feng, Zhuo Li · 2026 · cs.CL · arXiv 2601.03192

Mixed citation behavior. The most common role is background (60% of classified citations).

20 Pith papers cite it.

abstract

The hallmark of human intelligence is the self-evolving ability to master new skills by learning from past experiences. However, current AI agents struggle to emulate this self-evolution: fine-tuning is computationally expensive and prone to catastrophic forgetting, while existing memory-based methods rely on passive semantic matching that often retrieves noise. To address these challenges, we propose MemRL, a non-parametric approach that evolves via reinforcement learning on episodic memory. By decoupling stable reasoning from plastic memory, MemRL employs a Two-Phase Retrieval mechanism to filter noise and identify high-utility strategies through environmental feedback. Extensive experiments on HLE, BigCodeBench, ALFWorld, and Lifelong Agent Bench demonstrate that MemRL significantly outperforms state-of-the-art baselines, confirming that MemRL effectively reconciles the stability-plasticity dilemma, enabling continuous runtime improvement without weight updates. Code is available at https://github.com/MemTensor/MemRL.

citation-role summary

background 3 · baseline 2

citation-polarity summary

background 3 · baseline 2

representative citing papers

MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare

cs.AI · 2026-05-12 · conditional · novelty 8.0

MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for personalized healthcare.

EXG: Self-Evolving Agents with Experience Graphs

cs.AI · 2026-05-18 · unverdicted · novelty 7.0

EXG is an experience graph framework for self-evolving LLM agents that supports online real-time growth and offline reuse to enhance solution quality and efficiency on code generation and reasoning benchmarks.

Skills on the Fly: Test-Time Adaptive Skill Synthesis for LLM Agents

cs.CL · 2026-05-16 · unverdicted · novelty 7.0

SkillTTA synthesizes temporary task-specific skills from retrieved training trajectories to boost LLM agent Pass@1 scores on SpreadsheetBench and BigCodeBench without parameter updates.

ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents

cs.AI · 2026-05-13 · unverdicted · novelty 7.0 · 2 refs

ClawForge is a generator framework that creates reproducible executable benchmarks for command-line agents under state conflict, with ClawForge-Bench showing frontier models reach at most 45.3% strict accuracy and that state inspection drives most performance gaps.

EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

EvolveMem enables autonomous self-evolution of LLM memory retrieval configurations via LLM diagnosis and safeguards, delivering 25.7% gains over strong baselines on LoCoMo and 18.9% on MemBench with positive cross-benchmark transfer.

MAGE: Multi-Agent Self-Evolution with Co-Evolutionary Knowledge Graphs

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

MAGE uses a four-subgraph co-evolutionary knowledge graph plus dual bandits to externalize and retrieve experience for stable self-evolution of frozen language-model agents, showing gains on nine diverse benchmarks.

Auto-Dreamer: Learning Offline Memory Consolidation for Language Agents

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

Auto-Dreamer trains an offline memory consolidator via GRPO on agent performance to abstract cross-session patterns, outperforming baselines by 7 points on ScienceWorld with 12x smaller memory and generalizing to ALFWorld and WebArena.

SkillGraph: Skill-Augmented Reinforcement Learning for Agents via Evolving Skill Graphs

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

SkillGraph represents skills as nodes in an evolving directed graph with typed dependency edges and updates the graph from RL trajectories to boost compositional task performance.

Feedback-Normalized Developer Memory for Reinforcement-Learning Coding Agents: A Safety-Gated MCP Architecture

cs.SE · 2026-05-02 · unverdicted · novelty 6.0

RL Developer Memory is a feedback-normalized, safety-gated memory architecture for RL coding agents that logs contextual decisions and applies conservative off-policy gates to maintain 80% decision accuracy and full hard-negative suppression on a 200-case benchmark.

CreativeGame: Toward Mechanic-Aware Creative Game Generation

cs.AI · 2026-04-21 · unverdicted · novelty 6.0

CreativeGame enables iterative HTML5 game generation via mechanic-guided planning, lineage memory, runtime validation, and programmatic rewards to produce inspectable version-to-version mechanic evolution.

Holos: A Web-Scale LLM-Based Multi-Agent System for the Agentic Web

cs.AI · 2026-01-18 · unverdicted · novelty 6.0

Holos is a five-layer LLM-based multi-agent system architecture using the Nuwa engine for agent generation, a market-driven Orchestrator for coordination, and an endogenous value cycle for incentive-compatible persistence in the Agentic Web.

Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning

cs.LG · 2026-05-11 · unverdicted · novelty 5.0 · 2 refs

SLIM dynamically optimizes the active external skill set in agentic RL via leave-one-skill-out marginal contribution estimates and lifecycle operations, delivering a 7.1% average gain over baselines on ALFWorld and SearchQA while showing some skills remain externally useful.

Learning CLI Agents with Structured Action Credit under Selective Observation

cs.AI · 2026-05-08 · unverdicted · novelty 5.0

CLI agents trained with RL benefit from selective observation via σ-Reveal and structured credit assignment via A³ that leverages AST action sub-chains and trajectory margins.

MemReranker: Reasoning-Aware Reranking for Agent Memory Retrieval

cs.CL · 2026-05-07 · unverdicted · novelty 5.0 · 2 refs

MemReranker applies multi-stage distillation to Qwen3-Reranker to produce reasoning-aware rerankers that outperform baselines on memory tasks with temporal and causal constraints.

Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning

cs.AI · 2026-05-07 · unverdicted · novelty 5.0 · 3 refs

Skill1 trains a single RL policy to co-evolve skill selection, utilization, and distillation in language model agents from one task-outcome reward, using low-frequency trends to credit selection and high-frequency variation to credit distillation, outperforming baselines on ALFWorld and WebShop.

Forage V2: Knowledge Evolution and Transfer in Autonomous Agent Organizations

cs.AI · 2026-04-21 · unverdicted · novelty 5.0

Forage V2 enables agent organizations to grow knowledge from 0 to 54 entries over runs and transfer it so weaker models nearly match stronger ones in coverage, cost, and speed on open-world tasks.

Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

cs.SE · 2026-04-09 · accept · novelty 5.0

LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.

Improve Large Language Model Systems with User Logs

cs.CL · 2026-02-06 · unverdicted · novelty 5.0

UNO distills user logs into semi-structured rules and preferences, applies query-and-feedback clustering to handle heterogeneity, quantifies cognitive gaps to filter noise, and builds primary and reflective modules that outperform RAG and memory baselines.

LLM-Oriented Information Retrieval: A Denoising-First Perspective

cs.IR · 2026-05-01 · unverdicted · novelty 4.0 · 2 refs

Argues for a denoising-first paradigm in LLM-oriented information retrieval, framing challenges via a four-stage progression and providing a taxonomy of signal-to-noise optimization techniques across the pipeline.

Shepherd: A Runtime Substrate Empowering Meta-Agents with a Formalized Execution Trace

cs.AI · 2026-05-11

citing papers explorer

Showing 20 of 20 citing papers.

MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare · cs.AI · 2026-05-12 · conditional · none · ref 41 · internal anchor
EXG: Self-Evolving Agents with Experience Graphs · cs.AI · 2026-05-18 · unverdicted · none · ref 53 · internal anchor
Skills on the Fly: Test-Time Adaptive Skill Synthesis for LLM Agents · cs.CL · 2026-05-16 · unverdicted · none · ref 2 · internal anchor
ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents · cs.AI · 2026-05-13 · unverdicted · none · ref 35 · 2 links · internal anchor
EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents · cs.LG · 2026-05-13 · unverdicted · none · ref 40 · internal anchor
MAGE: Multi-Agent Self-Evolution with Co-Evolutionary Knowledge Graphs · cs.AI · 2026-05-11 · unverdicted · none · ref 32 · internal anchor
Auto-Dreamer: Learning Offline Memory Consolidation for Language Agents · cs.CL · 2026-05-20 · unverdicted · none · ref 41 · internal anchor
SkillGraph: Skill-Augmented Reinforcement Learning for Agents via Evolving Skill Graphs · cs.CL · 2026-05-12 · unverdicted · none · ref 21 · internal anchor
Feedback-Normalized Developer Memory for Reinforcement-Learning Coding Agents: A Safety-Gated MCP Architecture · cs.SE · 2026-05-02 · unverdicted · none · ref 38 · internal anchor
CreativeGame: Toward Mechanic-Aware Creative Game Generation · cs.AI · 2026-04-21 · unverdicted · none · ref 3 · internal anchor
Holos: A Web-Scale LLM-Based Multi-Agent System for the Agentic Web · cs.AI · 2026-01-18 · unverdicted · none · ref 44 · internal anchor
Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning · cs.LG · 2026-05-11 · unverdicted · none · ref 69 · 2 links · internal anchor
Learning CLI Agents with Structured Action Credit under Selective Observation · cs.AI · 2026-05-08 · unverdicted · none · ref 73 · internal anchor
MemReranker: Reasoning-Aware Reranking for Agent Memory Retrieval · cs.CL · 2026-05-07 · unverdicted · none · ref 8 · 2 links · internal anchor
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning · cs.AI · 2026-05-07 · unverdicted · none · ref 49 · 3 links · internal anchor
Forage V2: Knowledge Evolution and Transfer in Autonomous Agent Organizations · cs.AI · 2026-04-21 · unverdicted · none · ref 51 · internal anchor
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering · cs.SE · 2026-04-09 · accept · none · ref 188 · internal anchor
Improve Large Language Model Systems with User Logs · cs.CL · 2026-02-06 · unverdicted · none · ref 48 · internal anchor
LLM-Oriented Information Retrieval: A Denoising-First Perspective · cs.IR · 2026-05-01 · unverdicted · none · ref 229 · 2 links · internal anchor
Shepherd: A Runtime Substrate Empowering Meta-Agents with a Formalized Execution Trace · cs.AI · 2026-05-11 · unreviewed · ref 57 · internal anchor
