SEARL: Joint Optimization of Policy and Tool Graph Memory for Self-Evolving Agents

· 2026 · cs.AI · arXiv 2604.07791

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Recent advances in Reinforcement Learning with Verifiable Rewards (RLVR) have demonstrated significant potential in single-turn reasoning tasks. With the paradigm shift toward self-evolving agentic learning, models are increasingly expected to learn from trajectories by synthesizing tools or accumulating explicit experiences. However, prevailing methods typically rely on large-scale LLMs or multi-agent frameworks, which hinder their deployment in resource-constrained environments. The inherent sparsity of outcome-based rewards also poses a substantial challenge, as agents typically receive feedback only upon completion of tasks. To address these limitations, we introduce a Tool-Memory based self-evolving agentic framework SEARL. Unlike approaches that directly utilize interaction experiences, our method constructs a structured experience memory that integrates planning with execution. This provides a novel state abstraction that facilitates generalization across analogous contexts, such as tool reuse. Consequently, agents extract explicit knowledge from historical data while leveraging inter-trajectory correlations to densify reward signals. We evaluate our framework on knowledge reasoning and mathematics tasks, demonstrating its effectiveness in achieving more practical and efficient learning.

representative citing papers

Rethinking Continual Experience Internalization for Self-Evolving LLM Agents

cs.CL · 2026-06-03 · unverdicted · novelty 5.0

Existing methods for turning LLM interaction experience into parametric skills collapse over multiple iterations; principle-level experience, step-wise injection, and off-policy teacher distillation yield more stable continual learning.

citing papers explorer

Showing 1 of 1 citing paper.

Rethinking Continual Experience Internalization for Self-Evolving LLM Agents

cs.CL · 2026-06-03 · unverdicted · none · ref 24
