AEL: Agent Evolving Learning for Open-Ended Environments

2026 · cs.CL · arXiv 2604.21725

3 Pith papers cite this work. Polarity classification is still indexing.


abstract

LLM agents increasingly operate in open-ended environments spanning hundreds of sequential episodes, yet they remain largely stateless: each task is solved from scratch without converting past experience into better future behavior. The central obstacle is not what to remember but how to use what has been remembered, including which retrieval policy to apply, how to interpret prior outcomes, and when the current strategy itself must change. We introduce Agent Evolving Learning (AEL), a two-timescale framework that addresses this obstacle. At the fast timescale, a Thompson Sampling bandit learns which memory retrieval policy to apply at each episode; at the slow timescale, LLM-driven reflection diagnoses failure patterns and injects causal insights into the agent's decision prompt, giving it an interpretive frame for the evidence it retrieves. On a sequential portfolio benchmark (10 sector-diverse tickers, 208 episodes, 5 random seeds), AEL achieves a Sharpe ratio of 2.13 ± 0.47, outperforming five published self-improving methods and all non-LLM baselines while maintaining the lowest variance among all LLM-based approaches. A nine-variant ablation reveals a "less is more" pattern: memory and reflection together produce a 58% cumulative improvement over the stateless baseline, yet every additional mechanism we test (planner evolution, per-tool selection, cold-start initialization, skill extraction, and three credit assignment methods) degrades performance. This demonstrates that the bottleneck in agent self-improvement is self-diagnosing how to use experience rather than adding architectural complexity. Code and data: https://github.com/WujiangXu/AEL.
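The fast-timescale idea in the abstract, a Thompson Sampling bandit choosing a memory retrieval policy each episode, can be illustrated with a minimal sketch. This is not the authors' implementation: the class name, the policy names, the uniform Beta(1, 1) prior, and the binary success signal are all assumptions made for illustration, since the abstract does not say how rewards are defined or which policies are available.

```python
import random


class ThompsonPolicySelector:
    """Beta-Bernoulli Thompson Sampling over a fixed set of retrieval policies.

    Hypothetical sketch: policy names and the binary reward are assumptions,
    not details taken from the AEL paper.
    """

    def __init__(self, policies, seed=None):
        self.policies = list(policies)
        # Beta(1, 1) prior per policy, i.e. uniform over the success rate.
        self.alpha = {p: 1.0 for p in self.policies}
        self.beta = {p: 1.0 for p in self.policies}
        self.rng = random.Random(seed)

    def select(self):
        """Sample a success rate from each posterior and pick the largest."""
        draws = {
            p: self.rng.betavariate(self.alpha[p], self.beta[p])
            for p in self.policies
        }
        return max(draws, key=draws.get)

    def update(self, policy, success):
        """Update the chosen policy's posterior with a binary outcome."""
        if success:
            self.alpha[policy] += 1.0
        else:
            self.beta[policy] += 1.0

    def posterior_mean(self, policy):
        """Posterior mean of the policy's success rate."""
        return self.alpha[policy] / (self.alpha[policy] + self.beta[policy])


def run_episodes(selector, true_rates, n_episodes, seed=0):
    """Simulate episodes against made-up per-policy success rates."""
    env_rng = random.Random(seed)
    for _ in range(n_episodes):
        policy = selector.select()
        success = env_rng.random() < true_rates[policy]
        selector.update(policy, success)
    return selector


# Illustrative run; the policy names and rates below are invented.
rates = {"most_recent": 0.45, "most_similar": 0.60, "no_memory": 0.40}
demo = run_episodes(ThompsonPolicySelector(rates, seed=1), rates, n_episodes=208)
```

In this sketch each episode draws one sample per policy from its Beta posterior and applies the policy with the highest draw, so exploration shrinks as evidence accumulates. A real agent would need to map its episode outcome (for example, a portfolio return) onto the bandit's reward; how AEL does that is not stated in the abstract.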

representative citing papers

COMAP: Co-Evolving World Models and Agent Policies for LLM Agents

cs.AI · 2026-06-01 · unverdicted · novelty 6.0

COMAP co-evolves textual world models and agent policies for LLMs through on-policy self-distillation, yielding up to 16.75% relative gains on embodied planning, web navigation, and tool-use tasks.

How do Humans Process AI-generated Hallucination Contents: a Neuroimaging Study

cs.AI · 2026-05-16 · unverdicted · novelty 6.0

EEG study reveals distinct ERP patterns for AI hallucinations, with misjudged ones failing to trigger standard neurocognitive verification pathways.

MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory

cs.CV · 2026-05-14 · unverdicted · novelty 5.0

MemEye benchmark evaluates multimodal memory on visual granularity and evidence synthesis, finding that 13 methods across 4 VLMs struggle with fine details and temporal state changes.

citing papers explorer

Showing 2 of 2 citing papers after filters.

COMAP: Co-Evolving World Models and Agent Policies for LLM Agents

cs.AI · 2026-06-01 · unverdicted · none · ref 64 · internal anchor

COMAP co-evolves textual world models and agent policies for LLMs through on-policy self-distillation, yielding up to 16.75% relative gains on embodied planning, web navigation, and tool-use tasks.

How do Humans Process AI-generated Hallucination Contents: a Neuroimaging Study

cs.AI · 2026-05-16 · unverdicted · none · ref 15 · internal anchor

EEG study reveals distinct ERP patterns for AI hallucinations, with misjudged ones failing to trigger standard neurocognitive verification pathways.
