Agent Learning via Early Experience

Ashish Shah; Bo Liu; Boyu Gou; Dat Huynh; Hengduo Li; Huan Sun; Jason Weston; Jiacheng Zhu; Jianwei Yang; Jian Xie

arXiv: 2510.08558 · v3 · pith:ATSZCHBVnew · submitted 2025-10-09 · 💻 cs.AI · cs.CL · cs.IR · cs.LG

Kai Zhang, Xiangchao Chen, Bo Liu, Tianci Xue, Zeyi Liao, Zhihan Liu, Xiyao Wang, Yuting Ning, Zhaorun Chen, Xiaohan Fu, Jian Xie, Yuxuan Sun, Boyu Gou, Qi Qi, Zihang Meng, Jianwei Yang, Ning Zhang, Xian Li, Ashish Shah, Dat Huynh, Hengduo Li, Zi Yang, Sara Cao, Lawrence Jang, Shuyan Zhou, Jiacheng Zhu, Huan Sun, Jason Weston, Yu Su, Yifan Wu

keywords: experience, agent, agents, data, early, learning, environments, improve

Abstract

A long-term goal of language agents is to learn and improve through their own experience, ultimately outperforming humans in complex, real-world tasks. However, training agents from experience data with reinforcement learning remains difficult in many environments, which either lack verifiable rewards (e.g., websites) or require inefficient long-horizon rollouts (e.g., multi-turn tool use). As a result, most current agents rely on supervised fine-tuning on expert data, which is challenging to scale and generalizes poorly. This limitation stems from the nature of expert demonstrations: they capture only a narrow range of scenarios and expose the agent to limited environment diversity. We address this limitation with a middle-ground paradigm we call early experience: interaction data generated by the agent's own actions, where the resulting future states serve as supervision without reward signals. Within this paradigm, we study two strategies of using such data: (1) implicit world modeling, which uses collected states to ground the policy in environment dynamics; and (2) self-reflection, where the agent learns from its suboptimal actions to improve reasoning and decision-making. Evaluation across eight diverse environments and multiple model families shows that our approaches consistently improve effectiveness and out-of-domain generalization, highlighting the value of early experience. Moreover, in environments with verifiable rewards, our results provide promising signals that early experience offers a strong foundation for subsequent reinforcement learning, making it a practical bridge between imitation learning and fully experience-driven agents.
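The two strategies named in the abstract can be illustrated with a small data-construction sketch. This is not the authors' code. It assumes that early experience is gathered by branching off states from an expert trajectory, executing the agent's own alternative actions, and recording the resulting next states. All names (`collect_early_experience`, `propose_actions`, `step`, `k`, `reflection_prompt`) and the string-valued toy environment are hypothetical. The sketch shows only how future states become supervision without any reward: (state, action) → next-state pairs for implicit world modeling, and expert-versus-alternative contrasts that a model could be prompted to reflect on.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy types: states and actions are plain strings.
State = str
Action = str


@dataclass
class WorldModelExample:
    """Implicit world modeling target: predict next_state from (state, action)."""
    state: State
    action: Action
    next_state: State


@dataclass
class ReflectionExample:
    """Self-reflection input: expert action contrasted with an agent-chosen alternative."""
    state: State
    expert_action: Action
    alt_action: Action
    alt_next_state: State
    expert_next_state: State


def collect_early_experience(
    expert_traj: Sequence[Tuple[State, Action]],
    propose_actions: Callable[[State, int], List[Action]],  # hypothetical agent sampler
    step: Callable[[State, Action], State],                 # hypothetical environment transition
    k: int = 2,
) -> Tuple[List[WorldModelExample], List[ReflectionExample]]:
    """Branch off each expert state with up to k agent actions; no reward is ever read."""
    wm: List[WorldModelExample] = []
    refl: List[ReflectionExample] = []
    for state, expert_action in expert_traj:
        expert_next = step(state, expert_action)
        for alt in propose_actions(state, k):
            if alt == expert_action:
                continue  # assumption: only non-expert branches count as new experience
            alt_next = step(state, alt)  # the observed future state is the supervision
            wm.append(WorldModelExample(state, alt, alt_next))
            refl.append(ReflectionExample(state, expert_action, alt, alt_next, expert_next))
    return wm, refl


def reflection_prompt(ex: ReflectionExample) -> str:
    """Hypothetical prompt asking a model to explain why the expert action is preferable."""
    return (
        f"State: {ex.state}\n"
        f"Expert action: {ex.expert_action} -> {ex.expert_next_state}\n"
        f"Alternative: {ex.alt_action} -> {ex.alt_next_state}\n"
        "Explain why the expert action is the better choice."
    )


# Usage with a toy environment whose next state just records the action taken.
def toy_step(state: State, action: Action) -> State:
    return f"{state}>{action}"


def toy_propose(state: State, k: int) -> List[Action]:
    return ["search", "back", "scroll"][:k]


expert_traj = [("home", "search"), ("results", "open_result")]
wm, refl = collect_early_experience(expert_traj, toy_propose, toy_step, k=2)
# wm and refl each hold 3 examples: one branch from "home", two from "results".
```

Every alternative action yields one example of each kind, so in this sketch the amount of training data grows with the number of sampled alternatives per expert state rather than with additional expert demonstrations.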

Forward citations

Cited by 15 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

EXG: Self-Evolving Agents with Experience Graphs
cs.AI 2026-05 unverdicted novelty 7.0

EXG is an experience graph framework for self-evolving LLM agents that supports online real-time growth and offline reuse to enhance solution quality and efficiency on code generation and reasoning benchmarks.
Revisiting the Travel Planning Capabilities of Large Language Models
cs.AI 2026-05 unverdicted novelty 7.0

LLMs extract explicit constraints effectively but struggle with implicit open-world requirements, structural biases in plans, and ineffective self-correction during travel planning.
Reference-Sampled Boltzmann Projection for KL-Regularized RLVR: Target-Matched Weighted SFT, Finite One-Shot Gaps, and Policy Mirror Descent
cs.LG 2026-05 unverdicted novelty 7.0

Reference-sampled weighted SFT with prompt-normalized Boltzmann weights induces the same policy as fixed-reference KL-regularized RLVR, with BOLT as the estimator and a finite one-shot error decomposition separating c...
MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning
cs.AI 2026-05 unverdicted novelty 6.0

MAP improves LLM agent reasoning by constructing a structured cognitive map of the environment before task execution, yielding performance gains on benchmarks like ARC-AGI-3 and superior training data via the new MAP-...
Learning Agent Routing From Early Experience
cs.CL 2026-05 unverdicted novelty 6.0

BoundaryRouter routes queries to LLM or agent using early experience memory from a seed set, cutting inference time 60.6% versus always using agents and raising performance 28.6% versus always using direct LLM inference.
From History to State: Constant-Context Skill Learning for LLM Agents
cs.AI 2026-05 unverdicted novelty 6.0

Constant-context skill learning trains reusable task-family modules for LLM agents using a deterministic state block for progress tracking and subgoal rewards, achieving 89.6% unseen success on ALFWorld, 76.8% on WebS...
CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment
cs.AI 2026-05 unverdicted novelty 6.0

CASCADE enables LLMs to continually adapt at deployment via case-based episodic memory and contextual bandits, improving macro-averaged success by 20.9% over zero-shot on 16 tasks spanning medicine, law, code, and robotics.
Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration
cs.AI 2026-04 unverdicted novelty 6.0

LLM agents trained with a task-success reward on self-generated knowledge can spontaneously explore and adapt to new environments without any rewards or instructions at inference, yielding 20% gains on web tasks and a...
HEALing Entropy Collapse: Enhancing Exploration in Few-Shot RLVR via Hybrid-Domain Entropy Dynamics Alignment
cs.LG 2026-04 unverdicted novelty 6.0

HEAL mitigates entropy collapse in few-shot RLVR by selectively adding general-domain data and aligning trajectory-level entropy dynamics, matching full-shot performance with 32 target samples.
Time is Not a Label: Continuous Phase Rotation for Temporal Knowledge Graphs and Agentic Memory
cs.CL 2026-04 unverdicted novelty 6.0

RoMem uses a Semantic Speed Gate to assign volatility to relations and continuous phase rotation to shadow obsolete facts in complex space, delivering SOTA temporal KG completion and 2-3x gains on agentic memory benchmarks.
Agentic Learner with Grow-and-Refine Multimodal Semantic Memory
cs.AI 2025-11 unverdicted novelty 6.0

ViLoMem is a dual-stream grow-and-refine memory system that separates visual and logical error patterns in MLLMs to improve pass@1 accuracy and reduce repeated mistakes across six multimodal benchmarks.
Differentiable Mixture-of-Agents Incentivizes Swarm Intelligence of Large Language Models
cs.LG 2026-05 unverdicted novelty 5.0

DMoA is a differentiable multi-agent LLM framework with recurrent context-aware routing and predictive entropy self-supervision that claims SOTA results on 9 benchmarks through elastic agent collaboration.
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
cs.SE 2026-04 accept novelty 5.0

LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.
UI-Oceanus: Scaling GUI Agents with Synthetic Environmental Dynamics
cs.LG 2026-02 unverdicted novelty 5.0

UI-Oceanus shows that continual pre-training on forward dynamics predictions from synthetic GUI exploration improves agent success rates by 7% offline and 16.8% online, with gains scaling by data volume.
Meta-Tool: Efficient Few-Shot Tool Adaptation for Small Language Models
cs.CL 2026-04 unverdicted novelty 4.0

A 3B model with few-shot prompting reaches 79.7% of GPT-5 tool-use performance while a hypernetwork adaptation adds zero measurable benefit across four benchmarks.