MPO: Boosting LLM Agents with Meta Plan Optimization
2 Pith papers cite this work. Both are from 2026, and both are currently unverdicted.
Citing papers explorer

- Unified Context Evolution for LLM Agents
  UCE builds a typed, evolving library of Memory, Strategy, Workflow and Skill units from agent trajectories, improving ALFWorld success from 75.4% to 96.3% and WebShop score from 45.1% to 61.3% while transferring to new actor models.
- Self-evolving LLM agents with in-distribution Optimization
  Q-Evolve unifies automatic process-reward labeling via advantage estimation and behavior-proximal policy optimization inside an in-distribution RL loop to enable self-evolving LLM agents on interactive tasks.