A survey on self-evolution of large language models
13 Pith papers cite this work.
Citation roles: background (4)
Citation polarities: background (4)
Citing papers
- PACE: Two-Timescale Self-Evolution for Small Language Model Agents
PACE coordinates low-risk prompt evolution with validated higher-risk control-logic updates to improve frozen SLM agents on benchmarks without model retraining.
- EvoVid: Temporal-Centric Self-Evolution for Video Large Language Models
EvoVid proposes a temporal-centric self-evolution framework for Video-LLMs that uses temporal-aware Questioner and temporal-grounded Solver rewards to improve performance directly from unannotated videos.
- Mem-π: Adaptive Memory through Learning When and What to Generate
Mem-π is a framework using a dedicated model and decision-content decoupled RL to generate context-specific guidance on demand for LLM agents, outperforming retrieval baselines by over 30% on web navigation.
- Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution
Solvita is an agentic evolution system that uses Planner, Solver, Oracle, and Hacker agents with trainable graph knowledge networks; the networks are updated by reinforcement learning on pass/fail and vulnerability signals, yielding state-of-the-art code generation performance.
- G-Zero: Self-Play for Open-Ended Generation from Zero Data
G-Zero uses the Hint-δ intrinsic reward to drive co-evolution between a Proposer and Generator via GRPO and DPO, providing a theoretical suboptimality guarantee for self-improvement from internal dynamics alone.
- DarwinTOD: LLM-driven Lifelong Self-evolution for Task-oriented Dialog Systems
DarwinTOD proposes a dual-loop LLM-driven framework with an Evolvable Strategy Bank that enables lifelong autonomous improvement in task-oriented dialog systems through online multi-agent critique and offline evolutionary refinement.
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
This survey defines agentic RL for LLMs via POMDPs, introduces a taxonomy of capabilities (planning, tool use, memory, reasoning) and application domains, and compiles open environments from over 500 papers.
- D²Evo: Dual Difficulty-Aware Self-Evolution for Data-Efficient Reinforcement Learning
D²Evo mines medium-difficulty anchors from the current model, trains a Questioner to generate matching questions, and jointly optimizes Solver and Questioner for progressive gains, outperforming baselines on math reasoning with under 2K real samples.
- Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations
Bian Que is an agentic framework that combines a unified operational paradigm, flexible Skill Arrangement, and a self-evolving mechanism to automate O&M tasks. In production deployment it reduced alerts by 75% and cut MTTR by over 50%.
- ACE: Self-Evolving LLM Coding Framework via Adversarial Unit Test Generation and Preference Optimization
ACE introduces a solver-adversary loop in which an LLM generates both candidate programs and adversarial tests and uses the execution outcomes for preference optimization, achieving 3-7% pass@1 gains on code benchmarks without ground-truth code.
- SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning and Continual Adaptation
SOLAR introduces a self-optimizing agent using meta-learning on model weights and RL-driven strategy discovery for lifelong adaptation in LLMs, claiming superior performance on reasoning tasks across domains.
- Large Language Model-Brained GUI Agents: A Survey
A survey consolidating frameworks, data practices, large action models, benchmarks, applications, and research gaps in LLM-brained GUI agents.
- Rethinking Agentic Reinforcement Learning In Large Language Models
The paper reviews conceptual foundations, methodological innovations, effective designs, critical challenges, and future directions for LLM-based Agentic Reinforcement Learning.