hub

A survey on self-evolution of large language models

A survey on self-evolution of large language models , author= · 2026 · arXiv 2404.14387

13 Pith papers cite this work. Polarity classification is still indexing.

13 Pith papers citing it

read on arXiv browse 13 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

PACE: Two-Timescale Self-Evolution for Small Language Model Agents

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

PACE coordinates low-risk prompt evolution with validated higher-risk control-logic updates to improve frozen SLM agents on benchmarks without model retraining.

EvoVid: Temporal-Centric Self-Evolution for Video Large Language Models

cs.CV · 2026-05-21 · unverdicted · novelty 6.0

EvoVid proposes a temporal-centric self-evolution framework for Video-LLMs that uses temporal-aware Questioner and temporal-grounded Solver rewards to improve performance directly from unannotated videos.

Mem-$\pi$: Adaptive Memory through Learning When and What to Generate

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

Mem-π is a framework using a dedicated model and decision-content decoupled RL to generate context-specific guidance on demand for LLM agents, outperforming retrieval baselines by over 30% on web navigation.

Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution

cs.AI · 2026-05-14 · unverdicted · novelty 6.0

Solvita is an agentic evolution system using Planner, Solver, Oracle, and Hacker agents with trainable graph knowledge networks updated by reinforcement learning on pass/fail and vulnerability signals to achieve SOTA code generation performance.

G-Zero: Self-Play for Open-Ended Generation from Zero Data

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

G-Zero uses the Hint-δ intrinsic reward to drive co-evolution between a Proposer and Generator via GRPO and DPO, providing a theoretical suboptimality guarantee for self-improvement from internal dynamics alone.

DarwinTOD: LLM-driven Lifelong Self-evolution for Task-oriented Dialog Systems

cs.MA · 2026-01-12 · unverdicted · novelty 6.0

DarwinTOD proposes a dual-loop LLM-driven framework with an Evolvable Strategy Bank that enables lifelong autonomous improvement in task-oriented dialog systems through online multi-agent critique and offline evolutionary refinement.

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

cs.AI · 2025-09-02 · accept · novelty 6.0

Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.

D$^2$Evo: Dual Difficulty-Aware Self-Evolution for Data-Efficient Reinforcement Learning

cs.LG · 2026-05-16 · unverdicted · novelty 5.0

D²Evo mines medium-difficulty anchors from the current model, trains a Questioner to generate matching questions, and jointly optimizes Solver and Questioner for progressive gains, outperforming baselines on math reasoning with under 2K real samples.

Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations

cs.AI · 2026-04-29 · unverdicted · novelty 5.0 · 2 refs

Bian Que is an agentic framework using a unified operational paradigm, flexible Skill Arrangement, and self-evolving mechanism to automate O&M tasks, achieving 75% alert reduction and over 50% MTTR cut in production deployment.

ACE: Self-Evolving LLM Coding Framework via Adversarial Unit Test Generation and Preference Optimization

cs.SE · 2026-04-17 · unverdicted · novelty 5.0 · 2 refs

ACE introduces a solver-adversary loop where an LLM generates both candidate programs and adversarial tests, using execution outcomes for preference optimization to achieve 3-7% pass@1 gains on code benchmarks without ground-truth code.

SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning and Continual Adaptation

cs.AI · 2026-03-23 · unverdicted · novelty 5.0

SOLAR introduces a self-optimizing agent using meta-learning on model weights and RL-driven strategy discovery for lifelong adaptation in LLMs, claiming superior performance on reasoning tasks across domains.

Large Language Model-Brained GUI Agents: A Survey

cs.AI · 2024-11-27 · unverdicted · novelty 4.0

A survey consolidating frameworks, data practices, large action models, benchmarks, applications, and research gaps in LLM-brained GUI agents.

Rethinking Agentic Reinforcement Learning In Large Language Models

cs.AI · 2026-04-30 · unverdicted · novelty 3.0 · 3 refs

The paper reviews conceptual foundations, methodological innovations, effective designs, critical challenges, and future directions for LLM-based Agentic Reinforcement Learning.

citing papers explorer

Showing 13 of 13 citing papers.

PACE: Two-Timescale Self-Evolution for Small Language Model Agents cs.LG · 2026-05-21 · unverdicted · none · ref 33
PACE coordinates low-risk prompt evolution with validated higher-risk control-logic updates to improve frozen SLM agents on benchmarks without model retraining.
EvoVid: Temporal-Centric Self-Evolution for Video Large Language Models cs.CV · 2026-05-21 · unverdicted · none · ref 15
EvoVid proposes a temporal-centric self-evolution framework for Video-LLMs that uses temporal-aware Questioner and temporal-grounded Solver rewards to improve performance directly from unannotated videos.
Mem-$\pi$: Adaptive Memory through Learning When and What to Generate cs.CL · 2026-05-20 · unverdicted · none · ref 41
Mem-π is a framework using a dedicated model and decision-content decoupled RL to generate context-specific guidance on demand for LLM agents, outperforming retrieval baselines by over 30% on web navigation.
Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution cs.AI · 2026-05-14 · unverdicted · none · ref 36
Solvita is an agentic evolution system using Planner, Solver, Oracle, and Hacker agents with trainable graph knowledge networks updated by reinforcement learning on pass/fail and vulnerability signals to achieve SOTA code generation performance.
G-Zero: Self-Play for Open-Ended Generation from Zero Data cs.LG · 2026-05-11 · unverdicted · none · ref 27
G-Zero uses the Hint-δ intrinsic reward to drive co-evolution between a Proposer and Generator via GRPO and DPO, providing a theoretical suboptimality guarantee for self-improvement from internal dynamics alone.
DarwinTOD: LLM-driven Lifelong Self-evolution for Task-oriented Dialog Systems cs.MA · 2026-01-12 · unverdicted · none · ref 3
DarwinTOD proposes a dual-loop LLM-driven framework with an Evolvable Strategy Bank that enables lifelong autonomous improvement in task-oriented dialog systems through online multi-agent critique and offline evolutionary refinement.
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey cs.AI · 2025-09-02 · accept · none · ref 25
Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.
D$^2$Evo: Dual Difficulty-Aware Self-Evolution for Data-Efficient Reinforcement Learning cs.LG · 2026-05-16 · unverdicted · none · ref 46
D²Evo mines medium-difficulty anchors from the current model, trains a Questioner to generate matching questions, and jointly optimizes Solver and Questioner for progressive gains, outperforming baselines on math reasoning with under 2K real samples.
Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations cs.AI · 2026-04-29 · unverdicted · none · ref 45 · 2 links
Bian Que is an agentic framework using a unified operational paradigm, flexible Skill Arrangement, and self-evolving mechanism to automate O&M tasks, achieving 75% alert reduction and over 50% MTTR cut in production deployment.
ACE: Self-Evolving LLM Coding Framework via Adversarial Unit Test Generation and Preference Optimization cs.SE · 2026-04-17 · unverdicted · none · ref 12 · 2 links
ACE introduces a solver-adversary loop where an LLM generates both candidate programs and adversarial tests, using execution outcomes for preference optimization to achieve 3-7% pass@1 gains on code benchmarks without ground-truth code.
SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning and Continual Adaptation cs.AI · 2026-03-23 · unverdicted · none · ref 26
SOLAR introduces a self-optimizing agent using meta-learning on model weights and RL-driven strategy discovery for lifelong adaptation in LLMs, claiming superior performance on reasoning tasks across domains.
Large Language Model-Brained GUI Agents: A Survey cs.AI · 2024-11-27 · unverdicted · none · ref 262
A survey consolidating frameworks, data practices, large action models, benchmarks, applications, and research gaps in LLM-brained GUI agents.
Rethinking Agentic Reinforcement Learning In Large Language Models cs.AI · 2026-04-30 · unverdicted · none · ref 85 · 3 links
The paper reviews conceptual foundations, methodological innovations, effective designs, critical challenges, and future directions for LLM-based Agentic Reinforcement Learning.

A survey on self-evolution of large language models

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer