VIGOR assigns higher rewards to LLM completions that produce smaller l2 norms of teacher-forced negative log-likelihood gradients, with sqrt(T) length correction and group ranking, yielding +3.31% math and +1.91% code gains over RLIF on Qwen2.5-7B.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
representative citing papers
The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.
citing papers explorer
-
Verifier-Free RL for LLMs via Intrinsic Gradient-Norm Reward
VIGOR assigns higher rewards to LLM completions that produce smaller l2 norms of teacher-forced negative log-likelihood gradients, with sqrt(T) length correction and group ranking, yielding +3.31% math and +1.91% code gains over RLIF on Qwen2.5-7B.
-
A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence
The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.