VeRO supplies a versioned harness, benchmark suite, and empirical comparison of optimizer configurations for coding agents that improve other agents.
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years: 2026 (4)
verdicts: UNVERDICTED (4)
roles: background (1)
polarities: background (1)
citing papers explorer
- VeRO: An Evaluation Harness for Agents to Optimize Agents
  VeRO supplies a versioned harness, benchmark suite, and empirical comparison of optimizer configurations for coding agents that improve other agents.
- The Last Harness You'll Ever Build
  A two-level evolution framework automates the design of task-specific harnesses for AI agents by optimizing both per-task performance and a reusable meta-blueprint that enables adaptation to new domains without human engineering.
- Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration
  LLM agents trained with a task-success reward on self-generated knowledge can spontaneously explore and adapt to new environments without any rewards or instructions at inference, yielding 20% gains on web tasks and allowing a 14B model to beat Gemini-2.5-Flash.
- Dead Weights, Live Signals: Feedforward Graphs of Frozen Language Models
  A feedforward graph of heterogeneous frozen LLMs linked by linear projections in a shared latent space outperforms single models on ARC-Challenge, OpenBookQA, and MMLU using just 17.6M trainable parameters.