hub Mixed citations

CodeEvolve: An open source evolutionary coding agent for algorithmic discovery and optimization, March 2026

Henrique Assumpção, Diego Ferreira, Leandro Campos, Fabricio Murai · 2025 · cs.AI · arXiv 2510.14150

Mixed citation behavior. Most common role is background (67%).

19 Pith papers citing it

Background 67% of classified citations

open full Pith review browse 19 citing papers arXiv PDF

abstract

We introduce CodeEvolve, an open-source framework that couples large language models with island-based evolutionary search for end-to-end algorithmic discovery. CodeEvolve integrates inspiration-based crossover, meta-prompting, and depth-based refinement on top of a CVT-MAP-Elites archive and a weighted LLM ensemble to generate optimized solutions for complex problems. On the AlphaEvolve benchmark suite, CodeEvolve matches or surpasses the reported AlphaEvolve results on 5 of 9 problems and, under matched conditions, outperforms the open-source frameworks OpenEvolve and ShinkaEvolve on 6 of 9. With the open-weight Qwen3-Coder-30B backbone, it surpasses the reported AlphaEvolve score on both CirclePackingSquare instances at roughly an order of magnitude lower cost than a frontier closed-source ensemble, and remains competitive with EoH on heuristic-design tasks without retuning. Ablations show that the interaction between CodeEvolve's components, rather than any single operator, drives these results. We release the framework, experimental data, and practical hyperparameter guidelines at https://github.com/inter-co/science-codeevolve.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 7 baseline 1 other 1

citation-polarity summary

background 6 unclear 2 baseline 1

representative citing papers

What Do Evolutionary Coding Agents Evolve?

cs.NE · 2026-05-19 · unverdicted · novelty 7.0

Evolutionary coding agents achieve most benchmark gains through a small subset of edit types and by cycling previously deleted code lines rather than developing new algorithmic structures.

From I/O to Code with Discovery Agent

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

DIO-Agent frames IO2Code as LLM-driven evolutionary search over programs with a Transformation Priority Premise to favor simple hypotheses, outperforming baselines on a new IO2CodeBench.

SemaTune: Semantic-Aware Online OS Tuning with Large Language Models

cs.OS · 2026-05-14 · unverdicted · novelty 7.0

SemaTune uses LLM guidance with semantic context to tune up to 41 Linux OS parameters, delivering 72.5% performance gains over defaults and 153.3% over non-LLM baselines on 13 workloads while avoiding degraded states.

MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI

cs.LG · 2026-05-09 · unverdicted · novelty 7.0 · 2 refs

MLS-Bench is a benchmark with 140 tasks that evaluates AI agents on inventing generalizable and scalable ML methods, finding they lag human performance especially in insight-driven invention rather than tuning.

Autopoiesis: A Self-Evolving System Paradigm for LLM Serving Under Runtime Dynamics

cs.DC · 2026-04-08 · unverdicted · novelty 7.0

Autopoiesis uses LLM-driven program synthesis to evolve serving policies online during deployment, delivering up to 53% and average 34% gains over prior LLM serving systems under runtime dynamics.

Evolutionary Search for Automated Design of Uncertainty Quantification Methods

cs.CL · 2026-04-03 · unverdicted · novelty 7.0

LLM-driven evolutionary search discovers unsupervised UQ methods as Python programs that improve ROC-AUC by up to 6.7% over manual baselines on atomic claim verification across 9 datasets with OOD generalization.

Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks

cs.CL · 2026-06-27 · unverdicted · novelty 6.0

Evolution Fine-Tuning trains LLMs on 156K trajectories spanning 371 tasks to achieve 10.22% average improvement on 22 held-out optimization tasks and match SOTA on select circle-packing problems when combined with test-time RL.

BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution

cs.SE · 2026-05-31 · unverdicted · novelty 6.0

BenchEvolver evolves coding problem solutions to generate harder, valid tasks, producing LiveCodeBench-Plus where frontier models score 27.5-62.6% and enabling RL gains on held-out tests.

FlashEvolve: Accelerating Agent Self-Evolution with Asynchronous Stage Orchestration

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

FlashEvolve accelerates LLM agent self-evolution via asynchronous stage orchestration and inspectable language-space staleness handling, reporting 3.5-4.9x proposal throughput gains over synchronous baselines on GEPA workloads.

Open-Ended Task Discovery via Bayesian Optimization

cs.AI · 2026-05-08 · unverdicted · novelty 6.0

Generate-Select-Refine is an open-ended Bayesian optimization method that generates tasks and concentrates evaluations on the best one with only logarithmic regret overhead relative to standard single-task optimization.

Agentic Architect: An Agentic AI Framework for Architecture Design Exploration and Optimization

cs.AI · 2026-04-28 · accept · novelty 6.0

An LLM-driven agentic system evolves microarchitectural policies for cache replacement, data prefetching, and branch prediction, producing designs that match or exceed prior state-of-the-art in IPC on standard benchmarks.

Evaluation-driven Scaling for Scientific Discovery

cs.LG · 2026-04-21 · unverdicted · novelty 6.0

SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines with examples like 2x faster LASSO and new Erdos constructions.

AdaExplore: Failure-Driven Adaptation and Diversity-Preserving Search for Efficient Kernel Generation

cs.CL · 2026-04-17 · unverdicted · novelty 6.0

AdaExplore improves correctness and speed of Triton kernel generation by converting recurring failures into a memory of rules and organizing search as a tree that mixes local refinements with larger regenerations, yielding 3.12x and 1.72x speedups on KernelBench Level-2 and Level-3 within 100 steps.

TurboEvolve: Towards Fast and Robust LLM-Driven Program Evolution

cs.NE · 2026-04-12 · unverdicted · novelty 6.0

TurboEvolve improves LLM program evolution by running parallel islands with LLM-generated diverse candidates that carry self-assigned weights, an adaptive scheduler, and clustered seed injection to reach stronger solutions at lower evaluation budgets.

Learning to Solve and Optimize by Evolving Code

cs.LG · 2026-05-29 · unverdicted · novelty 5.0

CHECKMATE evolves correct high-performing solvers from formal specs and natural language descriptions, outperforming SOTA on configuration and scheduling problems.

Evolutionary Ensemble of Agents

cs.NE · 2026-05-09 · unverdicted · novelty 5.0 · 2 refs

EvE co-evolves code solvers and guidance states via synchronous races and Elo updates, discovering a rescale-then-interpolate mechanism that enables example-count generalization in ICON.

PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

PACEvolve++ uses a phase-adaptive reinforcement learning advisor to decouple hypothesis selection from execution in LLM-driven evolutionary search, delivering faster convergence than prior frameworks on load balancing, recommendation, and protein tasks.

Algorithmic algorithm development with LLMs: A Case Study on LLM-Usage for Contraction Order Optimization in Tensor Networks

cs.AI · 2026-06-01 · unverdicted · novelty 4.0

Case study applies verifier-guided LLM evolutionary agents to contraction-order optimization in tensor networks and concludes that human validation remains essential.

Effective Harness Engineering for Algorithm Discovery with Coding Agents

cs.SE · 2026-05-13 · unverdicted · novelty 4.0

Under fixed token budget on Circle Packing, deeper per-candidate reasoning beats generating more shallow candidates, and capable models produce evaluation hacks at higher rates.

citing papers explorer

Showing 18 of 18 citing papers after filters.

What Do Evolutionary Coding Agents Evolve? cs.NE · 2026-05-19 · unverdicted · none · ref 5 · internal anchor
Evolutionary coding agents achieve most benchmark gains through a small subset of edit types and by cycling previously deleted code lines rather than developing new algorithmic structures.
From I/O to Code with Discovery Agent cs.LG · 2026-05-14 · unverdicted · none · ref 1 · internal anchor
DIO-Agent frames IO2Code as LLM-driven evolutionary search over programs with a Transformation Priority Premise to favor simple hypotheses, outperforming baselines on a new IO2CodeBench.
SemaTune: Semantic-Aware Online OS Tuning with Large Language Models cs.OS · 2026-05-14 · unverdicted · none · ref 6 · internal anchor
SemaTune uses LLM guidance with semantic context to tune up to 41 Linux OS parameters, delivering 72.5% performance gains over defaults and 153.3% over non-LLM baselines on 13 workloads while avoiding degraded states.
MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI cs.LG · 2026-05-09 · unverdicted · none · ref 5 · 2 links · internal anchor
MLS-Bench is a benchmark with 140 tasks that evaluates AI agents on inventing generalizable and scalable ML methods, finding they lag human performance especially in insight-driven invention rather than tuning.
Autopoiesis: A Self-Evolving System Paradigm for LLM Serving Under Runtime Dynamics cs.DC · 2026-04-08 · unverdicted · none · ref 30 · internal anchor
Autopoiesis uses LLM-driven program synthesis to evolve serving policies online during deployment, delivering up to 53% and average 34% gains over prior LLM serving systems under runtime dynamics.
Evolutionary Search for Automated Design of Uncertainty Quantification Methods cs.CL · 2026-04-03 · unverdicted · none · ref 1 · internal anchor
LLM-driven evolutionary search discovers unsupervised UQ methods as Python programs that improve ROC-AUC by up to 6.7% over manual baselines on atomic claim verification across 9 datasets with OOD generalization.
Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks cs.CL · 2026-06-27 · unverdicted · none · ref 9 · internal anchor
Evolution Fine-Tuning trains LLMs on 156K trajectories spanning 371 tasks to achieve 10.22% average improvement on 22 held-out optimization tasks and match SOTA on select circle-packing problems when combined with test-time RL.
BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution cs.SE · 2026-05-31 · unverdicted · none · ref 43 · internal anchor
BenchEvolver evolves coding problem solutions to generate harder, valid tasks, producing LiveCodeBench-Plus where frontier models score 27.5-62.6% and enabling RL gains on held-out tests.
FlashEvolve: Accelerating Agent Self-Evolution with Asynchronous Stage Orchestration cs.LG · 2026-05-08 · unverdicted · none · ref 3 · internal anchor
FlashEvolve accelerates LLM agent self-evolution via asynchronous stage orchestration and inspectable language-space staleness handling, reporting 3.5-4.9x proposal throughput gains over synchronous baselines on GEPA workloads.
Open-Ended Task Discovery via Bayesian Optimization cs.AI · 2026-05-08 · unverdicted · none · ref 9 · internal anchor
Generate-Select-Refine is an open-ended Bayesian optimization method that generates tasks and concentrates evaluations on the best one with only logarithmic regret overhead relative to standard single-task optimization.
Evaluation-driven Scaling for Scientific Discovery cs.LG · 2026-04-21 · unverdicted · none · ref 7 · internal anchor
SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines with examples like 2x faster LASSO and new Erdos constructions.
AdaExplore: Failure-Driven Adaptation and Diversity-Preserving Search for Efficient Kernel Generation cs.CL · 2026-04-17 · unverdicted · none · ref 2 · internal anchor
AdaExplore improves correctness and speed of Triton kernel generation by converting recurring failures into a memory of rules and organizing search as a tree that mixes local refinements with larger regenerations, yielding 3.12x and 1.72x speedups on KernelBench Level-2 and Level-3 within 100 steps.
TurboEvolve: Towards Fast and Robust LLM-Driven Program Evolution cs.NE · 2026-04-12 · unverdicted · none · ref 2 · internal anchor
TurboEvolve improves LLM program evolution by running parallel islands with LLM-generated diverse candidates that carry self-assigned weights, an adaptive scheduler, and clustered seed injection to reach stronger solutions at lower evaluation budgets.
Learning to Solve and Optimize by Evolving Code cs.LG · 2026-05-29 · unverdicted · none · ref 1 · internal anchor
CHECKMATE evolves correct high-performing solvers from formal specs and natural language descriptions, outperforming SOTA on configuration and scheduling problems.
Evolutionary Ensemble of Agents cs.NE · 2026-05-09 · unverdicted · none · ref 9 · 2 links · internal anchor
EvE co-evolves code solvers and guidance states via synchronous races and Elo updates, discovering a rescale-then-interpolate mechanism that enables example-count generalization in ICON.
PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents cs.LG · 2026-05-07 · unverdicted · none · ref 4 · internal anchor
PACEvolve++ uses a phase-adaptive reinforcement learning advisor to decouple hypothesis selection from execution in LLM-driven evolutionary search, delivering faster convergence than prior frameworks on load balancing, recommendation, and protein tasks.
Algorithmic algorithm development with LLMs: A Case Study on LLM-Usage for Contraction Order Optimization in Tensor Networks cs.AI · 2026-06-01 · unverdicted · none · ref 3 · internal anchor
Case study applies verifier-guided LLM evolutionary agents to contraction-order optimization in tensor networks and concludes that human validation remains essential.
Effective Harness Engineering for Algorithm Discovery with Coding Agents cs.SE · 2026-05-13 · unverdicted · none · ref 15 · internal anchor
Under fixed token budget on Circle Packing, deeper per-candidate reasoning beats generating more shallow candidates, and capable models produce evaluation hacks at higher rates.

CodeEvolve: An open source evolutionary coding agent for algorithmic discovery and optimization, March 2026

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer