hub Mixed citations

CodeEvolve: An open source evolutionary coding agent for algorithmic discovery and optimization, March 2026

Mixed citation behavior. Most common role is background (67%).

19 Pith papers citing it
Background 67% of classified citations

abstract

We introduce CodeEvolve, an open-source framework that couples large language models with island-based evolutionary search for end-to-end algorithmic discovery. CodeEvolve integrates inspiration-based crossover, meta-prompting, and depth-based refinement on top of a CVT-MAP-Elites archive and a weighted LLM ensemble to generate optimized solutions for complex problems. On the AlphaEvolve benchmark suite, CodeEvolve matches or surpasses the reported AlphaEvolve results on 5 of 9 problems and, under matched conditions, outperforms the open-source frameworks OpenEvolve and ShinkaEvolve on 6 of 9. With the open-weight Qwen3-Coder-30B backbone, it surpasses the reported AlphaEvolve score on both CirclePackingSquare instances at roughly an order of magnitude lower cost than a frontier closed-source ensemble, and remains competitive with EoH on heuristic-design tasks without retuning. Ablations show that the interaction between CodeEvolve's components, rather than any single operator, drives these results. We release the framework, experimental data, and practical hyperparameter guidelines at https://github.com/inter-co/science-codeevolve.
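The abstract names a CVT-MAP-Elites archive and a weighted LLM ensemble without describing them. As a rough illustration of the general techniques only, and not of CodeEvolve's actual implementation, the sketch below shows the two ideas in isolation: an archive that keeps one elite per nearest-centroid cell, and a model picked at random in proportion to its weight. The names `nearest_cell`, `try_insert`, and `pick_model`, the 2-D descriptors, and the hand-picked centroids are all assumptions made for this example; a real CVT archive would typically derive its centroids by clustering sampled descriptors.

```python
import math
import random


def nearest_cell(descriptor, centroids):
    """Return the index of the centroid closest to a behaviour descriptor."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(descriptor, centroids[i]))


def try_insert(archive, centroids, program, descriptor, fitness):
    """Store `program` if its cell is empty or it beats the cell's elite.

    Returns True when the archive changed. Higher fitness is assumed better.
    """
    cell = nearest_cell(descriptor, centroids)
    elite = archive.get(cell)
    if elite is None or fitness > elite["fitness"]:
        archive[cell] = {"program": program,
                         "fitness": fitness,
                         "descriptor": descriptor}
        return True
    return False


def pick_model(models, weights, rng):
    """Sample one model name with probability proportional to its weight."""
    return rng.choices(models, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical centroids, fixed by hand to keep the example short.
    centroids = [(0.25, 0.25), (0.25, 0.75), (0.75, 0.25), (0.75, 0.75)]
    archive = {}
    for step in range(20):
        # Placeholder model names; a real system would call the chosen LLM here.
        model = pick_model(["model-a", "model-b"], [0.8, 0.2], rng)
        descriptor = (rng.random(), rng.random())  # stand-in for program features
        fitness = rng.random()                     # stand-in for an evaluator score
        try_insert(archive, centroids, f"candidate-{step}-{model}", descriptor, fitness)
    for cell, elite in sorted(archive.items()):
        print(cell, elite["program"], round(elite["fitness"], 3))
```

Keeping only the best candidate per cell is what lets this kind of archive preserve diverse solutions: a candidate competes only against others with a similar descriptor, not against the global best.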

hub tools

citation-role summary

background 7 baseline 1 other 1

citation-polarity summary

years

2026 19

representative citing papers

What Do Evolutionary Coding Agents Evolve?

cs.NE · 2026-05-19 · unverdicted · novelty 7.0

Evolutionary coding agents achieve most benchmark gains through a small subset of edit types and by cycling previously deleted code lines rather than developing new algorithmic structures.

From I/O to Code with Discovery Agent

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

DIO-Agent frames IO2Code as LLM-driven evolutionary search over programs with a Transformation Priority Premise to favor simple hypotheses, outperforming baselines on a new IO2CodeBench.

Open-Ended Task Discovery via Bayesian Optimization

cs.AI · 2026-05-08 · unverdicted · novelty 6.0

Generate-Select-Refine is an open-ended Bayesian optimization method that generates tasks and concentrates evaluations on the best one with only logarithmic regret overhead relative to standard single-task optimization.

Evaluation-driven Scaling for Scientific Discovery

cs.LG · 2026-04-21 · unverdicted · novelty 6.0

SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines with examples like 2x faster LASSO and new Erdős constructions.

TurboEvolve: Towards Fast and Robust LLM-Driven Program Evolution

cs.NE · 2026-04-12 · unverdicted · novelty 6.0

TurboEvolve improves LLM program evolution by running parallel islands with LLM-generated diverse candidates that carry self-assigned weights, an adaptive scheduler, and clustered seed injection to reach stronger solutions at lower evaluation budgets.

Learning to Solve and Optimize by Evolving Code

cs.LG · 2026-05-29 · unverdicted · novelty 5.0

CHECKMATE evolves correct high-performing solvers from formal specs and natural language descriptions, outperforming SOTA on configuration and scheduling problems.

Evolutionary Ensemble of Agents

cs.NE · 2026-05-09 · unverdicted · novelty 5.0 · 2 refs

EvE co-evolves code solvers and guidance states via synchronous races and Elo updates, discovering a rescale-then-interpolate mechanism that enables example-count generalization in ICON.

PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

PACEvolve++ uses a phase-adaptive reinforcement learning advisor to decouple hypothesis selection from execution in LLM-driven evolutionary search, delivering faster convergence than prior frameworks on load balancing, recommendation, and protein tasks.

citing papers explorer

Showing 18 of 18 citing papers after filters.

  • What Do Evolutionary Coding Agents Evolve? cs.NE · 2026-05-19 · unverdicted · none · ref 5 · internal anchor

    Evolutionary coding agents achieve most benchmark gains through a small subset of edit types and by cycling previously deleted code lines rather than developing new algorithmic structures.

  • From I/O to Code with Discovery Agent cs.LG · 2026-05-14 · unverdicted · none · ref 1 · internal anchor

    DIO-Agent frames IO2Code as LLM-driven evolutionary search over programs with a Transformation Priority Premise to favor simple hypotheses, outperforming baselines on a new IO2CodeBench.

  • SemaTune: Semantic-Aware Online OS Tuning with Large Language Models cs.OS · 2026-05-14 · unverdicted · none · ref 6 · internal anchor

    SemaTune uses LLM guidance with semantic context to tune up to 41 Linux OS parameters, delivering 72.5% performance gains over defaults and 153.3% over non-LLM baselines on 13 workloads while avoiding degraded states.

  • MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI cs.LG · 2026-05-09 · unverdicted · none · ref 5 · 2 links · internal anchor

    MLS-Bench is a benchmark with 140 tasks that evaluates AI agents on inventing generalizable and scalable ML methods, finding they lag human performance especially in insight-driven invention rather than tuning.

  • Autopoiesis: A Self-Evolving System Paradigm for LLM Serving Under Runtime Dynamics cs.DC · 2026-04-08 · unverdicted · none · ref 30 · internal anchor

    Autopoiesis uses LLM-driven program synthesis to evolve serving policies online during deployment, delivering up to 53% and average 34% gains over prior LLM serving systems under runtime dynamics.

  • Evolutionary Search for Automated Design of Uncertainty Quantification Methods cs.CL · 2026-04-03 · unverdicted · none · ref 1 · internal anchor

    LLM-driven evolutionary search discovers unsupervised UQ methods as Python programs that improve ROC-AUC by up to 6.7% over manual baselines on atomic claim verification across 9 datasets with OOD generalization.

  • Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks cs.CL · 2026-06-27 · unverdicted · none · ref 9 · internal anchor

    Evolution Fine-Tuning trains LLMs on 156K trajectories spanning 371 tasks to achieve 10.22% average improvement on 22 held-out optimization tasks and match SOTA on select circle-packing problems when combined with test-time RL.

  • BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution cs.SE · 2026-05-31 · unverdicted · none · ref 43 · internal anchor

    BenchEvolver evolves coding problem solutions to generate harder, valid tasks, producing LiveCodeBench-Plus where frontier models score 27.5-62.6% and enabling RL gains on held-out tests.

  • FlashEvolve: Accelerating Agent Self-Evolution with Asynchronous Stage Orchestration cs.LG · 2026-05-08 · unverdicted · none · ref 3 · internal anchor

    FlashEvolve accelerates LLM agent self-evolution via asynchronous stage orchestration and inspectable language-space staleness handling, reporting 3.5-4.9x proposal throughput gains over synchronous baselines on GEPA workloads.

  • Open-Ended Task Discovery via Bayesian Optimization cs.AI · 2026-05-08 · unverdicted · none · ref 9 · internal anchor

    Generate-Select-Refine is an open-ended Bayesian optimization method that generates tasks and concentrates evaluations on the best one with only logarithmic regret overhead relative to standard single-task optimization.

  • Evaluation-driven Scaling for Scientific Discovery cs.LG · 2026-04-21 · unverdicted · none · ref 7 · internal anchor

    SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines with examples like 2x faster LASSO and new Erdős constructions.

  • AdaExplore: Failure-Driven Adaptation and Diversity-Preserving Search for Efficient Kernel Generation cs.CL · 2026-04-17 · unverdicted · none · ref 2 · internal anchor

    AdaExplore improves correctness and speed of Triton kernel generation by converting recurring failures into a memory of rules and organizing search as a tree that mixes local refinements with larger regenerations, yielding 3.12x and 1.72x speedups on KernelBench Level-2 and Level-3 within 100 steps.

  • TurboEvolve: Towards Fast and Robust LLM-Driven Program Evolution cs.NE · 2026-04-12 · unverdicted · none · ref 2 · internal anchor

    TurboEvolve improves LLM program evolution by running parallel islands with LLM-generated diverse candidates that carry self-assigned weights, an adaptive scheduler, and clustered seed injection to reach stronger solutions at lower evaluation budgets.

  • Learning to Solve and Optimize by Evolving Code cs.LG · 2026-05-29 · unverdicted · none · ref 1 · internal anchor

    CHECKMATE evolves correct high-performing solvers from formal specs and natural language descriptions, outperforming SOTA on configuration and scheduling problems.

  • Evolutionary Ensemble of Agents cs.NE · 2026-05-09 · unverdicted · none · ref 9 · 2 links · internal anchor

    EvE co-evolves code solvers and guidance states via synchronous races and Elo updates, discovering a rescale-then-interpolate mechanism that enables example-count generalization in ICON.

  • PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents cs.LG · 2026-05-07 · unverdicted · none · ref 4 · internal anchor

    PACEvolve++ uses a phase-adaptive reinforcement learning advisor to decouple hypothesis selection from execution in LLM-driven evolutionary search, delivering faster convergence than prior frameworks on load balancing, recommendation, and protein tasks.

  • Algorithmic algorithm development with LLMs: A Case Study on LLM-Usage for Contraction Order Optimization in Tensor Networks cs.AI · 2026-06-01 · unverdicted · none · ref 3 · internal anchor

    Case study applies verifier-guided LLM evolutionary agents to contraction-order optimization in tensor networks and concludes that human validation remains essential.

  • Effective Harness Engineering for Algorithm Discovery with Coding Agents cs.SE · 2026-05-13 · unverdicted · none · ref 15 · internal anchor

    Under fixed token budget on Circle Packing, deeper per-candidate reasoning beats generating more shallow candidates, and capable models produce evaluation hacks at higher rates.