super hub Canonical reference

AlphaEvolve: A coding agent for scientific and algorithmic discovery

Adam Zsolt Wagner, Alexander Novikov, Emilien Dupont, Marvin Eisenberger, Po-Sen Huang · 2025 · cs.AI · arXiv 2506.13131

Canonical reference. 74% of citing Pith papers cite this work as background.

251 Pith papers citing it

Background 74% of classified citations

open full Pith review browse 251 citing papers more from Adam Zsolt Wagner arXiv PDF

abstract

In this white paper, we present AlphaEvolve, an evolutionary coding agent that substantially enhances capabilities of state-of-the-art LLMs on highly challenging tasks such as tackling open scientific problems or optimizing critical pieces of computational infrastructure. AlphaEvolve orchestrates an autonomous pipeline of LLMs, whose task is to improve an algorithm by making direct changes to the code. Using an evolutionary approach, continuously receiving feedback from one or more evaluators, AlphaEvolve iteratively improves the algorithm, potentially leading to new scientific and practical discoveries. We demonstrate the broad applicability of this approach by applying it to a number of important computational problems. When applied to optimizing critical components of large-scale computational stacks at Google, AlphaEvolve developed a more efficient scheduling algorithm for data centers, found a functionally equivalent simplification in the circuit design of hardware accelerators, and accelerated the training of the LLM underpinning AlphaEvolve itself. Furthermore, AlphaEvolve discovered novel, provably correct algorithms that surpass state-of-the-art solutions on a spectrum of problems in mathematics and computer science, significantly expanding the scope of prior automated discovery methods (Romera-Paredes et al., 2023). Notably, AlphaEvolve developed a search algorithm that found a procedure to multiply two $4 \times 4$ complex-valued matrices using $48$ scalar multiplications; offering the first improvement, after 56 years, over Strassen's algorithm in this setting. We believe AlphaEvolve and coding agents like it can have a significant impact in improving solutions of problems across many areas of science and computation.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 33 baseline 3 method 3 dataset 2 other 1

citation-polarity summary

background 31 baseline 3 use method 3 unclear 2 use dataset 2 support 1

claims ledger

abstract In this white paper, we present AlphaEvolve, an evolutionary coding agent that substantially enhances capabilities of state-of-the-art LLMs on highly challenging tasks such as tackling open scientific problems or optimizing critical pieces of computational infrastructure. AlphaEvolve orchestrates an autonomous pipeline of LLMs, whose task is to improve an algorithm by making direct changes to the code. Using an evolutionary approach, continuously receiving feedback from one or more evaluators, AlphaEvolve iteratively improves the algorithm, potentially leading to new scientific and practical d

authors

Adam Zsolt Wagner Alexander Novikov Emilien Dupont Marvin Eisenberger Ng\^an V\~u Po-Sen Huang

co-cited works

representative citing papers

Resolving the Schwartz Quadratic Meander Number Conjecture

math.CO · 2026-06-10 · unverdicted · novelty 8.0 · 2 refs

The maximum meander number for cyclic permutations on n letters is bounded above and below by quadratic functions of n.

AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks?

cs.AI · 2026-06-03 · unverdicted · novelty 8.0

AutoLab benchmark shows frontier models mostly fail at sustained iterative optimization due to premature termination, with persistence as the key success factor.

The Meta-Agent Challenge: Are Current Agents Capable of Autonomous Agent Development?

cs.AI · 2026-06-03 · unverdicted · novelty 8.0

The Meta-Agent Challenge shows frontier AI models rarely match human-engineered agent baselines when tasked with autonomous development, with proprietary models succeeding most often and some exhibiting cheating under pressure.

LLM-Evolved Domain-Independent Heuristics for Symbolic AI Planning

cs.AI · 2026-05-28 · unverdicted · novelty 8.0

LLM-guided evolutionary search yields the first domain-independent C++ planning heuristics that exceed the strongest hand-engineered baselines on coverage and speed trade-offs across unseen domains.

FastKernels: Benchmarking GPU Kernel Generation in Production

cs.LG · 2026-05-22 · conditional · novelty 8.0

FastKernels is a production-aligned benchmark covering 96.2% of HuggingFace Transformers that reveals state-of-the-art kernel agents deliver at most 0.94x aggregate speedup.

Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty

cs.CL · 2026-05-12 · unverdicted · novelty 8.0

Agent-BRACE improves LLM agent performance on long-horizon partially observable tasks by 5.3-14.5% through a decoupled belief state of verbalized atomic claims with certainty labels that keeps context length constant.

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

cs.CL · 2026-05-08 · conditional · novelty 8.0 · 2 refs

AutoTTS discovers width-depth test-time scaling controllers through agentic search in a pre-collected trajectory environment, yielding better accuracy-cost tradeoffs than hand-designed baselines on math reasoning tasks at low cost.

VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?

cs.AI · 2026-05-07 · unverdicted · novelty 8.0

VibeServe demonstrates that AI agents can synthesize bespoke LLM serving systems end-to-end, remaining competitive with vLLM in standard settings while outperforming it in six non-standard scenarios involving unusual models, workloads, or hardware.

MappingEvolve: LLM-Driven Code Evolution for Technology Mapping

cs.CE · 2026-04-29 · unverdicted · novelty 8.0

MappingEvolve applies LLMs through Planner-Evolver-Evaluator agents to evolve technology mapping code, delivering 10.04% area reduction versus ABC and 7.93% versus mockturtle on EPFL benchmarks.

Prism: Symbolic Superoptimization of Tensor Programs

cs.PL · 2026-04-16 · unverdicted · novelty 8.0

Prism is the first symbolic superoptimizer for tensor programs that uses sGraph for compact representation of program families, two-level search, e-graph equivalence checking, and auto-tuning to achieve up to 2.2x speedup over prior superoptimizers on LLM workloads.

InfiniteScienceGym: An Unbounded, Procedurally-Generated Benchmark for Scientific Analysis

cs.CL · 2026-04-14 · unverdicted · novelty 8.0

InfiniteScienceGym procedurally generates unbounded scientific repositories with exact ground-truth QA pairs to benchmark LLMs on data reasoning, abstention, and tool use without static datasets.

CHIA: An open-source framework for principled, agentic AI-driven hardware/software co-design research

cs.AR · 2026-06-25 · unverdicted · novelty 7.0 · 2 refs

CHIA introduces a framework for building and deploying agentic AI co-design flows as CHIA loops with tool nodes, reliability mechanisms, and five case-study demonstrations.

Optimizing CUDA like a Human: Micro-Profiling Tools as Expert Surrogates for LLM-Based GPU Kernel Optimization

cs.LG · 2026-06-24 · conditional · novelty 7.0

KernelPro combines LLM code generation, roofline-guided tool orchestration, and domain-adapted MCTS to produce GPU kernels that outperform prior automated and some hand-tuned baselines on KernelBench and VeOmni workloads.

Evolving Quantum Error-Correcting Encodings for Molecular Simulation

quant-ph · 2026-06-24 · conditional · novelty 7.0

LLM-driven evolutionary program synthesis discovers Generalized Superfast Encodings with exact distance 5 (and 6 on one instance) for molecular Hamiltonians, the first beyond distance 3.

Large-Language-Model Discovery of Quantum LDPC Codes through Structured Concept Evolution

quant-ph · 2026-06-23 · unverdicted · novelty 7.0

A new LLM-guided search method called structured concept evolution discovers competitive lifted-product qLDPC code families including non-abelian constructions.

NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers?

cs.CL · 2026-06-23 · unverdicted · novelty 7.0

NatureBench evaluates ten frontier AI coding agents on 90 tasks from Nature papers under web-search-disabled conditions and finds the strongest agent surpasses published SOTA on only 17.8% of tasks, succeeding mainly by translating problems into familiar supervised learning setups.

SPIRAL: Learning to Search and Aggregate

cs.AI · 2026-06-22 · unverdicted · novelty 7.0

SPIRAL is a reinforcement learning framework that jointly optimizes sequential reasoning, parallel trace generation, and aggregation in language models for improved test-time performance.

A catalog of fast matrix multiplication algorithms with frontier-closure search

cs.SC · 2026-06-11 · unverdicted · novelty 7.0

A machine-checkable catalog of low-rank matrix multiplication algorithms up to 32x32x32 is built over multiple fields via frontier-closure search that recombines entries while preserving a non-overlap property with prior bilinear cores.

Mathematical perspective on genetic algorithms with optimization guided operators

cs.NE · 2026-06-10 · unverdicted · novelty 7.0

Presents a query-complexity framework for genetic algorithms with guided operators and shows necessity of multiple operators and tight bounds for diversity in solution pools.

AgentCanary: A Security Evaluation Framework for Autonomous AI Agents in Real Executable Environments

cs.CR · 2026-06-09 · unverdicted · novelty 7.0

AgentCanary introduces an Entry × Impact risk taxonomy, high-fidelity real tool environments with persistent state, and multi-dimensional trajectory evaluation to assess AI agent security across models and attacks.

Harnessing the Collective Intelligence of AI Agents in the Wild for New Discoveries

cs.CL · 2026-06-09 · unverdicted · novelty 7.0

EinsteinArena is a platform for AI agents to collectively discover new mathematical results through open interaction, achieving 12 new state-of-the-art outcomes including raising the 11-dimensional kissing number lower bound from 593 to 604.

Self-Harness: Harnesses That Improve Themselves

cs.CL · 2026-06-08 · unverdicted · novelty 7.0

Self-Harness lets LLM agents autonomously refine their interaction harnesses through weakness mining, proposal generation, and validation, raising held-out pass rates on Terminal-Bench-2.0 from 40.5% to 61.9%, 23.8% to 38.1%, and 42.9% to 57.1% across three models.

FunctionEvolve: Structure-Guided Symbolic Regression with LLMs

cs.LG · 2026-06-05 · unverdicted · novelty 7.0

FunctionEvolve recovers 107 exact symbolic forms out of 129 synthetic tasks (82.9% SA@50) by using expression-tree structure for evolutionary search, parent selection, mutation, and coefficient scoring with LLMs.

MotionDisco: Motion Discovery for Extreme Humanoid Loco-Manipulation

cs.RO · 2026-06-04 · unverdicted · novelty 7.0

MotionDisco discovers long-horizon humanoid loco-manipulation motions from scratch via LLM-guided evolutionary search, trajectory optimization, and pruning, then transfers them to real robots with RL policies.

citing papers explorer

Showing 4 of 4 citing papers after filters.

Advancing Mathematics Research with AI-Driven Formal Proof Search cs.AI · 2026-05-21 · conditional · none · ref 46 · 2 links · internal anchor
An LLM-based agent with Lean verification autonomously solved multiple open Erdős problems and OEIS conjectures in the first large-scale test.
AI scientists produce results without reasoning scientifically cs.AI · 2026-04-20 · conditional · none · ref 25 · internal anchor
LLM agents execute scientific tasks but fail to follow core scientific reasoning norms such as evidence consideration and belief revision based on refutations.
OpenDeepThink: Parallel Reasoning via Bradley-Terry Aggregation cs.AI · 2026-05-14 · conditional · none · ref 14 · 2 links · internal anchor
OpenDeepThink uses Bradley-Terry aggregation of LLM pairwise judgments to rank and evolve parallel reasoning traces, improving Gemini 3.1 Pro Codeforces Elo by 405 points over eight rounds.
AIRA_2: Overcoming Bottlenecks in AI Research Agents cs.AI · 2026-03-27 · conditional · none · ref 17 · internal anchor
AIRA₂ improves AI research agents via asynchronous multi-GPU workers, hidden consistent evaluation, and interactive ReAct agents, reaching 81.5-83.1% percentile rank on MLE-bench-30 and exceeding human SOTA on 6 of 20 AIRS-Bench tasks.

AlphaEvolve: A coding agent for scientific and algorithmic discovery

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer