Avo: Agentic variation operators for autonomous evolutionary search, 2026

Terry Chen, Zhifan Ye, Bing Xu, Zihao Ye, Timmy Liu, Ali Hassani, Tianqi Chen, Andrew Kerr, Haicheng Wu, Yang Xu, Yu-Jung Chen, Hanfeng Chen, Aditya Kane, Ronny Krashinsky, Ming-Yu Liu, Vinod Grover, Luis Ceze, Roger Bringmann, John Tran, W · 2026 · arXiv 2603.24517

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

MLS-Bench is a benchmark with 140 tasks that evaluates AI agents on inventing generalizable and scalable ML methods, finding they lag human performance especially in insight-driven invention rather than tuning.

CUDAHercules: Benchmarking Hardware-Aware Expert-level CUDA Optimization for LLMs

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

CUDAHercules benchmark demonstrates that leading LLMs generate functional CUDA code but fail to recover expert-level optimization strategies needed for peak performance on Ampere, Hopper, and Blackwell GPUs.

GPU Forecasters: Language Models as Selective Surrogates for Kernel Runtime Optimization

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

LLMs can forecast GPU kernel performance accurately enough to serve as selective surrogates, allowing kernel searches to consider more candidates and recover faster kernels under fixed GPU evaluation budgets.

CUDABeaver: Benchmarking LLM-Based Automated CUDA Debugging

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

CUDABEAVER benchmark and pass@k(M,C,A) metric show LLM CUDA debugging success drops by up to 40 percentage points under strict performance requirements.

Pioneer Agent: Continual Improvement of Small Language Models in Production

cs.AI · 2026-04-10 · unverdicted · novelty 6.0

Pioneer Agent automates the full lifecycle of adapting and continually improving small language models via diagnosis-driven data synthesis and regression-constrained retraining, delivering gains of 1.6-83.8 points on benchmarks and large lifts in production-style tasks.

citing papers explorer

Showing 5 of 5 citing papers after filters.

MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI cs.LG · 2026-05-09 · unverdicted · none · ref 15
MLS-Bench is a benchmark with 140 tasks that evaluates AI agents on inventing generalizable and scalable ML methods, finding they lag human performance especially in insight-driven invention rather than tuning.
CUDAHercules: Benchmarking Hardware-Aware Expert-level CUDA Optimization for LLMs cs.LG · 2026-05-08 · unverdicted · none · ref 3
CUDAHercules benchmark demonstrates that leading LLMs generate functional CUDA code but fail to recover expert-level optimization strategies needed for peak performance on Ampere, Hopper, and Blackwell GPUs.
GPU Forecasters: Language Models as Selective Surrogates for Kernel Runtime Optimization cs.LG · 2026-05-29 · unverdicted · none · ref 5
LLMs can forecast GPU kernel performance accurately enough to serve as selective surrogates, allowing kernel searches to consider more candidates and recover faster kernels under fixed GPU evaluation budgets.
CUDABeaver: Benchmarking LLM-Based Automated CUDA Debugging cs.LG · 2026-05-08 · unverdicted · none · ref 5
CUDABEAVER benchmark and pass@k(M,C,A) metric show LLM CUDA debugging success drops by up to 40 percentage points under strict performance requirements.
Pioneer Agent: Continual Improvement of Small Language Models in Production cs.AI · 2026-04-10 · unverdicted · none · ref 14
Pioneer Agent automates the full lifecycle of adapting and continually improving small language models via diagnosis-driven data synthesis and regression-constrained retraining, delivering gains of 1.6-83.8 points on benchmarks and large lifts in production-style tasks.

Avo: Agentic variation operators for autonomous evolutionary search, 2026

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer