Pith

AIDE: AI-Driven Exploration in the Space of Code

43 Pith papers cite this work. Polarity classification is still being indexed.
abstract

Machine learning, the foundation of modern artificial intelligence, has driven innovations that have fundamentally transformed the world. Yet behind these advancements lies a complex and often tedious process that requires labor- and compute-intensive iteration and experimentation. Engineers and scientists developing machine learning models spend much of their time on trial-and-error tasks instead of conceptualizing innovative solutions or research hypotheses. To address this challenge, we introduce AI-Driven Exploration (AIDE), a machine learning engineering agent powered by large language models (LLMs). AIDE frames machine learning engineering as a code optimization problem and formulates trial-and-error as a tree search in the space of potential solutions. By strategically reusing and refining promising solutions, AIDE trades computational resources for enhanced performance, achieving state-of-the-art results on multiple machine learning engineering benchmarks, including our Kaggle evaluations, OpenAI's MLE-Bench, and METR's RE-Bench.
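The idea of trial-and-error as a tree search over candidate solutions can be illustrated with a small sketch. This is a simplified greedy variant written for illustration, not AIDE's actual implementation or search policy: `draft`, `refine`, and `evaluate` are hypothetical stand-ins for an LLM drafting a solution, an LLM proposing an edit to an existing solution, and running the code to obtain a validation score. To keep the example self-contained, a "solution" is reduced to a single number.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One candidate solution in the search tree (illustrative only)."""
    solution: float  # hypothetical stand-in for a generated program
    score: float     # higher is better, e.g. a validation metric
    # Excluded from repr/eq so parent <-> child links do not recurse forever.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list, repr=False, compare=False)


def tree_search(
    draft: Callable[[random.Random], float],
    refine: Callable[[float, random.Random], float],
    evaluate: Callable[[float], float],
    num_drafts: int = 3,
    steps: int = 20,
    seed: int = 0,
) -> Node:
    """Greedy tree search: draft several roots, then keep refining the best node."""
    rng = random.Random(seed)
    nodes: List[Node] = []
    # 1. Draft several independent root solutions.
    for _ in range(num_drafts):
        sol = draft(rng)
        nodes.append(Node(sol, evaluate(sol)))
    # 2. Repeatedly refine the highest-scoring solution found so far.
    #    Every child is kept in the tree, even if it scores worse than its parent.
    for _ in range(steps):
        best = max(nodes, key=lambda n: n.score)
        child_sol = refine(best.solution, rng)
        child = Node(child_sol, evaluate(child_sol), parent=best)
        best.children.append(child)
        nodes.append(child)
    return max(nodes, key=lambda n: n.score)


if __name__ == "__main__":
    # Toy objective standing in for a validation score: maximized at x = 3.
    best = tree_search(
        draft=lambda r: r.uniform(-10, 10),
        refine=lambda x, r: x + r.gauss(0, 1),
        evaluate=lambda x: -(x - 3.0) ** 2,
        steps=200,
    )
    print(best)
```

Because the search always expands the best node found so far, spending more steps (more compute) can only keep or improve the returned score, which mirrors the compute-for-performance trade-off described above. A real agent would also need to handle solutions that fail to run, which this sketch omits.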

citation-role summary

background: 3, baseline: 1

years

2026: 39, 2025: 4

representative citing papers

Data Flow Control: Data Safety Policies for AI Agents

cs.DB · 2026-06-04 · unverdicted · novelty 7.0

Data Flow Control formalizes data safety as aggregate predicates over provenance monomials and implements enforcement via the Passant query-rewriting layer, achieving near-zero overhead across five DBMS engines.

What Do Evolutionary Coding Agents Evolve?

cs.NE · 2026-05-19 · unverdicted · novelty 7.0

Evolutionary coding agents achieve most benchmark gains through a small subset of edit types and by cycling previously deleted code lines rather than developing new algorithmic structures.

Toward Generalist Autonomous Research via Hypothesis-Tree Refinement

cs.CL · 2026-06-10 · unverdicted · novelty 6.0

Arbor combines a coordinator, executors, and a hypothesis tree to enable cumulative autonomous research, outperforming Codex and Claude Code by over 2.5x on six real tasks and reaching 86.36% Any Medal on MLE-Bench Lite.

DataMaster: Data-Centric Autonomous AI Research

cs.LG · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

DataMaster deploys an AI agent to autonomously engineer data via tree search over external sources, shared candidate pools, and memory of past outcomes, yielding 32% higher medal rates on MLE-Bench Lite and a small GPQA gain over the base instruct model.

AgentGA: Evolving Code Solutions in Agent-Seed Space

cs.AI · 2026-04-16 · unverdicted · novelty 6.0 · 2 refs

AgentGA optimizes agent seeds with genetic algorithms and parent-archive inheritance to improve autonomous code generation, beating a baseline on 15 of 16 Kaggle competitions.

Pioneer Agent: Continual Improvement of Small Language Models in Production

cs.AI · 2026-04-10 · unverdicted · novelty 6.0

Pioneer Agent automates the full lifecycle of adapting and continually improving small language models via diagnosis-driven data synthesis and regression-constrained retraining, delivering gains of 1.6-83.8 points on benchmarks and large lifts in production-style tasks.

AIRA₂: Overcoming Bottlenecks in AI Research Agents

cs.AI · 2026-03-27 · conditional · novelty 6.0

AIRA₂ improves AI research agents via asynchronous multi-GPU workers, hidden consistent evaluation, and interactive ReAct agents, reaching 81.5-83.1% percentile rank on MLE-bench-30 and exceeding human SOTA on 6 of 20 AIRS-Bench tasks.
