An AI system to help scientists write expert-level empirical software

Anastasiya Belyaeva; Anna Bulanova; Brian P. Williams; Chris Co; Chujun He; Cory Y. McLean; Dan Liebling; David Smalling; Erica Brand; Eser Aygün

arxiv: 2509.06503 · v2 · pith:2QRJUZEZnew · submitted 2025-09-08 · 💻 cs.AI · q-bio.QM

Eser Aygün, Anastasiya Belyaeva, Gheorghe Comanici, Marc Coram, Hao Cui, Jake Garrison, Renee Johnston, Anton Kast, Cory Y. McLean, Peter Norgaard, Zahra Shamsi, David Smalling, James Thompson, Subhashini Venugopalan, Brian P. Williams, Chujun He, Sarah Martinson, Martyna Plomecka, Lai Wei, Yuchen Zhou, Qian-Ze Zhu, Matthew Abraham, Erica Brand, Anna Bulanova, Jeffrey A. Cardille, Chris Co, Scott Ellsworth, Grace Joseph, Malcolm Kane, Ryan Krueger, Johan Kartiwa, Dan Liebling, Jan-Matthis Lueckmann, Paul Raccuglia, Xuefei (Julie) Wang, Katherine Chou, James Manyika, Yossi Matias, John C. Platt, Lizzie Dorfman, Shibl Mourad, Michael P. Brenner

keywords: expert-level, software, novel, scientific, system, analysis, cite, diverse

The cycle of scientific discovery is frequently bottlenecked by the slow, manual creation of software to support computational experiments\cite{hannay2009how}. To address this, we present Empirical Research Assistance (ERA), an AI system that creates expert-level scientific software whose goal is to maximize a quality metric. The system uses a Large Language Model (LLM) and Tree Search (TS)\cite{silver2016mastering} to systematically improve the quality metric and intelligently navigate the large space of possible solutions. ERA achieves expert-level results when it explores and integrates complex research ideas from external sources. The effectiveness of tree search is demonstrated across a diverse range of tasks. In bioinformatics, ERA discovered 40 novel methods for single-cell data analysis that outperformed the top human-developed methods on a public leaderboard. In epidemiology, ERA generated 14 models that outperformed the CDC ensemble and all other individual models for forecasting COVID-19 hospitalizations. ERA also produced expert-level software for geospatial analysis, neural activity prediction in zebrafish, and numerical solution of integrals, and a novel rule-based construction for time series forecasting. By devising and implementing novel solutions to diverse tasks, ERA represents a significant step towards accelerating scientific progress.
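
The abstract describes a loop in which candidate programs are proposed, scored against a quality metric, and explored with tree search. The following is a minimal sketch of that general idea, not the authors' implementation: it uses a plain best-first search (an assumption, since the abstract does not specify the search policy), and the names `tree_search`, `toy_propose`, and `toy_score` are hypothetical. In a real system, `propose` would call an LLM to rewrite a candidate program and `score` would run it on held-out data; here both are replaced by a toy numeric problem so the sketch runs on its own.

```python
import heapq
from typing import Callable, List, Tuple


def tree_search(
    root: float,
    propose: Callable[[float], List[float]],
    score: Callable[[float], float],
    budget: int,
) -> Tuple[float, float]:
    """Best-first search: repeatedly expand the highest-scoring node seen so far."""
    best, best_score = root, score(root)
    # heapq is a min-heap, so scores are negated; the counter breaks ties
    # so that candidates themselves are never compared.
    frontier = [(-best_score, 0, root)]
    counter = 1
    for _ in range(budget):
        if not frontier:
            break
        _, _, node = heapq.heappop(frontier)
        for child in propose(node):
            child_score = score(child)
            if child_score > best_score:
                best, best_score = child, child_score
            heapq.heappush(frontier, (-child_score, counter, child))
            counter += 1
    return best, best_score


def toy_propose(x: float) -> List[float]:
    # Hypothetical stand-in for an LLM proposing variants of a candidate.
    return [x - 0.5, x + 0.5]


def toy_score(x: float) -> float:
    # Hypothetical quality metric, maximized at x = 3.0.
    return -((x - 3.0) ** 2)


best, best_score = tree_search(0.0, toy_propose, toy_score, budget=20)
```

Keeping `propose` and `score` as injected callables means the search logic stays independent of how candidates are generated or evaluated, which is the part a real system would replace.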

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Prospective multi-pathogen disease forecasting using autonomous LLM-guided tree search
cs.AI · 2026-05 · unverdicted · novelty 7.0
An LLM-guided tree search system autonomously creates diverse forecasting models that match or beat CDC human-curated ensembles in a 2025-2026 prospective multi-pathogen evaluation.

Probabilistic Seasonal Streamflow Forecasting Across California's Sierra Nevada Watersheds with Agentic AI
physics.ao-ph · 2026-05 · unverdicted · novelty 7.0
An agentic AI workflow evolves an adaptive XGBoost quantile regression ensemble that reduces watershed-averaged forecast error by up to 29% versus California's operational forecasts for April-July runoff at 1-6 month ...

Optimized Three-Dimensional Photovoltaic Structures with LLM guided Tree Search
cs.CL · 2026-05 · conditional · novelty 6.0
LLM-guided tree search with coding agents optimizes 3D photovoltaic designs for higher diurnal energy yield after correcting for simulation exploits.

Glia: A Human-Inspired AI for Automated Systems Design and Optimization
cs.AI · 2025-10 · unverdicted · novelty 6.0
Glia deploys a multi-agent LLM workflow with reasoning, experimentation, and analysis agents to generate interpretable algorithms for request routing, scheduling, and auto-scaling in distributed GPU clusters, reaching...

ATHENA: Agentic Team for Hierarchical Evolutionary Numerical Algorithms
cs.LG · 2025-12 · unverdicted · novelty 5.0
ATHENA introduces an agentic team framework that autonomously manages the end-to-end computational research lifecycle via a knowledge-driven HENA loop to achieve validation errors of 10^{-14} in scientific computing a...