An AI system to help scientists write expert-level empirical software
Pith reviewed 2026-05-22 13:17 UTC · model grok-4.3
The pith
An AI system uses tree search over LLM-generated code to produce scientific software that outperforms human experts on real leaderboards.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ERA is an AI system that uses a large language model guided by tree search to generate, evaluate, and iteratively improve scientific software whose goal is to maximize a domain-specific quality metric. When the system is allowed to explore and integrate complex research ideas from external sources, it produces runnable code that achieves expert-level performance, including 40 novel methods for single-cell data analysis that outperformed the top human-developed entries on a public leaderboard and 14 forecasting models that outperformed the CDC ensemble for COVID-19 hospitalizations.
What carries the argument
Tree search over variants of code generated by the large language model, with each candidate evaluated directly by the target quality metric to decide which branches to expand.
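The mechanism described above can be sketched as a greedy best-first search. This is an illustrative sketch only: this page does not specify the paper's actual search policy, and `propose` (standing in for the LLM) and `score` (standing in for the quality metric) are hypothetical placeholders, here replaced by toy functions over a single number encoded as text.

```python
import heapq
import itertools
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(order=True)
class Node:
    neg_score: float                   # heapq is a min-heap, so store -score
    order: int                         # tie-breaker: insertion order
    code: str = field(compare=False)   # candidate program text
    parent: Optional["Node"] = field(default=None, compare=False)


def tree_search(
    root_code: str,
    propose: Callable[[str, int], List[str]],  # hypothetical stand-in for the LLM
    score: Callable[[str], float],             # hypothetical quality metric
    budget: int = 60,
    branching: int = 3,
) -> Node:
    """Greedy best-first search: expand the best-scoring unexpanded node."""
    counter = itertools.count()
    root = Node(-score(root_code), next(counter), root_code)
    frontier = [root]
    best = root
    evaluations = 1
    while frontier and evaluations < budget:
        node = heapq.heappop(frontier)  # branch with the highest score so far
        for child_code in propose(node.code, branching):
            if evaluations >= budget:
                break
            child = Node(-score(child_code), next(counter), child_code, node)
            evaluations += 1
            if child.neg_score < best.neg_score:
                best = child
            heapq.heappush(frontier, child)
    return best


# Toy stand-ins (assumptions, not the paper's components): a "program" is a
# number written as text, and the metric rewards being close to 3.0.
_rng = random.Random(0)


def toy_propose(code: str, k: int) -> List[str]:
    x = float(code)
    return [repr(x + _rng.uniform(-1.0, 1.0)) for _ in range(k)]


def toy_score(code: str) -> float:
    return -(float(code) - 3.0) ** 2


best = tree_search("0.0", toy_propose, toy_score)
print(best.code, -best.neg_score)
```

Because `best` starts at the root and is only replaced by a strictly better child, the returned candidate never scores below the starting program. A real system would swap the toy functions for an LLM call and a sandboxed evaluation of the generated code.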
If this is right
- The same tree-search approach can be applied to other domains such as geospatial analysis and zebrafish neural prediction, yielding expert-level code without manual coding.
- ERA can produce entirely new rule-based constructions for time series forecasting that improve on existing techniques.
- By repeatedly integrating ideas from published literature, the system generates solutions that human developers had not previously combined.
Where Pith is reading between the lines
- If the quality metric can be defined for a new field, the same machinery could accelerate software development in that field without requiring new training of the underlying model.
- The approach opens the possibility of chaining multiple such systems, where one ERA instance writes code that another instance then uses as input for a downstream analysis.
- A natural next test would be whether human scientists can steer the search by occasionally editing the quality metric or injecting new constraints mid-process.
Load-bearing premise
The chosen quality metric truly measures expert-level scientific performance and the language model can turn external research ideas into correct, runnable code without human fixes.
What would settle it
Running the 40 single-cell methods discovered by ERA on the same public leaderboard and finding that none of them rank above the previous top human entry.
Original abstract
The cycle of scientific discovery is frequently bottlenecked by the slow, manual creation of software to support computational experiments\cite{hannay2009how}. To address this, we present Empirical Research Assistance (ERA), an AI system that creates expert-level scientific software whose goal is to maximize a quality metric. The system uses a Large Language Model (LLM) and Tree Search (TS)\cite{silver2016mastering} to systematically improve the quality metric and intelligently navigate the large space of possible solutions. ERA achieves expert-level results when it explores and integrates complex research ideas from external sources. The effectiveness of tree search is demonstrated across a diverse range of tasks. In bioinformatics, ERA discovered 40 novel methods for single-cell data analysis that outperformed the top human-developed methods on a public leaderboard. In epidemiology, ERA generated 14 models that outperformed the CDC ensemble and all other individual models for forecasting COVID-19 hospitalizations. ERA also produced expert-level software for geospatial analysis, neural activity prediction in zebrafish, and numerical solution of integrals, and a novel rule-based construction for time series forecasting. By devising and implementing novel solutions to diverse tasks, ERA represents a significant step towards accelerating scientific progress.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Empirical Research Assistance (ERA), an AI system that uses a large language model together with tree search to generate and iteratively improve expert-level empirical software whose objective is to maximize a user-specified quality metric. The central claims are that ERA discovers and implements novel solutions by integrating complex ideas from external sources, with concrete demonstrations including 40 novel methods for single-cell data analysis that outperform the top human-developed entries on a public leaderboard and 14 models for COVID-19 hospitalization forecasting that surpass the CDC ensemble and all other individual models.
Significance. If the outperformance claims are shown to rest on an independently validated quality metric rather than direct optimization of the reported scores, the work would constitute a meaningful advance in automating the creation of domain-specific scientific code. The combination of LLM-based idea integration with tree search for systematic exploration across diverse tasks (bioinformatics, epidemiology, geospatial analysis, neural activity prediction, and numerical methods) is a clear strength, as is the emphasis on producing runnable, expert-level software rather than isolated code snippets.
major comments (2)
- [Abstract / Results (bioinformatics and epidemiology)] Abstract and results on bioinformatics/epidemiology: the headline claims of 40 outperforming methods and 14 outperforming models are load-bearing for the assertion of expert-level performance, yet no definition of the quality metric, exact baseline implementations, error bars, or controls against post-hoc selection of the reported solutions are supplied. Without these, it is impossible to determine whether tree search produced genuinely novel expert software or simply optimized the scalar used for both guidance and final reporting.
- [Results sections on single-cell analysis and COVID-19 forecasting] The central claim that ERA 'achieves expert-level results when it explores and integrates complex research ideas from external sources' requires evidence that the quality metric is independent of the public leaderboard or forecast accuracy used for evaluation. No independent validation set, expert review protocol, or failure-case analysis is described that would decouple the metric from the reported wins.
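The concern in both major comments, that selecting and reporting on the same scalar can inflate results, can be illustrated with a small simulation. This is a sketch under assumed Gaussian noise with made-up parameters; it does not model the paper's benchmarks and makes no claim about the size of any effect there.

```python
import random
import statistics
from typing import Tuple


def simulate(n_candidates: int = 200, noise: float = 1.0, seed: int = 0) -> Tuple[float, float]:
    """Pick the candidate with the best noisy search score, then re-score it
    on an independent noisy evaluation of the same underlying quality."""
    rng = random.Random(seed)
    true_quality = [rng.gauss(0.0, 1.0) for _ in range(n_candidates)]
    search_score = [q + rng.gauss(0.0, noise) for q in true_quality]
    holdout_score = [q + rng.gauss(0.0, noise) for q in true_quality]
    winner = max(range(n_candidates), key=search_score.__getitem__)
    return search_score[winner], holdout_score[winner]


def mean_gap(trials: int = 500) -> float:
    """Average of (score used for selection) minus (independent score)."""
    gaps = []
    for t in range(trials):
        selected, independent = simulate(seed=t)
        gaps.append(selected - independent)
    return statistics.mean(gaps)


print(f"mean optimism of the selected candidate: {mean_gap():.2f}")
```

In this toy setting the selected candidate's search score is, on average, higher than its independent score, which is the gap an independent validation set would expose.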
minor comments (2)
- [Abstract] The abstract cites tree search but does not briefly indicate how the search is adapted to the space of code solutions and external literature integration.
- [Results] Ensure that all statements of novelty are accompanied by explicit comparison to the closest prior human or automated methods rather than only to leaderboard rank.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our manuscript. We are encouraged by the recognition of ERA's potential to advance automated scientific software development. Below, we provide point-by-point responses to the major comments, clarifying our approach and outlining revisions to address the concerns about metric definition and independence.
-
Referee: Abstract and results on bioinformatics/epidemiology: the headline claims of 40 outperforming methods and 14 outperforming models are load-bearing for the assertion of expert-level performance, yet no definition of the quality metric, exact baseline implementations, error bars, or controls against post-hoc selection of the reported solutions are supplied. Without these, it is impossible to determine whether tree search produced genuinely novel expert software or simply optimized the scalar used for both guidance and final reporting.
Authors: We agree that these details are essential for rigorous evaluation. In the revised manuscript, we will add a dedicated section defining the quality metrics: for single-cell analysis, it is the composite score from the public leaderboard (e.g., based on clustering accuracy metrics like ARI and NMI on test data); for COVID-19 forecasting, it follows the CDC's evaluation protocol using mean absolute error or similar on reported hospitalizations. Exact baseline implementations will be described by referencing the top leaderboard entries and noting how we reproduced or compared against them. Error bars will be included from repeated ERA runs with different random seeds. For controls against post-hoc selection, we will report the number of solutions explored and the distribution of scores, showing that the reported ones are the top performers from the search rather than selected after the fact. While the metric guides the search, the novelty comes from the LLM proposing and implementing integrated ideas from external literature. revision: yes
-
Referee: The central claim that ERA 'achieves expert-level results when it explores and integrates complex research ideas from external sources' requires evidence that the quality metric is independent of the public leaderboard or forecast accuracy used for evaluation. No independent validation set, expert review protocol, or failure-case analysis is described that would decouple the metric from the reported wins.
Authors: The quality metric is indeed the performance on the respective benchmarks, as ERA is designed to maximize user-specified metrics for practical scientific tasks. However, the key contribution is the systematic exploration via tree search that allows integration of complex ideas (e.g., from recent papers on single-cell methods) into runnable code, leading to solutions that surpass existing ones. We will revise to include a failure-case analysis, describing instances where the search converged to suboptimal solutions or failed to integrate ideas effectively. We will also detail the expert review by noting that the generated code was validated for correctness and novelty through comparison to literature. An independent validation set separate from the leaderboard is not described because the leaderboards serve as the standard evaluation; we will add a limitations section discussing potential overfitting to public benchmarks and the value of future private test sets. revision: partial
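The error bars and score distributions promised in the rebuttal could be reported along the following lines. This is a minimal sketch: the function name, the per-seed scores, and the baseline value are all hypothetical and are not taken from the paper.

```python
import statistics
from typing import Dict


def summarize_runs(scores_by_seed: Dict[int, float], baseline: float) -> Dict[str, float]:
    """Summarize repeated runs: spread across seeds and how often the baseline is beaten."""
    scores = list(scores_by_seed.values())
    wins = sum(s > baseline for s in scores)
    return {
        "n_runs": len(scores),
        "mean": statistics.mean(scores),
        # Sample standard deviation needs at least two runs.
        "std": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "min": min(scores),
        "max": max(scores),
        "win_rate": wins / len(scores),
    }


# Made-up numbers for illustration only.
summary = summarize_runs({0: 0.70, 1: 0.74, 2: 0.72}, baseline=0.71)
print(summary)
```

Reporting the minimum and the win rate alongside the mean makes it visible when a headline result depends on a single favorable seed.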
Circularity Check
No significant circularity: results grounded in external public benchmarks
The paper presents ERA as using LLM+tree search to maximize an internal quality metric, then reports outperformance on independent public leaderboards (bioinformatics) and the CDC ensemble (epidemiology). These external benchmarks are not shown to be identical to the search metric by any quoted equation or definition, and no self-citation chain or ansatz is invoked to force the headline results. The derivation chain therefore remains self-contained against external validation sets rather than reducing to its own inputs by construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- Quality metric
axioms (1)
- domain assumption Tree search combined with an LLM can systematically explore and improve code solutions by integrating external research ideas.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
The system uses a Large Language Model (LLM) and Tree Search (TS) to systematically improve the quality metric and intelligently navigate the large space of possible solutions.
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
ERA discovered 40 novel methods for single-cell data analysis that outperformed the top human-developed methods on a public leaderboard.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 6 Pith papers
-
Prospective multi-pathogen disease forecasting using autonomous LLM-guided tree search
An LLM-guided tree search system autonomously creates diverse forecasting models that match or beat CDC human-curated ensembles in a 2025-2026 prospective multi-pathogen evaluation.
-
Probabilistic Seasonal Streamflow Forecasting Across California's Sierra Nevada Watersheds with Agentic AI
An agentic AI workflow evolves an adaptive XGBoost quantile regression ensemble that reduces watershed-averaged forecast error by up to 29% versus California's operational forecasts for April-July runoff at 1-6 month ...
-
Optimized Three-Dimensional Photovoltaic Structures with LLM guided Tree Search
LLM-guided tree search with coding agents optimizes 3D photovoltaic designs for higher diurnal energy yield after correcting for simulation exploits.
-
Glia: A Human-Inspired AI for Automated Systems Design and Optimization
Glia deploys a multi-agent LLM workflow with reasoning, experimentation, and analysis agents to generate interpretable algorithms for request routing, scheduling, and auto-scaling in distributed GPU clusters, reaching...
-
ATHENA: Agentic Team for Hierarchical Evolutionary Numerical Algorithms
ATHENA introduces an agentic team framework that autonomously manages the end-to-end computational research lifecycle via a knowledge-driven HENA loop to achieve validation errors of 10^{-14} in scientific computing a...
-
TusoAI: Agentic Optimization for Scientific Methods
TusoAI is an LLM-based agent that builds and iteratively optimizes domain-specific computational methods for scientific data analysis, outperforming expert baselines on RNA-seq denoising and earth monitoring while rep...
Reference graph
Works this paper leans on
- [1] Fortin, J. A., Cardille, J. A. & Perez, E. Multi-sensor detection of forest-cover change across 45 years in Mato Grosso, Brazil. Remote Sens. Environ. 238, 111266 (2020)
- [2]
- [3]
- [4] Warshel, A. & Levitt, M. Theoretical studies of enzymic reactions: dielectric, electrostatic and steric stabilization of the carbonium ion in the reaction of lysozyme. J. Mol. Biol. 103, 227–249 (1976)
- [5] Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021)
- [6] Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021)
- [7] Hourdin, F. et al. The art and science of climate model tuning. Bull. Am. Meteorol. Soc. 98, 589–602 (2017)
- [8] Anderson Jr., J. Basic philosophy of CFD. In Computational Fluid Dynamics, 3–14 (Springer, 2009)
- [9] Silver, N. The signal and the noise: why so many predictions fail-but some don’t (Penguin, 2012)
- [10] Farmer, J. D. Making sense of chaos: a better economics for a better world (Yale Univ. Press, 2024)
- [11] Bernanke, B. & Blanchard, O. What caused the US pandemic-era inflation? Am. Econ. J. Macroecon. 17, 1–35 (2025)
- [12] Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016)
- [13] Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017)
- [14] Jiang, Z. et al. AIDE: AI-driven exploration in the space of code. arXiv preprint arXiv:2502.13138 (2025)
- [15] Novikov, A. et al. AlphaEvolve: A coding agent for scientific and algorithmic discovery. arXiv preprint arXiv:2506.13131 (2025)
- [16] Romera-Paredes, B. et al. Mathematical discoveries from program search with large language models. Nature 625, 468–475 (2024)
- [17]
- [18] Hu, S., Lu, C. & Clune, J. Automated design of agentic systems. arXiv preprint arXiv:2408.08435 (2024)
- [19] Xu, C. et al. Automatic cell-type harmonization and integration across Human Cell Atlas datasets. Cell 186, 5876–5891.e20 (2023)
- [20] Regev, A. et al. The Human Cell Atlas. eLife 6, e27041 (2017)
- [21] Centers for Disease Control and Prevention. COVID-19 forecast hub (2025). URL https://github.com/cdcgov/covid19-forecast-hub?tab=readme-ov-file
- [22]
- [23] Lueckmann, J.-M. et al. ZAPBench: a benchmark for whole-brain activity prediction in zebrafish. arXiv preprint arXiv:2503.02618 (2025)
- [24] Aksu, T. et al. GIFT-Eval: a benchmark for general time series forecasting model evaluation. arXiv preprint arXiv:2410.10393 (2024). URL https://huggingface.co/spaces/Salesforce/GIFT-Eval
- [25] Jovic, D. et al. Single-cell RNA sequencing technologies and applications: a brief overview. Clin. and Transl. Med. 12, e694 (2022)
- [26] Svensson, V., Vento-Tormo, R. & Teichmann, S. A. Exponential scaling of single-cell RNA-seq in the past decade. Nat. Protoc. 13, 599–604 (2018)
- [27] CZI Cell Science Program et al. CZ CELLxGENE Discover: a single-cell data platform for scalable exploration, analysis and modeling of aggregated data. Nucleic Acids Res. 53, D886–D900 (2025)
- [28] Zhang, J. et al. Tahoe-100M: a giga-scale single-cell perturbation atlas for context-dependent gene function and cellular modeling. bioRxiv 2025–02 (2025)
- [29] Stuart, T. & Satija, R. Integrative single-cell analysis. Nat. Rev. Genet. 20, 257–272 (2019)
- [30] Zappia, L., Phipson, B. & Oshlack, A. Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database. PLoS Comput. Biol. 14, e1006245 (2018)
- [31] Tran, H. T. N. et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biol. 21, 1–32 (2020)
- [32] Chazarra-Gil, R., van Dongen, S., Kiselev, V. Y. & Hemberg, M. Flexible comparison of batch correction methods for single-cell RNA-seq using BatchBench. Nucleic Acids Res. 49, e42 (2021)
- [33] Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022)
- [34] Luecken, M. D. et al. Defining and benchmarking open problems in single-cell analysis. Nat. Biotechnol. 43, 1035–1040 (2025)
- [35] Google. Gemini Deep Research (2025). URL https://gemini.google/overview/deep-research/?hl=en
- [36] Gottweis, J. et al. Towards an AI co-scientist. arXiv preprint arXiv:2502.18864 (2025)
- [37]
- [38] Polański, K. et al. BBKNN: fast batch alignment of single cell transcriptomes. Bioinformatics 36, 964–965 (2019)
- [39] Chandrashekar, A. et al. TabVI: leveraging lightweight transformer architectures to learn biologically meaningful cellular representations. bioRxiv 2025–02 (2025)
- [40] Yang, Y. & Newsam, S. Bag-of-visual-words and spatial extensions for land-use classification. In Proc. 18th SIGSPATIAL Int. Conf. on Adv. in Geogr. Inf. Syst., 270–279 (Association for Computing Machinery, 2010)
- [41] Russakovsky, O. et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015)
- [42] Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25 (2012)
- [43] Zhong, B., Du, J., Liu, M., Yang, A. & Wu, J. Region-enhancing network for semantic segmentation of remote-sensing imagery. Sensors 21 (2021)
- [44] Zhang, Z., Liu, B. & Li, Y. FURSformer: semantic segmentation network for remote sensing images with fused heterogeneous features. Electronics 12 (2023)
- [45] Atiampo, A. K. & Diédié, G. H. F. New fusion approach of spatial and channel attention for semantic segmentation of very high spatial resolution remote sensing images. Open J. Appl. Sci. 14, 288–319 (2024)
- [46]
- [47] Elgamily, K. M., Mohamed, M. A., Abou-Taleb, A. M. & Ata, M. M. A novel W13 deep CNN structure for improved semantic segmentation of multiple objects in remote sensing imagery. Neural Comput. Appl. 37, 5397–5427 (2025)
- [48]
- [49] Zeng, A., Chen, M., Zhang, L. & Xu, Q. Are transformers effective for time series forecasting? In Proc. AAAI Conf. Artif. Intell., vol. 37, 11121–11128 (2023)
- [50] Das, A. et al. Long-term forecasting with TiDE: Time-series Dense Encoder. Trans. Mach. Learn. Res. (2023)
- [51] Chen, S.-A., Li, C.-L., Yoder, N., Arik, S. O. & Pfister, T. TSMixer: An All-MLP architecture for time series forecasting. Trans. Mach. Learn. Res. (2023)
- [52] Perez, E., Strub, F., De Vries, H., Dumoulin, V. & Courville, A. FiLM: Visual reasoning with a general conditioning layer. In Proc. AAAI Conf. Artif. Intell., vol. 32 (2018)
- [53] Deistler, M. et al. Differentiable simulation enables large-scale training of detailed biophysical models of neural dynamics. bioRxiv 2024–08 (2024)
- [54] Hoo, S. B., Müller, S., Salinas, D. & Hutter, F. From tables to time: how TabPFN-v2 outperforms specialized time series forecasting models. arXiv preprint arXiv:2501.02945 (2025)
- [55] Liu, Y. et al. Sundial: A family of highly capable time series foundation models. arXiv preprint arXiv:2502.00816 (2025)
- [56] Ansari, A. F. et al. Chronos: learning the language of time series. Trans. Mach. Learn. Res. (2024)
- [57] Oreshkin, B. N., Carpov, D., Chapados, N. & Bengio, Y. N-BEATS: neural basis expansion analysis for interpretable time series forecasting. arXiv preprint arXiv:1905.10437 (2019)
- [58] Ho, S. L. & Xie, M. The use of ARIMA models for reliability forecasting and analysis. Comput. Ind. Eng. 35, 213–216 (1998)
- [59] Piessens, R., de Doncker-Kapenga, E., Überhuber, C. W. & Kahaner, D. QUADPACK: a subroutine package for automatic integration (Springer-Verlag, 1983)
- [60] Gradshteyn, I. & Ryzhik, I. Table of integrals, series, and products, 8th edn (Academic Press, 1994)
- [61] Koza, J. R. Genetic programming as a means for programming computers by natural selection. Stat. Comput. 4, 87–112 (1994)
- [62] Mernik, M., Heering, J. & Sloane, A. M. When and how to develop domain-specific languages. ACM computing surveys (CSUR) 37, 316–344 (2005)
- [63] Czarnecki, K. Generative programming: Methods, techniques, and applications tutorial abstract. In International Conference on Software Reuse, 351–352 (Springer, 2002)
- [64] Chen, M. et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374 (2021)
- [65] Li, Y. et al. Competition-level code generation with AlphaCode. Science 378, 1092–1097 (2022)
- [66] Hutter, F., Kotthoff, L. & Vanschoren, J. Automated machine learning: methods, systems, challenges (Springer Nature, 2019)
- [67] Merchant, A. et al. Scaling deep learning for materials discovery. Nature 624, 80–85 (2023)
- [68]
- [69] Zhang, H. et al. CompBioAgent: An LLM-powered agent for single-cell RNA-seq data exploration. bioRxiv 2025–03 (2025)
- [70] Zhou, J. et al. An AI agent for fully automated multi-omic analyses. Adv. Sci. 11, 2407094 (2024)
- [71] Xin, Q. et al. BioInformatics Agent (BIA): unleashing the power of large language models to reshape bioinformatics workflow. bioRxiv 2024–05 (2024)
- [72] Alber, S. et al. CellVoyager: AI compbio agent generates new insights by autonomously analyzing biological data. bioRxiv 2025–06 (2025)
- [73] Baek, J., Jauhar, S. K., Cucerzan, S. & Hwang, S. J. ResearchAgent: iterative research idea generation over scientific literature with large language models. arXiv preprint arXiv:2404.07738 (2024)
- [74] Lu, C. et al. The AI Scientist: towards fully automated open-ended scientific discovery. arXiv preprint arXiv:2408.06292 (2024)
- [75] Du, M., Xu, B., Zhu, C., Wang, X. & Mao, Z. DeepResearch Bench: a comprehensive benchmark for deep research agents. arXiv preprint arXiv:2506.11763 (2025)
- [76] Perplexity. Perplexity Deep Research (2025). URL https://www.perplexity.ai/hub/blog/introducing-perplexity-deep-research
- [77]
- [78]
- [79] Lee, J. et al. Gemini Embedding: Generalizable embeddings from Gemini. arXiv preprint arXiv:2503.07891 (2025)
- [80] Gigante, S., Cannoodt, R. et al. openproblems (2025). URL https://github.com/openproblems-bio/openproblems