Automated Scientific Discovery: From Equation Discovery to Autonomous Discovery Systems
Pith reviewed 2026-05-24 08:49 UTC · model grok-4.3
The pith
Level five autonomy in scientific discovery requires no human intervention at all.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper argues that scientific discovery systems can be ranked by autonomy levels modeled on those for self-driving cars. The top level, level five, demands zero human intervention in the entire process of producing scientific knowledge. Movement toward this level is presented as a measurable step toward AI scientists that operate at or beyond the level of the best human scientists by 2050.
What carries the argument
Autonomy levels for scientific discovery systems, defined by analogy to autonomous driving, with level five requiring no human intervention in knowledge production.
If this is right
- Closed-loop discovery systems can operate without human input across domains such as material science and astronomy.
- Deep neural networks can be used to generate human-interpretable scientific knowledge.
- Pioneering systems like Adam represent early milestones on the path to higher autonomy.
- The defined levels provide a concrete metric for measuring progress toward fully independent AI scientists.
Where Pith is reading between the lines
- Verification standards for AI-generated discoveries would need to be developed independently of human oversight.
- Full autonomy could shift scientific practice from hypothesis-driven work by individuals to continuous, machine-led exploration.
- Domains with high experimental cost or safety constraints might see the earliest practical deployment of level-five systems.
Load-bearing premise
That the autonomy levels from driving apply directly to scientific discovery and that current trends in machine learning and robotics will reach level five without new conceptual breakthroughs.
What would settle it
A concrete demonstration that level-five autonomy in discovery cannot be achieved using existing machine-learning and robotics approaches and instead requires fundamentally new concepts.
Original abstract
The paper surveys automated scientific discovery, from equation discovery and symbolic regression to autonomous discovery systems and agents. It discusses the individual approaches from a "big picture" perspective and in context, but also discusses open issues and recent topics like the various roles of deep neural networks in this area, aiding in the discovery of human-interpretable knowledge. Further, we will present closed-loop scientific discovery systems, starting with the pioneering work on the Adam system up to current efforts in fields from material science to astronomy. Finally, we will elaborate on autonomy from a machine learning perspective, but also in analogy to the autonomy levels in autonomous driving. The maximal level, level five, is defined to require no human intervention at all in the production of scientific knowledge. Achieving this is one step towards solving the Nobel Turing Grand Challenge to develop AI Scientists: AI systems capable of making Nobel-quality scientific discoveries highly autonomously at a level comparable, and possibly superior, to the best human scientists by 2050.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This paper is a survey of automated scientific discovery, tracing the development from equation discovery and symbolic regression techniques to closed-loop autonomous discovery systems. It provides context for individual approaches, discusses the integration of deep neural networks for generating human-interpretable knowledge, reviews closed-loop systems from the Adam system to applications in material science and astronomy, and introduces a framework for autonomy levels in scientific discovery modeled after those in autonomous driving. The highest level, level five, is characterized by complete absence of human intervention in scientific knowledge production, and achieving this is presented as a step toward the Nobel Turing Grand Challenge of creating AI systems capable of Nobel-quality discoveries by 2050.
Significance. If the survey accurately captures the state of the field and the proposed autonomy framework proves useful for classifying and guiding future work, the paper could contribute to organizing the literature on AI for science and inspiring research toward higher levels of autonomy. The synthesis of existing systems and identification of open issues adds value for the community working on AI-driven scientific discovery.
minor comments (1)
- [Abstract] The abstract is written in the future tense (e.g., 'we will present', 'we will elaborate'); it should be revised to the present tense for consistency with a completed manuscript.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of the survey and the recommendation for minor revision. No specific major comments were provided in the report.
Circularity Check
No significant circularity
The paper is a survey reviewing existing systems and sketching a vision for autonomous scientific discovery. It contains no original derivations, equations, quantitative predictions, or load-bearing technical claims that could reduce to fitted parameters or self-referential definitions. Autonomy levels are defined by explicit analogy to driving, not derived from data or prior self-citations. The Nobel Turing Grand Challenge is positioned as an external goal, not a result obtained within the paper. No circular steps exist.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1] [Brence et al., 2021] Jure Brence, Ljupčo Todorovski, Sašo Džeroski: Probabilistic Grammars for Equation Discovery, Knowledge-Based Systems, 224:107077, 2021.
- [2] [Brence et al., 2023] Jure Brence, Ljupčo Todorovski, Sašo Džeroski: Dimensionally consistent equation discovery through probabilistic attribute grammars. Information Sciences, 2023.
- [3] [Bellinger et al., 2021] Colin Bellinger et al.: Active Measure Reinforcement Learning for Observation Cost Minimization, in: Proc. of the 34th Canadian Conference on Artificial Intelligence (Canadian AI 2021), 2021.
- [4] [Bridewell et al., 2008] Will Bridewell, Pat Langley, Ljupčo Todorovski, Sašo Džeroski: Inductive process modeling. Machine Learning, 71:1-32, 2008.
- [5] [Brunton et al., 2016] Discovering governing equations from data by sparse identification of nonlinear dynamical systems, PNAS, 113:3932-3937, 2016.
- [6] [Burger et al., 2021] Benjamin Burger et al.: A mobile robotic chemist, Nature 583, 237–241, 2021.
- [7] [Chaushevska et al., 2022] Marija Chaushevska, Ljupčo Todorovski, Jure Brence, Sašo Džeroski: Learning the probabilities in probabilistic context-free grammars for arithmetical expressions from equation corpora, in: Proc. Slovenian Conference on Artificial Intelligence, 2022.
- [8] [Chen et al., 2022] Boyuan Chen, Kuang Huang, Sunand Raghupathi, Ishaan Chandratreya, Qiang Du, Hod Lipson: Automated discovery of fundamental variables hidden in experimental data, Nature Computational Science, 2:433–442, 2022.
- [9] [Cherepnalkoski et al., 2012] Darko Čerepnalkoski, Katerina Taškova, Ljupčo Todorovski, Nataša Atanasova, Sašo Džeroski: The influence of parameter fitting methods on model structure selection in automated modeling of aquatic ecosystems. Ecological Modelling, 45:136-165, 2012.
- [10] [Cohn et al., 1996] D. A. Cohn, Z. Ghahramani, M. I. Jordan: Active Learning with Statistical Models, Journal of Artificial Intelligence Research, 4, 129-145, 1996.
- [11] [Coutant et al., 2019] Anthony Coutant et al. (2019) Closed-Loop Cycles of Experiment Design, Execution, and Learning Accelerate Systems Biology Model Development in Yeast, Proceedings of the National Academy of Sciences, 116(36):18142-18147. [Cranmer et al., 2020] Miles D. Cranmer, Alvaro Sanchez-Gonzalez, Peter W. Battaglia, Rui Xu, Kyle Cranmer, Da...
- [12] [De Raedt & Kramer, 2001] Luc De Raedt, Stefan Kramer: The Levelwise Version Space Algorithm and its Application to Molecular Fragment Finding, in: Proc. of the 17th International Joint Conference on Artificial Intelligence (IJCAI 2001), 853-862, 2001.
- [13]
- [14] [Džeroski & Petrovski, 1994] Sašo Džeroski, Igor Petrovski: Discovering dynamics with genetic programming, in: Proc. Seventh European Conference on Machine Learning, pages 347-350. Springer, 1994.
- [15] [Džeroski & Todorovski, 1993] Sašo Džeroski, Ljupčo Todorovski: Discovering dynamics, in: Proc. of the Tenth International Conference on Machine Learning, pages 97-103. Morgan Kaufmann, 1993.
- [16] [Ellis et al., 2021] Kevin Ellis, Catherine Wong, Maxwell I. Nye, Mathias Sablé-Meyer, Lucas Morales, Luke B. Hewitt, Luc Cary, Armando Solar-Lezama, Joshua B. Tenenbaum: DreamCoder: bootstrapping inductive program synthesis with wake-sleep library learning, in: Proc. of the 42nd ACM SIGPLAN International Conference on Programming Language Design an...
- [17] [Ganzert et al., 2010] Steven Ganzert, Josef Guttmann, Daniel Steinmann, Stefan Kramer: Equation Discovery for Model Identification in Respiratory Mechanics of the Mechanically Ventilated Human Lung, in: Proc. of the 13th International Conference on Discovery Science (DS 2010), Springer. 296-310. [Garcon et al., 2022] Antoine Garcon, Julian Vexler, Dm...
- [18] [Guimerà et al., 2020] Roger Guimerà, Ignasi Reichardt, Antoni Aguilar-Mogas, Francesco A. Massucci, Manuel Miranda, Jordi Pallarès, Marta Sales-Pardo: A Bayesian machine scientist to aid in the solution of challenging scientific problems, Science Advances, 6:eaav6971, 2020.
- [19] [Jumper et al., 2021] John Jumper et al.: Highly accurate protein structure prediction with AlphaFold, Nature, 596:583-589, 2021.
- [20] [King et al., 2004] Ross D. King, Kenneth E. Whelan, Ffion M. Jones, Philip G.K. Reiser, Christopher H. Bryant, Stephen H. Muggleton, Douglas B. Kell, Stephen G. Oliver: Functional genomic hypothesis generation and experimentation by a robot scientist, Nature 427, 2004.
- [21] [King et al., 2009] Ross D. King, Jem Rowland, Stephen G. Oliver, Michael Young, Wayne Aubrey, Emma Byrne, Maria Liakata, Magdalena Markham, Pinar Pir, Larisa N. Soldatova, Andrew Sparkes, Kenneth E. Whelan, Amanda Clare (2009) The Automation of Science, Science, 324:5923, 85-89. [Kitano, 2021] Hiroaki Kitano: Nobel Turing Challenge: creating the engi...
- [22] [Köppel et al., 2022] Marius Köppel, Alexander Segner, Martin Wagener, Lukas Pensel, Andreas Karwath, Christian Schmitt, Stefan Kramer: Learning to rank Higgs boson candidates, Scientific Reports, 12, 13094, 2022.
- [23] [Koza, 2004] John R. Koza: Genetic programming as a means for programming computers by natural selection, Statistics and Computing, 4:87-112, 2004.
- [24] [Langley, 1977] Pat Langley: BACON: A Production System That Discovers Empirical Laws, in: Proc. of the 5th International Joint Conference on Artificial Intelligence (IJCAI 1977), 344, 1977.
- [25] [Langley et al., 1987] Patrick W. Langley, Herbert A. Simon, Gary Bradshaw, Jan M. Zytkow (1987) Scientific Discovery: Computational Explorations of the Creative Process, MIT Press. [Langley, 2021] Pat Langley: Agents of Exploration and Discovery, AI Magazine, 42:4, 72–82, 2021.
- [26] [Lemos et al., 2022] Pablo Lemos, Niall Jeffrey, Miles D. Cranmer, Shirley Ho, Peter W. Battaglia: Rediscovering orbital mechanics with machine learning, arXiv preprint, CoRR abs/2202.02306, 2022.
- [27] [Levina & Bickel, 2004] Elizaveta Levina, Peter J. Bickel: Maximum Likelihood Estimation of Intrinsic Dimension, Advances in Neural Information Processing Systems 17, pp. 777–784, 2004.
- [28] [Li et al., 2021] Zelong Li, Jianchao Ji, Yongfeng Zhang: From Kepler to Newton: Explainable AI for Science, arXiv preprint, https://doi.org/10.48550/arXiv.2111.12210, 2021.
- [29] [Petersen et al., 2021] Brenden K. Petersen, Mikel Landajuela Larma, T. Nathan Mundhenk, Claudio P. Santiago, Soo K. Kim, Joanne T. Kim: Deep Symbolic Regression: Recovering Mathematical Expressions from Data Via Risk-Seeking Policy Gradients, in: Proc. of the 9th International Conference on Learning Representations (ICLR 2021), 2021.
- [30] [Pušnik et al., 2022] Žiga Pušnik, Miha Mraz, Nikolaj Zimic, Miha Moškon: Review and assessment of Boolean approaches for inference of gene regulatory networks, Heliyon, 8(8): e10222, 2022.
- [31] [Schmidt & Lipson, 2009] Michael Schmidt, Hod Lipson (2009) Distilling Free-Form Natural Laws from Experimental Data, Science, 324:5923, 81-85. [Sparkes et al., 2010] Andrew Sparkes, Wayne Aubrey, Emma Byrne, Amanda Clare, Muhammed N. Khan, Maria Liakata, Magdalena Markham, Jem Rowland, Larisa N. Soldatova, Kenneth E Whelan, Michael Young, Ross D. King...
- [32] [Sutton & Barto, 2018] Richard Sutton, Andrew Barto: Reinforcement Learning: An Introduction, MIT Press, 2018.
- [33] [Todorovski & Džeroski, 1997] Ljupčo Todorovski, Sašo Džeroski: Declarative bias in equation discovery, in: Proc. of the Fourteenth International Conference on Machine Learning, pages 376-384. Morgan Kaufmann, 1997.
- [34] [Todorovski & Džeroski, 2006] Ljupčo Todorovski, Sašo Džeroski: Integrating knowledge-driven and data-driven approaches to modeling. Ecological Modelling, 194:3-13, 2006.
- [35] [Udrescu et al., 2020] Silviu-Marian Udrescu, Andrew Tan, Jiahai Feng, Orisvaldo Neto, Tailin Wu, Max Tegmark: AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity, Advances in Neural Information Processing Systems 33, 2020.
- [36] [Washio & Motoda, 1997] Takashi Washio, Hiroshi Motoda: Discovering Admissible Models of Complex Systems Based on Scale-Types and Identity Constraints, in: Proc. of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI 1997), 810-819, 1997.
- [37] [Wigner, 2012] Eugene Paul Wigner: Philosophical Reflections and Syntheses, 549, Springer Science & Business Media, 2012.
- [38] [Williams et al., 2015] Kevin Williams, Elizabeth Bilsland, Andrew Sparkes, Wayne Aubrey, Michael Young, Larisa N. Soldatova, Kurt De Grave, Jan Ramon, Michaela de Clare, Worachart Sirawaraporn, Stephen G. Oliver, Ross D. King: Cheaper faster drug development validated by the repositioning of drugs against neglected tropical diseases, Journal of the Roy...