pith. machine review for the scientific record.

arxiv: 2305.01582 · v3 · submitted 2023-05-02 · 🌌 astro-ph.IM · cs.LG · cs.NE · cs.SC · physics.data-an

Recognition: 2 Lean theorem links

Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 03:42 UTC · model grok-4.3

classification 🌌 astro-ph.IM · cs.LG · cs.NE · cs.SC · physics.data-an
keywords symbolic regression · evolutionary algorithms · interpretable machine learning · scientific discovery · equation recovery · benchmarking · data-driven modeling

The pith

A multi-population evolutionary search recovers historical scientific equations from data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a library for symbolic regression built to help scientists find clear, human-readable equations directly from measurements. Its core process runs multiple groups of candidate expressions through repeated cycles of evolution, simplification, and adjustment of any unknown numbers inside them. The system distributes this work across many processors and connects to neural network tools for hybrid use. It also defines a benchmark that checks whether algorithms can reconstruct past empirical equations from both their original data and versions with added synthetic noise. A sympathetic reader would care because this could shift scientific modeling from opaque predictions toward extractable rules that reveal underlying relationships.

Core claim

The paper states that its evolutionary algorithm, consisting of a multi-population search with an evolve-simplify-optimize loop and supported by a high-performance backend that fuses operators into fast kernels and computes derivatives automatically, recovers historical empirical equations when evaluated on the new EmpiricalBench benchmark of original and synthetic scientific datasets.

What carries the argument

The evolve-simplify-optimize loop inside a multi-population evolutionary algorithm that searches for symbolic expressions while tuning their scalar constants.
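The loop named above can be pictured with a toy sketch. This is not PySR's implementation (PySR runs in Julia with a proper constant optimizer and full genetic operators); here expressions are nested tuples, "simplify" only folds constant subtrees, and "optimize" is a crude stochastic coordinate descent on the scalar constants:

```python
# Toy evolve-simplify-optimize loop -- illustrative only, not PySR's algorithm.
import random

random.seed(0)

def evaluate(expr, x):
    op = expr[0]
    if op == "var":
        return x
    if op == "const":
        return expr[1]
    a, b = evaluate(expr[1], x), evaluate(expr[2], x)
    return a + b if op == "add" else a * b

def simplify(expr):
    # "Simplify" step: fold subtrees whose children are both constants.
    if expr[0] in ("var", "const"):
        return expr
    a, b = simplify(expr[1]), simplify(expr[2])
    if a[0] == "const" and b[0] == "const":
        return ("const", a[1] + b[1] if expr[0] == "add" else a[1] * b[1])
    return (expr[0], a, b)

def mutate(expr):
    # "Evolve" step: randomly grow the tree with a new scalar constant.
    r = random.random()
    if r < 0.3:
        return ("add", expr, ("const", random.uniform(-1, 1)))
    if r < 0.6:
        return ("mul", expr, ("const", random.uniform(0.5, 2.0)))
    return expr

def const_paths(expr, path=()):
    if expr[0] == "const":
        yield path
    elif expr[0] != "var":
        yield from const_paths(expr[1], path + (1,))
        yield from const_paths(expr[2], path + (2,))

def set_const(expr, path, value):
    if not path:
        return ("const", value)
    parts = list(expr)
    parts[path[0]] = set_const(expr[path[0]], path[1:], value)
    return tuple(parts)

def get_const(expr, path):
    for i in path:
        expr = expr[i]
    return expr[1]

def loss(expr, data):
    return sum((evaluate(expr, x) - y) ** 2 for x, y in data)

def optimize_constants(expr, data, rounds=10):
    # "Optimize" step: jitter each constant, keep improvements
    # (PySR uses a real optimizer; this is only a stand-in).
    best = loss(expr, data)
    for _ in range(rounds):
        for path in list(const_paths(expr)):
            trial = set_const(expr, path, get_const(expr, path) + random.gauss(0, 0.1))
            l = loss(trial, data)
            if l < best:
                expr, best = trial, l
    return expr

# Target relation to recover: y = 2.5 * x + 1.0
data = [(x, 2.5 * x + 1.0) for x in range(-3, 4)]
pop = [("var",)] * 12
best_expr, best_loss = pop[0], loss(pop[0], data)
for _ in range(15):
    pop = [optimize_constants(simplify(mutate(e)), data) for e in pop]
    pop.sort(key=lambda e: loss(e, data))
    pop = pop[:6] + [best_expr] * 6          # elitism plus refill
    if loss(pop[0], data) < best_loss:
        best_expr, best_loss = pop[0], loss(pop[0], data)

print(round(best_loss, 3))
```

The interleaving matters: simplification keeps trees small enough for constant optimization to be cheap, and tuned constants in turn make structurally good trees survive selection.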

If this is right

  • Scientists obtain human-readable models straight from observations instead of black-box predictors.
  • Large datasets become tractable because populations of expressions can be distributed across clusters.
  • Hybrid modeling is enabled by connecting the search to deep learning packages for initialization or guidance.
  • Different symbolic regression approaches can be compared on a shared set of historical scientific cases.
  • Interpretable alternatives become available in fields where understanding the form of the relation matters.
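The distribution point in the list above can be illustrated with a toy multi-population search: each population evolves independently on its own worker, and the best individual migrates between rounds. Threads stand in for cluster workers here, and nothing below reflects PySR's actual scheduler; the single-constant "expression" is a deliberate simplification:

```python
# Toy multi-population search with champion migration -- conceptual sketch only.
from concurrent.futures import ThreadPoolExecutor
import random

TARGET = 3.0          # hidden "true" constant the search should find

def fitness(c):
    return (c - TARGET) ** 2

def evolve_population(args):
    # Independent hill-climbing within one population (one "worker").
    pop, seed = args
    rng = random.Random(seed)
    for _ in range(200):
        child = min(pop, key=fitness) + rng.gauss(0, 0.1)
        worst = max(range(len(pop)), key=lambda i: fitness(pop[i]))
        if fitness(child) < fitness(pop[worst]):
            pop[worst] = child
    return pop

# Four populations evolved in parallel, with the champion migrating
# into every population between rounds.
populations = [[random.Random(i).uniform(-5, 5) for _ in range(10)] for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    for rnd in range(3):
        args = [(p, rnd * 10 + i) for i, p in enumerate(populations)]
        populations = list(pool.map(evolve_population, args))
        champion = min((min(p, key=fitness) for p in populations), key=fitness)
        for p in populations:
            p[0] = champion

best = min((min(p, key=fitness) for p in populations), key=fitness)
print(round(best, 2))
```

Because populations only synchronize at migration points, the work parallelizes with little communication, which is why the same pattern scales from one machine to a cluster.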

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be applied to hypothesize functional forms in areas where no prior equation exists.
  • Extending the benchmark to include differential equations or constrained systems would test broader utility.
  • Real-time use during experiments might allow on-the-fly model refinement as data arrives.
  • Combining the method with other data-driven techniques could accelerate discovery in under-theorized domains.

Load-bearing premise

Recovering known historical equations from original and synthetic data serves as a good test of whether the method will work for finding new equations in fresh scientific problems.
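Making this premise operational requires a criterion for "recovered the equation." The paper's exact scoring rule is not reproduced here; a common proxy, sketched below with a hypothetical Hubble-law target, is numerical equivalence to the ground truth on sampled inputs:

```python
# Hypothetical recovery criterion -- a numeric-equivalence proxy, not
# EmpiricalBench's actual scoring code.
import random

def recovered(candidate, truth, n_samples=200, rtol=1e-6):
    # The candidate "recovers" the equation if it matches the ground truth
    # to relative tolerance at every sampled input.
    rng = random.Random(42)
    for _ in range(n_samples):
        x = rng.uniform(0.1, 10.0)
        c, t = candidate(x), truth(x)
        if abs(c - t) > rtol * max(1.0, abs(t)):
            return False
    return True

# Ground truth: a Hubble-style linear law v = H0 * d (H0 value arbitrary).
H0 = 70.0
truth = lambda d: H0 * d
exact = lambda d: 70.0 * d        # same functional form and constant
wrong = lambda d: 70.0 * d + 0.5  # spurious intercept: should fail

print(recovered(exact, truth), recovered(wrong, truth))
```

A criterion like this rewards exact functional form rather than mere fit quality, which is the property that separates a recovery benchmark from an ordinary regression benchmark.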

What would settle it

Failure of the search to produce any simple symbolic expression that matches a new, previously untested dataset from a current experiment where an empirical relation is expected.

read the original abstract

PySR is an open-source library for practical symbolic regression, a type of machine learning which aims to discover human-interpretable symbolic models. PySR was developed to democratize and popularize symbolic regression for the sciences, and is built on a high-performance distributed back-end, a flexible search algorithm, and interfaces with several deep learning packages. PySR's internal search algorithm is a multi-population evolutionary algorithm, which consists of a unique evolve-simplify-optimize loop, designed for optimization of unknown scalar constants in newly-discovered empirical expressions. PySR's backend is the extremely optimized Julia library SymbolicRegression.jl, which can be used directly from Julia. It is capable of fusing user-defined operators into SIMD kernels at runtime, performing automatic differentiation, and distributing populations of expressions to thousands of cores across a cluster. In describing this software, we also introduce a new benchmark, "EmpiricalBench," to quantify the applicability of symbolic regression algorithms in science. This benchmark measures recovery of historical empirical equations from original and synthetic datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript describes PySR, an open-source Python library for symbolic regression in scientific applications, built on the high-performance Julia backend SymbolicRegression.jl. It details a multi-population evolutionary algorithm featuring an evolve-simplify-optimize loop to handle unknown scalar constants, along with runtime SIMD kernel fusion, automatic differentiation, distributed computing across clusters, and interfaces to deep learning packages. The work also introduces EmpiricalBench, a benchmark that quantifies applicability by measuring recovery rates of historical empirical equations from both original and synthetic datasets.

Significance. If the implementation details and benchmark results hold, the paper delivers a practical, accessible tool for interpretable machine learning that lowers barriers to symbolic regression in fields such as astrophysics. Notable strengths include the fully open-source release with reproducible code, the optimized distributed backend, and the introduction of EmpiricalBench as an externally defined evaluation framework grounded in real historical equations rather than synthetic toy problems.

major comments (1)
  1. [EmpiricalBench] In the section introducing EmpiricalBench: the central claim that recovery of historical empirical equations from original and synthetic datasets quantifies applicability for science rests on the untested assumption that these datasets (in noise level, dimensionality, operator sets, and selection biases) are representative of future unknown-equation discovery tasks. The manuscript provides no direct tests on contemporary problems where the true functional form is unknown a priori, which is load-bearing for the 'practical for science' assertion.
minor comments (2)
  1. [Abstract] Abstract: the interfaces with deep learning packages are mentioned but not enumerated; the main text should explicitly name them (e.g., PyTorch, TensorFlow) with version or usage details for reproducibility.
  2. [Software architecture] Software architecture section: clarify the exact mechanism by which user-defined operators are fused into SIMD kernels at runtime, including any constraints on operator arity or type.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive evaluation of the manuscript and for the constructive comment on EmpiricalBench. We address the point below and propose a targeted revision to clarify the benchmark's scope.

read point-by-point responses
  1. Referee: In the section introducing EmpiricalBench: the central claim that recovery of historical empirical equations from original and synthetic datasets quantifies applicability for science rests on the untested assumption that these datasets (in noise level, dimensionality, operator sets, and selection biases) are representative of future unknown-equation discovery tasks. The manuscript provides no direct tests on contemporary problems where the true functional form is unknown a priori, which is load-bearing for the 'practical for science' assertion.

    Authors: We appreciate the referee highlighting this assumption. EmpiricalBench was constructed precisely to provide a more grounded evaluation than purely synthetic toy problems by recovering equations that were historically discovered from real data. The benchmark incorporates both the original observational datasets (where they exist) and synthetically regenerated versions that preserve the functional form while allowing controlled variation in noise, dimensionality, and sampling. This design directly tests recovery under conditions mirroring past scientific discovery. We agree that no benchmark can exhaustively represent all possible future tasks, and that direct quantitative tests on problems whose functional form is truly unknown a priori are not possible within a recovery benchmark, because success cannot be measured without ground truth. In the revised manuscript we will add an explicit limitations paragraph to the EmpiricalBench section that discusses the representativeness assumptions regarding noise levels, dimensionality, operator sets, and selection biases, and that qualifies the 'practical for science' claim accordingly. This addition will make the scope of the benchmark transparent without altering the reported results. revision: partial
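The regeneration scheme the rebuttal describes can be sketched as follows. This is a hypothetical generator (the `regenerate` helper and the Kepler-style example are illustrative, not EmpiricalBench's code): preserve the functional form, vary the sampling, and scale the noise:

```python
# Hypothetical synthetic-dataset regeneration from a known equation,
# mirroring the rebuttal's description; not EmpiricalBench's generator.
import random

def regenerate(equation, x_range, n_points, noise_frac, seed=0):
    # Sample inputs uniformly, evaluate the ground-truth equation, and add
    # zero-mean Gaussian noise scaled to a fraction of each signal value.
    rng = random.Random(seed)
    lo, hi = x_range
    data = []
    for _ in range(n_points):
        x = rng.uniform(lo, hi)
        y = equation(x)
        data.append((x, y + rng.gauss(0.0, noise_frac * abs(y))))
    return data

# Example: a Kepler-style power law T = a**1.5 (period vs. semi-major axis).
kepler = lambda a: a ** 1.5
clean = regenerate(kepler, (0.5, 30.0), 100, noise_frac=0.0)
noisy = regenerate(kepler, (0.5, 30.0), 100, noise_frac=0.05)
```

Holding the functional form fixed while sweeping `noise_frac`, `n_points`, and the sampling range is what gives the controlled variation the rebuttal claims.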

Circularity Check

0 steps flagged

No circularity: software description and external benchmark definition are self-contained

full rationale

The paper describes the PySR library implementation (multi-population evolutionary algorithm with evolve-simplify-optimize loop) and introduces EmpiricalBench as a new benchmark that measures recovery rates on historical empirical equations and their synthetic versions. No derivation, prediction, or central claim reduces by construction to fitted parameters, self-defined quantities, or a load-bearing self-citation chain. The benchmark is defined externally via known historical equations rather than derived from the algorithm's outputs, and the software claims rest on released code rather than internal redefinition. This is a standard methods/software paper with no self-referential reduction in its assertions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a software-description and benchmark paper; the central claims rest on engineering choices and the definition of the benchmark rather than scientific axioms or fitted parameters.

pith-pipeline@v0.9.0 · 5487 in / 1058 out tokens · 85373 ms · 2026-05-13T03:42:43.113494+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • Cost.FunctionalEquation washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem.

    PySR is an open-source library for practical symbolic regression... built on a high-performance distributed back-end, a flexible search algorithm, and interfaces with several deep learning packages.

  • Foundation.HierarchyEmergence hierarchy_emergence_forces_phi · unclear

    Relation between the paper passage and the cited Recognition theorem.

    we also introduce a new benchmark, 'EmpiricalBench,' to quantify the applicability of symbolic regression algorithms in science.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 31 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. SEVerA: Verified Synthesis of Self-Evolving Agents

    cs.LG 2026-03 unverdicted novelty 8.0

    SEVerA uses Formally Guarded Generative Models and a three-stage Search-Verification-Learning process to synthesize self-evolving agents that satisfy hard formal constraints while improving task performance.

  2. KAN: Kolmogorov-Arnold Networks

    cs.LG 2024-04 conditional novelty 8.0

    KANs with learnable univariate spline activations on edges achieve better accuracy than MLPs with fewer parameters, faster scaling, and direct visualization for scientific discovery.

  3. The finite expression method for turbulent dynamics with high-order moment recovery

    cs.LG 2026-05 unverdicted novelty 7.0

    A two-stage symbolic regression plus generative model framework recovers governing interaction terms and forcing in stochastic triad models while accurately predicting statistical moments up to order five.

  4. Back to the Beginning of Heuristic Design: Bridging Code and Knowledge with LLMs

    cs.AI 2026-05 unverdicted novelty 7.0

    A knowledge-first approach to LLM-driven automatic heuristic design in combinatorial optimization yields better discovery efficiency, transfer, and generalization than code-centric baselines by formalizing a distortio...

  5. Reconstructing conformal field theoretical compositions with Transformers

    hep-th 2026-05 unverdicted novelty 7.0

    Transformers reconstruct the constituent RCFTs in tensor-product theories from low-energy spectra, reaching 98% accuracy on WZW models and generalizing to larger central charges with few out-of-domain examples.

  6. Additive Atomic Forests for Symbolic Function and Antiderivative Discovery

    cs.LG 2026-05 unverdicted novelty 7.0

    A derivative algebra with EML and SOL primitives plus additive atomic forests enables simultaneous symbolic recovery of functions and antiderivatives from data, matching or exceeding XGBoost on 13 of 17 benchmarks wit...

  7. Machine Collective Intelligence for Explainable Scientific Discovery

    cs.AI 2026-04 unverdicted novelty 7.0

    Machine collective intelligence uses coordinated AI agents to evolve symbolic hypotheses and recover governing equations from observations in deterministic, stochastic, and uncharacterized systems, achieving up to six...

  8. Neuro-Symbolic ODE Discovery with Latent Grammar Flow

    cs.LG 2026-04 unverdicted novelty 7.0

    Latent Grammar Flow discovers ODEs by placing grammar-based equation representations in a discrete latent space, using a behavioral loss to cluster similar equations, and sampling via a discrete flow model guided by d...

  9. First observational constraints on cosmic backreaction over an extended redshift range

    astro-ph.CO 2026-04 unverdicted novelty 7.0

    First direct constraints on total cosmic backreaction over a significant redshift range are consistent with vanishing backreaction within 1 sigma but are too weak to exclude meaningful backreaction.

  10. LLM-ODE: Data-driven Discovery of Dynamical Systems with Large Language Models

    cs.LG 2026-03 unverdicted novelty 7.0

    LLM-ODE integrates large language models into genetic programming to guide symbolic search for governing equations of dynamical systems, outperforming classical GP on 91 test cases in efficiency and solution quality.

  11. In-Context Symbolic Regression for Robustness-Improved Kolmogorov-Arnold Networks

    cs.LG 2026-03 unverdicted novelty 7.0

    In-context symbolic regression methods improve robustness of symbolic formula recovery from KANs, cutting median OFAT test MSE by up to 99.8 percent across hyperparameter sweeps.

  12. AlphaEvolve: A coding agent for scientific and algorithmic discovery

    cs.AI 2025-06 unverdicted novelty 7.0

    AlphaEvolve is an LLM-orchestrated evolutionary coding agent that discovered a 4x4 complex matrix multiplication algorithm using 48 scalar multiplications, the first improvement over Strassen's algorithm in 56 years, ...

  13. Primordial Black Hole from Tensor-induced Density Fluctuation: First-order Phase Transitions and Domain Walls

    astro-ph.CO 2026-05 unverdicted novelty 6.0

    Tensor perturbations from first-order phase transitions and domain wall annihilation induce curvature fluctuations at second order that form primordial black holes, allowing asteroid-mass PBHs to comprise all dark mat...

  14. FePySR: A Neural Feature Extraction Framework for Efficient and Scalable Symbolic Regression

    cs.SC 2026-05 unverdicted novelty 6.0

    FePySR uses a neural network to pre-extract valid features before PySR search, recovering more equations than baselines on benchmarks and identifying governing ODEs in 24 of 100 biological cases where PySR finds none.

  15. GESR: A Genetic Programming-Based Symbolic Regression Method with Gene Editing

    cs.AI 2026-05 unverdicted novelty 6.0

    GESR uses two BERT models to intelligently direct mutations and crossovers inside genetic programming, yielding higher efficiency and competitive accuracy on symbolic regression benchmarks.

  16. GESR: A Genetic Programming-Based Symbolic Regression Method with Gene Editing

    cs.AI 2026-05 unverdicted novelty 6.0

    GESR uses BERT models as guided 'gene editors' within genetic programming to direct mutations and crossovers, yielding higher efficiency and competitive performance on symbolic regression benchmarks.

  17. GESR: A Genetic Programming-Based Symbolic Regression Method with Gene Editing

    cs.AI 2026-05 unverdicted novelty 6.0

    GESR uses two BERT models to intelligently guide mutations and crossovers in genetic programming for symbolic regression, claiming better efficiency than standard GP.

  18. Discovery of Nonlinear Dynamics with Automated Basis Function Generation

    cs.LG 2026-05 unverdicted novelty 6.0

    AutoSINDy automatically builds a tailored basis library from PySR symbolic regression and applies SINDy to recover ground-truth nonlinear dynamics with 92.8% success under noise.

  19. Discovering Ordinary Differential Equations with LLM-Based Qualitative and Quantitative Evaluation

    cs.AI 2026-05 unverdicted novelty 6.0

    DoLQ employs a sampler agent, parameter optimizer, and LLM-based scientist agent to iteratively propose, refine, and evaluate ODE candidates, yielding higher success rates and better symbolic term recovery than prior ...

  20. Programmatic Context Augmentation for LLM-based Symbolic Regression

    cs.AI 2026-05 unverdicted novelty 6.0

    Programmatic context augmentation lets LLM-based symbolic regression perform code-driven data analysis during search, yielding superior efficiency and accuracy over baselines on LLM-SRBench.

  21. Interpretable Analytic Formulae for GWTC-4 Binary Black Hole Population Properties via Symbolic Regression

    astro-ph.CO 2026-04 unverdicted novelty 6.0

    Symbolic regression on GWTC-4 posteriors yields closed-form analytic formulae for merger-rate evolution, effective-spin dependencies on mass ratio and redshift, and conditional mass-ratio distributions at specific pri...

  22. Physics-Informed Neural Networks for Biological $2\mathrm{D}{+}t$ Reaction-Diffusion Systems

    cs.LG 2026-04 unverdicted novelty 6.0

    BINNs are extended to 2D+t systems and combined with symbolic regression to recover reaction-diffusion models of lung cancer cell dynamics from time-lapse microscopy data.

  23. Machine Learning Hamiltonian Dynamical Systems with Sparse and Noisy Data

    cs.LG 2026-04 unverdicted novelty 6.0

    ASRNNs recover Hamiltonian dynamics and symbolic equations from trajectories with only two irregularly spaced noisy points by preserving symplectic structure without derivative estimation.

  24. Discovering quantum phenomena with Interpretable Machine Learning

    quant-ph 2026-04 unverdicted novelty 6.0

    Variational autoencoders combined with symbolic regression extract physically meaningful representations and order parameters from raw quantum measurement data, revealing new phenomena such as corner-ordering in Rydbe...

  25. Into the Gompverse: A robust Gompertzian reionization model for CMB analyses

    astro-ph.CO 2026-04 unverdicted novelty 6.0

    A Gompertzian reionization model with three nuisance parameters demotes optical depth to a derived quantity, reducing its uncertainty by a factor of three and revealing potential neutrino mass tension in CMB analyses.

  26. Model-independent constraints on generalized FLRW consistency relations with bootstrap-based symbolic regression

    astro-ph.CO 2026-04 unverdicted novelty 6.0

    Bootstrap-based symbolic regression on supernova and BAO data finds mild 2-4 sigma deviations from FLRW consistency relations, which if real would rule out most FLRW-based solutions to cosmological tensions.

  27. Discovery of Interpretable Surrogates via Agentic AI: Application to Gravitational Waves

    gr-qc 2026-05 unverdicted novelty 5.0

    GWAgent agentic workflow produces analytic surrogates for eccentric BBH waveforms with 6.9e-4 median mismatch and 8.4x speedup, outperforming baselines, and infers eccentricity for GW200129.

  28. Balance-Guided Sparse Identification of Multiscale Nonlinear PDEs with Small-coefficient Terms

    cs.LG 2026-04 unverdicted novelty 5.0

    BG-SINDy reformulates l0-constrained regression as term-level l2,0 regularization and uses progressive pruning guided by balance contributions to recover small-coefficient terms in multiscale PDEs.

  29. Singularity Formation: Synergy in Theoretical, Numerical and Machine Learning Approaches

    math.NA 2026-04 unverdicted novelty 5.0

    The work introduces a modulation-based analytical method for singularity proofs in singular PDEs and refines ML techniques like PINNs and KANs to identify blowup solutions, with application to the open 3D Keller-Segel...

  30. Identifying Topological Invariants of Non-Hermitian Systems via Domain-Adaptive Multimodal Model for Mathematics

    cond-mat.other 2026-04 unverdicted novelty 4.0

    A multimodal model with Qwen Math backbone identifies topological invariants of non-Hermitian systems from eigenvalues and eigenvectors in momentum space.

  31. Experimental Design for Missing Physics

    stat.ML 2026-03 unverdicted novelty 4.0

    A sequential experimental design technique discriminates between model structures from symbolic regression to discover missing physics in process systems such as bioreactors.

Reference graph

Works this paper leans on

106 extracted references · 106 canonical work pages · cited by 29 Pith papers

  1. [1]

    Running, Philadelphia, Pa.; London, 2004

    Stephen Hawking.On the Shoulders of Giants: The Great Works of Physics and Astronomy. Running, Philadelphia, Pa.; London, 2004

  2. [2]

    Über eine Verbesserung der Wien’schen Spectralgleichung

    Max Planck. Über eine Verbesserung der Wien’schen Spectralgleichung. Friedr. Vieweg & Sohn, 1900

  3. [3]

    Marco Virgolin and Solon P. Pissis. Symbolic Regression is NP-hard, July 2022. 18

  4. [4]

    Information processing, data inferences, and scientific generalization.Behav- ioral Science, 19(5):314–325, 1974

    Donald Gerwin. Information processing, data inferences, and scientific generalization.Behav- ioral Science, 19(5):314–325, 1974

  5. [5]

    BACON: A production system that discovers empirical laws

    Pat Langley. BACON: A production system that discovers empirical laws. InIJCAI, 1977

  6. [6]

    Rediscovering physics with BA- CON.3

    Pat Langley. Rediscovering physics with BA- CON.3. InProceedings of the 6th International Joint Conference on Artificial Intelligence - Volume 1, IJCAI’79, pages 505–507, San Fran- cisco, CA, USA, 1979. Morgan Kaufmann Pub- lishers Inc

  7. [7]

    PatLangley, GaryL.Bradshaw, andHerbertA. Simon. BACON.5: The discovery of conserva- tion laws. InIJCAI, 1981

  8. [8]

    Pat Langley and Jan M. Zytkow. Data-driven approaches to empirical discovery.Artificial In- telligence, 40(1):283–312, 1989

  9. [9]

    John R. Koza. Genetic programming as a means for programming computers by natural selection. Statistics and Computing, 4(2):87– 112, June 1994

  10. [10]

    FromtheCover: Automated reverse engineering of nonlinear dy- namical systems

    JoshBongardandHodLipson. FromtheCover: Automated reverse engineering of nonlinear dy- namical systems. Proceedings of the National Academy of Science, 104(24):9943–9948, June 2007

  11. [11]

    Distilling Free-Form Natural Laws from Experimental Data

    Michael Schmidt and Hod Lipson. Distilling Free-Form Natural Laws from Experimental Data. Science, 324(5923):81–85, April 2009

  12. [12]

    Schmidt and H

    M. Schmidt and H. Lipson. Symbolic regres- sion of implicit equations. Genetic Program- ming Theory and Practice VII, pages 73–85, 2010

  13. [13]

    Wagner and M

    S. Wagner and M. Affenzeller. HeuristicLab: A Generic and Extensible Optimization Envi- ronment. In Bernardete Ribeiro, Rudolf F. Al- brecht, Andrej Dobnikar, David W. Pearson, and Nigel C. Steele, editors,Adaptive and Nat- ural Computing Algorithms, pages 538–541, Vi- enna, 2005. Springer

  14. [14]

    Meyarivan

    KalyanmoyDeb, SamirAgrawal, AmritPratap, and T. Meyarivan. A Fast Elitist Non- dominated Sorting Genetic Algorithm for Multi-objective Optimization: NSGA-II. In Marc Schoenauer, Kalyanmoy Deb, Günther Rudolph, XinYao, EvelyneLutton, JuanJulian Merelo, and Hans-Paul Schwefel, editors,Paral- lel Problem Solving from Nature PPSN VI, Lec- tureNotesinComputerS...

  15. [15]

    K. Deb, A. Pratap, S. Agarwal, and T. Meyari- van. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182–197, April 2002

  16. [16]

    Davidson, D.A

    J.W. Davidson, D.A. Savic, and G.A. Wal- ters. Symbolic and numerical regression: Ex- periments and applications. Information Sci- ences, 150(1):95–117, 2003

  17. [17]

    Sean Bowman

    Kyle Cranmer and R. Sean Bowman. Physic- sGP: A Genetic Programming approach to event selection. Computer Physics Communi- cations, 167(3):165–176, May 2005

  18. [18]

    Epsilon-Lexicase Selection for Re- gression

    William La Cava, Lee Spector, and Kourosh Danai. Epsilon-Lexicase Selection for Re- gression. In Proceedings of the Genetic and Evolutionary Computation Conference 2016 , GECCO ’16, pages 741–748, New York, NY, USA, July 2016. Association for Computing Machinery

  19. [19]

    William La Cava, Thomas Helmuth, Lee Spec- tor, and Jason H. Moore. A probabilistic and multi-objective analysis of lexicase selec- tion and epsilon-lexicase selection, April 2018

  20. [20]

    Marco Virgolin, Tanja Alderliesten, Cees Wit- teveen, and Peter A. N. Bosman. Scalable ge- netic programming by gene-pool optimal mix- ing and input-space entropy-based building- block learning. In Proceedings of the Ge- netic and Evolutionary Computation Confer- ence, GECCO ’17, pages 1041–1048, New York, NY, USA, July 2017. Association for Comput- ing Machinery

  21. [21]

    Marco Virgolin, Tanja Alderliesten, Cees Wit- teveen, and Peter A. N. Bosman. Improving Model-based Genetic Programming for Sym- bolic Regression of Small Expressions. Evo- 19 lutionary Computation, 29(2):211–237, June 2021

  22. [22]

    Cranmer, Rui Xu, Peter Battaglia, and Shirley Ho

    Miles D. Cranmer, Rui Xu, Peter Battaglia, and Shirley Ho. Learning Symbolic Physics with Graph Networks. ML4Physics Workshop @ NeurIPS 2019, November 2019

  23. [23]

    Discovering Symbolic Models from Deep Learning with Inductive Bi- ases

    Miles Cranmer, Alvaro Sanchez-Gonzalez, Pe- ter Battaglia, Rui Xu, Kyle Cranmer, David Spergel, and Shirley Ho. Discovering Symbolic Models from Deep Learning with Inductive Bi- ases. NeurIPS, June 2020

  24. [24]

    Dis- entangled Sparsity Networks for Explainable AI

    Miles Cranmer, Can Cui, Drummond B Field- ing, Shirley Ho, Alvaro Sanchez-Gonzalez, Kimberly Stachenfeld, Tobias Pfaff, et al. Dis- entangled Sparsity Networks for Explainable AI. Workshop on Sparse Neural Networks , page 7, July 2021

  25. [25]

    Bayesian Symbolic Regression

    Ying Jin, Weilin Fu, Jian Kang, Jiadong Guo, and Jian Guo. Bayesian Symbolic Regression. January 2020

  26. [26]

    Massucci, Manuel Miranda, Jordi Pallarès, and Marta Sales- Pardo

    Roger Guimerà, Ignasi Reichardt, Antoni Aguilar-Mogas, Francesco A. Massucci, Manuel Miranda, Jordi Pallarès, and Marta Sales- Pardo. A Bayesian machine scientist to aid in the solution of challenging scientific problems. Science Advances, 6(5), 2020

  27. [27]

    Petersen, Mikel Landajuela, T

    Brenden K. Petersen, Mikel Landajuela, T. Nathan Mundhenk, Claudio P. Santiago, Soo K. Kim, and Joanne T. Kim. Deep sym- bolic regression: Recovering mathematical ex- pressions from data via risk-seeking policy gra- dients, April 2021

  28. [28]

    Neural-guided symbolic regression with asymptotic constraints

    Li Li, Minjie Fan, Rishabh Singh, and Patrick Riley. Neural-guided symbolic regression with asymptotic constraints. arXiv preprint arXiv:1901.07714, 2019

  29. [29]

    Deep Symbolic Regression for Recurrent Se- quences, June 2022

    Stéphaned’Ascoli, Pierre-AlexandreKamienny, Guillaume Lample, and François Charton. Deep Symbolic Regression for Recurrent Se- quences, June 2022

  30. [30]

    End- to-end symbolic regression with transformers

    Pierre-AlexandreKamienny, Stéphaned’Ascoli, GuillaumeLample, andFrançoisCharton. End- to-end symbolic regression with transformers. April 2022

  31. [31]

    AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modular- ity

    Silviu-Marian Udrescu, Andrew Tan, Jiahai Feng, Orisvaldo Neto, Tailin Wu, and Max Tegmark. AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modular- ity. In Advances in Neural Information Pro- cessing Systems, volume 33, pages 4860–4871. Curran Associates, Inc., 2020

  32. [32]

    AI poincaré: Machinelearningconservationlawsfromtrajec- tories

    Ziming Liu and Max Tegmark. AI poincaré: Machinelearningconservationlawsfromtrajec- tories. arXiv e-prints, page arXiv:2011.04698, November 2020

  33. [33]

    Wetzel, Roger G

    Sebastian J. Wetzel, Roger G. Melko, Joseph Scott, Maysum Panju, and Vijay Ganesh. Dis- covering symmetry invariants and conserved quantities by interpreting siamese neural net- works. Physical Review Research, 2(3):033499, September 2020

  34. [34]

    Learning Equations for Extrapola- tion and Control

    Subham Sahoo, Christoph Lampert, and Georg Martius. Learning Equations for Extrapola- tion and Control. volume 80 of Proceedings of Machine Learning Research, pages 4442– 4450, Stockholmsmässan, Stockholm Sweden, July 2018. PMLR

  35. [35]

    Data-driven discovery of free-form governing differential equations.arXiv preprint arXiv:1910.05117, 2019

    Steven Atkinson, Waad Subber, Liping Wang, Genghis Khan, Philippe Hawi, and Roger Ghanem. Data-driven discovery of free-form governing differential equations.arXiv preprint arXiv:1910.05117, 2019

  36. [36]

    Benchmarking of machine learning ocean subgrid parameterizations in an idealized model, October 2022

    Andrew Slavin Ross, Ziwei Li, Pavel Perezhogin, Carlos Fernandez-Granda, and Laure Zanna. Benchmarking of machine learning ocean subgrid parameterizations in an idealized model, October 2022

  [37] M. Brameier and W. Banzhaf. A comparison of linear genetic programming and neural networks in medical data mining. IEEE Transactions on Evolutionary Computation, 5(1):17–26, February 2001.

  [38] Aytac Guven. Linear genetic programming for time-series modelling of daily flow rate. Journal of Earth System Science, 118(2):137–146, April 2009.

  [39] He Ma, Arunachalam Narayanaswamy, Patrick Riley, and Li Li. Evolving symbolic density functionals. Science Advances, 8(36):eabq0279, September 2022.

  [40] Douglas Mota Dias and Marco Aurélio C. Pacheco. Describing Quantum-Inspired Linear Genetic Programming from symbolic regression problems. In 2012 IEEE Congress on Evolutionary Computation, pages 1–8, June 2012.

  [41] Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 113(15):3932–3937, 2016.

  [42] Samuel H. Rudy, Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. Data-driven discovery of partial differential equations. Science Advances, 3(4):e1602614, 2017.

  [43] Kathleen Champion, Bethany Lusch, J. Nathan Kutz, and Steven L. Brunton. Data-driven discovery of coordinates and governing equations. arXiv e-prints, arXiv:1904.02107, March 2019.

  [44] Bogdan Burlacu, Gabriel Kronberger, and Michael Kommenda. Operon C++: An efficient genetic programming framework for symbolic regression. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion, GECCO ’20, pages 1562–1570, New York, NY, USA, July 2020. Association for Computing Machinery.

  [45] Trent McConaghy. FFX: Fast, Scalable, Deterministic Symbolic Regression Technology. In Rick Riolo, Ekaterina Vladislavleva, and Jason H. Moore, editors, Genetic Programming Theory and Practice IX, Genetic and Evolutionary Computation, pages 235–260. Springer, New York, NY, 2011.

  [46] Bethany Lusch, J. Nathan Kutz, and Steven L. Brunton. Deep learning for universal linear embeddings of nonlinear dynamics. Nature Communications, 9:4950, November 2018.

  [47] Gert-Jan Both, Subham Choudhury, Pierre Sens, and Remy Kusters. DeepMoD: Deep learning for Model Discovery in noisy data. 2019.

  [48] Zhao Chen, Yang Liu, and Hao Sun. Deep learning of physical laws from scarce data. 2020.

  [49] Christopher Rackauckas, Yingbo Ma, Julius Martensen, Collin Warner, Kirill Zubov, Rohit Supekar, Dominic Skinner, and Ali Ramadhan. Universal differential equations for scientific machine learning. arXiv preprint arXiv:2001.04385, 2020.

  [50] Alison Cozad and Nikolaos V. Sahinidis. A global MINLP approach to symbolic regression. Mathematical Programming, 170(1):97–119, July 2018.

  [51] Fabricio Olivetti de Franca. A Greedy Search Tree Heuristic for Symbolic Regression. Information Sciences, 442–443:18–32, May 2018.

  [52] Cristina Cornelio, Sanjeeb Dash, Vernon Austel, Tyler Josephson, Joao Goncalves, Kenneth Clarkson, Nimrod Megiddo, et al. AI Descartes: Combining Data and Theory for Derivable Scientific Discovery. arXiv:2109.01634 [cs], October 2021.

  [53] A. M. Price-Whelan, B. M. Sipőcz, H. M. Günther, P. L. Lim, S. M. Crawford, S. Conseil, D. L. Shupe, et al. The Astropy Project: Building an Open-science Project and Status of the v2.0 Core Package. The Astronomical Journal, 156:123, September 2018.

  [54] Astropy Collaboration, Adrian M. Price-Whelan, Pey Lian Lim, Nicholas Earl, Nathaniel Starkman, Larry Bradley, David L. Shupe, et al. The Astropy Project: Sustaining and Growing a Community-oriented Open-source Project and the Latest Major Release (v5.0) of the Core Package. The Astrophysical Journal, 935:167, August 2022.

  [55] Anne Brindle. Genetic Algorithms for Function Optimization. PhD thesis, 1980.

  [56] David E. Goldberg and Kalyanmoy Deb. A Comparative Analysis of Selection Schemes Used in Genetic Algorithms. In Gregory J. E. Rawlins, editor, Foundations of Genetic Algorithms, volume 1, pages 69–93. Elsevier, January 1991.

  [57] Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V. Le. Regularized Evolution for Image Classifier Architecture Search. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01):4780–4789, July 2019.

  [58] Esteban Real, Chen Liang, David So, and Quoc Le. AutoML-Zero: Evolving Machine Learning Algorithms From Scratch. In International Conference on Machine Learning, pages 8007–

  [59] Gregory S. Hornby. ALPS: The age-layered population structure for reducing the problem of premature convergence. In Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, GECCO ’06, pages 815–822, New York, NY, USA, July 2006. Association for Computing Machinery.

  [60] Michael D. Schmidt and Hod Lipson. Age-fitness pareto optimization. In Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, GECCO ’10, pages 543–544, New York, NY, USA, July 2010. Association for Computing Machinery.

  [61] Miles Cranmer. PySR: Fast & Parallelized Symbolic Regression in Python/Julia. Zenodo, September 2020.

  [62] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by Simulated Annealing. Science, 220:671–680, May 1983.

  [63] C. G. Broyden. The convergence of a class of double-rank minimization algorithms 1. General considerations. IMA Journal of Applied Mathematics, 6(1):76–90, March 1970.

  [64] Patrick Kofod Mogensen and Asbjørn Nilsen Riseth. Optim: A mathematical optimization package for Julia. Journal of Open Source Software, 3(24):615, 2018.

  [65] Alexander Topchy and W. F. Punch. Faster genetic programming based on local gradient search of numeric leaf values. In Proceedings of the 3rd Annual Conference on Genetic and Evolutionary Computation, GECCO ’01, pages 155–162, San Francisco, CA, USA, July 2001. Morgan Kaufmann Publishers Inc.

  [66] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12:2825–2830, November 2011.

  [67] Aaron Meurer, Christopher P. Smith, Mateusz Paprocki, Ondřej Čertík, Sergey B. Kirpichev, Matthew Rocklin, AMiT Kumar, et al. SymPy: symbolic computing in Python. PeerJ Computer Science, 3:e103, January 2017.

  [68] Charles R. Harris, K. Jarrod Millman, Stéfan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, et al. Array programming with NumPy. Nature, 585(7825):357–362, September 2020.

  [69] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc., 2019.

  [70] James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, et al. JAX: composable transformations of Python+NumPy programs, 2018.

  [71] Pablo Lemos, Niall Jeffrey, Miles Cranmer, Peter Battaglia, and Shirley Ho. Rediscovering Newton’s gravity and Solar System properties using deep learning and inductive biases. In submission, 2022.

  [72] Léo Grinsztajn, Edouard Oyallon, and Gaël Varoquaux. Why do tree-based models still outperform deep learning on tabular data?, July 2022.

  [73] Silviu-Marian Udrescu and Max Tegmark. AI Feynman: A physics-inspired method for symbolic regression. Science Advances, 6(16):eaay2631, 2020.

  [74] Alan A. Kaptanoglu, Brian M. de Silva, Urban Fasel, Kadierdan Kaheman, Andy J. Goldschmidt, Jared L. Callaham, Charles B. Delahunt, et al. PySINDy: A comprehensive Python package for robust sparse system identification. November 2021.

  [75] Kevin René Broløs, Meera Vieira Machado, Chris Cave, Jaan Kasak, Valdemar Stentoft-Hansen, Victor Galindo Batanero, Tom Jelen, and Casper Wilstrup. An Approach to Symbolic Regression Using Feyn. April 2021.

  [76] Nguyen Quang Uy, Nguyen Xuan Hoai, Michael O’Neill, R. I. McKay, and Edgar Galván-López. Semantically-based crossover in genetic programming: Application to real-valued symbolic regression. Genetic Programming and Evolvable Machines, 12(2):91–119, June 2011.

  [77] F. O. de Franca, M. Virgolin, M. Kommenda, M. S. Majumder, M. Cranmer, G. Espada, L. Ingelse, et al. Interpretable Symbolic Regression for Data Science: Analysis of the 2022 Competition, April 2023.

  [78] Ankit Rohatgi. WebPlotDigitizer: Version 4.6, 2022.

  [79] Edwin Hubble. A relation between distance and radial velocity among extra-galactic nebulae. Proceedings of the National Academy of Sciences, 15(3):168–173, March 1929.

  [80] William La Cava, Patryk Orzechowski, Bogdan Burlacu, Fabrício Olivetti de França, Marco Virgolin, Ying Jin, Michael Kommenda, and Jason H. Moore. Contemporary Symbolic Regression Methods and their Relative Performance, July 2021.
