pith. machine review for the scientific record.

arxiv: 2305.01582 · v3 · submitted 2023-05-02 · 🌌 astro-ph.IM · cs.LG · cs.NE · cs.SC · physics.data-an

Recognition: 2 Lean theorem links

Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 03:42 UTC · model grok-4.3

classification 🌌 astro-ph.IM · cs.LG · cs.NE · cs.SC · physics.data-an
keywords symbolic regression · evolutionary algorithms · interpretable machine learning · scientific discovery · equation recovery · benchmarking · data-driven modeling

The pith

A multi-population evolutionary search recovers historical scientific equations from data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a library for symbolic regression built to help scientists find clear, human-readable equations directly from measurements. Its core process runs multiple groups of candidate expressions through repeated cycles of evolution, simplification, and adjustment of any unknown numbers inside them. The system distributes this work across many processors and connects to neural network tools for hybrid use. It also defines a benchmark that checks whether algorithms can reconstruct past empirical equations from both their original data and versions with added synthetic noise. A sympathetic reader would care because this could shift scientific modeling from opaque predictions toward extractable rules that reveal underlying relationships.

Core claim

The paper states that its evolutionary algorithm, consisting of a multi-population search with an evolve-simplify-optimize loop and supported by a high-performance backend that fuses operators into fast kernels and computes derivatives automatically, recovers historical empirical equations when evaluated on the new EmpiricalBench benchmark of original and synthetic scientific datasets.

What carries the argument

The evolve-simplify-optimize loop inside a multi-population evolutionary algorithm that searches for symbolic expressions while tuning their scalar constants.
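The loop named above can be pictured with a toy sketch. This is not PySR's implementation (PySR runs in Julia with a proper constant optimizer and full genetic operators); here expressions are nested tuples, "simplify" only folds constant subtrees, and "optimize" is a crude stochastic coordinate descent on the scalar constants:

```python
# Toy evolve-simplify-optimize loop -- illustrative only, not PySR's algorithm.
import random

random.seed(0)

def evaluate(expr, x):
    op = expr[0]
    if op == "var":
        return x
    if op == "const":
        return expr[1]
    a, b = evaluate(expr[1], x), evaluate(expr[2], x)
    return a + b if op == "add" else a * b

def simplify(expr):
    # "Simplify" step: fold subtrees whose children are both constants.
    if expr[0] in ("var", "const"):
        return expr
    a, b = simplify(expr[1]), simplify(expr[2])
    if a[0] == "const" and b[0] == "const":
        return ("const", a[1] + b[1] if expr[0] == "add" else a[1] * b[1])
    return (expr[0], a, b)

def mutate(expr):
    # "Evolve" step: randomly grow the tree with a new scalar constant.
    r = random.random()
    if r < 0.3:
        return ("add", expr, ("const", random.uniform(-1, 1)))
    if r < 0.6:
        return ("mul", expr, ("const", random.uniform(0.5, 2.0)))
    return expr

def const_paths(expr, path=()):
    if expr[0] == "const":
        yield path
    elif expr[0] != "var":
        yield from const_paths(expr[1], path + (1,))
        yield from const_paths(expr[2], path + (2,))

def set_const(expr, path, value):
    if not path:
        return ("const", value)
    parts = list(expr)
    parts[path[0]] = set_const(expr[path[0]], path[1:], value)
    return tuple(parts)

def get_const(expr, path):
    for i in path:
        expr = expr[i]
    return expr[1]

def loss(expr, data):
    return sum((evaluate(expr, x) - y) ** 2 for x, y in data)

def optimize_constants(expr, data, rounds=10):
    # "Optimize" step: jitter each constant, keep improvements
    # (PySR uses a real optimizer; this is only a stand-in).
    best = loss(expr, data)
    for _ in range(rounds):
        for path in list(const_paths(expr)):
            trial = set_const(expr, path, get_const(expr, path) + random.gauss(0, 0.1))
            l = loss(trial, data)
            if l < best:
                expr, best = trial, l
    return expr

# Target relation to recover: y = 2.5 * x + 1.0
data = [(x, 2.5 * x + 1.0) for x in range(-3, 4)]
pop = [("var",)] * 12
best_expr, best_loss = pop[0], loss(pop[0], data)
for _ in range(15):
    pop = [optimize_constants(simplify(mutate(e)), data) for e in pop]
    pop.sort(key=lambda e: loss(e, data))
    pop = pop[:6] + [best_expr] * 6          # elitism plus refill
    if loss(pop[0], data) < best_loss:
        best_expr, best_loss = pop[0], loss(pop[0], data)

print(round(best_loss, 3))
```

The interleaving matters: simplification keeps trees small enough for constant optimization to be cheap, and tuned constants in turn make structurally good trees survive selection.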

If this is right

  • Scientists obtain human-readable models straight from observations instead of black-box predictors.
  • Large datasets become tractable because populations of expressions can be distributed across clusters.
  • Hybrid modeling is enabled by connecting the search to deep learning packages for initialization or guidance.
  • Different symbolic regression approaches can be compared on a shared set of historical scientific cases.
  • Interpretable alternatives become available in fields where understanding the form of the relation matters.
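The distribution point in the list above can be illustrated with a toy multi-population search: each population evolves independently on its own worker, and the best individual migrates between rounds. Threads stand in for cluster workers here, and nothing below reflects PySR's actual scheduler; the single-constant "expression" is a deliberate simplification:

```python
# Toy multi-population search with champion migration -- conceptual sketch only.
from concurrent.futures import ThreadPoolExecutor
import random

TARGET = 3.0          # hidden "true" constant the search should find

def fitness(c):
    return (c - TARGET) ** 2

def evolve_population(args):
    # Independent hill-climbing within one population (one "worker").
    pop, seed = args
    rng = random.Random(seed)
    for _ in range(200):
        child = min(pop, key=fitness) + rng.gauss(0, 0.1)
        worst = max(range(len(pop)), key=lambda i: fitness(pop[i]))
        if fitness(child) < fitness(pop[worst]):
            pop[worst] = child
    return pop

# Four populations evolved in parallel, with the champion migrating
# into every population between rounds.
populations = [[random.Random(i).uniform(-5, 5) for _ in range(10)] for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    for rnd in range(3):
        args = [(p, rnd * 10 + i) for i, p in enumerate(populations)]
        populations = list(pool.map(evolve_population, args))
        champion = min((min(p, key=fitness) for p in populations), key=fitness)
        for p in populations:
            p[0] = champion

best = min((min(p, key=fitness) for p in populations), key=fitness)
print(round(best, 2))
```

Because populations only synchronize at migration points, the work parallelizes with little communication, which is why the same pattern scales from one machine to a cluster.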

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be applied to hypothesize functional forms in areas where no prior equation exists.
  • Extending the benchmark to include differential equations or constrained systems would test broader utility.
  • Real-time use during experiments might allow on-the-fly model refinement as data arrives.
  • Combining the method with other data-driven techniques could accelerate discovery in under-theorized domains.

Load-bearing premise

Recovering known historical equations from original and synthetic data serves as a good test of whether the method will work for finding new equations in fresh scientific problems.
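Making this premise operational requires a criterion for "recovered the equation." The paper's exact scoring rule is not reproduced here; a common proxy, sketched below with a hypothetical Hubble-law target, is numerical equivalence to the ground truth on sampled inputs:

```python
# Hypothetical recovery criterion -- a numeric-equivalence proxy, not
# EmpiricalBench's actual scoring code.
import random

def recovered(candidate, truth, n_samples=200, rtol=1e-6):
    # The candidate "recovers" the equation if it matches the ground truth
    # to relative tolerance at every sampled input.
    rng = random.Random(42)
    for _ in range(n_samples):
        x = rng.uniform(0.1, 10.0)
        c, t = candidate(x), truth(x)
        if abs(c - t) > rtol * max(1.0, abs(t)):
            return False
    return True

# Ground truth: a Hubble-style linear law v = H0 * d (H0 value arbitrary).
H0 = 70.0
truth = lambda d: H0 * d
exact = lambda d: 70.0 * d        # same functional form and constant
wrong = lambda d: 70.0 * d + 0.5  # spurious intercept: should fail

print(recovered(exact, truth), recovered(wrong, truth))
```

A criterion like this rewards exact functional form rather than mere fit quality, which is the property that separates a recovery benchmark from an ordinary regression benchmark.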

What would settle it

Failure of the search to produce any simple symbolic expression that matches a new, previously untested dataset from a current experiment where an empirical relation is expected.

read the original abstract

PySR is an open-source library for practical symbolic regression, a type of machine learning which aims to discover human-interpretable symbolic models. PySR was developed to democratize and popularize symbolic regression for the sciences, and is built on a high-performance distributed back-end, a flexible search algorithm, and interfaces with several deep learning packages. PySR's internal search algorithm is a multi-population evolutionary algorithm, which consists of a unique evolve-simplify-optimize loop, designed for optimization of unknown scalar constants in newly-discovered empirical expressions. PySR's backend is the extremely optimized Julia library SymbolicRegression.jl, which can be used directly from Julia. It is capable of fusing user-defined operators into SIMD kernels at runtime, performing automatic differentiation, and distributing populations of expressions to thousands of cores across a cluster. In describing this software, we also introduce a new benchmark, "EmpiricalBench," to quantify the applicability of symbolic regression algorithms in science. This benchmark measures recovery of historical empirical equations from original and synthetic datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript describes PySR, an open-source Python library for symbolic regression in scientific applications, built on the high-performance Julia backend SymbolicRegression.jl. It details a multi-population evolutionary algorithm featuring an evolve-simplify-optimize loop to handle unknown scalar constants, along with runtime SIMD kernel fusion, automatic differentiation, distributed computing across clusters, and interfaces to deep learning packages. The work also introduces EmpiricalBench, a benchmark that quantifies applicability by measuring recovery rates of historical empirical equations from both original and synthetic datasets.

Significance. If the implementation details and benchmark results hold, the paper delivers a practical, accessible tool for interpretable machine learning that lowers barriers to symbolic regression in fields such as astrophysics. Notable strengths include the fully open-source release with reproducible code, the optimized distributed backend, and the introduction of EmpiricalBench as an externally defined evaluation framework grounded in real historical equations rather than synthetic toy problems.

major comments (1)
  1. [EmpiricalBench] In the section introducing EmpiricalBench: the central claim that recovery of historical empirical equations from original and synthetic datasets quantifies applicability for science rests on the untested assumption that these datasets (in noise level, dimensionality, operator sets, and selection biases) are representative of future unknown-equation discovery tasks. The manuscript provides no direct tests on contemporary problems where the true functional form is unknown a priori, which is load-bearing for the 'practical for science' assertion.
minor comments (2)
  1. [Abstract] Abstract: the interfaces with deep learning packages are mentioned but not enumerated; the main text should explicitly name them (e.g., PyTorch, TensorFlow) with version or usage details for reproducibility.
  2. [Software architecture] Software architecture section: clarify the exact mechanism by which user-defined operators are fused into SIMD kernels at runtime, including any constraints on operator arity or type.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive evaluation of the manuscript and for the constructive comment on EmpiricalBench. We address the point below and propose a targeted revision to clarify the benchmark's scope.

read point-by-point responses
  1. Referee: In the section introducing EmpiricalBench: the central claim that recovery of historical empirical equations from original and synthetic datasets quantifies applicability for science rests on the untested assumption that these datasets (in noise level, dimensionality, operator sets, and selection biases) are representative of future unknown-equation discovery tasks. The manuscript provides no direct tests on contemporary problems where the true functional form is unknown a priori, which is load-bearing for the 'practical for science' assertion.

    Authors: We appreciate the referee highlighting this assumption. EmpiricalBench was constructed precisely to provide a more grounded evaluation than purely synthetic toy problems by recovering equations that were historically discovered from real data. The benchmark incorporates both the original observational datasets (where they exist) and synthetically regenerated versions that preserve the functional form while allowing controlled variation in noise, dimensionality, and sampling. This design directly tests recovery under conditions mirroring past scientific discovery. We agree that no benchmark can exhaustively represent all possible future tasks, and that direct quantitative tests on problems whose functional form is truly unknown a priori are not possible within a recovery benchmark, because success cannot be measured without ground truth. In the revised manuscript we will add an explicit limitations paragraph to the EmpiricalBench section that discusses the representativeness assumptions regarding noise levels, dimensionality, operator sets, and selection biases, and that qualifies the 'practical for science' claim accordingly. This addition will make the scope of the benchmark transparent without altering the reported results. revision: partial
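The regeneration scheme the rebuttal describes can be sketched as follows. This is a hypothetical generator (the `regenerate` helper and the Kepler-style example are illustrative, not EmpiricalBench's code): preserve the functional form, vary the sampling, and scale the noise:

```python
# Hypothetical synthetic-dataset regeneration from a known equation,
# mirroring the rebuttal's description; not EmpiricalBench's generator.
import random

def regenerate(equation, x_range, n_points, noise_frac, seed=0):
    # Sample inputs uniformly, evaluate the ground-truth equation, and add
    # zero-mean Gaussian noise scaled to a fraction of each signal value.
    rng = random.Random(seed)
    lo, hi = x_range
    data = []
    for _ in range(n_points):
        x = rng.uniform(lo, hi)
        y = equation(x)
        data.append((x, y + rng.gauss(0.0, noise_frac * abs(y))))
    return data

# Example: a Kepler-style power law T = a**1.5 (period vs. semi-major axis).
kepler = lambda a: a ** 1.5
clean = regenerate(kepler, (0.5, 30.0), 100, noise_frac=0.0)
noisy = regenerate(kepler, (0.5, 30.0), 100, noise_frac=0.05)
```

Holding the functional form fixed while sweeping `noise_frac`, `n_points`, and the sampling range is what gives the controlled variation the rebuttal claims.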

Circularity Check

0 steps flagged

No circularity: software description and external benchmark definition are self-contained

full rationale

The paper describes the PySR library implementation (multi-population evolutionary algorithm with evolve-simplify-optimize loop) and introduces EmpiricalBench as a new benchmark that measures recovery rates on historical empirical equations and their synthetic versions. No derivation, prediction, or central claim reduces by construction to fitted parameters, self-defined quantities, or a load-bearing self-citation chain. The benchmark is defined externally via known historical equations rather than derived from the algorithm's outputs, and the software claims rest on released code rather than internal redefinition. This is a standard methods/software paper with no self-referential reduction in its assertions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a software-description and benchmark paper; the central claims rest on engineering choices and the definition of the benchmark rather than scientific axioms or fitted parameters.

pith-pipeline@v0.9.0 · 5487 in / 1058 out tokens · 85373 ms · 2026-05-13T03:42:43.113494+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • Cost.FunctionalEquation washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem.

    PySR is an open-source library for practical symbolic regression... built on a high-performance distributed back-end, a flexible search algorithm, and interfaces with several deep learning packages.

  • Foundation.HierarchyEmergence hierarchy_emergence_forces_phi · unclear

    Relation between the paper passage and the cited Recognition theorem.

    we also introduce a new benchmark, 'EmpiricalBench,' to quantify the applicability of symbolic regression algorithms in science.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 31 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. SEVerA: Verified Synthesis of Self-Evolving Agents

    cs.LG 2026-03 unverdicted novelty 8.0

    SEVerA uses Formally Guarded Generative Models and a three-stage Search-Verification-Learning process to synthesize self-evolving agents that satisfy hard formal constraints while improving task performance.

  2. KAN: Kolmogorov-Arnold Networks

    cs.LG 2024-04 conditional novelty 8.0

    KANs with learnable univariate spline activations on edges achieve better accuracy than MLPs with fewer parameters, faster scaling, and direct visualization for scientific discovery.

  3. The finite expression method for turbulent dynamics with high-order moment recovery

    cs.LG 2026-05 unverdicted novelty 7.0

    A two-stage symbolic regression plus generative model framework recovers governing interaction terms and forcing in stochastic triad models while accurately predicting statistical moments up to order five.

  4. Back to the Beginning of Heuristic Design: Bridging Code and Knowledge with LLMs

    cs.AI 2026-05 unverdicted novelty 7.0

    A knowledge-first approach to LLM-driven automatic heuristic design in combinatorial optimization yields better discovery efficiency, transfer, and generalization than code-centric baselines by formalizing a distortio...

  5. Reconstructing conformal field theoretical compositions with Transformers

    hep-th 2026-05 unverdicted novelty 7.0

    Transformers reconstruct the constituent RCFTs in tensor-product theories from low-energy spectra, reaching 98% accuracy on WZW models and generalizing to larger central charges with few out-of-domain examples.

  6. Additive Atomic Forests for Symbolic Function and Antiderivative Discovery

    cs.LG 2026-05 unverdicted novelty 7.0

    A derivative algebra with EML and SOL primitives plus additive atomic forests enables simultaneous symbolic recovery of functions and antiderivatives from data, matching or exceeding XGBoost on 13 of 17 benchmarks wit...

  7. Machine Collective Intelligence for Explainable Scientific Discovery

    cs.AI 2026-04 unverdicted novelty 7.0

    Machine collective intelligence uses coordinated AI agents to evolve symbolic hypotheses and recover governing equations from observations in deterministic, stochastic, and uncharacterized systems, achieving up to six...

  8. Neuro-Symbolic ODE Discovery with Latent Grammar Flow

    cs.LG 2026-04 unverdicted novelty 7.0

    Latent Grammar Flow discovers ODEs by placing grammar-based equation representations in a discrete latent space, using a behavioral loss to cluster similar equations, and sampling via a discrete flow model guided by d...

  9. First observational constraints on cosmic backreaction over an extended redshift range

    astro-ph.CO 2026-04 unverdicted novelty 7.0

    First direct constraints on total cosmic backreaction over a significant redshift range are consistent with vanishing backreaction within 1 sigma but are too weak to exclude meaningful backreaction.

  10. LLM-ODE: Data-driven Discovery of Dynamical Systems with Large Language Models

    cs.LG 2026-03 unverdicted novelty 7.0

    LLM-ODE integrates large language models into genetic programming to guide symbolic search for governing equations of dynamical systems, outperforming classical GP on 91 test cases in efficiency and solution quality.

  11. In-Context Symbolic Regression for Robustness-Improved Kolmogorov-Arnold Networks

    cs.LG 2026-03 unverdicted novelty 7.0

    In-context symbolic regression methods improve robustness of symbolic formula recovery from KANs, cutting median OFAT test MSE by up to 99.8 percent across hyperparameter sweeps.

  12. AlphaEvolve: A coding agent for scientific and algorithmic discovery

    cs.AI 2025-06 unverdicted novelty 7.0

    AlphaEvolve is an LLM-orchestrated evolutionary coding agent that discovered a 4x4 complex matrix multiplication algorithm using 48 scalar multiplications, the first improvement over Strassen's algorithm in 56 years, ...

  13. Primordial Black Hole from Tensor-induced Density Fluctuation: First-order Phase Transitions and Domain Walls

    astro-ph.CO 2026-05 unverdicted novelty 6.0

    Tensor perturbations from first-order phase transitions and domain wall annihilation induce curvature fluctuations at second order that form primordial black holes, allowing asteroid-mass PBHs to comprise all dark mat...

  14. FePySR: A Neural Feature Extraction Framework for Efficient and Scalable Symbolic Regression

    cs.SC 2026-05 unverdicted novelty 6.0

    FePySR uses a neural network to pre-extract valid features before PySR search, recovering more equations than baselines on benchmarks and identifying governing ODEs in 24 of 100 biological cases where PySR finds none.

  15. GESR: A Genetic Programming-Based Symbolic Regression Method with Gene Editing

    cs.AI 2026-05 unverdicted novelty 6.0

    GESR uses two BERT models to intelligently direct mutations and crossovers inside genetic programming, yielding higher efficiency and competitive accuracy on symbolic regression benchmarks.

  16. GESR: A Genetic Programming-Based Symbolic Regression Method with Gene Editing

    cs.AI 2026-05 unverdicted novelty 6.0

    GESR uses BERT models as guided 'gene editors' within genetic programming to direct mutations and crossovers, yielding higher efficiency and competitive performance on symbolic regression benchmarks.

  17. GESR: A Genetic Programming-Based Symbolic Regression Method with Gene Editing

    cs.AI 2026-05 unverdicted novelty 6.0

    GESR uses two BERT models to intelligently guide mutations and crossovers in genetic programming for symbolic regression, claiming better efficiency than standard GP.

  18. Discovery of Nonlinear Dynamics with Automated Basis Function Generation

    cs.LG 2026-05 unverdicted novelty 6.0

    AutoSINDy automatically builds a tailored basis library from PySR symbolic regression and applies SINDy to recover ground-truth nonlinear dynamics with 92.8% success under noise.

  19. Discovering Ordinary Differential Equations with LLM-Based Qualitative and Quantitative Evaluation

    cs.AI 2026-05 unverdicted novelty 6.0

    DoLQ employs a sampler agent, parameter optimizer, and LLM-based scientist agent to iteratively propose, refine, and evaluate ODE candidates, yielding higher success rates and better symbolic term recovery than prior ...

  20. Programmatic Context Augmentation for LLM-based Symbolic Regression

    cs.AI 2026-05 unverdicted novelty 6.0

    Programmatic context augmentation lets LLM-based symbolic regression perform code-driven data analysis during search, yielding superior efficiency and accuracy over baselines on LLM-SRBench.

  21. Interpretable Analytic Formulae for GWTC-4 Binary Black Hole Population Properties via Symbolic Regression

    astro-ph.CO 2026-04 unverdicted novelty 6.0

    Symbolic regression on GWTC-4 posteriors yields closed-form analytic formulae for merger-rate evolution, effective-spin dependencies on mass ratio and redshift, and conditional mass-ratio distributions at specific pri...

  22. Physics-Informed Neural Networks for Biological $2\mathrm{D}{+}t$ Reaction-Diffusion Systems

    cs.LG 2026-04 unverdicted novelty 6.0

    BINNs are extended to 2D+t systems and combined with symbolic regression to recover reaction-diffusion models of lung cancer cell dynamics from time-lapse microscopy data.

  23. Machine Learning Hamiltonian Dynamical Systems with Sparse and Noisy Data

    cs.LG 2026-04 unverdicted novelty 6.0

    ASRNNs recover Hamiltonian dynamics and symbolic equations from trajectories with only two irregularly spaced noisy points by preserving symplectic structure without derivative estimation.

  24. Discovering quantum phenomena with Interpretable Machine Learning

    quant-ph 2026-04 unverdicted novelty 6.0

    Variational autoencoders combined with symbolic regression extract physically meaningful representations and order parameters from raw quantum measurement data, revealing new phenomena such as corner-ordering in Rydbe...

  25. Into the Gompverse: A robust Gompertzian reionization model for CMB analyses

    astro-ph.CO 2026-04 unverdicted novelty 6.0

    A Gompertzian reionization model with three nuisance parameters demotes optical depth to a derived quantity, reducing its uncertainty by a factor of three and revealing potential neutrino mass tension in CMB analyses.

  26. Model-independent constraints on generalized FLRW consistency relations with bootstrap-based symbolic regression

    astro-ph.CO 2026-04 unverdicted novelty 6.0

    Bootstrap-based symbolic regression on supernova and BAO data finds mild 2-4 sigma deviations from FLRW consistency relations, which if real would rule out most FLRW-based solutions to cosmological tensions.

  27. Discovery of Interpretable Surrogates via Agentic AI: Application to Gravitational Waves

    gr-qc 2026-05 unverdicted novelty 5.0

    GWAgent agentic workflow produces analytic surrogates for eccentric BBH waveforms with 6.9e-4 median mismatch and 8.4x speedup, outperforming baselines, and infers eccentricity for GW200129.

  28. Balance-Guided Sparse Identification of Multiscale Nonlinear PDEs with Small-coefficient Terms

    cs.LG 2026-04 unverdicted novelty 5.0

    BG-SINDy reformulates l0-constrained regression as term-level l2,0 regularization and uses progressive pruning guided by balance contributions to recover small-coefficient terms in multiscale PDEs.

  29. Singularity Formation: Synergy in Theoretical, Numerical and Machine Learning Approaches

    math.NA 2026-04 unverdicted novelty 5.0

    The work introduces a modulation-based analytical method for singularity proofs in singular PDEs and refines ML techniques like PINNs and KANs to identify blowup solutions, with application to the open 3D Keller-Segel...

  30. Identifying Topological Invariants of Non-Hermitian Systems via Domain-Adaptive Multimodal Model for Mathematics

    cond-mat.other 2026-04 unverdicted novelty 4.0

    A multimodal model with Qwen Math backbone identifies topological invariants of non-Hermitian systems from eigenvalues and eigenvectors in momentum space.

  31. Experimental Design for Missing Physics

    stat.ML 2026-03 unverdicted novelty 4.0

    A sequential experimental design technique discriminates between model structures from symbolic regression to discover missing physics in process systems such as bioreactors.

Reference graph

Works this paper leans on

106 extracted references · 106 canonical work pages · cited by 29 Pith papers

  1. [1]

    Running, Philadelphia, Pa.; London, 2004

    Stephen Hawking.On the Shoulders of Giants: The Great Works of Physics and Astronomy. Running, Philadelphia, Pa.; London, 2004

  2. [2]

    Über eine Verbesserung der Wien’schen Spectralgleichung

    Max Planck. Über eine Verbesserung der Wien’schen Spectralgleichung. Friedr. Vieweg & Sohn, 1900

  3. [3]

    Marco Virgolin and Solon P. Pissis. Symbolic Regression is NP-hard, July 2022. 18

  4. [4]

    Information processing, data inferences, and scientific generalization.Behav- ioral Science, 19(5):314–325, 1974

    Donald Gerwin. Information processing, data inferences, and scientific generalization.Behav- ioral Science, 19(5):314–325, 1974

  5. [5]

    BACON: A production system that discovers empirical laws

    Pat Langley. BACON: A production system that discovers empirical laws. InIJCAI, 1977

  6. [6]

    Rediscovering physics with BA- CON.3

    Pat Langley. Rediscovering physics with BA- CON.3. InProceedings of the 6th International Joint Conference on Artificial Intelligence - Volume 1, IJCAI’79, pages 505–507, San Fran- cisco, CA, USA, 1979. Morgan Kaufmann Pub- lishers Inc

  7. [7]

    PatLangley, GaryL.Bradshaw, andHerbertA. Simon. BACON.5: The discovery of conserva- tion laws. InIJCAI, 1981

  8. [8]

    Pat Langley and Jan M. Zytkow. Data-driven approaches to empirical discovery.Artificial In- telligence, 40(1):283–312, 1989

  9. [9]

    John R. Koza. Genetic programming as a means for programming computers by natural selection. Statistics and Computing, 4(2):87– 112, June 1994

  10. [10]

    FromtheCover: Automated reverse engineering of nonlinear dy- namical systems

    JoshBongardandHodLipson. FromtheCover: Automated reverse engineering of nonlinear dy- namical systems. Proceedings of the National Academy of Science, 104(24):9943–9948, June 2007

  11. [11]

    Distilling Free-Form Natural Laws from Experimental Data

    Michael Schmidt and Hod Lipson. Distilling Free-Form Natural Laws from Experimental Data. Science, 324(5923):81–85, April 2009

  12. [12]

    Schmidt and H

    M. Schmidt and H. Lipson. Symbolic regres- sion of implicit equations. Genetic Program- ming Theory and Practice VII, pages 73–85, 2010

  13. [13]

    Wagner and M

    S. Wagner and M. Affenzeller. HeuristicLab: A Generic and Extensible Optimization Envi- ronment. In Bernardete Ribeiro, Rudolf F. Al- brecht, Andrej Dobnikar, David W. Pearson, and Nigel C. Steele, editors,Adaptive and Nat- ural Computing Algorithms, pages 538–541, Vi- enna, 2005. Springer

  14. [14]

    Meyarivan

    KalyanmoyDeb, SamirAgrawal, AmritPratap, and T. Meyarivan. A Fast Elitist Non- dominated Sorting Genetic Algorithm for Multi-objective Optimization: NSGA-II. In Marc Schoenauer, Kalyanmoy Deb, Günther Rudolph, XinYao, EvelyneLutton, JuanJulian Merelo, and Hans-Paul Schwefel, editors,Paral- lel Problem Solving from Nature PPSN VI, Lec- tureNotesinComputerS...

  15. [15]

    K. Deb, A. Pratap, S. Agarwal, and T. Meyari- van. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182–197, April 2002

  16. [16]

    Davidson, D.A

    J.W. Davidson, D.A. Savic, and G.A. Wal- ters. Symbolic and numerical regression: Ex- periments and applications. Information Sci- ences, 150(1):95–117, 2003

  17. [17]

    Sean Bowman

    Kyle Cranmer and R. Sean Bowman. Physic- sGP: A Genetic Programming approach to event selection. Computer Physics Communi- cations, 167(3):165–176, May 2005

  18. [18]

    Epsilon-Lexicase Selection for Re- gression

    William La Cava, Lee Spector, and Kourosh Danai. Epsilon-Lexicase Selection for Re- gression. In Proceedings of the Genetic and Evolutionary Computation Conference 2016 , GECCO ’16, pages 741–748, New York, NY, USA, July 2016. Association for Computing Machinery

  19. [19]

    William La Cava, Thomas Helmuth, Lee Spec- tor, and Jason H. Moore. A probabilistic and multi-objective analysis of lexicase selec- tion and epsilon-lexicase selection, April 2018

  20. [20]

    Marco Virgolin, Tanja Alderliesten, Cees Wit- teveen, and Peter A. N. Bosman. Scalable ge- netic programming by gene-pool optimal mix- ing and input-space entropy-based building- block learning. In Proceedings of the Ge- netic and Evolutionary Computation Confer- ence, GECCO ’17, pages 1041–1048, New York, NY, USA, July 2017. Association for Comput- ing Machinery

  21. [21]

    Marco Virgolin, Tanja Alderliesten, Cees Wit- teveen, and Peter A. N. Bosman. Improving Model-based Genetic Programming for Sym- bolic Regression of Small Expressions. Evo- 19 lutionary Computation, 29(2):211–237, June 2021

  22. [22]

    Cranmer, Rui Xu, Peter Battaglia, and Shirley Ho

    Miles D. Cranmer, Rui Xu, Peter Battaglia, and Shirley Ho. Learning Symbolic Physics with Graph Networks. ML4Physics Workshop @ NeurIPS 2019, November 2019

  23. [23]

    Discovering Symbolic Models from Deep Learning with Inductive Bi- ases

    Miles Cranmer, Alvaro Sanchez-Gonzalez, Pe- ter Battaglia, Rui Xu, Kyle Cranmer, David Spergel, and Shirley Ho. Discovering Symbolic Models from Deep Learning with Inductive Bi- ases. NeurIPS, June 2020

  24. [24]

    Dis- entangled Sparsity Networks for Explainable AI

    Miles Cranmer, Can Cui, Drummond B Field- ing, Shirley Ho, Alvaro Sanchez-Gonzalez, Kimberly Stachenfeld, Tobias Pfaff, et al. Dis- entangled Sparsity Networks for Explainable AI. Workshop on Sparse Neural Networks , page 7, July 2021

  25. [25]

    Bayesian Symbolic Regression

    Ying Jin, Weilin Fu, Jian Kang, Jiadong Guo, and Jian Guo. Bayesian Symbolic Regression. January 2020

  26. [26]

    Massucci, Manuel Miranda, Jordi Pallarès, and Marta Sales- Pardo

    Roger Guimerà, Ignasi Reichardt, Antoni Aguilar-Mogas, Francesco A. Massucci, Manuel Miranda, Jordi Pallarès, and Marta Sales- Pardo. A Bayesian machine scientist to aid in the solution of challenging scientific problems. Science Advances, 6(5), 2020

  27. [27]

    Petersen, Mikel Landajuela, T

    Brenden K. Petersen, Mikel Landajuela, T. Nathan Mundhenk, Claudio P. Santiago, Soo K. Kim, and Joanne T. Kim. Deep sym- bolic regression: Recovering mathematical ex- pressions from data via risk-seeking policy gra- dients, April 2021

  28. [28]

    Neural-guided symbolic regression with asymptotic constraints

    Li Li, Minjie Fan, Rishabh Singh, and Patrick Riley. Neural-guided symbolic regression with asymptotic constraints. arXiv preprint arXiv:1901.07714, 2019

  29. [29]

    Deep Symbolic Regression for Recurrent Se- quences, June 2022

    Stéphaned’Ascoli, Pierre-AlexandreKamienny, Guillaume Lample, and François Charton. Deep Symbolic Regression for Recurrent Se- quences, June 2022

  30. [30]

    End- to-end symbolic regression with transformers

    Pierre-AlexandreKamienny, Stéphaned’Ascoli, GuillaumeLample, andFrançoisCharton. End- to-end symbolic regression with transformers. April 2022

  31. [31]

    AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modular- ity

    Silviu-Marian Udrescu, Andrew Tan, Jiahai Feng, Orisvaldo Neto, Tailin Wu, and Max Tegmark. AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modular- ity. In Advances in Neural Information Pro- cessing Systems, volume 33, pages 4860–4871. Curran Associates, Inc., 2020

  32. [32]

    AI poincaré: Machinelearningconservationlawsfromtrajec- tories

    Ziming Liu and Max Tegmark. AI poincaré: Machinelearningconservationlawsfromtrajec- tories. arXiv e-prints, page arXiv:2011.04698, November 2020

  33. [33]

    Wetzel, Roger G

    Sebastian J. Wetzel, Roger G. Melko, Joseph Scott, Maysum Panju, and Vijay Ganesh. Dis- covering symmetry invariants and conserved quantities by interpreting siamese neural net- works. Physical Review Research, 2(3):033499, September 2020

  34. [34]

    Learning Equations for Extrapola- tion and Control

    Subham Sahoo, Christoph Lampert, and Georg Martius. Learning Equations for Extrapola- tion and Control. volume 80 of Proceedings of Machine Learning Research, pages 4442– 4450, Stockholmsmässan, Stockholm Sweden, July 2018. PMLR

  35. [35]

    Data-driven discovery of free-form governing differential equations.arXiv preprint arXiv:1910.05117, 2019

    Steven Atkinson, Waad Subber, Liping Wang, Genghis Khan, Philippe Hawi, and Roger Ghanem. Data-driven discovery of free-form governing differential equations.arXiv preprint arXiv:1910.05117, 2019

  36. [36]

    Benchmarking of machine learning ocean subgrid parameterizations in an idealized model, October 2022

    Andrew Slavin Ross, Ziwei Li, Pavel Perezhogin, Carlos Fernandez-Granda, and Laure Zanna. Benchmarking of machine learning ocean subgrid parameterizations in an idealized model, October 2022

  [37] M. Brameier and W. Banzhaf. A comparison of linear genetic programming and neural networks in medical data mining. IEEE Transactions on Evolutionary Computation, 5(1):17–26, February 2001.

  [38] Aytac Guven. Linear genetic programming for time-series modelling of daily flow rate. Journal of Earth System Science, 118(2):137–146, April 2009.

  [39] He Ma, Arunachalam Narayanaswamy, Patrick Riley, and Li Li. Evolving symbolic density functionals. Science Advances, 8(36):eabq0279, September 2022.

  [40] Douglas Mota Dias and Marco Aurélio C. Pacheco. Describing Quantum-Inspired Linear Genetic Programming from symbolic regression problems. In 2012 IEEE Congress on Evolutionary Computation, pages 1–8, June 2012.

  [41] Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 113(15):3932–3937, 2016.

  [42] Samuel H. Rudy, Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. Data-driven discovery of partial differential equations. Science Advances, 3(4):e1602614, 2017.

  [43] Kathleen Champion, Bethany Lusch, J. Nathan Kutz, and Steven L. Brunton. Data-driven discovery of coordinates and governing equations. arXiv e-prints, arXiv:1904.02107, March 2019.

  [44] Bogdan Burlacu, Gabriel Kronberger, and Michael Kommenda. Operon C++: An efficient genetic programming framework for symbolic regression. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion, GECCO ’20, pages 1562–1570, New York, NY, USA, July 2020. Association for Computing Machinery.

  [45] Trent McConaghy. FFX: Fast, Scalable, Deterministic Symbolic Regression Technology. In Rick Riolo, Ekaterina Vladislavleva, and Jason H. Moore, editors, Genetic Programming Theory and Practice IX, Genetic and Evolutionary Computation, pages 235–260. Springer, New York, NY, 2011.

  [46] Bethany Lusch, J. Nathan Kutz, and Steven L. Brunton. Deep learning for universal linear embeddings of nonlinear dynamics. Nature Communications, 9:4950, November 2018.

  [47] Gert-Jan Both, Subham Choudhury, Pierre Sens, and Remy Kusters. DeepMoD: Deep learning for Model Discovery in noisy data. 2019.

  [48] Zhao Chen, Yang Liu, and Hao Sun. Deep learning of physical laws from scarce data. 2020.

  [49] Christopher Rackauckas, Yingbo Ma, Julius Martensen, Collin Warner, Kirill Zubov, Rohit Supekar, Dominic Skinner, and Ali Ramadhan. Universal differential equations for scientific machine learning. arXiv preprint arXiv:2001.04385, 2020.

  [50] Alison Cozad and Nikolaos V. Sahinidis. A global MINLP approach to symbolic regression. Mathematical Programming, 170(1):97–119, July 2018.

  [51] Fabricio Olivetti de Franca. A Greedy Search Tree Heuristic for Symbolic Regression. Information Sciences, 442–443:18–32, May 2018.

  [52] Cristina Cornelio, Sanjeeb Dash, Vernon Austel, Tyler Josephson, Joao Goncalves, Kenneth Clarkson, Nimrod Megiddo, et al. AI Descartes: Combining Data and Theory for Derivable Scientific Discovery. arXiv:2109.01634 [cs], October 2021.

  [53] A. M. Price-Whelan, B. M. Sipőcz, H. M. Günther, P. L. Lim, S. M. Crawford, S. Conseil, D. L. Shupe, et al. The Astropy Project: Building an Open-science Project and Status of the v2.0 Core Package. The Astronomical Journal, 156:123, September 2018.

  [54] Astropy Collaboration, Adrian M. Price-Whelan, Pey Lian Lim, Nicholas Earl, Nathaniel Starkman, Larry Bradley, David L. Shupe, et al. The Astropy Project: Sustaining and Growing a Community-oriented Open-source Project and the Latest Major Release (v5.0) of the Core Package. The Astrophysical Journal, 935:167, August 2022.

  [55] Anne Brindle. Genetic Algorithms for Function Optimization. PhD thesis, 1980.

  [56] David E. Goldberg and Kalyanmoy Deb. A Comparative Analysis of Selection Schemes Used in Genetic Algorithms. In Gregory J. E. Rawlins, editor, Foundations of Genetic Algorithms, volume 1, pages 69–93. Elsevier, January 1991.

  [57] Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V. Le. Regularized Evolution for Image Classifier Architecture Search. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01):4780–4789, July 2019.

  [58] Esteban Real, Chen Liang, David So, and Quoc Le. AutoML-Zero: Evolving Machine Learning Algorithms From Scratch. In International Conference on Machine Learning, pages 8007–

  [59] Gregory S. Hornby. ALPS: The age-layered population structure for reducing the problem of premature convergence. In Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, GECCO ’06, pages 815–822, New York, NY, USA, July 2006. Association for Computing Machinery.

  [60] Michael D. Schmidt and Hod Lipson. Age-fitness pareto optimization. In Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, GECCO ’10, pages 543–544, New York, NY, USA, July 2010. Association for Computing Machinery.

  [61] Miles Cranmer. PySR: Fast & Parallelized Symbolic Regression in Python/Julia. Zenodo, September 2020.

  [62] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by Simulated Annealing. Science, 220:671–680, May 1983.

  [63] C. G. Broyden. The convergence of a class of double-rank minimization algorithms 1. General considerations. IMA Journal of Applied Mathematics, 6(1):76–90, March 1970.

  [64] Patrick Kofod Mogensen and Asbjørn Nilsen Riseth. Optim: A mathematical optimization package for Julia. Journal of Open Source Software, 3(24):615, 2018.

  [65] Alexander Topchy and W. F. Punch. Faster genetic programming based on local gradient search of numeric leaf values. In Proceedings of the 3rd Annual Conference on Genetic and Evolutionary Computation, GECCO ’01, pages 155–162, San Francisco, CA, USA, July 2001. Morgan Kaufmann Publishers Inc.

  [66] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12:2825–2830, November 2011.

  [67] Aaron Meurer, Christopher P. Smith, Mateusz Paprocki, Ondřej Čertík, Sergey B. Kirpichev, Matthew Rocklin, AMiT Kumar, et al. SymPy: symbolic computing in Python. PeerJ Computer Science, 3:e103, January 2017.

  [68] Charles R. Harris, K. Jarrod Millman, Stéfan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, et al. Array programming with NumPy. Nature, 585(7825):357–362, September 2020.

  [69] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc., 2019.

  [70] James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, et al. JAX: composable transformations of Python+NumPy programs, 2018.

  [71] Pablo Lemos, Niall Jeffrey, Miles Cranmer, Peter Battaglia, and Shirley Ho. Rediscovering Newton’s gravity and Solar System properties using deep learning and inductive biases. In submission, 2022.

  [72] Léo Grinsztajn, Edouard Oyallon, and Gaël Varoquaux. Why do tree-based models still outperform deep learning on tabular data?, July 2022.

  [73] Silviu-Marian Udrescu and Max Tegmark. AI Feynman: A physics-inspired method for symbolic regression. Science Advances, 6(16):eaay2631, 2020.

  [74] Alan A. Kaptanoglu, Brian M. de Silva, Urban Fasel, Kadierdan Kaheman, Andy J. Goldschmidt, Jared L. Callaham, Charles B. Delahunt, et al. PySINDy: A comprehensive Python package for robust sparse system identification. November 2021.

  [75] Kevin René Broløs, Meera Vieira Machado, Chris Cave, Jaan Kasak, Valdemar Stentoft-Hansen, Victor Galindo Batanero, Tom Jelen, and Casper Wilstrup. An Approach to Symbolic Regression Using Feyn. April 2021.

  [76] Nguyen Quang Uy, Nguyen Xuan Hoai, Michael O’Neill, R. I. McKay, and Edgar Galván-López. Semantically-based crossover in genetic programming: Application to real-valued symbolic regression. Genetic Programming and Evolvable Machines, 12(2):91–119, June 2011.

  [77] F. O. de Franca, M. Virgolin, M. Kommenda, M. S. Majumder, M. Cranmer, G. Espada, L. Ingelse, et al. Interpretable Symbolic Regression for Data Science: Analysis of the 2022 Competition, April 2023.

  [78] Ankit Rohatgi. WebPlotDigitizer: Version 4.6, 2022.

  [79] Edwin Hubble. A relation between distance and radial velocity among extra-galactic nebulae. Proceedings of the National Academy of Sciences, 15(3):168–173, March 1929.

  [80] William La Cava, Patryk Orzechowski, Bogdan Burlacu, Fabrício Olivetti de França, Marco Virgolin, Ying Jin, Michael Kommenda, and Jason H. Moore. Contemporary Symbolic Regression Methods and their Relative Performance, July 2021.
