Recognition: 2 theorem links
Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl
Pith reviewed 2026-05-13 03:42 UTC · model grok-4.3
The pith
A multi-population evolutionary search recovers historical scientific equations from data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper states that its search algorithm, a multi-population evolutionary search built around an evolve-simplify-optimize loop and backed by a high-performance backend that fuses operators into fast kernels and computes derivatives automatically, recovers historical empirical equations on the new EmpiricalBench benchmark of original and synthetic scientific datasets.
What carries the argument
The evolve-simplify-optimize loop inside a multi-population evolutionary algorithm that searches for symbolic expressions while tuning their scalar constants.
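The loop can be sketched in miniature. Everything below is an illustrative stand-in, not the library's internals: the tuple encoding, the toy mutation, and the random-perturbation constant tuner (PySR's backend uses a BFGS-style optimizer and a much richer operator set).

```python
import random

# Toy sketch of an evolve-simplify-optimize loop (illustrative only).
# Expressions are nested tuples: ("add", a, b), ("mul", a, b),
# ("var",) for the input x, and ("const", c) for a tunable scalar.

def evaluate(expr, x):
    op = expr[0]
    if op == "var":
        return x
    if op == "const":
        return expr[1]
    a, b = evaluate(expr[1], x), evaluate(expr[2], x)
    return a + b if op == "add" else a * b

def mutate(expr):
    """Evolve step: randomly replace subtrees (a toy mutation operator)."""
    if expr[0] in ("var", "const") or random.random() < 0.3:
        return random.choice([("var",),
                              ("const", random.gauss(0.0, 1.0)),
                              ("add", ("var",), ("const", random.gauss(0.0, 1.0)))])
    return (expr[0], mutate(expr[1]), mutate(expr[2]))

def simplify(expr):
    """Simplify step: fold constant subtrees, e.g. 2*3 -> 6."""
    if expr[0] in ("var", "const"):
        return expr
    left, right = simplify(expr[1]), simplify(expr[2])
    if left[0] == "const" and right[0] == "const":
        folded = left[1] + right[1] if expr[0] == "add" else left[1] * right[1]
        return ("const", folded)
    return (expr[0], left, right)

def optimize_constants(expr, xs, ys, steps=200, scale=0.5):
    """Optimize step: crude random-perturbation tuning of the scalar
    constants (a stand-in for the BFGS-style optimizer the paper uses)."""
    def loss(e):
        return sum((evaluate(e, x) - y) ** 2 for x, y in zip(xs, ys))
    def perturb(e):
        if e[0] == "const":
            return ("const", e[1] + random.gauss(0.0, scale))
        if e[0] == "var":
            return e
        return (e[0], perturb(e[1]), perturb(e[2]))
    best, best_loss = expr, loss(expr)
    for _ in range(steps):
        cand = perturb(best)
        cand_loss = loss(cand)
        if cand_loss < best_loss:
            best, best_loss = cand, cand_loss
    return best, best_loss
```

A full multi-population search would maintain several pools of such expressions, migrate members between pools, and repeat the three steps; the sketch only shows the inner loop on a single expression.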
If this is right
- Scientists obtain human-readable models straight from observations instead of black-box predictors.
- Large datasets become tractable because populations of expressions can be distributed across clusters.
- Hybrid modeling is enabled by connecting the search to deep learning packages for initialization or guidance.
- Different symbolic regression approaches can be compared on a shared set of historical scientific cases.
- Interpretable alternatives become available in fields where understanding the form of the relation matters.
Where Pith is reading between the lines
- The approach could be applied to hypothesize functional forms in areas where no prior equation exists.
- Extending the benchmark to include differential equations or constrained systems would test broader utility.
- Real-time use during experiments might allow on-the-fly model refinement as data arrives.
- Combining the method with other data-driven techniques could accelerate discovery in under-theorized domains.
Load-bearing premise
Recovering known historical equations from original and synthetic data serves as a good test of whether the method will work for finding new equations in fresh scientific problems.
What would settle it
Failure of the search to produce any simple symbolic expression that matches a new, previously untested dataset from a current experiment where an empirical relation is expected.
Original abstract
PySR is an open-source library for practical symbolic regression, a type of machine learning which aims to discover human-interpretable symbolic models. PySR was developed to democratize and popularize symbolic regression for the sciences, and is built on a high-performance distributed back-end, a flexible search algorithm, and interfaces with several deep learning packages. PySR's internal search algorithm is a multi-population evolutionary algorithm, which consists of a unique evolve-simplify-optimize loop, designed for optimization of unknown scalar constants in newly-discovered empirical expressions. PySR's backend is the extremely optimized Julia library SymbolicRegression.jl, which can be used directly from Julia. It is capable of fusing user-defined operators into SIMD kernels at runtime, performing automatic differentiation, and distributing populations of expressions to thousands of cores across a cluster. In describing this software, we also introduce a new benchmark, "EmpiricalBench," to quantify the applicability of symbolic regression algorithms in science. This benchmark measures recovery of historical empirical equations from original and synthetic datasets.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes PySR, an open-source Python library for symbolic regression in scientific applications, built on the high-performance Julia backend SymbolicRegression.jl. It details a multi-population evolutionary algorithm featuring an evolve-simplify-optimize loop to handle unknown scalar constants, along with runtime SIMD kernel fusion, automatic differentiation, distributed computing across clusters, and interfaces to deep learning packages. The work also introduces EmpiricalBench, a benchmark that quantifies applicability by measuring recovery rates of historical empirical equations from both original and synthetic datasets.
Significance. If the implementation details and benchmark results hold, the paper delivers a practical, accessible tool for interpretable machine learning that lowers barriers to symbolic regression in fields such as astrophysics. Notable strengths include the fully open-source release with reproducible code, the optimized distributed backend, and the introduction of EmpiricalBench as an externally defined evaluation framework grounded in real historical equations rather than synthetic toy problems.
Major comments (1)
- [EmpiricalBench] In the section introducing EmpiricalBench: the central claim that recovery of historical empirical equations from original and synthetic datasets quantifies applicability for science rests on the untested assumption that these datasets (in noise level, dimensionality, operator sets, and selection biases) are representative of future unknown-equation discovery tasks. The manuscript provides no direct tests on contemporary problems where the true functional form is unknown a priori, which is load-bearing for the 'practical for science' assertion.
Minor comments (2)
- [Abstract] Abstract: the interfaces with deep learning packages are mentioned but not enumerated; the main text should explicitly name them (e.g., PyTorch, TensorFlow) with version or usage details for reproducibility.
- [Software architecture] Software architecture section: clarify the exact mechanism by which user-defined operators are fused into SIMD kernels at runtime, including any constraints on operator arity or type.
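The fusion idea the second comment asks about can be illustrated with a toy analogy. The sketch below collapses a tree of user-defined operators into a single callable at runtime; Python closures only mimic the idea (the real backend compiles Julia operators into SIMD kernels), the operator names are hypothetical, and this sketch handles only arity-1 and arity-2 operators, mirroring the arity question raised in the comment.

```python
import math

# Toy analogy only: Python closures standing in for the backend's runtime
# compilation of user-defined operators into fused SIMD kernels.

def fuse(expr, operators):
    """Collapse a nested ("opname", child, ...) tree into one callable,
    so evaluating it is a single call rather than a tree walk."""
    if expr[0] == "x":                       # the input variable
        return lambda x: x
    op = operators[expr[0]]
    children = [fuse(child, operators) for child in expr[1:]]
    if len(children) == 1:                   # unary operator
        inner = children[0]
        return lambda x: op(inner(x))
    left, right = children                   # binary operator
    return lambda x: op(left(x), right(x))

# Hypothetical user-defined operators (names are illustrative):
ops = {
    "add": lambda a, b: a + b,
    "square": lambda a: a * a,
    "safe_log": lambda a: math.log(abs(a) + 1e-9),
}

# "Fuse" x**2 + safe_log(x) into a single kernel-like callable.
kernel = fuse(("add", ("square", ("x",)), ("safe_log", ("x",))), ops)
```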
Simulated Author's Rebuttal
We thank the referee for their positive evaluation of the manuscript and for the constructive comment on EmpiricalBench. We address the point below and propose a targeted revision to clarify the benchmark's scope.
Point-by-point responses
- Referee: In the section introducing EmpiricalBench: the central claim that recovery of historical empirical equations from original and synthetic datasets quantifies applicability for science rests on the untested assumption that these datasets (in noise level, dimensionality, operator sets, and selection biases) are representative of future unknown-equation discovery tasks. The manuscript provides no direct tests on contemporary problems where the true functional form is unknown a priori, which is load-bearing for the 'practical for science' assertion.
Authors: We appreciate the referee highlighting this assumption. EmpiricalBench was constructed precisely to provide a more grounded evaluation than purely synthetic toy problems by recovering equations that were historically discovered from real data. The benchmark incorporates both the original observational datasets (where they exist) and synthetically regenerated versions that preserve the functional form while allowing controlled variation in noise, dimensionality, and sampling. This design directly tests recovery under conditions mirroring past scientific discovery. We agree that no benchmark can exhaustively represent all possible future tasks, and that direct quantitative tests on problems whose functional form is truly unknown a priori are not possible within a recovery benchmark, because success cannot be measured without ground truth. In the revised manuscript we will add an explicit limitations paragraph to the EmpiricalBench section that discusses the representativeness assumptions regarding noise levels, dimensionality, operator sets, and selection biases, and that qualifies the 'practical for science' claim accordingly. This addition will make the scope of the benchmark transparent without altering the reported results.
Revision: partial
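The regeneration-with-controlled-noise idea in this response can be made concrete with a hedged sketch: synthesize data from a known historical form (here Kepler's third law, T = k * a**1.5) under controlled multiplicative noise, then score "recovery" of the exponent. The noise model, the log-log fit, and the tolerance are illustrative choices, not the benchmark's actual protocol.

```python
import math
import random

# Sketch: regenerate a synthetic dataset preserving a known functional form,
# then check whether a candidate exponent matches the ground truth.

def regenerate(n=50, k=1.0, noise=0.02, seed=0):
    """Data from T = k * a**1.5 with controlled multiplicative noise."""
    rng = random.Random(seed)
    axes = [0.5 + 9.5 * i / (n - 1) for i in range(n)]        # semi-major axes
    periods = [k * a ** 1.5 * (1.0 + rng.gauss(0.0, noise)) for a in axes]
    return axes, periods

def fitted_exponent(xs, ys):
    """Least-squares slope in log-log space estimates the power-law exponent."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

def recovered(xs, ys, true_exponent=1.5, tol=0.05):
    """Count the equation as recovered if the fitted exponent is close."""
    return abs(fitted_exponent(xs, ys) - true_exponent) < tol
```

A recovery rate in this spirit would be the fraction of regenerated datasets, across seeds and noise levels, on which the search's best expression passes such a check.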
Circularity Check
No circularity: software description and external benchmark definition are self-contained
Full rationale
The paper describes the PySR library implementation (multi-population evolutionary algorithm with evolve-simplify-optimize loop) and introduces EmpiricalBench as a new benchmark that measures recovery rates on historical empirical equations and their synthetic versions. No derivation, prediction, or central claim reduces by construction to fitted parameters, self-defined quantities, or a load-bearing self-citation chain. The benchmark is defined externally via known historical equations rather than derived from the algorithm's outputs, and the software claims rest on released code rather than internal redefinition. This is a standard methods/software paper with no self-referential reduction in its assertions.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- Cost.FunctionalEquation washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "PySR is an open-source library for practical symbolic regression... built on a high-performance distributed back-end, a flexible search algorithm, and interfaces with several deep learning packages."
- Foundation.HierarchyEmergence hierarchy_emergence_forces_phi · unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "we also introduce a new benchmark, 'EmpiricalBench,' to quantify the applicability of symbolic regression algorithms in science."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 31 Pith papers
- SEVerA: Verified Synthesis of Self-Evolving Agents · SEVerA uses Formally Guarded Generative Models and a three-stage Search-Verification-Learning process to synthesize self-evolving agents that satisfy hard formal constraints while improving task performance.
- KAN: Kolmogorov-Arnold Networks · KANs with learnable univariate spline activations on edges achieve better accuracy than MLPs with fewer parameters, faster scaling, and direct visualization for scientific discovery.
- The finite expression method for turbulent dynamics with high-order moment recovery · A two-stage symbolic regression plus generative model framework recovers governing interaction terms and forcing in stochastic triad models while accurately predicting statistical moments up to order five.
- Back to the Beginning of Heuristic Design: Bridging Code and Knowledge with LLMs · A knowledge-first approach to LLM-driven automatic heuristic design in combinatorial optimization yields better discovery efficiency, transfer, and generalization than code-centric baselines by formalizing a distortio...
- Reconstructing conformal field theoretical compositions with Transformers · Transformers reconstruct the constituent RCFTs in tensor-product theories from low-energy spectra, reaching 98% accuracy on WZW models and generalizing to larger central charges with few out-of-domain examples.
- Additive Atomic Forests for Symbolic Function and Antiderivative Discovery · A derivative algebra with EML and SOL primitives plus additive atomic forests enables simultaneous symbolic recovery of functions and antiderivatives from data, matching or exceeding XGBoost on 13 of 17 benchmarks wit...
- Machine Collective Intelligence for Explainable Scientific Discovery · Machine collective intelligence uses coordinated AI agents to evolve symbolic hypotheses and recover governing equations from observations in deterministic, stochastic, and uncharacterized systems, achieving up to six...
- Neuro-Symbolic ODE Discovery with Latent Grammar Flow · Latent Grammar Flow discovers ODEs by placing grammar-based equation representations in a discrete latent space, using a behavioral loss to cluster similar equations, and sampling via a discrete flow model guided by d...
- First observational constraints on cosmic backreaction over an extended redshift range · First direct constraints on total cosmic backreaction over a significant redshift range are consistent with vanishing backreaction within 1 sigma but are too weak to exclude meaningful backreaction.
- LLM-ODE: Data-driven Discovery of Dynamical Systems with Large Language Models · LLM-ODE integrates large language models into genetic programming to guide symbolic search for governing equations of dynamical systems, outperforming classical GP on 91 test cases in efficiency and solution quality.
- In-Context Symbolic Regression for Robustness-Improved Kolmogorov-Arnold Networks · In-context symbolic regression methods improve robustness of symbolic formula recovery from KANs, cutting median OFAT test MSE by up to 99.8 percent across hyperparameter sweeps.
- AlphaEvolve: A coding agent for scientific and algorithmic discovery · AlphaEvolve is an LLM-orchestrated evolutionary coding agent that discovered a 4x4 complex matrix multiplication algorithm using 48 scalar multiplications, the first improvement over Strassen's algorithm in 56 years, ...
- Primordial Black Hole from Tensor-induced Density Fluctuation: First-order Phase Transitions and Domain Walls · Tensor perturbations from first-order phase transitions and domain wall annihilation induce curvature fluctuations at second order that form primordial black holes, allowing asteroid-mass PBHs to comprise all dark mat...
- FePySR: A Neural Feature Extraction Framework for Efficient and Scalable Symbolic Regression · FePySR uses a neural network to pre-extract valid features before PySR search, recovering more equations than baselines on benchmarks and identifying governing ODEs in 24 of 100 biological cases where PySR finds none.
- GESR: A Genetic Programming-Based Symbolic Regression Method with Gene Editing · GESR uses two BERT models to intelligently direct mutations and crossovers inside genetic programming, yielding higher efficiency and competitive accuracy on symbolic regression benchmarks.
- Discovery of Nonlinear Dynamics with Automated Basis Function Generation · AutoSINDy automatically builds a tailored basis library from PySR symbolic regression and applies SINDy to recover ground-truth nonlinear dynamics with 92.8% success under noise.
- Discovering Ordinary Differential Equations with LLM-Based Qualitative and Quantitative Evaluation · DoLQ employs a sampler agent, parameter optimizer, and LLM-based scientist agent to iteratively propose, refine, and evaluate ODE candidates, yielding higher success rates and better symbolic term recovery than prior ...
- Programmatic Context Augmentation for LLM-based Symbolic Regression · Programmatic context augmentation lets LLM-based symbolic regression perform code-driven data analysis during search, yielding superior efficiency and accuracy over baselines on LLM-SRBench.
- Interpretable Analytic Formulae for GWTC-4 Binary Black Hole Population Properties via Symbolic Regression · Symbolic regression on GWTC-4 posteriors yields closed-form analytic formulae for merger-rate evolution, effective-spin dependencies on mass ratio and redshift, and conditional mass-ratio distributions at specific pri...
- Physics-Informed Neural Networks for Biological $2\mathrm{D}{+}t$ Reaction-Diffusion Systems · BINNs are extended to 2D+t systems and combined with symbolic regression to recover reaction-diffusion models of lung cancer cell dynamics from time-lapse microscopy data.
- Machine Learning Hamiltonian Dynamical Systems with Sparse and Noisy Data · ASRNNs recover Hamiltonian dynamics and symbolic equations from trajectories with only two irregularly spaced noisy points by preserving symplectic structure without derivative estimation.
- Discovering quantum phenomena with Interpretable Machine Learning · Variational autoencoders combined with symbolic regression extract physically meaningful representations and order parameters from raw quantum measurement data, revealing new phenomena such as corner-ordering in Rydbe...
- Into the Gompverse: A robust Gompertzian reionization model for CMB analyses · A Gompertzian reionization model with three nuisance parameters demotes optical depth to a derived quantity, reducing its uncertainty by a factor of three and revealing potential neutrino mass tension in CMB analyses.
- Model-independent constraints on generalized FLRW consistency relations with bootstrap-based symbolic regression · Bootstrap-based symbolic regression on supernova and BAO data finds mild 2-4 sigma deviations from FLRW consistency relations, which if real would rule out most FLRW-based solutions to cosmological tensions.
- Discovery of Interpretable Surrogates via Agentic AI: Application to Gravitational Waves · GWAgent agentic workflow produces analytic surrogates for eccentric BBH waveforms with 6.9e-4 median mismatch and 8.4x speedup, outperforming baselines, and infers eccentricity for GW200129.
- Balance-Guided Sparse Identification of Multiscale Nonlinear PDEs with Small-coefficient Terms · BG-SINDy reformulates l0-constrained regression as term-level l2,0 regularization and uses progressive pruning guided by balance contributions to recover small-coefficient terms in multiscale PDEs.
- Singularity Formation: Synergy in Theoretical, Numerical and Machine Learning Approaches · The work introduces a modulation-based analytical method for singularity proofs in singular PDEs and refines ML techniques like PINNs and KANs to identify blowup solutions, with application to the open 3D Keller-Segel...
- Identifying Topological Invariants of Non-Hermitian Systems via Domain-Adaptive Multimodal Model for Mathematics · A multimodal model with Qwen Math backbone identifies topological invariants of non-Hermitian systems from eigenvalues and eigenvectors in momentum space.
- Experimental Design for Missing Physics · A sequential experimental design technique discriminates between model structures from symbolic regression to discover missing physics in process systems such as bioreactors.
work page 2001
-
[66]
Scikit- learn: Machine Learning in Python
Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, et al. Scikit- learn: Machine Learning in Python. J. Mach. Learn. Res., 12:2825–2830, November 2011
work page 2011
-
[67]
Smith, Mateusz Paprocki, Ondřej Čertík, Sergey B
Aaron Meurer, Christopher P. Smith, Mateusz Paprocki, Ondřej Čertík, Sergey B. Kirpichev, Matthew Rocklin, AMiT Kumar, et al. Sympy: symbolic computing in python. PeerJ Com- puter Science, 3:e103, January 2017
work page 2017
- [68]
-
[69]
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32 , pages 8024–8035. Curran Assoc...
work page 2019
-
[70]
JAX: compos- able transformations of Python+NumPy pro- grams, 2018
James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, et al. JAX: compos- able transformations of Python+NumPy pro- grams, 2018
work page 2018
-
[71]
Rediscovering Newton’s gravity and Solar System properties using deep learning and inductive biases
Pablo Lemos, Niall Jeffrey, Miles Cranmer, Pe- ter Battaglia, and Shirley Ho. Rediscovering Newton’s gravity and Solar System properties using deep learning and inductive biases. In submission, 2022
work page 2022
-
[72]
Whydotree-basedmodelsstillout- perform deep learning on tabular data?, July 2022
Léo Grinsztajn, Edouard Oyallon, and Gaël Varoquaux. Whydotree-basedmodelsstillout- perform deep learning on tabular data?, July 2022
work page 2022
-
[73]
AI feynman: A physics-inspired method 22 for symbolic regression
Silviu-Marian Udrescu and Max Tegmark. AI feynman: A physics-inspired method 22 for symbolic regression. Science Advances, 6(16):eaay2631, 2020
work page 2020
-
[74]
Alan A. Kaptanoglu, Brian M. de Silva, Ur- ban Fasel, Kadierdan Kaheman, Andy J. Gold- schmidt, Jared L. Callaham, Charles B. De- lahunt, et al. PySINDy: A comprehensive Python package for robust sparse system iden- tification. November 2021
work page 2021
-
[75]
An Approach to Sym- bolic Regression Using Feyn
Kevin René Broløs, Meera Vieira Machado, Chris Cave, Jaan Kasak, Valdemar Stentoft- Hansen, Victor Galindo Batanero, Tom Jelen, and Casper Wilstrup. An Approach to Sym- bolic Regression Using Feyn. April 2021
work page 2021
-
[76]
NguyenQuangUy, NguyenXuanHoai, Michael O’Neill, R. I. McKay, and Edgar Galván-López. Semantically-based crossover in genetic pro- gramming: Application to real-valued symbolic regression. Genetic Programming and Evolvable Machines, 12(2):91–119, June 2011
work page 2011
-
[77]
F. O. de Franca, M. Virgolin, M. Kommenda, M. S. Majumder, M. Cranmer, G. Espada, L. Ingelse, et al. Interpretable Symbolic Re- gression for Data Science: Analysis of the 2022 Competition, April 2023
work page 2022
-
[78]
Webplotdigitizer: Version 4.6, 2022
Ankit Rohatgi. Webplotdigitizer: Version 4.6, 2022
work page 2022
-
[79]
A relation between distance and radial velocity among extra-galactic neb- ulae
Edwin Hubble. A relation between distance and radial velocity among extra-galactic neb- ulae. Proceedings of the National Academy of Sciences, 15(3):168–173, March 1929
work page 1929
-
[80]
William La Cava, Patryk Orzechowski, Bogdan Burlacu, Fabrício Olivetti de França, Marco Virgolin, Ying Jin, Michael Kommenda, and Jason H. Moore. Contemporary Symbolic Re- gression Methods and their Relative Perfor- mance, July 2021
work page 2021