Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl

Miles Cranmer (Princeton University, Flatiron Institute) · 2023 · astro-ph.IM · arXiv 2305.01582

Mixed citation behavior. Most common role is background (40%).

76 Pith papers citing it

Background 40% of classified citations

abstract

PySR is an open-source library for practical symbolic regression, a type of machine learning which aims to discover human-interpretable symbolic models. PySR was developed to democratize and popularize symbolic regression for the sciences, and is built on a high-performance distributed back-end, a flexible search algorithm, and interfaces with several deep learning packages. PySR's internal search algorithm is a multi-population evolutionary algorithm, which consists of a unique evolve-simplify-optimize loop, designed for optimization of unknown scalar constants in newly-discovered empirical expressions. PySR's backend is the extremely optimized Julia library SymbolicRegression.jl, which can be used directly from Julia. It is capable of fusing user-defined operators into SIMD kernels at runtime, performing automatic differentiation, and distributing populations of expressions to thousands of cores across a cluster. In describing this software, we also introduce a new benchmark, "EmpiricalBench," to quantify the applicability of symbolic regression algorithms in science. This benchmark measures recovery of historical empirical equations from original and synthetic datasets.

citation-role summary

background: 7 · method: 6 · baseline: 2

citation-polarity summary

background: 6 · use method: 6 · baseline: 2 · unclear: 1

representative citing papers

SEVerA: Verified Synthesis of Self-Evolving Agents

cs.LG · 2026-03-26 · unverdicted · novelty 8.0

SEVerA uses Formally Guarded Generative Models and a three-stage Search-Verification-Learning process to synthesize self-evolving agents that satisfy hard formal constraints while improving task performance.

Evaluating Large Language Models in Scientific Discovery

cs.AI · 2025-12-17 · unverdicted · novelty 8.0

The SDE benchmark shows LLMs lag on scientific discovery tasks relative to general science tests, with diminishing scaling returns and shared weaknesses across models.

KAN: Kolmogorov-Arnold Networks

cs.LG · 2024-04-30 · conditional · novelty 8.0

KANs with learnable univariate spline activations on edges achieve better accuracy than MLPs with fewer parameters, faster scaling, and direct visualization for scientific discovery.

LLM-Guided ODE Discovery and Parameter Inference from Small-Cohort Aggregate Data

cs.LG · 2026-07-01 · unverdicted · novelty 7.0

AgentODE uses LLMs to discover ODE structures and infer parameter distributions from aggregate data, recovering consistent structures on benchmarks and RDEB clinical data with 231 observations from 46 patients.

Beating micromagnetic limits on skyrmion stability by long-range frustration

cond-mat.mes-hall · 2026-06-30 · unverdicted · novelty 7.0

Long-range exchange frustration in atomistic spin-lattice models can double skyrmion collapse barriers while keeping micromagnetic parameters fixed, revealing a limitation of continuum approximations.

Pathway variability, coat stiffening and mechanical adaptation during clathrin-mediated endocytosis

q-bio.SC · 2026-06-29 · unverdicted · novelty 7.0

Hybrid simulation and non-Euclidean elasticity theory demonstrate that clathrin coats develop adaptive rigidity and memory during growth, producing flat, stalled, or closed outcomes through two energy-landscape gates and matching experiments without fitted parameters.

Sample Complexity of Scientific Discovery: PAC Learnability of Compositional Function Trees

cs.LG · 2026-06-28 · unverdicted · novelty 7.0

Proves that Rademacher complexity of depth-d compositional trees over finite operator vocabulary is controlled by (K b L)^{d} / sqrt(n) under Lipschitz conditions on operators.

Tearing Instability in Gyrotropic MHD: Effects of Equilibrium Pressure Anisotropy

physics.plasm-ph · 2026-06-21 · unverdicted · novelty 7.0

Equilibrium pressure anisotropy modifies the tearing-mode growth-rate prefactor through parameters A and R0 while retaining the S^{-1/2} Lundquist scaling in gyrotropic MHD.

Learning the Universe: The Structure of Dust Attenuation Curves in Galaxy Simulations

astro-ph.GA · 2026-06-08 · unverdicted · novelty 7.0

Four parameters suffice to describe dust attenuation curve diversity in TNG simulations, yielding a new symbolic-regression model that recovers curves and fluxes better than existing parameterizations while linking parameters to SFR surface density, metallicity, and geometry.

FunctionEvolve: Structure-Guided Symbolic Regression with LLMs

cs.LG · 2026-06-05 · unverdicted · novelty 7.0

FunctionEvolve recovers 107 exact symbolic forms out of 129 synthetic tasks (82.9% SA@50) by using expression-tree structure for evolutionary search, parent selection, mutation, and coefficient scoring with LLMs.

Discovering Thermodynamically Admissible Dissipation Potentials via Grammar-Based Symbolic Regression

cond-mat.soft · 2026-05-29 · unverdicted · novelty 7.0

A convexity-preserving grammar enables symbolic regression to discover thermodynamically admissible dissipation potentials for generalized standard materials from noisy data.

Symbolic Regression via Latent Iterative Refinement

cs.LG · 2026-05-26 · unverdicted · novelty 7.0

LEE performs iterative amortized inference in a functionally grounded latent space to produce 2-10x simpler symbolic expressions than strong baselines on SRBench.

The Neural Compiler: Program-to-Network Translation for Hybrid Scientific Machine Learning

cs.LG · 2026-05-21 · unverdicted · novelty 7.0

The Neural Compiler converts symbolic programs into exact differentiable PyTorch modules for hybrid scientific machine learning, enabling precise encoding of known physics with few trainable parameters.

Symbolic Classification-Enabled LHC Limits Online BSM Global Fits

hep-ph · 2026-05-21 · unverdicted · novelty 7.0

Symbolic regression produces an approximate classifier for LHC exclusion limits that enables their direct inclusion during pMSSM global fits.

Graph-based automated discovery of concise soil hydraulic functions from data: beyond the Mualem - van Genuchten model

physics.flu-dyn · 2026-05-19 · unverdicted · novelty 7.0

A graph-based automated model discovery framework identifies new concise soil hydraulic functions from data that outperform the Mualem-van Genuchten model across 249 soil samples.

Diversified Residual Symbolic Regression

cs.NE · 2026-05-15 · unverdicted · novelty 7.0

DRSR uses Quality-Diversity to produce diverse symbolic regression expressions differing in residual distributions, enabling post-search selection on synthetic and astronomical data.

The finite expression method for turbulent dynamics with high-order moment recovery

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

A two-stage symbolic regression plus generative model framework recovers governing interaction terms and forcing in stochastic triad models while accurately predicting statistical moments up to order five.

Back to the Beginning of Heuristic Design: Bridging Code and Knowledge with LLMs

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

A knowledge-first approach to LLM-driven automatic heuristic design in combinatorial optimization yields better discovery efficiency, transfer, and generalization than code-centric baselines by formalizing a distortion-compression trade-off.

Reconstructing conformal field theoretical compositions with Transformers

hep-th · 2026-05-01 · unverdicted · novelty 7.0

Transformers reconstruct the constituent RCFTs in tensor-product theories from low-energy spectra, reaching 98% accuracy on WZW models and generalizing to larger central charges with few out-of-domain examples.

Additive Atomic Forests for Symbolic Function and Antiderivative Discovery

cs.LG · 2026-05-01 · unverdicted · novelty 7.0

A derivative algebra with EML and SOL primitives plus additive atomic forests enables simultaneous symbolic recovery of functions and antiderivatives from data, matching or exceeding XGBoost on 13 of 17 benchmarks with interpretable formulas.

Machine Collective Intelligence for Explainable Scientific Discovery

cs.AI · 2026-04-30 · unverdicted · novelty 7.0

Machine collective intelligence uses coordinated AI agents to evolve symbolic hypotheses and recover governing equations from observations in deterministic, stochastic, and uncharacterized systems, achieving up to six orders of magnitude better extrapolation than neural networks with 5-40 parameters.

Neuro-Symbolic ODE Discovery with Latent Grammar Flow

cs.LG · 2026-04-17 · unverdicted · novelty 7.0

Latent Grammar Flow discovers ODEs by placing grammar-based equation representations in a discrete latent space, using a behavioral loss to cluster similar equations, and sampling via a discrete flow model guided by data fit and constraints.

LLM-ODE: Data-driven Discovery of Dynamical Systems with Large Language Models

cs.LG · 2026-03-21 · unverdicted · novelty 7.0

LLM-ODE integrates large language models into genetic programming to guide symbolic search for governing equations of dynamical systems, outperforming classical GP on 91 test cases in efficiency and solution quality.

In-Context Symbolic Regression for Robustness-Improved Kolmogorov-Arnold Networks

cs.LG · 2026-03-16 · unverdicted · novelty 7.0

In-context symbolic regression methods improve robustness of symbolic formula recovery from KANs, cutting median OFAT test MSE by up to 99.8 percent across hyperparameter sweeps.

citing papers explorer

Showing 24 of 24 citing papers after filters.

SEVerA: Verified Synthesis of Self-Evolving Agents cs.LG · 2026-03-26 · unverdicted · none · ref 8 · internal anchor
SEVerA uses Formally Guarded Generative Models and a three-stage Search-Verification-Learning process to synthesize self-evolving agents that satisfy hard formal constraints while improving task performance.
LLM-Guided ODE Discovery and Parameter Inference from Small-Cohort Aggregate Data cs.LG · 2026-07-01 · unverdicted · none · ref 44 · internal anchor
AgentODE uses LLMs to discover ODE structures and infer parameter distributions from aggregate data, recovering consistent structures on benchmarks and RDEB clinical data with 231 observations from 46 patients.
Sample Complexity of Scientific Discovery: PAC Learnability of Compositional Function Trees cs.LG · 2026-06-28 · unverdicted · none · ref 49 · internal anchor
Proves that Rademacher complexity of depth-d compositional trees over finite operator vocabulary is controlled by (K b L)^{d} / sqrt(n) under Lipschitz conditions on operators.
FunctionEvolve: Structure-Guided Symbolic Regression with LLMs cs.LG · 2026-06-05 · unverdicted · none · ref 38 · internal anchor
FunctionEvolve recovers 107 exact symbolic forms out of 129 synthetic tasks (82.9% SA@50) by using expression-tree structure for evolutionary search, parent selection, mutation, and coefficient scoring with LLMs.
Symbolic Regression via Latent Iterative Refinement cs.LG · 2026-05-26 · unverdicted · none · ref 4 · internal anchor
LEE performs iterative amortized inference in a functionally grounded latent space to produce 2-10x simpler symbolic expressions than strong baselines on SRBench.
The Neural Compiler: Program-to-Network Translation for Hybrid Scientific Machine Learning cs.LG · 2026-05-21 · unverdicted · none · ref 6 · internal anchor
The Neural Compiler converts symbolic programs into exact differentiable PyTorch modules for hybrid scientific machine learning, enabling precise encoding of known physics with few trainable parameters.
The finite expression method for turbulent dynamics with high-order moment recovery cs.LG · 2026-05-11 · unverdicted · none · ref 12 · internal anchor
A two-stage symbolic regression plus generative model framework recovers governing interaction terms and forcing in stochastic triad models while accurately predicting statistical moments up to order five.
Additive Atomic Forests for Symbolic Function and Antiderivative Discovery cs.LG · 2026-05-01 · unverdicted · none · ref 4 · internal anchor
A derivative algebra with EML and SOL primitives plus additive atomic forests enables simultaneous symbolic recovery of functions and antiderivatives from data, matching or exceeding XGBoost on 13 of 17 benchmarks with interpretable formulas.
Neuro-Symbolic ODE Discovery with Latent Grammar Flow cs.LG · 2026-04-17 · unverdicted · none · ref 12 · internal anchor
Latent Grammar Flow discovers ODEs by placing grammar-based equation representations in a discrete latent space, using a behavioral loss to cluster similar equations, and sampling via a discrete flow model guided by data fit and constraints.
LLM-ODE: Data-driven Discovery of Dynamical Systems with Large Language Models cs.LG · 2026-03-21 · unverdicted · none · ref 11 · internal anchor
LLM-ODE integrates large language models into genetic programming to guide symbolic search for governing equations of dynamical systems, outperforming classical GP on 91 test cases in efficiency and solution quality.
In-Context Symbolic Regression for Robustness-Improved Kolmogorov-Arnold Networks cs.LG · 2026-03-16 · unverdicted · none · ref 7 · internal anchor
In-context symbolic regression methods improve robustness of symbolic formula recovery from KANs, cutting median OFAT test MSE by up to 99.8 percent across hyperparameter sweeps.
Symbolic recovery of PDEs from measurement data cs.LG · 2026-02-17 · unverdicted · none · ref 27 · internal anchor
Symbolic rational-function networks recover an admissible PDE from noiseless complete measurements and select the regularization-minimizing parameterization within the architecture.
Towards Diverse Scientific Hypothesis Search with Large Language Models cs.LG · 2026-06-09 · unverdicted · none · ref 68 · internal anchor
A parallel-tempering evolutionary framework for LLM hypothesis search improves both quality and diversity of candidates in molecular, equation, and algorithm discovery under fixed validation budgets.
Decision-Making under Combinatorial Risk cs.LG · 2026-06-08 · unverdicted · none · ref 41 · internal anchor
People navigate combinatorial risk by focusing on core features like post-investment success probabilities rather than computing the full induced distribution, unless the PMF is explicitly displayed.
LLM-AutoSciLab: Closed-Loop Scientific Discovery via Active Experimentation with LLMs cs.LG · 2026-05-21 · unverdicted · none · ref 11 · internal anchor
LLM-AutoSciLab proposes an LLM-driven closed-loop system for hypothesis generation and adaptive experiment selection that reports higher accuracy and 2-5x better sample efficiency than baselines on new chemistry and gene-network discovery benchmarks.
Symbolic Density Estimation for Discrete Distributions cs.LG · 2026-05-20 · unverdicted · none · ref 2 · internal anchor
SDE recovers closed-form PMFs for discrete distributions via evolutionary search guided by domain priors, recovering all benchmark families with accurate parameters and improving mixture fits on real data.
Discovery of Nonlinear Dynamics with Automated Basis Function Generation cs.LG · 2026-05-10 · unverdicted · none · ref 43 · internal anchor
AutoSINDy automatically builds a tailored basis library from PySR symbolic regression and applies SINDy to recover ground-truth nonlinear dynamics with 92.8% success under noise.
Physics-Informed Neural Networks for Biological $2\mathrm{D}{+}t$ Reaction-Diffusion Systems cs.LG · 2026-04-20 · unverdicted · none · ref 32 · internal anchor
BINNs are extended to 2D+t systems and combined with symbolic regression to recover reaction-diffusion models of lung cancer cell dynamics from time-lapse microscopy data.
Machine Learning Hamiltonian Dynamical Systems with Sparse and Noisy Data cs.LG · 2026-04-19 · unverdicted · none · ref 40 · internal anchor
ASRNNs recover Hamiltonian dynamics and symbolic equations from trajectories with only two irregularly spaced noisy points by preserving symplectic structure without derivative estimation.
Automatic Construction of Clinical Scoring Systems with LLM Agents cs.LG · 2026-01-29 · unverdicted · none · ref 3 · internal anchor
AgentScore uses LLM agents for semantically guided search over clinical scoring rules combined with data-driven verification, outperforming prior score generation methods on eight tasks and established guidelines on two externally validated tasks.
Prediction Is Not Physics: Learning and Evaluating Conserved Quantities in Neural Simulators cs.LG · 2026-05-16 · unverdicted · none · ref 5 · internal anchor
Conservation Discovery Networks recover analytical energy with R² ≥ 0.996 in Hamiltonian systems using temporal consistency and λ_align=0.2, but collapse without alignment and show mixed noise robustness.
Balance-Guided Sparse Identification of Multiscale Nonlinear PDEs with Small-coefficient Terms cs.LG · 2026-04-20 · unverdicted · none · ref 28 · internal anchor
BG-SINDy reformulates l0-constrained regression as term-level l2,0 regularization and uses progressive pruning guided by balance contributions to recover small-coefficient terms in multiscale PDEs.
From inverse problems to neural operators: prediction, mechanism, and generalization of data-driven models cs.LG · 2026-06-08 · unverdicted · none · ref 43 · internal anchor
Data-driven models for physical systems share a common structure differing only in model class assumptions, with only mechanism-discovering models capable of generalization.
PyCC.id: A package for hypothesis-driven equation discovery with structural identifiability cs.LG · 2026-05-07 · unverdicted · none · ref 11 · internal anchor
PyCC.id packages a hypothesis-driven method using identifiable ODE skeletons for equation discovery from data, supporting multiple paradigms like neural networks and sparse regression.
