Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl
Citing papers show mixed citation behavior; the most common citation role is background (57%).
Abstract
PySR is an open-source library for practical symbolic regression, a type of machine learning which aims to discover human-interpretable symbolic models. PySR was developed to democratize and popularize symbolic regression for the sciences, and is built on a high-performance distributed backend, a flexible search algorithm, and interfaces with several deep learning packages. PySR's internal search algorithm is a multi-population evolutionary algorithm, which consists of a unique evolve-simplify-optimize loop designed for the optimization of unknown scalar constants in newly discovered empirical expressions. PySR's backend is the extremely optimized Julia library SymbolicRegression.jl, which can be used directly from Julia. It is capable of fusing user-defined operators into SIMD kernels at runtime, performing automatic differentiation, and distributing populations of expressions to thousands of cores across a cluster. In describing this software, we also introduce a new benchmark, "EmpiricalBench," to quantify the applicability of symbolic regression algorithms in science. This benchmark measures the recovery of historical empirical equations from original and synthetic datasets.
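The evolve-simplify-optimize loop described above can be caricatured in a few dozen lines. The sketch below is our own minimal toy, not PySR's actual implementation (PySR runs multiple populations in parallel and uses a proper numerical optimizer for constants); it evolves expression trees over `+` and `*` toward the target y = 2x + 1:

```python
import random

# Toy evolve-simplify-optimize loop for symbolic regression.
# Illustrative only -- NOT PySR's algorithm, just the same three-phase idea.
random.seed(0)

OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def evaluate(expr, x):
    """Evaluate a nested-tuple expression: ('x',), ('c', v), or (op, l, r)."""
    if expr[0] == "x":
        return x
    if expr[0] == "c":
        return expr[1]
    return OPS[expr[0]](evaluate(expr[1], x), evaluate(expr[2], x))

def loss(expr, data):
    return sum((evaluate(expr, x) - y) ** 2 for x, y in data) / len(data)

def random_expr(depth=2):
    if depth == 0 or random.random() < 0.3:
        return ("x",) if random.random() < 0.5 else ("c", random.uniform(-2, 2))
    return (random.choice(list(OPS)), random_expr(depth - 1), random_expr(depth - 1))

def mutate(expr):
    """Evolve: replace a random subtree with a fresh random one."""
    if expr[0] in OPS and random.random() < 0.6:
        l, r = expr[1], expr[2]
        return (expr[0], mutate(l), r) if random.random() < 0.5 else (expr[0], l, mutate(r))
    return random_expr()

def simplify(expr):
    """Simplify: fold constant subtrees, e.g. ('+', ('c',1), ('c',2)) -> ('c',3)."""
    if expr[0] in OPS:
        l, r = simplify(expr[1]), simplify(expr[2])
        if l[0] == "c" and r[0] == "c":
            return ("c", OPS[expr[0]](l[1], r[1]))
        return (expr[0], l, r)
    return expr

def jitter(expr, step):
    if expr[0] == "c":
        return ("c", expr[1] + random.uniform(-step, step))
    if expr[0] in OPS:
        return (expr[0], jitter(expr[1], step), jitter(expr[2], step))
    return expr

def optimize_constants(expr, data, rounds=30):
    """Optimize: greedy random tweaks of constants (PySR uses a real optimizer)."""
    for step in (0.5, 0.1, 0.02):
        for _ in range(rounds):
            trial = jitter(expr, step)
            if loss(trial, data) < loss(expr, data):
                expr = trial
    return expr

# Target: y = 2x + 1, sampled on a handful of points.
data = [(x / 2.0, 2 * (x / 2.0) + 1) for x in range(-4, 5)]
pop = [random_expr() for _ in range(30)]
best = min(pop, key=lambda e: loss(e, data))
history = [loss(best, data)]
for gen in range(15):
    pop = [optimize_constants(simplify(mutate(e)), data) for e in pop]
    pop.sort(key=lambda e: loss(e, data))
    pop = pop[:15] + [best] + [random_expr() for _ in range(14)]  # elitism + fresh blood
    cand = min(pop, key=lambda e: loss(e, data))
    if loss(cand, data) < loss(best, data):
        best = cand
    history.append(loss(best, data))
print("best loss:", history[-1])
```

Because the best individual is carried over each generation (elitism), the best loss is non-increasing across generations; the simplify and constant-optimization phases are what let the evolutionary search settle on exact scalar constants rather than only tree shapes.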
Citing papers explorer
-
SEVerA: Verified Synthesis of Self-Evolving Agents
SEVerA uses Formally Guarded Generative Models and a three-stage Search-Verification-Learning process to synthesize self-evolving agents that satisfy hard formal constraints while improving task performance.
-
KAN: Kolmogorov-Arnold Networks
KANs with learnable univariate spline activations on edges achieve better accuracy than MLPs with fewer parameters, faster scaling, and direct visualization for scientific discovery.
-
The finite expression method for turbulent dynamics with high-order moment recovery
A two-stage symbolic regression plus generative model framework recovers governing interaction terms and forcing in stochastic triad models while accurately predicting statistical moments up to order five.
-
Back to the Beginning of Heuristic Design: Bridging Code and Knowledge with LLMs
A knowledge-first approach to LLM-driven automatic heuristic design in combinatorial optimization yields better discovery efficiency, transfer, and generalization than code-centric baselines by formalizing a distortion-compression trade-off.
-
Reconstructing conformal field theoretical compositions with Transformers
Transformers reconstruct the constituent RCFTs in tensor-product theories from low-energy spectra, reaching 98% accuracy on WZW models and generalizing to larger central charges with few out-of-domain examples.
-
Additive Atomic Forests for Symbolic Function and Antiderivative Discovery
A derivative algebra with EML and SOL primitives plus additive atomic forests enables simultaneous symbolic recovery of functions and antiderivatives from data, matching or exceeding XGBoost on 13 of 17 benchmarks with interpretable formulas.
-
Machine Collective Intelligence for Explainable Scientific Discovery
Machine collective intelligence uses coordinated AI agents to evolve symbolic hypotheses and recover governing equations from observations in deterministic, stochastic, and uncharacterized systems, achieving up to six orders of magnitude better extrapolation than neural networks with 5-40 parameters.
-
Neuro-Symbolic ODE Discovery with Latent Grammar Flow
Latent Grammar Flow discovers ODEs by placing grammar-based equation representations in a discrete latent space, using a behavioral loss to cluster similar equations, and sampling via a discrete flow model guided by data fit and constraints.
-
First observational constraints on cosmic backreaction over an extended redshift range
First direct constraints on total cosmic backreaction over a significant redshift range are consistent with vanishing backreaction within 1 sigma but are too weak to exclude meaningful backreaction.
-
LLM-ODE: Data-driven Discovery of Dynamical Systems with Large Language Models
LLM-ODE integrates large language models into genetic programming to guide symbolic search for governing equations of dynamical systems, outperforming classical GP on 91 test cases in efficiency and solution quality.
-
In-Context Symbolic Regression for Robustness-Improved Kolmogorov-Arnold Networks
In-context symbolic regression methods improve robustness of symbolic formula recovery from KANs, cutting median OFAT test MSE by up to 99.8 percent across hyperparameter sweeps.
-
Symbolic recovery of PDEs from measurement data
Symbolic rational-function networks recover an admissible PDE from noiseless complete measurements and select the regularization-minimizing parameterization within the architecture.
-
AlphaEvolve: A coding agent for scientific and algorithmic discovery
AlphaEvolve is an LLM-orchestrated evolutionary coding agent that discovered a 4x4 complex matrix multiplication algorithm using 48 scalar multiplications, the first improvement over Strassen's algorithm in 56 years, plus optimizations for Google data centers and hardware.
-
Primordial Black Hole from Tensor-induced Density Fluctuation: First-order Phase Transitions and Domain Walls
Tensor perturbations from first-order phase transitions and domain wall annihilation induce curvature fluctuations at second order that form primordial black holes, allowing asteroid-mass PBHs to comprise all dark matter for specific parameter ranges, with associated gravitational wave peaks in the LISA band.
-
FePySR: A Neural Feature Extraction Framework for Efficient and Scalable Symbolic Regression
FePySR uses a neural network to pre-extract valid features before PySR search, recovering more equations than baselines on benchmarks and identifying governing ODEs in 24 of 100 biological cases where PySR finds none.
-
GESR: A Genetic Programming-Based Symbolic Regression Method with Gene Editing
GESR uses two BERT models to intelligently direct mutations and crossovers inside genetic programming, yielding higher efficiency and competitive accuracy on symbolic regression benchmarks.
-
Discovery of Nonlinear Dynamics with Automated Basis Function Generation
AutoSINDy automatically builds a tailored basis library from PySR symbolic regression and applies SINDy to recover ground-truth nonlinear dynamics with 92.8% success under noise.
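AutoSINDy's basis-generation pipeline is specific to that paper, but the SINDy half of the idea is the well-known sequential thresholded least squares step. As a rough illustration (with a hand-written polynomial library and the toy system dx/dt = -2x, both our own assumptions, not AutoSINDy's setup):

```python
import numpy as np

# Minimal SINDy-style sparse identification via sequential thresholded
# least squares. Library and toy system are illustrative assumptions.

def stlsq(Theta, dXdt, threshold=0.1, iters=10):
    """Zero out small coefficients, then refit the surviving terms."""
    Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for j in range(dXdt.shape[1]):
            big = ~small[:, j]
            if big.any():
                Xi[big, j] = np.linalg.lstsq(Theta[:, big], dXdt[:, j], rcond=None)[0]
    return Xi

# Synthetic trajectory of dx/dt = -2x, i.e. x(t) = exp(-2t).
t = np.linspace(0.0, 2.0, 200)
x = np.exp(-2.0 * t)[:, None]
dxdt = -2.0 * x                                        # exact derivative, for the sketch
Theta = np.hstack([np.ones_like(x), x, x**2, x**3])    # library [1, x, x^2, x^3]
Xi = stlsq(Theta, dxdt)
print("recovered coefficients:", Xi.ravel())           # ~ [0, -2, 0, 0]
```

Running this recovers the single active term -2x and zeroes the rest of the library; AutoSINDy's contribution is to let PySR propose the library columns instead of hand-picking them as done here.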
-
Discovering Ordinary Differential Equations with LLM-Based Qualitative and Quantitative Evaluation
DoLQ employs a sampler agent, parameter optimizer, and LLM-based scientist agent to iteratively propose, refine, and evaluate ODE candidates, yielding higher success rates and better symbolic term recovery than prior symbolic regression methods on multi-dimensional benchmarks.
-
Programmatic Context Augmentation for LLM-based Symbolic Regression
Programmatic context augmentation lets LLM-based symbolic regression perform code-driven data analysis during search, yielding superior efficiency and accuracy over baselines on LLM-SRBench.
-
Interpretable Analytic Formulae for GWTC-4 Binary Black Hole Population Properties via Symbolic Regression
Symbolic regression on GWTC-4 posteriors yields closed-form analytic formulae for merger-rate evolution, effective-spin dependencies on mass ratio and redshift, and conditional mass-ratio distributions at specific primary mass peaks.
-
Physics-Informed Neural Networks for Biological $2\mathrm{D}{+}t$ Reaction-Diffusion Systems
BINNs are extended to 2D+t systems and combined with symbolic regression to recover reaction-diffusion models of lung cancer cell dynamics from time-lapse microscopy data.
-
Machine Learning Hamiltonian Dynamical Systems with Sparse and Noisy Data
ASRNNs recover Hamiltonian dynamics and symbolic equations from trajectories with only two irregularly spaced noisy points by preserving symplectic structure without derivative estimation.
-
Discovering quantum phenomena with Interpretable Machine Learning
Variational autoencoders combined with symbolic regression extract physically meaningful representations and order parameters from raw quantum measurement data, revealing new phenomena such as corner-ordering in Rydberg arrays.
-
Into the Gompverse: A robust Gompertzian reionization model for CMB analyses
A Gompertzian reionization model with three nuisance parameters demotes optical depth to a derived quantity, reducing its uncertainty by a factor of three and revealing potential neutrino mass tension in CMB analyses.
-
Model-independent constraints on generalized FLRW consistency relations with bootstrap-based symbolic regression
Bootstrap-based symbolic regression on supernova and BAO data finds mild 2-4 sigma deviations from FLRW consistency relations, which if real would rule out most FLRW-based solutions to cosmological tensions.
-
Generating Literature-Driven Scientific Theories at Scale
Literature-grounded LLM synthesis from 13.7k papers yields 2.9k theories that match evidence and predict future results from 4.6k subsequent papers better than parametric baselines.
-
Discovery of Interpretable Surrogates via Agentic AI: Application to Gravitational Waves
The GWAgent agentic workflow produces analytic surrogates for eccentric BBH waveforms with a median mismatch of 6.9e-4 and an 8.4x speedup, outperforming baselines, and infers the eccentricity of GW200129.
-
Balance-Guided Sparse Identification of Multiscale Nonlinear PDEs with Small-coefficient Terms
BG-SINDy reformulates l0-constrained regression as term-level l2,0 regularization and uses progressive pruning guided by balance contributions to recover small-coefficient terms in multiscale PDEs.
-
Singularity Formation: Synergy in Theoretical, Numerical and Machine Learning Approaches
The work introduces a modulation-based analytical method for singularity proofs in singular PDEs and refines ML techniques like PINNs and KANs to identify blowup solutions, with application to the open 3D Keller-Segel problem.
-
Identifying Topological Invariants of Non-Hermitian Systems via Domain-Adaptive Multimodal Model for Mathematics
A multimodal model with a Qwen Math backbone identifies topological invariants of non-Hermitian systems from eigenvalues and eigenvectors in momentum space.
-
Experimental Design for Missing Physics
A sequential experimental design technique discriminates between model structures from symbolic regression to discover missing physics in process systems such as bioreactors.