SEVerA uses Formally Guarded Generative Models and a three-stage Search-Verification-Learning process to synthesize self-evolving agents that satisfy hard formal constraints while improving task performance.
Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl
Mixed citation behavior. Most common role is background (43%).
abstract
PySR is an open-source library for practical symbolic regression, a type of machine learning which aims to discover human-interpretable symbolic models. PySR was developed to democratize and popularize symbolic regression for the sciences, and is built on a high-performance distributed back-end, a flexible search algorithm, and interfaces with several deep learning packages. PySR's internal search algorithm is a multi-population evolutionary algorithm, which consists of a unique evolve-simplify-optimize loop, designed for optimization of unknown scalar constants in newly-discovered empirical expressions. PySR's backend is the extremely optimized Julia library SymbolicRegression.jl, which can be used directly from Julia. It is capable of fusing user-defined operators into SIMD kernels at runtime, performing automatic differentiation, and distributing populations of expressions to thousands of cores across a cluster. In describing this software, we also introduce a new benchmark, "EmpiricalBench," to quantify the applicability of symbolic regression algorithms in science. This benchmark measures recovery of historical empirical equations from original and synthetic datasets.
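The evolve-simplify-optimize loop and the multi-population search mentioned in the abstract can be illustrated with a small toy sketch. This is not PySR's implementation or API: the tuple encoding of expressions, every function name (`mutate`, `simplify`, `optimize`, `search`), the operator set, the random hill climbing used for constant fitting, and the migration rule are all assumptions made for illustration, using only the Python standard library.

```python
import random

# Toy expression trees as nested tuples (hypothetical encoding, not PySR's):
#   ("x",)  ("c", value)  ("+", left, right)  ("*", left, right)
LEAVES = ("x", "c")

def evaluate(expr, x):
    if expr[0] == "x":
        return x
    if expr[0] == "c":
        return expr[1]
    a, b = evaluate(expr[1], x), evaluate(expr[2], x)
    return a + b if expr[0] == "+" else a * b

def loss(expr, data):
    """Mean squared error of expr over (x, y) pairs."""
    return sum((evaluate(expr, x) - y) ** 2 for x, y in data) / len(data)

def size(expr):
    return 1 if expr[0] in LEAVES else 1 + size(expr[1]) + size(expr[2])

def random_leaf(rng):
    return ("x",) if rng.random() < 0.5 else ("c", rng.uniform(-2.0, 2.0))

def mutate(expr, rng):
    """Evolve: swap in a fresh leaf, wrap in a new operator, or recurse into a child."""
    if expr[0] in LEAVES or rng.random() < 0.3:
        if rng.random() < 0.5:
            return random_leaf(rng)
        return (rng.choice("+*"), expr, random_leaf(rng))
    parts = list(expr)
    i = rng.choice((1, 2))
    parts[i] = mutate(parts[i], rng)
    return tuple(parts)

def simplify(expr):
    """Simplify: fold any operator whose two children are both constants."""
    if expr[0] in LEAVES:
        return expr
    a, b = simplify(expr[1]), simplify(expr[2])
    if a[0] == "c" and b[0] == "c":
        return ("c", evaluate((expr[0], a, b), 0.0))
    return (expr[0], a, b)

def constants(expr):
    if expr[0] == "c":
        return [expr[1]]
    if expr[0] == "x":
        return []
    return constants(expr[1]) + constants(expr[2])

def with_constants(expr, values):
    """Rebuild expr, taking new constants left to right from the iterator `values`."""
    if expr[0] == "c":
        return ("c", next(values))
    if expr[0] == "x":
        return expr
    left = with_constants(expr[1], values)
    right = with_constants(expr[2], values)
    return (expr[0], left, right)

def optimize(expr, data, rng, steps=30):
    """Optimize: hill-climb the constants (a stand-in for a real numerical optimizer)."""
    best, best_loss = expr, loss(expr, data)
    if not constants(expr):
        return best
    for _ in range(steps):
        trial = [c + rng.gauss(0.0, 0.3) for c in constants(best)]
        cand = with_constants(best, iter(trial))
        cand_loss = loss(cand, data)
        if cand_loss < best_loss:
            best, best_loss = cand, cand_loss
    return best

def search(data, n_pops=3, pop_size=20, generations=40, max_size=15, seed=0):
    rng = random.Random(seed)
    pops = [[random_leaf(rng) for _ in range(pop_size)] for _ in range(n_pops)]
    for _ in range(generations):
        for pop in pops:
            # Tournament selection, then evolve -> simplify -> optimize.
            parent = min(rng.sample(pop, 4), key=lambda e: loss(e, data))
            child = optimize(simplify(mutate(parent, rng)), data, rng)
            if size(child) <= max_size:
                worst = max(range(pop_size), key=lambda i: loss(pop[i], data))
                pop[worst] = child
        # Migration: copy the best expression overall into every population.
        best = min((e for p in pops for e in p), key=lambda e: loss(e, data))
        for pop in pops:
            pop[rng.randrange(pop_size)] = best
    return min((e for p in pops for e in p), key=lambda e: loss(e, data))

# Synthetic target y = 2x + 1 on a small grid.
data = [(x / 5.0, 2.0 * (x / 5.0) + 1.0) for x in range(-10, 11)]
best = search(data)
print(best, loss(best, data))
```

The size cap (`max_size`) plays the role of a crude complexity limit, and the fixed seed makes the run repeatable. A production library would replace the hill climbing with a proper numerical optimizer and would run the populations in parallel.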
representative citing papers
The SDE benchmark shows LLMs lag on scientific discovery tasks relative to general science tests, with diminishing scaling returns and shared weaknesses across models.
KANs with learnable univariate spline activations on edges achieve better accuracy than MLPs with fewer parameters, faster scaling, and direct visualization for scientific discovery.
The Neural Compiler converts symbolic programs into exact differentiable PyTorch modules for hybrid scientific machine learning, enabling precise encoding of known physics with few trainable parameters.
Symbolic regression produces an approximate classifier for LHC exclusion limits that enables their direct inclusion during pMSSM global fits.
A graph-based automated model discovery framework identifies new concise soil hydraulic functions from data that outperform the Mualem-van Genuchten model across 249 soil samples.
DRSR uses Quality-Diversity to produce diverse symbolic regression expressions differing in residual distributions, enabling post-search selection on synthetic and astronomical data.
A two-stage symbolic regression plus generative model framework recovers governing interaction terms and forcing in stochastic triad models while accurately predicting statistical moments up to order five.
A knowledge-first approach to LLM-driven automatic heuristic design in combinatorial optimization yields better discovery efficiency, transfer, and generalization than code-centric baselines by formalizing a distortion-compression trade-off.
Transformers reconstruct the constituent RCFTs in tensor-product theories from low-energy spectra, reaching 98% accuracy on WZW models and generalizing to larger central charges with few out-of-domain examples.
A derivative algebra with EML and SOL primitives plus additive atomic forests enables simultaneous symbolic recovery of functions and antiderivatives from data, matching or exceeding XGBoost on 13 of 17 benchmarks with interpretable formulas.
Machine collective intelligence uses coordinated AI agents to evolve symbolic hypotheses and recover governing equations from observations in deterministic, stochastic, and uncharacterized systems, achieving up to six orders of magnitude better extrapolation than neural networks with 5-40 parameters.
Latent Grammar Flow discovers ODEs by placing grammar-based equation representations in a discrete latent space, using a behavioral loss to cluster similar equations, and sampling via a discrete flow model guided by data fit and constraints.
First direct constraints on total cosmic backreaction over a significant redshift range are consistent with vanishing backreaction within 1 sigma but are too weak to exclude meaningful backreaction.
LLM-ODE integrates large language models into genetic programming to guide symbolic search for governing equations of dynamical systems, outperforming classical GP on 91 test cases in efficiency and solution quality.
In-context symbolic regression methods improve robustness of symbolic formula recovery from KANs, cutting median OFAT test MSE by up to 99.8 percent across hyperparameter sweeps.
Symbolic rational-function networks recover an admissible PDE from noiseless complete measurements and select the regularization-minimizing parameterization within the architecture.
KA-CRNNs learn pressure-dependent and collider-specific kinetic rate laws from data using Kolmogorov-Arnold activations inside a CRNN framework, outperforming interpolative methods by 2.88x in MSE on two proof-of-concept reactions.
Symbolic regression yields an emulator for the radial Fourier transform of the Sérsic profile that enables 2.5 times faster galaxy profile fitting with minimal accuracy loss.
AlphaEvolve is an LLM-orchestrated evolutionary coding agent that discovered a 4x4 complex matrix multiplication algorithm using 48 scalar multiplications, the first improvement over Strassen's algorithm in 56 years, plus optimizations for Google data centers and hardware.
Symbolic regression on mobility data recovers gravity and distance-decay models while identifying a new exponential-power-law form linked to maximum entropy.
SDE recovers closed-form PMFs for discrete distributions via evolutionary search guided by domain priors, recovering all benchmark families with accurate parameters and improving mixture fits on real data.
STRIDE is a self-reflective agent framework that improves accuracy, OOD robustness, and structural recovery in LLM-based symbolic regression by integrating generation, evaluation, repair, and diversity-preserving memory.
Tensor perturbations from first-order phase transitions and domain wall annihilation induce curvature fluctuations at second order that form primordial black holes, allowing asteroid-mass PBHs to comprise all dark matter for specific parameter ranges with associated gravitational wave peaks in LISA,
citing papers explorer
- Reconstructing conformal field theoretical compositions with Transformers
  Transformers reconstruct the constituent RCFTs in tensor-product theories from low-energy spectra, reaching 98% accuracy on WZW models and generalizing to larger central charges with few out-of-domain examples.
- Interpretable Analytic Formulae for GWTC-4 Binary Black Hole Population Properties via Symbolic Regression
  Symbolic regression on GWTC-4 posteriors yields closed-form analytic formulae for merger-rate evolution, effective-spin dependencies on mass ratio and redshift, and conditional mass-ratio distributions at specific primary mass peaks.
- Into the Gompverse: A robust Gompertzian reionization model for CMB analyses
  A Gompertzian reionization model with three nuisance parameters demotes optical depth to a derived quantity, reducing its uncertainty by a factor of three and revealing potential neutrino mass tension in CMB analyses.
- Model-independent constraints on generalized FLRW consistency relations with bootstrap-based symbolic regression
  Bootstrap-based symbolic regression on supernova and BAO data finds mild 2-4 sigma deviations from FLRW consistency relations, which if real would rule out most FLRW-based solutions to cosmological tensions.
- Discovery of Interpretable Surrogates via Agentic AI: Application to Gravitational Waves
  GWAgent agentic workflow produces analytic surrogates for eccentric BBH waveforms with 6.9e-4 median mismatch and 8.4x speedup, outperforming baselines, and infers eccentricity for GW200129.
- Balance-Guided Sparse Identification of Multiscale Nonlinear PDEs with Small-coefficient Terms
  BG-SINDy reformulates l0-constrained regression as term-level l2,0 regularization and uses progressive pruning guided by balance contributions to recover small-coefficient terms in multiscale PDEs.