SEVerA uses Formally Guarded Generative Models and a three-stage Search-Verification-Learning process to synthesize self-evolving agents that satisfy hard formal constraints while improving task performance.
Mixed citations
Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl
Mixed citation behavior. Most common role is background (40%).
Abstract
PySR is an open-source library for practical symbolic regression, a type of machine learning which aims to discover human-interpretable symbolic models. PySR was developed to democratize and popularize symbolic regression for the sciences, and is built on a high-performance distributed back-end, a flexible search algorithm, and interfaces with several deep learning packages. PySR's internal search algorithm is a multi-population evolutionary algorithm, which consists of a unique evolve-simplify-optimize loop, designed for optimization of unknown scalar constants in newly-discovered empirical expressions. PySR's backend is the extremely optimized Julia library SymbolicRegression.jl, which can be used directly from Julia. It is capable of fusing user-defined operators into SIMD kernels at runtime, performing automatic differentiation, and distributing populations of expressions to thousands of cores across a cluster. In describing this software, we also introduce a new benchmark, "EmpiricalBench," to quantify the applicability of symbolic regression algorithms in science. This benchmark measures recovery of historical empirical equations from original and synthetic datasets.
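The evolve-simplify-optimize loop described in the abstract can be illustrated with a toy, pure-Python sketch. It is not PySR's or SymbolicRegression.jl's implementation or API. The tuple-based expression format, the operator set, the size cap, the hill-climbing constant optimizer (a simple stand-in for a gradient-based method), the ring migration between islands, and all names and parameters (`search`, `optimize`, `MAX_SIZE`, and so on) are hypothetical choices made for illustration.

```python
import random

# Hypothetical toy expression representation (not PySR's internal format):
#   ("x",)            the input variable
#   ("c", value)      a scalar constant
#   (op, left, right) a binary operator from OPS
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
MAX_SIZE = 15  # hypothetical cap: reject expressions with more nodes than this

def evaluate(tree, x):
    if tree[0] in ("x", "c"):
        return x if tree[0] == "x" else tree[1]
    return OPS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))

def size(tree):
    return 1 if tree[0] in ("x", "c") else 1 + size(tree[1]) + size(tree[2])

def mse(tree, xs, ys):
    residuals = (evaluate(tree, x) - y for x, y in zip(xs, ys))
    loss = sum(d * d for d in residuals) / len(xs)
    return loss if loss == loss else float("inf")  # treat NaN as infinitely bad

def random_tree(rng, depth=2):
    if depth == 0 or rng.random() < 0.3:
        return ("x",) if rng.random() < 0.5 else ("c", rng.uniform(-2.0, 2.0))
    return (rng.choice(list(OPS)), random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def mutate(tree, rng):
    """Evolve: replace a randomly chosen subtree with a fresh random one."""
    if tree[0] in ("x", "c") or rng.random() < 0.3:
        return random_tree(rng)
    if rng.random() < 0.5:
        return (tree[0], mutate(tree[1], rng), tree[2])
    return (tree[0], tree[1], mutate(tree[2], rng))

def simplify(tree):
    """Simplify: fold constant subexpressions and drop x + 0 and x * 1."""
    if tree[0] in ("x", "c"):
        return tree
    op, left, right = tree[0], simplify(tree[1]), simplify(tree[2])
    if left[0] == "c" and right[0] == "c":
        return ("c", OPS[op](left[1], right[1]))
    if (op == "+" and right == ("c", 0.0)) or (op == "*" and right == ("c", 1.0)):
        return left
    return (op, left, right)

def constants(tree):
    if tree[0] == "c":
        return [tree[1]]
    return [] if tree[0] == "x" else constants(tree[1]) + constants(tree[2])

def with_constants(tree, values):
    """Rebuild the same structure, filling constants left to right from values."""
    it = iter(values)
    def rebuild(t):
        if t[0] == "c":
            return ("c", next(it))
        return t if t[0] == "x" else (t[0], rebuild(t[1]), rebuild(t[2]))
    return rebuild(tree)

def optimize(tree, xs, ys, steps=10, scale=0.5):
    """Optimize: hill-climb each constant (a stand-in for e.g. BFGS)."""
    values, best, best_loss = constants(tree), tree, mse(tree, xs, ys)
    for _ in range(steps):
        for i in range(len(values)):
            for delta in (scale, -scale):
                trial = values.copy()
                trial[i] += delta
                candidate = with_constants(tree, trial)
                loss = mse(candidate, xs, ys)
                if loss < best_loss:
                    values, best, best_loss = trial, candidate, loss
        scale *= 0.8  # shrink the step size each sweep
    return best, best_loss

def search(xs, ys, n_islands=3, pop_size=10, generations=5, seed=0):
    """Islands run evolve -> simplify -> optimize, then migrate best members."""
    rng = random.Random(seed)
    islands = [[(mse(t, xs, ys), t) for t in (random_tree(rng) for _ in range(pop_size))]
               for _ in range(n_islands)]
    for _ in range(generations):
        for pop in islands:
            for _ in range(pop_size):
                parent = min(rng.sample(pop, 3), key=lambda p: p[0])[1]  # tournament
                child = simplify(mutate(parent, rng))
                if size(child) > MAX_SIZE:
                    continue
                child, loss = optimize(child, xs, ys)
                worst = max(range(pop_size), key=lambda j: pop[j][0])
                if loss < pop[worst][0]:
                    pop[worst] = (loss, child)
        bests = [min(pop, key=lambda p: p[0]) for pop in islands]
        for i, pop in enumerate(islands):  # ring migration: island i gets island i-1's best
            worst = max(range(pop_size), key=lambda j: pop[j][0])
            if bests[i - 1][0] < pop[worst][0]:
                pop[worst] = bests[i - 1]
    return min((p for pop in islands for p in pop), key=lambda p: p[0])
```

Calling `search(xs, ys)` on lists of inputs and targets returns a `(loss, tree)` pair for the lowest-error expression found. Simplifying before optimizing means the optimizer only tunes constants that survive folding, which keeps the constant search small; a production system would also weigh expression complexity against accuracy rather than minimizing error alone.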
Representative citing papers
The SDE benchmark shows LLMs lag on scientific discovery tasks relative to general science tests, with diminishing scaling returns and shared weaknesses across models.
KANs with learnable univariate spline activations on edges achieve better accuracy than MLPs while using fewer parameters, show faster scaling, and can be visualized directly for scientific discovery.
Long-range exchange frustration in atomistic spin-lattice models can double skyrmion collapse barriers while keeping micromagnetic parameters fixed, revealing a limitation of continuum approximations.
Hybrid simulation and non-Euclidean elasticity theory demonstrate that clathrin coats develop adaptive rigidity and memory during growth, producing flat, stalled, or closed outcomes through two energy-landscape gates and matching experiments without fitted parameters.
Proves that the Rademacher complexity of depth-d compositional trees over a finite operator vocabulary is controlled by (K b L)^d / √n under Lipschitz conditions on the operators.
Equilibrium pressure anisotropy modifies the tearing-mode growth-rate prefactor through parameters A and R0 while retaining the S^{-1/2} Lundquist scaling in gyrotropic MHD.
Four parameters suffice to describe dust attenuation curve diversity in TNG simulations, yielding a new symbolic-regression model that recovers curves and fluxes better than existing parameterizations while linking parameters to SFR surface density, metallicity, and geometry.
A convexity-preserving grammar enables symbolic regression to discover thermodynamically admissible dissipation potentials for generalized standard materials from noisy data.
LEE performs iterative amortized inference in a functionally grounded latent space to produce 2-10x simpler symbolic expressions than strong baselines on SRBench.
The Neural Compiler converts symbolic programs into exact differentiable PyTorch modules for hybrid scientific machine learning, enabling precise encoding of known physics with few trainable parameters.
Symbolic regression produces an approximate classifier for LHC exclusion limits that enables their direct inclusion during pMSSM global fits.
A graph-based automated model discovery framework identifies new concise soil hydraulic functions from data that outperform the Mualem-van Genuchten model across 249 soil samples.
DRSR uses Quality-Diversity to produce diverse symbolic regression expressions differing in residual distributions, enabling post-search selection on synthetic and astronomical data.
A two-stage symbolic regression plus generative model framework recovers governing interaction terms and forcing in stochastic triad models while accurately predicting statistical moments up to order five.
A knowledge-first approach to LLM-driven automatic heuristic design in combinatorial optimization yields better discovery efficiency, transfer, and generalization than code-centric baselines by formalizing a distortion-compression trade-off.
Transformers reconstruct the constituent RCFTs in tensor-product theories from low-energy spectra, reaching 98% accuracy on WZW models and generalizing to larger central charges with few out-of-domain examples.
A derivative algebra with EML and SOL primitives plus additive atomic forests enables simultaneous symbolic recovery of functions and antiderivatives from data, matching or exceeding XGBoost on 13 of 17 benchmarks with interpretable formulas.
Machine collective intelligence uses coordinated AI agents to evolve symbolic hypotheses and recover governing equations from observations in deterministic, stochastic, and uncharacterized systems, achieving up to six orders of magnitude better extrapolation than neural networks with 5-40 parameters.
Latent Grammar Flow discovers ODEs by placing grammar-based equation representations in a discrete latent space, using a behavioral loss to cluster similar equations, and sampling via a discrete flow model guided by data fit and constraints.
First direct constraints on total cosmic backreaction over a significant redshift range are consistent with vanishing backreaction within 1 sigma but are too weak to exclude meaningful backreaction.
LLM-ODE integrates large language models into genetic programming to guide symbolic search for governing equations of dynamical systems, outperforming classical GP on 91 test cases in efficiency and solution quality.
In-context symbolic regression methods improve robustness of symbolic formula recovery from KANs, cutting median OFAT test MSE by up to 99.8 percent across hyperparameter sweeps.
Symbolic rational-function networks recover an admissible PDE from noiseless complete measurements and select the regularization-minimizing parameterization within the architecture.
Citing papers
- On the definition and importance of interpretability in scientific machine learning
Interpretability in SciML requires mechanistic understanding rather than sparsity, and prior knowledge is often essential for interpretable scientific discovery.
- LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers
LLM-FE is a framework that treats feature engineering as LLM-driven program search with data feedback, reporting consistent gains over baselines on classification and regression tabular tasks.
- SINDyG: Sparse Identification of Nonlinear Dynamical Systems from Graph-Structured Data, with Applications to Stuart-Landau Oscillator Networks
SINDyG extends SINDy by adding a graph-informed penalty to sparse regression, yielding more accurate and simpler models of network dynamics on Stuart-Landau oscillator networks than standard SINDy.
- Evolutional Math: Cross-Validated Island-Model Genetic Programming for Interpretable Symbolic Regression on Small, Wide Datasets
Evolutional Math combines cross-validated R-squared fitness, island-model GP with operator-subset islands, structural deduplication, and L-BFGS-B constant refinement to recover compact ground-truth expressions with R² >= 0.99 on synthetic benchmarks and a 24-row clinical dataset.
- GP-GOMEA with GPU-Based Fitness Evaluations: Design and Performance Analysis
GPU fitness evaluation for GP-GOMEA boosts throughput, improves benchmark results especially on large datasets, and allows reliable regression of large Feynman equations within hours.
- Guiding Multi-Objective Genetic Programming with Description Length Improves Symbolic Regression Solutions
Post-selection with DL or FBF after multi-objective GP search improves test-set performance over AIC/BIC baselines on noisy synthetic and real regression tasks, while using DL directly as fitness often causes premature convergence to overly simple models.
- Prediction Is Not Physics: Learning and Evaluating Conserved Quantities in Neural Simulators
Conservation Discovery Networks recover analytical energy with R² ≥ 0.996 in Hamiltonian systems using temporal consistency and λ_align=0.2, but collapse without alignment and show mixed noise robustness.
- Discovering interpretable low-dimensional dynamics using maximum entropy
Edwin integrates dynamic maximum entropy dimensionality reduction with symbolic regression to recover physically interpretable low-dimensional dynamics from high-dimensional observations that generalize to unseen conditions.
- Discovery of Interpretable Surrogates via Agentic AI: Application to Gravitational Waves
GWAgent agentic workflow produces analytic surrogates for eccentric BBH waveforms with 6.9e-4 median mismatch and 8.4x speedup, outperforming baselines, and infers eccentricity for GW200129.
- Balance-Guided Sparse Identification of Multiscale Nonlinear PDEs with Small-coefficient Terms
BG-SINDy reformulates l0-constrained regression as term-level l2,0 regularization and uses progressive pruning guided by balance contributions to recover small-coefficient terms in multiscale PDEs.
- Singularity Formation: Synergy in Theoretical, Numerical and Machine Learning Approaches
The work introduces a modulation-based analytical method for singularity proofs in singular PDEs and refines ML techniques like PINNs and KANs to identify blowup solutions, with application to the open 3D Keller-Segel problem.
- What is the diatomic molecule with the largest dipole moment?
A machine learning model based on atomic properties predicts diatomic dipole moments, screens the periodic table for the largest values, and condenses into an analytical expression.
- In Context Learning and Reasoning for Symbolic Regression with Large Language Models
GPT-4 models rediscover Langmuir isotherms and produce fits on Nikuradse pipe-flow data via iterative chain-of-thought prompting with scientific context and external code feedback.
- Experimental Design for Missing Physics
A sequential experimental design technique discriminates between model structures from symbolic regression to discover missing physics in process systems such as bioreactors.
- Proper time expansions and glasma dynamics
The authors test methods that extend the reliable reach of proper time expansions for glasma dynamics from roughly 0.05 fm/c to about 0.08 fm/c.
- PyCC.id: A package for hypothesis-driven equation discovery with structural identifiability
PyCC.id packages a hypothesis-driven method using identifiable ODE skeletons for equation discovery from data, supporting multiple paradigms like neural networks and sparse regression.
- A Practitioner's Guide to Kolmogorov-Arnold Networks
A systematic review of Kolmogorov-Arnold Networks that maps their relation to Kolmogorov superposition theory, MLPs, and kernels, examines basis-function design choices, summarizes performance advances, and supplies a practitioner's selection guide plus open challenges.
- Interpreting "Interpretability" and Explaining "Explainability" in Machine Learning in Physics
The paper defines interpretability as model structural transparency and explainability as scientific content mapping, discusses their trade-offs, and frames both as deliberate modeling choices for ML in physics.
- Introduction to Symbolic Regression in the Physical Sciences
Symbolic regression provides an interpretable way to extract mathematical relationships from data for scientific discovery and surrogate modeling in the physical sciences.
- Identifying Topological Invariants of Non-Hermitian Systems via Domain-Adaptive Multimodal Model for Mathematics