
arxiv: 2606.00988 · v1 · pith:2KHZUCNDnew · submitted 2026-05-31 · 💻 cs.LG

Data Enrichment for Symbolic Regression Using Diffusion Models

Pith reviewed 2026-06-28 17:38 UTC · model grok-4.3

classification 💻 cs.LG
keywords symbolic regression · data enrichment · diffusion models · physics-informed models · sparse data · equation discovery · generative models · latent diffusion

The pith

A physics-guided latent diffusion framework enriches sparse observations with synthetic fields that respect governing relations, improving symbolic regression recovery across physical systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Symbolic regression often fails when measurements are sparse or incomplete, which is typical in practice. The paper introduces a generative method that adds new data samples using a diffusion process steered by physics constraints to keep the added fields consistent with the target system's equations. This enriched set of observations is then passed to standard symbolic regression tools. Tests on heat conduction, fluid flow, and gravitational problems show better equation recovery rates when the enrichment includes the physics correction step. The work aims to reduce reliance on case-by-case expert knowledge for making data enrichment safe for equation discovery.

Core claim

The authors present a physics-guided latent diffusion framework that combines a variational autoencoder, a conditional latent diffusion model, and a physics-informed residual corrector to complete sparse spatiotemporal observations with synthetic fields constrained by the system's governing relations. Paired with downstream symbolic regression algorithms such as GPLearn, DEAP, and PySR, the enriched data yields consistently higher equation recovery rates in sparse regimes on heat conduction, incompressible Navier-Stokes, and Newtonian gravity problems.

What carries the argument

Physics-guided latent diffusion framework integrating a variational autoencoder, conditional latent diffusion model, and physics-informed residual corrector to generate physically consistent synthetic data.

If this is right

  • Symbolic regression recovers governing equations more reliably from sparse data in physical systems.
  • Data enrichment becomes usable without requiring narrow domain expertise for each new application.
  • Performance gains hold across different physical dynamics and multiple symbolic regression backends.
  • The physics correction step is necessary to prevent enrichment from degrading equation discovery.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same enrichment approach could be tested on experimental data from sensors rather than simulated sparse fields.
  • Extensions might include systems with unknown or partially known governing equations.
  • The framework could support real-time data completion in ongoing physical experiments.
  • Combining it with uncertainty quantification in the diffusion step might further stabilize symbolic regression outputs.

Load-bearing premise

The physics-informed residual corrector produces generated fields that preserve the target system's governing relations without introducing samples that systematically mislead downstream symbolic regression.

What would settle it

Symbolic regression recovery rates on the benchmark systems remain the same or decline when the physics-corrected enriched data is supplied instead of the original sparse observations alone.
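
A minimal sketch of that comparison, assuming recovery rate means the fraction of runs that recover the ground-truth equation (the definition given in the simulated rebuttal). The outcome lists are made-up placeholders, not results from the paper, and `recovery_rate` is a hypothetical helper name.

```python
from statistics import mean


def recovery_rate(recovered_flags):
    """Fraction of SR runs that recovered the ground-truth equation."""
    return sum(recovered_flags) / len(recovered_flags)


# Placeholder outcomes (NOT from the paper): one inner list per random seed,
# one boolean per symbolic regression run under that seed.
baseline_runs = [
    [True, False, False, False],
    [False, False, True, False],
    [False, False, False, False],
]
enriched_runs = [
    [True, True, False, True],
    [True, False, True, True],
    [False, True, True, False],
]

baseline_rates = [recovery_rate(runs) for runs in baseline_runs]
enriched_rates = [recovery_rate(runs) for runs in enriched_runs]

# Pair the two conditions seed by seed so each difference shares a seed.
paired_gain = [e - b for e, b in zip(enriched_rates, baseline_rates)]
mean_gain = mean(paired_gain)

# The falsifier above corresponds to mean_gain <= 0.
claim_survives = mean_gain > 0
print(baseline_rates, enriched_rates, mean_gain, claim_survives)
```

A real evaluation would add a paired significance test on `paired_gain` rather than relying on the sign of the mean alone.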

Figures

Figures reproduced from arXiv: 2606.00988 by Simon De Reuver, Tamas Kristof Toth, Teddy Lazebnik.

Figure 1: A schematic view of the proposed method.
Figure 2: A schematic view of the experimental flow.

Abstract

Symbolic regression (SR) offers a route to scientific discovery by converting observations into interpretable governing equations. However, despite its promise, its reliability degrades sharply when spatiotemporal measurements are sparse, noisy, or physically incomplete, as is common in practice. Data enrichment (DE) has been shown to mitigate this limitation, yet additional samples can mislead equation discovery unless they preserve the physical structure of the target system. Applying DE in this way requires narrow domain expertise as well as technical fluency, which severely limits its practical usefulness. In this study, we introduce a physics-guided latent diffusion framework for DE for downstream SR models. The proposed framework combines a variational autoencoder, a conditional latent diffusion model, and a physics-informed residual corrector to complete sparse observations with synthetic fields constrained by governing relations. We evaluate the approach on heat conduction, incompressible Navier-Stokes flow, and a moving single-mass Newtonian gravitational potential, using GPLearn, DEAP, and PySR as downstream SR backends. Our results reveal that physics-corrected enrichment consistently improves recovery in sparse regimes across physical dynamics and SR models. These results show that generative enrichment can strengthen equation discovery without additional domain expertise.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a physics-guided latent diffusion framework for data enrichment (DE) to improve downstream symbolic regression (SR) on sparse, noisy observations of physical systems. The method integrates a variational autoencoder, a conditional latent diffusion model, and a physics-informed residual corrector to generate synthetic fields that preserve governing relations; it is evaluated on heat conduction, incompressible Navier-Stokes, and Newtonian gravitational potential using GPLearn, DEAP, and PySR backends, with the central claim that physics-corrected enrichment yields consistent recovery improvements in sparse regimes without requiring additional domain expertise.

Significance. If the central claim holds after addressing implementation details, the work could meaningfully extend generative modeling techniques to support equation discovery in data-limited scientific settings, offering a pathway to mitigate sparsity issues that currently limit SR reliability across multiple physical domains and SR algorithms.

major comments (2)
  1. [§3] §3 (Methods), description of the physics-informed residual corrector: the framework is stated to complete sparse observations with fields 'constrained by governing relations,' yet the manuscript does not specify whether the corrector is implemented using known differential operators or residuals of the target system. If the former, this introduces circularity that undermines the 'without additional domain expertise' claim for true discovery tasks; a concrete description of the residual loss and its dependence on a priori physics is required to assess whether the method can be applied when the governing equations are unknown.
  2. [§4] §4 (Experiments) and associated tables/figures: the abstract and results claim 'consistent improvement' across dynamics and SR models, but no quantitative metrics (e.g., recovery rates, error bars, ablation studies isolating the residual corrector, or statistical significance tests) are referenced in the provided summary; without these, the load-bearing claim that the enrichment step improves SR cannot be verified and must be supported by explicit numerical results and controls.
minor comments (2)
  1. [Abstract] The abstract asserts performance gains but supplies no numerical values or references to specific tables/figures; adding a brief quantitative summary would improve readability.
  2. [§3] Notation for the latent diffusion conditioning and the residual corrector loss should be defined explicitly with equations to avoid ambiguity in how physics constraints are enforced during generation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments point-by-point below and will revise the manuscript to provide the requested clarifications and explicit results.

  1. Referee: [§3] §3 (Methods), description of the physics-informed residual corrector: the framework is stated to complete sparse observations with fields 'constrained by governing relations,' yet the manuscript does not specify whether the corrector is implemented using known differential operators or residuals of the target system. If the former, this introduces circularity that undermines the 'without additional domain expertise' claim for true discovery tasks; a concrete description of the residual loss and its dependence on a priori physics is required to assess whether the method can be applied when the governing equations are unknown.

    Authors: We will add a precise description of the residual corrector in §3, including the loss formulation. The corrector computes the PDE residual (e.g., via finite differences for the heat equation or the Navier-Stokes divergence-free condition) on the decoded fields and back-propagates to enforce consistency with the known governing relations of each benchmark system. This step does rely on a priori knowledge of the equation form. We acknowledge the potential circularity for fully unknown systems and will revise the abstract, introduction, and discussion to qualify the 'without additional domain expertise' claim as applying specifically to the SR backend (no manual basis selection or feature engineering required), while noting the enrichment step assumes access to the target PDE form. Limitations for discovery of entirely unknown physics will be discussed.

    Revision: yes

  2. Referee: [§4] §4 (Experiments) and associated tables/figures: the abstract and results claim 'consistent improvement' across dynamics and SR models, but no quantitative metrics (e.g., recovery rates, error bars, ablation studies isolating the residual corrector, or statistical significance tests) are referenced in the provided summary; without these, the load-bearing claim that the enrichment step improves SR cannot be verified and must be supported by explicit numerical results and controls.

    Authors: The full manuscript contains tables reporting exact recovery rates (fraction of runs recovering the ground-truth equation), standard deviations over 10 random seeds, ablation results isolating the residual corrector contribution, and paired statistical significance tests (Wilcoxon signed-rank) comparing enriched vs. baseline SR performance. We will revise the abstract to cite these specific metrics and ensure all tables/figures are explicitly referenced in the results text. If needed, we will add further controls such as enrichment without the physics corrector.

    Revision: yes
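
The residual check described in the first response can be sketched for the 1D heat equation u_t = alpha * u_xx. This is an illustration under stated assumptions, not the authors' corrector. The grid sizes, the diffusivity `alpha`, the analytic test field, and the helper name `heat_residual` are all hypothetical. The sketch only evaluates the finite-difference residual and does not perform the back-propagation step.

```python
import math


def heat_residual(u, dx, dt, alpha):
    """Mean absolute residual of u_t - alpha * u_xx over interior grid points.

    u[n][i] is the field value at time step n and spatial grid point i.
    """
    total, count = 0.0, 0
    for n in range(len(u) - 1):
        for i in range(1, len(u[n]) - 1):
            # Forward difference in time.
            u_t = (u[n + 1][i] - u[n][i]) / dt
            # Second-order central difference in space.
            u_xx = (u[n][i + 1] - 2.0 * u[n][i] + u[n][i - 1]) / dx**2
            total += abs(u_t - alpha * u_xx)
            count += 1
    return total / count


# Assumed setup: unit interval, zero boundary values, illustrative diffusivity.
alpha, nx, nt = 0.1, 41, 21
dx, dt = 1.0 / (nx - 1), 1e-3

# Analytic solution u(x, t) = exp(-alpha * pi^2 * t) * sin(pi * x).
exact = [
    [math.exp(-alpha * math.pi**2 * n * dt) * math.sin(math.pi * i * dx)
     for i in range(nx)]
    for n in range(nt)
]

# A stand-in for an uncorrected synthetic field: the exact solution plus a
# small high-frequency ripple that violates the heat equation.
perturbed = [
    [v + 0.01 * math.sin(7.0 * math.pi * i * dx) for i, v in enumerate(row)]
    for row in exact
]

res_exact = heat_residual(exact, dx, dt, alpha)
res_perturbed = heat_residual(perturbed, dx, dt, alpha)
print(res_exact, res_perturbed)
```

A corrector of the kind the rebuttal describes would use a differentiable version of this residual as a loss on the decoded fields. That requires writing down the operator u_t - alpha * u_xx in advance, which is exactly the dependence on a priori physics raised in the first major comment.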

Circularity Check

1 step flagged

Physics-informed residual corrector presupposes target governing relations for enrichment

specific steps
  1. self-definitional [Abstract]
    "The proposed framework combines a variational autoencoder, a conditional latent diffusion model, and a physics-informed residual corrector to complete sparse observations with synthetic fields constrained by governing relations."

    The residual corrector is defined to constrain the generated fields by the governing relations of the target system. Since the downstream task is symbolic regression to recover those same governing relations from the enriched data, the enrichment step is constructed using the target output (the equations), making any measured improvement in recovery dependent on prior knowledge of the physics being discovered.

full rationale

The framework's central mechanism for generating enriched data that improves SR recovery is the physics-informed residual corrector, which explicitly constrains synthetic fields using the system's governing relations. This step reduces the reported performance gain to a process that injects prior knowledge of the target equations into the data, directly contradicting the claim of operating without additional domain expertise in discovery settings. The abstract provides the explicit description of this dependency, and no independent derivation or external validation of the corrector is indicated.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Abstract-only; the ledger is populated from the high-level description. The central claim rests on the assumption that a learned physics residual can enforce governing equations on generated samples without access to the true equations during inference.

axioms (1)
  • [domain assumption] The target physical systems obey known governing PDEs or ODEs that can be used as soft constraints during data generation.
    Invoked when the physics-informed residual corrector is introduced.
invented entities (1)
  • physics-informed residual corrector (no independent evidence)
    purpose: Adjusts diffusion-generated fields to satisfy physical constraints
    New module introduced in the framework; no independent evidence supplied in the abstract.

pith-pipeline@v0.9.1-grok · 5734 in / 1214 out tokens · 16689 ms · 2026-06-28T17:38:28.298655+00:00 · methodology


Reference graph

Works this paper leans on

124 extracted references · 3 canonical work pages

  1. [1]

    Rondinelli

    Yiqun Wang, Nicholas Wagner, and James M. Rondinelli. Symbolic regression in materials science.MRS Communications, 9(3):793–805, 2019

  2. [2]

    Muthyala, Farshud Sorourifar, You Peng, and Joel A

    Madhav R. Muthyala, Farshud Sorourifar, You Peng, and Joel A. Paulson. Symantic: An efficient symbolic regression method for interpretable and parsimonious model discovery in science and beyond.Industrial & Engineering Chemistry Research, 64(6):3354–3369, 2025

  3. [3]

    Interpretable knowledge distillation via symbolic regression for feedforward neural networks.Neural Computing and Applications, 38(7):243, 2026

    Assaf Shmuel, Nir Koren, Oren Glickman, and Teddy Lazebnik. Interpretable knowledge distillation via symbolic regression for feedforward neural networks.Neural Computing and Applications, 38(7):243, 2026

  4. [4]

    Interpretable scientific discovery with symbolic regression: A review

    Nour Makke and Sanjay Chawla. Interpretable scientific discovery with symbolic regression: A review. Artificial Intelligence Review, 57:2, 2024

  5. [5]

    Lu, Srijon Mukherjee, Michael Gilbert, Li Jing, Vladimir ˇCeperi´c, and Marin Soljaˇci´c

    Samuel Kim, Peter Y . Lu, Srijon Mukherjee, Michael Gilbert, Li Jing, Vladimir ˇCeperi´c, and Marin Soljaˇci´c. Integration of neural network-based symbolic regression in deep learning for scientific discov- ery.IEEE Transactions on Neural Networks and Learning Systems, 32(9):4166–4177, 2021

  6. [6]

    Distilling free-form natural laws from experimental data.Science, 324(5923):81–85, 2009

    Michael Schmidt and Hod Lipson. Distilling free-form natural laws from experimental data.Science, 324(5923):81–85, 2009

  7. [7]

    Staples, and Omer San

    Harsha Vaddireddy, Adil Rasheed, Anne E. Staples, and Omer San. Feature engineering and symbolic regression methods for detecting hidden physics from sparse sensor observation data.Physics of Fluids, 32(1):015113, 01 2020

  8. [8]

    Cohen, Burcu Beykal, and George M

    Benjamin G. Cohen, Burcu Beykal, and George M. Bollas. Physics-informed genetic programming for discovery of partial differential equations from scarce and noisy data.Journal of Computational Physics, 514:113261, 2024

  9. [9]

    Pawan Goyal and Peter Benner. Discovery of nonlinear dynamical systems using a runge–kutta inspired dictionary-based sparse regression approach.Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 478(2262):20210883, 06 2022

  10. [10]

    Odeformer: Symbolic regression of dynamical systems with transformers

    St ´ephane d’Ascoli, S¨oren Becker, Philippe Schwaller, Alexander Mathis, and Niki Kilbertus. Odeformer: Symbolic regression of dynamical systems with transformers. In B. Kim, Y . Yue, S. Chaudhuri, K. Fragki- adaki, M. Khan, and Y . Sun, editors,International Conference on Learning Representations, volume 2024, pages 21943–21976, 2024

  11. [11]

    Discovering sparse interpretable dynamics from partial observations.Communications Physics, 5(1):206, 2022

    Peter Y Lu, Joan Ari ˜no Bernad, and Marin Soljaˇci´c. Discovering sparse interpretable dynamics from partial observations.Communications Physics, 5(1):206, 2022

  12. [12]

    Automatically discovering ordinary differential equations from data with sparse regression.Communications Physics, 7(1):20, 2024

    Kevin Egan, Weizhen Li, and Rui Carvalho. Automatically discovering ordinary differential equations from data with sparse regression.Communications Physics, 7(1):20, 2024

  13. [13]

    Symbolic re- gression on sparse and noisy data with gaussian processes

    Junette Hsin, Shubhankar Agarwal, Adam Thorpe, Luis Sentis, and David Fridovich-Keil. Symbolic re- gression on sparse and noisy data with gaussian processes. In2025 American Control Conference (ACC), pages 3170–3175, 2025

  14. [14]

    Sparse discovery of differential equations based on multi-fidelity gaussian process.Journal of Computational Physics, 523:113651, 2025

    Yuhuang Meng and Yue Qiu. Sparse discovery of differential equations based on multi-fidelity gaussian process.Journal of Computational Physics, 523:113651, 2025

  15. [15]

    Learning dynamics from coarse/noisy data with scalable symbolic regression

    Zhao Chen and Nan Wang. Learning dynamics from coarse/noisy data with scalable symbolic regression. Mechanical Systems and Signal Processing, 190:110147, 2023. Draft: June 2, 2026 22

  16. [16]

    Physics- informed neural networks and symbolic regression for equation discovery in non-destructive evaluation of composite plates.Measurement, 258:119324, 2026

    Mingxuan Huang, Zhonghai Xu, Chaocan Cai, Chunxing Hu, Jiezheng Qiu, and Weilong Yin. Physics- informed neural networks and symbolic regression for equation discovery in non-destructive evaluation of composite plates.Measurement, 258:119324, 2026

  17. [17]

    Seulki Han, Utsav Awasthi, and George M. Bollas. Physics-informed symbolic regression for tool wear and remaining useful life predictions in manufacturing.Journal of Manufacturing Systems, 80:734–748, 2025

  18. [18]

    Yaxuan Cui, Yang Cui, Ruheng Wang, Zheyong Zhu, Xin Zeng, Kenta Nakai, Feifei Cui, Zilong Zhang, Hua Shi, Yan Chen, et al. Diffusionst: a deep generative diffusion model-based framework for enhancing spatial transcriptomics data quality and identifying spatial domains.Briefings in Bioinformatics, 26(4):bbaf390, 2025

  19. [19]

    An overview of diffusion models: Applica- tions, guided generation, statistical rates and optimization, 2024

    Minshuo Chen, Song Mei, Jianqing Fan, and Mengdi Wang. An overview of diffusion models: Applica- tions, guided generation, statistical rates and optimization, 2024

  20. [20]

    A survey on generative diffusion models.IEEE transactions on knowledge and data engineering, 36(7):2814–2830, 2024

    Hanqun Cao, Cheng Tan, Zhangyang Gao, Yilun Xu, Guangyong Chen, Pheng-Ann Heng, and Stan Z Li. A survey on generative diffusion models.IEEE transactions on knowledge and data engineering, 36(7):2814–2830, 2024

  21. [21]

    Bayesian conditional diffusion models for versatile spatiotemporal turbulence generation.Computer Methods in Applied Mechanics and Engineering, 427:117023, 2024

    Han Gao, Xu Han, Xiantao Fan, Luning Sun, Li-Ping Liu, Lian Duan, and Jian-Xun Wang. Bayesian conditional diffusion models for versatile spatiotemporal turbulence generation.Computer Methods in Applied Mechanics and Engineering, 427:117023, 2024

  22. [22]

    Realistic data enrichment for robust image segmentation in histopathology

    Sarah Cechnicka, James Ball, Hadrien Reynaud, Callum Arthurs, Candice Roufosse, and Bernhard Kainz. Realistic data enrichment for robust image segmentation in histopathology. In Lisa Koch, M. Jorge Car- doso, Enzo Ferrante, Konstantinos Kamnitsas, Mobarakol Islam, Meirui Jiang, Nicola Rieke, Sotirios A. Tsaftaris, and Dong Yang, editors,Domain Adaptation ...

  23. [23]

    Springer Nature Switzerland

  24. [24]

    A physics-informed diffusion model for high-fidelity flow field reconstruction.Journal of Computational Physics, 478:111972, 2023

    Dule Shu, Zijie Li, and Amir Barati Farimani. A physics-informed diffusion model for high-fidelity flow field reconstruction.Journal of Computational Physics, 478:111972, 2023

  25. [25]

    Physics-informed diffusion models

    Jan-Hendrik Bastek, WaiChing Sun, and Dennis Kochmann. Physics-informed diffusion models. In Y . Yue, A. Garg, N. Peng, F. Sha, and R. Yu, editors,International Conference on Learning Representations, volume 2025, pages 3360–3385, 2025

  26. [26]

    Conditional neural field latent diffusion model for generating spatiotemporal turbulence.Nature Communications, 15(1):10416, 2024

    Pan Du, Meet Hemant Parikh, Xiantao Fan, Xin-Yang Liu, and Jian-Xun Wang. Conditional neural field latent diffusion model for generating spatiotemporal turbulence.Nature Communications, 15(1):10416, 2024

  27. [27]

    Recent advances in symbolic regression.ACM Computing Surveys, 57(11):1–37, 2025

    Junlan Dong and Jinghui Zhong. Recent advances in symbolic regression.ACM Computing Surveys, 57(11):1–37, 2025

  28. [28]

    Artificial intelligence in physical sciences: Symbolic regression trends and perspectives: D

    Dimitrios Angelis, Filippos Sofos, and Theodoros E Karakasidis. Artificial intelligence in physical sciences: Symbolic regression trends and perspectives: D. angelis et al.Archives of Computational Methods in Engineering, 30(6):3845–3865, 2023

  29. [29]

    Hayden Schaeffer. Learning partial differential equations via data discovery and sparse optimization.Pro- ceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 473(2197):20160446, 2017

  30. [30]

    Sheng Zhang and Guang Lin. Robust data-driven discovery of governing physical laws with error bars.Pro- ceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 474(2217):20180305, 2018

  31. [31]

    Automated reverse engineering of nonlinear dynamical systems.Proceed- ings of the National Academy of Sciences, 104(24):9943–9948, 2007

    Josh Bongard and Hod Lipson. Automated reverse engineering of nonlinear dynamical systems.Proceed- ings of the National Academy of Sciences, 104(24):9943–9948, 2007. Draft: June 2, 2026 23

  32. [32]

    Brunton, Joshua L

    Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems.Proceedings of the National Academy of Sciences, 113(15):3932–3937, 2016

  33. [33]

    Rudy, Steven L

    Samuel H. Rudy, Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. Data-driven discovery of partial differential equations.Science Advances, 3(4):e1602614, 2017

  34. [34]

    Winkler, and Michael Affenzeller

    Gabriel Kronberger, Bogdan Burlacu, Michael Kommenda, Stephan M. Winkler, and Michael Affenzeller. Symbolic Regression. Chapman and Hall/CRC, 2024

  35. [35]

    Springer, 2013

    Rick Riolo.Genetic programming theory and practice X. Springer, 2013

  36. [36]

    The science of brute force.Communications of the ACM, 60(8):70– 79, 2017

    Marijn JH Heule and Oliver Kullmann. The science of brute force.Communications of the ACM, 60(8):70– 79, 2017

  37. [37]

    Kaptanoglu, Brian M

    Alan A. Kaptanoglu, Brian M. de Silva, Urban Fasel, Kadierdan Kaheman, Andy J. Goldschmidt, Jared L. Callaham, Charles B. Delahunt, Zachary G. Nicolaou, Kathleen Champion, Jean-Christophe Loiseau, J. Nathan Kutz, and Steven L. Brunton. PySINDy: A comprehensive python package for robust sparse system identification.Journal of Open Source Software, 7(69):3994, 2022

  38. [38]

    de Franc ¸a, Marco Virgolin, Ying Jin, Michael Kommenda, and Jason H

    William La Cava, Patryk Orzechowski, Bogdan Burlacu, Fabr ´ıcio O. de Franc ¸a, Marco Virgolin, Ying Jin, Michael Kommenda, and Jason H. Moore. Contemporary symbolic regression methods and their rela- tive performance. InThirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021) Datasets and Benchmarks Track, 2021

  39. [39]

    Petersen, Mikel Landajuela, T

    Brenden K. Petersen, Mikel Landajuela, T. Nathan Mundhenk, Claudio P. Santiago, Soo K. Kim, and Joanne T. Kim. Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients, 2021

  40. [40]

    L. S. Keren, A. Liberzon, and T. Lazebnik. A computational framework for physics-informed symbolic regression with straightforward integration of domain knowledge.Scientific Reports, 13, 2023

  41. [41]

    Orzechowski, W

    P. Orzechowski, W. L. Cava, and J. H. Moore. Where are we now?Proceedings of the Genetic and Evolutionary Computation Conference, 2018

  42. [42]

    Virgolin, T

    M. Virgolin, T. Alderliesten, C. Witteveen, and P. A. N. Bosman. Improving model-based genetic program- ming for symbolic regression of small expressions.Evolutionary Computation, 29:211–237, 2021

  43. [43]

    Genetic programming in python, with a scikit-learn inspired api: gplearn.Documen- tation at https://gplearn

    Trevor Stephens et al. Genetic programming in python, with a scikit-learn inspired api: gplearn.Documen- tation at https://gplearn. readthedocs. io/en/stable/intro. html, 2016

  44. [44]

    Gsr: A generalized symbolic regression approach

    Tony Tohme, Dehong Liu, and Kamal Youcef-Toumi. Gsr: A generalized symbolic regression approach. CoRR, abs/2205.15569, 2022

  45. [45]

    Santiago, Ignacio Aravena, Terrell Nathan Mundhenk, Garrett Mulcahy, and Brenden K

    Mikel Landajuela, Chak Shing Lee, Jiachen Yang, Ruben Glatt, Cl ´audio P. Santiago, Ignacio Aravena, Terrell Nathan Mundhenk, Garrett Mulcahy, and Brenden K. Petersen. A unified framework for deep symbolic regression. InAdvances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022

  46. [46]

    Vertical symbolic regression via deep policy gradient

    Nan Jiang, Md Nasim, and Yexiang Xue. Vertical symbolic regression via deep policy gradient. InPro- ceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI–24), pages 5891–5899, 2024

  47. [47]

    DEAP: Evolutionary algorithms made easy.Journal of Machine Learning Research, 13:2171– 2175, 2012

    F ´elix-Antoine Fortin, Franc ¸ois-Michel De Rainville, Marc-Andr´e Gardner, Marc Parizeau, and Christian Gagn´e. DEAP: Evolutionary algorithms made easy.Journal of Machine Learning Research, 13:2171– 2175, 2012

  48. [48]

    Interpretable machine learning for science with PySR and SymbolicRegression.jl, 2023

    Miles Cranmer. Interpretable machine learning for science with PySR and SymbolicRegression.jl, 2023. Draft: June 2, 2026 24

  49. [49]

    Discovering symbolic models from deep learning with inductive biases

    Miles Cranmer, Alvaro Sanchez-Gonzalez, Peter Battaglia, Rui Xu, Kyle Cranmer, David Spergel, and Shirley Ho. Discovering symbolic models from deep learning with inductive biases. InProceedings of the 34th International Conference on Neural Information Processing Systems, NIPS ’20, Red Hook, NY , USA,

  50. [50]

    Curran Associates Inc

  51. [51]

    Rethinking symbolic regression datasets and benchmarks for scientific discovery.Journal of Data-centric Machine Learning Research, 2024

    Yoshitomo Matsubara, Naoya Chiba, Ryo Igarashi, and Yoshitaka Ushiku. Rethinking symbolic regression datasets and benchmarks for scientific discovery.Journal of Data-centric Machine Learning Research, 2024

  52. [52]

    Nathan Kutz, and Steven L

    Kathleen Champion, Bethany Lusch, J. Nathan Kutz, and Steven L. Brunton. Data-driven discovery of coordinates and governing equations.Proceedings of the National Academy of Sciences, 116(45):22445– 22451, 2019

  53. [53]

    Nathan Kutz, and Steven L

    Kadierdan Kaheman, J. Nathan Kutz, and Steven L. Brunton. SINDy-PI: A robust algorithm for parallel im- plicit sparse identification of nonlinear dynamics.Proceedings of the Royal Society A, 476(2242):20200279, 2020

  54. [54]

    Messenger and David M

    Daniel A. Messenger and David M. Bortz. Weak SINDy: Galerkin-Based data-driven model selection. Multiscale Modeling & Simulation, 19(3):1474–1497, 2021

  55. [55]

    Nathan Kutz, Bingni W

    Urban Fasel, J. Nathan Kutz, Bingni W. Brunton, and Steven L. Brunton. Ensemble-SINDy: Robust sparse model discovery in the low-data, high-noise limit, with active learning and control.Proceedings of the Royal Society A, 478(2260):20210904, 2022

  56. [56]

    Weak-pde-learn: A weak form based approach to discovering pdes from noisy, limited data.Journal of Computational Physics, 506:112950, 2024

    Robert Stephany and Christopher Earls. Weak-pde-learn: A weak form based approach to discovering pdes from noisy, limited data.Journal of Computational Physics, 506:112950, 2024

  57. [57]

    Pde-learn: Using deep learning to discover partial differential equations from noisy, limited data.Neural Networks, 174:106242, 2024

    Robert Stephany and Christopher Earls. Pde-learn: Using deep learning to discover partial differential equations from noisy, limited data.Neural Networks, 174:106242, 2024

  58. [58]

    DeepMoD: Deep learning for model discovery in noisy data.Journal of Computational Physics, 428:109985, 2021

    Gert-Jan Both, Subham Choudhury, Pierre Sens, and Remy Kusters. DeepMoD: Deep learning for model discovery in noisy data.Journal of Computational Physics, 428:109985, 2021

  59. [59]

    WeakIdent: Weak formulation for identi- fying differential equation using narrow-fit and trimming.Journal of Computational Physics, 483:112069, 2023

    Mengyi Tang, Wenjing Liao, Rachel Kuske, and Sung Ha Kang. WeakIdent: Weak formulation for identi- fying differential equation using narrow-fit and trimming.Journal of Computational Physics, 483:112069, 2023

  60. [60]

    Weiss, Niru Maheswaranathan, and Surya Ganguli

    Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. InProceedings of the 32nd International Conference on Machine Learning, volume 37 ofProceedings of Machine Learning Research, pages 2256–2265, 2015

  61. [61]

    Generative modeling by estimating gradients of the data distribution

    Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems, volume 32, 2019

  62. [62]

    Understanding diffusion models: A unified perspective, 2022

    Calvin Luo. Understanding diffusion models: A unified perspective, 2022

  63. [63]

    MCVD: Masked conditional video diffu- sion for prediction, generation, and interpolation

    Vikram V oleti, Alexia Jolicoeur-Martineau, and Christopher Pal. MCVD: Masked conditional video diffu- sion for prediction, generation, and interpolation. InAdvances in Neural Information Processing Systems, volume 35, pages 23371–23385, 2022

  64. [64]

    Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021

  65. [65]

    Brian D. O. Anderson. Reverse-time diffusion equation models. Stochastic Processes and their Applications, 12(3):313–326, 1982

  66. [66]

    Aapo Hyvärinen. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6(24):695–709, 2005

  67. [67]

    Alex Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 8162–8171, 2021

  68. [68]

    Prafulla Dhariwal and Alexander Nichol. Diffusion models beat GANs on image synthesis. In Advances in Neural Information Processing Systems, volume 34, 2021

  69. [69]

    Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance, 2022

  70. [70]

    Yang Song, Liyue Shen, Lei Xing, and Stefano Ermon. Solving inverse problems in medical imaging with score-based generative models. In International Conference on Learning Representations, 2022

  71. [71]

    Bahjat Kawar, Michael Elad, Stefano Ermon, and Jiaming Song. Denoising diffusion restoration models. In Advances in Neural Information Processing Systems, volume 35, 2022

  72. [72]

    Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J. Fleet, and Mohammad Norouzi. Image super-resolution via iterative refinement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4):4713–4726, 2023

  73. [73]

    Hyungjin Chung, Jeongsol Kim, Michael Thompson McCann, Marc Louis Klasky, and Jong Chul Ye. Diffusion posterior sampling for general noisy inverse problems. In The Eleventh International Conference on Learning Representations, 2023

  74. [74]

    Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew M. Stuart, and Anima Anandkumar. Neural operator: Learning maps between function spaces. Journal of Machine Learning Research, 24(89):1–97, 2023

  75. [75]

    Andrew M. Stuart. Inverse problems: A Bayesian perspective. Acta Numerica, 19:451–559, 2010

  76. [76]

    Mark Asch, Marc Bocquet, and Maëlle Nodet. Data Assimilation: Methods, Algorithms, and Applications. Society for Industrial and Applied Mathematics, 2016

  77. [77]

    François Rozet and Gilles Louppe. Score-based data assimilation. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 40521–40541. Curran Associates, Inc., 2023

  78. [78]

    Aliaksandra Shysheya, Cristiana Diaconu, Federico Bergamin, Paris Perdikaris, José Miguel Hernández-Lobato, Richard E. Turner, and Emile Mathieu. On conditional diffusion models for PDE simulations, 2024

  79. [79]

    Jiahe Huang, Guandao Yang, Zichen Wang, and Jeong Joon Park. DiffusionPDE: Generative PDE-solving under partial observation, 2024

  80. [80]

    Ilan Price, Alvaro Sanchez-Gonzalez, Ferran Alet, Tom R Andersson, Andrew El-Kadi, Dominic Masters, Timo Ewalds, Jacklynn Stott, Shakir Mohamed, Peter Battaglia, et al. Probabilistic weather forecasting with machine learning. Nature, 637(8044):84–90, 2025
