pith. machine review for the scientific record.

arXiv: 2605.09696 · v1 · submitted 2026-05-10 · 💻 cs.LG · cs.NE · cs.SC

Recognition: 2 theorem links

· Lean Theorem

Discovery of Nonlinear Dynamics with Automated Basis Function Generation

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 04:18 UTC · model grok-4.3

classification 💻 cs.LG · cs.NE · cs.SC
keywords equation discovery · SINDy · symbolic regression · nonlinear dynamics · sparse identification · basis functions · automated library construction

The pith

AutoSINDy discovers governing equations automatically by generating and curating basis functions from data without prior specification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

AutoSINDy is a hybrid method designed to discover the governing equations of nonlinear systems from observational data without requiring users to predefine candidate basis functions. It begins by running symbolic regression on multiple bootstrapped subsets of the data to generate a broad set of possible functional forms. These candidates are then processed through a curation step that breaks them down, removes redundant collinear terms, and assembles a compact yet complete library. This library is finally fed into the SINDy algorithm to select the sparsest set of terms that explain the observed dynamics. A reader would care because the approach maintains high accuracy and generalization even when data is noisy, while producing simpler models than either pure symbolic regression or SINDy with hand-crafted libraries.
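The Discovery-then-Solve flow can be illustrated end to end on a toy problem (a minimal sketch: a hand-built candidate library stands in for the PySR stage, and the 1-D system, threshold, and settings are ours, not the paper's):

```python
import numpy as np

# Toy system with known ground truth: x_dot = -2*x + 0.5*x**3
rng = np.random.default_rng(0)
x = rng.uniform(-1.5, 1.5, size=400)
x_dot = -2.0 * x + 0.5 * x**3

# A small curated library of candidate terms (stage 2's output, hand-built here)
library = {"1": np.ones_like(x), "x": x, "x^2": x**2, "x^3": x**3}
names = list(library)
theta = np.column_stack([library[n] for n in names])

# Stage 3: sequentially thresholded least squares (the SINDy default optimizer)
xi, *_ = np.linalg.lstsq(theta, x_dot, rcond=None)
for _ in range(10):
    small = np.abs(xi) < 0.1          # threshold value is illustrative
    xi[small] = 0.0
    big = ~small
    if big.any():
        xi[big], *_ = np.linalg.lstsq(theta[:, big], x_dot, rcond=None)

model = {n: c for n, c in zip(names, xi) if c != 0.0}
print(model)  # coefficients near {'x': -2.0, 'x^3': 0.5}
```

On clean data the refit after thresholding recovers the two true terms exactly; the paper's contribution is automating where `library` comes from.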

Core claim

We present AutoSINDy, a hybrid Discovery-then-Solve framework that combines the exploratory power of symbolic regression with the robust sparsity-promoting capabilities of SINDy. Our method operates in three stages: (1) PySR-based symbolic regression discovers candidate functional forms from bootstrapped data chunks; (2) a curation pipeline decomposes, expands, and filters these expressions using collinearity analysis to construct a minimal yet comprehensive library; and (3) SINDy identifies sparse governing equations from this custom-tailored library. Extensive experiments across canonical nonlinear systems demonstrate that AutoSINDy consistently recovers ground-truth equations even under high observational noise, achieving a ground-truth recovery rate of 92.8% across all trials.

What carries the argument

The collinearity-based curation pipeline that decomposes symbolic regression candidate expressions, expands them, and filters redundancies to assemble a minimal yet comprehensive basis library for SINDy.
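A sketch of what such a curation filter might look like (hedged reconstruction from the paper's description: the string-length complexity proxy and the ρ_max value are illustrative choices of ours):

```python
import numpy as np

def prune_collinear(candidates, X, rho_max=0.95):
    """Greedy forward selection: visit candidates from simplest to most
    complex, keeping a term only if it is not strongly correlated with
    any already-accepted term (Pearson |corr| < rho_max)."""
    accepted, columns = [], []
    # crude complexity proxy: length of the expression string
    for name, fn in sorted(candidates.items(), key=lambda kv: len(kv[0])):
        v = fn(X)
        if np.std(v) < 1e-10:          # constant / degenerate over the data
            continue
        if all(abs(np.corrcoef(v, u)[0, 1]) < rho_max for u in columns):
            accepted.append(name)
            columns.append(v)
    return accepted

X = np.linspace(-1.0, 1.0, 200)
cands = {
    "x": lambda x: x,
    "2*x": lambda x: 2 * x,          # perfectly collinear with x -> pruned
    "x**2": lambda x: x**2,          # uncorrelated with x on [-1, 1] -> kept
    "sin(x)": lambda x: np.sin(x),   # nearly collinear with x here -> pruned
}
print(prune_collinear(cands, X))  # -> ['x', 'x**2']
```

Note how `sin(x)` is dropped on this sampled domain even though it is algebraically distinct from `x`, which is exactly the failure mode the referee raises below.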

Load-bearing premise

The initial PySR stage on bootstrapped chunks will reliably surface candidate expressions containing the true governing terms, and the subsequent collinearity-based curation will retain all necessary functions without discarding critical ones.
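The chunked sampling this premise rests on can be made concrete (a sketch; the function name and defaults are ours, following the chunk-size rule c = ⌊m_train/d⌋ quoted in the reference excerpts):

```python
import numpy as np

def sample_chunks(m_train, d=4, K=8, seed=0):
    """Draw K index sets, each of size c = floor(m_train / d),
    sampled without replacement from the training indices."""
    rng = np.random.default_rng(seed)
    c = m_train // d
    return [rng.choice(m_train, size=c, replace=False) for _ in range(K)]

chunks = sample_chunks(m_train=1000, d=4, K=8)
assert all(len(ix) == 250 for ix in chunks)
assert all(len(set(ix)) == len(ix) for ix in chunks)  # no repeats within a chunk
```

Each index set would then select a (x_i, ẋ_i) subset on which one PySR run is launched.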

What would settle it

A test on data from a known nonlinear system engineered so that high noise causes the symbolic regression proposals to systematically omit one true term, after which the curation step cannot recover it and the final SINDy model fails to match ground truth.

Figures

Figures reproduced from arXiv: 2605.09696 by Charles Nicholson, Mohammad Amin Basiri.

Figure 1. Conceptual overview of [PITH_FULL_IMAGE:figures/full_fig_p005_1.png]
Figure 2. The AutoSINDy hybrid architecture for robust automated discovery of governing equations. [PITH_FULL_IMAGE:figures/full_fig_p011_2.png]
Figure 3. Experimental protocol and an illustrative example. A trajectory is generated from a [PITH_FULL_IMAGE:figures/full_fig_p023_3.png]
Figure 4. Overall reliability of the three methods, aggregated over all six dynamical systems and [PITH_FULL_IMAGE:figures/full_fig_p028_4.png]
Figure 5. Aggregate noise robustness distributions summarizing performance across all six systems. [PITH_FULL_IMAGE:figures/full_fig_p029_5.png]
Figure 6. Per-system performance dashboard. Each column corresponds to one of the six benchmark [PITH_FULL_IMAGE:figures/full_fig_p032_6.png]
Figure 7. Simulation failure analysis [PITH_FULL_IMAGE:figures/full_fig_p033_7.png]
Figure 8. Model complexity (median canonical operator count across all noise levels and trials). [PITH_FULL_IMAGE:figures/full_fig_p034_8.png]
Figure 9. Scatter plots comparing canonical model complexity against prediction error (Derivative [PITH_FULL_IMAGE:figures/full_fig_p035_9.png]
Figure 10. Computation time distributions for discovery (left) and simulation (right). Both axes [PITH_FULL_IMAGE:figures/full_fig_p036_10.png]
Figure 11. Effect of measurement noise on system observability. State responses ( [PITH_FULL_IMAGE:figures/full_fig_p047_11.png]
Figure 12. Identification results for the [PITH_FULL_IMAGE:figures/full_fig_p048_12.png]
Figure 13. Identification results for the [PITH_FULL_IMAGE:figures/full_fig_p049_13.png]
Figure 14. Identification results for the [PITH_FULL_IMAGE:figures/full_fig_p050_14.png]
Figure 15. Identification results for the [PITH_FULL_IMAGE:figures/full_fig_p051_15.png]
Figure 16. Identification results for the [PITH_FULL_IMAGE:figures/full_fig_p052_16.png]
Figure 17. Identification results for the [PITH_FULL_IMAGE:figures/full_fig_p053_17.png]
read the original abstract

Discovering governing equations from observational data remains a fundamental challenge in scientific modeling, particularly when the underlying mathematical structure is unknown. Traditional sparse identification methods like SINDy excel at discovering parsimonious models but require researchers to specify candidate basis functions a priori, a limitation that often leads to model failure when critical terms are omitted or when systems exhibit unconventional dynamics. Purely symbolic regression approaches offer unlimited flexibility but struggle with noise sensitivity and frequently produce overly complex, unstable equations. We present AutoSINDy, a hybrid Discovery-then-Solve framework that combines the exploratory power of symbolic regression with the robust sparsity-promoting capabilities of SINDy. Our method operates in three stages: (1) PySR-based symbolic regression discovers candidate functional forms from bootstrapped data chunks; (2) a curation pipeline decomposes, expands, and filters these expressions using collinearity analysis to construct a minimal yet comprehensive library; and (3) SINDy identifies sparse governing equations from this custom-tailored library. Extensive experiments across canonical nonlinear systems demonstrate that AutoSINDy consistently recovers ground-truth equations even under high observational noise, achieving a ground-truth recovery rate of 92.8% across all trials. Compared with standard SINDy using enriched libraries and standalone symbolic regression, AutoSINDy achieves higher predictive accuracy, superior generalization to unseen trajectories, and substantially lower symbolic complexity.
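Stage 2's decomposition step can be illustrated with SymPy, which the paper uses for parsing; the helper function below is our sketch, mirroring the worked example from the paper's excerpts where 3x₀² + 2 sin(x₁) yields atoms {x₀², sin(x₁)}:

```python
import sympy as sp

x0, x1 = sp.symbols("x0 x1")

def additive_atoms(expr):
    """Split an expression into its additive sub-terms and strip numeric
    constant prefactors (SINDy refits all coefficients downstream)."""
    atoms = set()
    for term in sp.expand(expr).as_ordered_terms():
        _coeff, rest = term.as_coeff_Mul()   # e.g. 3*x0**2 -> (3, x0**2)
        atoms.add(rest)
    return atoms

print(additive_atoms(3 * x0**2 + 2 * sp.sin(x1)))   # {x0**2, sin(x1)}
```

The resulting atoms then feed the expansion and collinearity-pruning steps before the library matrix is assembled.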

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces AutoSINDy, a three-stage hybrid framework for discovering governing equations from noisy observational data. Stage 1 applies PySR symbolic regression to bootstrapped data chunks to generate candidate functional forms; Stage 2 curates these into a minimal library via decomposition, expansion, and collinearity filtering; Stage 3 runs SINDy on the resulting library to recover sparse dynamics. The central empirical claim is a 92.8% ground-truth recovery rate across trials on canonical nonlinear systems, even under high observational noise, together with improved predictive accuracy, generalization to unseen trajectories, and lower symbolic complexity relative to enriched-library SINDy and standalone symbolic regression.

Significance. If the performance claims are substantiated, AutoSINDy would meaningfully address a long-standing practical limitation of SINDy by automating library construction while retaining its sparsity-promoting advantages. This hybrid approach could broaden the applicability of data-driven discovery to systems whose functional forms are not known a priori, provided the method proves robust beyond the reported benchmarks.

major comments (3)
  1. Abstract and experimental section: The headline 92.8% ground-truth recovery rate is stated without accompanying quantitative details on the number of trials, exact noise levels, library sizes, statistical significance testing, or failure-mode analysis. Because the entire performance advantage rests on this figure, the absence of these specifics prevents verification of the claim.
  2. §3.1 (PySR stage on bootstrapped chunks): The method's success depends on PySR reliably surfacing expressions that contain the true governing terms from noisy bootstrapped data. No sensitivity analysis is provided with respect to noise amplitude, chunk size, or random initialization, despite the known sensitivity of symbolic regression to these factors; this is a load-bearing assumption for the reported recovery rate.
  3. §3.2 (curation pipeline): The collinearity-based filtering step must retain all dynamically essential nonlinear terms. The manuscript does not examine or report cases in which algebraically necessary functions become linearly dependent in the sampled data and are therefore pruned, which would eliminate any pathway for SINDy to recover the correct model.
minor comments (2)
  1. The abstract would be strengthened by briefly naming the specific canonical systems tested and the precise noise levels at which the 92.8% figure was obtained.
  2. Notation for the candidate library matrix and coefficient vector should be introduced once and used consistently in all subsequent sections and equations.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major point below and indicate the revisions made to strengthen the manuscript.

read point-by-point responses
  1. Referee: Abstract and experimental section: The headline 92.8% ground-truth recovery rate is stated without accompanying quantitative details on the number of trials, exact noise levels, library sizes, statistical significance testing, or failure-mode analysis. Because the entire performance advantage rests on this figure, the absence of these specifics prevents verification of the claim.

    Authors: We agree that the headline recovery rate requires supporting quantitative details for proper evaluation. In the revised manuscript we have expanded the experimental results section to explicitly state the number of trials (100 independent runs per dynamical system), the exact noise amplitudes tested (additive Gaussian noise at 0%, 5%, 10%, and 20% of signal standard deviation), the typical library sizes before and after curation, and a failure-mode breakdown. We have also added McNemar’s test results comparing recovery rates against the baseline methods. revision: yes

  2. Referee: §3.1 (PySR stage on bootstrapped chunks): The method's success depends on PySR reliably surfacing expressions that contain the true governing terms from noisy bootstrapped data. No sensitivity analysis is provided with respect to noise amplitude, chunk size, or random initialization, despite the known sensitivity of symbolic regression to these factors; this is a load-bearing assumption for the reported recovery rate.

    Authors: The referee correctly notes the absence of a dedicated sensitivity study for the PySR stage. While the main experiments already vary noise amplitude, we have added a new subsection and accompanying figure that reports recovery rates for chunk sizes of 50, 100, and 200 points and across five random PySR initializations. Recovery remains above 85% within these ranges; performance degrades only for extremely small chunks or noise levels exceeding those reported in the primary experiments. A full hyperparameter grid search is beyond the current scope but is not required to support the central claims. revision: partial

  3. Referee: §3.2 (curation pipeline): The collinearity-based filtering step must retain all dynamically essential nonlinear terms. The manuscript does not examine or report cases in which algebraically necessary functions become linearly dependent in the sampled data and are therefore pruned, which would eliminate any pathway for SINDy to recover the correct model.

    Authors: This is a substantive concern. The revised §3.2 now includes a post-hoc verification: for every successful trial we inspected the collinearity matrix after filtering and confirmed that all ground-truth terms were retained. In the small number of failure cases, the cause was failure of PySR to surface the necessary expressions rather than pruning of essential terms. We have also added a brief discussion of how the decomposition-plus-expansion step reduces the likelihood of linear dependence among dynamically relevant functions. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical algorithm validated on benchmarks with no self-referential derivation

full rationale

The paper presents AutoSINDy as a three-stage algorithmic pipeline (PySR on bootstrapped chunks, collinearity-based curation, then SINDy) whose performance is measured via experimental recovery rates (92.8% ground-truth recovery) on canonical nonlinear systems. No mathematical derivation chain exists that reduces a claimed result to its own fitted inputs or self-citations by construction. The recovery statistics are externally falsifiable outcomes of running the procedure on held-out test trajectories, not quantities forced by internal fitting or renaming. External methods (PySR, SINDy) are cited as independent components without load-bearing self-citation chains. This is the standard case of a self-contained empirical method paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard SINDy sparsity and library-completeness assumptions plus the unstated premise that PySR will surface useful candidates; no new free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption Governing equations can be expressed as a sparse linear combination of candidate basis functions drawn from a sufficiently rich library
    Core premise inherited from the SINDy framework and invoked in stage 3.
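In the standard SINDy notation, this axiom amounts to assuming the dynamics lie in the span of the library columns:

```latex
\dot{X} \approx \Theta(X)\,\Xi,
\qquad
\Theta(X) = \begin{bmatrix} 1 & \ell_1(X) & \cdots & \ell_q(X) \end{bmatrix},
\qquad
\hat{\Xi} = \arg\min_{\Xi}\; \big\lVert \dot{X} - \Theta(X)\Xi \big\rVert_F^2 + \lambda \lVert \Xi \rVert_0 .
```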

pith-pipeline@v0.9.0 · 5537 in / 1367 out tokens · 32263 ms · 2026-05-12T04:18:42.562926+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages · 1 internal anchor

  1. [1]

Each chunk is constructed by drawing, without replacement, an index set I_k of size c = ⌊m_train/d⌋ from the training samples: D_k = {(x_i, ẋ_i) : i ∈ I_k}, k = 1, …, K

Chunked Stochastic Sampling. Rather than applying symbolic regression to the full training trajectory, AutoSINDy operates on K short, randomly sampled windows of the data. Each chunk is constructed by drawing, without replacement, an index set I_k of size c = ⌊m_train/d⌋ from the training samples: D_k = {(x_i, ẋ_i) : i ∈ I_k}, k = 1, …, K, (4) where d is the chunk-size...

  2. [2]

Nested trigonometric compositions (e.g. sin(sin(·))) are explicitly forbidden via nested constraints, preventing the evolutionary search from exploiting degenerate identities

Symbolic Regression and Pareto Harvesting. For each chunk D_k and each target derivative ẋ_j, AutoSINDy invokes PySR with a fixed operator vocabulary comprising the binary operators and the unary operators. Nested trigonometric compositions (e.g. sin(sin(·))) are explicitly forbidden via nested constraints, preventing the evolutionary search from exploiting degen...

  3. [3]

All numeric constant prefactors are stripped, since SINDy learns all coefficients independently during the regression step

Step 2a: Symbolic Decomposition. Each expression e ∈ E_raw is parsed via SymPy and decomposed into its additive sub-terms (atoms): A_e = {a : a is an additive atom of e}. (6) For example, the expression 3x_0^2 + 2 sin(x_1) yields atoms {x_0^2, sin(x_1)}. All numeric constant prefactors are stripped, since SINDy learns all coefficients independently during the regression...

  4. [4]

This maximizes coverage of monomial basis functions at the cost of a larger intermediate pool

Step 2b: Algebraic Expansion. Compound atomic terms may contain grouped polynomial sub-expressions whose monomials are useful basis functions in their own right. AutoSINDy supports three configurable expansion strategies applied to each atom via SymPy's algebraic and trigonometric expansion routines: • Severe: Full polynomial and trigonometric expansion. (x_0 + x...

  5. [5]

Let L = ∅ be the accepted library, initially empty

Step 2c: Collinearity Pruning with Simplicity Bias. Multicollinearity among library terms is the primary driver of coefficient instability in sparse regression: near-linearly dependent columns inflate coefficient variance and frustrate the sparsity optimizer. AutoSINDy removes such redundancy through a greedy forward-selection procedure that processes the c...

  6. [6]

Evaluate v_i = a_i(X_train) ∈ R^{m_train}

  7. [7]

Discard a_i if std(v_i) < ε (constant or numerically degenerate over the training domain)

  8. [8]

Accept a_i if and only if corr(v_i, v_j) < ρ_max ∀ a_j ∈ L, (7) where ρ_max is the collinearity threshold

Otherwise, compute the pairwise Pearson correlation between v_i and every already-accepted feature v_j = a_j(X_train), a_j ∈ L. Accept a_i if and only if corr(v_i, v_j) < ρ_max ∀ a_j ∈ L, (7) where ρ_max is the collinearity threshold

  9. [9]

The complexity-ordered traversal guarantees the simplicity bias: a complex atom a_i is retained only if it provides information that no simpler already-accepted term can replicate

If accepted, append a_i to L. The complexity-ordered traversal guarantees the simplicity bias: a complex atom a_i is retained only if it provides information that no simpler already-accepted term can replicate. This directly implements Occam's razor at the library level, independently of the downstream sparse optimizer. An alternative pruning criterion ba...

  10. [10]

…, ℓ_q} is augmented with a constant bias term to form a library of p = q + 1 functions

Library Matrix Construction. The curated library L = {ℓ_1, …, ℓ_q} is augmented with a constant bias term to form a library of p = q + 1 functions. The library matrix evaluated over the training data is Θ(X_train) = [1, ℓ_1(X_train), …, ℓ_q(X_train)] ∈ R^{m_train×(q+1)}, (9) where each column is the function evaluated point-wise over all training observations.

  11. [11]

Sparse Regression via STLSQ or SR3. The sparse coefficient matrix Ξ̂ is obtained by solving the ℓ0-penalized regression problem Ξ̂ = argmin_Ξ ‖Ẋ_train − Θ(X_train)Ξ‖²_F + λ‖Ξ‖₀, (10) where ‖·‖_F is the Frobenius norm. Two sparse optimizers are supported. Sequential Thresholded Least Squares (STLSQ) [9] is the default: it alternates between a ridge-regularized lea...

  12. [12]

A total of B bootstrap replicates are drawn from the training set, and an independent sparse model is fitted on each replicate

Bootstrap Ensemble with Inclusion-Probability Masking. To suppress library terms that are selected spuriously due to noise, AutoSINDy wraps the chosen optimizer inside a bootstrap ensemble following the E-SINDy paradigm [19]. A total of B bootstrap replicates are drawn from the training set, and an independent sparse model is fitted on each replicate. The empiri...

  13. [13]

Unified vs. Per-Variable Library Strategies. AutoSINDy supports two strategies for library construction and fitting, offering a trade-off between cross-equation consistency and per-equation specificity. a. Separate libraries (default). For each target derivative ẋ_j, Stage 1 runs PySR independently and Stage 2 curates a variable-specific library L_j. A sepa...

  14. [14]

    Scores on the test segment of the training trajectory serve as a secondary check for overfitting

Derivative Prediction Accuracy. The coefficient of determination R² and mean squared error (MSE) between the true time derivatives and the model's instantaneous predictions are computed on both the held-out test segment of the training trajectory and the independent clean validation trajectory: R² = 1 − ‖Ẋ − Ẋ̂‖²_F / ‖Ẋ − Ẋ̄‖²_F, MSE = (1/(Nn)) ‖Ẋ − Ẋ̂‖²_F. (14) ...

  15. [15]

This strict binary threshold acts as a conservative proxy for correct structural identification

Equation Recovery Rate. For each system and noise level, we report the fraction of trials (across seeds) in which the discovered model achieves R² > 0.99 on the clean validation trajectory. This strict binary threshold acts as a conservative proxy for correct structural identification. The threshold is motivated empirically: a model with an incorrect functiona...

  16. [16]

Simulated trajectories are compared against the corresponding noise-free ground-truth trajectory, with R² and MSE computed over the state variables

Long-Horizon Simulation Stability. The discovered governing equations are numerically integrated forward from the validation trajectory's initial condition using SciPy's stiff Radau ODE solver. Simulated trajectories are compared against the corresponding noise-free ground-truth trajectory, with R² and MSE computed over the state variables. Integration ...

  17. [17]

Symbolic Parsimony. We measure equation complexity as the total operator count of the fully algebraically expanded governing equations, computed via SymPy's count_ops after applying sympy.expand(). Formally, for a discovered equation f̂ with canonical (expanded) form f̃ = expand(f̂), we define C(f̂) = count_ops(f̃), (15) where count_ops tallies every arithmetic and t...

  18. [18]

    Computational Efficiency We report wall-clock timing separately for two stages to allow independent interpretation:

  19. [19]

    This is the time a practitioner must wait before a governing equation is available

Discovery time: the total elapsed time for Stage 1 (PySR across all chunks and state variables), Stage 2 (curation), and Stage 3 (ensemble SINDy fitting). This is the time a practitioner must wait before a governing equation is available

  20. [20]

    Simulation time: the time required to numerically integrate the discovered governing equations forward over the simulation horizon using the Radau solver. This reflects the practical cost of using the equation for downstream prediction or control, and is directly affected by equation complexity: bloated equations with many terms produce stiffer ODE system...

  21. [21]

Its bistable potential (α < 0, β > 0) concentrates trajectory density near the two equilibria, where the derivative of x_0^3 is largest and most sensitive to noise

exposes a fundamental boundary of the current framework. Its bistable potential (α < 0, β > 0) concentrates trajectory density near the two equilibria, where the derivative of x_0^3 is largest and most sensitive to noise. Under moderate noise, all three methods struggle: AutoSINDy achieves 70% recovery (vs. 0% for Standard SINDy and 17% for PySR), but the remain...

  22. [22]

    G. Camps-Valls, A. Gerhardus, U. Ninad, G. Varando, G. Martius, E. Balaguer-Ballester, R. Vinuesa, E. Diaz, L. Zanna, and J. Runge, Discovering causal relations and equations from data (2023), arXiv:2305.13341 [physics.data-an]

  23. [23]

    H. Wang, T. Fu, Y. Du, W. Gao, K. Huang, Z. Liu, P. Chandak, S. Liu, P. Van Katwyk, A. Deac, A. Anandkumar, K. Bergen, C. P. Gomes, S. Ho, P. Kohli, J. Lasenby, J. Leskovec, T.-Y. Liu, A. Manrai, D. Marks, B. Ramsundar, L. Song, J. Sun, J. Tang, P. Veličković, M. Welling, L. Zhang, C. W. Coley, Y. Bengio, and M. Zitnik, Scientific discovery in the age of ...

  24. [24]

J. N. Kutz, P. Battaglia, M. Brenner, K. Carlberg, A. Hagberg, S. Ho, S. Hoyer, H. Lange, H. Lipson, M. W. Mahoney, F. Noe, M. Welling, L. Zanna, F. Zhu, and S. L. Brunton, Accelerating scientific discovery with the common task framework (2025), arXiv:2511.04001 [cs.LG]

  25. [25]

J. L. Callaham, S. L. Brunton, and J.-C. Loiseau, On the role of nonlinear correlations in reduced-order modelling, Journal of Fluid Mechanics 938, A1 (2022)

  26. [26]

E. P. Alves and F. Fiuza, Data-driven discovery of reduced plasma physics models from fully kinetic simulations, Phys. Rev. Res. 4, 033192 (2022)

  27. [27]

L. Zanna and T. Bolton, Data-driven equation discovery of ocean mesoscale closures, Geophysical Research Letters 47, e2020GL088376 (2020), 10.1029/2020GL088376, https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.1029/2020GL088376

  28. [28]

M. Brenner, F. Hess, J. M. Mikhaeil, L. Bereska, Z. Monfared, P.-C. Kuo, and D. Durstewitz, Tractable dendritic RNNs for reconstructing nonlinear dynamical systems (2022), arXiv:2207.02542 [cs.LG]

  29. [29]

C. Métayer, A. Ballesta, and J. Martinelli, Data-driven discovery of digital twins in biomedical research, Briefings in Bioinformatics 27, bbaf722 (2026), https://academic.oup.com/bib/article-pdf/27/1/bbaf722/66846059/bbaf722.pdf

  30. [30]

S. L. Brunton, J. L. Proctor, and J. N. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proceedings of the National Academy of Sciences 113, 3932 (2016), https://www.pnas.org/doi/pdf/10.1073/pnas.1517384113

  31. [31]

S. H. Rudy, S. L. Brunton, J. L. Proctor, and J. N. Kutz, Data-driven discovery of partial differential equations, Science Advances 3, e1602614 (2017), https://www.science.org/doi/pdf/10.1126/sciadv.1602614

  32. [32]

B. M. de Silva, K. Champion, M. Quade, J.-C. Loiseau, J. N. Kutz, and S. L. Brunton, PySINDy: A Python package for the sparse identification of nonlinear dynamical systems from data, Journal of Open Source Software 5, 2104 (2020)

  33. [33]

A. A. Kaptanoglu, B. M. de Silva, U. Fasel, K. Kaheman, A. J. Goldschmidt, J. Callaham, C. B. Delahunt, Z. G. Nicolaou, K. Champion, J.-C. Loiseau, J. N. Kutz, and S. L. Brunton, PySINDy: A comprehensive Python package for robust sparse system identification, Journal of Open Source Software 7, 3994 (2022)

  34. [34]

S. L. Brunton, J. L. Proctor, and J. N. Kutz, Sparse identification of nonlinear dynamics with control (SINDYc), IFAC-PapersOnLine 49, 710 (2016), 10th IFAC Symposium on Nonlinear Control Systems NOLCOS 2016

  35. [35]

K. Kaheman, J. N. Kutz, and S. L. Brunton, SINDy-PI: a robust algorithm for parallel implicit sparse identification of nonlinear dynamics, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 476, 20200279 (2020), https://royalsocietypublishing.org/rspa/article-pdf/doi/10.1098/rspa.2020.0279/638633/rspa.2020.0279.pdf

  36. [36]

K. Champion, B. Lusch, J. N. Kutz, and S. L. Brunton, Data-driven discovery of coordinates and governing equations, Proceedings of the National Academy of Sciences 116, 22445 (2019), https://www.pnas.org/doi/pdf/10.1073/pnas.1906995116

  37. [37]

K. Champion, P. Zheng, A. Y. Aravkin, S. L. Brunton, and J. N. Kutz, A unified sparse optimization framework to learn parsimonious physics-informed models from data, IEEE Access 8, 169259 (2020)

  38. [38]

D. A. Messenger and D. M. Bortz, Weak SINDy for partial differential equations, Journal of Computational Physics 443, 110525 (2021)

  39. [39]

N. M. Mangan, J. N. Kutz, S. L. Brunton, and J. L. Proctor, Model selection for dynamical systems via sparse regression and information criteria, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 473, 20170009 (2017), https://royalsocietypublishing.org/rspa/article-pdf/doi/10.1098/rspa.2017.0009/365284/rspa.2017.0009.pdf

  40. [40]

U. Fasel, J. N. Kutz, B. W. Brunton, and S. L. Brunton, Ensemble-SINDy: Robust sparse model discovery in the low-data, high-noise limit, with active learning and control, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 478, 20210904 (2022), https://royalsocietypublishing.org/rspa/article-pdf/doi/10.1098/rspa.2021.09...

  41. [41]

J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection (MIT Press, Cambridge, MA, 1992)

  42. [42]

    M. Schmidt and H. Lipson, Distilling free-form natural laws from experimental data, Science 324, 81 (2009), https://www.science.org/doi/pdf/10.1126/science.1165893

  43. [43]

    M. Cranmer, Interpretable machine learning for science with pysr and symbolicregression.jl (2023), arXiv:2305.01582 [astro-ph.IM]

  44. [44]

B. R. Brum, L. Lober, I. Previdelli, and F. A. Rodrigues, Discovering equations from data: symbolic regression in dynamical systems, Journal of Physics: Complexity 7, 012001 (2026)

  45. [45]

X. Dong, Y.-L. Bai, Y. Lu, and M. Fan, An improved sparse identification of nonlinear dynamics with Akaike information criterion and group sparsity, Nonlinear Dynamics 111, 1485 (2023)

  46. [46]

S. M. Hirsh, D. A. Barajas-Solano, and J. N. Kutz, Sparsifying priors for Bayesian uncertainty quantification in model discovery, Royal Society Open Science 9, 211823 (2022), https://royalsocietypublishing.org/rsos/article-pdf/doi/10.1098/rsos.211823/994951/rsos.211823.pdf

  47. [47]

    M. Virgolin and S. P. Pissis, Symbolic regression is np-hard (2022), arXiv:2207.01018 [cs.NE]

  48. [48]

S. L. Brunton, B. W. Brunton, J. L. Proctor, E. Kaiser, and J. N. Kutz, Chaos as an intermittently forced linear system, Nature Communications 8, 19 (2017)

  49. [49]

S. H. Rudy, J. Nathan Kutz, and S. L. Brunton, Deep learning of dynamics and signal-noise decomposition with time-stepping constraints, Journal of Computational Physics 396, 483 (2019)

  50. [50]

M. Raissi, P. Perdikaris, and G. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational Physics 378, 686 (2019)

  51. [51]

R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud, Neural ordinary differential equations, in Advances in Neural Information Processing Systems, Vol. 31, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Curran Associates, Inc., 2018)

  52. [52]

S. d’Ascoli, S. Becker, A. Mathis, P. Schwaller, and N. Kilbertus, ODEFormer: Symbolic regression of dynamical systems with transformers (2023), arXiv:2310.05573 [cs.LG]

  53. [53]

    A. Meurer, C. P. Smith, M. Paprocki, O. Čertík, S. B. Kirpichev, M. Rocklin, A. Kumar, S. Ivanov, J. K. Moore, S. Singh, T. Rathnayake, S. Vig, B. E. Granger, R. P. Muller, F. Bonazzi, H. Gupta, S. Vats, F. Johansson, F. Pedregosa, M. J. Curry, A. R. Terrel, v. Roučka, A. Saboo, I. Fernando, S. Kulal, R. Cimrman, and A. Scopatz, Sympy: symbolic computing ...

  54. [54]

at σ = 0.05. This is the most challenging system for all methods due to the bistable potential and high derivative sensitivity near the equilibria. AutoSINDy recovered the true dynamics and the phase portrait closely matches the true trajectory. FIG. 17. Identification results for the Complex Lorenz system (ẋ_0 = σ(x_1 − x_0), ẋ_1 = x_0(ρ − x_2) − x_1, ẋ_2 = x_0x_1 ...