arxiv: 2510.23166 · v4 · submitted 2025-10-27 · 💻 cs.CE · physics.comp-ph

Common Task Framework For a Critical Evaluation of Scientific Machine Learning Algorithms

Pith reviewed 2026-05-18 03:54 UTC · model grok-4.3

classification 💻 cs.CE physics.comp-ph
keywords scientific machine learning · common task framework · benchmarking · reproducibility · nonlinear systems · forecasting · state reconstruction · evaluation metrics

The pith

A Common Task Framework standardizes head-to-head evaluations of scientific machine learning algorithms on hidden test sets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a Common Task Framework for scientific machine learning to replace inconsistent ad hoc comparisons with structured evaluations. The framework includes curated datasets spanning forecasting, state reconstruction, and generalization under noise and limited data, along with task-specific metrics. The authors benchmark methods on the Kuramoto-Sivashinsky and Lorenz systems to show how the framework reveals method strengths and limitations for different problems. They also announce a competition on a global sea surface temperature dataset with a true holdout set to encourage community participation and higher standards for reproducibility.

Core claim

The central claim is that a Common Task Framework featuring a curated set of datasets and task-specific metrics for forecasting, state reconstruction, and generalization under realistic constraints provides a structured and rigorous foundation for head-to-head evaluation of diverse scientific machine learning algorithms, as illustrated by benchmarks on the Kuramoto-Sivashinsky and Lorenz systems and a planned competition on a real-world sea surface temperature dataset with hidden test data.

What carries the argument

The Common Task Framework, which supplies standardized datasets, metrics, and hidden test sets to enable objective comparisons across algorithms for scientific modeling tasks.
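
As a rough illustration of the hidden-test-set mechanism, the sketch below shows an organizer-side scorer that keeps the ground truth private and returns only one number per submission. It is not the paper's implementation: the class name `HiddenTestScorer`, its `score` method, and the metric (100 times one minus the relative L2 error) are assumptions made here for illustration, since this page does not reproduce the paper's metric definitions.

```python
import numpy as np


class HiddenTestScorer:
    """Hypothetical organizer-side scorer; names and metric are illustrative assumptions."""

    def __init__(self, hidden_truth):
        # The ground truth stays on the organizer's side and is never returned.
        self._truth = np.asarray(hidden_truth, dtype=float)

    def score(self, prediction):
        """Return 100 * (1 - relative L2 error); higher is better, 100 is an exact match."""
        prediction = np.asarray(prediction, dtype=float)
        if prediction.shape != self._truth.shape:
            raise ValueError(
                f"expected shape {self._truth.shape}, got {prediction.shape}"
            )
        rel_err = np.linalg.norm(prediction - self._truth) / np.linalg.norm(self._truth)
        return 100.0 * (1.0 - rel_err)


# Toy usage: three submissions evaluated against the same hidden target.
rng = np.random.default_rng(0)
truth = np.sin(np.linspace(0.0, 10.0, 200))
scorer = HiddenTestScorer(truth)

perfect_score = scorer.score(truth.copy())
noisy_score = scorer.score(truth + 0.1 * rng.standard_normal(truth.shape))
zero_score = scorer.score(np.zeros_like(truth))
```

In this sketch, participants see only the returned number, never the held-out data itself, so every method is scored against the same target under the same rule.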

If this is right

  • Diverse algorithms can be compared directly on identical tasks and metrics instead of differing setups.
  • Method performance differences become visible across problem classes such as forecasting versus state reconstruction.
  • Reproducibility improves because evaluations rely on hidden test sets rather than self-reported results.
  • Community competitions around real-world datasets can accelerate engagement and shared progress.
  • Resource allocation in scientific machine learning research can be guided by objective benchmark outcomes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Widespread use could let new papers cite shared benchmark results instead of spending time building their own, often weak, baselines.
  • The approach might extend naturally to other domains such as biological or chemical modeling if similar curated tasks are added.
  • Over time the benchmark tasks would likely require updates to stay challenging and avoid methods overfitting to the initial set.
  • Adoption could be faster if the framework connects to existing public competition platforms for easier participation.

Load-bearing premise

That the community will accept the two chosen nonlinear dynamical systems and the sea surface temperature dataset with their metrics as representative enough to adopt the framework for genuine progress rather than tuning methods specifically to these tests.

What would settle it

A review of papers published after the framework release showing that the majority continue to report results on custom datasets and metrics without using the proposed standardized tasks or hidden test sets.

Figures

Figures reproduced from arXiv: 2510.23166 by Alexey Yermakov, Amy Sara Rude, David Zoro, Georg Maierhofer, Jan P. Williams, J. Nathan Kutz, Joe Germany, Joseph Bakarji, Judah Goldfeder, Matteo Tomasetto, Miles Cranmer, Philippe Martin Wyder, Stefano Riva, Yue Zhao.

Figure 1: The twelve-axis radar plot characterizes a method’s performance across all tasks on a [...]
Figure 2: The CTF Evaluation framework scores the performance of methods on (a) the Lorenz [...]
Figure 3: Ranked average scores of each model on the KS and Lorenz Dataset.
Figure 4: Top three performing models per metric on the (a) Lorenz and (b) KS dataset. The blue [...]
Figure 5: Architecture of the Deep Operator Network. The target field at the evaluation point [...]
Figure 6: Schematic of the Sparse Identification of Nonlinear Dynamics (SINDy) algorithm from [...]
Figure 7: Scheme of the Dynamic Mode Decomposition algorithm from [...]
Figure 8: Architecture of the Fourier Neural Operator from [49]
Figure 9: Sample architecture of a Kolmogorov-Arnold Network with three layers of size [...]
Original abstract

Machine learning (ML) is transforming modeling and control in the physical, engineering, and biological sciences. However, rapid development has outpaced the creation of standardized, objective benchmarks, leading to weak baselines, reporting bias, and inconsistent evaluations across methods. This undermines reproducibility, misguides resource allocation, and obscures scientific progress. To address this, we propose a Common Task Framework (CTF) for scientific machine learning. The CTF features a curated set of datasets and task-specific metrics spanning forecasting, state reconstruction, and generalization under realistic constraints, including noise and limited data. Inspired by the success of CTFs in fields like natural language processing and computer vision, our framework provides a structured, rigorous foundation for head-to-head evaluation of diverse algorithms. As a first step, we benchmark methods on two canonical nonlinear systems: Kuramoto-Sivashinsky and Lorenz. These results illustrate the utility of the CTF in revealing method strengths, limitations, and suitability for specific classes of problems and diverse objectives. Next, we are launching a competition around a global real-world sea surface temperature dataset with a true holdout dataset to foster community engagement. Our long-term vision is to replace ad hoc comparisons with standardized evaluations on hidden test sets that raise the bar for rigor and reproducibility in scientific ML.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a Common Task Framework (CTF) for scientific machine learning to address inconsistent evaluations and weak baselines. It defines curated datasets and task-specific metrics spanning forecasting, state reconstruction, and generalization under noise and limited data. Initial benchmarks on the Kuramoto-Sivashinsky and Lorenz systems illustrate differentiation among methods, while a competition on a global sea surface temperature dataset with a hidden test set is planned to promote standardized, reproducible head-to-head comparisons.

Significance. If adopted, the CTF could meaningfully raise evaluation standards in scientific ML by replacing ad hoc comparisons with hidden-test-set protocols, similar to established frameworks in NLP and computer vision. The initial benchmarks on canonical nonlinear systems demonstrate the framework's capacity to expose method strengths and limitations for specific problem classes and objectives.

major comments (2)
  1. [Dataset Selection and Task Definition] The central claim that the CTF supplies a rigorous, standardized foundation capable of replacing ad hoc comparisons (abstract and concluding sections) is load-bearing on community acceptance of the chosen tasks as representative. However, the manuscript provides no systematic argument showing how the Kuramoto-Sivashinsky, Lorenz, and sea surface temperature datasets cover the broader space of scientific ML challenges, such as high-dimensional PDEs, stiff systems, or multi-physics regimes.
  2. [Benchmarking Results] In the benchmarking sections on the Kuramoto-Sivashinsky and Lorenz systems, the description of data splits, exact metric definitions, and procedures to prevent post-hoc algorithm or hyperparameter selection is insufficient. Without these details it remains unclear whether the reported differentiation truly supports the framework's claimed rigor and reproducibility.
minor comments (2)
  1. [Abstract] The abstract states that a competition around the sea surface temperature dataset is being launched but does not specify timelines, access protocols for the hidden test set, or evaluation rules; adding these would improve clarity for potential participants.
  2. [Figures] Figure captions and legends in the benchmarking results should explicitly define all plotted metrics and error measures to allow readers to interpret the comparisons without reference to the main text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback and positive evaluation of the potential impact of the proposed Common Task Framework. We address each major comment below and have prepared revisions to strengthen the manuscript.

point-by-point responses
  1. Referee: [Dataset Selection and Task Definition] The central claim that the CTF supplies a rigorous, standardized foundation capable of replacing ad hoc comparisons (abstract and concluding sections) is load-bearing on community acceptance of the chosen tasks as representative. However, the manuscript provides no systematic argument showing how the Kuramoto-Sivashinsky, Lorenz, and sea surface temperature datasets cover the broader space of scientific ML challenges, such as high-dimensional PDEs, stiff systems, or multi-physics regimes.

    Authors: We agree that the manuscript would be strengthened by a more explicit rationale for the initial dataset choices. The Kuramoto-Sivashinsky and Lorenz systems were selected as canonical, well-studied examples of spatiotemporal chaos and low-dimensional chaotic dynamics, while the sea-surface-temperature dataset provides a real-world, high-dimensional forecasting task with a hidden test set. In the revised manuscript we will insert a new subsection that articulates the selection criteria (diversity of dynamical regimes, dimensionality, and task type) and explicitly acknowledges that these examples do not exhaustively cover stiff systems, multi-physics problems, or other high-dimensional PDE regimes. We will also outline how the CTF can be extended to such cases through future community contributions. revision: yes

  2. Referee: [Benchmarking Results] In the benchmarking sections on the Kuramoto-Sivashinsky and Lorenz systems, the description of data splits, exact metric definitions, and procedures to prevent post-hoc algorithm or hyperparameter selection is insufficient. Without these details it remains unclear whether the reported differentiation truly supports the framework's claimed rigor and reproducibility.

    Authors: We appreciate this observation. The current manuscript provides only high-level descriptions of the experimental setup. In the revision we will expand the relevant sections to include: (i) precise specifications of the data-generation procedure, temporal or spatial train/validation/test splits, and any noise-injection protocols; (ii) the exact mathematical definitions of all reported metrics; and (iii) a clear statement of the hyperparameter-selection protocol, including whether fixed literature values, grid search with held-out validation, or other safeguards against post-hoc tuning were employed. These additions will make the benchmarking results fully reproducible and better substantiate the framework's claims. revision: yes
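
Items (i) to (iii) in the response above amount to a written specification of data generation, splits, and noise. As one illustration of what such a specification might pin down, the sketch below generates a Lorenz trajectory, splits it in time, and injects seeded Gaussian noise into the training portion only. Everything specific here is an assumption for illustration and is not taken from the paper: the classical parameters (sigma = 10, rho = 28, beta = 8/3), the fixed-step RK4 integrator, the step size, the 80/20 split, the noise level, and all function names.

```python
import numpy as np


def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system (classical parameters, assumed here)."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])


def simulate_lorenz(x0, dt=0.01, n_steps=5000):
    """Integrate with fixed-step fourth-order Runge-Kutta; returns shape (n_steps + 1, 3)."""
    traj = np.empty((n_steps + 1, 3))
    traj[0] = x0
    for k in range(n_steps):
        s = traj[k]
        k1 = lorenz_rhs(s)
        k2 = lorenz_rhs(s + 0.5 * dt * k1)
        k3 = lorenz_rhs(s + 0.5 * dt * k2)
        k4 = lorenz_rhs(s + dt * k3)
        traj[k + 1] = s + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return traj


def temporal_split(traj, train_fraction=0.8):
    """Split along time so the test segment lies strictly after the training segment."""
    n_train = int(train_fraction * len(traj))
    return traj[:n_train], traj[n_train:]


def add_noise(data, noise_level=0.05, seed=0):
    """Add zero-mean Gaussian noise scaled by each coordinate's standard deviation."""
    rng = np.random.default_rng(seed)
    return data + noise_level * data.std(axis=0) * rng.standard_normal(data.shape)


trajectory = simulate_lorenz(np.array([1.0, 1.0, 1.0]))
train_clean, test_clean = temporal_split(trajectory)
train_noisy = add_noise(train_clean)  # the test segment is left clean for scoring
```

Fixing the seed and stating the split rule in code is one way to make the noise-injection and split protocol reproducible by third parties; the actual protocol is whatever the revised manuscript specifies.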

Circularity Check

0 steps flagged

No circularity: methodological proposal with independent benchmarks

full rationale

The paper proposes a Common Task Framework for evaluating scientific ML algorithms on curated datasets and metrics, with illustrative benchmarks on Kuramoto-Sivashinsky and Lorenz systems plus a planned SST competition. No equations, derivations, or predictions are present that reduce to self-defined quantities, fitted inputs renamed as outputs, or self-citation chains. The central claim is the framework itself as a new standard, inspired by external CTFs in NLP and CV rather than prior author work; benchmarks serve to demonstrate utility without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The proposal rests on the domain assumption that standardized benchmarks will improve scientific progress and on the implicit premise that the chosen example systems and metrics capture the relevant difficulties in scientific ML. No free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: Standardized, objective benchmarks reduce reporting bias and improve reproducibility in scientific machine learning.
    Invoked in the motivation section of the abstract as the core justification for the CTF.

pith-pipeline@v0.9.0 · 5807 in / 1324 out tokens · 30009 ms · 2026-05-18T03:54:02.875624+00:00 · methodology


Reference graph

Works this paper leans on

83 extracted references · 83 canonical work pages · 1 internal anchor

  1. [1]

    Francesco Andreuzzi, Nicola Demo, and Gianluigi Rozza. A Dynamic Mode Decomposition Extension for the Forecasting of Parametric Dynamical Systems. SIAM Journal on Applied Dynamical Systems, 22(3):2432–2458, September 2023

  2. [2]

    Francesco Andreuzzi, Nicola Demo, and Gianluigi Rozza. A dynamic mode decomposition extension for the forecasting of parametric dynamical systems. SIAM Journal on Applied Dynamical Systems, 22(3):2432–2458, 2023

  3. [3]

    Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor, Jasper Zschiegner, Danielle C. Maddix, Hao Wang, Michael W. Mahoney, Kari Torkkola, Andrew Gordon Wilson, Michael Bohlke-Schneider, and Yuyang Wang. Chronos: Learning the languag...

  4. [4]

    Travis Askham and J Nathan Kutz. Variable projection methods for an optimized dynamic mode decomposition. SIAM Journal on Applied Dynamical Systems, 17(1):380–416, 2018

  5. [5]

    Travis Askham and J. Nathan Kutz. Variable projection methods for an optimized dynamic mode decomposition. SIAM Journal on Applied Dynamical Systems, 17(1):380–416, 2018

  6. [6]

    S. L. Brunton and J. N. Kutz. Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge University Press, USA, 2nd edition, 2022

  7. [7]

    Steven L. Brunton, Marko Budišić, Eurika Kaiser, and J. Nathan Kutz. Modern Koopman Theory for Dynamical Systems. SIAM Review, 64(2):229–340, 2022

  8. [8]

    Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 113(15):3932–3937, 2016

  9. [9]

    Priyanshu Burark, Karn Tiwari, Meer Mehran Rashid, Prathosh AP, and NM Anoop Krishnan. Codbench: a critical evaluation of data-driven models for continuous dynamical systems. Digital Discovery, 3(6):1172–1181, 2024

  10. [10]

    Kathleen Champion, Bethany Lusch, J. Nathan Kutz, and Steven L. Brunton. Data-driven discovery of coordinates and governing equations. Proceedings of the National Academy of Sciences, 116(45):22445–22451, 2019

  11. [12]

    Biao Chen, Zheng Sheng, and Fei Cui. Refined short-term forecasting atmospheric temperature profiles in the stratosphere based on operators learning of neural networks. Earth and Space Science, 11(4):e2024EA003509, 2024

  12. [13]

    Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud. Neural ordinary differential equations. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, pages 6572–6583, Red Hook, NY, USA, 2018. Curran Associates Inc

  13. [14]

    Tianping Chen and Hong Chen. Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its applications to dynamic systems. Neural Networks, IEEE Transactions on, pages 911–917, August 1995

  14. [15]

    Zhao Chen, Yang Liu, and Hao Sun. Physics-informed learning of governing equations from scarce data. Nature communications, 12(1):6136, 2021

  15. [16]

    C. Coelho, M. Fernanda P. Costa, and Luis L. Ferrás. Enhancing continuous time series modelling with a latent ode-lstm approach. Applied Mathematics and Computation, 475:128727, 2024

  16. [17]

    Brian de Silva, Kathleen Champion, Markus Quade, Jean-Christophe Loiseau, J. Kutz, and Steven Brunton. Pysindy: A python package for the sparse identification of nonlinear dynamical systems from data. Journal of Open Source Software, 5(49):2104, May 2020

  17. [18]

    Marc Peter Deisenroth and Carl Edward Rasmussen. Pilco: A model-based and data-efficient approach to policy search. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 465–472, 2011

  18. [19]

    Nicola Demo, Marco Tezzele, and Gianluigi Rozza. PyDMD: Python Dynamic Mode Decomposition. Journal of Open Source Software, 3(22):530, 2018. Publisher: The Open Journal

  19. [20]

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009

  20. [21]

    David Donoho. 50 years of data science. Journal of Computational and Graphical Statistics, 26(4):745–766, 2017

  21. [22]

    Sayon Dutta. Reinforcement Learning with TensorFlow. Packt Publishing Ltd, 2018

  22. [23]

    Farbod Faraji, Maryam Reza, Aaron Knoll, and J. Nathan Kutz. Data-driven local operator finding for reduced-order modelling of plasma systems: II. Application to parametric dynamics, 2024. eprint: 2403.01532

  23. [24]

    Urban Fasel, J. Nathan Kutz, Bingni W. Brunton, and Steven L. Brunton. Ensemble-sindy: Robust sparse model discovery in the low-data, high-noise limit, with active learning and control. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 478(2260):20210904, 2022

  24. [25]

    Nate Gruver, Marc Finzi, Shikai Qiu, and Andrew G Wilson. Large language models are zero-shot time series forecasters. Advances in Neural Information Processing Systems, 36:19622–19635, 2023

  25. [26]

    Yue Guo, Milan Korda, Ioannis G Kevrekidis, and Qianxiao Li. Learning parametric koopman decompositions for prediction and control. SIAM Journal on Applied Dynamical Systems, 24(1):744–781, 2025

  26. [27]

    Jayesh K Gupta and Johannes Brandstetter. Towards multi-spatiotemporal-scale generalized pde modeling. arXiv preprint arXiv:2209.15616, 2022

  27. [28]

    Junyan He, Shashank Kushwaha, Jaewan Park, Seid Koric, Diab Abueidda, and Iwona Jasiuk. Predictions of transient vector solution fields with sequential deep operator network. Acta Mechanica, 235(8):5257–5272, June 2024

  28. [29]

    Junyan He, Shashank Kushwaha, Jaewan Park, Seid Koric, Diab Abueidda, and Iwona Jasiuk. Sequential deep operator networks (s-deeponet) for predicting full-field solutions under time-dependent loads. Engineering Applications of Artificial Intelligence, 127:107258, January 2024

  29. [30]

    Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997

  30. [31]

    Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, and Frank Hutter. Accurate predictions on small data with a tabular foundation model. Nature, 637(8045):319–326, January 2025

  31. [32]

    Shi Bin Hoo, Samuel Müller, David Salinas, and Frank Hutter. From tables to time: How tabpfn-v2 outperforms specialized time series forecasting models, 2025

  32. [33]

    Sara M. Ichinaga, Francesco Andreuzzi, Nicola Demo, Marco Tezzele, Karl Lapo, Gianluigi Rozza, Steven L. Brunton, and J. Nathan Kutz. PyDMD: A Python package for robust dynamic mode decomposition, 2024. eprint: 2402.07463

  33. [34]

    J. Nathan Kutz, Steven L. Brunton, Bingni W. Brunton, and Joshua L. Proctor. Dynamic Mode Decomposition. Society for Industrial and Applied Mathematics (SIAM), 2016

  34. [35]

    Herbert Jaeger. The "echo state" approach to analysing and training recurrent neural networks – with an Erratum note. GMD Report 148, 2001

  35. [36]

    Herbert Jaeger. The ‘echo state’ approach to analyzing and training recurrent neural networks. Technical Report GMD 148, German National Research Center for Information Technology, 2001

  36. [37]

    Andrei Nikolaevich Kolmogorov. On the representations of continuous functions of many variables by superposition of continuous functions of one variable and addition. In Dokl. Akad. Nauk USSR, volume 114, pages 953–956, 1957

  37. [38]

    Katiana Kontolati, Somdatta Goswami, George Em Karniadakis, and Michael D. Shields. Learning nonlinear operators in latent spaces for real-time predictions of complex dynamics in physical systems. Nature Communications, 15(1), June 2024

  38. [39]

    Bernard O Koopman. Hamiltonian systems and transformation in Hilbert space. Proceedings of the National Academy of Sciences, 17(5):315–318, 1931

  39. [40]

    Bernard O Koopman and J von Neumann. Dynamical systems of continuous spectra. Proceedings of the National Academy of Sciences, 18(3):255–263, 1932

  40. [41]

    Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012

  41. [42]

    Andriy Kryshtafovych, Torsten Schwede, Maya Topf, Krzysztof Fidelis, and John Moult. Critical assessment of methods of protein structure prediction (casp): round xv. Proteins: Structure, Function, and Bioinformatics, 91(12):1539–1549, 2023

  42. [43]

    J. Nathan Kutz, Peter Battaglia, Michael Brenner, Kevin Carlberg, Aric Hagberg, Shirley Ho, Stephan Hoyer, Henning Lange, Hod Lipson, Michael W. Mahoney, Frank Noe, Max Welling, Laure Zanna, Francis Zhu, and Steven L. Brunton. Accelerating scientific discovery with the common task framework, 2025

  43. [44]

    Isaac E Lagaris, Aristidis Likas, and Dimitrios I Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. IEEE Transactions on Neural Networks, 9(5):987–1000, 1998

  44. [45]

    Jeffrey Lai, Anthony Bao, and William Gilpin. Panda: A pretrained forecast model for chaotic dynamics. arXiv preprint arXiv:2505.13755, 2025

  45. [47]

    Soledad Le Clainche and José M. Vega. Higher order dynamic mode decomposition. SIAM Journal on Applied Dynamical Systems, 16(2):882–925, 2017

  46. [48]

    Liam Li, Kevin Jamieson, Afshin Rostamizadeh, Ekaterina Gonina, Moritz Hardt, Benjamin Recht, and Ameet Talwalkar. A system for massively parallel hyperparameter tuning, 2020

  47. [49]

    Zongyi Li, Nikola Borislavov Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. In International Conference on Learning Representations, 2021

  48. [50]

    Richard Liaw, Eric Liang, Robert Nishihara, Philipp Moritz, Joseph E. Gonzalez, and Ion Stoica. Tune: A research platform for distributed model selection and training, 2018

  49. [51]

    Guang Lin, Christian Moya, and Zecheng Zhang. Learning the dynamical response of nonlinear non-autonomous dynamical systems with deep operator neural networks. Engineering Applications of Artificial Intelligence, 125:106689, October 2023

  50. [52]

    Xu Liu, Juncheng Liu, Gerald Woo, Taha Aksu, Yuxuan Liang, Roger Zimmermann, Chenghao Liu, Silvio Savarese, Caiming Xiong, and Doyen Sahoo. Moirai-moe: Empowering time series foundation models with sparse mixture of experts, 2024

  51. [53]

    Yong Liu, Guo Qin, Zhiyuan Shi, Zhi Chen, Caiyin Yang, Xiangdong Huang, Jianmin Wang, and Mingsheng Long. Sundial: A family of highly capable time series foundation models, 2025

  52. [54]

    Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljačić, Thomas Y Hou, and Max Tegmark. Kan: Kolmogorov-arnold networks. arXiv preprint arXiv:2404.19756, 2024

  53. [55]

    Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljačić, Thomas Y. Hou, and Max Tegmark. Kan: Kolmogorov-arnold networks, 2025

  54. [56]

    Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning nonlinear operators via deeponet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3):218–229, 2021

  55. [57]

    Lu Lu, Xuhui Meng, Shengze Cai, Zhiping Mao, Somdatta Goswami, Zhongqiang Zhang, and George Em Karniadakis. A comprehensive and fair comparison of two neural operators (with practical extensions) based on fair data. Computer Methods in Applied Mechanics and Engineering, 393:114778, 2022

  56. [58]

    Lu Lu, Xuhui Meng, Zhiping Mao, and George Em Karniadakis. DeepXDE: A deep learning library for solving differential equations. SIAM Review, 63(1):208–228, 2021

  57. [59]

    Lu Lu, Xuhui Meng, Zhiping Mao, and George Em Karniadakis. Lorenz inverse problem example – deepxde documentation. https://deepxde.readthedocs.io/en/latest/demos/pinn_inverse/lorenz.inverse.html, 2021

  58. [60]

    Wolfgang Maass and Henry Markram. On the computational power of circuits of spiking neurons. Journal of Computer and System Sciences, 69(4):593–616, December 2004

  59. [61]

    Wolfgang Maass, Thomas Natschläger, and Henry Markram. Real-Time Computing Without Stable States: A New Framework for Neural Computation Based on Perturbations. Neural Computation, 14(11):2531–2560, November 2002

  60. [62]

    Nick McGreivy and Ammar Hakim. Weak baselines and reporting biases lead to overoptimism in machine learning for fluid-related partial differential equations. Nature Machine Intelligence, 6(10):1256–1269, Oct 2024

  61. [63]

    Katarzyna Michałowska, Somdatta Goswami, George Em Karniadakis, and Signe Riemer-Sørensen. Neural operator learning for long-time integration in dynamical systems with recurrent neural networks. In 2024 International Joint Conference on Neural Networks (IJCNN), pages 1–8, 2024

  62. [64]

    Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015

  63. [65]

    Ruben Ohana, Michael McCabe, Lucas Meyer, Rudy Morel, Fruzsina Agocs, Miguel Beneitez, Marsha Berger, Blakesly Burkhart, Stuart Dalziel, Drummond Fielding, et al. The well: a large-scale collection of diverse physics simulations for machine learning. Advances in Neural Information Processing Systems, 37:44989–45037, 2024

  64. [66]

    Shaowu Pan, Eurika Kaiser, Brian M. de Silva, J. Nathan Kutz, and Steven L. Brunton. Pykoopman documentation. https://pykoopman.readthedocs.io/en/, 2023. Accessed: 2025-05-13

  65. [67]

    Shaowu Pan, Eurika Kaiser, Brian M. de Silva, J. Nathan Kutz, and Steven L. Brunton. PyKoopman: A Python Package for Data-Driven Approximation of the Koopman Operator. Journal of Open Source Software, 9(94):5881, 2024

  66. [68]

    Jaideep Pathak, Brian Hunt, Michelle Girvan, Zhixin Lu, and Edward Ott. Model-Free Prediction of Large Spatiotemporally Chaotic Systems from Data: A Reservoir Computing Approach. Physical Review Letters, 120(2):024102, January 2018. Publisher: American Physical Society

  67. [69]

    Jason A. Platt, Stephen G. Penny, Timothy A. Smith, Tse-Chun Chen, and Henry D. I. Abarbanel. A systematic exploration of reservoir computing for forecasting complex spatiotemporal dynamics. Neural Networks, 153:530–552, September 2022

  68. [70]

    Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019

  69. [71]

    Sudharsan Ravichandiran. Hands-On Reinforcement Learning with Python. Packt Publishing Ltd, 2018

  70. [72]

    Clarence W. Rowley, Igor Mezić, Shervin Bagheri, Philipp Schlatter, and Dan S. Henningson. Spectral analysis of nonlinear flows. Journal of Fluid Mechanics, 641:115–127, December 2009. Publisher: Cambridge University Press

  71. [73]

    Diya Sashidhar and J. Nathan Kutz. Bagging, optimized dynamic mode decomposition for robust, stable forecasting with spatial and temporal uncertainty quantification. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 380(2229):20210199, 2022

  72. [74]

    P. J. Schmid. Dynamic Mode Decomposition of numerical and experimental data. Journal of Fluid Mechanics, 656:5–28, 2010. Publisher: Cambridge University Press

  73. [75]

    David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, et al. A general reinforcement learning algorithm that masters chess, shogi, and go through self-play. Science, 362(6419):1140–1144, 2018

  74. [76]

    Makoto Takamoto, Timothy Praditia, Raphael Leiteritz, Daniel MacKinlay, Francesco Alesiani, Dirk Pflüger, and Mathias Niepert. Pdebench: An extensive benchmark for scientific machine learning. Advances in Neural Information Processing Systems, 35:1596–1611, 2022

  75. [77]

    Gouhei Tanaka, Toshiyuki Yamane, Jean Benoit Héroux, Ryosho Nakane, Naoki Kanazawa, Seiji Takeda, Hidetoshi Numata, Daiju Nakano, and Akira Hirose. Recent advances in physical reservoir computing: A review. Neural Networks, 115:100–123, July 2019

  76. [78]

    Emanuel Todorov, Tom Erez, and Yuval Tassa. Mujoco: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026–5033. IEEE, 2012

  77. [79]

    Oriol Vinyals, Igor Babuschkin, Wojciech M Czarnecki, Michaël Mathieu, et al. Grandmaster level in starcraft ii using multi-agent reinforcement learning. Nature, 575(7782):350–354, 2019

  78. [80]

    Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, C J Carey, İlhan Polat, Yu Feng, Eric W. M...

  79. [81]

    P. R. Vlachas, J. Pathak, B. R. Hunt, T. P. Sapsis, M. Girvan, E. Ott, and P. Koumoutsakos. Backpropagation algorithms and Reservoir Computing in Recurrent Neural Networks for the forecasting of complex spatiotemporal dynamics. Neural Networks: The Official Journal of the International Neural Network Society, 126:191–217, June 2020

  80. [82]

    Sifan Wang, Mohamed Aziz Bhouri, and Paris Perdikaris. Fast pde-constrained optimization via self-supervised operator learning, 2021

Showing first 80 references.