
arxiv: 2606.05371 · v1 · pith:Q7T5QZSXnew · submitted 2026-06-03 · 💻 cs.LG · cs.NA · math.NA · stat.ML

Mamba-Assisted Non-Markovian Closure for Reduced-Order Modeling

Pith reviewed 2026-06-28 07:14 UTC · model grok-4.3

classification 💻 cs.LG · cs.NA · math.NA · stat.ML
keywords: reduced-order modeling · non-Markovian closure · Mamba · sequence modeling · Mori-Zwanzig formalism · Burgers equation · Lorenz 96 · closure modeling

The pith

Mamba sequence models learn non-Markovian closure terms to stabilize reduced-order simulations of chaotic systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper recasts the memory-dependent closure term from the Mori-Zwanzig formalism as a sequence modeling task, where a Mamba state-space model is trained to predict the unresolved effects from the history of resolved variables. This learned closure is then inserted back into the reduced governing equations and advanced with a numerical integrator. The dual representation lets the model train efficiently on full trajectories in convolutional mode and run at constant cost per step in recurrent mode. Tests on the viscous Burgers equation and the two-scale Lorenz 96 system show the resulting simulations remain accurate and stable over longer times than Markovian closures, GRU sequence models, or the Wilks method.

Core claim

The Mamba-Assisted Closure framework trains a Mamba model on resolved trajectories to predict the non-Markovian closure term, then couples those predictions into the reduced-order equations via numerical integration; the convolutional form enables efficient long-trajectory training while the recurrent form supports stable autoregressive rollout.

What carries the argument

Mamba state-space model that maps resolved trajectories to the closure term, using its convolutional representation for training and recurrent representation for inference.
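
The dual representation can be illustrated on a much simpler model than Mamba. The sketch below uses a time-invariant, diagonal linear state-space layer (an assumption made here for clarity; Mamba's selective, input-dependent parameters are not modeled) and checks that the step-by-step recurrence and the whole-sequence convolution produce the same output. All names and parameter values are hypothetical.

```python
import numpy as np

def ssm_recurrent(a, b, c, u):
    """Step-by-step mode: h_t = a * h_{t-1} + b * u_t, y_t = c . h_t (diagonal a)."""
    h = np.zeros_like(a)
    y = np.empty(len(u))
    for t, u_t in enumerate(u):
        h = a * h + b * u_t          # constant work per step, independent of t
        y[t] = c @ h
    return y

def ssm_convolutional(a, b, c, u):
    """Whole-sequence mode: y = kernel * u with kernel_j = sum_i c_i * a_i**j * b_i."""
    T = len(u)
    powers = a[None, :] ** np.arange(T)[:, None]   # shape (T, n): a_i ** j
    kernel = powers @ (c * b)                      # shape (T,)
    return np.convolve(u, kernel)[:T]              # keep the causal part

rng = np.random.default_rng(0)
n, T = 4, 64
a = rng.uniform(0.5, 0.95, n)   # stable diagonal state matrix (illustrative values)
b = rng.normal(size=n)
c = rng.normal(size=n)
u = rng.normal(size=T)

y_rec = ssm_recurrent(a, b, c, u)
y_conv = ssm_convolutional(a, b, c, u)
max_gap = float(np.max(np.abs(y_rec - y_conv)))
```

The recurrent form only ever touches the current hidden state, which is why per-step cost stays constant during rollout; the convolutional form processes a full trajectory in one pass, which is what makes long-sequence training convenient.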

If this is right

  • Higher predictive accuracy than Markovian reduced-order models on both Burgers and Lorenz 96 test cases.
  • Longer stable rollouts than GRU-based sequence models and the Wilks method.
  • Constant per-step cost during deployment because of the recurrent inference mode.
  • Efficient training on extended trajectories via the convolutional training mode.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The framework could be applied to other systems with strong memory effects such as turbulent flows or climate models.
  • Coarser spatial discretizations might become viable if the learned closure compensates for the missing scales.
  • Hybrid models combining Mamba closures with existing physics-based reduced equations could be tested on real observational data.

Load-bearing premise

A Mamba model trained on resolved trajectories can accurately predict the non-Markovian closure term when its outputs are fed into the numerical integration of the reduced equations.

What would settle it

Integrate the MAC closure into the two-scale Lorenz 96 reduced equations and check whether the long-time error or statistical divergence exceeds that of the GRU baseline after the reported stable horizons.
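
One way to make this check concrete is a running cumulative relative L2 error between a predicted and a reference trajectory. The definition below (cumulative squared error divided by cumulative squared reference norm, then a square root) is an assumption for illustration; the paper's exact definition is not reproduced on this page.

```python
import numpy as np

def running_cumulative_rel_l2(pred, true):
    """pred, true: arrays of shape (T, N). Returns a length-T error curve.

    Assumed definition: sqrt( sum_{j<=i} |pred_j - true_j|^2 / sum_{j<=i} |true_j|^2 ).
    The reference trajectory must be nonzero at the first time step.
    """
    num = np.cumsum(np.sum((pred - true) ** 2, axis=1))
    den = np.cumsum(np.sum(true ** 2, axis=1))
    return np.sqrt(num / den)

# Synthetic check: a prediction that is uniformly 10% too large.
t = np.linspace(0.0, 20.0, 201)
true = 2.0 + np.cos(t)[:, None] * np.ones((1, 3))
err_biased = running_cumulative_rel_l2(1.1 * true, true)
err_exact = running_cumulative_rel_l2(true, true)
```

Evaluating this curve for the MAC and GRU rollouts on the same reference trajectory, beyond the reported horizons, would show directly whether one model's error overtakes the other's.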

Figures

Figures reproduced from arXiv: 2606.05371 by Panos Stinis, Saad Qadeer, Zhi-Feng Wei.

Figure 1. Schematic illustration of the autoregressive rollout procedure for reduced-order model evolution with predicted closure terms. At each step, the sequence model predicts the closure term $C_t$ from the current resolved state $\hat{\phi}_t$ (red arrows). The resolved state is then advanced by one integrator step (RK4) that takes both $\hat{\phi}_t$ and $C_t$ as inputs to produce $\hat{\phi}_{t+1}$ (gray arrows). See Appendix B.8 for details of the …
Figure 2. Comparison of resolved-mode predictions from the MAC model and the Markovian reduced-order model on one representative test initial condition over the temporal interpolation regime [0, 1]. For each resolved Fourier mode, the relative $L^2$ error over the time interval [0, 1] is also reported.
… closely than the Markovian model, especially for higher-frequency modes. The mean relative $L^2$ error for each Fourie…
Figure 3. Comparison of resolved-mode predictions from the MAC model and the Markovian reduced-order model on the same representative test initial condition over the temporal extrapolation regime [0, 2]. For each resolved Fourier mode, the relative $L^2$ errors over the intervals [0, 1], [1, 2], and [0, 2] are also reported.
Figure 4. Evolution of the physical-space $L^2$ error over the temporal extrapolation interval [0, 2] for the same representative test initial condition, comparing the MAC model and the Markovian reduced-order model.
… even-mode Fourier pattern that is entirely absent from the training data, providing a strong structural distribution shift in addition to the substantial amplitude extrapolation …
Figure 5. Evolution of the physical-space $L^2$ error for the out-of-distribution initial condition $u_0(x) = \sin x$ over the long-time rollout interval [0, 20], comparing the MAC model, the Markovian reduced-order model, and the linear and cubic memory models in [Qadeer et al. 2025].
Figure 6. Evolution of the physical-space $L^2$ error for the out-of-distribution initial condition $u_0(x) = e^{\sin x}$ over the long-time rollout interval [0, 20], comparing the MAC model, the Markovian reduced-order model, and the linear and cubic memory models in [Qadeer et al. 2025].
Figure 7. Evolution of the physical-space $L^2$ error for the out-of-distribution initial condition $u_0(x) = \cos(2 \sin x)$ over the long-time rollout interval [0, 20], comparing the MAC model, the Markovian reduced-order model, and the linear and cubic memory models in [Qadeer et al. 2025].
… structured initial condition, the MAC model still substantially outperforms the Markovian reduced-order model and remains comparabl…
Figure 8. Evolution of the resolved energy over the temporal extrapolation interval [0, 2] for the same representative test initial condition, comparing the MAC model and the Markovian reduced-order model.
Finally, we investigate the computational scalability of the proposed MAC model in both training and inference. We note that the autoregressive rollouts reported in the previous experiments are performed in parall…
Figure 9. Predicted trajectories of the resolved slow variables compared against the true resolved trajectories extracted from the full Lorenz ’96 simulation on the time interval [0, 20].
Figure 10. Evolution of the running cumulative relative $L^2$ error over the temporal interpolation regime [0, 20] for the Lorenz ’96 system, comparing the MAC model, the GRU-based model, and the Wilks method.
We next examine the running cumulative correlation coefficient between the predicted and true resolved trajectories. The running cumulative correlation coefficient at time $t_i$ is defined by $\mathrm{Corr}(t_i) = \sum_{j=0}^{i}$ …
Figure 11. Evolution of the running cumulative correlation coefficient between the predicted and true resolved variables over the temporal interpolation regime [0, 20] for the Lorenz ’96 system, comparing the MAC model, the GRU-based model, and the Wilks method.
… time lag $\tau$, we define the uncentered temporal autocorrelation function by $\mathrm{ACF}(\tau) = \frac{\sum_{t=0}^{T-\tau} \sum_{k=1}^{N} U_k(t)\, U_k(t+\tau)}{\sum_{t=0}^{T} \sum_{k=1}^{N} U_k(t)^2}$, where T = 2001…
Figure 12. Temporal autocorrelation function of the generated closure trajectories over the temporal interpolation regime [0, 20] for the Lorenz ’96 system, comparing the MAC model, the GRU-based model, the Wilks method, and the true dynamics.
Figure 13. Comparison of resolved slow-variable predictions from the MAC model, the GRU-based model, and the Wilks method over the temporal extrapolation regime [120, 140] (displayed as [0, 20]). For each resolved slow variable, the relative $L^2$ error is also reported.
Figure 14. Evolution of the running cumulative relative $L^2$ error over the temporal extrapolation regime [120, 140] (displayed as [0, 20]) for the Lorenz ’96 system, comparing the MAC model, the GRU-based model, and the Wilks method.
Figure 15. Evolution of the running cumulative correlation coefficient between the predicted and true resolved variables over the temporal extrapolation regime [120, 140] (displayed as [0, 20]) for the Lorenz ’96 system, comparing the MAC model, the GRU-based model, and the Wilks method.
Figure 16. Temporal autocorrelation function of the generated closure trajectories over the temporal extrapolation regime [120, 140] (displayed as [0, 20]) for the Lorenz ’96 system, comparing the MAC model, the GRU-based model, the Wilks method, and the true dynamics.
Figure 17. Comparison of resolved slow-variable predictions from the MAC model, the GRU-based model, and the Wilks method for one representative unseen initial condition over the time interval [0, 20]. For each resolved slow variable, the relative $L^2$ error is also reported.
Figure 18. Evolution of the running cumulative relative $L^2$ error averaged over 100 unseen-initial-condition test trajectories over the time interval [0, 20] for the Lorenz ’96 system, comparing the MAC model, the GRU-based model, and the Wilks method.
Figure 19. Evolution of the running cumulative correlation coefficient between the predicted and true resolved variables, averaged over 100 unseen-initial-condition test trajectories over the time interval [0, 20] for the Lorenz ’96 system, comparing the MAC model, the GRU-based model, and the Wilks method.
Figure 20. Temporal autocorrelation function of the generated closure trajectories, averaged over 100 unseen-initial-condition test trajectories over the time interval [0, 20] for the Lorenz ’96 system, comparing the MAC model, the GRU-based model, the Wilks method, and the true dynamics.
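
The uncentered temporal autocorrelation function quoted in the text accompanying Figure 11 can be sketched in NumPy as follows. The array layout (time along the first axis, the N closure components along the second) and the zero-based indexing of the sums are assumptions made here; the page truncates the paper's statement of T.

```python
import numpy as np

def uncentered_acf(U, max_lag):
    """Uncentered ACF of a closure trajectory U with shape (n_t, N).

    ACF(lag) = sum_t sum_k U_k(t) U_k(t + lag) / sum_t sum_k U_k(t)^2,
    where the numerator runs over all t for which t + lag is in range.
    """
    n_t = U.shape[0]
    denom = np.sum(U ** 2)
    acf = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        acf[lag] = np.sum(U[: n_t - lag] * U[lag:]) / denom
    return acf

# Sanity check on a constant signal: ACF(lag) = (n_t - lag) / n_t.
acf_const = uncentered_acf(np.ones((10, 3)), max_lag=4)
```

Because the signal is not mean-centered, a trajectory with a large constant offset yields values close to 1 at small lags, which is consistent with the narrow vertical range such a plot would show.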
Original abstract

Reduced-order modeling of high-dimensional dynamical systems is often hindered by the non-Markovian closure term that represents the effect of unresolved variables on the resolved dynamics. Inspired by the Mori-Zwanzig formalism, in which the closure takes the form of a memory functional of the resolved trajectory, we recast closure modeling as a sequence modeling problem and propose the Mamba-Assisted Closure (MAC) framework: a Mamba-based sequence model, trained to predict the closure from the resolved trajectory, is coupled with the reduced-order governing equations through a numerical integrator to advance the resolved variables in time. A key feature of the framework is its exploitation of the dual representation of state-space models: the model is trained in a sequence-to-sequence fashion via the convolutional form, and deployed for step-by-step autoregressive rollout via the recurrent form, yielding both efficient long-trajectory training and constant per-step inference cost. On the viscous Burgers' equation and the chaotic two-scale Lorenz '96 system, the MAC model substantially outperforms the Markovian reduced-order model, the GRU-based sequence model, and the Wilks method in predictive accuracy and long-time rollout stability.
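
To make "predict the closure from the resolved trajectory" concrete, the sketch below generates (resolved state, closure) pairs from a full-order simulation of the standard two-scale Lorenz '96 system, where the closure is the coupling that the fast variables exert on each slow variable. The parameter values, time step, and initial condition are common illustrative choices and are assumptions here, not the paper's settings.

```python
import numpy as np

K, J = 8, 32                        # slow variables, fast variables per slow one
F, h, b, c = 20.0, 1.0, 10.0, 10.0  # illustrative parameters (assumed)

def closure(Y):
    """Coupling on each slow variable: -(h*c/b) * sum of its J fast variables."""
    return -(h * c / b) * Y.reshape(K, J).sum(axis=1)

def rhs(X, Y):
    """Right-hand side of the two-scale Lorenz '96 system (periodic indices)."""
    dX = -np.roll(X, 1) * (np.roll(X, 2) - np.roll(X, -1)) - X + F + closure(Y)
    dY = (-c * b * np.roll(Y, -1) * (np.roll(Y, -2) - np.roll(Y, 1))
          - c * Y + (h * c / b) * np.repeat(X, J))
    return dX, dY

def rk4_step(X, Y, dt):
    k1x, k1y = rhs(X, Y)
    k2x, k2y = rhs(X + 0.5 * dt * k1x, Y + 0.5 * dt * k1y)
    k3x, k3y = rhs(X + 0.5 * dt * k2x, Y + 0.5 * dt * k2y)
    k4x, k4y = rhs(X + dt * k3x, Y + dt * k3y)
    return (X + dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x),
            Y + dt / 6 * (k1y + 2 * k2y + 2 * k3y + k4y))

def simulate(n_steps, dt=0.001, seed=0):
    """Return resolved states and closure targets, each of shape (n_steps, K)."""
    rng = np.random.default_rng(seed)
    X = 0.5 * F + rng.normal(size=K)
    Y = 0.1 * rng.normal(size=K * J)
    Xs, Cs = np.empty((n_steps, K)), np.empty((n_steps, K))
    for i in range(n_steps):
        Xs[i], Cs[i] = X, closure(Y)   # record the pair before stepping
        X, Y = rk4_step(X, Y, dt)
    return Xs, Cs

resolved, targets = simulate(200)
```

A sequence model would then be fit to map `resolved` to `targets`; the reduced model keeps only the slow equation and replaces `closure(Y)` with the model's prediction.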

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes the Mamba-Assisted Closure (MAC) framework, which recasts the non-Markovian closure term from the Mori-Zwanzig formalism as a sequence modeling task. A Mamba state-space model is trained in sequence-to-sequence (convolutional) mode on resolved trajectories and corresponding closure terms extracted from full-order simulations, then deployed autoregressively (recurrent form) inside a numerical integrator to advance the reduced-order equations. The central empirical claim is that MAC substantially outperforms a Markovian ROM, a GRU-based sequence model, and the Wilks method in predictive accuracy and long-time stability on the viscous Burgers equation and the chaotic two-scale Lorenz '96 system.

Significance. If the performance claims survive rigorous controls for distribution shift and hyperparameter variation, the work would demonstrate a practical way to exploit Mamba's dual convolutional/recurrent representation for efficient, stable non-Markovian closure modeling, addressing a long-standing challenge in reduced-order modeling of high-dimensional chaotic systems.

major comments (2)
  1. [§3.2 and §4] §4 (deployment) and §3.2 (training): the framework trains the Mamba model exclusively on clean full-order trajectories, yet deploys it inside an error-accumulating ROM integration; no analysis or experiments quantify the resulting distribution shift, which directly threatens the claimed long-time rollout stability advantage on the chaotic Lorenz '96 system.
  2. [Results section] Results section (implicitly Tables/Figures reporting Burgers and Lorenz '96 rollouts): the abstract and reader's summary assert outperformance without reported error bars, ablation studies on sequence length or Mamba hyperparameters, or cross-validation across data splits, leaving the quantitative strength of the central claim unverifiable.
minor comments (2)
  1. [§2] Notation for the memory kernel and closure term is introduced without an explicit equation reference tying it back to the Mori-Zwanzig projection; a single clarifying equation would improve readability.
  2. [§3] The description of the dual convolutional/recurrent deployment is clear in principle but would benefit from a small schematic or pseudocode block showing the exact interface between the Mamba output and the numerical integrator.
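
The interface requested in minor comment 2 might look like the following sketch, which follows the Figure 1 description: at each step a recurrent model emits a closure term from the current resolved state, and one RK4 step advances the state using that term. The toy closure class, the choice to hold the closure fixed across the RK4 stages, and the linear test dynamics are all assumptions for illustration; the paper's Appendix B.8 is not available on this page.

```python
import numpy as np

class ToyRecurrentClosure:
    """Hypothetical stand-in for a trained sequence model in recurrent mode."""

    def __init__(self, dim, decay=0.9, gain=-0.1):
        self.h = np.zeros(dim)            # hidden state carried across steps
        self.decay, self.gain = decay, gain

    def step(self, phi):
        # Update the hidden state from the new input, then emit the closure.
        self.h = self.decay * self.h + (1.0 - self.decay) * phi
        return self.gain * self.h

def rk4_step(markov_rhs, phi, closure_term, dt):
    """One RK4 step of d(phi)/dt = markov_rhs(phi) + closure_term (held fixed)."""
    def f(x):
        return markov_rhs(x) + closure_term
    k1 = f(phi)
    k2 = f(phi + 0.5 * dt * k1)
    k3 = f(phi + 0.5 * dt * k2)
    k4 = f(phi + dt * k3)
    return phi + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def rollout(markov_rhs, model, phi0, dt, n_steps):
    traj = np.empty((n_steps + 1, phi0.size))
    phi = phi0.copy()
    traj[0] = phi
    for t in range(n_steps):
        c_t = model.step(phi)                     # model predicts C_t from phi_t
        phi = rk4_step(markov_rhs, phi, c_t, dt)  # integrator produces phi_{t+1}
        traj[t + 1] = phi
    return traj

def linear_decay(x):
    return -x   # placeholder for the Markovian part of the reduced equations

traj = rollout(linear_decay, ToyRecurrentClosure(3), np.ones(3), 0.01, 500)
```

Only the model's hidden state and the current resolved state are carried between steps, so the cost of each step does not grow with the length of the rollout.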

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the presentation of the Mamba-Assisted Closure framework. We address each major comment below and commit to revisions that improve rigor without altering the core claims.

point-by-point responses
  1. Referee: [§3.2 and §4] §4 (deployment) and §3.2 (training): the framework trains the Mamba model exclusively on clean full-order trajectories, yet deploys it inside an error-accumulating ROM integration; no analysis or experiments quantify the resulting distribution shift, which directly threatens the claimed long-time rollout stability advantage on the chaotic Lorenz '96 system.

    Authors: We agree this is a substantive point. The manuscript reports empirical long-time stability on Lorenz '96 but does not explicitly quantify the train-deployment distribution shift. In revision we will add a dedicated analysis (new subsection in §4) that compares input statistics (means, variances, autocorrelation) between training trajectories and those encountered during ROM rollout, together with a controlled experiment injecting small perturbations to the resolved state to measure sensitivity. This directly addresses the concern while preserving the existing results.
    revision: yes

  2. Referee: [Results section] Results section (implicitly Tables/Figures reporting Burgers and Lorenz '96 rollouts): the abstract and reader's summary assert outperformance without reported error bars, ablation studies on sequence length or Mamba hyperparameters, or cross-validation across data splits, leaving the quantitative strength of the central claim unverifiable.

    Authors: We accept that the current results section lacks these elements. In the revised manuscript we will (i) report means and standard deviations over at least five independent training runs with different random seeds, (ii) include ablations on sequence length and Mamba state dimension, and (iii) add results across two distinct data splits (temporal and initial-condition based). These additions will appear as new tables/figures in the Results section and will be summarized in the abstract.
    revision: yes
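
The input-statistics comparison proposed in the first response could be prototyped along these lines. The choice of statistics (per-component mean, variance, and lag-1 autocorrelation) and the max-absolute-gap summary are assumptions made for this sketch, and the data are synthetic.

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 autocorrelation of each column of x (shape (T, N))."""
    xc = x - x.mean(axis=0)
    return np.sum(xc[:-1] * xc[1:], axis=0) / np.sum(xc ** 2, axis=0)

def input_stats(x):
    return {"mean": x.mean(axis=0), "var": x.var(axis=0), "acf1": lag1_autocorr(x)}

def shift_report(train_inputs, rollout_inputs):
    """Largest absolute gap, per statistic, between training and rollout inputs."""
    s_train, s_roll = input_stats(train_inputs), input_stats(rollout_inputs)
    return {k: float(np.max(np.abs(s_train[k] - s_roll[k]))) for k in s_train}

# Synthetic example: rollout inputs that drift by a constant offset of 0.5.
rng = np.random.default_rng(0)
train_inputs = rng.normal(size=(2000, 4))
report_shifted = shift_report(train_inputs, train_inputs + 0.5)
report_same = shift_report(train_inputs, train_inputs)
```

In this synthetic case only the mean gap is nonzero, which shows how the report separates a bias in the rollout states from a change in their variability or temporal structure.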

Circularity Check

0 steps flagged

No significant circularity; empirical framework is self-contained

full rationale

The paper recasts closure modeling as a sequence-to-sequence task and trains a Mamba model on resolved trajectories extracted from full-order simulations, then couples the trained model into the reduced-order equations for rollout. This is a standard data-driven modeling pipeline with no load-bearing derivation that reduces to its own inputs by construction. The dual convolutional/recurrent deployment is a known property of state-space models and does not create self-definition or fitted-input-as-prediction circularity. No self-citation chains, uniqueness theorems, or ansatzes smuggled via prior work are invoked as the central justification. Claims rest on comparative numerical experiments rather than algebraic equivalence to the training procedure.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The claim depends on the existence of a learnable non-Markovian closure (from Mori-Zwanzig) and on the ability of a high-capacity sequence model to approximate it from finite data; both are domain assumptions rather than new axioms.

free parameters (1)
  • Mamba model weights
    All parameters of the sequence model are fitted to simulation data from the full-order system.
axioms (1)
  • domain assumption: The effect of unresolved variables on resolved dynamics can be expressed as a memory functional of the resolved trajectory (Mori-Zwanzig).
    Invoked in the second sentence of the abstract as the foundation for recasting closure as sequence modeling.
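
For orientation, this assumption can be written schematically in the generalized Langevin form associated with the Mori-Zwanzig formalism. The symbols below ($\hat{\phi}$ for the resolved variables, $R$ for the Markovian term, $K$ for the memory kernel, $F$ for the orthogonal-dynamics term, $C$ for the closure) are notation chosen here for illustration and may not match the paper's.

```latex
\frac{d\hat{\phi}}{dt}(t)
  = \underbrace{R\bigl(\hat{\phi}(t)\bigr)}_{\text{Markovian term}}
  + \underbrace{\int_0^t K\bigl(\hat{\phi}(t-s),\, s\bigr)\, ds + F(t)}_{\text{closure } C(t)}
```

In a data-driven setting such as the one described here, the learned sequence model stands in for $C(t)$ as a function of the resolved history $\{\hat{\phi}(s)\}_{0 \le s \le t}$.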



Reference graph

Works this paper leans on

36 extracted references · 30 canonical work pages

  1. Ahmed, Shady E., Suraj Pawar, Omer San, Adil Rasheed, Traian Iliescu and Bernd R. Noack 2021 “On closures for reduced order models: a spectrum of first-principle to machine-learned avenues”, Physics of Fluids, vol. 33, issue 9, article 091301, doi: 10.1063/5.0061577
  2. Benettin, Giancarlo, Luigi Galgani, Antonio Giorgilli and Jean-Marie Strelcyn 1980 “Lyapunov characteristic exponents for smooth dynamical systems and for hamiltonian systems; a method for computing all of them. part 1: theory”, Meccanica, vol. 15, issue 1, pp. 9–20, doi: 10.1007/BF02128236
  3. Bengio, Yoshua, Patrice Simard and Paolo Frasconi 1994 “Learning long-term dependencies with gradient descent is difficult”, IEEE Transactions on Neural Networks, vol. 5, issue 2, pp. 157–166, doi: 10.1109/72.279181
  4. Bhouri, Mohamed Aziz and Pierre Gentine 2023 “Memory-based parameterization with differentiable solver: application to Lorenz ’96”, Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 33, issue 7, article 073116, doi: 10.1063/5.0131929
  5. Cho, Kyunghyun, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk and Yoshua Bengio 2014 “Learning phrase representations using RNN encoder–decoder for statistical machine translation”, in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computationa...
  6. Chorin, Alexandre J., Ole H. Hald and Raz Kupferman 2000 “Optimal prediction and the Mori–Zwanzig representation of irreversible processes”, Proceedings of the National Academy of Sciences, vol. 97, issue 7, pp. 2968–2973, doi: 10.1073/pnas.97.7.2968
  7. Christensen, Hannah and Laure Zanna 2022 “Parametrization in weather and climate models”, in Oxford research encyclopedia of climate science, ed. by Hans von Storch, Oxford University Press, doi: 10.1093/acrefore/9780190228620.013.826
  8. Demaeyer, Jonathan and Stéphane Vannitsem 2017 “Stochastic parameterization of subgrid-scale processes: a review of recent physically based approaches”, in Advances in nonlinear geosciences, ed. by Anastasios A. Tsonis, Springer International Publishing, Cham, pp. 55–85, doi: 10.1007/978-3-319-58895-7_3
  9. Durbin, Paul A. 2018 “Some recent developments in turbulence closure modeling”, Annual Review of Fluid Mechanics, vol. 50, issue 1, pp. 77–103, doi: 10.1146/annurev-fluid-122316-045020
  10. Fatkullin, Ibrahim and Eric Vanden-Eijnden 2004 “A computational strategy for multiscale systems with applications to Lorenz 96 model”, Journal of Computational Physics, vol. 200, issue 2, pp. 605–638, doi: 10.1016/j.jcp.2004.04.013
  11. Frezat, Hugo, Ronan Fablet, Guillaume Balarac and Julien Le Sommer 2023 Gradient-free online learning of subgrid-scale dynamics with neural emulators, arXiv: 2310.19385
  12. Gu, Albert and Tri Dao 2023 Mamba: linear-time sequence modeling with selective state spaces, arXiv: 2312.00752
  13. Gu, Albert, Karan Goel and Christopher Ré 2021 Efficiently modeling long sequences with structured state spaces, arXiv: 2111.00396
  14. Gu, Albert, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra and Christopher Ré 2021 “Combining recurrent, convolutional, and continuous-time models with linear state-space layers”, in Proceedings of the 35th International Conference on Neural Information Processing Systems, Curran Associates Inc., pp. 572–585, url: https://dl.acm.org/doi/10.5...
  15. Gupta, Abhinav and Pierre F. J. Lermusiaux 2021 “Neural closure models for dynamical systems”, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 477, issue 2252, article 20201004, doi: 10.1098/rspa.2020.1004
  16. Hu, Zheyuan, Qianying Cao, Kenji Kawaguchi and George Em Karniadakis 2025 “Deepomamba: state-space model for spatio-temporal PDE neural operator learning”, Journal of Computational Physics, vol. 540, article 114272, doi: https://doi.org/10.1016/j.jcp.2025.114272
  17. Hu, Zheyuan, Nazanin Ahmadi Daryakenari, Qianli Shen, Kenji Kawaguchi and George Em Karniadakis 2026 “State-space models are accurate and efficient neural operators for dynamical systems”, Neural Networks, vol. 197, article 108496, doi: https://doi.org/10.1016/j.neunet.2025.108496
  18. Kochkov, Dmitrii, Jamie A. Smith, Ayya Alieva, Qing Wang, Michael P. Brenner and Stephan Hoyer 2021 “Machine learning–accelerated computational fluid dynamics”, Proceedings of the National Academy of Sciences, vol. 118, issue 21, article e2101784118, doi: 10.1073/pnas.2101784118
  19. Lorenz, Edward N 1996 “Predictability: a problem partly solved”, in Proc. Seminar on Predictability, Reading, pp. 1–18, url: https://www.ecmwf.int/en/elibrary/75462-predictability-problem-partly-solved
  20. Ma, Chao, Jianchun Wang and Weinan E 2019 “Model reduction with memory and the machine learning of dynamical systems”, Communications in Computational Physics, vol. 25, issue 4, pp. 947–962, doi: 10.4208/cicp.OA-2018-0269
  21. Maulik, Romit, Arvind Mohan, Bethany Lusch, Sandeep Madireddy, Prasanna Balaprakash and Daniel Livescu 2020 “Time-series learning of latent-space dynamics for reduced-order model closure”, Physica D: Nonlinear Phenomena, vol. 405, article 132368, doi: https://doi.org/10.1016/j.physd.2020.132368
  22. Mori, Hazime 1965 “Transport, collective motion, and Brownian motion”, Progress of Theoretical Physics, vol. 33, issue 3, pp. 423–455, doi: 10.1143/ptp.33.423
  23. Parish, Eric J. and Karthik Duraisamy 2017 “Non-markovian closure models for large eddy simulations using the Mori-Zwanzig formalism”, Physical Review Fluids, vol. 2, issue 1, article 014604, doi: 10.1103/PhysRevFluids.2.014604
  24. Pascanu, Razvan, Tomas Mikolov and Yoshua Bengio 2013 “On the difficulty of training recurrent neural networks”, in Proceedings of the 30th International Conference on Machine Learning, JMLR.org, pp. III-1310–III-1318, url: https://dl.acm.org/doi/10.5555/3042817.3043083
  25. Qadeer, Saad, Panos Stinis and Hui Wan 2025 Stabilizing PDE–ML coupled systems, arXiv: 2506.19274
  26. Sanderse, Benjamin, Panos Stinis, Romit Maulik and Shady E. Ahmed 2025 “Scientific machine learning for closure models in multiscale problems: a review”, Foundations of Data Science, vol. 7, issue 1, pp. 298–337, doi: 10.3934/fods.2024043
  27. Saunders, Marissa G. and Gregory A. Voth 2013 “Coarse-graining methods for computational biology”, Annual Review Biophysics, vol. 42, pp. 73–93, doi: 10.1146/annurev-biophys-083012-130348
  28. Um, Kiwon, Robert Brand, Yun Fei, Philipp Holl and Nils Thuerey 2020 “Solver-in-the-loop: learning from differentiable physics to interact with iterative PDE-solvers”, in Proceedings of the 34th International Conference on Neural Information Processing Systems, Curran Associates Inc., pp. 6111–6122, url: https://dl.acm.org/doi/10.5555/3495724.3496237...
  29. Rottler, Alexander Shluger, Ryan B. Sills, Ingo Steinbach, Alejandro Strachan and Ellad B. Tadmor 2020 “Roadmap on multiscale materials modeling”, Modelling and Simulation in Materials Science and Engineering, vol. 28, issue 4, article 043001, doi: 10.1088/1361-651X/ab7150
  30. Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser and Illia Polosukhin 2017 “Attention is all you need”, in Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, Curran Associates, Inc., pp. 6000–6010, url: https://dl.acm.org/doi/10.5555/3295222.3295349...
  31. Wang, Qian, Nicolò Ripamonti and Jan S. Hesthaven 2020 “Recurrent neural network closure of parametric pod-Galerkin reduced-order models based on the Mori-Zwanzig formalism”, Journal of Computational Physics, vol. 410, article 109402, doi: https://doi.org/10.1016/j.jcp.2020.109402
  32. Wilks, Daniel S. 2005 “Effects of stochastic parametrizations in the Lorenz ’96 system”, Quarterly Journal of the Royal Meteorological Society, vol. 131, issue 606, pp. 389–407, doi: 10.1256/qj.04.03
  33. Williams, Paul D 2005 “Modelling climate change: the role of unresolved processes”, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 363, issue 1837, pp. 2931–2946, doi: 10.1098/rsta.2005.1676
  34. Xue, Tingkai, Chin Chun Ooi, Zhengwei Ge, Fong Yew Leong, Hongying Li and Chang Wei Kang 2025 Differentiable physics-neural models enable learning of non-Markovian closures for accelerated coarse-grained physics simulations, arXiv: 2511.21369
  35. Zhu, Yuanran, Jason M. Dominy and Daniele Venturi 2018 “On the estimation of the Mori-Zwanzig memory integral”, Journal of Mathematical Physics, vol. 59, issue 10, article 103501, doi: 10.1063/1.5003467
  36. Zwanzig, Robert 1973 “Nonlinear generalized Langevin equations”, Journal of Statistical Physics, vol. 9, issue 3, pp. 215–220, doi: 10.1007/bf01008729