Mamba-Assisted Non-Markovian Closure for Reduced-Order Modeling

Panos Stinis; Saad Qadeer; Zhi-Feng Wei

arxiv: 2606.05371 · v1 · pith:Q7T5QZSXnew · submitted 2026-06-03 · 💻 cs.LG · cs.NA· math.NA· stat.ML

Mamba-Assisted Non-Markovian Closure for Reduced-Order Modeling

Zhi-Feng Wei , Saad Qadeer , Panos Stinis This is my paper

Pith reviewed 2026-06-28 07:14 UTC · model grok-4.3

classification 💻 cs.LG cs.NAmath.NAstat.ML

keywords reduced-order modelingnon-Markovian closureMambasequence modelingMori-Zwanzig formalismBurgers equationLorenz 96closure modeling

0 comments

The pith

Mamba sequence models learn non-Markovian closure terms to stabilize reduced-order simulations of chaotic systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper recasts the memory-dependent closure term from the Mori-Zwanzig formalism as a sequence modeling task, where a Mamba state-space model is trained to predict the unresolved effects from the history of resolved variables. This learned closure is then inserted back into the reduced governing equations and advanced with a numerical integrator. The dual representation lets the model train efficiently on full trajectories in convolutional mode and run at constant cost per step in recurrent mode. Tests on the viscous Burgers equation and the two-scale Lorenz 96 system show the resulting simulations remain accurate and stable over longer times than Markovian closures, GRU sequence models, or the Wilks method.

Core claim

The Mamba-Assisted Closure framework trains a Mamba model on resolved trajectories to predict the non-Markovian closure term, then couples those predictions into the reduced-order equations via numerical integration; the convolutional form enables efficient long-trajectory training while the recurrent form supports stable autoregressive rollout.

What carries the argument

Mamba state-space model that maps resolved trajectories to the closure term, using its convolutional representation for training and recurrent representation for inference.

If this is right

Higher predictive accuracy than Markovian reduced-order models on both Burgers and Lorenz 96 test cases.
Longer stable rollouts than GRU-based sequence models and the Wilks method.
Constant per-step cost during deployment because of the recurrent inference mode.
Efficient training on extended trajectories via the convolutional training mode.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework could be applied to other systems with strong memory effects such as turbulent flows or climate models.
Coarser spatial discretizations might become viable if the learned closure compensates for the missing scales.
Hybrid models combining Mamba closures with existing physics-based reduced equations could be tested on real observational data.

Load-bearing premise

A Mamba model trained on resolved trajectories can accurately predict the non-Markovian closure term when its outputs are fed into the numerical integration of the reduced equations.

What would settle it

Integrate the MAC closure into the two-scale Lorenz 96 reduced equations and check whether the long-time error or statistical divergence exceeds that of the GRU baseline after the reported stable horizons.

Figures

Figures reproduced from arXiv: 2606.05371 by Panos Stinis, Saad Qadeer, Zhi-Feng Wei.

**Figure 1.** Figure 1: Schematic illustration of the autoregressive rollout procedure for reduced-order model evolution with predicted closure terms. At each step, the sequence model predicts the closure term Ct from the current resolved state φbt (red arrows). The resolved state is then advanced by one integrator step (RK4) that takes both φbt and Ct as inputs to produce φbt+1 (gray arrows). See Appendix B.8 for details of the … view at source ↗

**Figure 2.** Figure 2: Comparison of resolved-mode predictions from the MAC model and the Markovian reduced-order model on one representative test initial condition over the temporal interpolation regime [0, 1]. For each resolved Fourier mode, the relative L 2 error over the time interval [0, 1] is also reported. closely than the Markovian model, especially for higher-frequency modes. The mean relative L 2 error for each Fourie… view at source ↗

**Figure 3.** Figure 3: Comparison of resolved-mode predictions from the MAC model and the Markovian reduced-order model on the same representative test initial condition over the temporal extrapolation regime [0, 2]. For each resolved Fourier mode, the relative L 2 errors over the intervals [0, 1], [1, 2], and [0, 2] are also reported. Page 12 of 57 [PITH_FULL_IMAGE:figures/full_fig_p014_3.png] view at source ↗

**Figure 4.** Figure 4: Evolution of the physical-space L 2 error over the temporal extrapolation interval [0, 2] for the same representative test initial condition, comparing the MAC model and the Markovian reduced-order model. even-mode Fourier pattern that is entirely absent from the training data, providing a strong structural distribution shift in addition to the substantial amplitude extrapolation [PITH_FULL_IMAGE:figures/… view at source ↗

**Figure 5.** Figure 5: Evolution of the physical-space L 2 error for the out-of-distribution initial condition u0(x) = sin x over the long-time rollout interval [0, 20], comparing the MAC model, the Markovian reduced-order model, and the linear and cubic memory models in [Qadeer et al. 2025]. 0 4 8 12 16 20 Time (t) 0.00 0.08 0.16 0.24 0.32 L 2 Error MAC Markov Linear Cubic [PITH_FULL_IMAGE:figures/full_fig_p017_5.png] view at source ↗

**Figure 6.** Figure 6: Evolution of the physical-space L 2 error for the out-of-distribution initial condition u0(x) = e sin x over the long-time rollout interval [0, 20], comparing the MAC model, the Markovian reduced-order model, and the linear and cubic memory models in [Qadeer et al. 2025]. Page 15 of 57 [PITH_FULL_IMAGE:figures/full_fig_p017_6.png] view at source ↗

**Figure 7.** Figure 7: Evolution of the physical-space L 2 error for the out-of-distribution initial condition u0(x) = cos(2 sin x) over the long-time rollout interval [0, 20], comparing the MAC model, the Markovian reduced-order model, and the linear and cubic memory models in [Qadeer et al. 2025]. structured initial condition, the MAC model still substantially outperforms the Markovian reduced-order model and remains comparabl… view at source ↗

**Figure 8.** Figure 8: Evolution of the resolved energy over the temporal extrapolation interval [0, 2] for the same representative test initial condition, comparing the MAC model and the Markovian reducedorder model. Finally, we investigate the computational scalability of the proposed MAC model in both training and inference. We note that the autoregressive rollouts reported in the previous experiments are performed in parall… view at source ↗

**Figure 9.** Figure 9: compares the predicted trajectories of the resolved slow variables against the true resolved trajectories extracted from the full Lorenz ’96 simulation on the time interval [0, 20]. As shown in [PITH_FULL_IMAGE:figures/full_fig_p022_9.png] view at source ↗

**Figure 10.** Figure 10: Evolution of the running cumulative relative L 2 error over the temporal interpolation regime [0, 20] for the Lorenz ’96 system, comparing the MAC model, the GRU-based model, and the Wilks method. We next examine the running cumulative correlation coefficient between the predicted and true resolved trajectories. The running cumulative correlation coefficient at time ti is defined by Corr(ti) = Pi j=0 [PI… view at source ↗

**Figure 11.** Figure 11: Evolution of the running cumulative correlation coefficient between the predicted and true resolved variables over the temporal interpolation regime [0, 20] for the Lorenz ’96 system, comparing the MAC model, the GRU-based model, and the Wilks method. time lag τ , we define the uncentered temporal autocorrelation function by ACF(τ ) = PT −τ t=0 PN k=1 Uk(t)Uk(t + τ ) PT t=0 PN k=1 Uk(t) 2 , where T = 2001… view at source ↗

**Figure 12.** Figure 12: Temporal autocorrelation function of the generated closure trajectories over the temporal interpolation regime [0, 20] for the Lorenz ’96 system, comparing the MAC model, the GRUbased model, the Wilks method, and the true dynamics. As shown in [PITH_FULL_IMAGE:figures/full_fig_p024_12.png] view at source ↗

**Figure 13.** Figure 13: Comparison of resolved slow-variable predictions from the MAC model, the GRU-based model, and the Wilks method over the temporal extrapolation regime [120, 140] (displayed as [0, 20]). For each resolved slow variable, the relative L 2 error is also reported. Page 23 of 57 [PITH_FULL_IMAGE:figures/full_fig_p025_13.png] view at source ↗

**Figure 14.** Figure 14: Evolution of the running cumulative relative L 2 error over the temporal extrapolation regime [120, 140] (displayed as [0, 20]) for the Lorenz ’96 system, comparing the MAC model, the GRU-based model, and the Wilks method. 0.0 2.5 5.0 7.5 10.0 12.5 15.0 17.5 20.0 Time (t) 0.66 0.72 0.78 0.84 0.90 0.96 1.00 Correlation Coefficient MAC GRU Wilks [PITH_FULL_IMAGE:figures/full_fig_p026_14.png] view at source ↗

**Figure 15.** Figure 15: Evolution of the running cumulative correlation coefficient between the predicted and true resolved variables over the temporal extrapolation regime [120, 140] (displayed as [0, 20]) for the Lorenz ’96 system, comparing the MAC model, the GRU-based model, and the Wilks method. Page 24 of 57 [PITH_FULL_IMAGE:figures/full_fig_p026_15.png] view at source ↗

**Figure 16.** Figure 16: Temporal autocorrelation function of the generated closure trajectories over the temporal extrapolation regime [120, 140] (displayed as [0, 20]) for the Lorenz ’96 system, comparing the MAC model, the GRU-based model, the Wilks method, and the true dynamics. Finally, [PITH_FULL_IMAGE:figures/full_fig_p027_16.png] view at source ↗

**Figure 17.** Figure 17: Comparison of resolved slow-variable predictions from the MAC model, the GRU-based model, and the Wilks method for one representative unseen initial condition over the time interval [0, 20]. For each resolved slow variable, the relative L 2 error is also reported. 0.0 2.5 5.0 7.5 10.0 12.5 15.0 17.5 20.0 Time (t) 0.00 0.15 0.30 0.45 C u m ula tiv e L 2 Error MAC GRU Wilks [PITH_FULL_IMAGE:figures/full_fi… view at source ↗

**Figure 18.** Figure 18: Evolution of the running cumulative relative L 2 error averaged over 100 unseen-initialcondition test trajectories over the time interval [0, 20] for the Lorenz ’96 system, comparing the MAC model, the GRU-based model, and the Wilks method. Page 26 of 57 [PITH_FULL_IMAGE:figures/full_fig_p028_18.png] view at source ↗

**Figure 19.** Figure 19: Evolution of the running cumulative correlation coefficient between the predicted and true resolved variables, averaged over 100 unseen-initial-condition test trajectories over the time interval [0, 20] for the Lorenz ’96 system, comparing the MAC model, the GRU-based model, and the Wilks method. 0 2 4 6 8 10 Lag 0.965 0.970 0.975 0.980 0.985 0.990 0.995 1.000 Autocorrelation True MAC GRU Wilks [PITH_FUL… view at source ↗

**Figure 20.** Figure 20: Temporal autocorrelation function of the generated closure trajectories, averaged over 100 unseen-initial-condition test trajectories over the time interval [0, 20] for the Lorenz ’96 system, comparing the MAC model, the GRU-based model, the Wilks method, and the true dynamics. Page 27 of 57 [PITH_FULL_IMAGE:figures/full_fig_p029_20.png] view at source ↗

read the original abstract

Reduced-order modeling of high-dimensional dynamical systems is often hindered by the non-Markovian closure term that represents the effect of unresolved variables on the resolved dynamics. Inspired by the Mori--Zwanzig formalism, in which the closure takes the form of a memory functional of the resolved trajectory, we recast closure modeling as a sequence modeling problem and propose the Mamba-Assisted Closure (MAC) framework: a Mamba-based sequence model, trained to predict the closure from the resolved trajectory, is coupled with the reduced-order governing equations through a numerical integrator to advance the resolved variables in time. A key feature of the framework is its exploitation of the dual representation of state-space models -- the model is trained in a sequence-to-sequence fashion via the convolutional form, and deployed for step-by-step autoregressive rollout via the recurrent form, yielding both efficient long-trajectory training and constant per-step inference cost. On the viscous Burgers' equation and the chaotic two-scale Lorenz '96 system, the MAC model substantially outperforms the Markovian reduced-order model, the GRU-based sequence model, and the Wilks method in predictive accuracy and long-time rollout stability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Mamba learns non-Markovian closures for ROMs and beats GRU on two systems, but the training-deployment mismatch in chaotic cases needs direct testing.

read the letter

The paper's main result is that a Mamba sequence model, trained on full-order trajectories to predict the Mori-Zwanzig closure, improves both short-term accuracy and long-time stability when inserted into reduced-order models for viscous Burgers and two-scale Lorenz 96, outperforming Markovian closures, GRU, and Wilks.

The practical contribution is the dual use of the state-space form: convolutional training on long clean sequences for efficiency, then recurrent rollout at constant cost per step inside the numerical integrator. That matches the workflow for closure modeling.

The abstract reports clear gains on the two examples. If the full experiments hold up with proper controls, this is a useful data-driven option for people already using sequence models in scientific computing.

The stress-test concern lands. Training uses resolved full-model data, but deployment feeds the model its own accumulating truncation errors from the ROM integration. In the chaotic Lorenz 96 case this mismatch can grow and directly threaten the claimed stability edge. The abstract gives no sign that the authors tested robustness to input noise or closed the loop during training.

No error bars, hyperparameter sweeps, or ablation details appear in the abstract, so the size and reliability of the gains remain hard to judge.

This is aimed at the scientific ML subgroup working on reduced-order modeling of PDEs and ODEs. It is worth sending to peer review if the full manuscript shows the distribution-shift tests and basic reproducibility details; the core setup is straightforward enough that referees can evaluate it quickly.

Referee Report

2 major / 2 minor

Summary. The paper proposes the Mamba-Assisted Closure (MAC) framework, which recasts the non-Markovian closure term from the Mori-Zwanzig formalism as a sequence modeling task. A Mamba state-space model is trained in sequence-to-sequence (convolutional) mode on resolved trajectories and corresponding closure terms extracted from full-order simulations, then deployed autoregressively (recurrent form) inside a numerical integrator to advance the reduced-order equations. The central empirical claim is that MAC substantially outperforms a Markovian ROM, a GRU-based sequence model, and the Wilks method in predictive accuracy and long-time stability on the viscous Burgers equation and the chaotic two-scale Lorenz '96 system.

Significance. If the performance claims survive rigorous controls for distribution shift and hyperparameter variation, the work would demonstrate a practical way to exploit Mamba's dual convolutional/recurrent representation for efficient, stable non-Markovian closure modeling, addressing a long-standing challenge in reduced-order modeling of high-dimensional chaotic systems.

major comments (2)

[§3.2 and §4] §4 (deployment) and §3.2 (training): the framework trains the Mamba model exclusively on clean full-order trajectories, yet deploys it inside an error-accumulating ROM integration; no analysis or experiments quantify the resulting distribution shift, which directly threatens the claimed long-time rollout stability advantage on the chaotic Lorenz '96 system.
[Results section] Results section (implicitly Tables/Figures reporting Burgers and Lorenz '96 rollouts): the abstract and reader's summary assert outperformance without reported error bars, ablation studies on sequence length or Mamba hyperparameters, or cross-validation across data splits, leaving the quantitative strength of the central claim unverifiable.

minor comments (2)

[§2] Notation for the memory kernel and closure term is introduced without an explicit equation reference tying it back to the Mori-Zwanzig projection; a single clarifying equation would improve readability.
[§3] The description of the dual convolutional/recurrent deployment is clear in principle but would benefit from a small schematic or pseudocode block showing the exact interface between the Mamba output and the numerical integrator.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the presentation of the Mamba-Assisted Closure framework. We address each major comment below and commit to revisions that improve rigor without altering the core claims.

read point-by-point responses

Referee: [§3.2 and §4] §4 (deployment) and §3.2 (training): the framework trains the Mamba model exclusively on clean full-order trajectories, yet deploys it inside an error-accumulating ROM integration; no analysis or experiments quantify the resulting distribution shift, which directly threatens the claimed long-time rollout stability advantage on the chaotic Lorenz '96 system.

Authors: We agree this is a substantive point. The manuscript reports empirical long-time stability on Lorenz '96 but does not explicitly quantify the train-deployment distribution shift. In revision we will add a dedicated analysis (new subsection in §4) that compares input statistics (means, variances, autocorrelation) between training trajectories and those encountered during ROM rollout, together with a controlled experiment injecting small perturbations to the resolved state to measure sensitivity. This directly addresses the concern while preserving the existing results. revision: yes
Referee: [Results section] Results section (implicitly Tables/Figures reporting Burgers and Lorenz '96 rollouts): the abstract and reader's summary assert outperformance without reported error bars, ablation studies on sequence length or Mamba hyperparameters, or cross-validation across data splits, leaving the quantitative strength of the central claim unverifiable.

Authors: We accept that the current results section lacks these elements. In the revised manuscript we will (i) report means and standard deviations over at least five independent training runs with different random seeds, (ii) include ablations on sequence length and Mamba state dimension, and (iii) add results across two distinct data splits (temporal and initial-condition based). These additions will appear as new tables/figures in the Results section and will be summarized in the abstract. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical framework is self-contained

full rationale

The paper recasts closure modeling as a sequence-to-sequence task and trains a Mamba model on resolved trajectories extracted from full-order simulations, then couples the trained model into the reduced-order equations for rollout. This is a standard data-driven modeling pipeline with no load-bearing derivation that reduces to its own inputs by construction. The dual convolutional/recurrent deployment is a known property of state-space models and does not create self-definition or fitted-input-as-prediction circularity. No self-citation chains, uniqueness theorems, or ansatzes smuggled via prior work are invoked as the central justification. Claims rest on comparative numerical experiments rather than algebraic equivalence to the training procedure.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The claim depends on the existence of a learnable non-Markovian closure (from Mori-Zwanzig) and on the ability of a high-capacity sequence model to approximate it from finite data; both are domain assumptions rather than new axioms.

free parameters (1)

Mamba model weights
All parameters of the sequence model are fitted to simulation data from the full-order system.

axioms (1)

domain assumption The effect of unresolved variables on resolved dynamics can be expressed as a memory functional of the resolved trajectory (Mori-Zwanzig).
Invoked in the first sentence of the abstract as the foundation for recasting closure as sequence modeling.

pith-pipeline@v0.9.1-grok · 5741 in / 1272 out tokens · 19908 ms · 2026-06-28T07:14:17.861954+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

36 extracted references · 30 canonical work pages

[1]

On closures for reduced order models ia spectrum of first-principle to machine- learned avenues

Ahmed, Shady E., Suraj Pawar, Omer San, Adil Rasheed, Traian Iliescu and Bernd R. Noack 2021 “On closures for reduced order models ia spectrum of first-principle to machine- learned avenues”, Physics of Fluids , vol. 33, issue 9, article 091301, doi : 10.1063/5.0061577

work page doi:10.1063/5.0061577 2021
[2]

Benettin, L

Benettin, Giancarlo, Luigi Galgani, Antonio Giorgilli and Jean-Marie Strelcyn 1980 “Lyapunov characteristic exponents for smooth dynamical systems and for hamiltonian systems; a method for computing all of them. part 1: theory”, Meccanica, vol. 15, issue 1, pp. 9–20, doi : 10.1007/BF02128236

work page doi:10.1007/bf02128236 1980
[3]

Learning long-term dependencies with gradient descent is diﬀicult

Bengio, Yoshua, Patrice Simard and Paolo Frasconi 1994 “Learning long-term dependencies with gradient descent is diﬀicult”, IEEE Transactions on Neural Networks , vol. 5, issue 2, pp. 157–166, doi : 10.1109/72.279181

work page doi:10.1109/72.279181 1994
[4]

Memory-based parameterization with differentiable solver: application to Lorenz ’96

Bhouri, Mohamed Aziz and Pierre Gentine 2023 “Memory-based parameterization with differentiable solver: application to Lorenz ’96”, Chaos: An Interdisciplinary Journal of Nonlinear Science , vol. 33, issue 7, article 073116, doi : 10.1063/5.0131929

work page doi:10.1063/5.0131929 2023
[5]

Learning phrase representations using RNN encoder ⚶decoder for statistical machine translation

Cho, Kyunghyun, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bou- gares, Holger Schwenk and Yoshua Bengio 2014 “Learning phrase representations using RNN encoder ⚶decoder for statistical machine translation”, in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) , Association for Compu- tationa...

work page doi:10.3115/v1/d14-1179 2014
[6]

Optimal prediction and the Mori ⚶Zwanzig representation of irreversible pro- cesses

Chorin, Alexandre J., Ole H. Hald and Raz Kupferman 2000 “Optimal prediction and the Mori ⚶Zwanzig representation of irreversible pro- cesses”, Proceedings of the National Academy of Sciences , vol. 97, issue 7, pp. 2968–2973, doi : 10.1073/pnas.97.7.2968

work page doi:10.1073/pnas.97.7.2968 2000
[7]

T., & Peruccacci, S

Christensen, Hannah and Laure Zanna 2022 “Parametrization in weather and climate models”, in, Oxford research encyc- lopedia of climate science , ed. by Hans von Storch, Oxford University Press, doi : 10.1093/acrefore/9780190228620.013.826. Page 53 of 57

work page doi:10.1093/acrefore/9780190228620.013.826 2022
[8]

Stochastic parameterization of subgrid-scale processes: a review of recent physically based approaches

Demaeyer, Jonathan and Stéphane Vannitsem 2017 “Stochastic parameterization of subgrid-scale processes: a review of recent physically based approaches”, in, Advances in nonlinear geosciences , ed. by Anastasios A. Tsonis, Springer International Publishing, Cham, pp. 55–85, doi : 10.1007/978-3-319-58895-7_3

work page doi:10.1007/978-3-319-58895-7_3 2017
[9]

Some recent developments in turbulence closure modeling

Durbin, Paul A. 2018 “Some recent developments in turbulence closure modeling”, Annual Review of Fluid Mechanics , vol. 50, issue 1, pp. 77–103, doi : 10.1146/annurev-fluid-122316-045020

work page doi:10.1146/annurev-fluid-122316-045020 2018
[10]

A computational strategy for multiscale systems with applications to Lorenz 96 model

Fatkullin, Ibrahim and Eric Vanden-Eijnden 2004 “A computational strategy for multiscale systems with applications to Lorenz 96 model”, Journal of Computational Physics , vol. 200, issue 2, pp. 605–638, doi : 10.1016/j.jcp.2004.04.013

work page doi:10.1016/j.jcp.2004.04.013 2004
[11]

Frezat, Hugo, Ronan Fablet, Guillaume Balarac and Julien Le Sommer 2023 Gradient-free online learning of subgrid-scale dynamics with neural emulators, arxiv : 2310.19385

arXiv 2023
[12]

Gu, Albert and Tri Dao 2023 Mamba: linear-time sequence modeling with selective state spaces, arxiv : 2312.00752

Pith/arXiv arXiv 2023
[13]

Gu, Albert, Karan Goel and Christopher Ré 2021 Eﬀiciently modeling long sequences with structured state spaces, arxiv : 2111.00396

Pith/arXiv arXiv 2021
[14]

Combining recurrent, convolutional, and continuous-time models with linear state-space layers

Gu, Albert, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra and Christopher Ré 2021 “Combining recurrent, convolutional, and continuous-time models with linear state-space layers”, in Proceedings of the 35th International Conference on Neural Information Processing Systems , Curran Associates Inc., pp. 572–585, url : https://dl.acm.org/doi/10.5...

work page doi:10.5555/3540261.3540305 2021
[15]

Neural closure models for dynamical systems

Gupta, Abhinav and Pierre F. J. Lermusiaux 2021 “Neural closure models for dynamical systems”, Proceedings of the Royal Soci- ety A: Mathematical, Physical and Engineering Sciences , vol. 477, issue 2252, article 20201004, doi : 10.1098/rspa.2020.1004. Page 54 of 57

work page doi:10.1098/rspa.2020.1004 2021
[16]

Deepomamba: state-space model for spatio-temporal PDE neural operator learning

Hu, Zheyuan, Qianying Cao, Kenji Kawaguchi and George Em Karniadakis 2025 “Deepomamba: state-space model for spatio-temporal PDE neural operator learning”, Journal of Computational Physics , vol. 540, article 114272, doi : https://doi.org/10.1016/j.jcp.2025.114272

work page doi:10.1016/j.jcp.2025.114272 2025
[17]

State-space models are accurate and eﬀicient neural operators for dynamical systems

Hu, Zheyuan, Nazanin Ahmadi Daryakenari, Qianli Shen, Kenji Kawaguchi and George Em Karniadakis 2026 “State-space models are accurate and eﬀicient neural operators for dynamical systems”, Neural Networks , vol. 197, article 108496, doi : https://doi.org/10.1016/j.neunet.2025.108496

work page doi:10.1016/j.neunet.2025.108496 2026
[18]

Machine learning ⚶accelerated computational fluid dynamics

Kochkov, Dmitrii, Jamie A. Smith, Ayya Alieva, Qing Wang, Michael P. Brenner and Stephan Hoyer 2021 “Machine learning ⚶accelerated computational fluid dynamics”, Proceedings of the National Academy of Sciences , vol. 118, issue 21, article e2101784118, doi : 10.1073/pnas.2101784118

work page doi:10.1073/pnas.2101784118 2021
[19]

Predictability: a problem partly solved

Lorenz, Edward N 1996 “Predictability: a problem partly solved”, in Proc. Seminar on Predictability , Reading, pp. 1–18, url : https : / / www . ecmwf . int / en / elibrary / 75462 - predictability - problem-partly-solved

1996
[20]

Model reduction with memory and the machine learning of dynamical sys- tems

Ma, Chao, Jianchun Wang null and Weinan E 2019 “Model reduction with memory and the machine learning of dynamical sys- tems”, Communications in Computational Physics , vol. 25, issue 4, pp. 947– 962, doi : 10.4208/cicp.OA-2018-0269

work page doi:10.4208/cicp.oa-2018-0269 2019
[21]

Time-series learning of latent-space dynamics for reduced-order model clos- ure

Maulik, Romit, Arvind Mohan, Bethany Lusch, Sandeep Madireddy, Prasanna Balaprakash and Daniel Livescu 2020 “Time-series learning of latent-space dynamics for reduced-order model clos- ure”, Physica D: Nonlinear Phenomena , vol. 405, article 132368, doi : https://doi.org/10.1016/j.physd.2020.132368

work page doi:10.1016/j.physd.2020.132368 2020
[22]

Transport, collective motion, and Brownian motion

Mori, Hazime 1965 “Transport, collective motion, and Brownian motion”, Progress of Theoretical Physics, vol. 33, issue 3, pp. 423–455, doi : 10.1143/ptp.33.423

work page doi:10.1143/ptp.33.423 1965
[23]

Non-markovian closure models for large eddy simulations using the Mori- Zwanzig formalism

Parish, Eric J. and Karthik Duraisamy 2017 “Non-markovian closure models for large eddy simulations using the Mori- Zwanzig formalism”, Physical Review Fluids , vol. 2, issue 1, article 014604, doi : 10.1103/PhysRevFluids.2.014604. Page 55 of 57

work page doi:10.1103/physrevfluids.2.014604 2017
[24]

On the diﬀiculty of training recurrent neural networks

Pascanu, Razvan, Tomas Mikolov and Yoshua Bengio 2013 “On the diﬀiculty of training recurrent neural networks”, in Proceedings of the 30th International Conference on Machine Learning , JMLR.org, pp. III- 1310–III-1318, url : https://dl.acm.org/doi/10.5555/3042817.3043083

work page doi:10.5555/3042817.3043083 2013
[25]

Qadeer, Saad, Panos Stinis and Hui Wan 2025 Stabilizing PDE–ML coupled systems, arxiv : 2506.19274

arXiv 2025
[26]

Scientific machine learning for closure models in multiscale problems: a re- view

Sanderse, Benjamin, Panos Stinis, Romit Maulik and Shady E. Ahmed 2025 “Scientific machine learning for closure models in multiscale problems: a re- view”, Foundations of Data Science , vol. 7, issue 1, pp. 298–337, doi : 10.3934/fods.2024043

work page doi:10.3934/fods.2024043 2025
[27]

Coarse-graining methods for computational biology

Saunders, Marissa G. and Gregory A. Voth 2013 “Coarse-graining methods for computational biology”, Annual Review Bio- physics, vol. 42, pp. 73–93, doi : 10.1146/annurev-biophys-083012-130348

work page doi:10.1146/annurev-biophys-083012-130348 2013
[28]

Solver-in-the-loop: learning from differentiable physics to interact with it- erative PDE-solvers

Um, Kiwon, Robert Brand, Yun Fei, Philipp Holl and Nils Thuerey 2020 “Solver-in-the-loop: learning from differentiable physics to interact with it- erative PDE-solvers”, in Proceedings of the 34th International Conference on Neural Information Processing Systems , Curran Associates Inc., pp. 6111– 6122, url : https://dl.acm.org/doi/10.5555/3495724.3496237...

work page doi:10.5555/3495724.3496237 2020
[29]

Roadmap on multiscale materials modeling

Rottler, Alexander Shluger, Ryan B. Sills, Ingo Steinbach, Alejandro Strachan and Ellad B. Tadmor 2020 “Roadmap on multiscale materials modeling”, Modelling and Simulation in Materials Science and Engineering , vol. 28, issue 4, article 043001, doi : 10.1088/1361-651X/ab7150

work page doi:10.1088/1361-651x/ab7150 2020
[30]

Attention is all you need

Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser and Illia Polosukhin 2017 “Attention is all you need”, in Proceedings of the 31st International Confer- ence on Neural Information Processing Systems , NIPS’17, Curran Associates, Inc., pp. 6000–6010, url : https://dl.acm.org/doi/10.5555/3295222.3295349...

work page doi:10.5555/3295222.3295349 2017
[31]

Recurrent neural network closure of parametric pod-Galerkin reduced-order models based on the Mori-Zwanzig formalism

Wang, Qian, Nicolò Ripamonti and Jan S. Hesthaven 2020 “Recurrent neural network closure of parametric pod-Galerkin reduced-order models based on the Mori-Zwanzig formalism”, Journal of Computational Physics, vol. 410, article 109402, doi : https://doi.org/10.1016/j.jcp.2020.109402

work page doi:10.1016/j.jcp.2020.109402 2020
[32]

Effects of stochastic parametrizations in the Lorenz ’96 system

Wilks, Daniel S. 2005 “Effects of stochastic parametrizations in the Lorenz ’96 system”, Quarterly Journal of the Royal Meteorological Society , vol. 131, issue 606, pp. 389–407, doi : 10.1256/qj.04.03

work page doi:10.1256/qj.04.03 2005
[33]

Modelling climate change: the role of unresolved processes

Williams, Paul D 2005 “Modelling climate change: the role of unresolved processes”, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 363, issue 1837, pp. 2931–2946, doi : 10.1098/rsta.2005.1676

work page doi:10.1098/rsta.2005.1676 2005
[34]

Xue, Tingkai, Chin Chun Ooi, Zhengwei Ge, Fong Yew Leong, Hongying Li and Chang Wei Kang 2025 Differentiable physics-neural models enable learning of non-Markovian clos- ures for accelerated coarse-grained physics simulations, arxiv : 2511.21369

arXiv 2025
[35]

On the estimation of the Mori-Zwanzig memory integral

Zhu, Yuanran, Jason M. Dominy and Daniele Venturi 2018 “On the estimation of the Mori-Zwanzig memory integral”, Journal of Math- ematical Physics , vol. 59, issue 10, article 103501, doi : 10.1063/1.5003467

work page doi:10.1063/1.5003467 2018
[36]

Nonlinear generalized Langevin equations

Zwanzig, Robert 1973 “Nonlinear generalized Langevin equations”, Journal of Statistical Physics , vol. 9, issue 3, pp. 215–220, doi : 10.1007/bf01008729. Page 57 of 57

work page doi:10.1007/bf01008729 1973

[1] [1]

On closures for reduced order models ia spectrum of first-principle to machine- learned avenues

Ahmed, Shady E., Suraj Pawar, Omer San, Adil Rasheed, Traian Iliescu and Bernd R. Noack 2021 “On closures for reduced order models ia spectrum of first-principle to machine- learned avenues”, Physics of Fluids , vol. 33, issue 9, article 091301, doi : 10.1063/5.0061577

work page doi:10.1063/5.0061577 2021

[2] [2]

Benettin, L

Benettin, Giancarlo, Luigi Galgani, Antonio Giorgilli and Jean-Marie Strelcyn 1980 “Lyapunov characteristic exponents for smooth dynamical systems and for hamiltonian systems; a method for computing all of them. part 1: theory”, Meccanica, vol. 15, issue 1, pp. 9–20, doi : 10.1007/BF02128236

work page doi:10.1007/bf02128236 1980

[3] [3]

Learning long-term dependencies with gradient descent is diﬀicult

Bengio, Yoshua, Patrice Simard and Paolo Frasconi 1994 “Learning long-term dependencies with gradient descent is diﬀicult”, IEEE Transactions on Neural Networks , vol. 5, issue 2, pp. 157–166, doi : 10.1109/72.279181

work page doi:10.1109/72.279181 1994

[4] [4]

Memory-based parameterization with differentiable solver: application to Lorenz ’96

Bhouri, Mohamed Aziz and Pierre Gentine 2023 “Memory-based parameterization with differentiable solver: application to Lorenz ’96”, Chaos: An Interdisciplinary Journal of Nonlinear Science , vol. 33, issue 7, article 073116, doi : 10.1063/5.0131929

work page doi:10.1063/5.0131929 2023

[5] [5]

Learning phrase representations using RNN encoder ⚶decoder for statistical machine translation

Cho, Kyunghyun, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bou- gares, Holger Schwenk and Yoshua Bengio 2014 “Learning phrase representations using RNN encoder ⚶decoder for statistical machine translation”, in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) , Association for Compu- tationa...

work page doi:10.3115/v1/d14-1179 2014

[6] [6]

Optimal prediction and the Mori ⚶Zwanzig representation of irreversible pro- cesses

Chorin, Alexandre J., Ole H. Hald and Raz Kupferman 2000 “Optimal prediction and the Mori ⚶Zwanzig representation of irreversible pro- cesses”, Proceedings of the National Academy of Sciences , vol. 97, issue 7, pp. 2968–2973, doi : 10.1073/pnas.97.7.2968

work page doi:10.1073/pnas.97.7.2968 2000

[7] [7]

T., & Peruccacci, S

Christensen, Hannah and Laure Zanna 2022 “Parametrization in weather and climate models”, in, Oxford research encyc- lopedia of climate science , ed. by Hans von Storch, Oxford University Press, doi : 10.1093/acrefore/9780190228620.013.826. Page 53 of 57

work page doi:10.1093/acrefore/9780190228620.013.826 2022

[8] [8]

Stochastic parameterization of subgrid-scale processes: a review of recent physically based approaches

Demaeyer, Jonathan and Stéphane Vannitsem 2017 “Stochastic parameterization of subgrid-scale processes: a review of recent physically based approaches”, in, Advances in nonlinear geosciences , ed. by Anastasios A. Tsonis, Springer International Publishing, Cham, pp. 55–85, doi : 10.1007/978-3-319-58895-7_3

work page doi:10.1007/978-3-319-58895-7_3 2017

[9] [9]

Some recent developments in turbulence closure modeling

Durbin, Paul A. 2018 “Some recent developments in turbulence closure modeling”, Annual Review of Fluid Mechanics , vol. 50, issue 1, pp. 77–103, doi : 10.1146/annurev-fluid-122316-045020

work page doi:10.1146/annurev-fluid-122316-045020 2018

[10] [10]

A computational strategy for multiscale systems with applications to Lorenz 96 model

Fatkullin, Ibrahim and Eric Vanden-Eijnden 2004 “A computational strategy for multiscale systems with applications to Lorenz 96 model”, Journal of Computational Physics , vol. 200, issue 2, pp. 605–638, doi : 10.1016/j.jcp.2004.04.013

work page doi:10.1016/j.jcp.2004.04.013 2004

[11] [11]

Frezat, Hugo, Ronan Fablet, Guillaume Balarac and Julien Le Sommer 2023 Gradient-free online learning of subgrid-scale dynamics with neural emulators, arxiv : 2310.19385

arXiv 2023

[12] [12]

Gu, Albert and Tri Dao 2023 Mamba: linear-time sequence modeling with selective state spaces, arxiv : 2312.00752

Pith/arXiv arXiv 2023

[13] [13]

Gu, Albert, Karan Goel and Christopher Ré 2021 Eﬀiciently modeling long sequences with structured state spaces, arxiv : 2111.00396

Pith/arXiv arXiv 2021

[14] [14]

Combining recurrent, convolutional, and continuous-time models with linear state-space layers

Gu, Albert, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra and Christopher Ré 2021 “Combining recurrent, convolutional, and continuous-time models with linear state-space layers”, in Proceedings of the 35th International Conference on Neural Information Processing Systems , Curran Associates Inc., pp. 572–585, url : https://dl.acm.org/doi/10.5...

work page doi:10.5555/3540261.3540305 2021

[15] [15]

Neural closure models for dynamical systems

Gupta, Abhinav and Pierre F. J. Lermusiaux 2021 “Neural closure models for dynamical systems”, Proceedings of the Royal Soci- ety A: Mathematical, Physical and Engineering Sciences , vol. 477, issue 2252, article 20201004, doi : 10.1098/rspa.2020.1004. Page 54 of 57

work page doi:10.1098/rspa.2020.1004 2021

[16] [16]

Deepomamba: state-space model for spatio-temporal PDE neural operator learning

Hu, Zheyuan, Qianying Cao, Kenji Kawaguchi and George Em Karniadakis 2025 “Deepomamba: state-space model for spatio-temporal PDE neural operator learning”, Journal of Computational Physics , vol. 540, article 114272, doi : https://doi.org/10.1016/j.jcp.2025.114272

work page doi:10.1016/j.jcp.2025.114272 2025

[17] [17]

State-space models are accurate and eﬀicient neural operators for dynamical systems

Hu, Zheyuan, Nazanin Ahmadi Daryakenari, Qianli Shen, Kenji Kawaguchi and George Em Karniadakis 2026 “State-space models are accurate and eﬀicient neural operators for dynamical systems”, Neural Networks , vol. 197, article 108496, doi : https://doi.org/10.1016/j.neunet.2025.108496

work page doi:10.1016/j.neunet.2025.108496 2026

[18] [18]

Machine learning ⚶accelerated computational fluid dynamics

Kochkov, Dmitrii, Jamie A. Smith, Ayya Alieva, Qing Wang, Michael P. Brenner and Stephan Hoyer 2021 “Machine learning ⚶accelerated computational fluid dynamics”, Proceedings of the National Academy of Sciences , vol. 118, issue 21, article e2101784118, doi : 10.1073/pnas.2101784118

work page doi:10.1073/pnas.2101784118 2021

[19] [19]

Predictability: a problem partly solved

Lorenz, Edward N 1996 “Predictability: a problem partly solved”, in Proc. Seminar on Predictability , Reading, pp. 1–18, url : https : / / www . ecmwf . int / en / elibrary / 75462 - predictability - problem-partly-solved

1996

[20] [20]

Model reduction with memory and the machine learning of dynamical sys- tems

Ma, Chao, Jianchun Wang null and Weinan E 2019 “Model reduction with memory and the machine learning of dynamical sys- tems”, Communications in Computational Physics , vol. 25, issue 4, pp. 947– 962, doi : 10.4208/cicp.OA-2018-0269

work page doi:10.4208/cicp.oa-2018-0269 2019

[21] [21]

Time-series learning of latent-space dynamics for reduced-order model clos- ure

Maulik, Romit, Arvind Mohan, Bethany Lusch, Sandeep Madireddy, Prasanna Balaprakash and Daniel Livescu 2020 “Time-series learning of latent-space dynamics for reduced-order model clos- ure”, Physica D: Nonlinear Phenomena , vol. 405, article 132368, doi : https://doi.org/10.1016/j.physd.2020.132368

work page doi:10.1016/j.physd.2020.132368 2020

[22] [22]

Transport, collective motion, and Brownian motion

Mori, Hazime 1965 “Transport, collective motion, and Brownian motion”, Progress of Theoretical Physics, vol. 33, issue 3, pp. 423–455, doi : 10.1143/ptp.33.423

work page doi:10.1143/ptp.33.423 1965

[23] [23]

Non-markovian closure models for large eddy simulations using the Mori- Zwanzig formalism

Parish, Eric J. and Karthik Duraisamy 2017 “Non-markovian closure models for large eddy simulations using the Mori- Zwanzig formalism”, Physical Review Fluids , vol. 2, issue 1, article 014604, doi : 10.1103/PhysRevFluids.2.014604. Page 55 of 57

work page doi:10.1103/physrevfluids.2.014604 2017

[24] [24]

On the diﬀiculty of training recurrent neural networks

Pascanu, Razvan, Tomas Mikolov and Yoshua Bengio 2013 “On the diﬀiculty of training recurrent neural networks”, in Proceedings of the 30th International Conference on Machine Learning , JMLR.org, pp. III- 1310–III-1318, url : https://dl.acm.org/doi/10.5555/3042817.3043083

work page doi:10.5555/3042817.3043083 2013

[25] [25]

Qadeer, Saad, Panos Stinis and Hui Wan 2025 Stabilizing PDE–ML coupled systems, arxiv : 2506.19274

arXiv 2025

[26] [26]

Scientific machine learning for closure models in multiscale problems: a re- view

Sanderse, Benjamin, Panos Stinis, Romit Maulik and Shady E. Ahmed 2025 “Scientific machine learning for closure models in multiscale problems: a re- view”, Foundations of Data Science , vol. 7, issue 1, pp. 298–337, doi : 10.3934/fods.2024043

work page doi:10.3934/fods.2024043 2025

[27] [27]

Coarse-graining methods for computational biology

Saunders, Marissa G. and Gregory A. Voth 2013 “Coarse-graining methods for computational biology”, Annual Review Bio- physics, vol. 42, pp. 73–93, doi : 10.1146/annurev-biophys-083012-130348

work page doi:10.1146/annurev-biophys-083012-130348 2013

[28] [28]

Solver-in-the-loop: learning from differentiable physics to interact with it- erative PDE-solvers

Um, Kiwon, Robert Brand, Yun Fei, Philipp Holl and Nils Thuerey 2020 “Solver-in-the-loop: learning from differentiable physics to interact with it- erative PDE-solvers”, in Proceedings of the 34th International Conference on Neural Information Processing Systems , Curran Associates Inc., pp. 6111– 6122, url : https://dl.acm.org/doi/10.5555/3495724.3496237...

work page doi:10.5555/3495724.3496237 2020

[29] [29]

Roadmap on multiscale materials modeling

Rottler, Alexander Shluger, Ryan B. Sills, Ingo Steinbach, Alejandro Strachan and Ellad B. Tadmor 2020 “Roadmap on multiscale materials modeling”, Modelling and Simulation in Materials Science and Engineering , vol. 28, issue 4, article 043001, doi : 10.1088/1361-651X/ab7150

work page doi:10.1088/1361-651x/ab7150 2020

[30] [30]

Attention is all you need

Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser and Illia Polosukhin 2017 “Attention is all you need”, in Proceedings of the 31st International Confer- ence on Neural Information Processing Systems , NIPS’17, Curran Associates, Inc., pp. 6000–6010, url : https://dl.acm.org/doi/10.5555/3295222.3295349...

work page doi:10.5555/3295222.3295349 2017

[31] [31]

Recurrent neural network closure of parametric pod-Galerkin reduced-order models based on the Mori-Zwanzig formalism

Wang, Qian, Nicolò Ripamonti and Jan S. Hesthaven 2020 “Recurrent neural network closure of parametric pod-Galerkin reduced-order models based on the Mori-Zwanzig formalism”, Journal of Computational Physics, vol. 410, article 109402, doi : https://doi.org/10.1016/j.jcp.2020.109402

work page doi:10.1016/j.jcp.2020.109402 2020

[32] [32]

Effects of stochastic parametrizations in the Lorenz ’96 system

Wilks, Daniel S. 2005 “Effects of stochastic parametrizations in the Lorenz ’96 system”, Quarterly Journal of the Royal Meteorological Society , vol. 131, issue 606, pp. 389–407, doi : 10.1256/qj.04.03

work page doi:10.1256/qj.04.03 2005

[33] [33]

Modelling climate change: the role of unresolved processes

Williams, Paul D 2005 “Modelling climate change: the role of unresolved processes”, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 363, issue 1837, pp. 2931–2946, doi : 10.1098/rsta.2005.1676

work page doi:10.1098/rsta.2005.1676 2005

[34] [34]

Xue, Tingkai, Chin Chun Ooi, Zhengwei Ge, Fong Yew Leong, Hongying Li and Chang Wei Kang 2025 Differentiable physics-neural models enable learning of non-Markovian clos- ures for accelerated coarse-grained physics simulations, arxiv : 2511.21369

arXiv 2025

[35] [35]

On the estimation of the Mori-Zwanzig memory integral

Zhu, Yuanran, Jason M. Dominy and Daniele Venturi 2018 “On the estimation of the Mori-Zwanzig memory integral”, Journal of Math- ematical Physics , vol. 59, issue 10, article 103501, doi : 10.1063/1.5003467

work page doi:10.1063/1.5003467 2018

[36] [36]

Nonlinear generalized Langevin equations

Zwanzig, Robert 1973 “Nonlinear generalized Langevin equations”, Journal of Statistical Physics , vol. 9, issue 3, pp. 215–220, doi : 10.1007/bf01008729. Page 57 of 57

work page doi:10.1007/bf01008729 1973