
arxiv: 2605.15754 · v1 · pith:PLEFZJW5new · submitted 2026-05-15 · ⚛️ physics.comp-ph · cs.CE

Spatiotemporal decoupled physics-informed Stone-Weierstrass neural operator for long-time prediction of time-dependent parametric PDEs

Pith reviewed 2026-05-19 18:51 UTC · model grok-4.3

classification ⚛️ physics.comp-ph cs.CE
keywords neural operators · physics-informed learning · partial differential equations · long-time prediction · spatiotemporal decoupling · Stone-Weierstrass approximation · time-marching sampling · parametric PDEs

The pith

Encoding spatial and temporal information in separate subnetworks is designed to let a physics-informed neural operator mitigate error accumulation in long-time predictions of parametric PDEs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes the physics-informed Stone-Weierstrass neural operator to solve time-dependent parametric PDEs over long time horizons where standard neural operators lose accuracy and stability. It separates the learning of spatial structure from temporal evolution by using two subnetworks that implement time-invariant spatial basis functions paired with time-varying coefficients, drawing on the Stone-Weierstrass theorem for the approximation guarantee. A time-marching batch-wise sampling method is added to handle memory limits when training across extended time intervals while preserving solution continuity. If the separation works as intended, neural operators become practical for simulating sustained physical processes without the rapid degradation that currently restricts their use.

Core claim

The PI-SWNO architecture encodes spatial information in one subnetwork to produce time-invariant basis functions and temporal information in a second subnetwork to produce time-varying coefficients; their combination approximates the solution operator for time-dependent parametric PDEs. This decoupling, justified by the Stone-Weierstrass theorem, is claimed to structurally limit error accumulation over long intervals. The time-marching batch sampling strategy then enables full-domain training without exceeding memory constraints, yielding continuous and convergent solutions across the entire time span.
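The decoupled form described above can be illustrated with a minimal NumPy sketch. It assumes a separable ansatz u(x, t) ≈ Σ_k c_k(t) φ_k(x) and, in place of the two learned subnetworks, uses the analytic sine modes of the 1D heat equation with zero Dirichlet boundaries. The function names, the diffusivity `alpha`, and the mode count are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def spatial_basis(x, n_modes):
    """Time-invariant basis phi_k(x) = sin(k*pi*x), shape (len(x), n_modes).

    Stand-in for the spatial subnetwork (assumption: the paper learns these).
    """
    k = np.arange(1, n_modes + 1)
    return np.sin(np.pi * np.outer(x, k))

def temporal_coefficients(t, a0, alpha=0.1):
    """Time-varying coefficients c_k(t) = a0_k * exp(-alpha*(k*pi)^2*t).

    Stand-in for the temporal subnetwork; shape (len(t), n_modes).
    """
    k = np.arange(1, len(a0) + 1)
    decay = np.exp(-alpha * np.outer(t, (np.pi * k) ** 2))
    return decay * a0

def evaluate(x, t, a0, alpha=0.1):
    """Combine the two parts: u[i, j] = sum_k c_k(t_i) * phi_k(x_j)."""
    phi = spatial_basis(x, len(a0))          # (n_x, K), computed once
    c = temporal_coefficients(t, a0, alpha)  # (n_t, K)
    return c @ phi.T                         # (n_t, n_x)

x = np.linspace(0.0, 1.0, 65)
t = np.linspace(0.0, 5.0, 11)
a0 = np.array([1.0, 0.5, 0.25])  # hypothetical initial-condition coefficients
u = evaluate(x, t, a0)
```

Because `phi` does not depend on `t`, extending the time horizon in this sketch only requires evaluating more coefficient rows; the spatial part is reused unchanged.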

What carries the argument

Spatiotemporal decoupling realized by two separate subnetworks that learn time-invariant spatial basis functions and time-varying evolution coefficients.

If this is right

  • Long-time predictions remain accurate without progressive degradation from accumulated approximation errors.
  • Memory usage during training drops enough to allow full-domain modeling of extended time sequences.
  • The framework applies directly to families of parametric time-dependent PDEs encountered in physics and engineering.
  • Training stability improves because the separation removes one source of compounding numerical drift.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same spatial-temporal split could be tested inside other neural operator families to check whether the error-mitigation benefit is architecture-specific or more general.
  • Application to stiff or multi-scale temporal problems would reveal whether the fixed spatial bases still capture the required dynamics without frequent retraining.
  • Hybrid models that combine this decoupling with classical numerical time-steppers could be examined for further gains in long-horizon accuracy.

Load-bearing premise

The assumption that time-invariant spatial basis functions combined with time-varying coefficients will inherently prevent error accumulation over long time intervals for time-dependent parametric PDEs.

What would settle it

A side-by-side run on a standard benchmark PDE such as the time-dependent Burgers equation or Navier-Stokes, comparing whether prediction error stays bounded or grows much more slowly with the decoupled architecture than with a conventional integrated neural operator when the time horizon is extended by factors of ten or more.
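One way to quantify such a comparison is to track a relative L2 error at every time step and compare its average over a short and a long horizon. The sketch below is a generic illustration: the function names are hypothetical, the data are synthetic, and the metric is a plain relative L2 error, which is not necessarily the ANRL2E quantity named in the figure captions.

```python
import numpy as np

def relative_l2_per_step(pred, ref, eps=1e-12):
    """Relative L2 error at each time step; inputs have shape (n_t, n_x)."""
    num = np.linalg.norm(pred - ref, axis=1)
    den = np.linalg.norm(ref, axis=1)
    return num / (den + eps)

def growth_factor(errors, short_steps):
    """Ratio of the mean error over the whole horizon to the mean error
    over the first `short_steps` steps (values near 1 mean little growth)."""
    return errors.mean() / errors[:short_steps].mean()

# Synthetic example: a reference field and a prediction whose error drifts
# linearly in time (made up for illustration only, not paper data).
t = np.linspace(0.0, 10.0, 101)
x = np.linspace(0.0, 1.0, 64)
ref = np.cos(t)[:, None] * np.sin(np.pi * x)[None, :] + 2.0
pred = ref * (1.0 + 0.01 * t[:, None])

err = relative_l2_per_step(pred, ref)
ratio = growth_factor(err, short_steps=10)
```

Running the same two functions on the outputs of a decoupled and a conventional operator, with the horizon extended tenfold, would give directly comparable growth factors.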

Figures

Figures reproduced from arXiv: 2605.15754 by Guofeng Su, Hongxiang Ma, Lang Qin, Rui Yang, Shan Ding, Yongfu Tian.

Figure 1: The schematic diagram of PI-DeepONet framework details the structural mod…
Figure 2: The schematic diagram of PI-SWNO framework details its core structural mod…
Figure 3: The schematic diagram of time-marching batch-wise sampling strategy illustrates…
Figure 4: 1D heat conduction equation: The first column shows the randomly sampled…
Figure 5: Comparison of long-time ANRL2E growth trends between the baseline PI…
Figure 6: Statistical characteristics of mean squared error (MSE) and ANRL2E for PI…
Figure 7: 2D heat conduction equation: The first plot shows the randomly sampled source…
Figure 8: Comparison of long-time ANRL2E growth trends between the baseline PI…
Figure 9: Statistical characteristics of MSE and ANRL2E for PI-DeepONet and PI-SWNO…
Figure 10: 1D wave equation: The first column shows the randomly sampled initial con…
Figure 11: Comparison of long-time ANRL2E growth trends between the baseline PI…
Figure 12: Statistical characteristics of MSE and ANRL2E for PI-DeepONet and PI…
Figure 13: 2D wave equation: The first plot shows the randomly sampled initial condition;…
Figure 14: Comparison of long-time ANRL2E growth trends between the baseline PI…
Figure 15: Statistical characteristics of MSE and ANRL2E for PI-DeepONet and PI…
Figure 16: 1D KdV equation: The first column shows the randomly sampled initial con…
Figure 17: Comparison of long-time ANRL2E growth trends between the baseline PI…
Figure 18: Statistical characteristics of MSE and ANRL2E for PI-DeepONet and PI…
Figure 19: 1D Burgers equation: The first column shows the randomly sampled initial…
Figure 20: Comparison of long-time ANRL2E growth trends between the baseline PI…
Figure 21: Statistical characteristics of MSE and ANRL2E for PI-DeepONet and PI…
Figure 22: 2D Burgers equation: The first plot shows the randomly sampled initial…
Figure 23: Comparison of long-time ANRL2E growth trends between the baseline PI…
Figure 24: Statistical characteristics of MSE and ANRL2E for PI-DeepONet and PI…
Figure 25: Ablation study of the time-stepping batch-wise sampling strategy: We validate…

Original abstract

Driven by rapid advances in artificial intelligence and modern GPU computing capabilities, deep learning methods based on the optimization paradigm have provided new pathways to solve spatiotemporal physical problems, whose mathematical core lies in solving partial differential equations (PDEs). As an emerging class of function-space learning methods, neural operators (NOs) have exhibited great potential in efficient PDE solving. However, existing mainstream neural operator frameworks suffer from critical bottlenecks when modeling time-dependent PDEs over long time horizons, including accuracy degradation, insufficient stability, high training costs, and excessive memory consumption, which severely limit their practical deployment. To address these challenges in long-time prediction with neural operators, we propose a novel spatiotemporally decoupled physics-informed neural operator architecture, termed the physics-informed Stone-Weierstrass neural operator (PI-SWNO). The design is theoretically grounded in the decoupling paradigm combining time-invariant spatial basis functions with time-varying evolution coefficients, as well as the Stone-Weierstrass approximation theorem. By encoding spatial and temporal information via two separate subnetworks, the framework structurally mitigates the accumulation of errors over extended time intervals. Furthermore, we introduce a time-marching batch-wise sampling strategy to resolve the memory bottleneck of full-range modeling over extended time spans, ensuring continuity and convergence of full-time-domain solutions.
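The abstract does not spell out how the time-marching batch-wise sampling works, so the following is only a sketch of one plausible reading: the time interval is split into consecutive windows, and collocation points are drawn for one window at a time so that memory use depends on the window size rather than on the full horizon. The function name, the unit spatial domain, and the shared-boundary trick are all assumptions made for illustration.

```python
import numpy as np

def time_marching_batches(t_end, n_windows, points_per_batch, seed=0):
    """Yield (t_lo, t_hi, batch) for consecutive time windows covering [0, t_end].

    Each batch holds random (x, t) collocation points with x in [0, 1] and
    t in [t_lo, t_hi], so only one window is in memory at a time.
    """
    rng = np.random.default_rng(seed)
    edges = np.linspace(0.0, t_end, n_windows + 1)
    for t_lo, t_hi in zip(edges[:-1], edges[1:]):
        x = rng.uniform(0.0, 1.0, size=points_per_batch)
        t = rng.uniform(t_lo, t_hi, size=points_per_batch)
        # Pin one point to the window start so neighbouring windows share a
        # time level (an assumed way to encourage continuity across windows).
        t[0] = t_lo
        yield t_lo, t_hi, np.stack([x, t], axis=1)

windows = list(time_marching_batches(t_end=10.0, n_windows=5, points_per_batch=256))
```

In a training loop, each yielded batch would feed a physics-residual loss before the sampler moves on to the next window.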

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces the physics-informed Stone-Weierstrass neural operator (PI-SWNO) for long-time prediction of time-dependent parametric PDEs. It proposes a spatiotemporally decoupled architecture that encodes spatial information via time-invariant basis functions in one subnetwork and temporal evolution via time-varying coefficients in a second subnetwork, grounded in the Stone-Weierstrass approximation theorem. A time-marching batch-wise sampling strategy is added to address memory bottlenecks while maintaining continuity of the full-time-domain solution.

Significance. If the decoupling can be shown to control error growth without additional stability assumptions, the approach would offer a practical route to stable long-horizon neural-operator predictions at reduced memory cost, addressing a recognized limitation of existing operator-learning frameworks for evolutionary PDEs.

major comments (2)
  1. [Abstract] Abstract: the assertion that separate subnetworks for spatial bases and temporal coefficients 'structurally mitigates the accumulation of errors' is not accompanied by a Lipschitz bound, contraction mapping, or stability estimate on the learned coefficient evolution; Stone-Weierstrass supplies only density on compact sets and does not control amplification of coefficient errors under the underlying PDE dynamics.
  2. [Theoretical grounding] Theoretical grounding section: the decoupling replaces one source of temporal accumulation with another whose growth rate is not shown to be bounded by the separation alone; for PDEs that develop fine-scale structures or exhibit sensitivity to initial data, a small error in the coefficient subnetwork at step n can still be amplified even if the spatial basis remains fixed.
minor comments (1)
  1. [Abstract] Abstract: inclusion of at least one schematic equation or diagram illustrating the two-subnetwork split would clarify the claimed decoupling for readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the theoretical scope of our claims regarding error mitigation in the PI-SWNO framework. We address each major comment below and describe the revisions planned for the manuscript.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the assertion that separate subnetworks for spatial bases and temporal coefficients 'structurally mitigates the accumulation of errors' is not accompanied by a Lipschitz bound, contraction mapping, or stability estimate on the learned coefficient evolution; Stone-Weierstrass supplies only density on compact sets and does not control amplification of coefficient errors under the underlying PDE dynamics.

    Authors: We agree that the manuscript does not supply a Lipschitz bound, contraction mapping, or other stability estimate to rigorously prove control of error growth. The abstract phrasing is motivated by the architectural separation, in which time-invariant spatial basis functions are learned independently of the time-varying coefficients, thereby avoiding repeated spatial approximation errors at each time step. Stone-Weierstrass is invoked only for the density of the basis representation on compact sets. In the revised manuscript we will replace 'structurally mitigates' with 'is designed to mitigate' in the abstract and add a short paragraph in the theoretical section noting that a complete stability analysis under the PDE dynamics remains an open question for future work. revision: yes

  2. Referee: [Theoretical grounding] Theoretical grounding section: the decoupling replaces one source of temporal accumulation with another whose growth rate is not shown to be bounded by the separation alone; for PDEs that develop fine-scale structures or exhibit sensitivity to initial data, a small error in the coefficient subnetwork at step n can still be amplified even if the spatial basis remains fixed.

    Authors: The referee is correct that fixing the spatial basis does not by itself bound the growth rate of errors in the coefficient subnetwork, and that amplification remains possible for sensitive or fine-scale PDEs. The design choice reduces one pathway of error compounding (repeated spatial re-approximation) while leaving the temporal coefficient evolution to be learned; empirical results in the paper indicate improved long-time stability, yet no general bound is derived. We will revise the theoretical grounding section to state this limitation explicitly and to clarify that the separation provides a structural advantage rather than a proven contraction property. revision: yes
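The point conceded in both responses can be made concrete with a short worked bound. It assumes, beyond anything stated in the source, that the spatial basis functions φ_k are exact and orthonormal in L², and that both the true and the predicted coefficients obey the same linear system ċ = A c. The symbols c, ĉ, e, A, and μ are introduced here for illustration only.

```latex
% Assumptions (not from the paper): orthonormal basis, linear coefficient dynamics.
u(x,t) = \sum_{k=1}^{K} c_k(t)\,\phi_k(x), \qquad
\hat{u}(x,t) = \sum_{k=1}^{K} \hat{c}_k(t)\,\phi_k(x), \qquad
e(t) = \hat{c}(t) - c(t).

% With a shared orthonormal basis, the solution error equals the coefficient error:
\| \hat{u}(\cdot,t) - u(\cdot,t) \|_{L^2}^2 = \sum_{k=1}^{K} |e_k(t)|^2 = \| e(t) \|_2^2 .

% If both coefficient vectors follow \dot{c} = A c, the error propagates as
e(t) = \exp(A t)\, e(0), \qquad
\| e(t) \|_2 \le \| \exp(A t) \|_2 \, \| e(0) \|_2 .
```

Under these assumptions, if A has an eigenvalue with positive real part μ, an initial error along the corresponding eigenvector grows like e^{μt}. Fixing the basis removes spatial re-approximation error but does not by itself bound the growth of coefficient errors, which is consistent with the limitation the authors acknowledge.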

Circularity Check

0 steps flagged

No significant circularity; derivation grounded in external theorem and design choice

Full rationale

The paper's central architecture is presented as a novel combination of time-invariant spatial subnetworks and time-varying coefficient subnetworks, justified by appeal to the standard Stone-Weierstrass density theorem and a decoupling paradigm. No load-bearing step reduces by construction to a fitted parameter, self-citation chain, or renamed input; the error-mitigation claim is an asserted structural property rather than a tautological re-expression of training data. The time-marching batch strategy is a practical implementation detail without circular dependence on the predicted outputs. The derivation chain is therefore self-contained against external mathematical benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the applicability of the Stone-Weierstrass theorem to the decoupled representation and the effectiveness of the time-marching strategy for continuity; no free parameters or invented entities are explicitly named in the abstract.

axioms (2)
  • [standard math] Stone-Weierstrass approximation theorem can be used to justify the encoding of spatial and temporal information via separate subnetworks in the neural operator.
    Invoked to ground the decoupling paradigm for function approximation in the operator.
  • [domain assumption] Time-invariant spatial basis functions combined with time-varying coefficients will maintain stability over long time horizons for the PDEs considered.
    Core premise of the spatiotemporal decoupling that the architecture depends on.

pith-pipeline@v0.9.0 · 5778 in / 1397 out tokens · 34974 ms · 2026-05-19T18:51:22.695549+00:00 · methodology


