pith. sign in

arxiv: 2605.25413 · v3 · pith:SRK7FPFEnew · submitted 2026-05-25 · 💻 cs.LG · cs.AI· cs.NA· math.NA

Autoregression-Free Neural Operators for Time-Dependent PDEs

Pith reviewed 2026-06-29 22:15 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.NAmath.NA
keywords neural operatorsflow matchingtime-dependent PDEslatent spaceautoregression-freelong-horizon predictioncontinuous-time modeling
0
0 comments X

The pith

AFNO maps PDE time evolution to a latent space and uses flow matching to enable continuous, autoregression-free long-horizon predictions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to demonstrate that autoregressive rollout in physical space causes progressive error buildup when neural operators predict solutions to time-dependent PDEs over long horizons. AFNO instead encodes the evolution into a latent space and learns a continuous vector field there via flow matching, which supports direct continuous-time stepping while conditioning explicitly on physical parameters. A reader would care because many simulation tasks in physics and engineering need forecasts that remain accurate far beyond short training windows without the instability that recursive feedback introduces. Theoretical analysis plus experiments across six PDEs are presented to show gains in stability and lower rollout errors relative to prior baselines.

Core claim

The central claim is that mapping the time evolution of PDEs into a latent space and modeling continuous-time vector fields within it using flow matching enables autoregression-free prediction over extended horizons, avoids error accumulation from physical-space rollouts, and captures dynamics under varying parameter configurations through explicit conditioning on physical parameters.

What carries the argument

The latent-space vector field learned by flow matching, which drives continuous evolution of the PDE state without recursive physical-space feedback.

If this is right

  • Long-horizon predictions become more stable because error no longer accumulates from recursive physical-space feedback.
  • Continuous evolution replaces discrete autoregressive steps, supporting predictions at arbitrary times.
  • Explicit conditioning on physical parameters allows the same model to handle multiple configurations without retraining.
  • Rollout errors are reduced consistently across the tested PDEs relative to existing neural-operator baselines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The latent-flow approach might transfer to other dynamical systems where autoregressive error accumulation limits forecast length.
  • Flow matching in latent space could support adaptive time stepping by querying the vector field at chosen intervals.
  • The method opens a route to couple neural operators with continuous-time control or optimization loops that require stable long trajectories.

Load-bearing premise

That the latent-space representation and flow-matching vector field preserve the necessary dynamics so that continuous evolution does not introduce new errors comparable to or larger than those from autoregressive physical rollouts.

What would settle it

An experiment on any of the six PDEs or a seventh unseen PDE where AFNO produces higher long-horizon rollout error than the autoregressive baselines.

Figures

Figures reproduced from arXiv: 2605.25413 by Caiyan Qin, Chaoning Zhang, Haoyu Bian, Heng Tao Shen, Jiaquan Zhang, Libin Cai, Wei Dong, Yang Yang, Yi Lu, Yuanfang Guo.

Figure 1
Figure 1. Figure 1: Comparison of autoregressive rollout and autoregression-free rollout [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Motivating case study of long-horizon rollout error on the 1D-Burgers [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Overview of the Autoregression-Free Neural Operators (AFNO). The input physical field is encoded into a compact latent representation using an [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Qualitative comparison of predicted solutions over time for the 2D shallow water equation. [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Qualitative comparison of error maps over time for the 2D shallow water equation. [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Comparison of time-evolving predicted solutions and corresponding error maps for the 1D Kuramoto-Sivashinsky equation. [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Rollout error evolution under extrapolation across the PDEs for different models. [PITH_FULL_IMAGE:figures/full_fig_p010_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Comparison with long-horizon forecasting methods in the 1D-Burgers equation and 2D-NS equation. [PITH_FULL_IMAGE:figures/full_fig_p010_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Temporal evolution of rollout errors for three metrics under cross [PITH_FULL_IMAGE:figures/full_fig_p011_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Latent space trajectory visualization for the 1D-Burgers equation. [PITH_FULL_IMAGE:figures/full_fig_p013_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Energy spectrum comparison on the 1D-Burgers equation. [PITH_FULL_IMAGE:figures/full_fig_p013_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Latent trajectory (top) and energy spectrum visualization (bottom) for the 2D-NS equations under different parameters. [PITH_FULL_IMAGE:figures/full_fig_p014_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Comparison of time-evolving predicted solutions and corresponding error maps for the 1D Burgers equation. [PITH_FULL_IMAGE:figures/full_fig_p021_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Comparison of time-evolving predicted solutions and corresponding error maps for the 1D Allen-Cahn equation. [PITH_FULL_IMAGE:figures/full_fig_p022_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: Qualitative comparison of predicted solutions over time for the 2D Navier-Stokes equations. [PITH_FULL_IMAGE:figures/full_fig_p022_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: Qualitative comparison of error maps over time for the 2D Navier-Stokes equations. [PITH_FULL_IMAGE:figures/full_fig_p023_16.png] view at source ↗
Figure 17
Figure 17. Figure 17: Qualitative comparison of predicted solutions over time for the 2D complex Ginzburg-Landau equation. [PITH_FULL_IMAGE:figures/full_fig_p023_17.png] view at source ↗
Figure 18
Figure 18. Figure 18: Qualitative comparison of error maps over time for the 2D complex Ginzburg-Landau equation. [PITH_FULL_IMAGE:figures/full_fig_p023_18.png] view at source ↗
read the original abstract

Neural operators learn mappings from function-dependent inputs to solutions, providing an effective framework for solving partial differential equations (PDEs). For time-dependent PDEs, existing methods typically perform long-horizon prediction through autoregressive rollout directly in high-dimensional physical field spaces, where each predicted state is recursively fed back as the input for the next step. Although effective for short-term prediction, this autoregressive rollout and the lack of continuous-time modeling lead to progressive error accumulation over long-horizon rollouts. In this work, we propose Autoregression-Free Neural Operators (AFNO), which map the time evolution of PDEs into a latent space and model continuous-time vector fields within it. AFNO uses flow matching to learn the latent vector field, thereby enabling continuous evolution over extended horizons, avoiding autoregressive rollout and capturing dynamics under varying parameter configurations through explicit conditioning on physical parameters. Theoretical analysis and extensive experiments on six PDEs demonstrate that AFNO improves long-horizon prediction stability and consistently reduces rollout errors compared with the baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes Autoregression-Free Neural Operators (AFNO) for time-dependent PDEs. It maps PDE time evolution into a latent space, learns a continuous vector field via flow matching, and conditions explicitly on physical parameters. This construction is intended to enable continuous evolution over long horizons without autoregressive feedback in physical space, thereby reducing progressive error accumulation. The authors claim theoretical analysis plus experiments across six PDEs showing improved long-horizon stability and lower rollout errors relative to baselines.

Significance. If the latent-space flow-matching construction faithfully preserves the original PDE dynamics (including parameter dependence) without introducing errors comparable to or larger than those of physical-space autoregression, the method would address a recognized limitation of existing neural operators for long-term time-dependent problems. The explicit parameter conditioning and avoidance of discrete autoregressive steps constitute a clear technical distinction from prior work.

major comments (1)
  1. [Abstract / §3 (method)] The abstract states that the latent mapping and flow-matching step 'preserve the necessary dynamics,' yet no derivation, theorem, or error bound is visible that quantifies how the latent vector field approximates the original PDE operator. Without such a result (e.g., a stability or approximation theorem in §4 or §5), the central claim that rollout errors are reduced remains an unverified assumption.
minor comments (2)
  1. [Abstract] The abstract refers to 'six PDEs' and 'baselines' without naming either; the introduction or experimental section should list the specific equations and competing methods for immediate context.
  2. [§3] Notation for the latent vector field and the conditioning mechanism should be introduced with explicit symbols and a diagram in the method section to aid readability.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the detailed review and for highlighting the need for clearer theoretical grounding of the latent-space construction. We address the major comment below.

read point-by-point responses
  1. Referee: [Abstract / §3 (method)] The abstract states that the latent mapping and flow-matching step 'preserve the necessary dynamics,' yet no derivation, theorem, or error bound is visible that quantifies how the latent vector field approximates the original PDE operator. Without such a result (e.g., a stability or approximation theorem in §4 or §5), the central claim that rollout errors are reduced remains an unverified assumption.

    Authors: We agree that the manuscript does not contain a formal approximation theorem or explicit error bound relating the learned latent vector field to the original PDE operator. Section 4 provides an analysis of the flow-matching objective and the effect of explicit parameter conditioning, but this analysis stops short of a quantitative stability or approximation result for the encoder-decoder composition. The central empirical claim—that long-horizon rollout error is reduced—is therefore supported by the experiments on six PDEs rather than by a theoretical guarantee. We will revise the abstract and the opening of §3 to replace the phrasing “preserve the necessary dynamics” with language that makes the empirical nature of the validation explicit and will add a short limitations paragraph in §5 acknowledging the absence of such a bound. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The abstract and description provide no equations, derivations, or load-bearing mathematical steps that can be inspected. Claims rest on a proposed method (latent-space flow matching with parameter conditioning) and empirical results on six PDEs, but without visible self-definitional reductions, fitted inputs renamed as predictions, or self-citation chains that collapse the central result to its inputs, the derivation chain cannot be shown to reduce by construction. This is the expected honest non-finding when the source text supplies no inspectable formal content.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no information on free parameters, axioms, or invented entities used by the method.

pith-pipeline@v0.9.1-grok · 5736 in / 973 out tokens · 36268 ms · 2026-06-29T22:15:49.159734+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. TF-SNO: Time-Frequency Gated Spectral Neural Operators for Learning Non-Stationary Partial Differential Equations

    cs.LG 2026-06 unverdicted novelty 6.0

    TF-SNO introduces state-adaptive time-frequency gating in spectral neural operators to model non-stationary PDEs, achieving lower errors than baselines on 1D and 2D benchmarks especially in long rollouts.

Reference graph

Works this paper leans on

64 extracted references · 14 canonical work pages · cited by 1 Pith paper · 9 internal anchors

  1. [1]

    A review of physics- informed machine learning in fluid mechanics,

    P. Sharma, W. T. Chung, B. Akoush, and M. Ihme, “A review of physics- informed machine learning in fluid mechanics,”Energies, vol. 16, no. 5, p. 2343, 2023

  2. [2]

    Current and emerging time-integration strate- gies in global numerical weather and climate prediction,

    G. Mengaldo, A. Wyszogrodzki, M. Diamantakis, S.-J. Lock, F. X. Giraldo, and N. P. Wedi, “Current and emerging time-integration strate- gies in global numerical weather and climate prediction,”Archives of Computational Methods in Engineering, vol. 26, no. 3, pp. 663–684, 2019

  3. [3]

    From traditional to computationally efficient scientific computing algorithms in option pricing: Current progresses with future directions,

    A. Patawari and P. Das, “From traditional to computationally efficient scientific computing algorithms in option pricing: Current progresses with future directions,”Archives of Computational Methods in Engi- neering, pp. 1–40, 2025

  4. [4]

    A review of numerical methods for nonlinear partial differential equations,

    E. Tadmor, “A review of numerical methods for nonlinear partial differential equations,”Bulletin of the American Mathematical Society, vol. 49, no. 4, pp. 507–554, 2012

  5. [5]

    Partial differential equations meet deep neural networks: A survey,

    S. Huang, W. Feng, C. Tang, Z. He, C. Yu, and J. Lv, “Partial differential equations meet deep neural networks: A survey,”IEEE Transactions on Neural Networks and Learning Systems, 2025

  6. [6]

    Neural operators for accelerating scientific simu- lations and design,

    K. Azizzadenesheli, N. Kovachki, Z. Li, M. Liu-Schiaffini, J. Kossaifi, and A. Anandkumar, “Neural operators for accelerating scientific simu- lations and design,”Nature Reviews Physics, vol. 6, no. 5, pp. 320–328, 2024

  7. [7]

    Neural operator: Learning maps between function spaces with applications to pdes,

    N. Kovachki, Z. Li, B. Liu, K. Azizzadenesheli, K. Bhattacharya, A. Stu- art, and A. Anandkumar, “Neural operator: Learning maps between function spaces with applications to pdes,”Journal of Machine Learning Research, vol. 24, no. 89, pp. 1–97, 2023

  8. [8]

    Physics-informed neural networks for solving time-dependent mode-resolved phonon boltzmann transport equation,

    J. Zhou, R. Li, and T. Luo, “Physics-informed neural networks for solving time-dependent mode-resolved phonon boltzmann transport equation,”npj Computational Materials, vol. 9, no. 1, p. 212, 2023

  9. [9]

    Temporal neural operator for modeling time-dependent physical phenomena,

    W. Diab and M. Al Kobaisi, “Temporal neural operator for modeling time-dependent physical phenomena,”Scientific Reports, vol. 15, no. 1, p. 32791, 2025

  10. [10]

    Rethinking materials simulations: Blending direct numerical simula- tions with neural operators,

    V . Oommen, K. Shukla, S. Desai, R. Dingreville, and G. E. Karniadakis, “Rethinking materials simulations: Blending direct numerical simula- tions with neural operators,”npj Computational Materials, vol. 10, no. 1, p. 145, 2024

  11. [11]

    When physics meets machine learning: A survey of physics-informed machine learning,

    C. Meng, S. Griesemer, D. Cao, S. Seo, and Y . Liu, “When physics meets machine learning: A survey of physics-informed machine learning,” Machine Learning for Computational Science and Engineering, vol. 1, no. 1, p. 20, 2025

  12. [12]

    Fourier Neural Operator for Parametric Partial Differential Equations

    Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar, “Fourier neural operator for parametric partial differential equations,”arXiv preprint arXiv:2010.08895, 2020

  13. [13]

    Seismic traveltime simulation for variable velocity models using physics-informed fourier neural operator,

    C. Song, T. Zhao, U. B. Waheed, C. Liu, and Y . Tian, “Seismic traveltime simulation for variable velocity models using physics-informed fourier neural operator,”IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–9, 2024

  14. [14]

    Domain agnostic fourier neural operators,

    N. Liu, S. Jafarzadeh, and Y . Yu, “Domain agnostic fourier neural operators,”Advances in Neural Information Processing Systems, vol. 36, pp. 47 438–47 450, 2023

  15. [15]

    Mscalefno: Multi-scale fourier neural op- erator learning for oscillatory functions and wave scattering problems,

    Z. You, Z. Xu, and W. Cai, “Mscalefno: Multi-scale fourier neural op- erator learning for oscillatory functions and wave scattering problems,” Journal of Computational Physics, p. 114530, 2025

  16. [16]

    Deep learning methods for partial differential equations and related parameter identification problems,

    D. N. Tanyu, J. Ning, T. Freudenberg, N. Heilenk ¨otter, A. Rademacher, U. Iben, and P. Maass, “Deep learning methods for partial differential equations and related parameter identification problems,”Inverse Prob- lems, vol. 39, no. 10, p. 103001, 2023

  17. [17]

    To- wards physics-informed deep learning for turbulent flow prediction,

    R. Wang, K. Kashinath, M. Mustafa, A. Albert, and R. Yu, “To- wards physics-informed deep learning for turbulent flow prediction,” inProceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, 2020, pp. 1457–1466

  18. [18]

    Neupde: Neural network based ordinary and partial differential equations for modeling time-dependent data,

    Y . Sun, L. Zhang, and H. Schaeffer, “Neupde: Neural network based ordinary and partial differential equations for modeling time-dependent data,” inMathematical and Scientific Machine Learning. PMLR, 2020, pp. 352–372

  19. [19]

    Three-dimensional deep learning-based re- duced order model for unsteady flow dynamics with variable reynolds number,

    R. Gupta and R. Jaiman, “Three-dimensional deep learning-based re- duced order model for unsteady flow dynamics with variable reynolds number,”Physics of Fluids, vol. 34, no. 3, 2022

  20. [20]

    Diffusion models in vision: A survey,

    F.-A. Croitoru, V . Hondru, R. T. Ionescu, and M. Shah, “Diffusion models in vision: A survey,”IEEE transactions on pattern analysis and machine intelligence, vol. 45, no. 9, pp. 10 850–10 869, 2023

  21. [21]

    High- resolution image synthesis with latent diffusion models,

    R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High- resolution image synthesis with latent diffusion models,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 10 684–10 695

  22. [22]

    Flowturbo: Towards real- time flow-based image generation with velocity refiner,

    W. Zhao, M. Shi, X. Yu, J. Zhou, and J. Lu, “Flowturbo: Towards real- time flow-based image generation with velocity refiner,”Advances in Neural Information Processing Systems, vol. 37, pp. 4148–4176, 2024

  23. [23]

    Image gener- ation: A review,

    M. Elasri, O. Elharrouss, S. Al-Maadeed, and H. Tairi, “Image gener- ation: A review,”Neural Processing Letters, vol. 54, no. 5, pp. 4609– 4646, 2022

  24. [24]

    Pdebench: An extensive benchmark for scientific machine learning,

    M. Takamoto, T. Praditia, R. Leiteritz, D. MacKinlay, F. Alesiani, D. Pfl ¨uger, and M. Niepert, “Pdebench: An extensive benchmark for scientific machine learning,”Advances in Neural Information Processing Systems, vol. 35, pp. 1596–1611, 2022

  25. [25]

    DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators

    L. Lu, P. Jin, and G. E. Karniadakis, “Deeponet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators,”arXiv preprint arXiv:1910.03193, 2019

  26. [26]

    Neural Operator: Graph Kernel Network for Partial Differential Equations

    Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar, “Neural operator: Graph kernel network for partial differential equations,”arXiv preprint arXiv:2003.03485, 2020

  27. [27]

    Fourier neural operator with learned deformations for pdes on general geometries,

    Z. Li, D. Z. Huang, B. Liu, and A. Anandkumar, “Fourier neural operator with learned deformations for pdes on general geometries,”Journal of Machine Learning Research, vol. 24, no. 388, pp. 1–26, 2023

  28. [28]

    Physics-informed neural operator for learning partial differential equations,

    Z. Li, H. Zheng, N. Kovachki, D. Jin, H. Chen, B. Liu, K. Aziz- zadenesheli, and A. Anandkumar, “Physics-informed neural operator for learning partial differential equations,”ACM/IMS Journal of Data Science, vol. 1, no. 3, pp. 1–27, 2024

  29. [29]

    Scientific machine learning through physics–informed neural networks: Where we are and what’s next,

    S. Cuomo, V . S. Di Cola, F. Giampaolo, G. Rozza, M. Raissi, and F. Piccialli, “Scientific machine learning through physics–informed neural networks: Where we are and what’s next,”Journal of Scientific Computing, vol. 92, no. 3, p. 88, 2022

  30. [30]

    Convolutional neural operators for robust and accurate learning of pdes,

    B. Raonic, R. Molinaro, T. De Ryck, T. Rohner, F. Bartolucci, R. Alai- fari, S. Mishra, and E. de B ´ezenac, “Convolutional neural operators for robust and accurate learning of pdes,”Advances in Neural Information Processing Systems, vol. 36, pp. 77 187–77 200, 2023

  31. [31]

    Gnot: A general neural operator transformer for operator learning,

    Z. Hao, Z. Wang, H. Su, C. Ying, Y . Dong, S. Liu, Z. Cheng, J. Song, and J. Zhu, “Gnot: A general neural operator transformer for operator learning,” inInternational Conference on Machine Learning. PMLR, 2023, pp. 12 556–12 569

  32. [32]

    Attention is all you need,

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,”Advances in neural information processing systems, vol. 30, 2017

  33. [33]

    Mamba neural operator: Who wins? transformers vs. state-space models for pdes,

    C.-W. Cheng, J. Huang, Y . Zhang, G. Yang, C.-B. Sch ¨onlieb, and A. I. Aviles-Rivero, “Mamba neural operator: Who wins? transformers vs. state-space models for pdes,”Journal of Computational Physics, p. 114567, 2025

  34. [34]

    Mamba: Linear-time sequence modeling with selective state spaces,

    A. Gu and T. Dao, “Mamba: Linear-time sequence modeling with selective state spaces,” inFirst conference on language modeling, 2024

  35. [35]

    Recurrent neural operators: Stable long-term pde prediction,

    Z. Ye, C.-S. Zhang, and W. Wang, “Recurrent neural operators: Stable long-term pde prediction,”arXiv preprint arXiv:2505.20721, 2025

  36. [36]

    SGNO: Spectral Generator Neural Operators for Stable Long Horizon PDE Rollouts

    J. Li, Z. Wang, and F. D. Salim, “Sgno: Spectral generator neu- ral operators for stable long horizon pde rollouts,”arXiv preprint arXiv:2602.18801, 2026

  37. [37]

    Flow matching in latent space.arXiv preprint arXiv:2307.08698, 2023

    Q. Dao, H. Phung, B. Nguyen, and A. Tran, “Flow matching in latent space,”arXiv preprint arXiv:2307.08698, 2023

  38. [38]

    Weditgan: Few-shot image generation via latent space relocation,

    Y . Duan, L. Niu, Y . Hong, and L. Zhang, “Weditgan: Few-shot image generation via latent space relocation,” inProceedings of the AAAI JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 16 conference on artificial intelligence, vol. 38, no. 2, 2024, pp. 1653– 1661

  39. [39]

    Deepmdp: Learning continuous latent space models for representation learning,

    C. Gelada, S. Kumar, J. Buckman, O. Nachum, and M. G. Bellemare, “Deepmdp: Learning continuous latent space models for representation learning,” inInternational conference on machine learning. PMLR, 2019, pp. 2170–2179

  40. [40]

    Auto-Encoding Variational Bayes

    D. P. Kingma and M. Welling, “Auto-encoding variational bayes,”arXiv preprint arXiv:1312.6114, 2013

  41. [41]

    Generative adversarial networks,

    I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y . Bengio, “Generative adversarial networks,” Communications of the ACM, vol. 63, no. 11, pp. 139–144, 2020

  42. [42]

    Density estimation using Real NVP

    L. Dinh, J. Sohl-Dickstein, and S. Bengio, “Density estimation using real nvp,”arXiv preprint arXiv:1605.08803, 2016

  43. [43]

    Glow: Generative flow with invertible 1x1 convolutions,

    D. P. Kingma and P. Dhariwal, “Glow: Generative flow with invertible 1x1 convolutions,”Advances in neural information processing systems, vol. 31, 2018

  44. [44]

    Denoising diffusion probabilistic models,

    J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in neural information processing systems, vol. 33, pp. 6840– 6851, 2020

  45. [45]

    Bert: Pre-training of deep bidirectional transformers for language understanding,

    J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” inPro- ceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), 2019, pp. 4171–4186

  46. [46]

    Neural discrete representation learning,

    A. Van Den Oord, O. Vinyalset al., “Neural discrete representation learning,”Advances in neural information processing systems, vol. 30, 2017

  47. [47]

    WaveNet: A Generative Model for Raw Audio

    S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalch- brenner, A. Senior, K. Kavukcuogluet al., “Wavenet: A generative model for raw audio,”arXiv preprint arXiv:1609.03499, vol. 12, p. 1, 2016

  48. [48]

    Latent neural operator for solving forward and inverse pde problems,

    T. Wang and C. Wang, “Latent neural operator for solving forward and inverse pde problems,”Advances in Neural Information Processing Systems, vol. 37, pp. 33 085–33 107, 2024

  49. [49]

    Scheduled sampling for sequence prediction with recurrent neural networks,

    S. Bengio, O. Vinyals, N. Jaitly, and N. Shazeer, “Scheduled sampling for sequence prediction with recurrent neural networks,”Advances in neural information processing systems, vol. 28, 2015

  50. [50]

    A reduction of imitation learning and structured prediction to no-regret online learning,

    S. Ross, G. Gordon, and D. Bagnell, “A reduction of imitation learning and structured prediction to no-regret online learning,” inProceedings of the fourteenth international conference on artificial intelligence and statistics. JMLR Workshop and Conference Proceedings, 2011, pp. 627–635

  51. [51]

    Neural ordinary differential equations,

    R. T. Chen, Y . Rubanova, J. Bettencourt, and D. K. Duvenaud, “Neural ordinary differential equations,”Advances in neural information pro- cessing systems, vol. 31, 2018

  52. [52]

    Latent ordinary differential equations for irregularly-sampled time series,

    Y . Rubanova, R. T. Chen, and D. K. Duvenaud, “Latent ordinary differential equations for irregularly-sampled time series,”Advances in neural information processing systems, vol. 32, 2019

  53. [53]

    The fast fourier transform,

    H. J. Nussbaumer, “The fast fourier transform,” inFast Fourier transform and convolution algorithms. Springer, 1981, pp. 80–111

  54. [54]

    Learning repre- sentations by back-propagating errors,

    D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning repre- sentations by back-propagating errors,”nature, vol. 323, no. 6088, pp. 533–536, 1986

  55. [55]

    Computing nearly singular solutions using pseudo- spectral methods,

    T. Y . Hou and R. Li, “Computing nearly singular solutions using pseudo- spectral methods,”Journal of Computational Physics, vol. 226, no. 1, pp. 379–397, 2007

  56. [56]

    A practical method for numerical evaluation of solutions of partial differential equations of the heat-conduction type,

    J. Crank and P. Nicolson, “A practical method for numerical evaluation of solutions of partial differential equations of the heat-conduction type,” inMathematical proceedings of the Cambridge philosophical society, vol. 43, no. 1. Cambridge University Press, 1947, pp. 50–67

  57. [57]

    U-no: U-shaped neural operators,

    M. A. Rahman, Z. E. Ross, and K. Azizzadenesheli, “U-no: U-shaped neural operators,”arXiv preprint arXiv:2204.11127, 2022

  58. [58]

    Wavelet neural operator for solving parametric partial differential equations in computational mechanics problems,

    T. Tripura and S. Chakraborty, “Wavelet neural operator for solving parametric partial differential equations in computational mechanics problems,”Computer Methods in Applied Mechanics and Engineering, vol. 404, p. 115783, 2023

  59. [59]

    Lightweight fourier neural operator for time-dependent partial differential equations,

    D. Ahn, S. Chandran, D. Leibovici, N. Kovachki, V . Papalexakis, and J. Kossaifi, “Lightweight fourier neural operator for time-dependent partial differential equations,” inProceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) - Workshop on Machine Learning and the Physical Sciences, 2025. [Online]. Available: https://...

  60. [60]

    Freqmoe: Dynamic frequency enhancement for neural pde solvers,

    T. Chen, H. Zhou, Y . Li, H. Wang, Z. Zhang, T. Zhu, S. Zhang, and J. Li, “Freqmoe: Dynamic frequency enhancement for neural pde solvers,” arXiv preprint arXiv:2505.06858, 2025

  61. [61]

    Adam: A Method for Stochastic Optimization

    D. P. Kingma, “Adam: A method for stochastic optimization,”arXiv preprint arXiv:1412.6980, 2014

  62. [62]

    Message passing neural pde solvers.arXiv preprint arXiv:2202.03376, 2022

    J. Brandstetter, D. Worrall, and M. Welling, “Message passing neural pde solvers,”arXiv preprint arXiv:2202.03376, 2022

  63. [63]

    Deep residual learning for image recognition,

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778

  64. [64]

    A Tutorial on Principal Component Analysis

    J. Shlens, “A tutorial on principal component analysis,”arXiv preprint arXiv:1404.1100, 2014. JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 17 APPENDIXA BENCHMARKDESIGNS This section provides additional details on the specific design of the PDEs used in the experiments. This section de- scribes the specific 1D and 2D PDEs used in the experimen...