Stable Fine-Time-Step Long-Horizon Turbulence Prediction with a Multi-Stepsize Mixture-of-Experts Neural Operator

Guanyu Pan; Huiyu Yang; Jianchun Wang; Nianyu Yi; Yunpeng Wang; Zikun Xu

arxiv: 2604.12794 · v1 · submitted 2026-04-14 · ⚛️ physics.flu-dyn

Guanyu Pan, Huiyu Yang, Yunpeng Wang, Zikun Xu, Jianchun Wang, Nianyu Yi

Pith reviewed 2026-05-10 14:10 UTC · model grok-4.3

keywords: neural operators, mixture of experts, turbulent flow prediction, autoregressive forecasting, homogeneous isotropic turbulence, channel flow, time-step adaptation, fluid dynamics

The pith

Mixture-of-experts neural operators stay stable during long turbulence forecasts at fine time steps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that a mixture-of-experts neural operator can deliver stable long-horizon predictions of three-dimensional turbulence even when the time steps are kept small. The key is to condition the model on the desired time stride and let a router pick the right experts for that scale. A sympathetic reader would care because many engineering and scientific applications need accurate long-time statistics from turbulent flows, yet standard neural operators accumulate errors too fast at fine resolutions. The model is tested on forced homogeneous isotropic turbulence and turbulent channel flow using filtered direct numerical simulation data at up to twenty times finer temporal resolution than prior studies.

Core claim

The Ms-MoE-IFactFormer architecture conditions on relative stride and employs a time-step router to activate scale-specific routed experts along with a shared expert, allowing one model to represent multiple time-advancement operators and yielding more stable autoregressive long-horizon predictions with better agreement to long-time-averaged statistics on forced homogeneous isotropic turbulence and turbulent channel flow datasets at up to twenty times finer temporal resolution.

What carries the argument

Multi-stepsize mixture-of-experts (Ms-MoE) neural operator on an implicit factorized Transformer backbone, using a router that selects experts based on the input relative stride to handle different temporal scales.
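
A minimal sketch of the routing idea, not the authors' implementation: the Gaussian gate, the expert `centers`, the width `sigma`, and the toy experts below are all hypothetical choices made for illustration. It only shows how a stride-dependent gate can blend routed experts on top of an always-active shared expert.

```python
import math

def router_weights(stride, centers, sigma=2.0):
    """Hypothetical soft gate: softmax of a Gaussian score around each expert's center."""
    logits = [-((stride - c) ** 2) / (2.0 * sigma ** 2) for c in centers]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def moe_step(u, stride, shared_expert, routed_experts, centers, sigma=2.0):
    """Shared expert output plus the gate-weighted sum of routed expert outputs."""
    gates = router_weights(stride, centers, sigma)
    out = list(shared_expert(u, stride))
    for gate, expert in zip(gates, routed_experts):
        update = expert(u, stride)
        out = [o + gate * x for o, x in zip(out, update)]
    return out, gates

# Toy experts (assumptions): the shared expert copies the state, each routed
# expert proposes a small stride-proportional correction of a different size.
shared = lambda u, s: u
routed = [lambda u, s, k=k: [-k * 0.001 * s * x for x in u] for k in (1, 2, 3)]
centers = [1.0, 5.0, 10.0]  # hypothetical strides each routed expert is tuned to

state, gates = moe_step([1.0, 2.0], 5.0, shared, routed, centers)
```

With a requested stride of 5, the gate puts most of its weight on the expert centered at 5 while the shared expert contributes regardless of the stride.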

If this is right

Long-horizon autoregressive rollouts remain stable at fine temporal resolutions instead of degrading quickly.
Improved matching to time-averaged flow statistics on both homogeneous isotropic turbulence and channel flow cases.
One architecture can serve as a family of stride-parameterized operators without retraining separate models.
Opens the way to applying similar techniques to more complex turbulent flows beyond the tested cases.
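
To make the rollout and "family of operators" ideas concrete, here is a toy sketch. `advance` is a hypothetical stand-in for a stride-conditioned model (it applies the exact decay of du/dt = -u rather than a learned operator), and `base_dt` is an assumed finest sampling interval. For an ideal operator family, many fine strides and a few coarse strides covering the same time span should agree, which is a consistency a learned model can only approximate.

```python
import math

def advance(u, stride, base_dt=0.01, rate=1.0):
    """Toy stride-conditioned operator: exact decay over a time span of stride * base_dt."""
    factor = math.exp(-rate * stride * base_dt)
    return [x * factor for x in u]

def rollout(step_fn, u0, stride, n_steps):
    """Autoregressive rollout: each prediction is fed back as the next input."""
    states = [list(u0)]
    for _ in range(n_steps):
        states.append(step_fn(states[-1], stride))
    return states

u0 = [1.0, -2.0, 0.5]
fine = rollout(advance, u0, stride=1, n_steps=50)[-1]    # 50 steps of 1 * base_dt
coarse = rollout(advance, u0, stride=10, n_steps=5)[-1]  # 5 steps of 10 * base_dt
```

Both rollouts cover the same total time, so their final states match up to floating-point rounding in this idealized setting.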

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the routing mechanism generalizes, it could allow adaptive time-stepping in simulations where local flow features require varying resolutions.
The approach might integrate with physics-informed constraints to further reduce drift in conserved quantities over long times.
Testing on experimental data rather than filtered DNS could reveal sensitivity to noise or incomplete observations.

Load-bearing premise

That routing to stride-specific experts plus a shared expert based on relative stride is enough to control error accumulation in fine-step autoregressive rollouts for the range of turbulent flows considered.

What would settle it

A long-horizon rollout on the channel flow dataset that shows rapid growth in deviation from reference statistics or numerical instability within the tested time horizon.
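
One way such a check could be scripted is sketched below. The helper names are hypothetical, the data layout (a list of 1D profiles, one per saved time) is an assumption, and real channel-flow statistics would also average over the homogeneous directions, which is omitted here.

```python
import math

def time_average(snapshots):
    """Pointwise average of equally sized profiles, one profile per saved time."""
    n_t = len(snapshots)
    return [sum(column) / n_t for column in zip(*snapshots)]

def rms_fluctuation(snapshots):
    """Root-mean-square deviation from the time average at each point."""
    mean = time_average(snapshots)
    n_t = len(snapshots)
    return [
        math.sqrt(sum((snap[j] - mean[j]) ** 2 for snap in snapshots) / n_t)
        for j in range(len(mean))
    ]

def max_relative_deviation(stat, ref, eps=1e-12):
    """Largest pointwise gap between a predicted and a reference statistic,
    scaled by the largest reference magnitude."""
    scale = max(abs(r) for r in ref) + eps
    return max(abs(s - r) for s, r in zip(stat, ref)) / scale

snapshots = [[1.0, 2.0], [3.0, 2.0]]
mean_profile = time_average(snapshots)
rms_profile = rms_fluctuation(snapshots)
```

Tracking `max_relative_deviation` as the averaging window grows would show whether the predicted statistics settle near the reference or drift away from it.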

Figures

Figures reproduced from arXiv: 2604.12794 by Guanyu Pan, Huiyu Yang, Jianchun Wang, Nianyu Yi, Yunpeng Wang, Zikun Xu.

**Figure 1.** Ms-MoE-IFactFormer framework with a shared expert, scale-routed experts, and a stride-indexed multilayer perceptron (MLP) corrector.

**Figure 2.** Channel ∆T50: x–y slices of the streamwise velocity at z = 8 for the fDNS reference, FNO, IFactFormer, and Ms-MoE-IFactFormer (top to bottom). The columns correspond to rollout steps n = 100, 500, 1000, and 2000.

**Figure 3.** Channel ∆T50: long-time-averaged wall-normal statistics for the fDNS reference, DSM, WALE, IFactFormer, and Ms-MoE-IFactFormer. Panels (a)–(e) show $\langle u^+ \rangle$, $\langle u'v' \rangle$, $u^+_{\mathrm{rms}}$, $v^+_{\mathrm{rms}}$, and $w^+_{\mathrm{rms}}$, respectively.

**Figure 4.** Channel ∆T10: x–y slices of the streamwise velocity at z = 8 for the fDNS reference, FNO, IFactFormer, and Ms-MoE-IFactFormer (top to bottom). The columns correspond to rollout steps n = 500, 1000, 2000, and 4000.

**Figure 5.** Channel ∆T10: long-time-averaged wall-normal statistics for the fDNS reference, DSM, WALE, IFactFormer, and Ms-MoE-IFactFormer. Panels (a)–(e) show $\langle u^+ \rangle$, $\langle u'v' \rangle$, $u^+_{\mathrm{rms}}$, $v^+_{\mathrm{rms}}$, and $w^+_{\mathrm{rms}}$, respectively.

**Figure 6.** HIT ∆T50: kinetic energy spectra E(k) for the fDNS reference, DSM, FNO, IFactFormer, and Ms-MoE-IFactFormer. Legends mark unstable FNO rollouts as FNO (NaN). Panels (a)–(d) correspond to t/τ ≈ 10, 20, 40, and 80, respectively.

**Figure 7.** HIT ∆T50: PDFs of the normalized longitudinal velocity increment $\delta_r u / u_{\mathrm{rms}}$ at $r = \Delta$ for the fDNS reference, DSM, FNO, IFactFormer, and Ms-MoE-IFactFormer. Legends mark unstable FNO rollouts as FNO (NaN). Panels (a)–(d) correspond to t/τ ≈ 10, 20, 40, and 80, respectively.

**Figure 8.** HIT ∆T50: PDFs of the normalized vorticity magnitude $\bar{\omega}/\bar{\omega}^{\mathrm{rms}}_{\mathrm{fDNS}}$ for the fDNS reference, DSM, FNO, IFactFormer, and Ms-MoE-IFactFormer. Legends mark unstable FNO rollouts as FNO (NaN). Panels (a)–(d) correspond to t/τ ≈ 10, 20, 40, and 80, respectively.

3.2.3. HIT-∆T10 (∆T = 0.01) At the finer interval ∆T10, all learned baselines remain statistically comparable over the reported horizon, so the differen…

**Figure 9.** HIT ∆T10: kinetic energy spectra E(k) for the fDNS reference, DSM, FNO, IFactFormer, and Ms-MoE-IFactFormer. Panels (a)–(d) correspond to t/τ ≈ 4, 8, 16, and 32, respectively.

**Figure 10.** HIT ∆T10: PDFs of the normalized longitudinal velocity increment $\delta_r u / u_{\mathrm{rms}}$ at $r = \Delta$ for the fDNS reference, DSM, FNO, IFactFormer, and Ms-MoE-IFactFormer. Panels (a)–(d) correspond to t/τ ≈ 4, 8, 16, and 32, respectively.

**Figure 11.** HIT ∆T10: PDFs of the normalized vorticity magnitude $\bar{\omega}/\bar{\omega}^{\mathrm{rms}}_{\mathrm{fDNS}}$ for the fDNS reference, DSM, FNO, IFactFormer, and Ms-MoE-IFactFormer. Panels (a)–(d) correspond to t/τ ≈ 4, 8, 16, and 32, respectively.

3.3. Ablation on Ms-MoE hyperparameters We briefly examine the sensitivity of the MoE design to (K, Tmax) and the router parameters (σ, p) under the same training budget. Across both benchmarks, the beh…
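
The normalized longitudinal velocity increment shown in the HIT increment PDFs (Figures 7 and 10) can be sketched for a single periodic line of data as follows. This is an illustration under assumptions, not the paper's post-processing: the helper names are hypothetical, `r` is a separation in grid points, and the rms is taken about the sample mean.

```python
import math

def normalized_increments(u, r):
    """delta_r u / u_rms on a periodic 1D sample, with r in grid points."""
    n = len(u)
    mean = sum(u) / n
    u_rms = math.sqrt(sum((x - mean) ** 2 for x in u) / n)
    return [(u[(i + r) % n] - u[i]) / u_rms for i in range(n)]

def histogram_pdf(samples, lo, hi, bins):
    """Histogram scaled so that the bin heights integrate to 1 over [lo, hi)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        if lo <= s < hi:
            index = min(int((s - lo) / width), bins - 1)  # guard against rounding
            counts[index] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * bins
    return [c / (total * width) for c in counts]

# Toy signal (assumption): one sine period sampled on 64 points.
u = [math.sin(2.0 * math.pi * i / 64) for i in range(64)]
increments = normalized_increments(u, r=1)
pdf = histogram_pdf(increments, lo=-1.0, hi=1.0, bins=20)
```

For turbulence data the same two steps would be applied along the chosen direction of a 3D velocity component, pooling samples from all lines before building the histogram.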

Original abstract

Neural operators have been increasingly used as data-driven surrogates for time-marching predictions of turbulent flows. However, long-horizon autoregressive prediction is sensitive to error accumulation and the choice of prediction interval. Excessively small time increments may increase temporal redundancy and lengthen rollouts, which can degrade the stability of neural operators in turbulence forecasting. This work pursues a unified objective: stable long-horizon autoregressive prediction at fine temporal resolution for three-dimensional turbulence. We propose a multi-stepsize mixture-of-experts (Ms-MoE) neural operator built on an implicit factorized Transformer (IFactFormer) backbone. The model conditions on a requested relative stride and uses a time-step router to activate scale-specific routed experts together with a shared expert, yielding a single architecture that represents a family of stride-parameterized time-advancement operators. We evaluate the approach on forced homogeneous isotropic turbulence (HIT) and turbulent channel flow using filtered direct numerical simulation datasets. Relative to sampling intervals used in previous studies, we construct training datasets with up to 20 times finer temporal resolution and report long-horizon autoregressive rollouts using qualitative time-slice comparisons and long-time-averaged statistics. Ms-MoE-IFactFormer yields more stable long-horizon rollouts and improved agreement with long-time-averaged statistics on both HIT and turbulent channel flow, suggesting potential for stable time-marching at fine temporal resolution in more complex turbulent flows.
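
A back-of-the-envelope illustration of why rollout length matters, under a deliberately crude assumption that is not from the paper: each autoregressive step inflates the relative error by a fixed factor `1 + eps`. In practice the per-step error usually changes with the step size, so the numbers below only illustrate the compounding effect of taking 20 times more steps over the same physical horizon.

```python
def compounded_error(eps_per_step, n_steps):
    """Relative error after n_steps if every step multiplies the state by (1 + eps)."""
    return (1.0 + eps_per_step) ** n_steps - 1.0

steps_coarse = 100   # hypothetical rollout length at the coarse interval
refinement = 20      # "up to 20 times finer" from the abstract
eps = 1e-3           # hypothetical per-step relative error, held fixed

error_coarse = compounded_error(eps, steps_coarse)
error_fine = compounded_error(eps, steps_coarse * refinement)
print(f"coarse: {error_coarse:.3f}, fine: {error_fine:.3f}")
```

Under this toy model the fine-step rollout ends with a much larger accumulated error, which is the failure mode the stride-conditioned design is meant to mitigate.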

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a multi-stepsize mixture-of-experts neural operator (Ms-MoE-IFactFormer) built on an implicit factorized Transformer backbone for stable long-horizon autoregressive prediction of 3D turbulent flows at fine temporal resolution. The model conditions on a requested relative stride, employs a time-step router to activate scale-specific routed experts plus a shared expert, and is evaluated on filtered DNS datasets for forced homogeneous isotropic turbulence (HIT) and turbulent channel flow. It claims more stable rollouts and improved agreement with long-time-averaged statistics relative to prior sampling intervals, suggesting applicability to more complex flows.

Significance. If the quantitative results hold, the stride-conditioned Ms-MoE approach provides a unified architecture for a family of time-advancement operators, addressing error accumulation in fine-time-step autoregressive rollouts of turbulence. This could meaningfully advance neural-operator surrogates in fluid dynamics by enabling stable long-horizon predictions at temporal resolutions up to 20 times finer than previous studies, with potential for broader use in multi-scale flow modeling.

major comments (2)

Abstract and §4 (results): the central claim of 'more stable long-horizon rollouts and improved agreement with long-time-averaged statistics' is asserted without reported quantitative metrics (e.g., L2 error norms, kinetic-energy spectra, or stability measures with error bars over rollout horizon), training details, or ablation studies on the router/expert activation; this prevents verification that the stride conditioning actually suppresses error accumulation as hypothesized.
§3 (method): the description of the time-step router and scale-specific experts is high-level; without explicit equations for the conditioning mechanism, router loss, or how relative stride is encoded into the IFactFormer layers, it is unclear whether the architecture guarantees the claimed parameter-free family of operators or merely interpolates between discrete strides.
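
The per-step error norm requested in the first major comment could be computed along these lines. This is a sketch under assumptions: fields are flattened to plain lists, `relative_l2`, `error_curve`, and `error_curve_with_spread` are hypothetical helper names, and the spread is a population standard deviation across independent rollouts.

```python
import math
import statistics

def relative_l2(pred, ref):
    """||pred - ref||_2 / ||ref||_2 for two flattened fields of equal length."""
    num = math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)))
    den = math.sqrt(sum(r ** 2 for r in ref))
    return num / den

def error_curve(pred_states, ref_states):
    """Relative L2 error at every rollout step."""
    return [relative_l2(p, r) for p, r in zip(pred_states, ref_states)]

def error_curve_with_spread(curves):
    """Mean and standard deviation per step across several rollouts."""
    means = [statistics.mean(step) for step in zip(*curves)]
    stds = [statistics.pstdev(step) for step in zip(*curves)]
    return means, stds

ref = [[3.0, 4.0], [3.0, 4.0]]
run_a = error_curve([[3.0, 4.0], [0.0, 0.0]], ref)  # exact, then an all-zero prediction
run_b = error_curve([[3.0, 4.0], [3.0, 4.0]], ref)  # exact at both steps
means, stds = error_curve_with_spread([run_a, run_b])
```

Plotting `means` with `stds` as error bars against the rollout step gives the kind of error-growth curve the comment describes.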

minor comments (2)

Abstract: 'up to 20 times finer temporal resolution' should specify the exact baseline sampling intervals from prior studies for reproducibility.
Notation: the acronym 'Ms-MoE-IFactFormer' is introduced without expanding 'IFactFormer' on first use in the abstract.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major point below, providing clarifications and committing to revisions that strengthen the manuscript without altering its core claims.

Referee: [—] Abstract and §4 (results): the central claim of 'more stable long-horizon rollouts and improved agreement with long-time-averaged statistics' is asserted without reported quantitative metrics (e.g., L2 error norms, kinetic-energy spectra, or stability measures with error bars over rollout horizon), training details, or ablation studies on the router/expert activation; this prevents verification that the stride conditioning actually suppresses error accumulation as hypothesized.

Authors: We agree that the presentation would benefit from additional quantitative support. The manuscript already reports long-time-averaged statistics (mean velocity profiles, Reynolds stresses for channel flow, and kinetic energy spectra for HIT) that quantify improved agreement relative to baselines. To directly address the concern, we will revise §4 to include L2 error norms of the velocity field over increasing rollout horizons, time-evolving kinetic energy spectra, and stability metrics (e.g., error growth rates) with error bars computed from multiple independent rollouts. Expanded training details (dataset sizes, optimizer settings, and hyperparameter choices) and ablation studies isolating the router and expert activation will also be added to demonstrate the contribution of stride conditioning to error suppression. These revisions will be incorporated in the next version.

revision: yes

Referee: [—] §3 (method): the description of the time-step router and scale-specific experts is high-level; without explicit equations for the conditioning mechanism, router loss, or how relative stride is encoded into the IFactFormer layers, it is unclear whether the architecture guarantees the claimed parameter-free family of operators or merely interpolates between discrete strides.

Authors: The §3 description is concise by design, but we accept that explicit formulations are needed for reproducibility. The relative stride s_rel is encoded as a continuous scalar that is embedded and injected into the query, key, and value projections of the IFactFormer layers, enabling the shared backbone to modulate its temporal scale without any parameter changes. The router is a lightweight MLP whose output gates the scale-specific experts; it is trained with the primary prediction loss plus a load-balancing term that penalizes under-utilization of experts. This conditioning produces a single set of weights that realizes a continuous family of operators for arbitrary strides via the learned embedding, rather than discrete interpolation. We will expand §3 with the precise equations for stride embedding, router gating, and the composite loss, together with a schematic of the conditioning path, in the revised manuscript.

revision: yes
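
The rebuttal does not give the form of the load-balancing term. As one possible illustration, the sketch below uses the squared coefficient of variation of per-expert "importance" from Shazeer et al. (the sparsely-gated MoE paper in the reference list), which is a swapped-in technique and not necessarily what the authors use. The function names and the `weight` value are hypothetical.

```python
def importance(gates_batch):
    """Sum each expert's gate value over a batch of router outputs."""
    return [sum(column) for column in zip(*gates_batch)]

def cv_squared(values, eps=1e-10):
    """Squared coefficient of variation: variance divided by squared mean."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return var / (mean ** 2 + eps)

def load_balance_penalty(gates_batch, weight=0.01):
    """Auxiliary loss term that grows when some experts are rarely used."""
    return weight * cv_squared(importance(gates_batch))

balanced = [[0.5, 0.5], [0.5, 0.5]]  # both experts used equally
skewed = [[0.9, 0.1], [0.9, 0.1]]    # second expert under-utilized

penalty_balanced = load_balance_penalty(balanced)
penalty_skewed = load_balance_penalty(skewed)
```

Adding such a term to the prediction loss pushes the router toward spreading its weight across experts, at the cost of one extra hyperparameter.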

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

The paper proposes a new Ms-MoE-IFactFormer architecture that conditions on relative stride to activate scale-specific routed experts plus a shared expert, yielding a family of stride-parameterized operators. The central claims of improved stability and agreement with long-time-averaged statistics in long-horizon autoregressive rollouts are supported by direct evaluation on filtered DNS datasets for HIT and turbulent channel flow at up to 20x finer temporal resolution than prior work. No equations, fitted parameters, or results are shown to reduce by construction to the inputs or to prior self-citations; the model is introduced as an explicit design choice and the reported improvements are independent empirical outcomes rather than tautological renamings or self-referential fits.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The abstract identifies no explicit free parameters, background axioms, or additional invented physical entities; the central contribution is the proposed neural architecture itself.

invented entities (1)

Ms-MoE-IFactFormer · no independent evidence
purpose: Single architecture representing a family of stride-parameterized time-advancement operators for turbulence
The model is introduced as the novel contribution in the abstract.

[1] [1]

S. B. Pope,Turbulent Flows, Cambridge University Press, (2000)

work page 2000

[2] [2]

S. L. Brunton, B. R. Noack, and P. Koumoutsakos,Machine learning for fluid mechanics, Annual Review of Fluid Mechanics, 52(1)(2020), 477–508. 18

work page 2020

[3] [3]

Duraisamy, G

K. Duraisamy, G. Iaccarino, and H. Xiao,Turbulence modeling in the age of data, Annual Review of Fluid Mechanics, 51(1)(2019), 357–377

work page 2019

[4] [4]

A. D. Beck, and M. Kurz,A perspective on machine learning methods in turbulence modeling, GAMM- Mitteilungen, 44(1)(2021), e202100002

work page 2021

[5] [5]

Kovachki, Z

N. Kovachki, Z. Li, B. Liu, K. Azizzadenesheli, K. Bhattacharya, A. M. Stuart, and A. Anandkumar,Neural Operator: Learning Maps Between Function Spaces with Applications to PDEs, Journal of Machine Learning Research, 24(89)(2023), 1–97

work page 2023

[6] [6]

L. Lu, P. Jin, G. Pang, Z. Zhang, and G. E. Karniadakis,Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators, Nature Machine Intelligence, 3(3)(2021), 218–229

work page 2021

[7] [7]

Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. M. Stuart, and A. Anandkumar,Fourier Neural Operator for Parametric Partial Differential Equations, International Conference on Learning Represen- tations (ICLR), (2021)

work page 2021

[8] [8]

Z. Li, W. Peng, Z. Yuan, and J. Wang,Fourier neural operator approach to large eddy simulation of three- dimensional turbulence, Theoretical and Applied Mechanics Letters, 12(6)(2022), 100389

work page 2022

[9] [9]

T. Luo, Z. Li, Z. Yuan, W. Peng, T. Liu, L. Wang, and J. Wang,Fourier neural operator for large eddy simulation of compressible Rayleigh–Taylor turbulence, Physics of Fluids, 36(7)(2024), 075165

work page 2024

[10] [10]

Park, and H

J. Park, and H. Choi,Toward neural-network-based large-eddy simulation: application to turbulent channel flow, Journal of Fluid Mechanics, 914(2021), A16

work page 2021

[11] [11]

Y . Guan, A. Chattopadhyay, A. Subel, and P. Hassanzadeh,Stable a posteriori LES of 2D turbulence using convolutional neural networks: Backscattering analysis and generalization to higher Re via transfer learning, Journal of Computational Physics, 458(2022), 111090

work page 2022

[12] [12]

S. Zhao, Z. Li, B. Fan, Y . Wang, H. Yang, and J. Wang,LESnets (large-eddy simulation nets): Physics-informed neural operator for large-eddy simulation of turbulence, Journal of Computational Physics, 537(2025), 114125

work page 2025

[13] [13]

W. Peng, Z. Yuan, and J. Wang,Attention-enhanced neural network models for turbulence simulation, Physics of Fluids, 34(2)(2022), 025111

work page 2022

[14] [14]

W. Peng, Z. Yuan, Z. Li, and J. Wang,Linear attention coupled Fourier neural operator for simulation of three- dimensional turbulence, Physics of Fluids, 35(1)(2023), 015106

work page 2023

[15] [15]

Z. Hao, Z. Wang, H. Su, C. Ying, Y . Dong, S. Liu, Z. Cheng, J. Song, and J. Zhu,GNOT: A general neural oper- ator transformer for operator learning, Proceedings of the 40th International Conference on Machine Learning (ICML), PMLR, 202(2023), 12556–12569

work page 2023

[16] Z. Li, D. Shu, and A. Barati Farimani, Scalable Transformer for PDE Surrogate Modeling, Advances in Neural Information Processing Systems, 36 (2023), 28010–28039.

[17] H. Wu, H. Luo, H. Wang, J. Wang, and M. Long, Transolver: A Fast Transformer Solver for PDEs on General Geometries, Proceedings of the 41st International Conference on Machine Learning (ICML), PMLR, 235 (2024), 53681–53705.

[18] Z. Li, T. Liu, W. Peng, Z. Yuan, and J. Wang, A transformer-based neural operator for large-eddy simulation of turbulence, Physics of Fluids, 36(6) (2024), 065167.

[19] Y. Du and A. S. Krishnapriyan, EddyFormer: Accelerated Neural Simulations of Three-Dimensional Turbulence at Scale, Advances in Neural Information Processing Systems, 38 (2025).

[20] P. Lai, Y. Chen, D. Yang, R. Wang, F. Wang, and H. Xu, From Complex Dynamics to DynFormer: Rethinking Transformers for PDEs, arXiv preprint, arXiv:2603.03112, doi:10.48550/arXiv.2603.03112 (2026).

[21] H. Yang, Z. Li, X. Wang, and J. Wang, An implicit factorized transformer with applications to fast prediction of three-dimensional turbulence, Theoretical and Applied Mechanics Letters, 14(6) (2024), 100527.

[22] H. Yang, Y. Wang, and J. Wang, Implicit factorized transformer approach to fast prediction of turbulent channel flows, Science China Physics, Mechanics & Astronomy, 69(1) (2026), 214606.

[23] F. Gonzalez, F.-X. Demoulin, and S. Bernard, Towards Long-Term Predictions of Turbulence Using Neural Operators, arXiv preprint, arXiv:2307.13517, doi:10.48550/arXiv.2307.13517 (2023).

[24] C. Wu, X.-L. Zhang, and G. He, Neural operator-based stochastic forcing for resolvent prediction of space-time turbulence statistics in channel flows, Journal of Fluid Mechanics, 1024 (2025), A1.

[25] Z. Li, W. Peng, Z. Yuan, and J. Wang, Long-term predictions of turbulence by implicit U-Net enhanced Fourier neural operator, Physics of Fluids, 35(7) (2023), 075145.

[26] Y. Wang, Z. Li, Z. Yuan, W. Peng, T. Liu, and J. Wang, Prediction of turbulent channel flow using Fourier neural operator-based machine-learning strategy, Physical Review Fluids, 9 (2024), 084604.

[27] X. Zou, Z. Li, Y. Wang, H. Yang, and J. Wang, Uncertainty quantification and stability of neural operators for prediction of three-dimensional turbulence, Journal of Computational Physics, 549 (2026), 114640.

[28] M. McCabe, P. Harrington, S. Subramanian, and J. Brown, Towards Stability of Autoregressive Neural Operators, Transactions on Machine Learning Research (2023).

[29] S. Bengio, O. Vinyals, N. Jaitly, and N. Shazeer, Scheduled sampling for sequence prediction with recurrent neural networks, Advances in Neural Information Processing Systems, 28 (2015), 1171–1179.

[30] A. Lamb, A. Goyal, Y. Zhang, S. Zhang, A. Courville, and Y. Bengio, Professor forcing: A new algorithm for training recurrent networks, Advances in Neural Information Processing Systems, 29 (2016), 4601–4609.

[31] H. Choi and P. Moin, Effects of the computational time step on numerical solutions of turbulent flow, Journal of Computational Physics, 113(1) (1994), 1–4.

[32] P. K. Yeung, K. R. Sreenivasan, and S. B. Pope, Effects of finite spatial and temporal resolution in direct numerical simulations of incompressible isotropic turbulence, Physical Review Fluids, 3(6) (2018), 064603.

[33] P. K. Yeung and S. B. Pope, An algorithm for tracking fluid particles in numerical simulations of homogeneous turbulence, Journal of Computational Physics, 79(2) (1988), 373–416.

[34] F. Fossella, L. Biferale, A. Carrassi, M. Cencini, and V. Gupta, Multiscale data assimilation in turbulent models, Physical Review E, 113(2) (2026), 024208.

[35] D. B. Quinn, Y. van Halder, and D. Lentink, Adaptive control of turbulence intensity is accelerated by frugal flow sampling, Journal of The Royal Society Interface, 14(136) (2017), 20170621.

[36] Y. Liu, J. N. Kutz, and S. L. Brunton, Hierarchical deep learning of multiscale differential equation time-steppers, Philosophical Transactions of the Royal Society A, 380(2229) (2022), 20210200.

[37] A. J. Linot, J. Burby, Q. Tang, P. Balaprakash, M. D. Graham, and R. Maulik, Stabilized neural ordinary differential equations for long-time forecasting of dynamical systems, Journal of Computational Physics, 474 (2023), 111838.

[38] C. Chen and J.-L. Wu, Neural dynamical operator: Continuous spatial-temporal model with gradient-based and derivative-free optimization methods, Journal of Computational Physics, 520 (2025), 113480.

[39] D. W. Abueidda, M. Nonna, P. Pantidis, and M. E. Mobasher, Time resolution independent operator learning, Computer Methods in Applied Mechanics and Engineering, 450 (2026), 118586.

[40] P. Lippe, B. S. Veeling, P. Perdikaris, R. E. Turner, and J. Brandstetter, PDE-Refiner: Achieving Accurate Long Rollouts with Neural PDE Solvers, Advances in Neural Information Processing Systems, 36 (2023), 67398–67433.

[41] X. Huang and P. Perdikaris, PhysicsCorrect: A Training-Free Approach for Stable Neural PDE Simulations, arXiv preprint, arXiv:2507.02227, doi:10.48550/arXiv.2507.02227 (2025).

[42] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, Adaptive mixtures of local experts, Neural Computation, 3(1) (1991), 79–87.

[43] N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. V. Le, G. E. Hinton, and J. Dean, Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, International Conference on Learning Representations (ICLR) (2017).

[44] D. Dai, C. Deng, C. Zhao, R. X. Xu, H. Gao, D. Chen, J. Li, W. Zeng, X. Yu, Y. Wu, Z. Xie, Y. K. Li, P. Huang, F. Luo, C. Ruan, Z. Sui, and W. Liang, DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ..., doi:10.18653/v1/2024.acl-long.70 (2024).

[45] H. Wang, H. Xin, J. Wang, X. Yang, F. Zha, H. Dong, and Y. Jiang, Mixture-of-Experts Operator Transformer for Large-Scale PDE Pre-Training, arXiv preprint, arXiv:2510.25803, doi:10.48550/arXiv.2510.25803 (2025).

[46] D. Sun, X. Zhou, X. Wang, H. Si, W. Lyu, J. Tang, and B. Luo, NESTOR: A Nested MOE-based Neural Operator for Large-Scale PDE Pre-Training, arXiv preprint, arXiv:2602.22059, doi:10.48550/arXiv.2602.22059 (2026).

[47] X. Han, L. Wei, Z. Dou, Y. Sun, Z. Han, and Q. Tian, ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts, IEEE Transactions on Image Processing, 34 (2025), 7209–7221.

[48] J. Smagorinsky, General circulation experiments with the primitive equations. I. The basic experiment, Monthly Weather Review, 91(3) (1963), 99–164.

[49] A. Holtzman, J. Buys, L. Du, M. Forbes, and Y. Choi, The Curious Case of Neural Text Degeneration, International Conference on Learning Representations (ICLR) (2020).

[50] P. Moin, K. Squires, W. Cabot, and S. Lee, A dynamic subgrid-scale model for compressible turbulence and scalar transport, Physics of Fluids A: Fluid Dynamics, 3(11) (1991), 2746–2757.

[51] F. Nicoud and F. Ducros, Subgrid-scale stress modelling based on the square of the velocity gradient tensor, Flow, Turbulence and Combustion, 62(3) (1999), 183–200.

[52] I. Loshchilov and F. Hutter, Decoupled Weight Decay Regularization, International Conference on Learning Representations (ICLR) (2019).

[53] A. Sanchez-Gonzalez, J. Godwin, T. Pfaff, R. Ying, J. Leskovec, and P. W. Battaglia, Learning to simulate complex physics with graph networks, Proceedings of the 37th International Conference on Machine Learning, PMLR, 119 (2020), 8459–8468.

[54] K. Stachenfeld, D. B. Fielding, D. Kochkov, M. Cranmer, T. Pfaff, J. Godwin, C. Cui, S. Ho, P. W. Battaglia, and A. Sanchez-Gonzalez, Learned coarse models for efficient turbulence simulation, International Conference on Learning Representations (ICLR) (2022).

[55] A. Tran, A. Mathews, L. Xie, and C. S. Ong, Factorized Fourier neural operators, International Conference on Learning Representations (ICLR) (2023).