Deep Wave Network for Modeling Multi-Scale Physical Dynamics
Pith reviewed 2026-05-08 17:29 UTC · model grok-4.3
The pith
Stacking multiple U-Net waves improves accuracy-cost trade-offs for modeling multi-scale physical flows
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Replacing the single encoder-decoder wave of a U-Net with a sequence of such waves linked by additional skip connections yields Deep Wave Networks whose accuracy-cost scaling dominates that of conventional single-wave U-Nets on physical flow prediction tasks.
What carries the argument
The stacked-wave architecture with intra-wave and inter-wave skip connections, which increases effective depth while preserving multi-resolution feature flow.
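To make the stacked-wave idea concrete, below is a minimal PyTorch sketch under stated assumptions: each wave is a one-stage encoder-decoder with an intra-wave skip, and inter-wave skips are realized as 1x1 projection convolutions fusing each wave's output with its predecessor's (the rebuttal below mentions such projections). The names `Wave` and `DWNet`, the activation, and the channel wiring are illustrative, not the paper's exact architecture.

```python
# Hypothetical sketch of a Deep Wave Network (DW-Net); details beyond the
# abstract (module names, channel counts, skip wiring) are assumptions.
import torch
import torch.nn as nn

class Wave(nn.Module):
    """One encoder-decoder 'wave': a minimal U-Net with a single
    down/up-sampling stage and an intra-wave skip connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.enc = nn.Conv2d(channels, channels, 3, padding=1)
        self.down = nn.Conv2d(channels, 2 * channels, 3, stride=2, padding=1)
        self.bottleneck = nn.Conv2d(2 * channels, 2 * channels, 3, padding=1)
        self.up = nn.ConvTranspose2d(2 * channels, channels, 2, stride=2)
        # intra-wave skip: decoder sees upsampled features concatenated
        # with encoder features at the same resolution
        self.dec = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.act = nn.GELU()

    def forward(self, x):
        e = self.act(self.enc(x))                       # encoder features
        b = self.act(self.bottleneck(self.act(self.down(e))))
        u = self.act(self.up(b))                        # back to input resolution
        return self.act(self.dec(torch.cat([u, e], dim=1)))

class DWNet(nn.Module):
    """Stack of waves; each wave's output is fused with the previous wave's
    output through a 1x1 projection (one reading of 'inter-wave skips')."""
    def __init__(self, in_ch=3, out_ch=3, width=32, num_waves=3):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, width, 3, padding=1)
        self.waves = nn.ModuleList(Wave(width) for _ in range(num_waves))
        self.skips = nn.ModuleList(
            nn.Conv2d(2 * width, width, 1) for _ in range(num_waves - 1))
        self.head = nn.Conv2d(width, out_ch, 3, padding=1)

    def forward(self, x):
        h = self.stem(x)
        prev = None
        for i, wave in enumerate(self.waves):
            out = wave(h)
            if prev is not None:  # inter-wave skip from the previous wave
                out = self.skips[i - 1](torch.cat([out, prev], dim=1))
            prev = h = out
        return self.head(h)
```

A call such as `DWNet(width=32, num_waves=3)(torch.randn(1, 3, 64, 64))` exercises the full stack (spatial dimensions must be even for the down/up pair to align); varying `width` and `num_waves` together is what spans the accuracy-cost space described above.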
If this is right
- DW-Net models reach low-error regimes with up to 3x less training time under identical training settings.
- For matched computational cost, DW-Nets deliver higher accuracy than single-wave U-Nets on the tested flow problems.
- For matched accuracy, DW-Nets require less GPU time than single-wave U-Nets.
- The advantage holds across several 2D and 3D benchmarks for gas, fluid, and plasma dynamics.
- Varying width together with the number of stacked waves explores the accuracy-cost space more effectively than varying width alone.
Where Pith is reading between the lines
- The stacking approach could be tested on other encoder-decoder tasks such as weather forecasting or medical image segmentation where multi-scale features matter.
- If cross-wave skips remain effective at larger depths, they might allow deeper networks without the usual optimization difficulties of very deep single-path models.
- The optimal number of waves may increase with the range of spatial scales in the target physical system, which could be checked by varying Reynolds number or domain size.
Load-bearing premise
That keeping training data, optimization, and schedules identical across architectures isolates the effect of the stacked-wave structure without hidden differences in convergence behavior or effective capacity.
What would settle it
A single-wave U-Net that, when granted its own optimized schedule or a deeper single-path configuration, matches or exceeds the DW-Net accuracy-versus-GPU-time Pareto front on the same benchmarks would falsify the reported consistent improvement.
Original abstract
Performance of deep learning models is strongly governed by architectural capacity, with width and depth as primary controls. However, in physical-science applications, models are often compared at a single fixed size or by separating accuracy and computational cost, which can be misleading since architectures exhibit different accuracy-cost scaling as width and depth vary. This issue is particularly relevant for U-Net-type encoder-decoder models, widely used for multi-scale gas, fluid, and plasma dynamics due to their ability to represent features across spatial scales. A U-Net constructs a multi-resolution representation via an encoder that progressively reduces spatial resolution, followed by a decoder that restores it for prediction. Skip connections link corresponding encoder and decoder features, preserving fine-scale information and improving optimization. In practice, U-Net width is routinely tuned, while depth is typically kept fixed (a set number of down/up-sampling stages with few convolutions per stage), limiting systematic exploration of depth for improving the accuracy-cost trade-off. We address this limitation by increasing effective depth through stacking multiple encoder-decoder "waves" in series, with skip connections both within and across waves to enable progressive cross-scale refinement. We call this architecture a Deep Wave Network (DW-Net). Training data, optimization, and schedules are kept identical across models. Instead of evaluating single configurations, we train multiple width variants of each architecture and compare accuracy vs. GPU time Pareto fronts. Across several 2D and 3D flow benchmarks, DW-Net models consistently improve the Pareto frontier over single-wave U-Nets, achieving higher accuracy at matched cost or similar accuracy at reduced cost, and reaching low-error regimes with up to 3x less training time under identical training settings.
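The Pareto-front methodology in the abstract reduces to a simple dominance rule: a trained variant is on the frontier if no other variant has both lower error and lower GPU time. A minimal sketch of that lower envelope, with illustrative numbers rather than the paper's measurements:

```python
# Extract the accuracy-vs-cost Pareto frontier from trained model variants.
# Points are (gpu_hours, error); lower is better on both axes.
def pareto_front(points):
    """Return the points not dominated by any other point, i.e. for which
    no other point has both lower-or-equal cost and lower error."""
    front, best_err = [], float("inf")
    for cost, err in sorted(points):   # sweep in order of increasing cost
        if err < best_err:             # strictly improves the best error so far
            front.append((cost, err))
            best_err = err
    return front

# Illustrative width sweeps for two architectures (not the paper's data):
unet  = [(1.0, 0.080), (2.5, 0.055), (6.0, 0.042), (14.0, 0.036)]
dwnet = [(1.2, 0.060), (3.0, 0.040), (7.0, 0.031), (16.0, 0.028)]
print(pareto_front(unet + dwnet))
```

The architecture comparison then asks whether one family's envelope sits below and to the left of the other's across the tested range.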
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Deep Wave Networks (DW-Net) as an extension of U-Net architectures for multi-scale physical dynamics modeling. By stacking multiple encoder-decoder waves in series and adding intra- and inter-wave skip connections, the design increases effective depth while preserving multi-resolution features. The central empirical claim is that, when multiple width variants of DW-Net and baseline single-wave U-Nets are trained under identical data, optimizer, and schedules, the DW-Net models improve the accuracy-versus-GPU-time Pareto frontier on 2D and 3D flow benchmarks, yielding higher accuracy at matched cost or equivalent accuracy with up to 3x less training time.
Significance. If the results hold after addressing experimental controls, the work provides a systematic method for exploring depth in U-Net-style models without relying solely on width scaling, which is relevant for efficiency in scientific machine learning tasks such as fluid and plasma simulations. The Pareto-front evaluation across width variants is a methodological strength that avoids single-point cherry-picking and directly addresses accuracy-cost trade-offs.
Major comments (3)
- [Experiments] Experiments section: The Pareto-front results rely on fixed training schedules applied identically to single-wave U-Nets and multi-wave DW-Nets. Because wave stacking increases effective depth and modifies skip-connection topology, gradient magnitudes and convergence behavior can differ; without loss curves, epoch-to-target-error metrics, or schedule-sensitivity ablations, it is impossible to separate claimed multi-scale refinement benefits from incidental optimization advantages under the shared schedule.
- [Method] Method section (DW-Net definition): The inter-wave skip connections are presented as enabling progressive cross-scale refinement, yet no explicit accounting is given for how these connections affect total parameter count or per-forward-pass FLOPs relative to a single-wave U-Net of comparable nominal width. This detail is load-bearing for interpreting the cost axis of the reported Pareto fronts.
- [Results] Results (Pareto-front figures): The manuscript should report whether width variants were selected to match parameter budgets or FLOPs across architectures, and whether error metrics include run-to-run uncertainty; post-hoc selection of the best width per architecture could inflate the apparent frontier improvement.
Minor comments (2)
- [Abstract] Abstract: The phrase 'up to 3x less training time' should specify the precise error threshold at which this factor is measured.
- Notation: Ensure consistent use of symbols for wave count, skip-connection types, and width scaling factors throughout the text and figures.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major point below and indicate where revisions will be made to improve clarity and rigor.
Point-by-point responses
Referee: [Experiments] Experiments section: The Pareto-front results rely on fixed training schedules applied identically to single-wave U-Nets and multi-wave DW-Nets. Because wave stacking increases effective depth and modifies skip-connection topology, gradient magnitudes and convergence behavior can differ; without loss curves, epoch-to-target-error metrics, or schedule-sensitivity ablations, it is impossible to separate claimed multi-scale refinement benefits from incidental optimization advantages under the shared schedule.
Authors: We agree that convergence diagnostics would strengthen the claims. The reported Pareto fronts reflect final accuracies under the fixed, identical training protocol, which matches practical use cases in scientific ML. To address the concern, we will add training loss curves for representative DW-Net and U-Net variants in the revised experiments section, along with a short discussion of observed convergence behavior. This will help demonstrate that the gains arise from the architecture rather than incidental optimization effects.
Revision: yes
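As an illustration of the time-to-target-error diagnostic discussed in this exchange, a minimal sketch assuming a logged validation curve of (elapsed GPU seconds, error) pairs; the curves and threshold below are placeholders, not measurements from the paper:

```python
# First wall-clock time at which a model's validation error reaches a target.
def time_to_target(curve, target):
    for t, err in curve:
        if err <= target:
            return t
    return None  # target never reached within the training budget

# Placeholder curves for one U-Net and one DW-Net variant:
unet_curve  = [(600, 0.090), (1800, 0.060), (5400, 0.045), (10800, 0.040)]
dwnet_curve = [(700, 0.070), (2000, 0.045), (3600, 0.038)]
target = 0.045
speedup = time_to_target(unet_curve, target) / time_to_target(dwnet_curve, target)
print(f"DW-Net reaches error {target} in a {speedup:.1f}x smaller time budget")
```

Ratios of this kind, reported at an explicit error threshold, are what would pin down the 'up to 3x' figure flagged in the minor comments.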
Referee: [Method] Method section (DW-Net definition): The inter-wave skip connections are presented as enabling progressive cross-scale refinement, yet no explicit accounting is given for how these connections affect total parameter count or per-forward-pass FLOPs relative to a single-wave U-Net of comparable nominal width. This detail is load-bearing for interpreting the cost axis of the reported Pareto fronts.
Authors: The referee correctly notes this omission. Inter-wave skip connections add projection convolutions that modestly increase parameter count beyond the extra wave computations. We will revise the method section to include explicit parameter counts, FLOP estimates, and a comparison table for DW-Net versus single-wave U-Net at matched nominal widths. Our primary cost metric remains measured GPU training time, which already incorporates these overheads for the Pareto analysis.
Revision: yes
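A sketch of the promised cost accounting, reusing the hypothetical `DWNet` from the earlier sketch: parameter counts are exact from the module tree, while the FLOP figure is a rough estimate covering only `nn.Conv2d` layers via forward hooks (transposed convolutions, activations, and pointwise ops are ignored, and one multiply-accumulate is counted as 2 FLOPs):

```python
import torch
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    # exact: total number of learnable parameters
    return sum(p.numel() for p in model.parameters())

def conv_flops(model: nn.Module, example: torch.Tensor) -> int:
    # rough: accumulate Conv2d multiply-accumulates seen in one forward pass
    total = 0
    def hook(mod, inputs, output):
        nonlocal total
        kh, kw = mod.kernel_size
        macs_per_elem = (mod.in_channels // mod.groups) * kh * kw
        total += 2 * macs_per_elem * output.numel()
    handles = [m.register_forward_hook(hook)
               for m in model.modules() if isinstance(m, nn.Conv2d)]
    with torch.no_grad():
        model(example)
    for h in handles:
        h.remove()
    return total

model = DWNet(width=32, num_waves=3)  # the hypothetical DWNet sketched earlier
x = torch.randn(1, 3, 64, 64)
print(f"params: {count_params(model):,}  approx conv FLOPs: {conv_flops(model, x):,}")
```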
Referee: [Results] Results (Pareto-front figures): The manuscript should report whether width variants were selected to match parameter budgets or FLOPs across architectures, and whether error metrics include run-to-run uncertainty; post-hoc selection of the best width per architecture could inflate the apparent frontier improvement.
Authors: Width variants were selected independently for each architecture by scaling the base channel count to span comparable capacity ranges; exact parameter or FLOP matching was not performed due to structural differences. All trained points are shown in the figures, with the frontier as the lower envelope, avoiding post-hoc selection of only the best widths. We will update the results section to state the selection process explicitly and add run-to-run uncertainty (standard deviations from repeated seeds) for representative points where such data is available.
Revision: partial
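The proposed run-to-run uncertainty is plain per-point statistics over repeated seeds; a minimal sketch with placeholder values (the keys and error numbers are hypothetical, not the paper's data):

```python
import statistics

# hypothetical: (architecture, base width) -> test errors over repeated seeds
seed_errors = {
    ("dwnet", 32): [0.0402, 0.0395, 0.0411],
    ("unet", 48):  [0.0553, 0.0561, 0.0547],
}
for key, errs in seed_errors.items():
    mu, sd = statistics.mean(errs), statistics.stdev(errs)  # sample std
    print(f"{key}: {mu:.4f} ± {sd:.4f}")
```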
Circularity Check
No circularity: empirical Pareto comparison under fixed training protocol
Full rationale
The paper's central claim is an empirical result: DW-Net (stacked-wave U-Net) improves the accuracy-vs-GPU-time Pareto frontier over single-wave U-Nets across 2D/3D flow benchmarks when width variants are trained with identical data, optimizer, and schedules. No equations, predictions, or first-principles derivations are offered; the architecture is defined explicitly by adding inter-wave skip connections, and performance is measured on external benchmarks. The comparison does not reduce any claimed improvement to a quantity defined by the architecture itself or to a self-citation chain. The skeptic's concern about schedule optimality is a question of experimental isolation, not circularity.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: identical training data, optimizer, and schedule isolate architectural effects.