Breakeven complexity: A new perspective on neural partial differential equation solvers

Mikhail Khodak; Nicholas Roberts; Tanya Marwah; Yijing Zhang

arxiv: 2605.15399 · v1 · pith:WVPUXISAnew · submitted 2026-05-14 · 💻 cs.LG · cs.AI· cs.NA· math.NA· physics.comp-ph

Breakeven complexity: A new perspective on neural partial differential equation solvers

Yijing Zhang , Nicholas Roberts , Tanya Marwah , Mikhail Khodak This is my paper

Pith reviewed 2026-05-19 16:18 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.NAmath.NAphysics.comp-ph

keywords neural PDE solversbreakeven complexityscaling lawspartial differential equationssurrogate modelingcomputational costerror matchingfluid dynamics

0 comments

The pith

Neural PDE solvers reach cost parity with traditional methods after fewer solves as problems grow harder, higher-dimensional, or higher-Reynolds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines breakeven complexity as the number of forward solves required before a trained neural solver becomes cheaper than a traditional solver that matches its error. It shows how scaling laws can guide the split of training budget between data generation and model fitting, and how to match errors smoothly across regimes. Experiments on periodic 2D domains and flows past multiple obstacles indicate that this breakeven threshold drops as problem cost, dimension, rollout length, or Reynolds number rises. The framework therefore reframes evaluation around total end-to-end cost rather than accuracy alone.

Core claim

Breakeven complexity counts the forward solves needed for a neural PDE solver to become cheaper than an error-equivalent traditional solver once data-generation, training, and inference costs are included. Scaling laws determine how to allocate limited training budget between data and model updates, while error-matching procedures keep the comparison fair across different physics regimes. On three APEBench PDEs and a new multi-obstacle flow benchmark generated with PyFR, the measured breakeven point decreases as problems become more demanding in cost, dimension, rollout, or Reynolds number.

What carries the argument

Breakeven complexity, the count of forward solves at which the learned solver's cumulative cost falls below that of an error-matched classical solver.

If this is right

Neural solvers become relatively cheaper once problems exceed moderate cost or dimension thresholds.
Higher Reynolds numbers and longer rollouts widen the regime where learned solvers win on total cost.
Scaling-law allocation of data versus training compute yields usable error-matched comparisons.
Evaluation must include the full pipeline of data generation and training, not accuracy in isolation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Practitioners could use breakeven estimates to decide in advance whether to invest in a neural surrogate for a given PDE class.
The same cost lens might be applied to other surrogate modeling tasks such as reduced-order models or graph-based simulators.
Extending the measure to three-dimensional domains or time-varying boundaries would test whether the hardness advantage persists.

Load-bearing premise

Scaling laws remain reliable for splitting training budget between data generation and model training while error curves can be matched smoothly between neural and traditional solvers across many PDE regimes.

What would settle it

A controlled experiment in which breakeven complexity rises, rather than falls, when Reynolds number, spatial dimension, or rollout length is increased would refute the central claim.

Figures

Figures reproduced from arXiv: 2605.15399 by Mikhail Khodak, Nicholas Roberts, Tanya Marwah, Yijing Zhang.

**Figure 1.** Figure 1: Illustration of the breakeven complexity definition, using FFNO and Poseidon-T on Navier-Stokes as examples. For a fixed training budget B = 1000, a learned solver’s total costs for simulating N trajectories is Btotal = B + CinfN, while the cheapest classical configuration whose error matches the learned accuracy εB has cost CδB N. The intersection of these two lines, if ever, will occur at N⋆ (B) = B C… view at source ↗

**Figure 3.** Figure 3: Plots of the test nRMSE (top) and breakeven complexity (bottom), both as functions of the [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 5.** Figure 5: Comparison of how decreasing resolution increases simulation speed across our four PDE settings; the time saved is relative to the high-fidelity solver, with the sequence of low-fidelity solvers determined as in Section 4.3. Unlike the APEBench simulations, BreakFlow is extremely sensitive to changes in resolution, with error increasing rapidly after a slight decrease in solve cost. (i) FFNO [31], an effi… view at source ↗

**Figure 6.** Figure 6: Comparison of the error distributions of FFNO and Poseidon-T (training budget 8K) along [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗

**Figure 7.** Figure 7: Plot showing the inverse relationship between test error and breakeven complexity of [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗

**Figure 8.** Figure 8: Vorticity snapshots at t = 0, 200, 400 of BreakFlow simulations of different Re bins (Re ∈ (10, 40],(40, 90] and (90, 160]) generated via PyFR. There are inlets boundaries at all sides apart from an outlet on the right. Black boxes represent obstacles. rectangle’s dimensions are determined by randomly sampling an area a between 4 and 30 (or 4 and 60 if y = 0), then randomly sampling a height h ∈ Z and widt… view at source ↗

read the original abstract

Neural surrogate solvers of partial differential equations (PDEs) promise dramatic speedups over numerical methods, especially in scenarios requiring many solves. However, current accuracy-based evaluations do not fully consider two central issues: (1) neural solvers incur substantial up-front costs for data generation, training, and tuning; and (2) classical solvers can also generate low-fidelity solutions at a sufficiently low simulation cost. To explicitly account for these realities and fully incorporate end-to-end costs, we propose an evaluation framework centered on breakeven complexity, a metric that counts the forward solves before a learned solver is cost-effective relative to an error-equivalent traditional solver. To evaluate this measure, we apply scaling laws to determine how much training budget to allocate to data generation and discuss how to achieve smooth error-matching in diverse settings. We evaluate the breakeven complexity of multiple neural PDE solvers on three PDEs on 2D periodic domains from APEBench and a novel benchmark of flows past multiple obstacles generated by the GPU-native PyFR code. Among other findings, our results suggest that neural PDE solvers become more effective as problems get harder in terms of cost, dimension, rollout, physics regime (e.g. higher Reynolds number), etc.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Breakeven complexity is a reasonable new framing for neural PDE solver costs, but the headline claim about better performance on harder problems rests on unverified scaling-law assumptions.

read the letter

The main takeaway is that this paper pushes evaluation of neural PDE solvers beyond accuracy or post-training speed by defining breakeven complexity as the number of forward solves needed for the learned model to beat an error-matched classical solver once training and data costs are included. That framing is new and directly addresses the practical reality that classical methods can run cheap low-fidelity versions while neural ones carry heavy upfront overhead. They apply scaling laws to split the training budget between data generation and model fitting, then test the resulting breakeven numbers on several 2D periodic PDEs plus a new multi-obstacle flow case generated with PyFR. The suggestion that neural solvers pull ahead as problems grow harder in Reynolds number, rollout length, or dimension comes out of those runs and is worth noting for people who actually deploy these things. The paper is clear about the metric and shows results across a few benchmarks, which is better than the usual accuracy-only tables. The soft spot is exactly the one the stress-test flags: the breakeven numbers are produced only after the authors choose data-versus-training splits via scaling laws and enforce error parity. If those scaling relations shift outside the training distribution, especially at higher Reynolds numbers or longer rollouts, the reported advantage on harder problems becomes an artifact of the allocation rule rather than a property of the solvers themselves. The abstract gives no concrete breakeven values, error bars, or sensitivity checks, so the full paper needs to demonstrate that the scaling holds or at least quantify how much it matters. This is useful reading for anyone in scientific machine learning who has to decide when a surrogate is actually worth the training investment. It is not a foundational result but a practical evaluation tool that deserves referee time to tighten the scaling assumptions and reproducibility details. I would send it to peer review with a request that reviewers check whether the breakeven calculations can be reproduced from the reported data and scaling exponents.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces breakeven complexity as a metric counting the number of forward solves required before a neural PDE solver becomes cost-effective relative to an error-equivalent traditional solver, explicitly incorporating data-generation, training, and tuning costs. Scaling laws are applied to allocate training budget between data generation and model fitting to achieve smooth error parity; the metric is then evaluated on 2D periodic PDEs from APEBench plus a new multi-obstacle flow benchmark generated with GPU-native PyFR, with the central finding that breakeven complexity decreases (i.e., neural solvers become relatively more attractive) as problem difficulty increases along axes of cost, dimension, rollout length, and Reynolds number.

Significance. If the scaling-law assumptions and error-matching procedure prove robust, the breakeven framework supplies a more realistic, end-to-end cost perspective that could usefully complement accuracy-only benchmarks in scientific machine learning. The multi-benchmark evaluation and introduction of a new obstacle-flow dataset add concrete empirical content to discussions of when neural surrogates are practically preferable.

major comments (2)

[Abstract] Abstract: the headline claim that breakeven complexity decreases with problem hardness (higher Reynolds number, longer rollout, etc.) is obtained only after invoking scaling laws from prior literature to set the data-generation versus training compute split and then enforcing error parity with a classical solver. The manuscript supplies no independent verification or cross-validation that these scaling exponents and prefactors remain valid once mesh Reynolds number or rollout length moves outside the training distribution; this assumption is load-bearing for the reported trend.
[Evaluation / Results] Evaluation / Results: the abstract describes suggestive results across benchmarks yet reports no quantitative breakeven numbers, error bars, statistical significance tests, or explicit data-exclusion rules. Without these, the strength of evidence for the cross-regime claims cannot be assessed and the findings remain difficult to reproduce or compare.

minor comments (2)

[Methods] Clarify in the methods how the cost of the error-equivalent traditional solver is computed when low-fidelity options are available, and state whether the breakeven count includes or excludes the classical solver's own setup costs.
[Budget allocation] Add a short table or explicit list of the exact scaling-law exponents and prefactors adopted from the literature, together with the source references, so readers can see the precise functional forms used for budget allocation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their insightful review of our manuscript introducing breakeven complexity for neural PDE solvers. We respond to each major comment in turn and describe the changes implemented in the revised version.

read point-by-point responses

Referee: [Abstract] Abstract: the headline claim that breakeven complexity decreases with problem hardness (higher Reynolds number, longer rollout, etc.) is obtained only after invoking scaling laws from prior literature to set the data-generation versus training compute split and then enforcing error parity with a classical solver. The manuscript supplies no independent verification or cross-validation that these scaling exponents and prefactors remain valid once mesh Reynolds number or rollout length moves outside the training distribution; this assumption is load-bearing for the reported trend.

Authors: We agree that the scaling laws constitute a central modeling assumption. These exponents and prefactors are taken from the established neural scaling literature and are applied here to allocate the training budget between data generation and model fitting. In the revised manuscript we have added an explicit subsection in the discussion that (i) restates the precise scaling relations employed, (ii) cites their original sources, and (iii) enumerates the regimes in which the relations have been previously validated. We also include a short paragraph acknowledging that independent cross-validation for the higher-Reynolds-number and longer-rollout regimes examined in this work is not provided and would constitute valuable future work. The reported trends are therefore conditional on the validity of these scaling assumptions, which we now state more clearly. revision: partial
Referee: [Evaluation / Results] Evaluation / Results: the abstract describes suggestive results across benchmarks yet reports no quantitative breakeven numbers, error bars, statistical significance tests, or explicit data-exclusion rules. Without these, the strength of evidence for the cross-regime claims cannot be assessed and the findings remain difficult to reproduce or compare.

Authors: We thank the referee for highlighting this gap in presentation. The revised manuscript now contains a dedicated results table that reports the numerical breakeven-complexity values (with standard deviations obtained from five independent training runs using different random seeds) for every PDE and parameter regime examined. We have added two-sided t-tests with p-values comparing breakeven points across regimes and have inserted an explicit paragraph in the evaluation protocol that lists the data-exclusion criteria (cases in which error parity could not be reached within the allocated compute budget are flagged and omitted from the aggregate statistics). These additions make the quantitative claims directly reproducible and allow readers to assess the strength of the cross-regime trends. revision: yes

Circularity Check

0 steps flagged

Breakeven metric grounded in external cost accounting; scaling laws drawn from prior literature without redefinition

full rationale

The paper defines breakeven complexity via explicit end-to-end cost comparison between neural and classical solvers, allocating training budget via scaling laws invoked from external literature rather than refitting them to the present runs. No load-bearing step reduces a reported prediction or uniqueness claim to a self-citation chain or to parameters fitted from the same data used for the headline result. Evaluations on APEBench and PyFR benchmarks remain independent of the breakeven formula itself, yielding a self-contained derivation against external cost benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The framework rests on the applicability of scaling laws to neural PDE training costs and the feasibility of smooth error equivalence; no new physical entities are postulated. Because only the abstract is available, the ledger is necessarily incomplete and may miss additional fitted constants present in the full text.

free parameters (1)

scaling-law exponents and prefactors
Used to decide allocation of training budget to data generation; these are typically fitted quantities even when drawn from prior literature.

axioms (2)

domain assumption Scaling laws observed in other neural training regimes extend to neural PDE solvers
Invoked to determine optimal training budget without new derivation in the abstract.
domain assumption Error-equivalent low-fidelity classical solutions can be generated at predictably lower cost
Central to defining the breakeven point against traditional solvers.

pith-pipeline@v0.9.0 · 5763 in / 1483 out tokens · 82670 ms · 2026-05-19T16:18:16.733337+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

breakeven complexity N* := B / max{CδB − Cinf, 0} (Eq. 1); scaling laws to allocate compute between optimization and data generation
IndisputableMonolith/Foundation/DimensionForcing.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

error-matched classical solver via resolution coarsening and CFL-constrained timestepping

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · 1 internal anchor

[1]

Abnar, S., H. Shah, D. Busbridge, A. El-Nouby, J. M. Susskind, and V . Thilak (2025). Parameters vs FLOPs: Scaling laws for optimal sparsity for mixture-of-experts language models. InForty- second International Conference on Machine Learning

work page 2025
[2]

Ahrens, J. P., B. Geveci, and C. C. Law (2005). ParaView: An end-user tool for large-data visualization. InThe Visualization Handbook

work page 2005
[3]

Brandstetter, and S

Ashton, N., J. Brandstetter, and S. Mishra (2025). Fluid intelligence: A forward look on ai foundation models in computational fluid dynamics.arXiv preprint arXiv:2511.20455

work page arXiv 2025
[4]

Bochev, P. B. and D. Ridzal (2009). An optimization-based approach for the design of pde solution algorithms.SIAM journal on numerical analysis 47(5), 3938–3955

work page 2009
[5]

Singh, A

Choudhary, N., V . Singh, A. Talwalkar, N. M. Boffi, M. Khodak, and T. Marwah (2025). Pre-generating multi-difficulty pde data for few-shot neural pde solvers.arXiv preprint arXiv:2512.00564

work page arXiv 2025
[6]

De Ryck, T. and S. Mishra (2022). Generic bounds on the approximation error for physics- informed (and) operator learning.Advances in Neural Information Processing Systems 35, 10945–10958

work page 2022
[7]

Du, Y . and A. S. Krishnapriyan (2025). Eddyformer: Accelerated neural simulations of three- dimensional turbulence at scale. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems

work page 2025
[8]

Evans, L. C. (2022).Partial differential equations, V olume 19. American mathematical society

work page 2022
[9]

Shen, and J

Fu, Z., J. Shen, and J. Yang (2025). Higher-order energy-decreasing exponential time differencing runge-kutta methods for gradient flows.Science China Mathematics 68(7), 1727–1746

work page 2025
[10]

and J.-F

Geuzaine, C. and J.-F. Remacle (2009). Gmsh: A three-dimensional finite element mesh generator with built-in pre- and post-processing facilities.International Journal for Numerical Methods in Engineering 79, 1309–1331

work page 2009
[11]

Grossmann, T. G., U. J. Komorowska, J. Latz, and C.-B. Schönlieb (2024, 01). Can physics- informed neural networks beat the finite element method?IMA Journal of Applied Mathemat- ics 89(1), 143–174

work page 2024
[12]

Gupta, J. K. and J. Brandstetter (2023). Towards multi-spatiotemporal-scale generalized PDE modeling.Transactions on Machine Learning Research

work page 2023
[13]

Hao, Z., C. Su, S. Liu, J. Berner, C. Ying, H. Su, A. Anandkumar, J. Song, and J. Zhu (2024). Dpot: auto-regressive denoising operator transformer for large-scale pde pre-training. In Proceedings of the 41st International Conference on Machine Learning, ICML’24. JMLR.org

work page 2024
[14]

Raonic, T

Herde, M., B. Raonic, T. Rohner, R. Käppeli, R. Molinaro, E. de Bezenac, and S. Mishra (2024). Poseidon: Efficient foundation models for PDEs. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems

work page 2024
[15]

Borgeaud, A

Hoffmann, J., S. Borgeaud, A. Mensch, E. Buchatskaya, T. Cai, E. Rutherford, D. de Las Casas, L. A. Hendricks, J. Welbl, A. Clark, T. Hennigan, E. Noland, K. Millican, G. van den Driessche, B. Damoc, A. Guy, S. Osindero, K. Simonyan, E. Elsen, O. Vinyals, J. W. Rae, and L. Sifre (2022). Training compute-optimal large language models. InProceedings of the ...

work page 2022
[16]

Kochkov, D., J. A. Smith, A. Alieva, Q. Wang, M. P. Brenner, and S. Hoyer (2021). Machine learning–accelerated computational fluid dynamics.Proceedings of the National Academy of Sciences 118(21)

work page 2021
[17]

Niedermayr, R

Koehler, F., S. Niedermayr, R. Westermann, and N. Thuerey (2024). APEBench: A benchmark for autoregressive neural emulators of PDEs.Advances in Neural Information Processing Systems (NeurIPS) 38

work page 2024
[18]

LeVeque, R. J. (2007).Finite difference methods for ordinary and partial differential equations: steady-state and time-dependent problems. SIAM

work page 2007
[19]

Li, Z., N. B. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandku- mar (2021). Fourier neural operator for parametric partial differential equations. InInternational Conference on Learning Representations

work page 2021
[20]

Lu, Y ., H. Chen, J. Lu, L. Ying, and J. Blanchet (2022). Machine learning for elliptic PDEs: Fast rate generalization bound, neural scaling law and minimax optimality. InInternational Conference on Learning Representations

work page 2022
[21]

Walrus: A cross-domain foundation model for continuum dynamics.arXiv preprint arXiv:2511.15684, 2025

McCabe, M., P. Mukhopadhyay, T. Marwah, B. R.-S. Blancard, F. Rozet, C. Diaconu, L. Meyer, K. W. Wong, H. Sotoudeh, A. Bietti, et al. (2025). Walrus: A cross-domain foundation model for continuum dynamics.arXiv preprint arXiv:2511.15684

work page arXiv 2025
[22]

McGreivy, N. and A. Hakim (2024, September). Weak baselines and reporting biases lead to overoptimism in machine learning for fluid-related partial differential equations.Nature Machine Intelligence 6(10), 1256–1269

work page 2024
[23]

Han, and E

Morel, R., J. Han, and E. Oyallon (2025). DISCO: learning to DISCover an evolution operator for multi-physics-agnostic prediction. InForty-second International Conference on Machine Learning

work page 2025
[24]

McCabe, L

Ohana, R., M. McCabe, L. Meyer, R. Morel, F. Agocs, M. Beneitez, M. Berger, B. Burkhart, S. Dalziel, D. Fielding, et al. (2024). The well: a large-scale collection of diverse physics simulations for machine learning.Advances in Neural Information Processing Systems 37, 44989– 45037

work page 2024
[25]

Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations

Raissi, M., P. Perdikaris, and G. E. Karniadakis (2017). Physics informed deep learning (part i): Data-driven solutions of nonlinear partial differential equations.arXiv preprint arXiv:1711.10561

work page internal anchor Pith review Pith/arXiv arXiv 2017
[26]

Roshko, A. (1954). On the development of turbulent wakes from vortex streets. Technical report

work page 1954
[27]

Bachmann, K

Sacchetti, A., B. Bachmann, K. Löffel, U.-M. Künzi, and B. Paoli (2022). Neural networks to solve partial differential equations: a comparison with finite elements.IEEE Access 10, 32271–32279

work page 2022
[28]

Praditia, R

Takamoto, M., T. Praditia, R. Leiteritz, D. MacKinlay, F. Alesiani, D. Pflüger, and M. Niepert (2022). Pdebench: An extensive benchmark for scientific machine learning.Advances in Neural Information Processing Systems 35, 1596–1611

work page 2022
[29]

Rabeh, C.-H

Tali, R., A. Rabeh, C.-H. Yang, M. Shadkhah, S. Karki, A. Upadhyaya, S. Dhakshinamoorthy, M. Saadati, S. Sarkar, A. Krishnamurthy, C. Hegde, A. Balu, and B. Ganapathysubramanian (2025). Flowbench: A large scale benchmark for flow simulation over complex geometries.Journal of Data-centric Machine Learning Research

work page 2025
[30]

Ranade, M

Tangsali, K., R. Ranade, M. A. Nabian, A. Kamenev, P. Sharpe, N. Ashton, R. Cherukuri, and S. Choudhry (2025). A benchmarking framework for ai models in automotive aerodynamics. arXiv preprint arXiv:2507.10747

work page arXiv 2025
[31]

Mathews, L

Tran, A., A. Mathews, L. Xie, and C. S. Ong (2023). Factorized fourier neural operators. In The Eleventh International Conference on Learning Representations

work page 2023
[32]

Williamson, C. H. (1989). Oblique and parallel modes of vortex shedding in the wake of a circular cylinder at low reynolds numbers.Journal of Fluid Mechanics 206, 579–627. 11

work page 1989
[33]

Farrington, and P

Witherden, F., A. Farrington, and P. Vincent (2014, November). Pyfr: An open source frame- work for solving advection–diffusion type problems on streaming architectures using the flux reconstruction approach.Computer Physics Communications 185(11), 3028–3040

work page 2014
[34]

Sun, and Y .-T

Xu, Z., Z. Sun, and Y .-T. Zhang (2025). Stability and time-step constraints of exponential time differencing runge–kutta discontinuous galerkin methods for advection-diffusion equations. Journal of Scientific Computing 105(2), 1–31. 12 A Dataset specifications All data generation is conducted on NVIDIA L40S GPUs with no jobs running in parallel. 2D incom...

work page 2025
[35]

We generate rectangular obstacles by looping in a nested fashion over the integers x={11,

Domain generation: we simulate over the square domain [0,50]×[−25,25] with an outlet boundary on the right and inlet boundaries on left, top, and bottom. We generate rectangular obstacles by looping in a nested fashion over the integers x={11, . . . ,25} and y∈ {0, . . . ,15}; at each (x, y) coordinate we place a rectangle with probability 0.5. The 13 (10...

work page
[36]

Mesh generation: we generate a first-order triangular mesh using the Gmsh utility [ 10]. The mesh is graded to have edge-length 1 3 near the obstacles, 2 3 at the edge of a refinement zone [5,45]×[−20,20] , and 1 at the edge of the domain, with edge-sizes varying smoothly between them. To decrease the simulation resolution, we simultaneously increase thes...

work page
[37]

To decrease the simulation resolution, we increase the timestepping parameters dt and pseudo-dt

Simulation: we run PyFR using the settings of the 2D incompressible cylinder flow example, but extending the simulation time to 400. To decrease the simulation resolution, we increase the timestepping parameters dt and pseudo-dt. The average per-trajectory simulation time on NVIDIA L40S GPUs is 136.9045 seconds

work page
[38]

optimal frontier

Lastly, we convert the data from an unstructured mesh to a regular64×64 grid using the ParaView utility [2]. Both these and the original mesh files will be released. The dataset spans a Reynolds number range of Re∈(10,160] . Data generation is organized into three discrete bins, with Re sampled uniformly at random within each respective range: (1) Bin 1: ...

work page 2000

[1] [1]

Abnar, S., H. Shah, D. Busbridge, A. El-Nouby, J. M. Susskind, and V . Thilak (2025). Parameters vs FLOPs: Scaling laws for optimal sparsity for mixture-of-experts language models. InForty- second International Conference on Machine Learning

work page 2025

[2] [2]

Ahrens, J. P., B. Geveci, and C. C. Law (2005). ParaView: An end-user tool for large-data visualization. InThe Visualization Handbook

work page 2005

[3] [3]

Brandstetter, and S

Ashton, N., J. Brandstetter, and S. Mishra (2025). Fluid intelligence: A forward look on ai foundation models in computational fluid dynamics.arXiv preprint arXiv:2511.20455

work page arXiv 2025

[4] [4]

Bochev, P. B. and D. Ridzal (2009). An optimization-based approach for the design of pde solution algorithms.SIAM journal on numerical analysis 47(5), 3938–3955

work page 2009

[5] [5]

Singh, A

Choudhary, N., V . Singh, A. Talwalkar, N. M. Boffi, M. Khodak, and T. Marwah (2025). Pre-generating multi-difficulty pde data for few-shot neural pde solvers.arXiv preprint arXiv:2512.00564

work page arXiv 2025

[6] [6]

De Ryck, T. and S. Mishra (2022). Generic bounds on the approximation error for physics- informed (and) operator learning.Advances in Neural Information Processing Systems 35, 10945–10958

work page 2022

[7] [7]

Du, Y . and A. S. Krishnapriyan (2025). Eddyformer: Accelerated neural simulations of three- dimensional turbulence at scale. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems

work page 2025

[8] [8]

Evans, L. C. (2022).Partial differential equations, V olume 19. American mathematical society

work page 2022

[9] [9]

Shen, and J

Fu, Z., J. Shen, and J. Yang (2025). Higher-order energy-decreasing exponential time differencing runge-kutta methods for gradient flows.Science China Mathematics 68(7), 1727–1746

work page 2025

[10] [10]

and J.-F

Geuzaine, C. and J.-F. Remacle (2009). Gmsh: A three-dimensional finite element mesh generator with built-in pre- and post-processing facilities.International Journal for Numerical Methods in Engineering 79, 1309–1331

work page 2009

[11] [11]

Grossmann, T. G., U. J. Komorowska, J. Latz, and C.-B. Schönlieb (2024, 01). Can physics- informed neural networks beat the finite element method?IMA Journal of Applied Mathemat- ics 89(1), 143–174

work page 2024

[12] [12]

Gupta, J. K. and J. Brandstetter (2023). Towards multi-spatiotemporal-scale generalized PDE modeling.Transactions on Machine Learning Research

work page 2023

[13] [13]

Hao, Z., C. Su, S. Liu, J. Berner, C. Ying, H. Su, A. Anandkumar, J. Song, and J. Zhu (2024). Dpot: auto-regressive denoising operator transformer for large-scale pde pre-training. In Proceedings of the 41st International Conference on Machine Learning, ICML’24. JMLR.org

work page 2024

[14] [14]

Raonic, T

Herde, M., B. Raonic, T. Rohner, R. Käppeli, R. Molinaro, E. de Bezenac, and S. Mishra (2024). Poseidon: Efficient foundation models for PDEs. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems

work page 2024

[15] [15]

Borgeaud, A

Hoffmann, J., S. Borgeaud, A. Mensch, E. Buchatskaya, T. Cai, E. Rutherford, D. de Las Casas, L. A. Hendricks, J. Welbl, A. Clark, T. Hennigan, E. Noland, K. Millican, G. van den Driessche, B. Damoc, A. Guy, S. Osindero, K. Simonyan, E. Elsen, O. Vinyals, J. W. Rae, and L. Sifre (2022). Training compute-optimal large language models. InProceedings of the ...

work page 2022

[16] [16]

Kochkov, D., J. A. Smith, A. Alieva, Q. Wang, M. P. Brenner, and S. Hoyer (2021). Machine learning–accelerated computational fluid dynamics.Proceedings of the National Academy of Sciences 118(21)

work page 2021

[17] [17]

Niedermayr, R

Koehler, F., S. Niedermayr, R. Westermann, and N. Thuerey (2024). APEBench: A benchmark for autoregressive neural emulators of PDEs.Advances in Neural Information Processing Systems (NeurIPS) 38

work page 2024

[18] [18]

LeVeque, R. J. (2007).Finite difference methods for ordinary and partial differential equations: steady-state and time-dependent problems. SIAM

work page 2007

[19] [19]

Li, Z., N. B. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandku- mar (2021). Fourier neural operator for parametric partial differential equations. InInternational Conference on Learning Representations

work page 2021

[20] [20]

Lu, Y ., H. Chen, J. Lu, L. Ying, and J. Blanchet (2022). Machine learning for elliptic PDEs: Fast rate generalization bound, neural scaling law and minimax optimality. InInternational Conference on Learning Representations

work page 2022

[21] [21]

Walrus: A cross-domain foundation model for continuum dynamics.arXiv preprint arXiv:2511.15684, 2025

McCabe, M., P. Mukhopadhyay, T. Marwah, B. R.-S. Blancard, F. Rozet, C. Diaconu, L. Meyer, K. W. Wong, H. Sotoudeh, A. Bietti, et al. (2025). Walrus: A cross-domain foundation model for continuum dynamics.arXiv preprint arXiv:2511.15684

work page arXiv 2025

[22] [22]

McGreivy, N. and A. Hakim (2024, September). Weak baselines and reporting biases lead to overoptimism in machine learning for fluid-related partial differential equations.Nature Machine Intelligence 6(10), 1256–1269

work page 2024

[23] [23]

Han, and E

Morel, R., J. Han, and E. Oyallon (2025). DISCO: learning to DISCover an evolution operator for multi-physics-agnostic prediction. InForty-second International Conference on Machine Learning

work page 2025

[24] [24]

McCabe, L

Ohana, R., M. McCabe, L. Meyer, R. Morel, F. Agocs, M. Beneitez, M. Berger, B. Burkhart, S. Dalziel, D. Fielding, et al. (2024). The well: a large-scale collection of diverse physics simulations for machine learning.Advances in Neural Information Processing Systems 37, 44989– 45037

work page 2024

[25] [25]

Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations

Raissi, M., P. Perdikaris, and G. E. Karniadakis (2017). Physics informed deep learning (part i): Data-driven solutions of nonlinear partial differential equations.arXiv preprint arXiv:1711.10561

work page internal anchor Pith review Pith/arXiv arXiv 2017

[26] [26]

Roshko, A. (1954). On the development of turbulent wakes from vortex streets. Technical report

work page 1954

[27] [27]

Bachmann, K

Sacchetti, A., B. Bachmann, K. Löffel, U.-M. Künzi, and B. Paoli (2022). Neural networks to solve partial differential equations: a comparison with finite elements.IEEE Access 10, 32271–32279

work page 2022

[28] [28]

Praditia, R

Takamoto, M., T. Praditia, R. Leiteritz, D. MacKinlay, F. Alesiani, D. Pflüger, and M. Niepert (2022). Pdebench: An extensive benchmark for scientific machine learning.Advances in Neural Information Processing Systems 35, 1596–1611

work page 2022

[29] [29]

Rabeh, C.-H

Tali, R., A. Rabeh, C.-H. Yang, M. Shadkhah, S. Karki, A. Upadhyaya, S. Dhakshinamoorthy, M. Saadati, S. Sarkar, A. Krishnamurthy, C. Hegde, A. Balu, and B. Ganapathysubramanian (2025). Flowbench: A large scale benchmark for flow simulation over complex geometries.Journal of Data-centric Machine Learning Research

work page 2025

[30] [30]

Ranade, M

Tangsali, K., R. Ranade, M. A. Nabian, A. Kamenev, P. Sharpe, N. Ashton, R. Cherukuri, and S. Choudhry (2025). A benchmarking framework for ai models in automotive aerodynamics. arXiv preprint arXiv:2507.10747

work page arXiv 2025

[31] [31]

Mathews, L

Tran, A., A. Mathews, L. Xie, and C. S. Ong (2023). Factorized fourier neural operators. In The Eleventh International Conference on Learning Representations

work page 2023

[32] [32]

Williamson, C. H. (1989). Oblique and parallel modes of vortex shedding in the wake of a circular cylinder at low reynolds numbers.Journal of Fluid Mechanics 206, 579–627. 11

work page 1989

[33] [33]

Farrington, and P

Witherden, F., A. Farrington, and P. Vincent (2014, November). Pyfr: An open source frame- work for solving advection–diffusion type problems on streaming architectures using the flux reconstruction approach.Computer Physics Communications 185(11), 3028–3040

work page 2014

[34] [34]

Sun, and Y .-T

Xu, Z., Z. Sun, and Y .-T. Zhang (2025). Stability and time-step constraints of exponential time differencing runge–kutta discontinuous galerkin methods for advection-diffusion equations. Journal of Scientific Computing 105(2), 1–31. 12 A Dataset specifications All data generation is conducted on NVIDIA L40S GPUs with no jobs running in parallel. 2D incom...

work page 2025

[35] [35]

We generate rectangular obstacles by looping in a nested fashion over the integers x={11,

Domain generation: we simulate over the square domain [0,50]×[−25,25] with an outlet boundary on the right and inlet boundaries on left, top, and bottom. We generate rectangular obstacles by looping in a nested fashion over the integers x={11, . . . ,25} and y∈ {0, . . . ,15}; at each (x, y) coordinate we place a rectangle with probability 0.5. The 13 (10...

work page

[36] [36]

Mesh generation: we generate a first-order triangular mesh using the Gmsh utility [ 10]. The mesh is graded to have edge-length 1 3 near the obstacles, 2 3 at the edge of a refinement zone [5,45]×[−20,20] , and 1 at the edge of the domain, with edge-sizes varying smoothly between them. To decrease the simulation resolution, we simultaneously increase thes...

work page

[37] [37]

To decrease the simulation resolution, we increase the timestepping parameters dt and pseudo-dt

Simulation: we run PyFR using the settings of the 2D incompressible cylinder flow example, but extending the simulation time to 400. To decrease the simulation resolution, we increase the timestepping parameters dt and pseudo-dt. The average per-trajectory simulation time on NVIDIA L40S GPUs is 136.9045 seconds

work page

[38] [38]

optimal frontier

Lastly, we convert the data from an unstructured mesh to a regular64×64 grid using the ParaView utility [2]. Both these and the original mesh files will be released. The dataset spans a Reynolds number range of Re∈(10,160] . Data generation is organized into three discrete bins, with Re sampled uniformly at random within each respective range: (1) Bin 1: ...

work page 2000