pith. sign in

arxiv: 2605.15399 · v1 · pith:WVPUXISAnew · submitted 2026-05-14 · 💻 cs.LG · cs.AI· cs.NA· math.NA· physics.comp-ph

Breakeven complexity: A new perspective on neural partial differential equation solvers

Pith reviewed 2026-05-19 16:18 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.NAmath.NAphysics.comp-ph
keywords neural PDE solversbreakeven complexityscaling lawspartial differential equationssurrogate modelingcomputational costerror matchingfluid dynamics
0
0 comments X

The pith

Neural PDE solvers reach cost parity with traditional methods after fewer solves as problems grow harder, higher-dimensional, or higher-Reynolds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines breakeven complexity as the number of forward solves required before a trained neural solver becomes cheaper than a traditional solver that matches its error. It shows how scaling laws can guide the split of training budget between data generation and model fitting, and how to match errors smoothly across regimes. Experiments on periodic 2D domains and flows past multiple obstacles indicate that this breakeven threshold drops as problem cost, dimension, rollout length, or Reynolds number rises. The framework therefore reframes evaluation around total end-to-end cost rather than accuracy alone.

Core claim

Breakeven complexity counts the forward solves needed for a neural PDE solver to become cheaper than an error-equivalent traditional solver once data-generation, training, and inference costs are included. Scaling laws determine how to allocate limited training budget between data and model updates, while error-matching procedures keep the comparison fair across different physics regimes. On three APEBench PDEs and a new multi-obstacle flow benchmark generated with PyFR, the measured breakeven point decreases as problems become more demanding in cost, dimension, rollout, or Reynolds number.

What carries the argument

Breakeven complexity, the count of forward solves at which the learned solver's cumulative cost falls below that of an error-matched classical solver.

If this is right

  • Neural solvers become relatively cheaper once problems exceed moderate cost or dimension thresholds.
  • Higher Reynolds numbers and longer rollouts widen the regime where learned solvers win on total cost.
  • Scaling-law allocation of data versus training compute yields usable error-matched comparisons.
  • Evaluation must include the full pipeline of data generation and training, not accuracy in isolation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Practitioners could use breakeven estimates to decide in advance whether to invest in a neural surrogate for a given PDE class.
  • The same cost lens might be applied to other surrogate modeling tasks such as reduced-order models or graph-based simulators.
  • Extending the measure to three-dimensional domains or time-varying boundaries would test whether the hardness advantage persists.

Load-bearing premise

Scaling laws remain reliable for splitting training budget between data generation and model training while error curves can be matched smoothly between neural and traditional solvers across many PDE regimes.

What would settle it

A controlled experiment in which breakeven complexity rises, rather than falls, when Reynolds number, spatial dimension, or rollout length is increased would refute the central claim.

Figures

Figures reproduced from arXiv: 2605.15399 by Mikhail Khodak, Nicholas Roberts, Tanya Marwah, Yijing Zhang.

Figure 1
Figure 1. Figure 1: Illustration of the breakeven complex￾ity definition, using FFNO and Poseidon-T on Navier-Stokes as examples. For a fixed train￾ing budget B = 1000, a learned solver’s total costs for simulating N trajectories is Btotal = B + CinfN, while the cheapest classical configu￾ration whose error matches the learned accuracy εB has cost CδB N. The intersection of these two lines, if ever, will occur at N⋆ (B) = B C… view at source ↗
Figure 3
Figure 3. Figure 3: Plots of the test nRMSE (top) and breakeven complexity (bottom), both as functions of the [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: Comparison of how decreas￾ing resolution increases simulation speed across our four PDE settings; the time saved is relative to the high-fidelity solver, with the sequence of low-fidelity solvers determined as in Section 4.3. Unlike the APEBench simulations, BreakFlow is extremely sensitive to changes in resolution, with error increasing rapidly after a slight decrease in solve cost. (i) FFNO [31], an effi… view at source ↗
Figure 6
Figure 6. Figure 6: Comparison of the error distributions of FFNO and Poseidon-T (training budget 8K) along [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Plot showing the inverse relationship between test error and breakeven complexity of [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Vorticity snapshots at t = 0, 200, 400 of BreakFlow simulations of different Re bins (Re ∈ (10, 40],(40, 90] and (90, 160]) generated via PyFR. There are inlets boundaries at all sides apart from an outlet on the right. Black boxes represent obstacles. rectangle’s dimensions are determined by randomly sampling an area a between 4 and 30 (or 4 and 60 if y = 0), then randomly sampling a height h ∈ Z and widt… view at source ↗
read the original abstract

Neural surrogate solvers of partial differential equations (PDEs) promise dramatic speedups over numerical methods, especially in scenarios requiring many solves. However, current accuracy-based evaluations do not fully consider two central issues: (1) neural solvers incur substantial up-front costs for data generation, training, and tuning; and (2) classical solvers can also generate low-fidelity solutions at a sufficiently low simulation cost. To explicitly account for these realities and fully incorporate end-to-end costs, we propose an evaluation framework centered on breakeven complexity, a metric that counts the forward solves before a learned solver is cost-effective relative to an error-equivalent traditional solver. To evaluate this measure, we apply scaling laws to determine how much training budget to allocate to data generation and discuss how to achieve smooth error-matching in diverse settings. We evaluate the breakeven complexity of multiple neural PDE solvers on three PDEs on 2D periodic domains from APEBench and a novel benchmark of flows past multiple obstacles generated by the GPU-native PyFR code. Among other findings, our results suggest that neural PDE solvers become more effective as problems get harder in terms of cost, dimension, rollout, physics regime (e.g. higher Reynolds number), etc.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces breakeven complexity as a metric counting the number of forward solves required before a neural PDE solver becomes cost-effective relative to an error-equivalent traditional solver, explicitly incorporating data-generation, training, and tuning costs. Scaling laws are applied to allocate training budget between data generation and model fitting to achieve smooth error parity; the metric is then evaluated on 2D periodic PDEs from APEBench plus a new multi-obstacle flow benchmark generated with GPU-native PyFR, with the central finding that breakeven complexity decreases (i.e., neural solvers become relatively more attractive) as problem difficulty increases along axes of cost, dimension, rollout length, and Reynolds number.

Significance. If the scaling-law assumptions and error-matching procedure prove robust, the breakeven framework supplies a more realistic, end-to-end cost perspective that could usefully complement accuracy-only benchmarks in scientific machine learning. The multi-benchmark evaluation and introduction of a new obstacle-flow dataset add concrete empirical content to discussions of when neural surrogates are practically preferable.

major comments (2)
  1. [Abstract] Abstract: the headline claim that breakeven complexity decreases with problem hardness (higher Reynolds number, longer rollout, etc.) is obtained only after invoking scaling laws from prior literature to set the data-generation versus training compute split and then enforcing error parity with a classical solver. The manuscript supplies no independent verification or cross-validation that these scaling exponents and prefactors remain valid once mesh Reynolds number or rollout length moves outside the training distribution; this assumption is load-bearing for the reported trend.
  2. [Evaluation / Results] Evaluation / Results: the abstract describes suggestive results across benchmarks yet reports no quantitative breakeven numbers, error bars, statistical significance tests, or explicit data-exclusion rules. Without these, the strength of evidence for the cross-regime claims cannot be assessed and the findings remain difficult to reproduce or compare.
minor comments (2)
  1. [Methods] Clarify in the methods how the cost of the error-equivalent traditional solver is computed when low-fidelity options are available, and state whether the breakeven count includes or excludes the classical solver's own setup costs.
  2. [Budget allocation] Add a short table or explicit list of the exact scaling-law exponents and prefactors adopted from the literature, together with the source references, so readers can see the precise functional forms used for budget allocation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their insightful review of our manuscript introducing breakeven complexity for neural PDE solvers. We respond to each major comment in turn and describe the changes implemented in the revised version.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the headline claim that breakeven complexity decreases with problem hardness (higher Reynolds number, longer rollout, etc.) is obtained only after invoking scaling laws from prior literature to set the data-generation versus training compute split and then enforcing error parity with a classical solver. The manuscript supplies no independent verification or cross-validation that these scaling exponents and prefactors remain valid once mesh Reynolds number or rollout length moves outside the training distribution; this assumption is load-bearing for the reported trend.

    Authors: We agree that the scaling laws constitute a central modeling assumption. These exponents and prefactors are taken from the established neural scaling literature and are applied here to allocate the training budget between data generation and model fitting. In the revised manuscript we have added an explicit subsection in the discussion that (i) restates the precise scaling relations employed, (ii) cites their original sources, and (iii) enumerates the regimes in which the relations have been previously validated. We also include a short paragraph acknowledging that independent cross-validation for the higher-Reynolds-number and longer-rollout regimes examined in this work is not provided and would constitute valuable future work. The reported trends are therefore conditional on the validity of these scaling assumptions, which we now state more clearly. revision: partial

  2. Referee: [Evaluation / Results] Evaluation / Results: the abstract describes suggestive results across benchmarks yet reports no quantitative breakeven numbers, error bars, statistical significance tests, or explicit data-exclusion rules. Without these, the strength of evidence for the cross-regime claims cannot be assessed and the findings remain difficult to reproduce or compare.

    Authors: We thank the referee for highlighting this gap in presentation. The revised manuscript now contains a dedicated results table that reports the numerical breakeven-complexity values (with standard deviations obtained from five independent training runs using different random seeds) for every PDE and parameter regime examined. We have added two-sided t-tests with p-values comparing breakeven points across regimes and have inserted an explicit paragraph in the evaluation protocol that lists the data-exclusion criteria (cases in which error parity could not be reached within the allocated compute budget are flagged and omitted from the aggregate statistics). These additions make the quantitative claims directly reproducible and allow readers to assess the strength of the cross-regime trends. revision: yes

Circularity Check

0 steps flagged

Breakeven metric grounded in external cost accounting; scaling laws drawn from prior literature without redefinition

full rationale

The paper defines breakeven complexity via explicit end-to-end cost comparison between neural and classical solvers, allocating training budget via scaling laws invoked from external literature rather than refitting them to the present runs. No load-bearing step reduces a reported prediction or uniqueness claim to a self-citation chain or to parameters fitted from the same data used for the headline result. Evaluations on APEBench and PyFR benchmarks remain independent of the breakeven formula itself, yielding a self-contained derivation against external cost benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The framework rests on the applicability of scaling laws to neural PDE training costs and the feasibility of smooth error equivalence; no new physical entities are postulated. Because only the abstract is available, the ledger is necessarily incomplete and may miss additional fitted constants present in the full text.

free parameters (1)
  • scaling-law exponents and prefactors
    Used to decide allocation of training budget to data generation; these are typically fitted quantities even when drawn from prior literature.
axioms (2)
  • domain assumption Scaling laws observed in other neural training regimes extend to neural PDE solvers
    Invoked to determine optimal training budget without new derivation in the abstract.
  • domain assumption Error-equivalent low-fidelity classical solutions can be generated at predictably lower cost
    Central to defining the breakeven point against traditional solvers.

pith-pipeline@v0.9.0 · 5763 in / 1483 out tokens · 82670 ms · 2026-05-19T16:18:16.733337+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · 1 internal anchor

  1. [1]

    Abnar, S., H. Shah, D. Busbridge, A. El-Nouby, J. M. Susskind, and V . Thilak (2025). Parameters vs FLOPs: Scaling laws for optimal sparsity for mixture-of-experts language models. InForty- second International Conference on Machine Learning

  2. [2]

    Ahrens, J. P., B. Geveci, and C. C. Law (2005). ParaView: An end-user tool for large-data visualization. InThe Visualization Handbook

  3. [3]

    Brandstetter, and S

    Ashton, N., J. Brandstetter, and S. Mishra (2025). Fluid intelligence: A forward look on ai foundation models in computational fluid dynamics.arXiv preprint arXiv:2511.20455

  4. [4]

    Bochev, P. B. and D. Ridzal (2009). An optimization-based approach for the design of pde solution algorithms.SIAM journal on numerical analysis 47(5), 3938–3955

  5. [5]

    Singh, A

    Choudhary, N., V . Singh, A. Talwalkar, N. M. Boffi, M. Khodak, and T. Marwah (2025). Pre-generating multi-difficulty pde data for few-shot neural pde solvers.arXiv preprint arXiv:2512.00564

  6. [6]

    De Ryck, T. and S. Mishra (2022). Generic bounds on the approximation error for physics- informed (and) operator learning.Advances in Neural Information Processing Systems 35, 10945–10958

  7. [7]

    Du, Y . and A. S. Krishnapriyan (2025). Eddyformer: Accelerated neural simulations of three- dimensional turbulence at scale. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems

  8. [8]

    Evans, L. C. (2022).Partial differential equations, V olume 19. American mathematical society

  9. [9]

    Shen, and J

    Fu, Z., J. Shen, and J. Yang (2025). Higher-order energy-decreasing exponential time differencing runge-kutta methods for gradient flows.Science China Mathematics 68(7), 1727–1746

  10. [10]

    and J.-F

    Geuzaine, C. and J.-F. Remacle (2009). Gmsh: A three-dimensional finite element mesh generator with built-in pre- and post-processing facilities.International Journal for Numerical Methods in Engineering 79, 1309–1331

  11. [11]

    Grossmann, T. G., U. J. Komorowska, J. Latz, and C.-B. Schönlieb (2024, 01). Can physics- informed neural networks beat the finite element method?IMA Journal of Applied Mathemat- ics 89(1), 143–174

  12. [12]

    Gupta, J. K. and J. Brandstetter (2023). Towards multi-spatiotemporal-scale generalized PDE modeling.Transactions on Machine Learning Research

  13. [13]

    Hao, Z., C. Su, S. Liu, J. Berner, C. Ying, H. Su, A. Anandkumar, J. Song, and J. Zhu (2024). Dpot: auto-regressive denoising operator transformer for large-scale pde pre-training. In Proceedings of the 41st International Conference on Machine Learning, ICML’24. JMLR.org

  14. [14]

    Raonic, T

    Herde, M., B. Raonic, T. Rohner, R. Käppeli, R. Molinaro, E. de Bezenac, and S. Mishra (2024). Poseidon: Efficient foundation models for PDEs. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems

  15. [15]

    Borgeaud, A

    Hoffmann, J., S. Borgeaud, A. Mensch, E. Buchatskaya, T. Cai, E. Rutherford, D. de Las Casas, L. A. Hendricks, J. Welbl, A. Clark, T. Hennigan, E. Noland, K. Millican, G. van den Driessche, B. Damoc, A. Guy, S. Osindero, K. Simonyan, E. Elsen, O. Vinyals, J. W. Rae, and L. Sifre (2022). Training compute-optimal large language models. InProceedings of the ...

  16. [16]

    Kochkov, D., J. A. Smith, A. Alieva, Q. Wang, M. P. Brenner, and S. Hoyer (2021). Machine learning–accelerated computational fluid dynamics.Proceedings of the National Academy of Sciences 118(21)

  17. [17]

    Niedermayr, R

    Koehler, F., S. Niedermayr, R. Westermann, and N. Thuerey (2024). APEBench: A benchmark for autoregressive neural emulators of PDEs.Advances in Neural Information Processing Systems (NeurIPS) 38

  18. [18]

    LeVeque, R. J. (2007).Finite difference methods for ordinary and partial differential equations: steady-state and time-dependent problems. SIAM

  19. [19]

    Li, Z., N. B. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandku- mar (2021). Fourier neural operator for parametric partial differential equations. InInternational Conference on Learning Representations

  20. [20]

    Lu, Y ., H. Chen, J. Lu, L. Ying, and J. Blanchet (2022). Machine learning for elliptic PDEs: Fast rate generalization bound, neural scaling law and minimax optimality. InInternational Conference on Learning Representations

  21. [21]

    Walrus: A cross-domain foundation model for continuum dynamics.arXiv preprint arXiv:2511.15684, 2025

    McCabe, M., P. Mukhopadhyay, T. Marwah, B. R.-S. Blancard, F. Rozet, C. Diaconu, L. Meyer, K. W. Wong, H. Sotoudeh, A. Bietti, et al. (2025). Walrus: A cross-domain foundation model for continuum dynamics.arXiv preprint arXiv:2511.15684

  22. [22]

    McGreivy, N. and A. Hakim (2024, September). Weak baselines and reporting biases lead to overoptimism in machine learning for fluid-related partial differential equations.Nature Machine Intelligence 6(10), 1256–1269

  23. [23]

    Han, and E

    Morel, R., J. Han, and E. Oyallon (2025). DISCO: learning to DISCover an evolution operator for multi-physics-agnostic prediction. InForty-second International Conference on Machine Learning

  24. [24]

    McCabe, L

    Ohana, R., M. McCabe, L. Meyer, R. Morel, F. Agocs, M. Beneitez, M. Berger, B. Burkhart, S. Dalziel, D. Fielding, et al. (2024). The well: a large-scale collection of diverse physics simulations for machine learning.Advances in Neural Information Processing Systems 37, 44989– 45037

  25. [25]

    Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations

    Raissi, M., P. Perdikaris, and G. E. Karniadakis (2017). Physics informed deep learning (part i): Data-driven solutions of nonlinear partial differential equations.arXiv preprint arXiv:1711.10561

  26. [26]

    Roshko, A. (1954). On the development of turbulent wakes from vortex streets. Technical report

  27. [27]

    Bachmann, K

    Sacchetti, A., B. Bachmann, K. Löffel, U.-M. Künzi, and B. Paoli (2022). Neural networks to solve partial differential equations: a comparison with finite elements.IEEE Access 10, 32271–32279

  28. [28]

    Praditia, R

    Takamoto, M., T. Praditia, R. Leiteritz, D. MacKinlay, F. Alesiani, D. Pflüger, and M. Niepert (2022). Pdebench: An extensive benchmark for scientific machine learning.Advances in Neural Information Processing Systems 35, 1596–1611

  29. [29]

    Rabeh, C.-H

    Tali, R., A. Rabeh, C.-H. Yang, M. Shadkhah, S. Karki, A. Upadhyaya, S. Dhakshinamoorthy, M. Saadati, S. Sarkar, A. Krishnamurthy, C. Hegde, A. Balu, and B. Ganapathysubramanian (2025). Flowbench: A large scale benchmark for flow simulation over complex geometries.Journal of Data-centric Machine Learning Research

  30. [30]

    Ranade, M

    Tangsali, K., R. Ranade, M. A. Nabian, A. Kamenev, P. Sharpe, N. Ashton, R. Cherukuri, and S. Choudhry (2025). A benchmarking framework for ai models in automotive aerodynamics. arXiv preprint arXiv:2507.10747

  31. [31]

    Mathews, L

    Tran, A., A. Mathews, L. Xie, and C. S. Ong (2023). Factorized fourier neural operators. In The Eleventh International Conference on Learning Representations

  32. [32]

    Williamson, C. H. (1989). Oblique and parallel modes of vortex shedding in the wake of a circular cylinder at low reynolds numbers.Journal of Fluid Mechanics 206, 579–627. 11

  33. [33]

    Farrington, and P

    Witherden, F., A. Farrington, and P. Vincent (2014, November). Pyfr: An open source frame- work for solving advection–diffusion type problems on streaming architectures using the flux reconstruction approach.Computer Physics Communications 185(11), 3028–3040

  34. [34]

    Sun, and Y .-T

    Xu, Z., Z. Sun, and Y .-T. Zhang (2025). Stability and time-step constraints of exponential time differencing runge–kutta discontinuous galerkin methods for advection-diffusion equations. Journal of Scientific Computing 105(2), 1–31. 12 A Dataset specifications All data generation is conducted on NVIDIA L40S GPUs with no jobs running in parallel. 2D incom...

  35. [35]

    We generate rectangular obstacles by looping in a nested fashion over the integers x={11,

    Domain generation: we simulate over the square domain [0,50]×[−25,25] with an outlet boundary on the right and inlet boundaries on left, top, and bottom. We generate rectangular obstacles by looping in a nested fashion over the integers x={11, . . . ,25} and y∈ {0, . . . ,15}; at each (x, y) coordinate we place a rectangle with probability 0.5. The 13 (10...

  36. [36]

    Mesh generation: we generate a first-order triangular mesh using the Gmsh utility [ 10]. The mesh is graded to have edge-length 1 3 near the obstacles, 2 3 at the edge of a refinement zone [5,45]×[−20,20] , and 1 at the edge of the domain, with edge-sizes varying smoothly between them. To decrease the simulation resolution, we simultaneously increase thes...

  37. [37]

    To decrease the simulation resolution, we increase the timestepping parameters dt and pseudo-dt

    Simulation: we run PyFR using the settings of the 2D incompressible cylinder flow example, but extending the simulation time to 400. To decrease the simulation resolution, we increase the timestepping parameters dt and pseudo-dt. The average per-trajectory simulation time on NVIDIA L40S GPUs is 136.9045 seconds

  38. [38]

    optimal frontier

    Lastly, we convert the data from an unstructured mesh to a regular64×64 grid using the ParaView utility [2]. Both these and the original mesh files will be released. The dataset spans a Reynolds number range of Re∈(10,160] . Data generation is organized into three discrete bins, with Re sampled uniformly at random within each respective range: (1) Bin 1: ...