NOWS: Neural Operator Warm Starts for Accelerating Iterative Solvers
Pith reviewed 2026-05-18 01:30 UTC · model grok-4.3
The pith
Neural operators generate initial guesses that cut iterative PDE solver time by up to 90 percent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Neural Operator Warm Starts (NOWS) harness learned solution operators to produce high-quality initial guesses for Krylov methods such as conjugate gradient and GMRES. This hybrid strategy accelerates classical iterative solvers for PDEs while preserving stability and convergence guarantees. Across benchmarks the learned initialization reduces iteration counts and end-to-end runtime, delivering a computational-time reduction of up to 90 percent, and integrates directly with finite-difference, finite-element, isogeometric, and finite-volume discretizations.
What carries the argument
Neural Operator Warm Starts (NOWS), the use of a trained neural operator to supply an initial guess to an otherwise unchanged Krylov iterative solver.
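The mechanism can be illustrated with a small, dependency-free sketch. It is not the paper's implementation: the test matrix tridiag(-1, 4, -1), the manufactured solution, and the function `surrogate_prediction` are all assumptions made for illustration. In particular, `surrogate_prediction` merely perturbs the known solution by a deliberately optimistic relative error of 1e-6 to stand in for a trained neural operator; a real model's error would typically be larger and the saving correspondingly smaller. The conjugate gradient routine itself is unchanged between the two runs; only the initial guess differs.

```python
import math

def apply_A(v):
    """Matrix-free product with the SPD matrix tridiag(-1, 4, -1) (toy operator)."""
    n = len(v)
    return [
        4.0 * v[i]
        - (v[i - 1] if i > 0 else 0.0)
        - (v[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(b, x0, rel_tol=1e-10, max_iter=1000):
    """Textbook CG; returns the solution and the number of iterations used."""
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, apply_A(x))]
    p = list(r)
    rs = dot(r, r)
    target = rel_tol ** 2 * dot(b, b)  # stop when ||r|| <= rel_tol * ||b||
    iters = 0
    while rs > target and iters < max_iter:
        Ap = apply_A(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
        iters += 1
    return x, iters

def surrogate_prediction(x_ref, rel_err=1e-6):
    """Hypothetical stand-in for a neural operator: the reference solution
    with a small multiplicative perturbation (an assumption, not a model)."""
    return [xi * (1.0 + rel_err * math.cos(7.0 * i)) for i, xi in enumerate(x_ref)]

# Manufactured problem: choose x_true, then build the right-hand side b = A x_true.
n = 50
t = [(i + 1) / (n + 1) for i in range(n)]
x_true = [math.sin(3.0 * math.pi * ti) + ti * ti for ti in t]
b = apply_A(x_true)

x_cold, cold_iters = conjugate_gradient(b, [0.0] * n)                    # zero initial guess
x_warm, warm_iters = conjugate_gradient(b, surrogate_prediction(x_true))  # warm start
print("cold-start iterations:", cold_iters)
print("warm-start iterations:", warm_iters)
```

Both runs stop at the same residual tolerance, so the warm start changes only how much work is needed to get there, not the accuracy of the final answer.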
If this is right
- Iteration counts for conjugate gradient and GMRES drop consistently across the tested benchmarks.
- End-to-end runtime falls by up to 90 percent while the underlying numerical algorithm's stability and convergence guarantees remain intact.
- The same learned operator can be paired with finite-difference, finite-element, isogeometric, and finite-volume discretizations while leaving the existing discretization and solver infrastructure intact.
- The method targets many-query, real-time, and design tasks where repeated PDE solves are the bottleneck.
Where Pith is reading between the lines
- The same warm-start idea could be applied to time-dependent or nonlinear PDEs by training the operator on solution snapshots rather than steady-state fields.
- An online version might retrain or fine-tune the operator on recent solves to maintain performance when problem statistics drift.
- Because the iterative solver still runs to convergence, the approach could serve as a safe drop-in replacement inside existing engineering workflows that already trust Krylov methods.
Load-bearing premise
A neural operator trained on one distribution of right-hand sides, boundary conditions, and geometries will still produce initial guesses close enough to the true solution on new problems that the iteration-count savings stay large and reliable.
What would settle it
Run the method on a collection of right-hand sides, boundary conditions, or geometries deliberately drawn from outside the training distribution and measure whether the iteration reduction drops below 10 percent or the solver fails to converge within a preset budget.
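A minimal sketch of how that criterion could be scored is given below. The function names and the three example cases are hypothetical and the numbers are made up purely to exercise the logic; only the 10 percent threshold and the "fails to converge within a preset budget" condition come from the text above.

```python
def iteration_reduction(cold_iters, warm_iters):
    """Fractional saving in iterations relative to the cold-start baseline."""
    return (cold_iters - warm_iters) / cold_iters

def counts_against_method(cold_iters, warm_iters, converged, min_reduction=0.10):
    """True if a test case fails the criterion: the solver did not converge
    within the budget, or the iteration reduction fell below the threshold."""
    return (not converged) or iteration_reduction(cold_iters, warm_iters) < min_reduction

# Illustrative (made-up) out-of-distribution cases:
# (cold-start iterations, warm-start iterations, converged within budget)
ood_cases = [(200, 40, True), (200, 190, True), (200, 200, False)]
failures = [counts_against_method(c, w, ok) for c, w, ok in ood_cases]
print(failures)  # [False, True, True]
```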
Original abstract
Partial differential equations (PDEs) underpin quantitative descriptions across the physical sciences and engineering, yet high-fidelity simulation remains a major computational bottleneck for many-query, real-time, and design tasks. Data-driven surrogates can be strikingly fast but are often unreliable when applied outside their training distribution. Here we introduce Neural Operator Warm Starts (NOWS), a hybrid strategy that harnesses learned solution operators to accelerate classical iterative solvers by producing high-quality initial guesses for Krylov methods such as conjugate gradient and GMRES. NOWS leaves existing discretizations and solver infrastructures intact, integrating seamlessly with finite-difference, finite-element, isogeometric analysis, finite volume method, etc. Across our benchmarks, the learned initialization consistently reduces iteration counts and end-to-end runtime, resulting in a reduction of the computational time of up to 90 %, while preserving the stability and convergence guarantees of the underlying numerical algorithms. By combining the rapid inference of neural operators with the rigor of traditional solvers, NOWS provides a practical and trustworthy approach to accelerate high-fidelity PDE simulations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Neural Operator Warm Starts (NOWS), a hybrid method that trains a neural operator to generate initial guesses for Krylov iterative solvers (CG, GMRES) applied to discretized PDEs. The approach leaves existing discretizations and solver code unchanged and is claimed to reduce iteration counts and end-to-end runtime by up to 90 % across benchmarks while inheriting the stability and convergence guarantees of the underlying numerical algorithms.
Significance. If the reported speed-ups prove robust outside the training distribution, the work supplies a practical route to accelerate many-query and real-time PDE simulations without sacrificing the reliability that pure data-driven surrogates often lack. The explicit preservation of classical convergence theory and the compatibility with standard discretizations (finite elements, finite volumes, etc.) are concrete strengths.
major comments (2)
- [Experiments] Experiments section: the headline claim of consistent iteration-count and runtime reductions (up to 90 %) across benchmarks is load-bearing for the paper’s contribution, yet the manuscript supplies no quantitative characterization of the training distribution versus the diversity of test instances (new RHS, BCs, or geometries). Without such characterization or worst-case distance bounds on the learned initial guess, it is impossible to verify that the observed speed-ups will persist rather than revert to baseline behavior.
- [§3 and §4] §3 (Method) and §4 (Numerical results): while the method correctly inherits convergence guarantees from the Krylov solver, no analysis or empirical quantification is given for how close the neural-operator output lies to the true solution on out-of-distribution problems. This distance directly controls the iteration reduction and therefore the practical utility of the warm-start strategy.
minor comments (2)
- [Abstract] The abstract states performance claims without reference to any table or figure; a single sentence pointing to the relevant result table would improve readability.
- [§2] Notation for the neural operator and the underlying linear system could be introduced earlier and used consistently to avoid occasional ambiguity between the learned map and the discrete operator.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback emphasizing the need for clearer characterization of training versus test distributions and direct quantification of warm-start quality on out-of-distribution instances. These points help strengthen the presentation of robustness. We respond to each major comment below and have revised the manuscript to incorporate additional details and experiments.
-
Referee: Experiments section: the headline claim of consistent iteration-count and runtime reductions (up to 90 %) across benchmarks is load-bearing for the paper’s contribution, yet the manuscript supplies no quantitative characterization of the training distribution versus the diversity of test instances (new RHS, BCs, or geometries). Without such characterization or worst-case distance bounds on the learned initial guess, it is impossible to verify that the observed speed-ups will persist rather than revert to baseline behavior.
Authors: We agree that explicit characterization of the training distribution relative to test diversity strengthens the claims. In the revised manuscript we have added a new subsection to §4 that specifies the training distribution parameters (e.g., ranges of forcing terms, boundary condition types, and geometry variations) and documents the test instances, which include previously unseen RHS, BCs, and geometries. We also report empirical relative L2 distances between neural-operator predictions and reference solutions on these test cases. While deriving rigorous worst-case distance bounds would require additional theoretical assumptions beyond the scope of the current work, the added empirical metrics confirm that iteration and runtime reductions remain substantial across the evaluated distribution shifts.
Revision: yes
-
Referee: §3 (Method) and §4 (Numerical results): while the method correctly inherits convergence guarantees from the Krylov solver, no analysis or empirical quantification is given for how close the neural-operator output lies to the true solution on out-of-distribution problems. This distance directly controls the iteration reduction and therefore the practical utility of the warm-start strategy.
Authors: The referee correctly notes that the practical speed-up depends on the quality of the initial guess. Although the Krylov convergence theory holds for any initial vector, we have now included direct empirical quantification in the revised §4. Specifically, we added tables and figures reporting the initial residual norms and relative solution errors of the neural-operator outputs on out-of-distribution problems. These results show that the learned warm starts consistently produce smaller initial residuals than zero or random initializations, directly explaining the observed 50–90 % iteration reductions even when the test instances differ from the training distribution.
Revision: yes
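The two diagnostics named in the responses, the relative L2 error of a prediction and the initial residual norm it produces, are simple to compute. The sketch below is illustrative only: the 3x3 system, the reference solution, and the "prediction" are invented values chosen so the arithmetic can be checked by hand, and the helper names are not from the paper.

```python
import math

def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def norm2(v):
    return math.sqrt(sum(vi * vi for vi in v))

def relative_l2_error(pred, ref):
    """||pred - ref||_2 / ||ref||_2 (discrete relative L2 distance)."""
    return norm2([p - r for p, r in zip(pred, ref)]) / norm2(ref)

def initial_residual_norm(A, b, x0):
    """||b - A x0||_2, the residual the Krylov solver starts from."""
    return norm2([bi - ai for bi, ai in zip(b, matvec(A, x0))])

# Hand-checkable toy system (assumed values, not from the paper).
A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
u_ref = [1.0, 2.0, 3.0]
b = matvec(A, u_ref)        # equals [0, 0, 4]
u_pred = [1.0, 2.0, 2.5]    # stand-in for a neural-operator output

rel_err = relative_l2_error(u_pred, u_ref)                   # 0.5 / sqrt(14)
res_warm = initial_residual_norm(A, b, u_pred)               # sqrt(1.25)
res_zero = initial_residual_norm(A, b, [0.0, 0.0, 0.0])      # ||b|| = 4
print(rel_err, res_warm, res_zero)
```

Reporting both quantities side by side is useful because a small solution error does not by itself guarantee a proportionally small residual; the two are related through the conditioning of the system matrix.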
Circularity Check
No significant circularity; speedup follows from standard Krylov theory plus external neural-operator training
The paper presents NOWS as a hybrid that uses a separately trained neural operator to supply initial guesses to unmodified Krylov solvers (CG, GMRES, etc.). Convergence guarantees and the iteration-reduction mechanism are inherited from classical numerical linear algebra, not derived inside the paper. No equations equate the claimed runtime reduction to a fitted parameter by construction, and no load-bearing premise rests on a self-citation chain whose validity is presupposed. The training distribution and generalization behavior are treated as empirical questions outside the derivation itself, consistent with the reader's assessment of score 2.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Krylov subspace methods converge for any initial guess, with iteration count depending on the quality of that guess.
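For the conjugate gradient case, the dependence on the initial guess can be made explicit with the standard textbook error bound. This is a sketch under the assumption that the system matrix $A$ is symmetric positive definite with condition number $\kappa$; the symbols $x^\ast$ (exact discrete solution), $x_k$ (iterate after $k$ steps), and $\varepsilon$ (target accuracy) are introduced here for illustration and are not taken from the paper.

```latex
\|x^\ast - x_k\|_A \;\le\; 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{k} \|x^\ast - x_0\|_A ,
\qquad\text{so}\qquad
k \;\ge\; \frac{\ln\!\left(2\,\|x^\ast - x_0\|_A / \varepsilon\right)}{\ln\!\left(\dfrac{\sqrt{\kappa}+1}{\sqrt{\kappa}-1}\right)}
\;\Longrightarrow\; \|x^\ast - x_k\|_A \le \varepsilon .
```

The contraction factor depends only on $\kappa$, so a warm start acts solely through the initial error $\|x^\ast - x_0\|_A$. In this worst-case bound the number of iterations saved grows only logarithmically with the reduction in initial error, which is why the size of the saving depends on how accurate the learned guess is.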
invented entities (1)
- Neural Operator Warm Starts (NOWS): no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: NOWS employs a neural operator to generate high-quality initial guesses that sharply reduce the initial residual, thereby lowering the iteration count required for full convergence... preserving the stability, interpretability, and rigorous convergence guarantees of the underlying numerical method
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Acknowledgement
The authors would like to acknowledge the support provided by the German Academic Exchange Service (DAAD) through a scholarship awarded to Mohammad Sadegh Eshaghi during this research, as well as the Compute Servers ...
Neural operator training steps (excerpt from the paper)
1. Sample a coefficient function $a(x)$ from a prescribed distribution.
2. Solve the PDE numerically to obtain the corresponding solution $u(x)$.
3. Evaluate the neural operator on $a(x)$ and compute the predicted $\hat{u}(x)$.
4. Minimize a loss function such as the relative L2 error, $\mathcal{L}(\theta) = \|u - \hat{u}\|_{L^2(D)} / \|u\|_{L^2(D)}$.
Once trained, the same model can be evaluated on unseen meshes, finer resolutions, or new geometries, owing to its continuous and mesh-independent formulation.
Interpretation and Applications. Neural operators provide a unifying framework for learning solution operato...
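The four training steps listed above (sample, solve, evaluate, minimize) can be walked through on a toy problem. Everything in this sketch is an assumption made for illustration: the "PDE" is $-a\,u'' = 1$ on $(0,1)$ with zero boundary values and a constant coefficient $a$ rather than a function $a(x)$, the "solver" is its exact solution $u(x) = x(1-x)/(2a)$, and the "neural operator" is a hypothetical one-parameter model whose optimum is $\theta = 0.5$. The squared relative L2 error is used so that plain gradient descent behaves smoothly. None of this is the paper's architecture or training setup.

```python
import math
import random

def solve_pde(a, xs):
    """Step 2 stand-in: exact solution of -a u'' = 1 with u(0) = u(1) = 0."""
    return [x * (1.0 - x) / (2.0 * a) for x in xs]

def surrogate(theta, a, xs):
    """Step 3 stand-in: a one-parameter model, not a real neural operator."""
    return [theta * x * (1.0 - x) / a for x in xs]

def relative_l2(u, u_hat):
    """Discrete relative L2 error ||u - u_hat|| / ||u||."""
    num = math.sqrt(sum((p - q) ** 2 for p, q in zip(u, u_hat)))
    den = math.sqrt(sum(p * p for p in u))
    return num / den

rng = random.Random(0)
xs = [i / 20 for i in range(1, 20)]                  # interior grid points
coeffs = [rng.uniform(1.0, 2.0) for _ in range(8)]   # step 1: sample coefficients

def mean_sq_loss(theta):
    """Mean squared relative L2 error over the sampled coefficients."""
    errs = [relative_l2(solve_pde(a, xs), surrogate(theta, a, xs)) for a in coeffs]
    return sum(e * e for e in errs) / len(errs)

# Step 4: minimize the loss by gradient descent with a central-difference gradient.
theta, lr, h = 0.0, 0.1, 1e-4
for _ in range(50):
    grad = (mean_sq_loss(theta + h) - mean_sq_loss(theta - h)) / (2.0 * h)
    theta -= lr * grad

print("learned theta:", theta)
print("final mean squared relative error:", mean_sq_loss(theta))
```

In a realistic setting the solver in step 2 would be the numerical discretization, the surrogate would be a neural operator with many parameters, and the optimizer would use automatic differentiation rather than finite differences.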