Self-supervised neural operator for solving partial differential equations

Shaoqian Zhou; Wen You; Xuhui Meng

arxiv: 2509.00867 · v3 · pith:S32U7FZNnew · submitted 2025-08-31 · ⚛️ physics.comp-ph

Pith reviewed 2026-05-21 22:04 UTC · model grok-4.3

classification ⚛️ physics.comp-ph

keywords: self-supervised neural operator · partial differential equations · physics-informed neural networks · neural operators · transformer · data generation · PDE solver · fluid dynamics

The pith

A self-supervised neural operator learns PDE solutions by generating its own training data without numerical solvers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that neural operators can solve partial differential equations in a self-supervised manner by creating accurate training examples during the learning process. This matters because conventional training depends on costly simulations that limit use in complex or changing systems. The approach uses a sampler grounded in physics-informed networks to produce varied data on the fly, then applies a function encoder and Transformer to map boundary conditions, sources, and geometries directly to solutions. Tests cover steady and unsteady one-dimensional reaction-diffusion cases, two-dimensional problems with changing shapes, and a fluid-structure interaction example involving cylinder vibrations. If the claim holds, it supports building general-purpose PDE models that require far less upfront computation.

Core claim

The self-supervised neural operator (SNO) generates accurate and diverse training data on the fly without numerical solvers. It consists of a physics-informed sampler based on Bayesian PINNs for efficient data generation, a function encoder for compact input-output representations, and an encoder-only Transformer for operator learning that maps boundary/initial conditions, source terms, and geometries to PDE solutions. SNO achieves high accuracy on 1D steady/unsteady nonlinear reaction-diffusion equations, a 2D nonlinear PDE with varying geometries, and vortex-induced vibration of a flexible cylinder in fluid dynamics, with lightweight finetuning of O(100) trainable variables further raising accuracy.

What carries the argument

The self-supervised neural operator (SNO) that combines a Bayesian physics-informed sampler to create training data, a function encoder for representations, and a Transformer to learn the mapping from conditions to solutions.
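
To make the function-encoder idea concrete, here is a minimal sketch of compressing a sampled function into a handful of coefficients and reconstructing it. It is an illustration only: the paper's FE is a learned component whose details are not given on this page, so a fixed sine basis stands in for it, and the names `sine_encode`, `sine_decode`, and the toy function `f` are hypothetical.

```python
import math


def sine_encode(samples, num_modes):
    """Project samples of f taken on the uniform grid x_j = j/n, j = 0..n,
    onto sin(k*pi*x) for k = 1..num_modes (requires num_modes < n)."""
    n = len(samples) - 1
    if not 0 < num_modes < n:
        raise ValueError("need 0 < num_modes < len(samples) - 1")
    coeffs = []
    for k in range(1, num_modes + 1):
        inner = sum(samples[j] * math.sin(k * math.pi * j / n) for j in range(n + 1))
        coeffs.append(2.0 * inner / n)  # uses discrete orthogonality of the sine basis
    return coeffs


def sine_decode(coeffs, x):
    """Evaluate the reconstruction sum_k c_k * sin(k*pi*x) at a point x."""
    return sum(c * math.sin((k + 1) * math.pi * x) for k, c in enumerate(coeffs))


# Toy input function (an assumption for illustration): two sine modes.
def f(x):
    return 1.5 * math.sin(math.pi * x) - 0.5 * math.sin(3 * math.pi * x)


n = 64
samples = [f(j / n) for j in range(n + 1)]
coeffs = sine_encode(samples, num_modes=4)  # 65 grid values -> 4 coefficients
max_err = max(abs(sine_decode(coeffs, j / n) - samples[j]) for j in range(n + 1))
print(coeffs, max_err)
```

In a pipeline of this kind, the operator model would presumably consume and predict such short coefficient vectors instead of raw grid values; whether SNO does exactly this is not stated here.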

If this is right

High-accuracy solutions for nonlinear PDEs without precomputed datasets from solvers.
Effective handling of problems with varying geometries in two dimensions.
Modeling of fluid dynamics cases such as vortex-induced vibrations on flexible structures.
Further accuracy gains from lightweight finetuning using only a few hundred steps.
A route toward pretrained models that act as efficient surrogates for PDE solving.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The method could lower computational barriers for PDE work in engineering where generating large training sets is impractical.
It may scale to time-dependent three-dimensional problems if the sampler continues to produce suitable data.
Hybrid systems could combine SNO approximations with occasional full simulations to balance speed and precision.

Load-bearing premise

The physics-informed sampler can produce accurate and diverse enough training data on its own to let the encoder and Transformer learn a reliable operator without any external high-fidelity simulations.

What would settle it

Comparing SNO predictions against results from a standard numerical solver on a new nonlinear PDE problem and finding errors that remain large even after additional training steps.
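
One common way to run such a comparison is the relative L2 error between a prediction and a reference solution sampled on the same grid. The sketch below is a generic helper under that assumption; `relative_l2_error` and the toy data are hypothetical and are not taken from the paper.

```python
import math


def relative_l2_error(pred, ref):
    """Return ||pred - ref||_2 / ||ref||_2 for two equally long sequences."""
    if len(pred) != len(ref):
        raise ValueError("pred and ref must have the same length")
    num = math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)))
    den = math.sqrt(sum(r ** 2 for r in ref))
    if den == 0.0:
        raise ValueError("reference solution has zero norm")
    return num / den


# Toy check: a reference profile and a prediction that is uniformly 1% too large.
grid = [j / 100 for j in range(101)]
reference = [math.sin(math.pi * x) for x in grid]
prediction = [1.01 * u for u in reference]
error = relative_l2_error(prediction, reference)
print(error)  # approximately 0.01
```

A claim of "high accuracy" would then correspond to this number staying small on problems the model has not seen; an error that stays large after further training would count against it.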

Figures

Figures reproduced from arXiv: 2509.00867 by Shaoqian Zhou, Wen You, Xuhui Meng.

**Figure 2.** Schematic of Bayesian physics-informed neural networks (B-PINNs), in which …

**Figure 3.** Schematic of physics-informed sampler (PI-sampler), where the Bayesian neural …

**Figure 4.** Schematic of function encoder (FE), in which (1) the …

**Figure 5.** Schematic of the encoder-only Transformer for operator learning, in which …

**Figure 6.** SNO for steady nonlinear reaction-diffusion equation: predictions of …

**Figure 7.** SNO for time-dependent reaction-diffusion equation: (a) In-distribution testing …

**Figure 8.** SNO for time-dependent reaction-diffusion equation: Predicted …

**Figure 9.** SNO for nonlinear elliptic partial differential equation: (a) In-distribution …

**Figure 10.** SNO for nonlinear elliptic partial differential equation: geometries unseen …

**Figure 10 (continued).** Here we employ the same testing data for …

**Figure 11.** Schematic of the vortex-induced-vibration of a flexible cylinder.

**Figure 12.** SNO for VIV problem: Predicted η from SNO. The time domains at the pretraining and testing stages are t ∈ [0, 1] and t ∈ [0, 10], respectively. Colored background and solid line: reference solutions. Red dashed: predictions from SNO. In this specific case, the training data are generated using the same BNNs as in Sec. 3.2. After the pretraining of SNO, we assume that the solution for η in the new test ca…

Original abstract

Neural operators (NOs) provide a new paradigm for efficiently solving partial differential equations (PDEs), but their training depends on costly high-fidelity data from numerical solvers, limiting applications in complex systems. We propose a self-supervised neural operator (SNO) that generates accurate and diverse training data on the fly without numerical solvers. SNO consists of three parts: a physics-informed sampler (PI-sampler) based on Bayesian PINNs for efficient data generation, a function encoder (FE) for compact input-output representations, and an encoder-only Transformer for operator learning, mapping boundary/initial conditions, source terms, and geometries to PDE solutions. We validate SNO on 1D steady/unsteady nonlinear reaction-diffusion equations, a 2D nonlinear PDE with varying geometries, and vortex-induced vibration of a flexible cylinder in fluid dynamics. SNO achieves high accuracy in all cases, and lightweight finetuning (O(100) trainable variables) further improves predictions with only a few hundred steps. This work provides a new route toward pretrained foundation models as efficient PDE surrogates.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SNO aims to skip expensive numerical data for neural operators by using Bayesian PINNs to generate training pairs on the fly, but the abstract and claims leave the sampler's accuracy and diversity unverified.

Colleague, the main takeaway is that this paper tries to remove the usual bottleneck in neural operator training by generating its own data via a Bayesian PINN sampler instead of running high-fidelity solvers first. If the sampler actually produces accurate and diverse enough pairs, it could help scale these models to more complex problems, but the current write-up does not show that it does.

What is new is the specific three-part pipeline: the physics-informed sampler, a function encoder for compact representations, and an encoder-only Transformer that maps inputs like boundaries and geometries to solutions. They test the whole thing on 1D nonlinear reaction-diffusion cases, a 2D PDE with changing geometries, and a vortex-induced vibration problem in fluids, plus a lightweight fine-tuning step that updates only on the order of a hundred trainable variables. That combination is not just a restatement of prior PINN or operator work, and the goal of moving toward pretrained PDE foundation models is a reasonable direction. The paper does a decent job framing the motivation and laying out the components clearly.

The soft spot is the lack of any quantitative checks on the sampler itself. There are no L2 error tables against ground-truth solvers, no posterior diagnostics, no diversity measures, and no side-by-side comparisons for the nonlinear cases where Bayesian PINNs are known to have convergence and mode-capture issues. Without those, it is difficult to tell whether the reported high accuracy comes from the self-supervised route or from other factors. The abstract asserts good results but supplies none of the usual metrics or baselines.

This is for readers working on scientific machine learning who want to reduce dependence on traditional solvers for operator learning. Someone already familiar with PINNs and Transformers could extract the integration idea and the test problems, but they would need the full experimental details to judge reliability. I would send it to peer review. The idea addresses a real practical issue and the architecture is grounded enough to merit referee time, even though the validation will probably need substantial strengthening.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a self-supervised neural operator (SNO) for PDEs that generates its own training data without external numerical solvers. The architecture comprises a physics-informed sampler (PI-sampler) based on Bayesian PINNs, a function encoder (FE) for compact input-output representations, and an encoder-only Transformer that maps boundary/initial conditions, source terms, and geometries to solutions. The method is demonstrated on 1D steady and unsteady nonlinear reaction-diffusion equations, a 2D nonlinear PDE with varying geometries, and vortex-induced vibration of a flexible cylinder, with additional claims that lightweight finetuning (O(100) parameters) further improves accuracy in a few hundred steps.

Significance. If the central claims are substantiated, the work could meaningfully reduce the data-generation bottleneck for neural operators and support development of pretrained PDE foundation models. The combination of Bayesian PINN sampling with Transformer-based operator learning is a coherent extension of existing PINN and neural-operator literature, and the breadth of test problems (including a fluid-structure interaction case) is appropriate for the claim.

major comments (2)

[Abstract] Abstract and validation sections: The central claim that the Bayesian-PINN-based PI-sampler produces training pairs that are both accurate and sufficiently diverse for the FE+Transformer to learn reliable operators without any high-fidelity external data is not accompanied by quantitative support. No L2-error tables, posterior diagnostics, or side-by-side comparisons against finite-difference or finite-element references are referenced for the nonlinear reaction-diffusion or 2D geometry-varying cases.
[Method] Method description of the PI-sampler: The assertion that the physics-informed sampler 'efficiently produce[s] accurate and diverse training data on the fly' is load-bearing for the self-supervised route, yet no convergence metrics, mode-capture diagnostics, or diversity measures (e.g., coverage of the solution manifold) are provided to show that the residual-loss optimization succeeds for the nonlinear problems where Bayesian PINNs are known to be sensitive to initialization and multimodality.

minor comments (2)

[Abstract] The phrase 'lightweight finetuning (O(100) trainable variables)' would benefit from an explicit statement of which subset of parameters is updated and how the O(100) count is obtained.
[Method] Notation for the function encoder (FE) and its interface with the Transformer could be introduced with a small diagram or equation block to clarify the dimensionality reduction step.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for recognizing the potential of the self-supervised approach. We address each major comment below and have revised the manuscript to incorporate additional quantitative validation where the original submission was lacking.

Referee: [Abstract] Abstract and validation sections: The central claim that the Bayesian-PINN-based PI-sampler produces training pairs that are both accurate and sufficiently diverse for the FE+Transformer to learn reliable operators without any high-fidelity external data is not accompanied by quantitative support. No L2-error tables, posterior diagnostics, or side-by-side comparisons against finite-difference or finite-element references are referenced for the nonlinear reaction-diffusion or 2D geometry-varying cases.

Authors: We agree that the original manuscript would benefit from explicit quantitative support for the accuracy and diversity of the PI-sampler outputs. In the revised version we have added L2-error tables that directly compare Bayesian-PINN-generated solutions against independent finite-difference and finite-element references for both the 1D nonlinear reaction-diffusion problems and the 2D geometry-varying cases. We have also included posterior diagnostics (Gelman-Rubin statistics, effective sample sizes) and visualizations that quantify the diversity of the sampled training pairs. revision: yes
Referee: [Method] Method description of the PI-sampler: The assertion that the physics-informed sampler 'efficiently produce[s] accurate and diverse training data on the fly' is load-bearing for the self-supervised route, yet no convergence metrics, mode-capture diagnostics, or diversity measures (e.g., coverage of the solution manifold) are provided to show that the residual-loss optimization succeeds for the nonlinear problems where Bayesian PINNs are known to be sensitive to initialization and multimodality.

Authors: We acknowledge that the known sensitivities of Bayesian PINNs to initialization and multimodality require explicit supporting evidence. The revised manuscript now contains convergence plots of the residual loss, results from multiple independent chains demonstrating mode capture, and quantitative diversity metrics (pairwise solution distances and manifold coverage statistics) for the nonlinear test problems. These additions substantiate that the residual-loss optimization produces sufficiently accurate and diverse training data for the downstream operator learner. revision: yes
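
For readers unfamiliar with the Gelman-Rubin statistic mentioned in the first response, the sketch below computes the classic potential scale reduction factor (R-hat) for one scalar quantity across several chains. It is a textbook formula written in plain Python, not code from the paper; the function name `gelman_rubin` and the synthetic chains are assumptions for illustration.

```python
import math
import random
from statistics import mean, variance


def gelman_rubin(chains):
    """Classic R-hat for m chains of equal length n (m >= 2, n >= 2)."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])  # W: mean within-chain variance
    between = n * variance(chain_means)           # B: between-chain variance
    if within == 0.0:
        raise ValueError("within-chain variance is zero")
    var_hat = (n - 1) / n * within + between / n  # pooled variance estimate
    return math.sqrt(var_hat / within)


rng = random.Random(0)
# Four chains sampling the same distribution (well mixed).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(4)]
# Four chains split across two separated modes (poorly mixed).
stuck = [[rng.gauss(mu, 1.0) for _ in range(2000)] for mu in (0.0, 0.0, 3.0, 3.0)]
print(gelman_rubin(mixed), gelman_rubin(stuck))
```

Values close to 1 indicate that the chains agree with one another; the second set mimics chains stuck in different modes, which is the multimodality failure the referee raises.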

Circularity Check

0 steps flagged

No significant circularity; derivation chain is self-contained

The paper's central method chains a Bayesian-PINN-based PI-sampler to generate training pairs on the fly, followed by a function encoder and encoder-only Transformer to learn the operator mapping. Validation is asserted on external benchmark PDEs (1D reaction-diffusion, 2D variable-geometry, and vortex-induced vibration) with claimed high accuracy and optional lightweight finetuning. No quoted equation or step reduces a prediction to a fitted parameter by construction, nor does any load-bearing premise collapse to a self-citation whose content is itself unverified within the paper. The approach therefore remains independent of its own outputs and relies on external physics constraints and standard neural-operator components.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The work introduces new architectural components for self-supervised learning but rests on standard assumptions about neural network approximation of PDE solutions and Bayesian inference in PINNs. No explicit free parameters are named in the abstract.

axioms (1)

Domain assumption: Neural networks can approximate solutions to partial differential equations when trained with physics constraints.
Core premise underlying both PINNs and neural operators, invoked for the sampler and operator learning.
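
As a rough illustration of what a "physics constraint" means in this setting, the sketch below evaluates the mean squared residual of a candidate solution for a hypothetical steady reaction-diffusion equation, D u'' + k u^3 = f on [0, 1]. The equation form, the coefficients, and the name `residual_mse` are assumptions chosen for the example, not the paper's setup, and central finite differences stand in for the automatic differentiation a PINN would normally use.

```python
import math


def residual_mse(u, f, diffusion, reaction, n=200):
    """Mean squared residual of diffusion*u'' + reaction*u**3 - f at the
    interior points of a uniform grid on [0, 1]; u'' by central differences."""
    h = 1.0 / n
    total = 0.0
    for j in range(1, n):
        x = j * h
        u_xx = (u(x - h) - 2.0 * u(x) + u(x + h)) / h ** 2
        r = diffusion * u_xx + reaction * u(x) ** 3 - f(x)
        total += r * r
    return total / (n - 1)


D, K = 0.01, 1.0  # hypothetical coefficients


def u_true(x):
    return math.sin(math.pi * x)


def source(x):
    # Manufactured so that u_true satisfies the equation exactly.
    s = math.sin(math.pi * x)
    return -D * math.pi ** 2 * s + K * s ** 3


def u_wrong(x):
    return 1.2 * u_true(x)  # same shape, wrong amplitude


mse_true = residual_mse(u_true, source, D, K)
mse_wrong = residual_mse(u_wrong, source, D, K)
print(mse_true, mse_wrong)
```

The residual is close to zero for the function that solves the equation and clearly larger for the perturbed one, which is the signal a physics-informed loss uses in place of solver-generated labels.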

invented entities (2)

Physics-informed sampler (PI-sampler) · no independent evidence
purpose: Generate accurate and diverse training data on the fly using Bayesian PINNs
New component proposed to replace external numerical solvers.

Function encoder (FE) · no independent evidence
purpose: Provide compact input-output representations for the operator
Architectural element introduced as part of SNO.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "SNO consists of three parts: a physics-informed sampler (PI-sampler) based on Bayesian PINNs for efficient data generation, a function encoder (FE) for compact input-output representations, and an encoder-only Transformer for operator learning"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "The PI-sampler is inspired by the Bayesian physics-informed neural networks (B-PINNs)"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 1 internal anchor

[1]

Schunk, Mathematical structure of transport equations for multi- species flows, Reviews of Geophysics 15 (4) (1977) 429–445

R. Schunk, Mathematical structure of transport equations for multi- species flows, Reviews of Geophysics 15 (4) (1977) 429–445. 30

work page 1977
[2]

Zhang, L

D. Zhang, L. Lu, L. Guo, G. E. Karniadakis, Quantifying total uncer- tainty in physics-informed neural networks for solving forward and in- verse stochastic problems, Journal of Computational Physics 397 (2019) 108850

work page 2019
[3]

Mazumder, Boltzmann transport equation based modeling of phonon heat conduction: progress and challenges, Annual Review of Heat Trans- fer 24 (2021)

S. Mazumder, Boltzmann transport equation based modeling of phonon heat conduction: progress and challenges, Annual Review of Heat Trans- fer 24 (2021)

work page 2021
[4]

Q. Lin, C. Zhang, X. Meng, Z. Guo, Monte carlo physics-informed neural networks for multiscale heat conduction via phonon boltzmann transport equation, arXiv preprint arXiv:2408.10965 (2024)

work page arXiv 2024
[5]

Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stu- art, A. Anandkumar, Fourier neural operator for parametric partial dif- ferential equations, arXiv preprint arXiv:2010.08895 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2010
[6]

Raissi, P

M. Raissi, P. Perdikaris, G. E. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational physics 378 (2019) 686–707

work page 2019
[7]

Z. Long, Y. Lu, X. Ma, B. Dong, Pde-net: Learning pdes from data, in: International conference on machine learning, PMLR, 2018, pp. 3208– 3216

work page 2018
[8]

Sirignano, K

J. Sirignano, K. Spiliopoulos, Dgm: A deep learning algorithm for solv- ing partial differential equations, Journal of computational physics 375 (2018) 1339–1364

work page 2018
[9]

Yu, et al., The deep ritz method: a deep learning-based numerical algorithm for solving variational problems, Communications in Mathe- matics and Statistics 6 (1) (2018) 1–12

B. Yu, et al., The deep ritz method: a deep learning-based numerical algorithm for solving variational problems, Communications in Mathe- matics and Statistics 6 (1) (2018) 1–12

work page 2018
[10]

L. Lu, P. Jin, G. Pang, Z. Zhang, G. E. Karniadakis, Learning nonlinear operators via deeponet based on the universal approximation theorem of operators, Nature machine intelligence 3 (3) (2021) 218–229

work page 2021
[11]

Q. Cao, S. Goswami, G. E. Karniadakis, Laplace neural operator for solving differential equations, Nature Machine Intelligence 6 (6) (2024) 631–640. 31

work page 2024
[12]

Ovadia, A

O. Ovadia, A. Kahana, P. Stinis, E. Turkel, D. Givoli, G. E. Karniadakis, Vito: Vision transformer-operator, Computer Methods in Applied Me- chanics and Engineering 428 (2024) 117109

work page 2024
[13]

Cheng, J

C.-W. Cheng, J. Huang, Y. Zhang, G. Yang, C.-B. Sch¨ onlieb, A. I. Aviles-Rivero, Mamba neural operator: Who wins? transformers vs. state-space models for pdes, arXiv preprint arXiv:2410.02113 (2024)

work page arXiv 2024
[14]

B. Shih, A. Peyvan, Z. Zhang, G. E. Karniadakis, Transformers as neural operators for solutions of differential equations with finite regularity, Computer Methods in Applied Mechanics and Engineering 434 (2025) 117560

work page 2025
[15]

Cao, Choose a transformer: Fourier or galerkin, Advances in neural information processing systems 34 (2021) 24924–24940

S. Cao, Choose a transformer: Fourier or galerkin, Advances in neural information processing systems 34 (2021) 24924–24940

work page 2021
[16]

L. Lu, X. Meng, S. Cai, Z. Mao, S. Goswami, Z. Zhang, G. E. Karni- adakis, A comprehensive and fair comparison of two neural operators (with practical extensions) based on fair data, Computer Methods in Applied Mechanics and Engineering 393 (2022) 114778

work page 2022
[17]

Kovachki, Z

N. Kovachki, Z. Li, B. Liu, K. Azizzadenesheli, K. Bhattacharya, A. Stu- art, A. Anandkumar, Neural operator: Learning maps between function spaces with applications to pdes, Journal of Machine Learning Research 24 (89) (2023) 1–97

work page 2023
[18]

Azizzadenesheli, N

K. Azizzadenesheli, N. Kovachki, Z. Li, M. Liu-Schiaffini, J. Kossaifi, A. Anandkumar, Neural operators for accelerating scientific simulations and design, Nature Reviews Physics 6 (5) (2024) 320–328

work page 2024
[19]

P. Jin, S. Meng, L. Lu, Mionet: Learning multiple-input operators via tensor product, SIAM Journal on Scientific Computing 44 (6) (2022) A3490–A3514

work page 2022
[20]

Shukla, V

K. Shukla, V. Oommen, A. Peyvan, M. Penwarden, N. Plewacki, L. Bravo, A. Ghoshal, R. M. Kirby, G. E. Karniadakis, Deep neural operators as accurate surrogates for shape optimization, Engineering Applications of Artificial Intelligence 129 (2024) 107615

work page 2024
[21]

Z. Li, N. Kovachki, C. Choy, B. Li, J. Kossaifi, S. Otta, M. A. Nabian, M. Stadler, C. Hundt, K. Azizzadenesheli, et al., Geometry-informed 32 neural operator for large-scale 3d pdes, Advances in Neural Information Processing Systems 36 (2023) 35836–35854

work page 2023
[22]

Z. Ye, Z. Liu, B. Wu, H. Jiang, L. Chen, M. Zhang, X. Huang, Q. M. Zou, H. Liu, B. Dong, et al., Pdeformer-2: A versatile foundation model for two-dimensional partial differential equations, arXiv preprint arXiv:2507.15409 (2025)

work page arXiv 2025
[23]

Subramanian, P

S. Subramanian, P. Harrington, K. Keutzer, W. Bhimji, D. Morozov, M. W. Mahoney, A. Gholami, Towards foundation models for scien- tific machine learning: Characterizing scaling and transfer behavior, Advances in Neural Information Processing Systems 36 (2023) 71242– 71262

work page 2023
[24]

J. Sun, Y. Liu, Z. Zhang, H. Schaeffer, Towards a foundation model for partial differential equations: Multioperator learning and extrapolation, Physical Review E 111 (3) (2025) 035304

work page 2025
[25]

Herde, B

M. Herde, B. Raonic, T. Rohner, R. K¨ appeli, R. Molinaro, E. de B´ ezenac, S. Mishra, Poseidon: Efficient foundation models for pdes, Advances in Neural Information Processing Systems 37 (2024) 72525–72624

work page 2024
[26]

Y. Liu, J. Sun, X. He, G. Pinney, Z. Zhang, H. Schaeffer, Prose-fd: A multimodal pde foundation model for learning multiple operators for forecasting fluid dynamics, arXiv preprint arXiv:2409.09811 (2024)

work page arXiv 2024
[27]

Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred Warmuth

R. Balestriero, M. Ibrahim, V. Sobal, A. Morcos, S. Shekhar, T. Gold- stein, F. Bordes, A. Bardes, G. Mialon, Y. Tian, et al., A cookbook of self-supervised learning, arXiv preprint arXiv:2304.12210 (2023)

work page arXiv 2023
[28]

X. Yang, Z. Song, I. King, Z. Xu, A survey on deep semi-supervised learning, IEEE transactions on knowledge and data engineering 35 (9) (2022) 8934–8954

work page 2022
[29]

L. Yang, D. Zhang, G. E. Karniadakis, Physics-informed generative ad- versarial networks for stochastic differential equations, SIAM Journal on Scientific Computing 42 (1) (2020) A292–A317

work page 2020
[30]

R. M. Neal, Bayesian learning for neural networks, Vol. 118, Springer Science & Business Media, 2012. 33

work page 2012
[31]

G. Pang, L. Yang, G. E. Karniadakis, Neural-net-induced gaussian pro- cess regression for function approximation and pde solution, Journal of Computational Physics 384 (2019) 270–288

work page 2019
[32]

Pearce, R

T. Pearce, R. Tsuchida, M. Zaki, A. Brintrup, A. Neely, Expressive priors in bayesian neural networks: Kernel combinations and periodic functions, in: Uncertainty in artificial intelligence, PMLR, 2020, pp. 134–144

work page 2020
[33]

C. K. Williams, C. E. Rasmussen, Gaussian processes for machine learn- ing, Vol. 2, MIT press Cambridge, MA, 2006

work page 2006
[34]

Vaswani, N

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, Advances in neural information processing systems 30 (2017)

work page 2017
[35]

K. Han, Y. Wang, H. Chen, X. Chen, J. Guo, Z. Liu, Y. Tang, A. Xiao, C. Xu, Y. Xu, et al., A survey on vision transformer, IEEE transactions on pattern analysis and machine intelligence 45 (1) (2022) 87–110

work page 2022
[36]

Kharazmi, D

E. Kharazmi, D. Fan, Z. Wang, M. S. Triantafyllou, Inferring vortex induced vibrations of flexible cylinders using physics-informed neural networks, Journal of Fluids and Structures 107 (2021) 103367

work page 2021
[37]

Gehring, M

J. Gehring, M. Auli, D. Grangier, D. Yarats, Y. N. Dauphin, Convo- lutional sequence to sequence learning, in: International conference on machine learning, PMLR, 2017, pp. 1243–1252. 34

work page 2017

[1] [1]

Schunk, Mathematical structure of transport equations for multi- species flows, Reviews of Geophysics 15 (4) (1977) 429–445

R. Schunk, Mathematical structure of transport equations for multi- species flows, Reviews of Geophysics 15 (4) (1977) 429–445. 30

work page 1977

[2] [2]

Zhang, L

D. Zhang, L. Lu, L. Guo, G. E. Karniadakis, Quantifying total uncer- tainty in physics-informed neural networks for solving forward and in- verse stochastic problems, Journal of Computational Physics 397 (2019) 108850

work page 2019

[3] [3]

Mazumder, Boltzmann transport equation based modeling of phonon heat conduction: progress and challenges, Annual Review of Heat Trans- fer 24 (2021)

S. Mazumder, Boltzmann transport equation based modeling of phonon heat conduction: progress and challenges, Annual Review of Heat Trans- fer 24 (2021)

work page 2021

[4] [4]

Q. Lin, C. Zhang, X. Meng, Z. Guo, Monte carlo physics-informed neural networks for multiscale heat conduction via phonon boltzmann transport equation, arXiv preprint arXiv:2408.10965 (2024)

work page arXiv 2024

[5] [5]

Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stu- art, A. Anandkumar, Fourier neural operator for parametric partial dif- ferential equations, arXiv preprint arXiv:2010.08895 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2010

[6] [6]

Raissi, P

M. Raissi, P. Perdikaris, G. E. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational physics 378 (2019) 686–707

work page 2019

[7] [7]

Z. Long, Y. Lu, X. Ma, B. Dong, Pde-net: Learning pdes from data, in: International conference on machine learning, PMLR, 2018, pp. 3208– 3216

work page 2018

[8] [8]

Sirignano, K

J. Sirignano, K. Spiliopoulos, Dgm: A deep learning algorithm for solv- ing partial differential equations, Journal of computational physics 375 (2018) 1339–1364

work page 2018

[9] [9]

Yu, et al., The deep ritz method: a deep learning-based numerical algorithm for solving variational problems, Communications in Mathe- matics and Statistics 6 (1) (2018) 1–12

B. Yu, et al., The deep ritz method: a deep learning-based numerical algorithm for solving variational problems, Communications in Mathe- matics and Statistics 6 (1) (2018) 1–12

work page 2018

[10] [10]

L. Lu, P. Jin, G. Pang, Z. Zhang, G. E. Karniadakis, Learning nonlinear operators via deeponet based on the universal approximation theorem of operators, Nature machine intelligence 3 (3) (2021) 218–229

work page 2021

[11] [11]

Q. Cao, S. Goswami, G. E. Karniadakis, Laplace neural operator for solving differential equations, Nature Machine Intelligence 6 (6) (2024) 631–640. 31

work page 2024

[12] [12]

Ovadia, A

O. Ovadia, A. Kahana, P. Stinis, E. Turkel, D. Givoli, G. E. Karniadakis, Vito: Vision transformer-operator, Computer Methods in Applied Me- chanics and Engineering 428 (2024) 117109

work page 2024

[13] [13]

Cheng, J

C.-W. Cheng, J. Huang, Y. Zhang, G. Yang, C.-B. Sch¨ onlieb, A. I. Aviles-Rivero, Mamba neural operator: Who wins? transformers vs. state-space models for pdes, arXiv preprint arXiv:2410.02113 (2024)

work page arXiv 2024

[14] [14]

B. Shih, A. Peyvan, Z. Zhang, G. E. Karniadakis, Transformers as neural operators for solutions of differential equations with finite regularity, Computer Methods in Applied Mechanics and Engineering 434 (2025) 117560

work page 2025

[15] [15]

Cao, Choose a transformer: Fourier or galerkin, Advances in neural information processing systems 34 (2021) 24924–24940

S. Cao, Choose a transformer: Fourier or galerkin, Advances in neural information processing systems 34 (2021) 24924–24940

work page 2021

[16] [16]

L. Lu, X. Meng, S. Cai, Z. Mao, S. Goswami, Z. Zhang, G. E. Karni- adakis, A comprehensive and fair comparison of two neural operators (with practical extensions) based on fair data, Computer Methods in Applied Mechanics and Engineering 393 (2022) 114778

work page 2022

[17]

N. Kovachki, Z. Li, B. Liu, K. Azizzadenesheli, K. Bhattacharya, A. Stuart, A. Anandkumar, Neural operator: Learning maps between function spaces with applications to pdes, Journal of Machine Learning Research 24 (89) (2023) 1–97

[18]

K. Azizzadenesheli, N. Kovachki, Z. Li, M. Liu-Schiaffini, J. Kossaifi, A. Anandkumar, Neural operators for accelerating scientific simulations and design, Nature Reviews Physics 6 (5) (2024) 320–328

[19]

P. Jin, S. Meng, L. Lu, Mionet: Learning multiple-input operators via tensor product, SIAM Journal on Scientific Computing 44 (6) (2022) A3490–A3514

[20]

K. Shukla, V. Oommen, A. Peyvan, M. Penwarden, N. Plewacki, L. Bravo, A. Ghoshal, R. M. Kirby, G. E. Karniadakis, Deep neural operators as accurate surrogates for shape optimization, Engineering Applications of Artificial Intelligence 129 (2024) 107615

[21]

Z. Li, N. Kovachki, C. Choy, B. Li, J. Kossaifi, S. Otta, M. A. Nabian, M. Stadler, C. Hundt, K. Azizzadenesheli, et al., Geometry-informed neural operator for large-scale 3d pdes, Advances in Neural Information Processing Systems 36 (2023) 35836–35854

[22]

Z. Ye, Z. Liu, B. Wu, H. Jiang, L. Chen, M. Zhang, X. Huang, Q. M. Zou, H. Liu, B. Dong, et al., Pdeformer-2: A versatile foundation model for two-dimensional partial differential equations, arXiv preprint arXiv:2507.15409 (2025)

[23]

S. Subramanian, P. Harrington, K. Keutzer, W. Bhimji, D. Morozov, M. W. Mahoney, A. Gholami, Towards foundation models for scientific machine learning: Characterizing scaling and transfer behavior, Advances in Neural Information Processing Systems 36 (2023) 71242–71262

[24]

J. Sun, Y. Liu, Z. Zhang, H. Schaeffer, Towards a foundation model for partial differential equations: Multioperator learning and extrapolation, Physical Review E 111 (3) (2025) 035304

[25]

M. Herde, B. Raonic, T. Rohner, R. Käppeli, R. Molinaro, E. de Bézenac, S. Mishra, Poseidon: Efficient foundation models for pdes, Advances in Neural Information Processing Systems 37 (2024) 72525–72624

[26]

Y. Liu, J. Sun, X. He, G. Pinney, Z. Zhang, H. Schaeffer, Prose-fd: A multimodal pde foundation model for learning multiple operators for forecasting fluid dynamics, arXiv preprint arXiv:2409.09811 (2024)

[27]

R. Balestriero, M. Ibrahim, V. Sobal, A. Morcos, S. Shekhar, T. Goldstein, F. Bordes, A. Bardes, G. Mialon, Y. Tian, et al., A cookbook of self-supervised learning, arXiv preprint arXiv:2304.12210 (2023)

[28]

X. Yang, Z. Song, I. King, Z. Xu, A survey on deep semi-supervised learning, IEEE transactions on knowledge and data engineering 35 (9) (2022) 8934–8954

[29]

L. Yang, D. Zhang, G. E. Karniadakis, Physics-informed generative adversarial networks for stochastic differential equations, SIAM Journal on Scientific Computing 42 (1) (2020) A292–A317

[30]

R. M. Neal, Bayesian learning for neural networks, Vol. 118, Springer Science & Business Media, 2012

[31]

G. Pang, L. Yang, G. E. Karniadakis, Neural-net-induced gaussian process regression for function approximation and pde solution, Journal of Computational Physics 384 (2019) 270–288

[32]

T. Pearce, R. Tsuchida, M. Zaki, A. Brintrup, A. Neely, Expressive priors in bayesian neural networks: Kernel combinations and periodic functions, in: Uncertainty in artificial intelligence, PMLR, 2020, pp. 134–144

[33]

C. K. Williams, C. E. Rasmussen, Gaussian processes for machine learning, Vol. 2, MIT press Cambridge, MA, 2006

[34]

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, Advances in neural information processing systems 30 (2017)

[35]

K. Han, Y. Wang, H. Chen, X. Chen, J. Guo, Z. Liu, Y. Tang, A. Xiao, C. Xu, Y. Xu, et al., A survey on vision transformer, IEEE transactions on pattern analysis and machine intelligence 45 (1) (2022) 87–110

[36]

E. Kharazmi, D. Fan, Z. Wang, M. S. Triantafyllou, Inferring vortex induced vibrations of flexible cylinders using physics-informed neural networks, Journal of Fluids and Structures 107 (2021) 103367

[37]

J. Gehring, M. Auli, D. Grangier, D. Yarats, Y. N. Dauphin, Convolutional sequence to sequence learning, in: International conference on machine learning, PMLR, 2017, pp. 1243–1252