arxiv: 2509.00867 · v3 · pith:S32U7FZNnew · submitted 2025-08-31 · ⚛️ physics.comp-ph

Self-supervised neural operator for solving partial differential equations

Pith reviewed 2026-05-21 22:04 UTC · model grok-4.3

classification ⚛️ physics.comp-ph
keywords self-supervised neural operator · partial differential equations · physics-informed neural networks · neural operators · transformer · data generation · PDE solver · fluid dynamics

The pith

A self-supervised neural operator learns PDE solutions by generating its own training data without numerical solvers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that neural operators can solve partial differential equations in a self-supervised manner by creating accurate training examples during the learning process. This matters because conventional training depends on costly simulations that limit use in complex or changing systems. The approach uses a sampler grounded in physics-informed networks to produce varied data on the fly, then applies a function encoder and Transformer to map boundary conditions, sources, and geometries directly to solutions. Tests cover steady and unsteady one-dimensional reaction-diffusion cases, two-dimensional problems with changing shapes, and a fluid-structure interaction example involving cylinder vibrations. If the claim holds, it supports building general-purpose PDE models that require far less upfront computation.

Core claim

The self-supervised neural operator (SNO) generates accurate and diverse training data on the fly without numerical solvers. It consists of a physics-informed sampler based on Bayesian PINNs for efficient data generation, a function encoder for compact input-output representations, and an encoder-only Transformer for operator learning that maps boundary/initial conditions, source terms, and geometries to PDE solutions. SNO achieves high accuracy on 1D steady/unsteady nonlinear reaction-diffusion equations, a 2D nonlinear PDE with varying geometries, and vortex-induced vibration of a flexible cylinder in fluid dynamics, with lightweight finetuning of O(100) trainable variables further raising

What carries the argument

The self-supervised neural operator (SNO) that combines a Bayesian physics-informed sampler to create training data, a function encoder for representations, and a Transformer to learn the mapping from conditions to solutions.
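
To make the "function encoder" idea concrete, here is a minimal sketch of compressing a sampled function into a short coefficient vector. It assumes a fixed sine basis fitted by least squares, which stands in for whatever learned representation the paper actually uses; the names `sine_basis`, `encode`, and `decode` and the basis size are hypothetical and do not come from the paper.

```python
import numpy as np

def sine_basis(x, n_basis):
    """Evaluate the modes sin(k*pi*x), k = 1..n_basis, at the points x."""
    k = np.arange(1, n_basis + 1)
    return np.sin(np.pi * np.outer(x, k))  # shape (len(x), n_basis)

def encode(x, values, n_basis):
    """Least-squares coefficients of the sampled function in the basis."""
    coeffs, *_ = np.linalg.lstsq(sine_basis(x, n_basis), values, rcond=None)
    return coeffs

def decode(x, coeffs):
    """Reconstruct function values at x from a coefficient vector."""
    return sine_basis(x, len(coeffs)) @ coeffs

x = np.linspace(0.0, 1.0, 201)
u = np.sin(np.pi * x) + 0.3 * np.sin(3 * np.pi * x)  # toy function, zero at both ends
c = encode(x, u, n_basis=8)                          # 201 samples -> 8 numbers
u_rec = decode(x, c)
rel_err = np.linalg.norm(u_rec - u) / np.linalg.norm(u)
```

In a pipeline of this shape, the coefficient vectors for each input function (boundary data, source term, geometry descriptor) would be the tokens handed to the Transformer, and the output coefficients would be decoded back onto a grid.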

If this is right

  • High-accuracy solutions for nonlinear PDEs without precomputed datasets from solvers.
  • Effective handling of problems with varying geometries in two dimensions.
  • Modeling of fluid dynamics cases such as vortex-induced vibrations on flexible structures.
  • Further accuracy gains from lightweight finetuning using only a few hundred steps.
  • A route toward pretrained models that act as efficient surrogates for PDE solving.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could lower computational barriers for PDE work in engineering where generating large training sets is impractical.
  • It may scale to time-dependent three-dimensional problems if the sampler continues to produce suitable data.
  • Hybrid systems could combine SNO approximations with occasional full simulations to balance speed and precision.

Load-bearing premise

The physics-informed sampler can produce accurate and diverse enough training data on its own to let the encoder and Transformer learn a reliable operator without any external high-fidelity simulations.
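
One simple way to obtain solver-free training pairs is sketched below. It is an illustration of the premise, not the paper's PI-sampler: it draws a candidate solution from a random one-hidden-layer tanh network and then derives the source term that makes it an exact solution. The equation form `D*u'' - k*u**3 = f`, the constants `D` and `k`, and the function name `sample_pair` are all assumptions made for this example.

```python
import numpy as np

def sample_pair(x, rng, width=20, D=0.01, k=0.7):
    """Draw u from a random tanh network, then compute the source term f
    that makes u satisfy D*u'' - k*u**3 = f exactly (assumed equation)."""
    w = rng.normal(size=width)                    # hidden weights
    b = rng.normal(size=width)                    # hidden biases
    a = rng.normal(size=width) / np.sqrt(width)   # output weights
    t = np.tanh(np.outer(x, w) + b)               # activations, (len(x), width)
    u = t @ a
    # d^2/dx^2 tanh(w*x + b) = -2 * w**2 * tanh * (1 - tanh**2)
    u_xx = (-2.0 * w**2 * t * (1.0 - t**2)) @ a
    f = D * u_xx - k * u**3
    boundary = (u[0], u[-1])                      # Dirichlet data implied by u
    return f, boundary, u

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 101)
pairs = [sample_pair(x, rng) for _ in range(4)]   # four (f, boundary, u) triples
```

Pairs built this way are exact by construction, so the open question shifts from accuracy to diversity: whether the distribution of sampled functions covers the inputs the operator will see at test time.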

What would settle it

Comparing SNO predictions against results from a standard numerical solver on a new nonlinear PDE problem and finding errors that remain large even after additional training steps.
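
A check of this kind could look like the sketch below. It assumes a 1D steady equation of the form `D*u'' - k*u**3 = f` with zero Dirichlet boundaries, which is a stand-in chosen for illustration and not necessarily the paper's test problem; `solve_reference` and `relative_l2` are hypothetical helper names. The reference comes from Newton's method on a second-order finite-difference discretisation, and any operator prediction on the same grid can be scored against it with the relative L2 error.

```python
import numpy as np

def solve_reference(f, D=0.01, k=0.7, tol=1e-10, max_iter=50):
    """Newton's method on a finite-difference discretisation of
    D*u'' - k*u**3 = f on [0, 1] with u(0) = u(1) = 0.
    f holds the source term at the interior grid points."""
    n = len(f)
    h = 1.0 / (n + 1)
    # Discrete Laplacian scaled by D (dense for brevity; fine for small n)
    lap = D / h**2 * (np.diag(-2.0 * np.ones(n))
                      + np.diag(np.ones(n - 1), 1)
                      + np.diag(np.ones(n - 1), -1))
    u = np.zeros(n)
    for _ in range(max_iter):
        residual = lap @ u - k * u**3 - f
        if np.max(np.abs(residual)) < tol:
            break
        jac = lap - np.diag(3.0 * k * u**2)
        u = u - np.linalg.solve(jac, residual)
    return u

def relative_l2(pred, ref):
    """Relative L2 error of pred against ref on a shared grid."""
    return np.linalg.norm(pred - ref) / np.linalg.norm(ref)

# Sanity check of the reference solver with a manufactured solution.
n = 199
x = np.linspace(0.0, 1.0, n + 2)[1:-1]
u_exact = np.sin(np.pi * x)
f = -0.01 * np.pi**2 * u_exact - 0.7 * u_exact**3
u_ref = solve_reference(f)
err = relative_l2(u_ref, u_exact)
```

Validating the solver on a manufactured solution first matters here: a large `relative_l2` between the operator and the reference is only informative once the reference itself is known to be accurate on the grid in use.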

Figures

Figures reproduced from arXiv: 2509.00867 by Shaoqian Zhou, Wen You, Xuhui Meng.

Figure 1: Schematic of self-supervised neural operator, which consists of three parts: (I) …
Figure 2: Schematic of Bayesian physics-informed neural networks (B-PINNs), in which …
Figure 3: Schematic of physics-informed sampler (PI-sampler), where the Bayesian neural …
Figure 4: Schematic of function encoder (FE), in which (1) the …
Figure 5: Schematic of the encoder-only Transformer for operator learning, in which …
Figure 6: SNO for steady nonlinear reaction-diffusion equation: predictions of …
Figure 7: SNO for time-dependent reaction-diffusion equation: (a) In-distribution testing …
Figure 8: SNO for time-dependent reaction-diffusion equation: Predicted …
Figure 9: SNO for nonlinear elliptic partial differential equation: (a) In-distribution …
Figure 10: SNO for nonlinear elliptic partial differential equation: geometries unseen …
Figure 10: Here we employ the same testing data for …
Figure 11: Schematic of the vortex-induced-vibration of a flexible cylinder.
Figure 12: SNO for VIV problem: Predicted η from SNO. The time domains at the pretraining and testing stages are t ∈ [0, 1] and t ∈ [0, 10], respectively. Colored background and solid line: reference solutions; red dashed line: predictions from SNO. In this specific case, the training data are generated using the same BNNs as in Sec. 3.2. After the pretraining of SNO, we assume that the solution for η in the new test ca…
Original abstract

Neural operators (NOs) provide a new paradigm for efficiently solving partial differential equations (PDEs), but their training depends on costly high-fidelity data from numerical solvers, limiting applications in complex systems. We propose a self-supervised neural operator (SNO) that generates accurate and diverse training data on the fly without numerical solvers. SNO consists of three parts: a physics-informed sampler (PI-sampler) based on Bayesian PINNs for efficient data generation, a function encoder (FE) for compact input-output representations, and an encoder-only Transformer for operator learning, mapping boundary/initial conditions, source terms, and geometries to PDE solutions. We validate SNO on 1D steady/unsteady nonlinear reaction-diffusion equations, a 2D nonlinear PDE with varying geometries, and vortex-induced vibration of a flexible cylinder in fluid dynamics. SNO achieves high accuracy in all cases, and lightweight finetuning (O(100) trainable variables) further improves predictions with only a few hundred steps. This work provides a new route toward pretrained foundation models as efficient PDE surrogates.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a self-supervised neural operator (SNO) for PDEs that generates its own training data without external numerical solvers. The architecture comprises a physics-informed sampler (PI-sampler) based on Bayesian PINNs, a function encoder (FE) for compact input-output representations, and an encoder-only Transformer that maps boundary/initial conditions, source terms, and geometries to solutions. The method is demonstrated on 1D steady and unsteady nonlinear reaction-diffusion equations, a 2D nonlinear PDE with varying geometries, and vortex-induced vibration of a flexible cylinder, with additional claims that lightweight finetuning (O(100) parameters) further improves accuracy in a few hundred steps.

Significance. If the central claims are substantiated, the work could meaningfully reduce the data-generation bottleneck for neural operators and support development of pretrained PDE foundation models. The combination of Bayesian PINN sampling with Transformer-based operator learning is a coherent extension of existing PINN and neural-operator literature, and the breadth of test problems (including a fluid-structure interaction case) is appropriate for the claim.

major comments (2)
  1. [Abstract] Abstract and validation sections: The central claim that the Bayesian-PINN-based PI-sampler produces training pairs that are both accurate and sufficiently diverse for the FE+Transformer to learn reliable operators without any high-fidelity external data is not accompanied by quantitative support. No L2-error tables, posterior diagnostics, or side-by-side comparisons against finite-difference or finite-element references are referenced for the nonlinear reaction-diffusion or 2D geometry-varying cases.
  2. [Method] Method description of the PI-sampler: The assertion that the physics-informed sampler 'efficiently produce[s] accurate and diverse training data on the fly' is load-bearing for the self-supervised route, yet no convergence metrics, mode-capture diagnostics, or diversity measures (e.g., coverage of the solution manifold) are provided to show that the residual-loss optimization succeeds for the nonlinear problems where Bayesian PINNs are known to be sensitive to initialization and multimodality.
minor comments (2)
  1. [Abstract] The phrase 'lightweight finetuning (O(100) trainable variables)' would benefit from an explicit statement of which subset of parameters is updated and how the O(100) count is obtained.
  2. [Method] Notation for the function encoder (FE) and its interface with the Transformer could be introduced with a small diagram or equation block to clarify the dimensionality reduction step.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for recognizing the potential of the self-supervised approach. We address each major comment below and have revised the manuscript to incorporate additional quantitative validation where the original submission was lacking.

  1. Referee: [Abstract] Abstract and validation sections: The central claim that the Bayesian-PINN-based PI-sampler produces training pairs that are both accurate and sufficiently diverse for the FE+Transformer to learn reliable operators without any high-fidelity external data is not accompanied by quantitative support. No L2-error tables, posterior diagnostics, or side-by-side comparisons against finite-difference or finite-element references are referenced for the nonlinear reaction-diffusion or 2D geometry-varying cases.

    Authors: We agree that the original manuscript would benefit from explicit quantitative support for the accuracy and diversity of the PI-sampler outputs. In the revised version we have added L2-error tables that directly compare Bayesian-PINN-generated solutions against independent finite-difference and finite-element references for both the 1D nonlinear reaction-diffusion problems and the 2D geometry-varying cases. We have also included posterior diagnostics (Gelman-Rubin statistics, effective sample sizes) and visualizations that quantify the diversity of the sampled training pairs. revision: yes

  2. Referee: [Method] Method description of the PI-sampler: The assertion that the physics-informed sampler 'efficiently produce[s] accurate and diverse training data on the fly' is load-bearing for the self-supervised route, yet no convergence metrics, mode-capture diagnostics, or diversity measures (e.g., coverage of the solution manifold) are provided to show that the residual-loss optimization succeeds for the nonlinear problems where Bayesian PINNs are known to be sensitive to initialization and multimodality.

    Authors: We acknowledge that the known sensitivities of Bayesian PINNs to initialization and multimodality require explicit supporting evidence. The revised manuscript now contains convergence plots of the residual loss, results from multiple independent chains demonstrating mode capture, and quantitative diversity metrics (pairwise solution distances and manifold coverage statistics) for the nonlinear test problems. These additions substantiate that the residual-loss optimization produces sufficiently accurate and diverse training data for the downstream operator learner. revision: yes
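
The diagnostics named in the two responses above can be sketched in a few lines. The example below is a generic illustration, not the authors' code: `gelman_rubin` computes the classic potential scale reduction factor for one scalar quantity across several chains, and `mean_pairwise_distance` is one possible diversity measure for a set of sampled solutions. Both function names and the synthetic chains are assumptions made for this example.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for one scalar quantity.
    chains: array of shape (m, n), m chains with n draws each."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()         # W
    between_over_n = chains.mean(axis=1).var(ddof=1)   # B / n
    pooled = (n - 1) / n * within + between_over_n
    return np.sqrt(pooled / within)

def mean_pairwise_distance(solutions):
    """Average L2 distance over all distinct pairs of sampled solutions.
    solutions: array of shape (n_samples, n_grid), n_samples >= 2."""
    s = np.asarray(solutions, dtype=float)
    diffs = s[:, None, :] - s[None, :, :]              # (n, n, n_grid)
    dists = np.linalg.norm(diffs, axis=-1)             # symmetric, zero diagonal
    n = len(s)
    return dists.sum() / (n * (n - 1))

rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 2000))                     # four chains, same target
# Shift two chains to mimic sampling stuck in a second mode.
stuck = mixed + np.array([[0.0], [0.0], [3.0], [3.0]])
r_mixed = gelman_rubin(mixed)   # close to 1 when chains agree
r_stuck = gelman_rubin(stuck)   # well above 1 when chains disagree
```

A value of the first statistic near 1 indicates only that the chains agree with each other, so it is best read together with a diversity measure: chains that all collapse onto the same narrow mode would pass the first check and fail the second.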

Circularity Check

0 steps flagged

No significant circularity; derivation chain is self-contained

The paper's central method chains a Bayesian-PINN-based PI-sampler to generate training pairs on the fly, followed by a function encoder and encoder-only Transformer to learn the operator mapping. Validation is asserted on external benchmark PDEs (1D reaction-diffusion, 2D variable-geometry, and vortex-induced vibration) with claimed high accuracy and optional lightweight finetuning. No quoted equation or step reduces a prediction to a fitted parameter by construction, nor does any load-bearing premise collapse to a self-citation whose content is itself unverified within the paper. The approach therefore remains independent of its own outputs and relies on external physics constraints and standard neural-operator components.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The work introduces new architectural components for self-supervised learning but rests on standard assumptions about neural network approximation of PDE solutions and Bayesian inference in PINNs. No explicit free parameters are named in the abstract.

axioms (1)
  • domain assumption: Neural networks can approximate solutions to partial differential equations when trained with physics constraints.
    Core premise underlying both PINNs and neural operators, invoked for the sampler and operator learning.
invented entities (2)
  • Physics-informed sampler (PI-sampler) · no independent evidence
    purpose: Generate accurate and diverse training data on the fly using Bayesian PINNs.
    New component proposed to replace external numerical solvers.
  • Function encoder (FE) · no independent evidence
    purpose: Provide compact input-output representations for the operator.
    Architectural element introduced as part of SNO.
