Kernel Neural Operators (KNOs) for Scalable, Memory-efficient, Geometrically-flexible Operator Learning

Akil Narayan; John D. Jakeman; John Turnage; Matthew Lowery; Shandian Zhe; Varun Shankar; Zachary Morrow

arXiv: 2407.00809 · v3 · submitted 2024-06-30 · 💻 cs.LG · cs.NA · math.NA


Pith reviewed 2026-05-23 23:07 UTC · model grok-4.3

classification: 💻 cs.LG · cs.NA · math.NA

keywords: kernel neural operator, operator learning, universal approximation, integral operators, irregular domains, neural kernels, function space approximation, machine learning


The pith

The Kernel Neural Operator learns maps between function spaces using compositions of kernel integral operators; the architecture is proven to be a universal approximator and typically requires an order of magnitude fewer trainable parameters than popular neural operators.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents the Kernel Neural Operator as an architecture for approximating operators that map functions to functions by composing kernel-based integral operators. The key step is decoupling the choice of kernel from the numerical integration scheme, which permits trainable kernels on irregular domains using appropriate quadrature rules and supports highly expressive neural anisotropic kernels. The authors prove universal approximation theorems for both the continuous and fully discretized versions of the KNO. Experiments on standard benchmarks show training and test accuracy that is comparable to or higher than popular neural operators while using roughly ten times fewer trainable parameters, with the more expressive kernels contributing to the accuracy gains.

Core claim

The Kernel Neural Operator approximates operators by composing deep kernel integral operators, with the key innovation being the decoupling of kernel choice from the quadrature scheme used for numerical integration. This allows explicit specification of trainable kernels, including non-stationary neural anisotropic kernels, and domain-specific integration on irregular geometries. Universal approximation theorems are proven for both the continuous and fully discretized KNO, and numerical experiments confirm competitive or superior performance on operator learning benchmarks with an order of magnitude reduction in the number of parameters.
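Because the KNO specifies its kernels explicitly, each kernel is simply a closed-form function with a few trainable parameters. As a sketch, here are textbook forms of two kernel families that the paper's ablation notes mention: a compactly supported Wendland $C^2$ kernel with support radius `rho`, and a one-dimensional spectral mixture kernel in the style of Wilson and Adams. The function names and parameterizations are illustrative assumptions, not the paper's code.

```python
import math


def wendland_c2(x, y, rho):
    """Wendland C^2 kernel phi(r) = (1 - r)_+^4 (4r + 1), with r = |x - y| / rho.

    Positive definite for inputs of dimension <= 3. The support radius rho would
    be a trainable parameter; the kernel is exactly zero once |x - y| >= rho.
    """
    r = math.dist(x, y) / rho
    return (1.0 - r) ** 4 * (4.0 * r + 1.0) if r < 1.0 else 0.0


def spectral_mixture_1d(x, y, weights, means, variances):
    """1D spectral mixture kernel, a stationary function of tau = x - y:
    sum_q w_q * exp(-2 pi^2 tau^2 v_q) * cos(2 pi mu_q tau).
    The weights w_q, frequencies mu_q and variances v_q would be trainable.
    """
    tau = x - y
    return sum(
        w * math.exp(-2.0 * math.pi ** 2 * tau ** 2 * v) * math.cos(2.0 * math.pi * mu * tau)
        for w, mu, v in zip(weights, means, variances)
    )


print(wendland_c2((0.0, 0.0), (0.5, 0.0), rho=1.0))  # r = 0.5 -> 0.1875
print(wendland_c2((0.0, 0.0), (1.0, 0.0), rho=1.0))  # r = 1.0 -> 0.0 (outside the support)
print(spectral_mixture_1d(0.2, 0.2, [0.5, 1.5], [1.0, 3.0], [0.1, 0.2]))  # tau = 0 -> 2.0
```

The compact support of the Wendland kernel makes the quadrature sum sparse in practice, while the spectral mixture kernel is globally supported.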

What carries the argument

Compositions of deep kernel-based integral operators, with kernels specified independently of the quadrature rule for numerical evaluation.
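In discretized form, each kernel integral operator becomes a weighted sum over quadrature nodes, $\int_\Omega K(x,y)\,f(y)\,d\mu(y) \approx \sum_{i=1}^{N_Q} w_i\,K(x, y_i)\,f(y_i)$, so the kernel and the quadrature rule enter as separate arguments. The sketch below assumes a 1D Gaussian kernel and compares two interchangeable rules: a composite midpoint rule, and a 3-point Gauss-Legendre rule mapped affinely from $[-1, 1]$ to $[0, 1]$. All function names are hypothetical.

```python
import math


def kernel_integral(kernel, nodes, weights, f, x):
    """Quadrature approximation of ∫ K(x, y) f(y) dμ(y) ≈ sum_i w_i K(x, y_i) f(y_i).

    `kernel` and the rule (`nodes`, `weights`) are independent inputs, so either
    can be swapped without touching the other.
    """
    return sum(w * kernel(x, y) * f(y) for y, w in zip(nodes, weights))


def midpoint_rule(a, b, n):
    """Composite midpoint rule on [a, b] with n equal cells."""
    h = (b - a) / n
    return [a + (i + 0.5) * h for i in range(n)], [h] * n


def gauss_legendre_3(a, b):
    """3-point Gauss-Legendre rule on [-1, 1], mapped affinely to [a, b]."""
    ref_nodes = [-math.sqrt(0.6), 0.0, math.sqrt(0.6)]
    ref_weights = [5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0]
    half, mid = 0.5 * (b - a), 0.5 * (a + b)
    return [mid + half * t for t in ref_nodes], [half * w for w in ref_weights]


def gaussian(x, y, eps=1.0):
    return math.exp(-(eps * (x - y)) ** 2)


def one(y):
    return 1.0


exact = math.sqrt(math.pi) * math.erf(0.5)  # closed form of ∫_0^1 exp(-(0.5 - y)^2) dy
for name, (nodes, weights) in [("midpoint, 64 cells", midpoint_rule(0.0, 1.0, 64)),
                               ("Gauss-Legendre, 3 nodes", gauss_legendre_3(0.0, 1.0))]:
    approx = kernel_integral(gaussian, nodes, weights, one, 0.5)
    print(f"{name}: {approx:.6f}  (error {abs(approx - exact):.1e})")
```

Because the result is a function of `x`, the same nodes and weights can be reused to evaluate the output at arbitrary points, which is what makes zero-shot super-resolution (Figure 5) possible in principle.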

If this is right

KNOs apply directly to irregular domains using domain-specific quadrature without losing approximation guarantees.
Dimension-wise factorization reduces the effect of the curse of dimensionality on regular domains.
Neural anisotropic kernels increase expressivity and help attain higher accuracy.
Memory and parameter requirements drop by roughly an order of magnitude while accuracy remains competitive.
The architecture keeps the implementation simplicity of traditional kernel methods.
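The paper's exact neural kernel is not described on this page, so the following is only a hypothetical sketch of what "non-stationary, neural anisotropic" can mean: a Gaussian whose per-dimension length scales $\ell_d(x)$ are produced by a small network evaluated at the output point $x$. The network weights are made-up placeholders, not trained values. The resulting kernel need not be symmetric, which a general integral-operator kernel does not require.

```python
import math


def softplus(z):
    """Numerically stable log(1 + e^z); always positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def tiny_mlp(x, W1, b1, W2, b2):
    """One tanh hidden layer mapping a point x in R^d to d raw outputs."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)]


def neural_anisotropic_kernel(x, y, params, floor=1e-3):
    """K(x, y) = exp(-sum_d ((x_d - y_d) / ell_d(x))^2), ell(x) = softplus(MLP(x)) + floor.

    Length scales vary with x (non-stationary) and per dimension (anisotropic).
    """
    ell = [softplus(z) + floor for z in tiny_mlp(x, *params)]
    return math.exp(-sum(((xd - yd) / ld) ** 2 for xd, yd, ld in zip(x, y, ell)))


# Placeholder weights for a hypothetical 2 -> 4 -> 2 network (not trained values).
PARAMS = (
    [[0.5, -0.3], [0.1, 0.8], [-0.7, 0.2], [0.3, 0.3]],  # W1: 4 x 2
    [0.0, 0.1, -0.1, 0.0],                               # b1
    [[0.4, -0.2, 0.1, 0.3], [-0.1, 0.5, 0.2, -0.4]],     # W2: 2 x 4
    [0.0, 0.0],                                          # b2
)

p, q = (0.2, 0.4), (0.3, 0.1)
# K(p, q) and K(q, p) generally differ, because the length scales depend on the first argument.
print(neural_anisotropic_kernel(p, q, PARAMS), neural_anisotropic_kernel(q, p, PARAMS))
```

In training, the entries of `PARAMS` would be learned jointly with the rest of the operator; the `floor` keeps every length scale strictly positive.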

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The explicit kernel form could allow direct insertion of domain-specific physical structure into the operator approximator.
Lower memory use might enable operator learning on larger spatial domains or with limited hardware.
The separation of kernel and quadrature opens a route to hybrid methods that combine kernel transparency with deep learning flexibility.
Further tests on complex three-dimensional irregular geometries would clarify how far the geometric flexibility extends.

Load-bearing premise

That separating the kernel specification from the numerical quadrature rule preserves both the universal approximation property and numerical convergence for operator learning on irregular geometries.

What would settle it

A numerical test on an irregular domain where a KNO with an explicitly chosen kernel and matching quadrature rule fails to improve in accuracy as the discretization is refined, or where the claimed order-of-magnitude parameter reduction does not hold against Fourier neural operators on a held-out benchmark.
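The first half of this test is a refinement study. As a minimal sketch, assuming a fixed 1D Gaussian kernel on $[-1, 1]$ and the composite midpoint rule (not the paper's benchmarks or quadrature), the quadrature error for a fixed smooth kernel should fall at the rule's theoretical rate as nodes are added. For a second-order rule, halving the cell size should divide the error by about 4.

```python
import math


def exact_gaussian_integral(x, eps):
    """Closed form of ∫_{-1}^{1} exp(-eps^2 (x - y)^2) dy."""
    return math.sqrt(math.pi) / (2 * eps) * (math.erf(eps * (1 - x)) + math.erf(eps * (1 + x)))


def midpoint_integral(x, eps, n):
    """Composite midpoint approximation of the same integral with n cells."""
    h = 2.0 / n
    return h * sum(math.exp(-(eps * (x - (-1.0 + (i + 0.5) * h))) ** 2) for i in range(n))


x, eps = 0.3, 2.0
errors = [abs(midpoint_integral(x, eps, n) - exact_gaussian_integral(x, eps)) for n in (8, 16, 32, 64)]
ratios = [e0 / e1 for e0, e1 in zip(errors, errors[1:])]
print(errors)
print(ratios)  # a second-order rule should give ratios close to 4
```

A KNO whose error stalls under the same kind of refinement on an irregular domain would be the failure mode described above.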

Figures

Figures reproduced from arXiv: 2407.00809 by Akil Narayan, John D. Jakeman, John Turnage, Matthew Lowery, Shandian Zhe, Varun Shankar, Zachary Morrow.

**Figure 2.** Clustered quadrature points on $[0,1]^2$ (left) and a reference triangle (right). Consider the discretization of an integral operator $\int_\Omega K(x,y)\,f(y)\,d\mu(y)$ that acts on a scalar-valued function $f:\mathbb{R}^d\to\mathbb{R}$; the generalization to vector-valued functions is straightforward. Then given a quadrature rule $\{w_i^q, y_i^q\}_{i=1}^{N_Q}$, where $w_i^q\in\mathbb{R}$ are quadrature weights and $y_i^q\in\mathbb{R}^d$ are quadrature points, the qua…

**Figure 3.** Solutions of the Navier-Stokes problem 3.1.3 on a test example. We show the initial …

**Figure 4.** Solutions of the Darcy (triangular-notch) problem 3.2.2. We show two input functions …

**Figure 5.** An illustration of zero-shot super-resolution. The KNO was trained on the Darcy (PWC) …

**Figure 6.** Eigenvalues of the neural tangent kernel (NTK) for three choices of kernels: (1) Gaussian …

**Figure 7.** Ablation study for Burgers’ equation. On the left, the number of trainable kernels …

**Figure 8.** On the right is a quadrature rule for the Darcy (triangular-notch) problem, created …
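Figure 2 shows quadrature points on a reference triangle, and the paper's appendix notes describe meshing each domain into triangles (for example, eight triangles on $[0,1]^2$ for Darcy (PWC), or a five-triangle Delaunay mesh for the triangular notch in Figure 8). A standard way to build such a composite rule is to map a reference-triangle rule onto each mesh triangle with an affine map and scale the weights by the Jacobian determinant. The sketch assumes a simple 3-point, degree-2 rule on the reference triangle with vertices (0, 0), (1, 0), (0, 1), rather than the clustered rules the paper uses, and checks it on two triangles covering the unit square.

```python
# Degree-2 rule on the reference triangle (0,0), (1,0), (0,1): three interior
# points, each with weight 1/6, so the weights sum to the reference area 1/2.
REF_POINTS = [(1 / 6, 1 / 6), (2 / 3, 1 / 6), (1 / 6, 2 / 3)]
REF_WEIGHTS = [1 / 6, 1 / 6, 1 / 6]


def map_rule_to_triangle(v0, v1, v2):
    """Map the reference rule to triangle (v0, v1, v2) via
    x = v0 + xi * (v1 - v0) + eta * (v2 - v0); weights scale by |det J|."""
    e1 = (v1[0] - v0[0], v1[1] - v0[1])
    e2 = (v2[0] - v0[0], v2[1] - v0[1])
    det_j = abs(e1[0] * e2[1] - e1[1] * e2[0])
    points = [(v0[0] + xi * e1[0] + eta * e2[0], v0[1] + xi * e1[1] + eta * e2[1])
              for xi, eta in REF_POINTS]
    return points, [w * det_j for w in REF_WEIGHTS]


def composite_rule(triangles):
    """Concatenate the mapped rules over a triangulation of the domain."""
    points, weights = [], []
    for tri in triangles:
        p, w = map_rule_to_triangle(*tri)
        points += p
        weights += w
    return points, weights


# Unit square split into two triangles along its diagonal.
square = [((0, 0), (1, 0), (1, 1)), ((0, 0), (1, 1), (0, 1))]
pts, wts = composite_rule(square)
# The exact value of ∫∫ xy over [0,1]^2 is 1/4; a degree-2 rule reproduces it up to rounding.
print(sum(w * x * y for (x, y), w in zip(pts, wts)))
```

The resulting `pts` and `wts` can be passed to any kernel without modification, which is the practical meaning of decoupling the kernel from the quadrature rule.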

Original abstract

This paper introduces the Kernel Neural Operator (KNO), a provably convergent operator-learning architecture that utilizes compositions of deep kernel-based integral operators for function-space approximation of operators (maps from functions to functions). The KNO decouples the choice of kernel from the numerical integration scheme (quadrature), thereby naturally allowing for operator learning with explicitly-chosen trainable kernels on irregular geometries. On irregular domains, this allows the KNO to utilize domain-specific quadrature rules. To help ameliorate the curse of dimensionality, we also leverage an efficient dimension-wise factorization algorithm on regular domains. More importantly, the ability to explicitly specify kernels also allows the use of highly expressive, non-stationary, neural anisotropic kernels whose parameters are computed by training neural networks. We present universal approximation theorems showing that both the continuous and fully discretized KNO are universal approximators on operator learning problems. Numerical results demonstrate that on existing benchmarks the training and test accuracy of KNOs is closely comparable to or higher than that of popular neural operators while typically using an order of magnitude fewer trainable parameters, with the more expressive kernels proving important to attaining high accuracy. KNOs thus facilitate low-memory, geometrically-flexible, deep operator learning, while retaining the implementation simplicity and transparency of traditional kernel methods from both scientific computing and machine learning.
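The abstract does not spell out the dimension-wise factorization algorithm. A common version, assumed here purely for illustration, applies a separable kernel $K(x,y)=K_1(x_1,y_1)\,K_2(x_2,y_2)$ on a tensor-product quadrature grid one dimension at a time. With $n$ nodes and outputs per dimension in $d$ dimensions, evaluating all outputs directly costs $O(n^{2d})$ operations, while contracting one dimension at a time costs $O(d\,n^{d+1})$. The sketch checks that both orderings give the same numbers in 2D; the function names are hypothetical.

```python
import math


def direct_apply(k1, k2, nodes1, w1, nodes2, w2, F, x1, x2):
    """Full 2D quadrature at one output point:
    sum_{j,k} w1_j w2_k k1(x1, y1_j) k2(x2, y2_k) F[j][k]."""
    return sum(w1[j] * w2[k] * k1(x1, nodes1[j]) * k2(x2, nodes2[k]) * F[j][k]
               for j in range(len(nodes1)) for k in range(len(nodes2)))


def factorized_apply(k1, k2, nodes1, w1, nodes2, w2, F, xs1, xs2):
    """Same sums for every output point on the grid xs1 x xs2, one dimension at a time."""
    # Step 1: contract dimension 2 -> G[j][b] = sum_k w2_k k2(xs2[b], y2_k) F[j][k]
    G = [[sum(w2[k] * k2(x2, nodes2[k]) * F[j][k] for k in range(len(nodes2))) for x2 in xs2]
         for j in range(len(nodes1))]
    # Step 2: contract dimension 1 -> out[a][b] = sum_j w1_j k1(xs1[a], y1_j) G[j][b]
    return [[sum(w1[j] * k1(x1, nodes1[j]) * G[j][b] for j in range(len(nodes1)))
             for b in range(len(xs2))] for x1 in xs1]


def k(x, y):
    return math.exp(-(x - y) ** 2)


n = 6
nodes = [(i + 0.5) / n for i in range(n)]  # midpoint nodes on [0, 1]
weights = [1.0 / n] * n
F = [[math.sin(3 * a) * math.cos(2 * b) for b in nodes] for a in nodes]  # input samples
out = factorized_apply(k, k, nodes, weights, nodes, weights, F, nodes, nodes)
print(abs(out[2][4] - direct_apply(k, k, nodes, weights, nodes, weights, F, nodes[2], nodes[4])))
```

Only separable kernels admit this ordering; a fully non-separable kernel would need the direct double sum.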

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

KNOs decouple trainable non-stationary kernels from quadrature for operator learning on irregular domains and claim universal approximation theorems (UATs) plus roughly 10x parameter savings, but the discretized theorem may not automatically survive data-dependent kernel variation.


The main point is that this work defines Kernel Neural Operators as compositions of kernel integral operators where the kernel itself is generated by a neural network and can be non-stationary and anisotropic, while the integration rule is chosen separately. That separation is presented as the route to using domain-specific quadrature on irregular geometries without losing the universal approximation property. The paper also reports that the resulting models match or exceed standard neural operators on benchmarks while using roughly an order of magnitude fewer parameters, with the more expressive kernels mattering for the accuracy numbers.

Referee Report

1 major / 2 minor

Summary. The paper introduces Kernel Neural Operators (KNOs) as compositions of deep kernel-based integral operators for learning maps between function spaces. It decouples kernel selection from the quadrature rule to support irregular geometries via domain-specific integration and to permit expressive non-stationary kernels whose parameters are outputs of neural networks. Universal approximation theorems are asserted for both the continuous KNO and its fully discretized version; numerical experiments on standard benchmarks are reported to achieve accuracy comparable to or exceeding popular neural operators while using roughly an order of magnitude fewer trainable parameters.

Significance. Should the universal approximation results hold for trainable non-stationary kernels and the numerical comparisons prove robust, the work would supply a geometrically flexible, low-memory operator-learning framework that retains the transparency of classical kernel methods while adding deep composition and neural parameterization. The explicit separation of kernel and quadrature is a potentially useful design principle if the accompanying convergence theory is complete.

major comments (1)

[Universal approximation theorems for the discretized KNO] The argument that decoupling the kernel from quadrature automatically preserves the universal approximation theorem (UAT) and convergence on irregular domains does not address the case in which the kernel is non-stationary and its parameters are neural-network outputs. Standard quadrature error bounds rely on uniform smoothness or Lipschitz constants of the integrand; when these constants become data-dependent and potentially unbounded across layers, the discretization error may accumulate and prevent density in the operator space. A revised statement, or an additional assumption controlling the variation of the learned kernel, is required for the claim to be load-bearing.
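A minimal numerical illustration of this concern, under stated assumptions: a 1D Gaussian with shape parameter `eps` stands in for a learned kernel, and a fixed 16-cell midpoint rule on $[-1, 1]$ stands in for the quadrature. The kernel's Lipschitz constant is $\sqrt{2/e}\,\varepsilon$, so as `eps` grows the same rule loses accuracy. This is not an analysis from the paper or the report.

```python
import math


def exact(eps):
    """Closed form of ∫_{-1}^{1} exp(-eps^2 y^2) dy = sqrt(pi) * erf(eps) / eps."""
    return math.sqrt(math.pi) * math.erf(eps) / eps


def midpoint(eps, n=16):
    """Fixed composite midpoint rule with n cells on [-1, 1]."""
    h = 2.0 / n
    return h * sum(math.exp(-(eps * (-1.0 + (i + 0.5) * h)) ** 2) for i in range(n))


for eps in (1.0, 5.0, 20.0):
    rel_err = abs(midpoint(eps) - exact(eps)) / exact(eps)
    # Lipschitz constant of y -> exp(-eps^2 y^2) is sqrt(2/e) * eps
    print(f"eps={eps:5.1f}  Lipschitz={math.sqrt(2 / math.e) * eps:6.2f}  rel. error={rel_err:.1e}")
```

Once the kernel becomes narrower than the node spacing, the fixed rule can no longer resolve it, which is the kind of unbounded, data-dependent error the comment asks the theory to rule out.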

minor comments (2)

[Numerical results] Claims of superior accuracy with fewer parameters are presented without error bars, explicit baseline implementation details, or data-exclusion criteria, which makes it difficult to evaluate the statistical reliability of the reported gains.
[Abstract] In the abstract and introduction, the phrase 'an order of magnitude fewer trainable parameters' should be tied to a specific table or figure for immediate verification.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their detailed and constructive feedback on the universal approximation results. The concern regarding error control for non-stationary, neural-parameterized kernels in the discretized setting is well-taken, and we address it directly below.


Referee: the argument that decoupling the kernel from quadrature automatically preserves the UAT and convergence on irregular domains does not address the case in which the kernel is non-stationary and its parameters are neural-network outputs. Standard quadrature error bounds rely on uniform smoothness or Lipschitz constants of the integrand; when these constants become data-dependent and potentially unbounded across layers, the discretization error may accumulate and prevent density in the operator space. A revised statement or additional assumption controlling the variation of the learned kernel is required for the claim to be load-bearing.

Authors: We agree that the current proof sketch for the discretized KNO does not explicitly control the data-dependent Lipschitz constants arising from neural-network outputs for the kernel parameters. The decoupling argument in the manuscript establishes that any fixed continuous kernel can be discretized consistently via domain-specific quadrature, but it does not yet address uniform bounds when the kernel varies across layers and inputs. In the revision we will add an explicit assumption that the neural networks generating kernel parameters produce outputs whose Lipschitz constants are uniformly bounded (e.g., via weight constraints or output clipping), which restores the standard quadrature error estimates. We will also revise the statement of the discretized UAT to include this assumption and supply a short appendix deriving the accumulated discretization error under the new hypothesis. revision: yes
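One way to realize the proposed "output clipping" is sketched below, under hypothetical assumptions: smooth sigmoid squashing stands in for hard clipping, and a Gaussian shape parameter stands in for the kernel parameters. Mapping any raw network output into a fixed interval $[\varepsilon_{\min}, \varepsilon_{\max}]$ gives a uniform Lipschitz bound $\sqrt{2/e}\,\varepsilon_{\max}$ for the kernel. The names and bounds are illustrative, not the authors' revision.

```python
import math


def stable_sigmoid(z):
    """Logistic function that avoids overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def bounded_shape(z, eps_min=0.5, eps_max=10.0):
    """Squash a raw network output z into [eps_min, eps_max] (hypothetical bounds)."""
    return eps_min + (eps_max - eps_min) * stable_sigmoid(z)


def gaussian_lipschitz(eps):
    """Lipschitz constant of r -> exp(-eps^2 r^2): max |d/dr| = sqrt(2/e) * eps."""
    return math.sqrt(2.0 / math.e) * eps


for z in (-1e3, -2.0, 0.0, 2.0, 1e3):
    eps = bounded_shape(z)
    print(f"z={z:8.1f}  eps={eps:6.3f}  Lipschitz={gaussian_lipschitz(eps):6.3f}")
print("uniform bound:", gaussian_lipschitz(10.0))
```

However extreme the network output, the kernel's Lipschitz constant stays below the uniform bound, which is the hypothesis the standard quadrature error estimates need.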

Circularity Check

0 steps flagged

No circularity: the universal approximation theorems (UATs) are derived directly from the operator definitions, without reduction to data fits or self-citations.

full rationale

The paper states universal approximation theorems for the continuous KNO and its fully discretized version as independent mathematical results grounded in the decoupled kernel-quadrature construction. No load-bearing step reduces a claimed prediction or theorem to a data fit, parameter renaming, or prior self-citation. The numerical experiments are presented separately, as empirical validation against external operator-learning benchmarks, so the derivation chain remains self-contained.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claims rest on standard assumptions from kernel methods and operator learning plus the new architectural choice of decoupling kernel from quadrature; no new physical entities are postulated.

free parameters (1)

neural-network parameters for kernel definition
Parameters inside the neural networks that output the anisotropic kernel coefficients are learned from data during training.

axioms (1)

domain assumption: Compositions of kernel-based integral operators can form universal approximators for continuous operators between function spaces
This is the mathematical foundation invoked for the universal approximation theorems.

pith-pipeline@v0.9.0 · 5789 in / 1354 out tokens · 31280 ms · 2026-05-23T23:07:11.400537+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: KNOs use parameterized, closed-form, finitely-smooth, and compactly-supported kernels with trainable sparsity parameters within the integral operators

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: universal approximation theorems showing that both the continuous and fully discretized KNO are universal approximators

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Enabling Real-Time Training of a Wildfire-to-Smoke Map with Multilinear Operators
cs.LG 2026-05 unverdicted novelty 7.0

A multilinear operator learned on PCA coefficients maps time-since-ignition inputs to smoke outputs, matching Monte Carlo accuracy with half the model calls and outperforming prior classifiers on holdout data.
Fluids You Can Trust: Property-Preserving Operator Learning for Incompressible Flows
physics.flu-dyn 2026-02 conditional novelty 7.0

A kernel operator learning framework constructs property-preserving bases so that predicted incompressible velocity fields satisfy divergence-free and periodicity conditions exactly, delivering up to six orders lower ...

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · cited by 2 Pith papers · 2 internal anchors

[1] B. Adcock, R. B. Platte, and A. Shadrin, Optimal sampling rates for approximating analytic functions from pointwise samples, IMA Journal of Numerical Analysis, 39 (2019), pp. 1360–1390.

[2] P. Batlle, M. Darcy, B. Hosseini, and H. Owhadi, Kernel methods are competitive for operator learning, Journal of Computational Physics, 496 (2024), p. 112549.

[3] V. Bayona, N. Flyer, and B. Fornberg, On the role of polynomials in RBF-FD approximations: III. Behavior near domain boundaries, Journal of Computational Physics, 380 (2019), pp. 378–399.

[4] B. E. Boser, I. M. Guyon, and V. N. Vapnik, A training algorithm for optimal margin classifiers, in Proceedings of the Fifth Annual Workshop on Computational Learning Theory, ACM, 1992, pp. 144–152.

[5] D. S. Broomhead and D. Lowe, Multivariable functional interpolation and adaptive networks, Complex Systems, 2 (1988), pp. 321–355.

[6] C. Cantwell, D. Moxey, A. Comerford, A. Bolis, G. Rocco, G. Mengaldo, D. De Grazia, S. Yakovlev, J.-E. Lombard, D. Ekelschot, B. Jordi, H. Xu, Y. Mohamied, C. Eskilsson, B. Nelson, P. Vos, C. Biotto, R. Kirby, and S. Sherwin, Nektar++: An open-source spectral/hp element framework, Computer Physics Communications, 192 (2...

[7] C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, 20 (1995), pp. 273–297.

[8] R. Cortez, The method of regularized stokeslets, SIAM Journal on Scientific Computing, 23 (2001), pp. 1204–1225.

[9] G. E. Fasshauer, Meshfree Approximation Methods with MATLAB, vol. 6 of Interdisciplinary Mathematical Sciences, World Scientific, 2007.

[10] G. E. Fasshauer and M. J. McCourt, Kernel-based Approximation Methods Using MATLAB, vol. 19 of Interdisciplinary Mathematical Sciences, World Scientific, 2015.
[11] B. Fornberg and N. Flyer, Solving PDEs with radial basis functions, Acta Numerica, 24 (2015), pp. 215–258.

[12] B. A. Freno, W. A. Johnson, B. F. Zinser, and S. Campione, Symmetric triangle quadrature rules for arbitrary functions, Computers & Mathematics with Applications, 79 (2020), pp. 2885–2896.

[13] R. A. Gingold and J. J. Monaghan, Smoothed particle hydrodynamics: theory and application to non-spherical stars, Monthly Notices of the Royal Astronomical Society, 181 (1977), pp. 375–389.

[14] M. Han, V. Shankar, J. M. Phillips, and C. Ye, Locally adaptive and differentiable regression, Journal of Machine Learning for Modeling and Computing, 4 (2023), pp. 103–122.

[15] D. Hendrycks and K. Gimpel, Gaussian error linear units (GELUs), 2023.

[16] G. C. Hsiao and W. L. Wendland, Boundary integral equations, vol. 164, Springer, 2008.

[17] P. Jin, S. Meng, and L. Lu, MIONet: Learning multiple-input operators via tensor product, SIAM Journal on Scientific Computing, 44 (2022), pp. A3490–A3514.

[18] G. E. Karniadakis and S. J. Sherwin, Spectral/hp Element Methods for Computational Fluid Dynamics, Oxford University Press, 2nd ed., 2005.

[19] A. Kassen, A. Barrett, V. Shankar, and A. L. Fogelson, Immersed boundary simulations of cell-cell interactions in whole blood, Journal of Computational Physics, 469 (2022), p. 111499.

[20] A. Kassen, V. Shankar, and A. L. Fogelson, A fine-grained parallelization of the immersed boundary method, The International Journal of High Performance Computing Applications, 36 (2022), pp. 443–458.
work page 2022
[21]

D. P. K INGMA AND J. B A, Adam: A method for stochastic optimization, 2017

work page 2017
[22]

N. B. K OVACHKI , Z. L I, B. L IU, K. A ZIZZADENESHELI , K. B HATTACHARYA , A. M. S TU- ART, AND A. A NANDKUMAR , Neural operator: Learning maps between function spaces , CoRR, abs/2108.08481 (2021)

work page arXiv 2021
[23]

Z. L I, D. Z. H UANG , B. L IU, AND A. A NANDKUMAR , Fourier neural operator with learned deformations for PDEs on general geometries , Journal of Machine Learning Research, 24 (2023), pp. 1–26

work page 2023
[24]

Z. L I, N. K OVACHKI , K. A ZIZZADENESHELI , B. L IU, K. B HATTACHARYA , A. S TUART, AND A. A NANDKUMAR , Multipole graph neural operator for parametric partial differential equations, in Proceedings of the 34th International Conference on Neural Information Process- ing Systems, NIPS ’20, Red Hook, NY , USA, 2020, Curran Associates Inc

work page 2020
[25]

Z. L I, N. K OVACHKI , K. A ZIZZADENESHELI , B. L IU, K. B HATTACHARYA , A. S TUART, AND A. ANANDKUMAR , Fourier neural operator for parametric partial differential equations, 2021

work page 2021
[26]

Z. L I, N. K OVACHKI , C. C HOY, B. L I, J. K OSSAIFI , S. O TTA, M. A. N ABIAN , M. S TADLER , C. H UNDT , K. A ZIZZADENESHELI , ET AL ., Geometry-informed neural oper- ator for large-scale 3d PDEs, Advances in Neural Information Processing Systems, 36 (2024)

work page 2024
[27]

L I, Z ONGYI AND KOVACHKI , N IKOLA AND AZIZZADENESHELI , K AMYAR AND LIU, BURIGEDE AND BHATTACHARYA , K AUSHIK AND STUART, A NDREW AND ANANDKU - MAR , ANIMA , Neural operator: Graph kernel network for partial differential equations, arXiv preprint arXiv:2003.03485, (2020). 11

work page internal anchor Pith review Pith/arXiv arXiv 2003
[28]

L. L U, P. J IN, G. P ANG , Z. Z HANG , AND G. E. K ARNIADAKIS , Learning nonlinear op- erators via DeepONet based on the universal approximation theorem of operators , Nature Machine Intelligence, 3 (2021), p. 218–229

work page 2021
[29]

L. L U, X. M ENG , S. C AI, Z. M AO, S. G OSWAMI , Z. Z HANG , AND G. E. K ARNIADAKIS , A comprehensive and fair comparison of two neural operators (with practical extensions) based on FAIR data, Computer Methods in Applied Mechanics and Engineering, 393 (2022), p. 114778

work page 2022
[30]

L’E CUYER , Randomized quasi-Monte Carlo: An introduction for practitioners , Springer, 2018

P. L’E CUYER , Randomized quasi-Monte Carlo: An introduction for practitioners , Springer, 2018

work page 2018
[31] M. McCourt, G. Fasshauer, and D. Kozak, A nonstationary designer space-time kernel, arXiv preprint arXiv:1812.00173, (2018).

[32] C. S. Peskin, The immersed boundary method, Acta Numerica, 11 (2002), pp. 479–517.

[33] A. Peyvan, V. Oommen, A. D. Jagtap, and G. E. Karniadakis, RiemannONets: Interpretable neural operators for Riemann problems, arXiv preprint arXiv:2401.08886, (2024).

[34] R. B. Platte, L. N. Trefethen, and A. B. Kuijlaars, Impossibility of fast stable approximation of analytic functions from equispaced samples, SIAM Review, 53 (2011), pp. 308–318.

[35] C. E. Rasmussen and C. K. Williams, Gaussian Processes for Machine Learning, The MIT Press, 2006.

[36] R. Schaback and H. Wendland, Kernel techniques: From machine learning to meshless methods, Acta Numerica, 15 (2006), pp. 543–639.

[37] V. Shankar and A. L. Fogelson, Hyperviscosity-based stabilization for radial basis function-finite difference (RBF-FD) discretizations of advection-diffusion equations, Journal of Computational Physics, 372 (2018), pp. 616–639.

[38] V. Shankar and S. D. Olson, Radial basis function (RBF)-based parametric models for closed and open curves within the method of regularized stokeslets, International Journal for Numerical Methods in Fluids, 79 (2015), pp. 269–289.

[39] V. Shankar, G. B. Wright, R. M. Kirby, and A. L. Fogelson, A radial basis function (RBF)-finite difference (FD) method for diffusion and reaction-diffusion equations on surfaces, Journal of Scientific Computing, 60 (2014), pp. 342–368.

[40] R. Sharma and V. Shankar, Accelerated training of physics-informed neural networks (PINNs) using meshless discretizations, in Advances in Neural Information Processing Systems, vol. 35, Curran Associates, Inc., 2022, pp. 1034–1046.
[41] K. Solodskikh, A. Kurbanov, R. Aydarkhanov, I. Zhelavskaya, Y. Parfenov, D. Song, and S. Lefkimmiatis, Integral neural networks, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 16113–16122.

[42] H. Wendland, Piecewise polynomial, positive definite and compactly supported radial functions of minimal degree, Advances in Computational Mathematics, 4 (1995), pp. 389–396.

[43] H. Wendland, Error estimates for interpolation by compactly supported radial basis functions of minimal degree, Journal of Approximation Theory, 93 (1998), pp. 258–272.

[44] H. Wendland, Scattered Data Approximation, Cambridge University Press, 2005.

[45] A. G. Wilson and R. P. Adams, Gaussian process kernels for pattern discovery and extrapolation, in Proceedings of the 30th International Conference on Machine Learning, S. Dasgupta and D. McAllester, eds., vol. 28 of Proceedings of Machine Learning Research, Atlanta, Georgia, USA, 17–19 Jun 2013, PMLR, pp. 1067–1075.

[46] G. B. Wright and B. Fornberg, Scattered node compact finite difference-type formulas generated from radial basis functions, Journal of Computational Physics, 212 (2006), pp. 99–123.

[47] J. Zech and C. Schwab, Convergence rates of high dimensional Smolyak quadrature, ESAIM: Mathematical Modelling and Numerical Analysis, 54 (2020), pp. 1259–1307.

[48] Z. Zhang, L. Wing Tat, and H. Schaeffer, BelNet: Basis enhanced learning, a meshfree neural operator, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 479 (2023), p. 20230043.

Appendix A.1, Zero-shot super-resolution: As every layer in the KNO is composed of function-space operations, the KNO can achieve zer...
[49] Gaussians everywhere (overfitting): When isotropic Gaussian kernels $\phi(x, x') = e^{-\epsilon^2\|x - x'\|_2^2}$ were used throughout the KNO, we found that the resulting architecture tended to achieve low training error and high test error, while also being highly sensitive to the initial random seed used to optimize the KNO.

[50] Wendland everywhere (higher training and test errors): When we used Wendland kernels everywhere, we found that the resulting architecture had significantly higher training and test errors than using Wendland kernels almost everywhere and a spectral mixture kernel at the end. This experiment revealed to us that using a kernel that was not compactly-supported for the final integral operator was important for accuracy.

[51] Wendland almost-everywhere, Gaussian for the final integral operator: This choice of kernels produced excellent training and test accuracy and was relatively robust to choices in the other hyperparameters, but produced higher errors than using the spectral mixture kernel for the final integral operator. In order to quantify the differences between these choices, we computed the eigenvalue s...
[52] As was mentioned previously, all 1D examples used Gauss-Legendre points defined on [−1, 1]. We simply transformed the Gauss-Legendre points to the domain of interest in this case.

[53] For the Darcy (PWC) and Navier-Stokes problems, we subdivided the domain $[0,1]^2$ into four squares, then further subdivided each square into two triangles, for a total of eight triangles.

[54] For the Darcy (cont.) problem, we simply used two triangles.

[55] For the Darcy (triangular-notch) problem, we created a five-triangle Delaunay mesh over the whole domain; see Figure 8. Table 3: This table denotes our chosen configuration for KNO on each dataset. An asterisk indicates a hyperparameter that, when increased, also increases the total number of trainable parameters. Here XQ is the total number of quadratur...

[56] For the Darcy (triangle) problem, the domain matched our reference triangle, and so no further subdivision or mapping was used. A.5.3 Hyperparameter choices: The optimal hyperparameters for the KNO on each dataset are shown in Table 3. These hyperparameters were tuned manually via trial and error. The following are some relevant observations: (1) Setting...

[57] …“dgFNO+” in Table 1 directly using the numbers from [29]. However, that work unfortunately does not describe the FNO or “dgFNO+…

We found that freeze-training (i.e., training kernel-based layers independently back to front) prior to training the full model hastened its convergence, and so used this tactic quite often for the sake of convenience. More specifically, for a certain number of epochs, we allowed only a single layer to affect gradient updates, effectively freezing all other...

[1] [1]

A DCOCK , R

B. A DCOCK , R. B. P LATTE , AND A. S HADRIN , Optimal sampling rates for approximating analytic functions from pointwise samples , IMA Journal of Numerical Analysis, 39 (2019), pp. 1360–1390

work page 2019

[2] [2]

B ATLLE , M

P. B ATLLE , M. D ARCY, B. H OSSEINI , AND H. OWHADI , Kernel methods are competitive for operator learning, Journal of Computational Physics, 496 (2024), p. 112549

work page 2024

[3] V. Bayona, N. Flyer, and B. Fornberg, On the role of polynomials in RBF-FD approximations: III. Behavior near domain boundaries, Journal of Computational Physics, 380 (2019), pp. 378–399.

[4] B. E. Boser, I. M. Guyon, and V. N. Vapnik, A training algorithm for optimal margin classifiers, in Proceedings of the Fifth Annual Workshop on Computational Learning Theory, ACM, 1992, pp. 144–152.

[5] D. S. Broomhead and D. Lowe, Multivariable functional interpolation and adaptive networks, Complex Systems, 2 (1988), pp. 321–355.

[6] C. Cantwell, D. Moxey, A. Comerford, A. Bolis, G. Rocco, G. Mengaldo, D. De Grazia, S. Yakovlev, J.-E. Lombard, D. Ekelschot, B. Jordi, H. Xu, Y. Mohamied, C. Eskilsson, B. Nelson, P. Vos, C. Biotto, R. Kirby, and S. Sherwin, Nektar++: An open-source spectral/hp element framework, Computer Physics Communications, 192 (2...

[7] C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, 20 (1995), pp. 273–297.

[8] R. Cortez, The method of regularized stokeslets, SIAM Journal on Scientific Computing, 23 (2001), pp. 1204–1225.

[9] G. E. Fasshauer, Meshfree Approximation Methods with MATLAB, vol. 6 of Interdisciplinary Mathematical Sciences, World Scientific, 2007.

[10] G. E. Fasshauer and M. J. McCourt, Kernel-based Approximation Methods Using MATLAB, vol. 19 of Interdisciplinary Mathematical Sciences, World Scientific, 2015.

[11] B. Fornberg and N. Flyer, Solving PDEs with radial basis functions, Acta Numerica, 24 (2015), pp. 215–258.

[12] B. A. Freno, W. A. Johnson, B. F. Zinser, and S. Campione, Symmetric triangle quadrature rules for arbitrary functions, Computers & Mathematics with Applications, 79 (2020), pp. 2885–2896.

[13] R. A. Gingold and J. J. Monaghan, Smoothed particle hydrodynamics: theory and application to non-spherical stars, Monthly Notices of the Royal Astronomical Society, 181 (1977), pp. 375–389.

[14] M. Han, V. Shankar, J. M. Phillips, and C. Ye, Locally adaptive and differentiable regression, Journal of Machine Learning for Modeling and Computing, 4 (2023), pp. 103–122.

[15] D. Hendrycks and K. Gimpel, Gaussian error linear units (GELUs), 2023.

[16] G. C. Hsiao and W. L. Wendland, Boundary integral equations, vol. 164, Springer, 2008.

[17] P. Jin, S. Meng, and L. Lu, MIONet: Learning multiple-input operators via tensor product, SIAM Journal on Scientific Computing, 44 (2022), pp. A3490–A3514.

[18] G. E. Karniadakis and S. J. Sherwin, Spectral/hp Element Methods for Computational Fluid Dynamics, Oxford University Press, 2nd ed., 2005.

[19] A. Kassen, A. Barrett, V. Shankar, and A. L. Fogelson, Immersed boundary simulations of cell-cell interactions in whole blood, Journal of Computational Physics, 469 (2022), p. 111499.

[20] A. Kassen, V. Shankar, and A. L. Fogelson, A fine-grained parallelization of the immersed boundary method, The International Journal of High Performance Computing Applications, 36 (2022), pp. 443–458.

[21] D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, 2017.

[22] N. B. Kovachki, Z. Li, B. Liu, K. Azizzadenesheli, K. Bhattacharya, A. M. Stuart, and A. Anandkumar, Neural operator: Learning maps between function spaces, CoRR, abs/2108.08481 (2021).

[23] Z. Li, D. Z. Huang, B. Liu, and A. Anandkumar, Fourier neural operator with learned deformations for PDEs on general geometries, Journal of Machine Learning Research, 24 (2023), pp. 1–26.

[24] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar, Multipole graph neural operator for parametric partial differential equations, in Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS ’20, Red Hook, NY, USA, 2020, Curran Associates Inc.

[25] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar, Fourier neural operator for parametric partial differential equations, 2021.

[26] Z. Li, N. Kovachki, C. Choy, B. Li, J. Kossaifi, S. Otta, M. A. Nabian, M. Stadler, C. Hundt, K. Azizzadenesheli, et al., Geometry-informed neural operator for large-scale 3D PDEs, Advances in Neural Information Processing Systems, 36 (2024).

[27] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar, Neural operator: Graph kernel network for partial differential equations, arXiv preprint arXiv:2003.03485, (2020).

[28] L. Lu, P. Jin, G. Pang, Z. Zhang, and G. E. Karniadakis, Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators, Nature Machine Intelligence, 3 (2021), pp. 218–229.

[29] L. Lu, X. Meng, S. Cai, Z. Mao, S. Goswami, Z. Zhang, and G. E. Karniadakis, A comprehensive and fair comparison of two neural operators (with practical extensions) based on FAIR data, Computer Methods in Applied Mechanics and Engineering, 393 (2022), p. 114778.

[30] P. L’Ecuyer, Randomized quasi-Monte Carlo: An introduction for practitioners, Springer, 2018.

[31] M. McCourt, G. Fasshauer, and D. Kozak, A nonstationary designer space-time kernel, arXiv preprint arXiv:1812.00173, (2018).

[32] C. S. Peskin, The immersed boundary method, Acta Numerica, 11 (2002), pp. 479–517.

[33] A. Peyvan, V. Oommen, A. D. Jagtap, and G. E. Karniadakis, RiemannONets: Interpretable neural operators for Riemann problems, arXiv preprint arXiv:2401.08886, (2024).

[34] R. B. Platte, L. N. Trefethen, and A. B. Kuijlaars, Impossibility of fast stable approximation of analytic functions from equispaced samples, SIAM Review, 53 (2011), pp. 308–318.

[35] C. E. Rasmussen and C. K. Williams, Gaussian Processes for Machine Learning, The MIT Press, 2006.

[36] R. Schaback and H. Wendland, Kernel techniques: From machine learning to meshless methods, Acta Numerica, 15 (2006), pp. 543–639.

[37] V. Shankar and A. L. Fogelson, Hyperviscosity-based stabilization for radial basis function-finite difference (RBF-FD) discretizations of advection-diffusion equations, Journal of Computational Physics, 372 (2018), pp. 616–639.

[38] V. Shankar and S. D. Olson, Radial basis function (RBF)-based parametric models for closed and open curves within the method of regularized stokeslets, International Journal for Numerical Methods in Fluids, 79 (2015), pp. 269–289.

[39] V. Shankar, G. B. Wright, R. M. Kirby, and A. L. Fogelson, A radial basis function (RBF)-finite difference (FD) method for diffusion and reaction-diffusion equations on surfaces, Journal of Scientific Computing, 60 (2014), pp. 342–368.

[40] R. Sharma and V. Shankar, Accelerated training of physics-informed neural networks (PINNs) using meshless discretizations, in Advances in Neural Information Processing Systems, vol. 35, Curran Associates, Inc., 2022, pp. 1034–1046.

[41] K. Solodskikh, A. Kurbanov, R. Aydarkhanov, I. Zhelavskaya, Y. Parfenov, D. Song, and S. Lefkimmiatis, Integral neural networks, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 16113–16122.

[42] H. Wendland, Piecewise polynomial, positive definite and compactly supported radial functions of minimal degree, Advances in Computational Mathematics, 4 (1995), pp. 389–396.

[43] H. Wendland, Error estimates for interpolation by compactly supported radial basis functions of minimal degree, Journal of Approximation Theory, 93 (1998), pp. 258–272.

[44] H. Wendland, Scattered Data Approximation, Cambridge University Press, 2005.

[45] A. G. Wilson and R. P. Adams, Gaussian process kernels for pattern discovery and extrapolation, in Proceedings of the 30th International Conference on Machine Learning, S. Dasgupta and D. McAllester, eds., vol. 28 of Proceedings of Machine Learning Research, Atlanta, Georgia, USA, 17–19 Jun 2013, PMLR, pp. 1067–1075.

[46] G. B. Wright and B. Fornberg, Scattered node compact finite difference-type formulas generated from radial basis functions, Journal of Computational Physics, 212 (2006), pp. 99–123.

[47] J. Zech and C. Schwab, Convergence rates of high dimensional Smolyak quadrature, ESAIM: Mathematical Modelling and Numerical Analysis, 54 (2020), pp. 1259–1307.

[48] Z. Zhang, L. Wing Tat, and H. Schaeffer, BelNet: Basis enhanced learning, a mesh-free neural operator, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 479 (2023), p. 20230043.

A Appendix

A.1 Zero-shot super-resolution

As every layer in the KNO is composed of function-space operations, the KNO can achieve zer...
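The point that every KNO layer is a function-space operation can be illustrated with a single discretized kernel integral layer. The following is a minimal NumPy sketch, not the KNO implementation: it assumes a 1D domain $[-1, 1]$, a Gauss-Legendre rule for the input function, and an isotropic Gaussian kernel, and the names `gaussian_kernel` and `kernel_integral` are hypothetical. Because the kernel is a function of continuous coordinates, the same layer can be evaluated on any set of output points.

```python
import numpy as np


def gaussian_kernel(t, y, eps=2.0):
    """Isotropic Gaussian kernel exp(-eps^2 (t - y)^2) for 1D point sets t and y."""
    return np.exp(-(eps ** 2) * (t[:, None] - y[None, :]) ** 2)


def kernel_integral(f_vals, quad_pts, quad_wts, targets, kernel=gaussian_kernel):
    """Quadrature approximation of (K f)(t) = integral of k(t, y) f(y) dy at arbitrary targets t.

    The input f is only needed at the quadrature points; the output can be
    evaluated anywhere because the kernel is defined on continuous coordinates.
    """
    return kernel(targets, quad_pts) @ (quad_wts * f_vals)


# Gauss-Legendre rule on [-1, 1] for the input side.
pts, wts = np.polynomial.legendre.leggauss(32)
f = np.sin(np.pi * pts)

# Evaluate the same layer on a coarse output grid and on an 8x finer one.
coarse = np.linspace(-1.0, 1.0, 17)
fine = np.linspace(-1.0, 1.0, 129)
u_coarse = kernel_integral(f, pts, wts, coarse)
u_fine = kernel_integral(f, pts, wts, fine)
```

Every eighth fine-grid value coincides with the coarse-grid value at the same location, since both are evaluations of one continuous output function rather than separately trained outputs.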

Gaussians everywhere (overfitting): When isotropic Gaussian kernels $\phi(x, x') = e^{-\epsilon^2 \|x - x'\|_2^2}$ were used throughout the KNO, we found that the resulting architecture tended to achieve low training error and high test error, while also being highly sensitive to the initial random seed used to optimize the KNO.
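For concreteness, the isotropic Gaussian kernel $\phi(x, x') = e^{-\epsilon^2 \|x - x'\|_2^2}$ can be evaluated as a kernel matrix as in the sketch below. This is a minimal NumPy illustration under the assumption of points stored as rows of an array; the function name `gaussian_kernel` and the sample points are hypothetical. It also makes visible that the Gaussian is not compactly supported: every pair of points interacts.

```python
import numpy as np


def gaussian_kernel(x, x_prime, eps):
    """Isotropic Gaussian kernel phi(x, x') = exp(-eps^2 * ||x - x'||_2^2).

    x has shape (m, d) and x_prime has shape (n, d); returns an (m, n) matrix.
    """
    sq_dist = np.sum((x[:, None, :] - x_prime[None, :, :]) ** 2, axis=-1)
    return np.exp(-(eps ** 2) * sq_dist)


pts = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
K = gaussian_kernel(pts, pts, eps=1.0)
# K is symmetric with ones on the diagonal; K[0, 1] = exp(-1), and even the
# distant pair K[0, 2] = exp(-9) is nonzero because the Gaussian has global support.
```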

Wendland everywhere (higher training and test errors): When we used Wendland kernels everywhere, we found that the resulting architecture had significantly higher training and test errors than using Wendland kernels almost everywhere and a spectral mixture kernel at the end. This experiment revealed to us that using a kernel that was not compactly supported for the final integral operator was important for accuracy.
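The contrast between compactly supported and globally supported kernels can be made concrete with the standard forms of the two kernels named above. The sketch below assumes Wendland's $C^2$ function $(1 - r)_+^4 (4r + 1)$ with a hypothetical `support` radius, and the spectral mixture kernel in the form given by Wilson and Adams [45]; the KNO's own parameterization of these kernels (including any anisotropic or neural variants) may differ, and all function names are hypothetical.

```python
import numpy as np


def wendland_c2(r):
    """Wendland's C^2 function (1 - r)_+^4 (4 r + 1); positive definite in up to 3 dimensions."""
    r = np.asarray(r, dtype=float)
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)


def wendland_kernel(x, x_prime, support):
    """Compactly supported kernel: exactly zero whenever ||x - x'|| >= support."""
    dist = np.linalg.norm(x[:, None, :] - x_prime[None, :, :], axis=-1)
    return wendland_c2(dist / support)


def spectral_mixture_kernel(x, x_prime, weights, means, variances):
    """Spectral mixture kernel in the form of Wilson and Adams [45]:
    k(tau) = sum_q w_q cos(2 pi tau . mu_q) prod_p exp(-2 pi^2 tau_p^2 v_q^(p)),
    with tau = x - x'. weights has shape (Q,); means and variances have shape (Q, d).
    """
    tau = x[:, None, :] - x_prime[None, :, :]  # (m, n, d)
    k = np.zeros(tau.shape[:2])
    for w, mu, v in zip(weights, means, variances):
        envelope = np.exp(-2.0 * np.pi ** 2 * np.sum(tau ** 2 * v, axis=-1))
        k += w * envelope * np.cos(2.0 * np.pi * (tau @ mu))
    return k


pts = np.array([[0.0, 0.0], [0.3, 0.0], [2.0, 0.0]])
K_w = wendland_kernel(pts, pts, support=1.0)  # K_w[0, 2] == 0: outside the support
K_sm = spectral_mixture_kernel(
    pts,
    pts,
    weights=np.array([0.7, 0.3]),
    means=np.array([[0.5, 0.0], [2.0, 1.0]]),
    variances=np.array([[0.1, 0.1], [0.05, 0.2]]),
)  # K_sm[0, 2] != 0: the spectral mixture kernel is not compactly supported
```

The Wendland matrix is sparse once the support radius is small relative to the domain, which is attractive for cost; the spectral mixture kernel couples all points, which is the property the experiment above found important for the final integral operator.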

Wendland almost everywhere, Gaussian for the final integral operator: This choice of kernels produced excellent training and test accuracy and was relatively robust to choices in the other hyperparameters, but produced higher errors than using the spectral mixture kernel for the final integral operator. In order to quantify the differences between these choices, we computed the eigenvalue s...
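The excerpt breaks off before describing the exact computation. As a hedged illustration only, one standard way to obtain an eigenvalue spectrum for a kernel integral operator is a Nyström discretization: with quadrature nodes $y_j$ and positive weights $w_j$, the operator is approximated by $KW$, which has the same eigenvalues as the symmetric matrix $W^{1/2} K W^{1/2}$. The function name `operator_eigenvalues`, the shape parameter, and the support radius below are assumptions for the sketch.

```python
import numpy as np


def operator_eigenvalues(kernel_matrix, weights):
    """Eigenvalues of the quadrature-discretized operator f -> sum_j w_j k(., y_j) f(y_j).

    For positive weights, K W and W^{1/2} K W^{1/2} share eigenvalues, so a
    symmetric eigensolver can be used. Returned in descending order.
    """
    sqrt_w = np.sqrt(weights)
    sym = sqrt_w[:, None] * kernel_matrix * sqrt_w[None, :]
    return np.linalg.eigvalsh(sym)[::-1]


pts, wts = np.polynomial.legendre.leggauss(40)
diff = np.abs(pts[:, None] - pts[None, :])
K_gauss = np.exp(-(3.0 ** 2) * diff ** 2)  # Gaussian kernel, eps = 3
r = diff / 0.5  # Wendland kernel, support radius 0.5
K_wend = np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)

lam_gauss = operator_eigenvalues(K_gauss, wts)
lam_wend = operator_eigenvalues(K_wend, wts)
# How quickly each spectrum decays is one way to contrast the two kernels.
```

Since both kernels equal 1 on the diagonal, each spectrum sums to the total quadrature weight, here the length 2 of $[-1, 1]$; the kernels differ in how that total is distributed across eigenvalues.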

As was mentioned previously, all 1D examples used Gauss-Legendre points defined on $[-1, 1]$. We simply transformed the Gauss-Legendre points to the domain of interest in each case.
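The transformation of Gauss-Legendre points to a general interval is the affine map $x = \tfrac{b - a}{2} t + \tfrac{a + b}{2}$, with weights scaled by the Jacobian $\tfrac{b - a}{2}$. A minimal NumPy sketch follows; `gauss_legendre` is a hypothetical helper name, while `numpy.polynomial.legendre.leggauss` is the NumPy routine returning nodes and weights on $[-1, 1]$.

```python
import numpy as np


def gauss_legendre(n, a, b):
    """n-point Gauss-Legendre rule on [a, b], mapped affinely from the rule on [-1, 1]."""
    t, w = np.polynomial.legendre.leggauss(n)
    x = 0.5 * (b - a) * t + 0.5 * (a + b)  # maps -1 -> a and 1 -> b
    return x, 0.5 * (b - a) * w  # weights scale by the Jacobian (b - a) / 2


x, w = gauss_legendre(8, 0.0, 2.0)
integral = w @ x ** 2  # approximates the exact value 8/3
```

An $n$-point rule integrates polynomials of degree up to $2n - 1$ exactly, so the mapped 8-point rule reproduces $\int_0^2 x^2\,dx = 8/3$ to rounding error.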

For the Darcy (PWC) and Navier-Stokes problems, we subdivided the domain $[0, 1]^2$ into four squares, then further subdivided each square into two triangles, for a total of eight triangles.
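The subdivision above, combined with a quadrature rule on a reference triangle, yields a composite rule on the square. The sketch below is illustrative only: it assumes a reference triangle with vertices $(0,0)$, $(1,0)$, $(0,1)$ and a simple 3-point rule exact for total degree 2, not necessarily the symmetric rules of [12] used in the KNO, and the helper names are hypothetical. Each reference point is mapped affinely into a physical triangle, and each weight is multiplied by $|\det J|$ of that map.

```python
import numpy as np

# Assumed reference triangle with vertices (0, 0), (1, 0), (0, 1) and a 3-point
# rule exact for polynomials of total degree <= 2 (weights sum to the area 1/2).
REF_POINTS = np.array([[1 / 6, 1 / 6], [2 / 3, 1 / 6], [1 / 6, 2 / 3]])
REF_WEIGHTS = np.full(3, 1 / 6)


def unit_square_triangles(n):
    """Split [0, 1]^2 into n x n squares and cut each along a diagonal (2 n^2 triangles)."""
    h = 1.0 / n
    tris = []
    for i in range(n):
        for j in range(n):
            p00 = (i * h, j * h)
            p10 = ((i + 1) * h, j * h)
            p01 = (i * h, (j + 1) * h)
            p11 = ((i + 1) * h, (j + 1) * h)
            tris.append(np.array([p00, p10, p11]))
            tris.append(np.array([p00, p11, p01]))
    return tris


def map_rule(tri):
    """Affinely map the reference rule onto a triangle given as a (3, 2) vertex array."""
    v0, v1, v2 = tri
    jac = np.column_stack([v1 - v0, v2 - v0])  # Jacobian of xi -> v0 + jac @ xi
    pts = v0 + REF_POINTS @ jac.T
    wts = REF_WEIGHTS * abs(np.linalg.det(jac))
    return pts, wts


def composite_rule(tris):
    """Concatenate the mapped rules of all triangles into one quadrature rule."""
    rules = [map_rule(t) for t in tris]
    return np.vstack([p for p, _ in rules]), np.concatenate([w for _, w in rules])


tris = unit_square_triangles(2)  # four squares, eight triangles
pts, wts = composite_rule(tris)
integral = wts @ (pts[:, 0] * pts[:, 1])  # exact integral of x*y over [0, 1]^2 is 1/4
```

With `n=1` the same helper produces the two-triangle split of the unit square.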

For the Darcy (cont.) problem, we simply used two triangles.

For the Darcy (triangular-notch) problem, we created a five-triangle Delaunay mesh over the whole domain; see Figure 8.

Table 3: This table denotes our chosen configuration for KNO on each dataset. An asterisk indicates a hyperparameter that, when increased, also increases the total number of trainable parameters. Here XQ is the total number of quadratur...

For the Darcy (triangle) problem, the domain matched our reference triangle, and so no further subdivision or mapping was used.

A.5.3 Hyperparameter choices

The optimal hyperparameters for the KNO on each dataset are shown in Table 3. These hyperparameters were tuned manually via trial and error. The following are some relevant observations: (1) Setting...

dgFNO+” in Table 1 directly using the numbers from [29]. However, that work unfortunately does not describe the FNO or “dgFNO+

We found that freeze-training (i.e., training kernel-based layers independently, back to front) prior to training the full model hastened its convergence, and so we used this tactic quite often for the sake of convenience. More specifically, for a certain number of epochs, we allowed only a single layer to affect gradient updates, effectively freezing all other...
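The freeze-training tactic can be viewed as a layer schedule plus a gradient mask. The following framework-agnostic NumPy sketch rests on stated assumptions: exactly one layer is trainable per stage, each stage lasts a fixed hypothetical `epochs_per_layer`, and plain gradient descent stands in for the actual optimizer. The names `freeze_schedule` and `masked_sgd_step` are hypothetical. After the schedule completes, all layers would be unfrozen for full-model training.

```python
import numpy as np


def freeze_schedule(num_layers, epochs_per_layer):
    """Index of the single trainable layer for each epoch, visiting layers back to front."""
    return [layer for layer in reversed(range(num_layers)) for _ in range(epochs_per_layer)]


def masked_sgd_step(params, grads, trainable, lr=1e-2):
    """Plain gradient step applied only to params[trainable]; all other layers stay frozen."""
    return [p - lr * g if i == trainable else p for i, (p, g) in enumerate(zip(params, grads))]


# Toy setup: four "layers" with three parameters each and unit gradients.
params = [np.ones(3) for _ in range(4)]
grads = [np.ones(3) for _ in range(4)]
schedule = freeze_schedule(num_layers=4, epochs_per_layer=2)  # [3, 3, 2, 2, 1, 1, 0, 0]
after_first_epoch = masked_sgd_step(params, grads, trainable=schedule[0])
```

Masking the update, rather than removing layers from the forward pass, keeps the full architecture in place, so every stage still trains against the end-to-end loss.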