Kernel Neural Operators (KNOs) for Scalable, Memory-efficient, Geometrically-flexible Operator Learning
Pith reviewed 2026-05-23 23:07 UTC · model grok-4.3
The pith
The Kernel Neural Operator learns maps between function spaces using compositions of kernel integral operators that are universal approximators and require an order of magnitude fewer parameters than existing neural operators.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Kernel Neural Operator approximates operators by composing deep kernel integral operators, with the key innovation being the decoupling of kernel choice from the quadrature scheme used for numerical integration. This allows explicit specification of trainable kernels, including non-stationary neural anisotropic kernels, and domain-specific integration on irregular geometries. Universal approximation theorems are proven for both the continuous and fully discretized KNO, and numerical experiments confirm competitive or superior performance on operator learning benchmarks with an order of magnitude reduction in the number of parameters.
What carries the argument
Compositions of deep kernel-based integral operators, with kernels specified independently of the quadrature rule for numerical evaluation.
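As a rough illustration of what "kernel specified independently of the quadrature rule" can mean in practice, here is a minimal NumPy sketch of a single discretized kernel integral operator. The function names (`gauss_legendre`, `gaussian_kernel`, `kernel_integral`), the Gaussian kernel, and the 16-point rule are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def gauss_legendre(a, b, n):
    """Gauss-Legendre nodes and weights mapped affinely from [-1, 1] to [a, b]."""
    t, w = np.polynomial.legendre.leggauss(n)
    return 0.5 * (b - a) * t + 0.5 * (b + a), 0.5 * (b - a) * w

def gaussian_kernel(x, y, eps=2.0):
    """Isotropic Gaussian kernel exp(-eps^2 |x - y|^2) on 1D point sets."""
    return np.exp(-(eps * (x[:, None] - y[None, :])) ** 2)

def kernel_integral(kernel, x_out, nodes, weights, f_vals):
    """Approximate u(x) = integral of K(x, y) f(y) dy by sum_j w_j K(x, y_j) f(y_j)."""
    return kernel(x_out, nodes) @ (weights * f_vals)

# The kernel and the quadrature rule are passed in separately, so either can be
# swapped without touching the other.
nodes, weights = gauss_legendre(0.0, 1.0, 16)
x_out = np.linspace(0.0, 1.0, 5)
u = kernel_integral(gaussian_kernel, x_out, nodes, weights, np.sin(np.pi * nodes))
```

Replacing `gauss_legendre` with any other routine that returns nodes and weights on the domain of interest leaves `kernel_integral` unchanged, which is the design point the summary describes.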
If this is right
- KNOs apply directly to irregular domains using domain-specific quadrature without losing approximation guarantees.
- Dimension-wise factorization reduces the effect of the curse of dimensionality on regular domains.
- Neural anisotropic kernels increase expressivity and help attain higher accuracy.
- Memory and parameter requirements drop by roughly an order of magnitude while accuracy remains competitive.
- The architecture keeps the implementation simplicity of traditional kernel methods.
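The page does not spell out the dimension-wise factorization algorithm. One way such a factorization can pay off, assuming a separable (tensor-product) kernel and a tensor-product quadrature grid, is sketched below; the paper's actual algorithm may differ, and all names here are hypothetical.

```python
import numpy as np

def gauss_legendre(a, b, n):
    """Gauss-Legendre nodes and weights mapped affinely from [-1, 1] to [a, b]."""
    t, w = np.polynomial.legendre.leggauss(n)
    return 0.5 * (b - a) * t + 0.5 * (b + a), 0.5 * (b - a) * w

def k1d(x, y, eps=3.0):
    """1D Gaussian factor of an assumed separable 2D kernel."""
    return np.exp(-(eps * (x[:, None] - y[None, :])) ** 2)

n = 12
y, w = gauss_legendre(0.0, 1.0, n)
F = np.sin(np.pi * y)[:, None] * np.cos(np.pi * y)[None, :]  # f on the (n, n) grid
A = k1d(y, y) * w[None, :]  # 1D kernel matrix with quadrature weights folded in

# Factorized: contract one dimension at a time, O(n^3) work and O(n^2) memory.
U_fact = A @ F @ A.T

# Dense reference: full (n^2 x n^2) kernel matrix, O(n^4) work and memory.
K_full = np.kron(A, A)
U_dense = (K_full @ F.reshape(-1)).reshape(n, n)
```

Both paths compute the same discrete operator; the factorized one never forms the `n^2 x n^2` matrix, which is where the savings on regular grids would come from under these assumptions.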
Where Pith is reading between the lines
- The explicit kernel form could allow direct insertion of domain-specific physical structure into the operator approximator.
- Lower memory use might enable operator learning on larger spatial domains or with limited hardware.
- The separation of kernel and quadrature opens a route to hybrid methods that combine kernel transparency with deep learning flexibility.
- Further tests on complex three-dimensional irregular geometries would clarify how far the geometric flexibility extends.
Load-bearing premise
That separating the kernel specification from the numerical quadrature rule preserves both the universal approximation property and numerical convergence for operator learning on irregular geometries.
What would settle it
A numerical test on an irregular domain where a KNO with an explicitly chosen kernel and matching quadrature rule fails to improve in accuracy as the discretization is refined, or where the claimed order-of-magnitude parameter reduction does not hold against Fourier neural operators on a held-out benchmark.
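The refinement half of this test can be illustrated on a toy problem. The sketch below is a stand-in, not the paper's experiment: it uses a fixed (untrained) Gaussian kernel on the interval [0, 1], a composite trapezoid rule as the "matching quadrature", and a high-order Gauss-Legendre evaluation as the reference. All names and parameter values are assumptions.

```python
import numpy as np

def kernel(x, y):
    """Fixed Gaussian kernel exp(-|x - y|^2), used only for illustration."""
    return np.exp(-(x[:, None] - y[None, :]) ** 2)

def f(y):
    """Input function fed to the operator."""
    return np.sin(2.0 * np.pi * y)

def apply_operator(x_out, nodes, weights):
    """Quadrature approximation of the integral of K(x, y) f(y) over [0, 1]."""
    return kernel(x_out, nodes) @ (weights * f(nodes))

def trapezoid_rule(n):
    """Composite trapezoid nodes and weights on [0, 1] with n subintervals."""
    nodes = np.linspace(0.0, 1.0, n + 1)
    weights = np.full(n + 1, 1.0 / n)
    weights[[0, -1]] *= 0.5
    return nodes, weights

x_out = np.linspace(0.0, 1.0, 11)

# High-order Gauss-Legendre reference, mapped from [-1, 1] to [0, 1].
t, w = np.polynomial.legendre.leggauss(64)
reference = apply_operator(x_out, 0.5 * (t + 1.0), 0.5 * w)

errors = []
for n in (8, 16, 32, 64):
    nodes, weights = trapezoid_rule(n)
    approx = apply_operator(x_out, nodes, weights)
    errors.append(np.max(np.abs(approx - reference)))
```

If `errors` stopped shrinking as `n` grows for a trained KNO on an irregular domain, that would be the kind of failure this section describes.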
Original abstract
This paper introduces the Kernel Neural Operator (KNO), a provably convergent operator-learning architecture that utilizes compositions of deep kernel-based integral operators for function-space approximation of operators (maps from functions to functions). The KNO decouples the choice of kernel from the numerical integration scheme (quadrature), thereby naturally allowing for operator learning with explicitly-chosen trainable kernels on irregular geometries. On irregular domains, this allows the KNO to utilize domain-specific quadrature rules. To help ameliorate the curse of dimensionality, we also leverage an efficient dimension-wise factorization algorithm on regular domains. More importantly, the ability to explicitly specify kernels also allows the use of highly expressive, non-stationary, neural anisotropic kernels whose parameters are computed by training neural networks. We present universal approximation theorems showing that both the continuous and fully discretized KNO are universal approximators on operator learning problems. Numerical results demonstrate that on existing benchmarks the training and test accuracy of KNOs is closely comparable to or higher than that of popular neural operators while typically using an order of magnitude fewer trainable parameters, with the more expressive kernels proving important to attaining high accuracy. KNOs thus facilitate low-memory, geometrically-flexible, deep operator learning, while retaining the implementation simplicity and transparency of traditional kernel methods from both scientific computing and machine learning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Kernel Neural Operators (KNOs) as compositions of deep kernel-based integral operators for learning maps between function spaces. It decouples kernel selection from the quadrature rule to support irregular geometries via domain-specific integration and to permit expressive non-stationary kernels whose parameters are outputs of neural networks. Universal approximation theorems are asserted for both the continuous KNO and its fully discretized version; numerical experiments on standard benchmarks are reported to achieve accuracy comparable to or exceeding popular neural operators while using roughly an order of magnitude fewer trainable parameters.
Significance. Should the universal approximation results hold for trainable non-stationary kernels and the numerical comparisons prove robust, the work would supply a geometrically flexible, low-memory operator-learning framework that retains the transparency of classical kernel methods while adding deep composition and neural parameterization. The explicit separation of kernel and quadrature is a potentially useful design principle if the accompanying convergence theory is complete.
major comments (1)
- [Universal approximation theorems for the discretized KNO] The argument that decoupling the kernel from quadrature automatically preserves the UAT and convergence on irregular domains does not address the case in which the kernel is non-stationary and its parameters are neural-network outputs. Standard quadrature error bounds rely on uniform smoothness or Lipschitz constants of the integrand; when these constants become data-dependent and potentially unbounded across layers, the discretization error may accumulate and prevent density in the operator space. A revised statement or additional assumption controlling the variation of the learned kernel is required for the claim to be load-bearing.
minor comments (2)
- [Numerical results] Numerical results section: claims of superior accuracy with fewer parameters are presented without reported error bars, explicit baseline implementation details, or data-exclusion criteria, making it difficult to evaluate the statistical reliability of the reported gains.
- [Abstract] Abstract and introduction: the phrase 'an order of magnitude fewer trainable parameters' should be tied to a specific table or figure for immediate verification.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive feedback on the universal approximation results. The concern regarding error control for non-stationary, neural-parameterized kernels in the discretized setting is well-taken, and we address it directly below.
- Referee: the argument that decoupling the kernel from quadrature automatically preserves the UAT and convergence on irregular domains does not address the case in which the kernel is non-stationary and its parameters are neural-network outputs. Standard quadrature error bounds rely on uniform smoothness or Lipschitz constants of the integrand; when these constants become data-dependent and potentially unbounded across layers, the discretization error may accumulate and prevent density in the operator space. A revised statement or additional assumption controlling the variation of the learned kernel is required for the claim to be load-bearing.
  Authors: We agree that the current proof sketch for the discretized KNO does not explicitly control the data-dependent Lipschitz constants arising from neural-network outputs for the kernel parameters. The decoupling argument in the manuscript establishes that any fixed continuous kernel can be discretized consistently via domain-specific quadrature, but it does not yet address uniform bounds when the kernel varies across layers and inputs. In the revision we will add an explicit assumption that the neural networks generating kernel parameters produce outputs whose Lipschitz constants are uniformly bounded (e.g., via weight constraints or output clipping), which restores the standard quadrature error estimates. We will also revise the statement of the discretized UAT to include this assumption and supply a short appendix deriving the accumulated discretization error under the new hypothesis.
  Revision: yes
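The accumulated-error derivation promised in the rebuttal is not shown on this page. As a hedged illustration of the standard telescoping bound such an appendix might use, assume (these are assumptions for the sketch, not the authors' statement) that each exact layer $T_\ell$ and its discretization $\tilde T_\ell$ act on a common normed space, that each $\tilde T_\ell$ is Lipschitz with constant $\Lambda_\ell$, and that the per-layer quadrature error is at most $\varepsilon_\ell$ on the inputs that actually occur:

```latex
\begin{align*}
\bigl\| T_L \circ \cdots \circ T_1 u - \tilde T_L \circ \cdots \circ \tilde T_1 u \bigr\|
  &\le \sum_{\ell=1}^{L} \Bigl( \prod_{k=\ell+1}^{L} \Lambda_k \Bigr) \varepsilon_\ell , \\
\text{and if } \Lambda_k \le \Lambda \ne 1,\ \varepsilon_\ell \le \varepsilon : \quad
  &\le \varepsilon \, \frac{\Lambda^{L} - 1}{\Lambda - 1} .
\end{align*}
```

With a uniform bound $\Lambda$ and fixed depth $L$, the right-hand side vanishes as $\varepsilon \to 0$. Without such a bound the product of Lipschitz constants can grow with the data, which is the accumulation the referee is worried about.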
Circularity Check
No circularity: UATs derived directly from operator definitions without reduction to fits or self-citations.
full rationale
The paper states universal approximation theorems for the continuous KNO and its fully discretized version as independent mathematical results grounded in the decoupled kernel-quadrature construction. No load-bearing step reduces a claimed prediction or theorem to a data fit, parameter renaming, or prior self-citation; the numerical experiments are presented separately as empirical validation. The derivation chain remains self-contained against external operator-learning benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- neural-network parameters for kernel definition
axioms (1)
- domain assumption: Compositions of kernel-based integral operators can form universal approximators for continuous operators between function spaces
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: KNOs use parameterized, closed-form, finitely-smooth, and compactly-supported kernels with trainable sparsity parameters within the integral operators
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: universal approximation theorems showing that both the continuous and fully discretized KNO are universal approximators
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
-
Enabling Real-Time Training of a Wildfire-to-Smoke Map with Multilinear Operators
A multilinear operator learned on PCA coefficients maps time-since-ignition inputs to smoke outputs, matching Monte Carlo accuracy with half the model calls and outperforming prior classifiers on holdout data.
-
Fluids You Can Trust: Property-Preserving Operator Learning for Incompressible Flows
A kernel operator learning framework constructs property-preserving bases so that predicted incompressible velocity fields satisfy divergence-free and periodicity conditions exactly, delivering up to six orders lower ...
A Appendix
A.1 Zero-shot super-resolution
As every layer in the KNO is composed of function-space operations, the KNO can achieve zer...
- Gaussians everywhere (overfitting): When isotropic Gaussian kernels ϕ(x, y) = exp(−ϵ²∥x − y∥₂²) were used throughout the KNO, we found that the resulting architecture tended to achieve low training error and high test error, while also being highly sensitive to the initial random seed used to optimize the KNO.
- Wendland everywhere (higher training and test errors): When we used Wendland kernels everywhere, we found that the resulting architecture had significantly higher training and test errors than using Wendland kernels almost everywhere and a spectral mixture kernel at the end. This experiment revealed to us that using a kernel that was not compactly-suppor...
- Wendland almost-everywhere, Gaussian for I p q L: This choice of kernels produced excellent training and test accuracy and was relatively robust to choices in the other hyperparameters, but produced higher errors than using the spectral mixture kernel for I p q L. In order to quantify the differences between these choices, we computed the eigenvalue s...
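To make the contrast between globally supported and compactly supported kernels concrete, here is a small NumPy sketch. It assumes the Wendland C2 function (1 − s)₊⁴(4s + 1), which is one standard member of the Wendland family; the excerpt does not say which Wendland kernel, shape parameter, or support radius the authors used, so those choices are illustrative only.

```python
import numpy as np

def gaussian(r, eps=1.0):
    """Gaussian radial kernel exp(-(eps r)^2); nonzero for every distance r."""
    return np.exp(-(eps * r) ** 2)

def wendland_c2(r, support=1.0):
    """Wendland C2 function (1 - s)_+^4 (4 s + 1) with s = r / support."""
    s = np.asarray(r) / support
    return np.where(s < 1.0, (1.0 - s) ** 4 * (4.0 * s + 1.0), 0.0)

x = np.linspace(0.0, 1.0, 200)
r = np.abs(x[:, None] - x[None, :])  # pairwise distances

K_gauss = gaussian(r, eps=5.0)
K_wend = wendland_c2(r, support=0.1)

# Fraction of kernel-matrix entries that are nonzero.
frac_nonzero_gauss = np.count_nonzero(K_gauss) / K_gauss.size
frac_nonzero_wend = np.count_nonzero(K_wend) / K_wend.size
```

The Gaussian matrix is fully dense, while the Wendland matrix is banded; that sparsity is what a trainable support radius would control.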
- As was mentioned previously, all 1D examples used Gauss-Legendre points defined on [−1, 1]. We simply transformed the Gauss-Legendre points to the domain of interest in this case.
- For the Darcy (PWC) and Navier-Stokes problems, we subdivided the domain [0, 1]² into four squares, then further subdivided each square into two triangles, for a total of eight triangles.
- For the Darcy (cont.) problem, we simply used two triangles.
- For the Darcy (triangular-notch) problem, we created a five-triangle Delaunay mesh over the whole domain; see Figure 8.
- For the Darcy (triangle) problem, the domain matched our reference triangle, and so no further subdivision or mapping was used.
Table 3: This table denotes our chosen configuration for KNO on each dataset. An asterisk indicates a hyperparameter that when increased also increases the total number of trainable parameters. Here XQ is the total number of quadratur...
A.5.3 Hyperparameter choices
The optimal hyperparameters for the KNO on each dataset are shown in Table 3. These hyperparameters were tuned manually via trial and error. The following are some relevant observations: (1) Setting...
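The eight-triangle layout described above for the unit square can be sketched as follows. This is an illustration under stated assumptions: the diagonal used to split each square and the three-point edge-midpoint rule (exact for polynomials up to degree 2) are choices made here, not details given in the excerpt, and the function names are hypothetical.

```python
import numpy as np

def unit_square_triangles(n_sub=2):
    """Split [0, 1]^2 into n_sub x n_sub squares, each cut into two triangles."""
    h = 1.0 / n_sub
    tris = []
    for i in range(n_sub):
        for j in range(n_sub):
            p00, p10 = (i * h, j * h), ((i + 1) * h, j * h)
            p01, p11 = (i * h, (j + 1) * h), ((i + 1) * h, (j + 1) * h)
            tris.append((p00, p10, p11))  # lower-right triangle
            tris.append((p00, p11, p01))  # upper-left triangle
    return np.array(tris)  # shape (2 * n_sub**2, 3, 2)

def midpoint_rule(tris):
    """Edge-midpoint rule per triangle: three nodes, each with weight area / 3."""
    a, b, c = tris[:, 0], tris[:, 1], tris[:, 2]
    area = 0.5 * np.abs((b[:, 0] - a[:, 0]) * (c[:, 1] - a[:, 1])
                        - (b[:, 1] - a[:, 1]) * (c[:, 0] - a[:, 0]))
    nodes = np.concatenate([(a + b) / 2, (b + c) / 2, (c + a) / 2])
    weights = np.concatenate([area, area, area]) / 3.0
    return nodes, weights

tris = unit_square_triangles(2)          # four squares, eight triangles
nodes, weights = midpoint_rule(tris)
integral_xy = np.sum(weights * nodes[:, 0] * nodes[:, 1])  # integral of x*y
```

Nodes on edges shared by two triangles appear twice, once per triangle; that is harmless for integration but could be merged if memory matters.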
- We found that freeze-training (i.e., training kernel-based layers independently back to front) prior to training the full model hastened its convergence and so used this tactic quite often for the sake of convenience. More specifically, for a certain number of epochs, we allowed only a single layer to affect gradient updates, effectively freezing all other...
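A minimal sketch of the freeze-training idea is given below. It assumes that "back to front" means the last layer is trained first, uses plain gradient descent with placeholder gradients instead of a real optimizer and autodiff, and invents the names `freeze_schedule` and `masked_sgd_step`; none of this is taken from the authors' code.

```python
import numpy as np

def freeze_schedule(n_layers, epochs_per_layer):
    """Yield the index of the single trainable layer per epoch, last layer first."""
    for layer in reversed(range(n_layers)):
        for _ in range(epochs_per_layer):
            yield layer

def masked_sgd_step(params, grads, active, lr=0.1):
    """Update only the active layer; every other layer is returned unchanged."""
    return [p - lr * g if i == active else p
            for i, (p, g) in enumerate(zip(params, grads))]

params = [np.ones(2), np.ones(2), np.ones(2)]
grads = [np.full(2, 0.5)] * 3  # placeholder gradients, for illustration only

# Freeze-training phase: one layer at a time, then full training would follow.
for active in freeze_schedule(n_layers=3, epochs_per_layer=2):
    params = masked_sgd_step(params, grads, active)
```

In a real training loop the gradients would be recomputed each epoch and the schedule would be followed by joint training of all layers, as the excerpt describes.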