On the Eigenvalue Decay Rates of a Class of Neural-Network Related Kernel Functions Defined on General Domains
Pith reviewed 2026-05-24 08:37 UTC · model grok-4.3
The pith
A strategy determines eigenvalue decay rates for neural tangent kernels and related functions on arbitrary domains rather than the sphere.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We provide a strategy to determine the eigenvalue decay rate of a large class of kernel functions defined on a general domain rather than the sphere. This class includes, but is not limited to, the neural tangent kernel (NTK) associated with neural networks of different depths and with various activation functions. After proving that the training dynamics of wide neural networks are uniformly approximated by those of neural tangent kernel regression on general domains, we further establish the minimax optimality of wide neural networks provided that the underlying true function f lies in [H_NTK]^s, an interpolation space associated with the RKHS H_NTK of the NTK. We also show that overfitted neural networks cannot generalize well.
What carries the argument
The strategy for determining eigenvalue decay rates of the class of kernels on general domains, extending the spherical case to arbitrary domains.
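For orientation, the quantity at stake can be written down in the standard Mercer-type setup. This is a sketch in generic notation, not the paper's own: the domain $\mathcal X$, the measure $\mu$, the eigenpairs $(\lambda_i, e_i)$, and the exponent $\beta$ are symbols assumed here for illustration.

```latex
% Integral operator induced by the kernel K on L^2(\mathcal X, \mu)
T_K f(x) = \int_{\mathcal X} K(x, y)\, f(y)\, d\mu(y),
\qquad
K(x, y) = \sum_{i \ge 1} \lambda_i\, e_i(x)\, e_i(y),
\qquad
\lambda_1 \ge \lambda_2 \ge \dots > 0.

% The kernel has eigenvalue decay rate (EDR) \beta > 0 if
\lambda_i \asymp i^{-\beta}.
```

On the sphere the eigenfunctions $e_i$ of a dot-product kernel are spherical harmonics, which is what makes that case tractable; on a general domain no such explicit basis is available, so the decay rate has to be obtained by other means.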
If this is right
- Training dynamics of wide neural networks uniformly approximate neural tangent kernel regression on general domains.
- Wide neural networks achieve minimax optimality when the true function lies in the interpolation space [H_NTK]^s of the NTK RKHS.
- Overfitted neural networks cannot generalize well.
- The eigenvalue decay rate approach applies to the full class of kernels including NTKs of different depths and activations.
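The NTK regression dynamics mentioned in the first consequence can be made concrete with the usual kernel gradient flow. The formulas below are a sketch under assumptions not taken from the paper: squared loss with a $1/(2n)$ normalization, zero initialization, an invertible Gram matrix $K(X, X)$, and the symbols $X$ (training inputs) and $y$ (training responses).

```latex
% Kernel gradient flow on the empirical squared loss
\frac{d}{dt} f_t(x) = -\frac{1}{n}\, K(x, X)\,\bigl(f_t(X) - y\bigr),
\qquad f_0 \equiv 0.

% Closed-form solution
f_t(x) = K(x, X)\, K(X, X)^{-1}\,\bigl(I - e^{-t K(X, X)/n}\bigr)\, y.
```

Stopping at a finite time $t$ acts as regularization, while letting $t \to \infty$ gives the kernel interpolant $K(x, X) K(X, X)^{-1} y$, which corresponds to the overfitted regime in the third consequence.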
Where Pith is reading between the lines
- The domain-general strategy may allow similar decay-rate analysis for kernel methods outside neural-network settings.
- Uniform approximation on arbitrary domains broadens the settings where NTK theory directly informs neural network behavior.
- The failure of overfitted networks to generalize suggests examining regularization choices even when parameter count greatly exceeds sample size.
Load-bearing premise
The strategy for determining eigenvalue decay rates extends from the spherical case to arbitrary domains for the entire class of kernels that includes NTKs of varying depths and activations.
What would settle it
A concrete counterexample on a non-spherical domain where the computed eigenvalue decay rate deviates from the predicted rate, or where wide neural network training fails to uniformly approximate neural tangent kernel regression.
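A numerical check of this kind could start from the spectrum of a Gram matrix on a non-spherical domain. The following is a minimal sketch, not the paper's procedure: it uses the Laplace kernel on the interval [0, 1] as a stand-in (an assumption, chosen because it is simple to evaluate), and the helper names `laplace_kernel` and `empirical_edr` are hypothetical.

```python
import numpy as np

def laplace_kernel(x, y):
    # Stand-in kernel (an assumption, not the paper's NTK): exp(-||x - y||).
    diff = x[:, None, :] - y[None, :, :]
    return np.exp(-np.linalg.norm(diff, axis=-1))

def empirical_edr(points, kernel, lo=5, hi=50):
    """Estimate beta in lambda_i ~ i^(-beta) from the Gram matrix spectrum."""
    n = points.shape[0]
    gram = kernel(points, points) / n          # approximates the integral operator
    eigvals = np.linalg.eigvalsh(gram)[::-1]   # eigvalsh is ascending; reverse it
    idx = np.arange(lo, hi + 1)                # 1-based eigenvalue indices
    # Slope of log(lambda_i) against log(i) over the chosen index window.
    slope, _ = np.polyfit(np.log(idx), np.log(eigvals[idx - 1]), 1)
    return -slope

# Midpoint grid on the interval [0, 1], a simple non-spherical domain.
n = 1000
points = ((np.arange(n) + 0.5) / n).reshape(-1, 1)
rate = empirical_edr(points, laplace_kernel)
print(f"estimated decay exponent: {rate:.2f}")
```

For this stand-in kernel in one dimension the exponent should come out near 2, with some upward bias from the low end of the index window. Only indices well below `n` are informative, since the tail of a Gram matrix spectrum is dominated by discretization error. To test the paper's claim one would replace `laplace_kernel` with the NTK of interest and compare the estimate against the predicted rate.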
Original abstract
In this paper, we provide a strategy to determine the eigenvalue decay rate (EDR) of a large class of kernel functions defined on a general domain rather than $\mathbb S^{d}$. This class of kernel functions include but are not limited to the neural tangent kernel associated with neural networks with different depths and various activation functions. After proving that the dynamics of training the wide neural networks uniformly approximated that of the neural tangent kernel regression on general domains, we can further illustrate the minimax optimality of the wide neural network provided that the underground truth function $f\in [\mathcal H_{\mathrm{NTK}}]^{s}$, an interpolation space associated with the RKHS $\mathcal{H}_{\mathrm{NTK}}$ of NTK. We also showed that the overfitted neural network can not generalize well. We believe our approach for determining the EDR of kernels might be also of independent interests.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a strategy for determining eigenvalue decay rates (EDR) of a class of kernels including neural tangent kernels (NTKs) of varying depths and activations, defined on general domains rather than the sphere. It claims this EDR result implies that wide neural network training dynamics uniformly approximate NTK regression on general domains, that wide NNs achieve minimax optimality when the target lies in the interpolation space [H_NTK]^s, and that overfitted networks fail to generalize.
Significance. A domain-general EDR strategy for NTK-type kernels would enable spectral analysis of kernel regression and NN generalization beyond spheres, strengthening the link between NTK theory and minimax rates via interpolation spaces. The independent-interest claim for the EDR method is plausible if it relies on standard integral-operator spectral theory rather than sphere-specific tools, but the manuscript provides no explicit comparison to existing results on compact domains.
major comments (2)
- [Section on EDR strategy and its application to general domains] The central extension of the EDR strategy from the sphere to arbitrary domains is load-bearing for both the uniform approximation claim and the minimax optimality statement, yet the manuscript does not exhibit a domain-general proof that replaces spherical-harmonic or zonal-harmonic expansions with the spectral theory of the integral operator on L^2(Ω) for compact Ω with boundary regularity; without this, the decay rates used to define [H_NTK]^s are not guaranteed to hold uniformly.
- [Proof of uniform approximation of training dynamics] The claim that wide NN training dynamics uniformly approximate NTK regression on general domains (used to reach the optimality result) rests on the EDR rates; if those rates are only established under sphere-specific assumptions, the approximation statement fails to extend as stated.
minor comments (2)
- [Definition of interpolation spaces] Notation for the interpolation spaces [H_NTK]^s should be defined explicitly with reference to the RKHS norm and the eigenvalue decay, rather than left implicit.
- [Section on overfitted networks] The statement that overfitted networks cannot generalize well would benefit from a precise quantitative bound linking the EDR to the generalization gap.
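For reference, the explicit definition requested in the first minor comment usually takes the following form. This is the standard power-space construction written in assumed notation ($\lambda_i$, $e_i$ the eigenpairs of the kernel integral operator, $\beta$ the eigenvalue decay rate); the manuscript's own definition and rate statement may differ in detail.

```latex
% Interpolation space of order s >= 0 associated with the RKHS \mathcal H
[\mathcal H]^s = \Bigl\{ \sum_{i \ge 1} a_i\, \lambda_i^{s/2}\, e_i
  \;:\; \sum_{i \ge 1} a_i^2 < \infty \Bigr\},
\qquad
\Bigl\| \sum_{i \ge 1} a_i\, \lambda_i^{s/2}\, e_i \Bigr\|_{[\mathcal H]^s}
  = \Bigl( \sum_{i \ge 1} a_i^2 \Bigr)^{1/2}.

% With \lambda_i \asymp i^{-\beta} and f \in [\mathcal H]^s, the classical
% minimax rate for the squared L^2 error from n samples is
n^{-\frac{s\beta}{s\beta + 1}}.
```

Taking $s = 1$ recovers the RKHS itself, and smaller $s$ gives larger spaces of rougher functions. The dependence of the rate on $\beta$ is why the eigenvalue decay rate has to be known on the domain actually used.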
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments. Below we respond point-by-point to the major comments, clarifying the domain-general nature of the EDR strategy.
- Referee: The central extension of the EDR strategy from the sphere to arbitrary domains is load-bearing for both the uniform approximation claim and the minimax optimality statement, yet the manuscript does not exhibit a domain-general proof that replaces spherical-harmonic or zonal-harmonic expansions with the spectral theory of the integral operator on L^2(Ω) for compact Ω with boundary regularity; without this, the decay rates used to define [H_NTK]^s are not guaranteed to hold uniformly.
  Authors: The EDR strategy in the manuscript is formulated directly via the spectral theory of the integral operator on L^2(Ω) for a general compact domain Ω (with the stated boundary regularity). No spherical or zonal harmonic expansions are employed; the decay rates follow from the assumed smoothness of the kernel and standard results on the eigenvalues of integral operators with continuous kernels. The relevant arguments appear in Sections 3–4 and apply uniformly to the class of kernels considered, including NTKs of varying depths. We will add an explicit remark in the introduction contrasting the approach with sphere-specific techniques to make this generality clearer.
  Revision: partial
- Referee: The claim that wide NN training dynamics uniformly approximate NTK regression on general domains (used to reach the optimality result) rests on the EDR rates; if those rates are only established under sphere-specific assumptions, the approximation statement fails to extend as stated.
  Authors: Because the EDR rates are obtained from the domain-general integral-operator analysis described above, the subsequent uniform approximation of wide-NN training dynamics by NTK regression (Theorem 5.1) likewise holds on general domains. The approximation argument relies on the eigenvalue decay to control the RKHS norm and does not invoke sphere geometry.
  Revision: no
Circularity Check
No circularity detected; derivation relies on claimed new EDR strategy for general domains.
The abstract and provided excerpts describe a new strategy for eigenvalue decay rates on arbitrary domains (distinct from the sphere), followed by an approximation result for wide NN dynamics to NTK regression and a minimax optimality claim under an interpolation-space assumption. No equations, self-citations, or fitted quantities are exhibited that reduce any central prediction or uniqueness claim to its own inputs by construction. The extension to general domains is presented as the paper's contribution rather than imported via self-reference or ansatz smuggling. This is the expected non-finding for a paper whose core technical step is an independent methodological extension.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the class of kernels includes NTKs for networks of different depths and activations, and the EDR strategy applies uniformly to this class on general domains.
Forward citations
Cited by 1 Pith paper
- Large Dimensional Kernel Ridge Regression: Extending to Product Kernels
  Extends high-dimensional KRR to product kernels, proving convergence rates that recover minimax optimality for source condition s ≤ 1, saturation for s > 1, and multiple-descent phenomena with respect to sample size n.