Generalization Error Curves for Analytic Spectral Algorithms under Power-law Decay

Qian Lin; Weiye Gan; Yicheng Li; Zuoqiang Shi

arxiv: 2401.01599 · v4 · pith:XZI6XNA3new · submitted 2024-01-03 · cs.LG · math.ST · stat.TH

classification cs.LG · math.ST · stat.TH

keywords kernel · generalization · error · algorithms · analytic · regression · curves · method

The generalization error curve of a kernel regression method describes the exact order of the generalization error as a function of the source condition, the noise level, and the choice of the regularization parameter, rather than only the minimax rate. In this work, under mild assumptions, we rigorously provide a full characterization of the generalization error curves of the kernel gradient descent method (and a large class of analytic spectral algorithms) in kernel regression. Consequently, we can sharpen the near-inconsistency result for kernel interpolation and clarify the saturation effects of kernel regression algorithms with higher qualification. Thanks to neural tangent kernel theory, these results greatly improve our understanding of the generalization behavior of training wide neural networks. A novel technical contribution, the analytic functional argument, might be of independent interest.
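As an illustration of what "spectral algorithm" means in the abstract, the sketch below applies two standard filter functions, kernel ridge regression and kernel gradient flow, to the eigenvalues of a normalized kernel matrix. It is a minimal sketch and not the authors' code: the RBF kernel, its bandwidth, the synthetic data, and all function names are assumptions chosen for illustration, and gradient flow is used as the continuous-time stand-in for kernel gradient descent with stopping time t = 1/λ.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=0.5):
    # Gaussian kernel on 1-D inputs (illustrative choice, not from the paper).
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2 * bandwidth ** 2))

def ridge_filter(z, lam):
    # Kernel ridge regression filter: phi(z) = 1 / (z + lam).
    return 1.0 / (z + lam)

def gradient_flow_filter(z, lam):
    # Kernel gradient flow filter with time t = 1/lam:
    # phi(z) = (1 - exp(-t z)) / z, using the limit t when z is numerically zero.
    t = 1.0 / lam
    safe = np.where(z > 1e-12, z, 1.0)
    return np.where(z > 1e-12, -np.expm1(-t * safe) / safe, t)

def spectral_fit(K, y, filt, lam):
    # Coefficients alpha = (1/n) * phi(K/n) y, so that f(x) = k(x, X) @ alpha.
    n = len(y)
    evals, evecs = np.linalg.eigh(K / n)
    evals = np.clip(evals, 0.0, None)  # guard against tiny negative round-off
    return evecs @ (filt(evals, lam) * (evecs.T @ y)) / n

# Synthetic regression data (assumed for illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 50)
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(50)
K = rbf_kernel(x, x)

alpha_krr = spectral_fit(K, y, ridge_filter, lam=1e-2)
alpha_gf_long = spectral_fit(K, y, gradient_flow_filter, lam=1e-4)
alpha_gf_short = spectral_fit(K, y, gradient_flow_filter, lam=1e-1)

# Training residuals shrink as the flow runs longer (smaller lam).
res_long = np.linalg.norm(y - K @ alpha_gf_long)
res_short = np.linalg.norm(y - K @ alpha_gf_short)
```

Swapping the filter function is the only change needed to move between the two estimators, which is the sense in which a single analysis can cover a whole class of spectral algorithms.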

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Large Dimensional Kernel Ridge Regression: Extending to Product Kernels
stat.ML 2026-05 unverdicted novelty 7.0

Extends high-dimensional KRR to product kernels, proving convergence rates that recover minimax optimality for source condition s ≤ 1, saturation for s > 1, and multiple-descent phenomena with respect to sample size n.

Alignment-Sensitive Minimax Rates for Spectral Algorithms with Learned Kernels
cs.LG 2025-09 unverdicted novelty 7.0

Introduces alignment-sensitive effective span dimension (ESD) for learned-kernel spectral algorithms and proves minimax excess risk bounds of order σ² · ESD, with gradient flow shown to reduce ESD.

Sharp convergence rates for Spectral methods via the feature space decomposition method
math.ST 2025-12 unverdicted novelty 5.0

The paper derives sharp matching convergence rates for spectral methods in linear regression via feature space decomposition, enabling pre-ordering of algorithms and generalizing saturation effects.