
arXiv: 2401.01599 · v4 · pith:XZI6XNA3new · submitted 2024-01-03 · 💻 cs.LG · math.ST · stat.TH

Generalization Error Curves for Analytic Spectral Algorithms under Power-law Decay

classification 💻 cs.LG math.ST stat.TH
keywords kernel, generalization, error, algorithms, analytic, regression, curves, method

The generalization error curve of a kernel regression method aims to determine the exact order of the generalization error under various source conditions, noise levels, and choices of the regularization parameter, rather than the minimax rate. In this work, under mild assumptions, we rigorously provide a full characterization of the generalization error curves of the kernel gradient descent method (and a large class of analytic spectral algorithms) in kernel regression. Consequently, we can sharpen the near-inconsistency result for kernel interpolation and clarify the saturation effects of kernel regression algorithms with higher qualification, among other consequences. Thanks to neural tangent kernel theory, these results greatly improve our understanding of the generalization behavior of training wide neural networks. A novel technical contribution, the analytic functional argument, might be of independent interest.


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Large Dimensional Kernel Ridge Regression: Extending to Product Kernels

    stat.ML 2026-05 unverdicted novelty 7.0

    Extends high-dimensional KRR to product kernels, proving convergence rates that recover minimax optimality for source condition s ≤ 1, saturation for s > 1, and multiple-descent phenomena with respect to sample size n.

  2. Alignment-Sensitive Minimax Rates for Spectral Algorithms with Learned Kernels

    cs.LG 2025-09 unverdicted novelty 7.0

    Introduces alignment-sensitive effective span dimension (ESD) for learned-kernel spectral algorithms and proves minimax excess risk bounds of order σ² · ESD, with gradient flow shown to reduce ESD.

  3. Sharp convergence rates for Spectral methods via the feature space decomposition method

    math.ST 2025-12 unverdicted novelty 5.0

    The paper derives sharp matching convergence rates for spectral methods in linear regression via feature space decomposition, enabling pre-ordering of algorithms and generalizing saturation effects.