arXiv: 2604.23212 · v1 · submitted 2026-04-25 · stat.ML · cs.LG · math.ST · stat.TH

Learning Curves and Benign Overfitting of Spectral Algorithms in Large Dimensions

Pith reviewed 2026-05-08 07:11 UTC · model grok-4.3

classification: stat.ML · cs.LG · math.ST · stat.TH
keywords: learning curves · benign overfitting · spectral algorithms · kernel ridge regression · high-dimensional asymptotics · regularization path · source conditions

The pith

Spectral algorithms in high dimensions have excess risk that splits into three distinct regimes along the full regularization path, with benign overfitting in the under-regularized and interpolation regimes for source conditions 0 < s ≤ s*.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives an exact asymptotic expression for the excess risk of spectral algorithms, including kernel ridge regression, when sample size n scales as a positive power of dimension d. This formula shows that the risk curve is not a simple U-shape but instead transitions through an over-regularized regime, an under-regularized regime, and an interpolation regime at zero regularization. The characterization identifies a critical smoothness threshold s* such that benign overfitting, where risk remains close to the optimal level, holds consistently in the under-regularized and interpolation regimes whenever the target satisfies 0 < s ≤ s*. The same asymptotic picture extends to a broader class of kernels whose low-degree eigenspaces obey spectral scaling and hyper-contractivity.
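To make "spectral algorithm" concrete: such an estimator applies a filter function to the eigenvalues of the normalized kernel matrix. The sketch below is a minimal NumPy illustration under our own assumptions, not the paper's code. The two filters are the standard forms for kernel ridge regression and kernel gradient flow; the helper names (`krr_filter`, `gradient_flow_filter`, `spectral_fit`), the toy data, and the kernel exp(⟨x, z⟩) are hypothetical choices.

```python
import numpy as np

def krr_filter(z, lam):
    """Kernel ridge regression filter: 1 / (z + lam)."""
    return 1.0 / (np.asarray(z, dtype=float) + lam)

def gradient_flow_filter(z, lam):
    """Kernel gradient flow filter (1 - exp(-z/lam)) / z, stopping time 1/lam."""
    z = np.asarray(z, dtype=float)
    positive = z > 1e-12
    safe_z = np.where(positive, z, 1.0)  # avoid 0/0; those entries use the limit
    return np.where(positive, -np.expm1(-safe_z / lam) / safe_z, 1.0 / lam)

def spectral_fit(K, y, filter_fn, lam):
    """Coefficients alpha such that f_hat(x) = k(x, X) @ alpha."""
    n = K.shape[0]
    evals, evecs = np.linalg.eigh(K / n)
    evals = np.clip(evals, 0.0, None)  # remove tiny negative round-off
    return evecs @ (filter_fn(evals, lam) * (evecs.T @ y)) / n

# Toy data (assumption): points on the unit sphere and the kernel exp(<x, z>).
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 8))
X /= np.linalg.norm(X, axis=1, keepdims=True)
K = np.exp(X @ X.T)
y = X[:, 0] + 0.1 * rng.standard_normal(40)

lam = 0.1
alpha_krr = spectral_fit(K, y, krr_filter, lam)
alpha_gf = spectral_fit(K, y, gradient_flow_filter, lam)
```

With the ridge filter, `spectral_fit` reduces to the familiar solve of (K + nλI)α = y. Swapping the filter changes the algorithm without changing the eigendecomposition, which is why a single analysis can cover the whole family.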

Core claim

In the large-dimensional regime n ≍ d^γ with γ > 0, the excess risk of spectral algorithms admits a sharp asymptotic characterization across all regularization strengths under source conditions s ≥ 0. This yields three regimes: over-regularized, where risk decreases as regularization weakens; under-regularized, where risk behavior depends on s; and the interpolation limit. Benign overfitting occurs for all 0 < s ≤ s*, and the kernel risk in the sufficiently regularized regime matches that of an associated sequence model.
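The sequence-model comparison can be pictured with a generic Gaussian sequence model, in which each coefficient θ_j is observed with noise of variance σ²/n and then shrunk by the factor z_j/(z_j + λ). The NumPy sketch below is only an illustration under our own assumptions: the block spectrum (eigenvalue d^{-k} with multiplicity d^k), the coefficients θ_j² = z_j^s, and the function name are hypothetical and are not taken from the paper.

```python
import numpy as np

def sequence_model_risk(evals, theta_sq, lam, noise_var, n):
    """Bias^2 and variance when z_j = theta_j + noise is shrunk by evals/(evals+lam)."""
    shrink = evals / (evals + lam)  # ridge filter times eigenvalue
    bias_sq = np.sum((1.0 - shrink) ** 2 * theta_sq)
    variance = noise_var / n * np.sum(shrink ** 2)
    return bias_sq, variance

# Hypothetical block spectrum: eigenvalue d^{-k} repeated d^k times, k = 0..3.
d, n, s = 10, 300, 1.0
degrees = np.arange(4)
evals = np.repeat(float(d) ** -degrees, d ** degrees)
theta_sq = evals ** s  # assumed signal energy per coordinate

lambdas = np.logspace(1, -5, 13)  # from strong to weak regularization
bias_sq, variance = np.array(
    [sequence_model_risk(evals, theta_sq, lam, 1.0, n) for lam in lambdas]
).T
risk = bias_sq + variance

for lam, b, v in zip(lambdas, bias_sq, variance):
    print(f"lambda={lam:.1e}  bias^2={b:.4f}  variance={v:.4f}")
```

As λ decreases the bias term can only fall and the variance term can only rise, so the shape of the total curve is decided by which blocks of the spectrum sit above or below λ.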

What carries the argument

The sharp asymptotic excess-risk formula obtained by analyzing the eigenvalue distribution of the kernel operator together with the source condition, which decomposes risk into bias and variance terms whose scaling changes across regularization strengths.
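For orientation, the bias and variance split mentioned here can be written out for any estimator that is linear in the labels. The display below is a generic sketch under assumptions of ours (homoscedastic noise with variance σ², filter φ_λ applied to K/n, k(X, x) the vector of kernel evaluations at the training points); the paper's own notation and normalization may differ.

```latex
\begin{align*}
\hat f_\lambda(x) &= k(x, X)\,\tfrac{1}{n}\,\varphi_\lambda\!\left(\tfrac{K}{n}\right) y,
\qquad y = f^*(X) + \varepsilon,\quad
\mathbb{E}[\varepsilon] = 0,\quad \operatorname{Cov}(\varepsilon) = \sigma^2 I_n, \\[4pt]
\mathbb{E}_\varepsilon \bigl\| \hat f_\lambda - f^* \bigr\|_{L^2}^2
&= \underbrace{\bigl\| \mathbb{E}_\varepsilon \hat f_\lambda - f^* \bigr\|_{L^2}^2}_{\text{bias}^2}
 + \underbrace{\frac{\sigma^2}{n^2}\,
   \mathbb{E}_x \Bigl\| \varphi_\lambda\!\left(\tfrac{K}{n}\right) k(X, x) \Bigr\|_2^2}_{\text{variance}} .
\end{align*}
```

The cross term vanishes because the noise has mean zero, so the two pieces can be tracked separately as λ varies.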

If this is right

  • The risk remains controlled in the under-regularized regime for targets with positive but bounded smoothness.
  • Benign overfitting is explained by the asymptotic balance of bias and variance without needing explicit interpolation analysis.
  • In the sufficiently regularized regime the kernel learning curve matches that of an associated sequence model.
  • The three-regime structure extends to kernels on general domains whose low-degree eigenfunctions obey the stated scaling and concentration conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Practitioners could safely use moderate under-regularization for moderately smooth targets without incurring large excess risk.
  • The same regime decomposition may apply to other high-dimensional linear estimators whose effective degrees of freedom follow similar eigenvalue decay.
  • Empirical checks on real high-dimensional data sets with controlled smoothness could locate the critical s* and test the sharpness of the transitions.

Load-bearing premise

The kernels are inner-product kernels on the sphere or satisfy spectral-scaling and hyper-contractivity on their low-degree eigenspaces, and the data live in the large-dimensional regime n ≍ d^γ.

What would settle it

For an inner-product kernel on the unit sphere, fix s between 0 and s*, generate data with n proportional to d^γ, compute empirical excess risk over a grid of regularization values from large to zero, and check whether the observed curve exhibits the predicted three-regime shape with benign overfitting in the final two regimes.
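The check described above can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions of ours, not the paper's experimental setup: the kernel exp(⟨x, z⟩), the polynomial target, the noise level, and the function names are all hypothetical, and the target is not calibrated to a particular source condition s.

```python
import numpy as np

def sample_sphere(n, d, rng):
    """Draw n points uniformly from the unit sphere S^{d-1}."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def kernel(a, b):
    """Hypothetical inner-product kernel k(x, z) = exp(<x, z>)."""
    return np.exp(a @ b.T)

def target(x):
    """Hypothetical target: a degree-2 polynomial of the first coordinate."""
    t = np.sqrt(x.shape[1]) * x[:, 0]
    return t + 0.5 * (t**2 - 1.0)

def excess_risk_path(d, gamma, lambdas, noise_std=0.5, n_test=2000, seed=0):
    """Monte Carlo excess risk of kernel ridge regression for each lambda."""
    rng = np.random.default_rng(seed)
    n = int(round(d**gamma))
    x_train = sample_sphere(n, d, rng)
    x_test = sample_sphere(n_test, d, rng)
    y_train = target(x_train) + noise_std * rng.standard_normal(n)
    # One eigendecomposition serves the whole regularization path.
    evals, evecs = np.linalg.eigh(kernel(x_train, x_train))
    proj_y = evecs.T @ y_train
    k_test = kernel(x_test, x_train)
    f_star = target(x_test)
    risks = []
    for lam in lambdas:
        alpha = evecs @ (proj_y / (evals + n * lam))  # (K + n*lam*I)^{-1} y
        risks.append(np.mean((k_test @ alpha - f_star) ** 2))
    return n, np.array(risks)

# Grid from heavy regularization down to exact interpolation (lambda = 0).
lambdas = np.concatenate([np.logspace(1, -6, 8), [0.0]])
n, risks = excess_risk_path(d=20, gamma=1.5, lambdas=lambdas)
for lam, r in zip(lambdas, risks):
    print(f"lambda={lam:.1e}  excess risk={r:.4f}")
```

A single dimension only gives one curve. Testing the predicted rates would mean repeating this over a range of d and averaging over seeds, with a target whose smoothness is actually controlled.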

Figures

Figures reproduced from arXiv: 2604.23212 by Dongming Huang, Qian Lin, Weihao Lu, Yingcun Xia.

Figure 1: A graphical representation of the learning curves of large dimensional spectral algorithms
Figure 2: Another graphical representation of the learning curve of large dimensional spectral algorithms
Figure 3: Type 1 experiments with parameters (γ, s, u) = (1.5, 1.5, 0.5) and regularization λ = C_λ d^{-u}. The left panel uses the NTK kernel, and the right panel uses the RBF kernel. In each panel, we plot ln(Excess risk) versus ln(d) for KRR (τ = 1) and KGF (τ = ∞), with each method using its optimal C_λ value (selected separately). Dashed lines show the least-squares fits. The legend reports the fitted slopes (with …
Figure 4: Type 1 experiments with (γ, s, u) = (0.8, 1.0, 2.0). The setup and analysis are the same as in …
Figure 5: Comparison of the experimental and theoretical convergence rates for Type 2 experiments
Figure 6: Type 2 experiments with (γ, s) = (1.5, 2). The setup and analysis are the same as in …
Figure 7: Type 1 experiments with (γ, s, u) = (1.0, 1.0, 2.0). The setup and analysis are the same as in …
Figure 8: Type 1 experiments with (γ, s, u) = (1.2, 2.0, 1.0). The setup and analysis are the same as in …
Figure 9: Complete results for Type 1 experiments with …
Figure 10: Complete results for Type 1 experiments with …
Figure 11: Complete results for Type 1 experiments with …
Figure 12: Complete results for Type 1 experiments with …
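
The Figure 3 caption describes fitting a least-squares line to ln(Excess risk) against ln(d) and reading the slope as an empirical rate. A minimal sketch of that fit follows, assuming NumPy; the function name is hypothetical and the data are synthetic placeholders generated from a known power law, not results from the paper.

```python
import numpy as np

def fitted_slope(dims, risks):
    """Least-squares slope and intercept of ln(risk) against ln(d)."""
    slope, intercept = np.polyfit(np.log(dims), np.log(risks), deg=1)
    return slope, intercept

# Synthetic placeholder values (assumption): a power law d^{-0.5} with mild
# multiplicative noise, used only to exercise the fit.
rng = np.random.default_rng(0)
dims = np.array([50, 100, 200, 400, 800])
risks = 3.0 * dims ** -0.5 * np.exp(0.01 * rng.standard_normal(dims.size))

slope, intercept = fitted_slope(dims, risks)
print(f"fitted slope: {slope:.3f}")
```

The fitted slope estimates the exponent of d in the risk, which is the quantity to compare against a theoretical rate.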

Original abstract

Existing large-dimensional theory for spectral algorithms resolves either the optimally tuned point or the interpolation limit, but leaves the under-regularized regime unexplored. We study the learning curve and benign overfitting of spectral algorithms in the large-dimensional setting where the sample size and dimension are of comparable order, i.e., $n \asymp d^{\gamma}$ for some $\gamma>0$. We first consider inner-product kernels on the sphere $\mathbb{S}^{d-1}$ and establish a sharp asymptotic characterization of the excess risk across the full regularization path under various source conditions $s \geq 0$, where $s$ measures the relative smoothness of the regression function. Our results reveal that the learning curve is not simply U-shaped but instead consists of three distinct regimes: over-regularized, under-regularized, and interpolation regimes. This characterization allows us to fully capture the benign overfitting phenomenon, demonstrating that benign overfitting arises consistently across both the under-regularized and interpolation regimes whenever $s$ is positive but no larger than a critical threshold. We further show that, in the sufficiently regularized regime, the kernel learning curve is recovered by an associated sequence model. Finally, we extend the learning-curve analysis to large-dimensional KRR for a class of kernels on general domains in $\mathbb{R}^d$ whose low-degree eigenspaces satisfy spectral-scaling and hyper-contractivity conditions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript establishes sharp asymptotic characterizations of the excess risk for spectral algorithms (including kernel ridge regression) in the large-dimensional regime where n ≍ d^γ for γ > 0. For inner-product kernels on the sphere, under source conditions s ≥ 0, the learning curve exhibits three distinct regimes (over-regularized, under-regularized, and interpolation) across the full regularization path; benign overfitting is shown to occur consistently for 0 < s ≤ s*. The analysis further recovers the kernel learning curve via an associated sequence model in the sufficiently regularized regime and extends the results to a class of kernels on general domains whose low-degree eigenspaces satisfy spectral-scaling and hyper-contractivity.

Significance. If the claimed asymptotics hold, this provides a complete theoretical description of the regularization path for spectral methods in high dimensions, moving beyond isolated analyses of optimal tuning or the interpolation limit. The explicit three-regime decomposition, precise conditions for benign overfitting, and the sequence-model equivalence constitute a substantive advance for understanding generalization in overparameterized kernel models. The extension to general domains under verifiable structural conditions on the kernel further broadens applicability.

major comments (2)
  1. [§3] §3 (main asymptotic results): the boundaries separating the three regimes are characterized in terms of λ relative to n and d, but the explicit dependence of these thresholds on the proportionality exponent γ is not stated; this dependence is load-bearing for the claim that the regimes are distinct and exhaustive for any γ > 0.
  2. [Theorem 4.3] Theorem 4.3 (benign overfitting for 0 < s ≤ s*): the critical threshold s* is defined via the kernel spectrum and source condition, yet the proof sketch does not explicitly verify that the variance term remains bounded while the bias vanishes uniformly in the under-regularized regime; an additional uniform integrability argument appears necessary to make the limit sharp.
minor comments (2)
  1. [Introduction] The notation for the excess risk R(λ) versus the population risk should be introduced with a displayed equation in the introduction to prevent any ambiguity with in-sample quantities.
  2. [§5] In the extension to general domains (§5), the hyper-contractivity assumption is invoked for low-degree eigenspaces; a brief remark on which standard kernels (e.g., Gaussian) satisfy it for the relevant degree range would improve readability.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and insightful comments on our manuscript. We address the major comments point by point below and will revise the manuscript to incorporate the requested clarifications.

  1. Referee: [§3] §3 (main asymptotic results): the boundaries separating the three regimes are characterized in terms of λ relative to n and d, but the explicit dependence of these thresholds on the proportionality exponent γ is not stated; this dependence is load-bearing for the claim that the regimes are distinct and exhaustive for any γ > 0.

    Authors: We agree that explicitly stating the dependence on γ strengthens the clarity of the regime separation. In the revised manuscript, we will add explicit expressions for the regime thresholds in terms of γ (derived directly from the asymptotic characterizations in §3). For example, the boundary between the over-regularized and under-regularized regimes scales as λ ≍ n^{-1} d^{γ(1-s)} or analogous forms depending on the source condition, confirming that the three regimes remain distinct and exhaustive for every γ > 0. This addition will be placed in the statement of the main results and the accompanying discussion. revision: yes

  2. Referee: [Theorem 4.3] Theorem 4.3 (benign overfitting for 0 < s ≤ s*): the critical threshold s* is defined via the kernel spectrum and source condition, yet the proof sketch does not explicitly verify that the variance term remains bounded while the bias vanishes uniformly in the under-regularized regime; an additional uniform integrability argument appears necessary to make the limit sharp.

    Authors: We thank the referee for highlighting this point. The full proof in the appendix already controls the variance term via moment bounds that remain uniform in the under-regularized regime and shows bias vanishing under the source condition. However, to make the argument fully explicit and address the uniform integrability concern, we will insert an additional lemma (or expanded remark) in the proof of Theorem 4.3 that verifies uniform boundedness of the variance and applies a uniform integrability argument to justify interchanging limits. This will render the benign-overfitting statement sharp without altering the result itself. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

The paper's central results consist of sharp asymptotic characterizations of excess risk derived from explicit large-dimensional analysis of the kernel matrix spectrum, bias-variance decomposition, and source conditions s ≥ 0 in the large-dimensional regime n ≍ d^γ. These limits are tracked directly via the stated assumptions on inner-product kernels (or kernels satisfying spectral scaling and hyper-contractivity) without any reduction of the target quantities to fitted parameters, self-definitions, or load-bearing self-citations. The three-regime structure and benign overfitting claims for 0 < s ≤ s* emerge as consequences of the asymptotic tracking rather than being presupposed by the inputs.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard high-dimensional random matrix theory for kernel matrices, source-condition assumptions on the regression function, and structural conditions on the kernel (spectral scaling, hyper-contractivity). No new entities are postulated.

free parameters (2)
  • regularization parameter λ
    Analysis is performed for the entire path λ > 0; λ is not fitted to data but treated as a variable.
  • source condition parameter s
    s ≥ 0 is a modeling choice that indexes the smoothness class; the critical threshold s* is derived from the kernel spectrum.
axioms (2)
  • domain assumption Kernel matrix spectrum admits a deterministic equivalent in the large-dimensional limit n ≍ d^γ
    Invoked to obtain the sharp asymptotic risk formulas.
  • domain assumption Low-degree eigenspaces satisfy spectral-scaling and hyper-contractivity
    Required for the extension to general domains in R^d.
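
For orientation, hyper-contractivity conditions of this kind are usually stated as a comparison of L^q and L^2 norms on the span of low-degree eigenfunctions. The display below is the generic form, offered as an assumption about what the paper's condition resembles and not as a quotation of it; the symbols V_{≤ℓ} (span of eigenfunctions of degree at most ℓ) and C_{q,ℓ} (a constant that does not grow with d) are ours.

```latex
\[
\| f \|_{L^{q}} \;\le\; C_{q,\ell}\, \| f \|_{L^{2}}
\qquad \text{for all } f \in V_{\le \ell} \text{ and all } q \ge 2 .
\]
```

A bound of this type keeps higher moments of low-degree features under control, which is what concentration arguments for the kernel matrix typically rely on.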

pith-pipeline@v0.9.0 · 5554 in / 1334 out tokens · 48578 ms · 2026-05-08T07:11:56.787541+00:00 · methodology
