Learning Curves and Benign Overfitting of Spectral Algorithms in Large Dimensions

Dongming Huang; Qian Lin; Weihao Lu; Yingcun Xia

arXiv: 2604.23212 · v1 · submitted 2026-04-25 · stat.ML · cs.LG · math.ST · stat.TH


Weihao Lu, Qian Lin, Yingcun Xia, Dongming Huang

Pith reviewed 2026-05-08 07:11 UTC · model grok-4.3

Classification: stat.ML · cs.LG · math.ST · stat.TH

Keywords: learning curves, benign overfitting, spectral algorithms, kernel ridge regression, high-dimensional asymptotics, regularization path, source conditions


The pith

Spectral algorithms in high dimensions have excess risk that splits into three distinct regimes along the full regularization path, with benign overfitting in the under-regularized and interpolation regimes for source conditions 0 < s ≤ s*.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives an exact asymptotic expression for the excess risk of spectral algorithms, including kernel ridge regression, when the sample size n scales as a positive power of the dimension d. This formula shows that the risk curve is not a simple U-shape; instead, it transitions through an over-regularized regime, an under-regularized regime, and an interpolation regime at zero regularization. The characterization identifies a critical smoothness threshold s* such that benign overfitting, where the risk remains close to the optimal level, holds consistently in the under-regularized and interpolation regimes whenever the target satisfies a source condition with 0 < s ≤ s*. The same asymptotic picture extends to a broader class of kernels whose low-degree eigenspaces obey spectral scaling and hyper-contractivity.

Core claim

In the large-dimensional regime n ≍ d^γ with γ > 0, the excess risk of spectral algorithms admits a sharp asymptotic characterization across all regularization strengths under source conditions s ≥ 0. This yields three regimes: the over-regularized regime, where the risk decreases as regularization weakens; the under-regularized regime, where the risk behavior depends on s; and the interpolation limit. Benign overfitting occurs for all 0 < s ≤ s*, and in the sufficiently regularized regime the kernel risk matches that of an associated sequence model.
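To see why the exponent γ matters, recall a standard fact about inner-product kernels on S^{d−1} (assumed here, not quoted from the paper): their eigenspaces are the spherical harmonics of degree k, whose multiplicity N(d, k) grows like d^k/k! for fixed k, with eigenvalues of order d^{−k}. The sketch below counts how many whole degrees fit under n = d^γ; the helper names are hypothetical.

```python
from math import comb

def sph_harm_dim(d, k):
    """Multiplicity N(d, k) of degree-k spherical harmonics on S^{d-1}.

    Harmonic polynomials of degree k = homogeneous degree-k polynomials
    minus those of degree k - 2 (math.comb returns 0 when r > n).
    """
    return comb(k + d - 1, d - 1) - comb(k + d - 3, d - 1)

def degrees_below_n(d, gamma, k_max=6):
    """For n = d**gamma, list (degree, multiplicity, cumulative count < n)."""
    n = d ** gamma
    out, cum = [], 0
    for k in range(k_max + 1):
        m = sph_harm_dim(d, k)
        cum += m
        out.append((k, m, cum < n))
    return out

for k, m, fits in degrees_below_n(d=100, gamma=1.5, k_max=3):
    print(f"k={k}  N(d,k)={m}  cumulative < n: {fits}")
```

For d = 100 and γ = 1.5 (so n = 1000), degrees 0 and 1 fit under n, while degree 2 alone already has 5049 eigenfunctions.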

What carries the argument

The sharp asymptotic excess-risk formula obtained by analyzing the eigenvalue distribution of the kernel operator together with the source condition, which decomposes risk into bias and variance terms whose scaling changes across regularization strengths.
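A minimal sketch of such a bias-variance decomposition, assuming a textbook Gaussian sequence model y_j = f_j + σ ξ_j/√n rather than the paper's exact construction: a spectral algorithm with filter φ_λ multiplies each observed coefficient by μ_j φ_λ(μ_j), so the squared bias is Σ_j (1 − μ_j φ_λ(μ_j))² f_j² and the variance is (σ²/n) Σ_j (μ_j φ_λ(μ_j))². The eigenvalue profile, multiplicities, and target energies below are hypothetical choices that mimic the degree structure of an inner-product kernel.

```python
import numpy as np
from math import factorial

def seq_model_risk(mu, mult, energy, n, sigma2, lam, filt):
    """Squared bias and variance of a spectral estimator in a Gaussian sequence model.

    mu     : distinct eigenvalues (one per eigenspace, e.g. per harmonic degree)
    mult   : multiplicity of each eigenvalue
    energy : total squared target coefficients inside each eigenspace
    filt   : filter phi(z, lam); each observed coefficient is scaled by mu * phi(mu, lam)
    """
    shrink = mu * filt(mu, lam)                     # shrinkage factor per eigenspace
    bias = np.sum((1.0 - shrink) ** 2 * energy)     # signal lost to shrinkage
    var = sigma2 / n * np.sum(mult * shrink ** 2)   # noise sigma2/n per coordinate
    return bias, var

def krr_filter(z, lam):
    return 1.0 / (z + lam)                          # kernel ridge regression filter

# Hypothetical setup: degree-k eigenvalue d^{-k}, multiplicity approx d^k / k!,
# and energy mu_k^s in degree k (a crude stand-in for a source condition).
d, gamma, s, sigma2 = 50, 1.5, 1.0, 1.0
k = np.arange(6)
mu = float(d) ** (-k)
mult = np.array([d ** j / factorial(j) for j in k])
energy = mu ** s
n = d ** gamma

for lam in [1e0, 1e-1, 1e-2, 1e-3, 1e-4]:
    b, v = seq_model_risk(mu, mult, energy, n, sigma2, lam, krr_filter)
    print(f"lam={lam:.0e}  bias={b:.3e}  variance={v:.3e}")
```

Large λ drives the shrinkage to zero (all bias), and λ → 0 drives it to one (all variance); note that this simple sequence model has no mechanism that keeps the variance small at interpolation.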

If this is right

The risk remains controlled in the under-regularized regime for targets whose smoothness s is positive but no larger than s*.
Benign overfitting is explained by the asymptotic balance of bias and variance without needing explicit interpolation analysis.
In the sufficiently regularized regime, the kernel learning curve is recovered by an associated sequence model.
The three-regime structure extends to kernels on general domains whose low-degree eigenfunctions obey the stated scaling and concentration conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Practitioners could safely use moderate under-regularization for moderately smooth targets without incurring large excess risk.
The same regime decomposition may apply to other high-dimensional linear estimators whose effective degrees of freedom follow similar eigenvalue decay.
Empirical checks on real high-dimensional data sets with controlled smoothness could locate the critical s* and test the sharpness of the transitions.

Load-bearing premise

The kernels are inner-product kernels on the sphere, or kernels whose low-degree eigenspaces satisfy spectral scaling and hyper-contractivity, and the data live in the large-dimensional regime n ≍ d^γ.

What would settle it

For an inner-product kernel on the unit sphere, fix s between 0 and s*, generate data with n proportional to d^γ, compute empirical excess risk over a grid of regularization values from large to zero, and check whether the observed curve exhibits the predicted three-regime shape with benign overfitting in the final two regimes.
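A minimal simulation sketch of this check for kernel ridge regression. The kernel exp(⟨x, x'⟩), the degree-1 plus degree-2 target, the noise level, and the λ grid are all hypothetical choices rather than the paper's experimental setup; λ enters through the common convention (K + nλI)^{−1}, and λ = 0 is handled by solving K α = y directly.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sphere(n, d):
    """n points drawn uniformly from the unit sphere S^{d-1}."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def kernel(a, b):
    # Hypothetical inner-product kernel k(x, x') = exp(<x, x'>)
    return np.exp(a @ b.T)

def target(x):
    # Hypothetical target with one degree-1 and one degree-2 harmonic component
    d = x.shape[1]
    return np.sqrt(d) * x[:, 0] + d * x[:, 0] * x[:, 1]

def excess_risk_curve(d, gamma, lams, sigma=0.5, n_test=2000):
    """Monte Carlo excess risk E[(f_hat - f*)^2] of KRR along a grid of lambdas."""
    n = int(d ** gamma)
    X, X_test = sample_sphere(n, d), sample_sphere(n_test, d)
    y = target(X) + sigma * rng.standard_normal(n)
    K, K_test = kernel(X, X), kernel(X_test, X)
    f_test = target(X_test)
    risks = []
    for lam in lams:
        if lam > 0:
            alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
        else:
            alpha = np.linalg.lstsq(K, y, rcond=None)[0]  # interpolant: K alpha = y
        risks.append(np.mean((K_test @ alpha - f_test) ** 2))
    return np.array(risks)

lams = np.concatenate([np.logspace(0, -6, 13), [0.0]])   # large -> 0
print(excess_risk_curve(d=30, gamma=1.5, lams=lams))
```

Reading the printed risks from left to right traces one learning curve from heavy regularization to interpolation; repeating the run over several d at fixed γ is what would expose how the regime boundaries move.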

Figures

Figures reproduced from arXiv: 2604.23212 by Dongming Huang, Qian Lin, Weihao Lu, Yingcun Xia.

**Figure 1.** A graphical representation of the learning curves of large-dimensional spectral algorithms.

**Figure 2.** Another graphical representation of the learning curve of large-dimensional spectral algorithms.

**Figure 3.** Type 1 experiments with parameters (γ, s, u) = (1.5, 1.5, 0.5) and regularization λ = C_λ d^{−u}. The left panel uses the NTK kernel, and the right panel uses the RBF kernel. In each panel, we plot ln(excess risk) versus ln(d) for KRR (τ = 1) and KGF (τ = ∞), with each method using its optimal C_λ value (selected separately). Dashed lines show the least-squares fits. The legend reports the fitted slopes (with …

**Figure 4.** Type 1 experiments with (γ, s, u) = (0.8, 1.0, 2.0). The setup and analysis are the same as in …

**Figure 5.** Comparison of the experimental and theoretical convergence rates for Type 2 experiments.

**Figure 6.** Type 2 experiments with (γ, s) = (1.5, 2). The setup and analysis are the same as in …

**Figure 7.** Type 1 experiments with (γ, s, u) = (1.0, 1.0, 2.0). The setup and analysis are the same as in …

**Figure 8.** Type 1 experiments with (γ, s, u) = (1.2, 2.0, 1.0). The setup and analysis are the same as in …

**Figure 9.** Complete results for Type 1 experiments with …

**Figure 10.** Complete results for Type 1 experiments with …

**Figure 11.** Complete results for Type 1 experiments with …

**Figure 12.** Complete results for Type 1 experiments with …
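The rate estimates in Figure 3 come from least-squares fits of ln(excess risk) against ln(d) at a regularization λ = C_λ d^{−u}. A minimal sketch of that post-processing step; the function names are hypothetical, and the synthetic risks below stand in for measured ones.

```python
import numpy as np

def lam_schedule(ds, C_lam, u):
    """Regularization lambda = C_lam * d^(-u) for each dimension d."""
    return C_lam * np.asarray(ds, dtype=float) ** (-u)

def fitted_rate(ds, risks):
    """Slope and intercept of the least-squares line ln(risk) = a * ln(d) + b."""
    slope, intercept = np.polyfit(np.log(ds), np.log(risks), deg=1)
    return slope, intercept

# Synthetic stand-in for measured excess risks that decay like d^(-0.75)
ds = np.array([100, 200, 400, 800, 1600], dtype=float)
noise = np.exp(np.random.default_rng(1).normal(scale=0.05, size=ds.size))
risks = 3.0 * ds ** -0.75 * noise
print(lam_schedule(ds, C_lam=1.0, u=0.5))
print(fitted_rate(ds, risks))   # slope close to -0.75
```

Fitting in log-log coordinates turns a power law d^{a} into a straight line, so the fitted slope is directly comparable with a theoretical rate exponent.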

Original abstract

Existing large-dimensional theory for spectral algorithms resolves either the optimally tuned point or the interpolation limit, but leaves the under-regularized regime unexplored. We study the learning curve and benign overfitting of spectral algorithms in the large-dimensional setting where the sample size and dimension are of comparable order, i.e., $n \asymp d^{\gamma}$ for some $\gamma>0$. We first consider inner-product kernels on the sphere $\mathbb{S}^{d-1}$ and establish a sharp asymptotic characterization of the excess risk across the full regularization path under various source conditions $s \geq 0$, where $s$ measures the relative smoothness of the regression function. Our results reveal that the learning curve is not simply U-shaped but instead consists of three distinct regimes: over-regularized, under-regularized, and interpolation regimes. This characterization allows us to fully capture the benign overfitting phenomenon, demonstrating that benign overfitting arises consistently across both the under-regularized and interpolation regimes whenever $s$ is positive but no larger than a critical threshold. We further show that, in the sufficiently regularized regime, the kernel learning curve is recovered by an associated sequence model. Finally, we extend the learning-curve analysis to large-dimensional KRR for a class of kernels on general domains in $\mathbb{R}^d$ whose low-degree eigenspaces satisfy spectral-scaling and hyper-contractivity conditions.
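Spectral algorithms share one template: apply a filter function φ_λ to the spectrum of the normalized kernel matrix K/n. A minimal sketch with three standard filters (kernel ridge regression, gradient flow stopped at time t = 1/λ, and spectral cut-off), assumed here as textbook examples rather than the paper's exact definitions. In the standard terminology, the τ attached to KRR (τ = 1) and KGF (τ = ∞) in the figure captions is the filter's qualification, which caps how much target smoothness the method can exploit.

```python
import numpy as np

def phi_krr(z, lam):
    """Kernel ridge regression: phi(z) = 1 / (z + lam)."""
    return 1.0 / (np.asarray(z, dtype=float) + lam)

def phi_gf(z, lam):
    """Gradient flow stopped at t = 1/lam: phi(z) = (1 - exp(-t z)) / z, equal to t at z = 0."""
    z = np.asarray(z, dtype=float)
    t = 1.0 / lam
    safe = np.where(z > 0, z, 1.0)                   # avoid division by zero
    return np.where(z > 0, -np.expm1(-t * safe) / safe, t)

def phi_cutoff(z, lam):
    """Spectral cut-off: phi(z) = 1/z if z >= lam, else 0."""
    z = np.asarray(z, dtype=float)
    safe = np.where(z > 0, z, 1.0)
    return np.where(z >= lam, 1.0 / safe, 0.0)

def spectral_fit(K, y, lam, phi):
    """Coefficients alpha = (1/n) phi(K/n) y, so that f_hat(x) = k(x, X) @ alpha."""
    n = K.shape[0]
    evals, evecs = np.linalg.eigh(K / n)
    evals = np.clip(evals, 0.0, None)                # drop tiny negative round-off
    return evecs @ (phi(evals, lam) * (evecs.T @ y)) / n
```

For the ridge filter, (1/n) φ_λ(K/n) = (K + nλI)^{−1}, so `spectral_fit(K, y, lam, phi_krr)` reproduces the usual ridge solve; swapping in `phi_gf` or `phi_cutoff` changes only how the small eigenvalues are treated.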

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives the first sharp asymptotics for the entire regularization path of spectral algorithms in the large-dimensional limit n ≍ d^γ, with a three-regime learning curve and benign overfitting in the under-regularized regime.


The main point is that this fills the gap between the optimally tuned and interpolation cases by characterizing the excess risk at every regularization level in the n ≍ d^γ setting. The authors split the curve into over-regularized, under-regularized, and interpolation regimes, and show that benign overfitting occurs for source conditions 0 < s ≤ s* in both the under-regularized and interpolation parts.

What they do well is derive explicit limits for the bias and variance terms from the kernel spectrum and the source condition, and then show that, in the sufficiently regularized regime, the kernel learning curve is recovered by an associated sequence model. Extending the results to kernels on general domains that satisfy spectral scaling and hyper-contractivity is a solid step toward broader applicability. The assumptions (inner-product kernels on the sphere, or kernels with similar spectral structure) are stated up front, and the analysis avoids circularity by building from the matrix spectrum. The stress test found no internal contradictions, which is reassuring.

Soft spots are limited. The three-regime picture depends on those specific kernel properties, so it may not generalize immediately to other kernels. Without the full proofs in front of us, it is difficult to confirm the sharpness of the asymptotics or the exact error controls, though the abstract suggests they are handled. The polynomial scaling n ≍ d^γ is standard but still a restriction.

This paper is aimed at theorists working on high-dimensional kernel ridge regression and benign overfitting. Readers who want to see how the learning curve behaves away from the usual points will get concrete value from the regime breakdown. I would bring this to a reading group focused on statistical machine learning theory. It deserves peer review because it addresses an unexplored regime with a structured asymptotic analysis that builds on existing tools.

Referee Report

2 major / 2 minor

Summary. The manuscript establishes sharp asymptotic characterizations of the excess risk for spectral algorithms (including kernel ridge regression) in the large-dimensional regime n ≍ d^γ with γ > 0. For inner-product kernels on the sphere, under source conditions s ≥ 0, the learning curve exhibits three distinct regimes (over-regularized, under-regularized, and interpolation) across the full regularization path; benign overfitting is shown to occur consistently for 0 < s ≤ s*. The analysis further recovers the kernel learning curve via an associated sequence model in the sufficiently regularized regime and extends the results to a class of kernels on general domains whose low-degree eigenspaces satisfy spectral-scaling and hyper-contractivity conditions.

Significance. If the claimed asymptotics hold, this provides a complete theoretical description of the regularization path for spectral methods in high dimensions, moving beyond isolated analyses of optimal tuning or the interpolation limit. The explicit three-regime decomposition, precise conditions for benign overfitting, and the sequence-model equivalence constitute a substantive advance for understanding generalization in overparameterized kernel models. The extension to general domains under verifiable structural conditions on the kernel further broadens applicability.

major comments (2)

[§3] Main asymptotic results: the boundaries separating the three regimes are characterized in terms of λ relative to n and d, but the explicit dependence of these thresholds on the scaling exponent γ is not stated; this dependence is load-bearing for the claim that the regimes are distinct and exhaustive for any γ > 0.
[Theorem 4.3] Benign overfitting for 0 < s ≤ s*: the critical threshold s* is defined via the kernel spectrum and source condition, yet the proof sketch does not explicitly verify that the variance term remains bounded while the bias vanishes uniformly in the under-regularized regime; an additional uniform integrability argument appears necessary to make the limit sharp.

minor comments (2)

[Introduction] The notation for the excess risk R(λ) versus the population risk should be introduced with a displayed equation in the introduction to prevent any ambiguity with in-sample quantities.
[§5] In the extension to general domains, the hyper-contractivity assumption is invoked for low-degree eigenspaces; a brief remark on which standard kernels (e.g., Gaussian) satisfy it for the relevant degree range would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and insightful comments on our manuscript. We address the major comments point by point below and will revise the manuscript to incorporate the requested clarifications.


Referee: [§3] Main asymptotic results: the boundaries separating the three regimes are characterized in terms of λ relative to n and d, but the explicit dependence of these thresholds on the scaling exponent γ is not stated; this dependence is load-bearing for the claim that the regimes are distinct and exhaustive for any γ > 0.

Authors: We agree that explicitly stating the dependence on γ strengthens the clarity of the regime separation. In the revised manuscript, we will add explicit expressions for the regime thresholds in terms of γ (derived directly from the asymptotic characterizations in §3). For example, the boundary between the over-regularized and under-regularized regimes scales as λ ≍ n^{-1} d^{γ(1-s)} or analogous forms depending on the source condition, confirming that the three regimes remain distinct and exhaustive for every γ > 0. This addition will be placed in the statement of the main results and the accompanying discussion. revision: yes
Referee: [Theorem 4.3] Benign overfitting for 0 < s ≤ s*: the critical threshold s* is defined via the kernel spectrum and source condition, yet the proof sketch does not explicitly verify that the variance term remains bounded while the bias vanishes uniformly in the under-regularized regime; an additional uniform integrability argument appears necessary to make the limit sharp.

Authors: We thank the referee for highlighting this point. The full proof in the appendix already controls the variance term via moment bounds that remain uniform in the under-regularized regime and shows bias vanishing under the source condition. However, to make the argument fully explicit and address the uniform integrability concern, we will insert an additional lemma (or expanded remark) in the proof of Theorem 4.3 that verifies uniform boundedness of the variance and applies a uniform integrability argument to justify interchanging limits. This will render the benign-overfitting statement sharp without altering the result itself. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified


The paper's central results consist of sharp asymptotic characterizations of excess risk derived from explicit asymptotic analysis of the kernel matrix spectrum, bias-variance decomposition, and source conditions s ≥ 0 in the large-dimensional regime n ≍ d^γ. These limits are tracked directly via the stated assumptions on inner-product kernels (or kernels satisfying spectral scaling and hyper-contractivity) without any reduction of the target quantities to fitted parameters, self-definitions, or load-bearing self-citations. The three-regime structure and benign overfitting claims for 0 < s ≤ s* emerge as consequences of the asymptotic tracking rather than being presupposed by the inputs.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard high-dimensional random matrix theory for kernel matrices, source-condition assumptions on the regression function, and structural conditions on the kernel (spectral scaling, hyper-contractivity). No new entities are postulated.

free parameters (2)

regularization parameter λ
Analysis covers the entire path, from large λ down to the interpolation limit λ → 0; λ is not fitted to data but treated as a variable.
source condition parameter s
s ≥ 0 is a modeling choice that indexes the smoothness class; the critical threshold s* is derived from the kernel spectrum.
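For reference, the source condition is usually phrased through interpolation spaces: with kernel eigenpairs (μ_j, e_j) and target coefficients f_j = ⟨f, e_j⟩ in L², f satisfies a source condition of order s when ‖f‖²_{[H]^s} = Σ_j μ_j^{−s} f_j² is bounded (s = 0 gives L², s = 1 the RKHS). This is the standard convention, assumed rather than quoted from the paper; the spectrum and target below are hypothetical.

```python
import numpy as np

def interp_norm_sq(mu, coef, s):
    """Squared norm in the interpolation space [H]^s: sum_j mu_j^{-s} f_j^2.

    mu   : kernel eigenvalues mu_j (positive)
    coef : target coefficients f_j = <f, e_j> in L2
    s = 0 gives the L2 norm and s = 1 the RKHS norm.
    """
    return float(np.sum(coef ** 2 * mu ** (-s)))

def hypothetical_target(J, s0):
    # Hypothetical spectrum mu_j = j^{-2} with f_j^2 = mu_j^{s0} * j^{-2}
    j = np.arange(1, J + 1, dtype=float)
    mu = j ** -2.0
    return mu, np.sqrt(mu ** s0 * j ** -2.0)

for J in (10**3, 10**5):
    mu, f = hypothetical_target(J, s0=1.0)
    print(J, interp_norm_sq(mu, f, s=1.0), interp_norm_sq(mu, f, s=1.5))
```

The truncated norm at s = s₀ settles near π²/6, while at s = s₀ + 0.5 it keeps growing like ln J, so this target lies in [H]^{s₀} but not in [H]^{s₀+0.5}.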

axioms (2)

domain assumption Kernel matrix spectrum admits a deterministic equivalent in the large-dimensional limit n ≍ d^γ
Invoked to obtain the sharp asymptotic risk formulas.
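One common form of such a deterministic equivalent, assumed here for illustration and not necessarily the paper's exact statement, replaces the ridge λ by an effective regularization κ solving 1 = λ/κ + (1/n) Σ_j μ_j/(μ_j + κ). When there are more eigenfunctions than samples, κ stays positive even at λ = 0, which is the usual mechanism behind the implicit regularization of the interpolant. A bisection sketch with a hypothetical spectrum:

```python
import numpy as np
from math import factorial

def effective_reg(mu, n, lam, iters=200):
    """Solve 1 = lam/kappa + (1/n) * sum_j mu_j / (mu_j + kappa) for kappa > 0.

    mu  : kernel eigenvalues, repeated by multiplicity
    lam : ridge parameter in the (K + n*lam*I) convention; lam = 0 is interpolation
    """
    def g(kappa):  # strictly decreasing in kappa
        return lam / kappa + np.sum(mu / (mu + kappa)) / n - 1.0

    lo, hi = 1e-300, lam + np.sum(mu) / n + 1.0      # g(hi) < 0 by construction
    if g(lo) <= 0:
        raise ValueError("no positive root: need lam > 0 or more eigenvalues than n")
    for _ in range(iters):
        mid = np.sqrt(lo * hi)                        # bisect on a log scale
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return hi

# Hypothetical spectrum: degree-k eigenvalue d^{-k} with multiplicity ~ d^k / k!
d, gamma = 30, 1.5
mult = [int(d ** k / factorial(k)) for k in range(4)]
mu = np.repeat([float(d) ** -k for k in range(4)], mult)
n = int(d ** gamma)
for lam in [1e-1, 1e-3, 0.0]:
    print(lam, effective_reg(mu, n, lam))
```

Because λ/κ = 1 − (1/n) Σ_j μ_j/(μ_j + κ) < 1 at the root, the effective regularization always exceeds λ; at λ = 0 it is set entirely by the eigenvalue tail.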
domain assumption Low-degree eigenspaces satisfy spectral-scaling and hyper-contractivity
Required for the extension to general domains in ℝ^d.


