arxiv: 2606.22993 · v1 · pith:KGBIHRNUnew · submitted 2026-06-22 · 🧮 math.ST · cs.LG · stat.ML · stat.TH

Generalized nonparametric regression in reproducing kernel Hilbert spaces: Consistency and rates of convergence

Pith reviewed 2026-06-26 06:37 UTC · model grok-4.3

classification 🧮 math.ST · cs.LG · stat.ML · stat.TH
keywords nonparametric regression · reproducing kernel Hilbert spaces · M-estimation · convergence rates · bias-variance decomposition · Sobolev spaces · mixed smoothness · regularized estimation

The pith

Regularized M-estimation in reproducing kernel Hilbert spaces yields consistent estimators with sharp convergence rates via an explicit bias-variance decomposition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes existence and measurability for a broad class of regularized M-estimators in RKHS under mild loss conditions. It then derives sharp rates of convergence that decompose into bias and variance terms controlled by a new complexity measure. The variance term remains unaffected by model misspecification while the bias depends on a source condition. For tensor product Sobolev spaces the rates link to functions with dominating mixed smoothness and avoid the curse of dimensionality.

Core claim

Under mild conditions on the loss function, the regularized M-estimator in an RKHS exists and is measurable. Sharp rates follow from an asymptotic linearisation of the objective, with an explicit bias-variance split governed by a novel complexity measure. The variance is independent of misspecification while the bias is governed by the source condition parameter. In tensor product Sobolev spaces these rates connect to dominating mixed smoothness and thereby circumvent the curse of dimensionality.
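
For orientation, a regularized M-estimator in an RKHS is commonly written as the minimizer below. This is the generic textbook form, not a formula quoted from the paper: the symbols ρ (loss), λ (regularization parameter), 𝓗 (the RKHS) and the squared-norm penalty are assumptions of this sketch.

```latex
\hat{f}_{n,\lambda} \in \operatorname*{arg\,min}_{f \in \mathcal{H}}
  \left\{ \frac{1}{n} \sum_{i=1}^{n} \rho\bigl(Y_i - f(X_i)\bigr)
          + \lambda \, \lVert f \rVert_{\mathcal{H}}^{2} \right\}
```

Under this convention, ρ(t) = t² gives the least squares estimator and ρ(t) = |t| gives the least absolute deviations estimator, the two estimators compared in the paper's figures.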

What carries the argument

The novel complexity measure that governs the explicit bias-variance decomposition in the convergence rates for regularized M-estimators.

If this is right

  • Consistency holds for convex and non-convex losses including bounded robust losses.
  • Rates connect tensor product Sobolev spaces to spaces with dominating mixed smoothness.
  • The methodology allows asymptotic linearisation without closed-form solutions or global Lipschitz assumptions.
  • The estimators are implemented in C++, and the theory is supported by numerical experiments.
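
To make the notion of a regularized M-estimator concrete, here is a minimal sketch in Python. It is not the paper's C++ implementation, which this page does not show. It assumes a Gaussian kernel, the Huber loss, and iteratively reweighted least squares (IRLS) as the solver; the function names, the bandwidth, and the tuning constant 1.345 are all illustrative choices.

```python
import math


def gaussian_kernel(x, y, bandwidth=0.2):
    """Gaussian kernel on the real line (an illustrative choice of RKHS)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def solve_linear(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z


def huber_weight(r, c=1.345):
    """IRLS weight psi(r) / r for the Huber loss with tuning constant c."""
    return 1.0 if abs(r) <= c else c / abs(r)


def fit_kernel_m_estimator(xs, ys, lam=1e-2, n_iter=50, weight=huber_weight):
    """Approximately minimise (1/n) sum rho(y_i - f(x_i)) + lam * ||f||^2
    over f = sum_j alpha_j k(., x_j), where rho(r) = r^2 / 2 near zero.

    Each pass solves the weighted ridge system (W K + 2 n lam I) alpha = W y,
    which is the stationarity condition with the weights W held fixed.
    """
    n = len(xs)
    gram = [[gaussian_kernel(xi, xj) for xj in xs] for xi in xs]
    alpha = [0.0] * n
    for _ in range(n_iter):
        fitted = [sum(gram[i][j] * alpha[j] for j in range(n)) for i in range(n)]
        w = [weight(ys[i] - fitted[i]) for i in range(n)]
        system = [[w[i] * gram[i][j] + (2.0 * n * lam if i == j else 0.0)
                   for j in range(n)] for i in range(n)]
        alpha = solve_linear(system, [w[i] * ys[i] for i in range(n)])
    return alpha


def predict(alpha, xs, x_new):
    """Evaluate the fitted function f(x_new) = sum_j alpha_j k(x_j, x_new)."""
    return sum(a * gaussian_kernel(x, x_new) for a, x in zip(alpha, xs))
```

Passing `weight=lambda r: 1.0` makes every IRLS pass identical and recovers kernel ridge regression, which is a convenient baseline when checking how much a single outlier moves the fit.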

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar bias-variance splits may apply to other function spaces beyond Sobolev.
  • The independence of variance from misspecification could simplify model checking in practice.
  • Extensions to non-regularized settings or different regularizers might follow from the linearisation technique.

Load-bearing premise

The loss function satisfies mild conditions that guarantee existence, measurability, and permit an asymptotic linearisation of the objective.
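
The loss classes named on this page (convex, non-convex, bounded robust) can be illustrated with four standard examples. The page does not list which losses the paper treats beyond least squares and least absolute deviations, so the Huber and Tukey bisquare losses and their default tuning constants below are assumptions chosen for illustration.

```python
def squared_loss(r):
    """Least squares loss r^2 / 2: convex, smooth, unbounded."""
    return 0.5 * r * r


def absolute_loss(r):
    """Least absolute deviations loss |r|: convex, non-smooth at zero."""
    return abs(r)


def huber_loss(r, c=1.345):
    """Huber loss: quadratic for |r| <= c, linear beyond; convex, unbounded."""
    a = abs(r)
    return 0.5 * a * a if a <= c else c * a - 0.5 * c * c


def tukey_bisquare_loss(r, c=4.685):
    """Tukey bisquare loss: non-convex and bounded above by c^2 / 6."""
    if abs(r) >= c:
        return c * c / 6.0
    u = 1.0 - (r / c) ** 2
    return (c * c / 6.0) * (1.0 - u ** 3)
```

The bisquare loss is constant for residuals beyond c, so a gross outlier contributes a fixed amount to the objective no matter how far away it lies. That same flatness is what makes the loss non-convex.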

What would settle it

A counterexample where the variance term of the estimator depends on the degree of misspecification would disprove the claimed independence.
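
For squared loss the independence is elementary, which gives a baseline for what a counterexample would have to break. The calculation below is a standard one and not the paper's argument, which concerns general losses. It assumes a fixed design, the penalized objective (1/n) Σ (Y_i − f(X_i))² + λ‖f‖², Gram matrix K, and errors with mean zero given the design.

```latex
% Model: Y = f_0(X) + \varepsilon, with E[\varepsilon \mid X] = 0.
% Fitted values at the design points for squared loss:
\hat{f} = S_\lambda Y, \qquad S_\lambda = K (K + n \lambda I)^{-1}.
% Centering removes f_0 entirely:
\hat{f} - E[\hat{f} \mid X]
  = S_\lambda \bigl( f_0(X) + \varepsilon \bigr) - S_\lambda f_0(X)
  = S_\lambda \varepsilon .
```

The stochastic part S_λ ε involves only the kernel, λ and the errors, so it is the same whether or not f_0 lies in the RKHS. For a nonlinear M-estimator no such exact cancellation is available, which is why the claim has to be carried by the linearisation.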

Figures

Figures reproduced from arXiv: 2606.22993 by Ioannis Kalogridis.

Figure 1: 20 estimates with β = 0.5 and Gaussian errors of the least squares (left) and least absolute deviations (right) estimators. The solid black line represents the true regression function.
Figure 2: 20 representative estimates with β = 0.5 and t₂ errors of the least squares (left) and least absolute deviations (right) estimators. The solid black line represents the true regression function.
  Adjacent text (Section 6, Numerical illustrations): In our numerical experiments, we study the effects of (i) heavy-tailed errors {ϵ_i}, i = 1, …, n, (ii) target function roughness and (iii) dimensionality on the estimates. The estimators to be comp…
Figure 3: Contours of the two-dimensional true regression function (left) and representative least …
Figure 4: Contours of the two-dimensional true regression function (left) and representative least …
Original abstract

We develop a comprehensive theory for regularized M-estimation in reproducing kernel Hilbert spaces. Under mild conditions on the loss we establish existence and measurability of the estimator, covering a wide range of convex and non-convex losses, including bounded robust losses. We further prove sharp rates of convergence with an explicit bias-variance decomposition governed by a novel complexity measure. We show that the variance is independent of misspecification, while the bias depends on a source condition parameter known in the learning literature. For tensor product Sobolev spaces we obtain new rates that connect to spaces of functions with dominating mixed smoothness, substantially extending existing results and explaining why these estimators circumvent the curse of dimensionality. Our methodology, combining elements from both functional analysis and empirical process theory, allows for an asymptotic linearisation of the objective function that avoids both closed-form solutions and global Lipschitz assumptions, and may be of independent interest. The estimators are implemented in C++ and theory is supported by numerical experiments.
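
The tensor product construction mentioned in the abstract can be sketched in a few lines. This is a hedged illustration: the page does not state which Sobolev kernels or smoothness orders the paper uses, so the first-order kernel 1 + min(x, y) on [0, 1] and all function names below are assumptions.

```python
def sobolev1_kernel(x, y):
    """Reproducing kernel 1 + min(x, y) of the first-order Sobolev space on
    [0, 1] with squared norm f(0)^2 + integral of f'(t)^2 over [0, 1]."""
    return 1.0 + min(x, y)


def tensor_product_kernel(x, y, base_kernel=sobolev1_kernel):
    """Product of one-dimensional kernels over the coordinates of x and y.

    The RKHS of this kernel is the tensor product of the univariate spaces.
    """
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    value = 1.0
    for xj, yj in zip(x, y):
        value *= base_kernel(xj, yj)
    return value


def gram_matrix(points, kernel=tensor_product_kernel):
    """Gram matrix of the kernel over a list of d-dimensional points."""
    return [[kernel(p, q) for q in points] for p in points]
```

Because the multivariate kernel is a product of univariate ones, smoothness is imposed coordinate by coordinate. That is the structure the abstract ties to dominating mixed smoothness.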

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript develops a comprehensive theory for regularized M-estimation in reproducing kernel Hilbert spaces. Under mild conditions on the loss function it establishes existence, measurability, and consistency of the estimator for both convex and non-convex losses (including bounded robust losses). It proves sharp rates of convergence together with an explicit bias-variance decomposition governed by a novel complexity measure; the variance term is stated to be independent of misspecification while the bias depends on a source-condition parameter. For tensor-product Sobolev spaces the paper derives new rates that connect to spaces of dominating mixed smoothness and thereby avoid the curse of dimensionality. The key technical device is an asymptotic linearisation of the regularized objective that combines functional analysis and empirical-process arguments and does not require closed-form solutions or global Lipschitz conditions.

Significance. If the central derivations hold, the work supplies a unified framework for nonparametric M-estimation in RKHS that covers non-convex losses, furnishes an explicit bias-variance split, and yields dimension-independent rates via mixed-smoothness spaces. The asymptotic-linearisation technique may be of independent methodological interest. The combination of functional-analytic and empirical-process tools, together with the novel complexity measure, strengthens the literature on kernel methods beyond the usual convex, Lipschitz setting.

major comments (2)
  1. [Abstract / methodology paragraph] The claim that the variance term in the bias-variance decomposition is independent of misspecification rests on an asymptotic linearisation that is asserted to hold under only mild loss conditions. For non-convex losses the linearisation is necessarily local around the population minimizer; because that minimizer itself shifts with the degree of misspecification, the leading stochastic term (and hence the variance) can inherit dependence on misspecification unless the paper supplies uniform control on the Hessian or an explicit decoupling argument. The manuscript must state the precise conditions that guarantee this independence and verify them for the bounded robust losses cited.
  2. [Section introducing the complexity measure (exact section number not visible in abstract)] The novel complexity measure is introduced as governing the rates; the manuscript should clarify whether this measure is defined independently of the estimator or whether its construction implicitly uses properties of the regularized M-estimator, as any circularity would undermine the claimed parameter-free character of the variance term.
minor comments (1)
  1. The abstract states that numerical experiments support the theory; a short description of the experimental design, loss functions tested, and observed rates would help readers assess the practical reach of the results.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address each major comment below.

Point-by-point responses
  1. Referee: [Abstract / methodology paragraph] The claim that the variance term in the bias-variance decomposition is independent of misspecification rests on an asymptotic linearisation that is asserted to hold under only mild loss conditions. For non-convex losses the linearisation is necessarily local around the population minimizer; because that minimizer itself shifts with the degree of misspecification, the leading stochastic term (and hence the variance) can inherit dependence on misspecification unless the paper supplies uniform control on the Hessian or an explicit decoupling argument. The manuscript must state the precise conditions that guarantee this independence and verify them for the bounded robust losses cited.

    Authors: The asymptotic linearisation holds under loss conditions that provide uniform Hessian control in a neighborhood of the population minimizer whose size is independent of misspecification; for the bounded robust losses this follows from global Lipschitz continuity together with a uniform bound on the second derivative. These conditions are already stated in the paper but we will add an explicit remark in the bias-variance section that isolates the decoupling argument and verifies it for the cited robust losses.
    Revision: yes

  2. Referee: [Section introducing the complexity measure (exact section number not visible in abstract)] The novel complexity measure is introduced as governing the rates; the manuscript should clarify whether this measure is defined independently of the estimator or whether its construction implicitly uses properties of the regularized M-estimator, as any circularity would undermine the claimed parameter-free character of the variance term.

    Authors: The complexity measure is constructed solely from the RKHS geometry and the population risk (via entropy integrals over sublevel sets of the risk) and makes no reference to any estimator or sample. We will insert a short clarifying sentence in the section that introduces the measure to state this independence explicitly.
    Revision: yes

Circularity Check

0 steps flagged

No circularity; derivations rely on external functional analysis and empirical process theory

Full rationale

The paper's central results—an asymptotic linearisation of the regularized M-estimator, existence/measurability under mild loss conditions, and an explicit bias-variance decomposition governed by a novel complexity measure—are presented as obtained by combining standard tools from functional analysis and empirical process theory. The source condition is explicitly described as known in the learning literature rather than derived internally, and the complexity measure is introduced as new rather than fitted or self-referential. No self-citation load-bearing steps, self-definitional reductions, or renamings of known results appear in the provided claims or abstract. The derivation chain is therefore self-contained against external benchmarks and does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Central claims rest on mild conditions on the loss and the introduction of a novel complexity measure; the source condition parameter is referenced as already known in the learning literature.

axioms (1)
  • [domain assumption] Mild conditions on the loss function
    Invoked to guarantee existence, measurability, and asymptotic linearisation for both convex and non-convex losses including bounded robust losses.
invented entities (1)
  • Novel complexity measure [no independent evidence]
    purpose: Governs the explicit bias-variance decomposition and controls the sharp rates of convergence
    Introduced to obtain the rates; no independent evidence or external validation supplied in the abstract.


