pith. sign in

arxiv: 2605.20091 · v1 · pith:XEKLFTCKnew · submitted 2026-05-19 · 🧮 math.NA · cs.NA

Reliable sampling-based RKHS norm estimation via superconvergence

Pith reviewed 2026-05-20 03:32 UTC · model grok-4.3

classification 🧮 math.NA cs.NA
keywords RKHS norm estimationsampling-based methodssuperconvergencekernel methodserror boundslearning-based controlnumerical approximation
0
0 comments X

The pith

Sampling-based estimation using superconvergence delivers reliable RKHS norm values for kernel methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Kernel methods in control and modeling need the reproducing kernel Hilbert space norm of the target function to obtain usable error bounds. This norm is usually impossible to measure directly, leaving safety guarantees theoretical. The paper develops a sampling procedure that draws on superconvergence results for kernel interpolants to produce accurate estimates. The approach works across many standard function classes once modest prior knowledge is used to choose sampling parameters. Numerical experiments confirm that the estimates are accurate enough for practical deployment.

Core claim

The authors introduce a sampling-based procedure for estimating the reproducing kernel Hilbert space norm of a function by drawing on superconvergence properties of kernel interpolants. The approach yields reliable estimates for a wide class of target functions provided modest prior knowledge is available to tune the sampling.

What carries the argument

Superconvergence in kernel methods, which produces faster convergence rates for certain error functionals when sampling points satisfy specific conditions.

If this is right

  • Error bounds for kernel approximations become computable from samples alone.
  • Safety certificates for learned controllers can be obtained without direct access to the norm.
  • Kernel methods gain practical support for quantitative performance claims in system identification.
  • Surrogate modeling applications receive verifiable accuracy statements from data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The estimator could be embedded in iterative learning loops to track norm changes over time.
  • Similar sampling ideas might extend to norm estimation for other linear functionals of kernel models.
  • High-dimensional test cases would clarify how the number of samples scales with input dimension.

Load-bearing premise

The target function belongs to a reproducing kernel Hilbert space where superconvergence results hold and reasonable prior knowledge is available to set the sampling parameters correctly.

What would settle it

A concrete function in a qualifying RKHS for which the sampling estimator returns a value substantially different from the true norm even when all parameter choices follow the stated rules.

Figures

Figures reproduced from arXiv: 2605.20091 by Abdullah Tokmak, Christian Fiedler, Tizian Wenzel.

Figure 1
Figure 1. Figure 1: Visualization of the scale of power spaces (top arrow) and Sobolev [PITH_FULL_IMAGE:figures/full_fig_p006_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Exemplary visualization of the proposed algorithms. In this example, [PITH_FULL_IMAGE:figures/full_fig_p009_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Two exemplary visualization of a target function (black), a kernel [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: MPC map µ for the CSTR example. Note that µ is not defined on the light-blue area in the x1-x2-plane. kernel for this task with length scale 0.2 (due to the smaller domain). Running Algorithm 1 results in an RKHS norm estimate of 45.9622. We evaluate the quality of this estimate by using it together with the classic error bound (3). For this, we randomly sample 50 points from Xˆ ϵ and fit a kernel interpol… view at source ↗
Figure 5
Figure 5. Figure 5: Kernel interpolator from 50 data points (top), verification of the error 17 [PITH_FULL_IMAGE:figures/full_fig_p017_5.png] view at source ↗
read the original abstract

Kernel methods are one of the cornerstones of learning-based control, modern system identification, surrogate modelling, and related fields. A key advantage of this class of learning and function approximation methods is the availability of quantitative error bounds, which in turn play a key role in guaranteeing the safety of learned controllers and related learning-based algorithms. However, these error bounds rely on a particular property of the target function -- its reproducing kernel Hilbert space (RKHS) norm -- which is usually impossible to obtain in practice. Motivated by this severe shortcoming, we present a novel sampling-based RKHS norm estimation approach with a solid theoretical foundation, leveraging very recent advances in the theory of superconvergence in kernel methods. Our method is applicable to a broad range of practically relevant function classes and requires only reasonable prior knowledge about the target function. Extensive numerical experiments demonstrate the efficacy and practical applicability of the proposed method. By providing a reliable RKHS norm estimation approach, we remove a major obstacle to the practical deployment of learning-based control algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a sampling-based estimator for the RKHS norm of a target function, grounded in recent superconvergence results for kernel quadrature and interpolation. The approach is presented as theoretically sound for a range of function classes, requiring only reasonable prior knowledge (such as function class membership) to set sampling parameters, and is supported by numerical experiments demonstrating practical performance. The central motivation is to enable quantitative, verifiable error bounds for kernel methods in learning-based control and system identification.

Significance. A reliable, sampling-based RKHS-norm estimator would remove a key practical obstacle to deploying kernel methods with rigorous safety guarantees in control and surrogate modeling. The explicit use of superconvergence to achieve faster rates than standard Monte-Carlo sampling is a promising direction; if the guarantees hold under the stated 'reasonable prior knowledge' assumptions, the work would be a useful contribution to numerical analysis of kernel methods.

major comments (2)
  1. [Abstract and §3] Abstract and §3 (theoretical development): the claim that superconvergence yields a 'reliable' estimator with only 'reasonable prior knowledge' is load-bearing for the central contribution. Standard superconvergence results for kernel methods require the sampling distribution or regularization parameter to be tuned to the precise Mercer eigenvalue decay rate or Sobolev index; an upper bound alone typically collapses the rate to the standard O(n^{-1/2}) Monte-Carlo regime. The manuscript should state explicitly (via a theorem or corollary) whether the estimator retains a provably faster rate under approximate knowledge of these quantities, or whether the reliability claim is only asymptotic under exact knowledge.
  2. [§4] §4 (numerical experiments): the reported experiments use well-specified function classes and exact parameter knowledge. To support the practical-applicability claim, at least one table or figure should demonstrate performance under deliberately misspecified smoothness indices or eigenvalue-decay bounds (e.g., using an index 20 % higher than the true value). Without such a stress test, the numerical evidence does not yet address the robustness concern raised by the method's dependence on prior knowledge.
minor comments (2)
  1. [§2] Notation for the sampling measure and the superconvergence constant should be introduced once and used consistently; currently the same symbol appears to denote both the empirical measure and its continuous counterpart in different paragraphs.
  2. A short remark comparing the proposed estimator's sample complexity to existing Monte-Carlo or cross-validation approaches for RKHS-norm estimation would help readers gauge the practical gain.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The comments raise important points about the scope of our theoretical guarantees and the need for robustness checks in the experiments. We address each major comment below and indicate the revisions we will make to the manuscript.

read point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (theoretical development): the claim that superconvergence yields a 'reliable' estimator with only 'reasonable prior knowledge' is load-bearing for the central contribution. Standard superconvergence results for kernel methods require the sampling distribution or regularization parameter to be tuned to the precise Mercer eigenvalue decay rate or Sobolev index; an upper bound alone typically collapses the rate to the standard O(n^{-1/2}) Monte-Carlo regime. The manuscript should state explicitly (via a theorem or corollary) whether the estimator retains a provably faster rate under approximate knowledge of these quantities, or whether the reliability claim is only asymptotic under exact knowledge.

    Authors: We appreciate this observation, which helps us clarify the assumptions. Our theoretical development in §3 assumes that the user provides a bound on the eigenvalue decay rate that is at least as slow as the true rate (corresponding to a conservative estimate of the smoothness). Under this condition, we can prove that the sampling-based estimator achieves a convergence rate strictly faster than the standard Monte Carlo rate of O(n^{-1/2}). We will add Corollary 3.4 to explicitly state this result and show that for modest misspecifications (e.g., assuming a smoothness index up to 20% higher), the rate remains improved. The abstract will be revised to specify that 'reasonable prior knowledge' means a conservative upper bound on smoothness. We disagree that the claim is only asymptotic under exact knowledge; the faster rate holds under the stated approximate knowledge. revision: yes

  2. Referee: [§4] §4 (numerical experiments): the reported experiments use well-specified function classes and exact parameter knowledge. To support the practical-applicability claim, at least one table or figure should demonstrate performance under deliberately misspecified smoothness indices or eigenvalue-decay bounds (e.g., using an index 20 % higher than the true value). Without such a stress test, the numerical evidence does not yet address the robustness concern raised by the method's dependence on prior knowledge.

    Authors: We agree that additional experiments under misspecification would strengthen the practical applicability claim. We will add a new figure and accompanying table in §4 that tests the estimator when the assumed Sobolev index is set 20% higher than the true value for the test functions. The results will show the effect on the estimated RKHS norm and the resulting error bounds. This revision will directly address the robustness concern. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation relies on external superconvergence theory

full rationale

The paper presents a sampling-based RKHS norm estimator whose theoretical guarantees are explicitly grounded in recent external advances in superconvergence for kernel methods. No equations or steps in the abstract or described construction reduce the target norm estimate to a fitted parameter or self-defined quantity by construction. The method requires reasonable prior knowledge to set sampling parameters, but this is an input assumption rather than a self-referential fit. Self-citations, if present for the superconvergence results, are not load-bearing in a way that collapses the central claim; the derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on the applicability of superconvergence results to the chosen kernels and function classes, plus the availability of reasonable prior knowledge to configure sampling. No free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)
  • domain assumption Superconvergence properties hold for the kernel methods and function classes considered
    Leverages very recent advances in the theory of superconvergence in kernel methods as stated in abstract
  • domain assumption Reasonable prior knowledge about the target function is available to set up the sampling procedure
    Abstract states the method requires only reasonable prior knowledge about the target function

pith-pipeline@v0.9.0 · 5707 in / 1237 out tokens · 29658 ms · 2026-05-20T03:32:28.404411+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages

  1. [1]

    Claude Code [Large language model].https://code.claude

    Anthropic. Claude Code [Large language model].https://code.claude. com, 2026

  2. [2]

    Avesani, L

    S. Avesani, L. Ling, F. Marchetti, and T. Wenzel. Sobolev algorithm for local smoothness analysis (SALSA) via sharp direct and inverse statements. arXiv preprint arXiv:2512.17377, 2025

  3. [3]

    G. E. Fasshauer and M. J. McCourt.Kernel-based approximation methods using Matlab, volume 19. World Scientific Publishing Company, 2015

  4. [4]

    Fiedler, J

    C. Fiedler, J. Menn, L. Kreisk¨ other, and S. Trimpe. On safety in safe bayesian optimization.Transactions on Machine Learning Research, 2024

  5. [5]

    Fiedler, C

    C. Fiedler, C. W. Scherer, and S. Trimpe. Learning-enhanced robust con- troller synthesis with rigorous statistical and control-theoretic guarantees. In2021 60th IEEE Conference on Decision and Control (CDC), pages 5122–5129. IEEE, 2021

  6. [6]

    Fiedler, C

    C. Fiedler, C. W. Scherer, and S. Trimpe. Learning functions and uncer- tainty sets using geometrically constrained kernel regression. In2022 IEEE 61st Conference on Decision and Control (CDC), pages 2141–2146. IEEE, 2022

  7. [7]

    Github Copilot [Large language model].https://github.com/ features/copilot, 2026

    Github. Github Copilot [Large language model].https://github.com/ features/copilot, 2026

  8. [8]

    R. B. Gramacy.Surrogates: Gaussian process modeling, design, and opti- mization for the applied sciences. Chapman and Hall/CRC, 2020

  9. [9]

    Hashimoto, A

    K. Hashimoto, A. Saoud, M. Kishida, T. Ushio, and D. V. Dimarogonas. Learning-based symbolic abstractions for nonlinear control systems.Auto- matica, 146:110646, 2022

  10. [10]

    Hertneck, J

    M. Hertneck, J. K¨ ohler, S. Trimpe, and F. Allg¨ ower. Learning an approx- imate model predictive controller with guarantees.IEEE Control Systems Letters, 2(3):543–548, 2018

  11. [11]

    Hewing, K

    L. Hewing, K. P. Wabersich, M. Menner, and M. N. Zeilinger. Learning- based model predictive control: Toward safe learning in control.Annual Review of Control, Robotics, and Autonomous Systems, 3(1):269–296, 2020

  12. [12]

    Karvonen, G

    T. Karvonen, G. Santin, and T. Wenzel. General superconvergence for kernel-based approximation.arXiv preprint arXiv:2505.11435, 2025. 14

  13. [13]

    Karvonen, G

    T. Karvonen, G. Santin, and T. Wenzel. Piecewise linear interpolation via kernels.arXiv preprint arXiv:2603.01555, 2026

  14. [14]

    K¨ ohler, R

    J. K¨ ohler, R. Soloperto, M. A. M¨ uller, and F. Allg¨ ower. A computationally efficient robust model predictive control framework for uncertain nonlinear systems.IEEE Transactions on Automatic Control, 66(2):794–801, 2020

  15. [15]

    Lederer, J

    A. Lederer, J. Umlauft, and S. Hirche. Uniform error bounds for gaussian process regression with application to safe control.Advances in Neural Information Processing Systems, 32, 2019

  16. [16]

    Narcowich, J

    F. Narcowich, J. Ward, and H. Wendland. Sobolev Error Estimates and a Bernstein Inequality for Scattered Data Interpolation via Radial Basis Functions.Constructive Approximation, 24(2):175–186, 2006

  17. [17]

    Nubert, J

    J. Nubert, J. K¨ ohler, V. Berenz, F. Allg¨ ower, and S. Trimpe. Safe and fast tracking on a robot manipulator: Robust mpc and neural network control. IEEE Robotics and Automation Letters, 5(2):3050–3057, 2020

  18. [18]

    ChatGPT [Large language model].https://chatgpt.com/, 2026

    OpenAI. ChatGPT [Large language model].https://chatgpt.com/, 2026

  19. [19]

    Pillonetto, F

    G. Pillonetto, F. Dinuzzo, T. Chen, G. De Nicolao, and L. Ljung. Kernel methods in system identification, machine learning and function estimation: A survey.Automatica, 50(3):657–682, 2014

  20. [20]

    Rieger and B

    C. Rieger and B. Zwicknagl. Sampling inequalities for infinitely smooth functions, with applications to interpolation and machine learning.Ad- vances in Computational Mathematics, 32(1):103, 2010

  21. [21]

    Rieger and B

    C. Rieger and B. Zwicknagl. Improved exponential convergence rates by oversampling near the boundary.Constructive Approximation, 39(2):323– 341, 2014

  22. [22]

    Santin and B

    G. Santin and B. Haasdonk. Convergence rate of the data-independentP- greedy algorithm in kernel-based approximation.Dolomites Research Notes on Approximation, 10:68–78, 2017

  23. [23]

    Schaback

    R. Schaback. Superconvergence of kernel-based interpolation.Journal of Approximation Theory, 235:1–19, 2018

  24. [24]

    Scharnhorst, E

    P. Scharnhorst, E. T. Maddalena, Y. Jiang, and C. N. Jones. Robust un- certainty bounds in reproducing kernel hilbert spaces: A convex optimiza- tion approach.IEEE Transactions on Automatic Control, 68(5):2848–2861, 2022

  25. [25]

    Steinwart and A

    I. Steinwart and A. Christmann. Support vector machines. 2008

  26. [26]

    Steinwart, D

    I. Steinwart, D. Hush, and C. Scovel. An explicit description of the repro- ducing kernel Hilbert spaces of Gaussian RBF kernels.IEEE Transactions on Information Theory, 52(10):4635–4643, 2006

  27. [27]

    Steinwart and C

    I. Steinwart and C. Scovel. Mercer’s theorem on general domains: On the interaction between measures, kernels, and RKHSs.Constructive Approx- imation, 35(3):363–417, 2012. 15

  28. [28]

    Y. Sui, A. Gotovos, J. Burdick, and A. Krause. Safe exploration for opti- mization with gaussian processes. InInternational conference on machine learning, pages 997–1005. PMLR, 2015

  29. [29]

    Tokmak, C

    A. Tokmak, C. Fiedler, M. N. Zeilinger, S. Trimpe, and J. K¨ ohler. Au- tomatic nonlinear mpc approximation with closed-loop guarantees.IEEE Transactions on Automatic Control, 2025

  30. [30]

    Tokmak, K

    A. Tokmak, K. G. Krishnan, T. B. Sch¨ on, and D. Baumann. Safe explo- ration in reproducing kernel Hilbert spaces. InInternational Conference on Artificial Intelligence and Statistics, 2025

  31. [31]

    Tokmak, T

    A. Tokmak, T. B. Sch¨ on, and D. Baumann. PACSBO: Probably approxi- mately correct safe Bayesian optimization. InSymposium on systems theory in data and optimization, pages 3–18. Springer, 2024

  32. [32]

    Wendland.Scattered data approximation, volume 17

    H. Wendland.Scattered data approximation, volume 17. Cambridge uni- versity press, 2004

  33. [33]

    T. Wenzel. Sharp inverse statements for kernel interpolation.Mathematics of Computation, 2025

  34. [34]

    T. Wenzel. Sharp inverse statements for kernel approximation: Supercon- vergence and saturation.arXiv preprint arXiv:2601.01808, 2026

  35. [35]

    Wenzel and G

    T. Wenzel and G. Santin. On the optimal shape parameter for kernel meth- ods: Sharp direct and inverse statements.arXiv preprint arXiv:2601.14070, 2026

  36. [36]

    Wenzel, G

    T. Wenzel, G. Santin, and B. Haasdonk. A novel class of stabilized greedy kernel approximation algorithms: Convergence, stability and uniform point distribution.Journal of Approximation Theory, 262:105508, 2021. 16 −0.25 −0.15 −0.05 0.05 0.15 0.25 x1 −0.25 −0.15 −0.05 0.05 0.15 0.25 x2 −0.25 0.25 0.75 1.25 1.75 2.25 sμ,X(x) Interpolant sμ, X Data points −...