pith. sign in

arxiv: 2605.18134 · v1 · pith:KZUAJWTPnew · submitted 2026-05-18 · 📊 stat.CO · stat.ME

Optimal Sampling for Kernel Quadrature on Unbounded Domains

Pith reviewed 2026-05-20 00:00 UTC · model grok-4.3

classification 📊 stat.CO stat.ME
keywords kernel quadratureworst-case errorminimax ratesrandomized samplingunbounded domainssampling distributionrobust quadrature
0
0 comments X

The pith

An explicit n-dependent sampling distribution achieves minimax rates for kernel quadrature on unbounded domains without needing the kernel.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs a randomized sampling method for approximating integrals of smooth functions via kernel quadrature. The method uses a distribution that changes explicitly with the number of samples n, delivering the fastest possible worst-case error decay over functions with given smoothness, and it does so without any dependence on the particular kernel chosen. This holds for integrals against unbounded measures such as Gaussian and Student-t distributions. A sympathetic reader would value the result because it separates the achievement of optimal theoretical rates from the practical risk of kernel misspecification, offering a single recipe that remains reliable across different smoothness assumptions and domains.

Core claim

The authors construct an explicit, n-dependent sampling distribution for randomized kernel quadrature that achieves the minimax rate of n to the power minus alpha over d for the worst-case error over the smoothness class alpha in dimension d. This construction is kernel-agnostic and applies to unbounded sampling measures including the Gaussian and Student-t distributions, thereby providing both theoretical guarantees and a practical recipe for robust rate-optimal quadrature.

What carries the argument

The explicit n-dependent sampling distribution, which is designed to match the minimax worst-case error rate over smoothness classes independently of any kernel.

If this is right

  • Randomized quadrature methods can attain optimal rates while remaining insensitive to kernel choice.
  • The approach extends rate-optimal performance from compact to unbounded domains such as those with Gaussian or Student-t measures.
  • A single sampling recipe suffices for multiple smoothness classes without retuning.
  • Practical implementation becomes feasible without requiring kernel-specific point sets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same construction principle could be tested on other unbounded measures, such as Cauchy distributions, to check whether the minimax rate still holds.
  • In applications with uncertain smoothness, the kernel-agnostic property might allow adaptive refinement of n without restarting the design.
  • The method may reduce the need for cross-validation over kernels in high-dimensional integration tasks.

Load-bearing premise

There exists an explicit n-dependent sampling distribution whose worst-case error matches the minimax rate over the smoothness class and does so independently of the kernel, for the chosen unbounded measures.

What would settle it

Generate samples from the proposed n-dependent distribution for increasing n, compute the empirical worst-case error for a fixed smoothness class alpha in moderate dimension d, and check whether the observed decay falls short of the rate n to the power minus alpha over d.

Figures

Figures reproduced from arXiv: 2605.18134 by Christian Robert (CEREMADE), Edoardo Bandoni (CEREMADE), Julien Stoehr (CEREMADE).

Figure 1
Figure 1. Figure 1: Trace plots of the Gaussian kernel hyperparameters obtained via a Metropolis-within￾Gibbs sampler with n = 100 points. A Markov chain of T = 1000 iterations was generated, with the first 200 samples discarded as burn-in, leaving 800 retained samples shown. The left panel displays the trace of the length-scale parameter ℓ, sampled using a random￾walk Metropolis–Hastings step with proposal step size sℓ = 0.2… view at source ↗
Figure 2
Figure 2. Figure 2: Left: posterior mean (solid lines) with ±2 standard deviation bands over 70 sequential design iterations for two sampling strategies: standard Gaussian and Qn; the true integral value is shown as a dashed red line. Right: posterior variance (log scale) versus the number of evaluations for both strategies. To propagate uncertainty from both the integrand and the random design, we repeat the experiment N tim… view at source ↗
Figure 3
Figure 3. Figure 3: Left: average posterior mean (solid lines) with 95% uncertainty bands over sequential de￾sign iterations, computed from 1,000 independent repetitions, for two sampling strategies: standard Gaussian and Qn; the true integral value is shown as a dashed red line. Right: total posterior variance versus the number of evaluations for both strategies. It is important to note that the Gaussian kernel induces a rep… view at source ↗
Figure 4
Figure 4. Figure 4: Left: posterior mean (solid lines) with ±2 standard deviation bands over 250 sequential design iterations for two sampling strategies: standard Gaussian and Qn; the true integral value is shown as a dashed red line. Right: posterior variance (log scale) versus the number of evaluations for both strategies. As in the previous experiments, the kernel hyperparameters are estimated in the same manner, but now … view at source ↗
Figure 5
Figure 5. Figure 5: Left: average posterior mean (solid lines) with 95% uncertainty bands over sequential de￾sign iterations, computed from 1,000 independent repetitions, for two sampling strategies: standard Gaussian and Qn; the true integral value is shown as a dashed red line. Right: total posterior variance versus the number of evaluations for both strategies. 5.4 Matérn Kernel and Student-t Measure We now consider a sett… view at source ↗
Figure 6
Figure 6. Figure 6: Left: posterior mean (solid lines) with ±2 standard deviation bands over the last 400 sequential design iterations for two sampling strategies: standard t4.49 and inflated t4.49; the true integral value is shown as a dashed red line. Right: posterior variance (log scale) versus the number of evaluations for both strategies [PITH_FULL_IMAGE:figures/full_fig_p015_6.png] view at source ↗
Figure 7
Figure 7. Figure 7 [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Posterior mean with 95% credible intervals and posterior variance across iterations for Gaussian and Matérn 3/2 kernels. 27 [PITH_FULL_IMAGE:figures/full_fig_p027_8.png] view at source ↗
read the original abstract

Kernel quadrature is widely used to approximate integrals of smooth functions, with worst-case error typically decaying at the minimax rate $n^{-\alpha/d}$ for smoothness $\alpha$ in dimension $d$. Existing rate-optimal methods often depend on deterministic point sets tailored to a specific kernel, making them sensitive to misspecification and less robust in practice. In this work, we study randomized quadrature methods with a focus on robustness rather than kernel-specific optimality. We construct an explicit, $n$-dependent sampling distribution that achieves minimax rates for worst-case error over smoothness classes without requiring knowledge of the kernel. This kernel-agnostic design improves robustness while retaining optimal rates. Our analysis includes unbounded sampling measures such as Gaussian and Student-$t$ distributions, extending beyond compact domains. The results provide both theoretical guarantees and a practical recipe for robust, rate-optimal randomized quadrature.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to construct an explicit n-dependent sampling distribution π_n, independent of the kernel, that achieves the minimax worst-case quadrature error rate n^{-α/d} over the unit ball of a smoothness class of order α in dimension d. The analysis extends this construction and rate to unbounded domains equipped with Gaussian and Student-t measures, providing both theoretical guarantees and a practical sampling recipe for robust randomized kernel quadrature.

Significance. If the central claims hold, the work would be a useful contribution to statistical computing by supplying a kernel-agnostic sampling rule that attains optimal rates on non-compact supports. The explicit, n-dependent construction (rather than a fitted or kernel-dependent one) and the extension beyond compact domains are concrete strengths that could improve robustness in applications where the kernel is misspecified.

major comments (2)
  1. [§5] §5 (unbounded-domain analysis): the argument that the tail contribution to the worst-case error under π_n decays at the full minimax rate n^{-α/d} for all functions in the smoothness class is not yet load-bearing; the sketch leaves open an additive term arising when the sampling density is small relative to the target measure far from the origin, and this term must be shown to be o(n^{-α/d}) uniformly over the unit ball without reintroducing kernel dependence.
  2. [Theorem 3] Theorem 3 (main rate statement): the reduction from the randomized quadrature error to the minimax rate over the smoothness class assumes that the explicit π_n can be chosen independently of both the kernel and the precise tail decay of the target measure; a concrete counter-example or additional hypothesis on the measure tails would be needed to confirm this holds for Student-t with low degrees of freedom.
minor comments (2)
  1. [§2] The notation for the smoothness class (e.g., Sobolev or RKHS ball) is introduced only in the abstract and could be restated with a precise norm definition in §2 to improve readability.
  2. [Figure 2] Figure 2 (empirical rates) would benefit from error bars or multiple random seeds to illustrate variability of the proposed sampler.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for the constructive comments. We respond to each major comment below and indicate how we will revise the paper accordingly.

read point-by-point responses
  1. Referee: [§5] §5 (unbounded-domain analysis): the argument that the tail contribution to the worst-case error under π_n decays at the full minimax rate n^{-α/d} for all functions in the smoothness class is not yet load-bearing; the sketch leaves open an additive term arising when the sampling density is small relative to the target measure far from the origin, and this term must be shown to be o(n^{-α/d}) uniformly over the unit ball without reintroducing kernel dependence.

    Authors: We agree that the current presentation in §5 is a sketch and requires a more detailed argument to be fully rigorous. In the revision, we will provide an expanded proof that bounds the tail contribution explicitly. Using the definition of the smoothness class (which controls the behavior of functions and their derivatives at infinity) and the explicit form of π_n (which has sufficient mass in the tails relative to the target measure), we show that this additive term is indeed o(n^{-α/d}) uniformly over the unit ball. Importantly, this bound does not depend on the kernel, as it is derived from the worst-case error over the function class alone. revision: yes

  2. Referee: [Theorem 3] Theorem 3 (main rate statement): the reduction from the randomized quadrature error to the minimax rate over the smoothness class assumes that the explicit π_n can be chosen independently of both the kernel and the precise tail decay of the target measure; a concrete counter-example or additional hypothesis on the measure tails would be needed to confirm this holds for Student-t with low degrees of freedom.

    Authors: The π_n is constructed independently of the kernel, depending only on n, d, and α. For the target measures, including Student-t, the analysis relies on the measure having finite moments of order sufficient for the smoothness class to be embedded appropriately. We will revise Theorem 3 to include an explicit hypothesis on the tail decay of the measure (such as the existence of moments up to a certain order depending on α and d). For Student-t distributions, this holds when the degrees of freedom exceed a threshold determined by α and d, which covers the cases of interest in the paper. We do not believe a counter-example is necessary as the current proof sketch extends to these cases under the stated conditions; however, we will clarify this in the revision. revision: partial

Circularity Check

0 steps flagged

Explicit n-dependent sampling construction for kernel-agnostic quadrature rates shows no load-bearing circularity.

full rationale

The paper's central claim is an explicit construction of a sampling distribution π_n (n-dependent but kernel-independent) whose worst-case error over Sobolev-type smoothness classes matches the minimax rate n^{-α/d} for unbounded measures such as Gaussian and Student-t. No equations or steps in the provided abstract or claims reduce this construction or the rate achievement to a fitted parameter, self-definition, or self-citation chain; the design is presented as derived from robustness considerations rather than kernel-specific optimization. The analysis for tail control on non-compact supports is framed as part of the theoretical guarantee rather than presupposing the result. This yields a self-contained derivation with independent content, consistent with a minimal circularity score.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard assumptions from kernel quadrature theory about function smoothness classes and the ability to define sampling measures on unbounded domains.

axioms (1)
  • domain assumption The target functions lie in a smoothness class of order alpha in dimension d, for which the minimax rate is n^{-alpha/d}.
    The rate optimality claim is defined with respect to this class.

pith-pipeline@v0.9.0 · 5683 in / 1158 out tokens · 35211 ms · 2026-05-20T00:00:43.490572+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 1 internal anchor

  1. [1]

    Adachi, S

    M. Adachi, S. Hayakawa, M. J rgensen, H. Oberhauser, and M. A. Osborne. Fast Bayesian Inference with Batch Bayesian Quadrature via Kernel Recombination . In Advances in Neural Information Processing Systems (NeurIPS 2022), volume 35, pages 16533--16547, 2022

  2. [2]

    R. A. Adams. Sobolev Spaces. Academic Press, New York, 1975

  3. [3]

    F. Bach. On the equivalence between kernel quadrature rules and random feature expansions. Journal of Machine Learning Research, 18 0 (21): 0 1--38, 2017

  4. [4]

    Belhadji, R

    A. Belhadji, R. Bardenet, and P. Chainais. Kernel quadrature with dpps. In Advances in Neural Information Processing Systems, 2019

  5. [5]

    Berlinet and C

    A. Berlinet and C. Thomas-Agnan. Reproducing kernel Hilbert spaces in probability and statistics. Springer Science & Business Media, New York, NY, USA, 2011. ISBN 9781441973104

  6. [6]

    Boucheron, G

    S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, Oxford, UK, 2013

  7. [7]

    Briol, C

    F. Briol, C. J. Oates, M. Girolami, and M. A. Osborne. Frank‑wolfe bayesian quadrature: Probabilistic integration with theoretical guarantees. In Advances in Neural Information Processing Systems, volume 28, pages 1162--1170. NeurIPS, 2015

  8. [8]

    Briol, C

    F.-X. Briol, C. J. Oates, J. Cockayne, W. Y. Chen, and M. Girolami. On the sampling problem for kernel quadrature. In D. Precup and Y. W. Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 586--595. PMLR, 06--11 Aug 2017

  9. [9]

    Briol, C

    F.-X. Briol, C. J. Oates, M. Girolami, M. A. Osborne, and D. Sejdinovic. Probabilistic integration: A role in statistical computation? Statistical Science, 34 0 (1): 0 1--22, 2019. doi:10.1214/18-STS660

  10. [10]

    Z. Chen, M. Naslidnyk, and F.-X. Briol. Nested expectations with kernel quadrature. In Proceedings of the 42nd International Conference on Machine Learning, volume 267 of Proceedings of Machine Learning Research, pages 8760--8793. PMLR, 2025

  11. [11]

    R. Durrett. Probability: Theory and Examples. Cambridge University Press, Cambridge, UK, 5th edition, 2019. ISBN 9781108473682

  12. [12]

    W. Feller. An Introduction to Probability Theory and Its Applications, Volume II. John Wiley & Sons, New York, NY, USA, 2nd edition, 1971. ISBN 9780471257097

  13. [13]

    F. L. Fern \'a ndez, L. Martino, V. Elvira, D. Delgado-G \'o mez, and J. L \'o pez-Santiago. Adaptive quadrature schemes for bayesian inference via active learning. IEEE Access, 8: 0 208462--208483, 2020. doi:10.1109/ACCESS.2020.3038333

  14. [14]

    Fisher, C

    M. Fisher, C. Oates, C. Powell, and A. Teckentrup. A locally adaptive bayesian cubature method. In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 of Proceedings of Machine Learning Research, pages 1265--1275. PMLR, 2020 a

  15. [15]

    Fisher, C

    M. Fisher, C. Oates, C. Powell, and A. Teckentrup. A locally adaptive bayesian cubature method. In S. Chiappa and R. Calandra, editors, Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 of Proceedings of Machine Learning Research, pages 1265--1275. PMLR, 2020 b

  16. [16]

    Gunter, M

    T. Gunter, M. A. Osborne, R. Garnett, P. Hennig, and S. J. Roberts. Sampling for inference in probabilistic models with fast bayesian quadrature. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2789--2797. Curran Associates, Inc., 2014

  17. [17]

    Hayakawa, H

    S. Hayakawa, H. Oberhauser, and T. J. Lyons. Positively weighted kernel quadrature via subsampling. In Advances in Neural Information Processing Systems 35, 2022

  18. [18]

    Hennig, M

    P. Hennig, M. A. Osborne, and M. Girolami. Probabilistic Numerics: Computation as Machine Learning. Cambridge University Press, 2022. doi:10.1017/9781108671879

  19. [19]

    Kanagawa and P

    M. Kanagawa and P. Hennig. Convergence guarantees for adaptive bayesian quadrature methods. In Advances in Neural Information Processing Systems 32, pages 6234--6245. Curran Associates, Inc., 2019

  20. [20]

    Kanagawa, B

    M. Kanagawa, B. K. Sriperumbudur, and K. Fukumizu. Convergence analysis of deterministic kernel-based quadrature rules in misspecified settings. Foundations of Computational Mathematics, 20 0 (1): 0 155--194, 2020 a

  21. [21]

    Kanagawa, B

    M. Kanagawa, B. K. Sriperumbudur, and K. Fukumizu. Convergence analysis of deterministic kernel-based quadrature rules in misspecified settings. Foundations of Computational Mathematics, 20: 0 155--194, 2020 b . doi:10.1007/s10208-018-09407-7

  22. [22]

    Karvonen and S

    T. Karvonen and S. S \"a rkk \"a . Classical quadrature rules via gaussian processes. In Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing (MLSP), pages 31--46, Tokyo, Japan, 2017. doi:10.1109/MLSP.2017.8168195

  23. [23]

    Karvonen, M

    T. Karvonen, M. Kanagawa, and S. S\"arkk\"a. On the positivity and magnitudes of bayesian quadrature weights. Statistics and Computing, 29: 0 1317--1333, 2019. doi:10.1007/s11222-019-09901-0

  24. [24]

    Mahsereci and T

    M. Mahsereci and T. Karvonen. Bayesian quadrature: Gaussian processes for integration. arXiv:2602.16218, 2026. arXiv preprint

  25. [25]

    P. Massart. Concentration Inequalities and Model Selection, volume 1896 of Lecture Notes in Mathematics. Springer, Berlin, Heidelberg, 2007. ISBN 978-3-540-37575-5

  26. [26]

    E. Novak. Deterministic and Stochastic Error Bounds in Numerical Analysis, volume 1349 of Lecture Notes in Mathematics. Springer-Verlag, Berlin, Heidelberg, 1988. ISBN 978-3-540-50368-2

  27. [27]

    M. A. Osborne, R. Garnett, Z. Ghahramani, D. Duvenaud, S. J. Roberts, and C. E. Rasmussen. Active learning of model evidence using bayesian quadrature. In Advances in Neural Information Processing Systems (NeurIPS), volume 25, pages 46--54. Curran Associates, Inc., 2012

  28. [28]

    u her, F. Tronarp, T. Karvonen, S. S \

    J. Pr \"u her, F. Tronarp, T. Karvonen, S. S \"a rkk \"a , and O. Straka. Student-t process quadratures for filtering of non-linear systems with heavy-tailed noise. In 20th International Conference on Information Fusion (FUSION), pages 875--882. IEEE, 2017

  29. [29]

    C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006

  30. [30]

    C. P. Robert and G. Casella. Monte Carlo Statistical Methods. Springer, New York, 2nd edition, 2004

  31. [31]

    G. O. Roberts and J. S. Rosenthal. Harris recurrence of metropolis-within-gibbs and trans-dimensional markov chains. The Annals of Applied Probability, 16 0 (4): 0 2123--2139, Nov. 2006

  32. [32]

    S \"a rkk \"a , J

    S. S \"a rkk \"a , J. Hartikainen, L. Svensson, and F. Sandblom. On the relation between gaussian process quadratures and sigma-point methods. Journal of Advances in Information Fusion, 11 0 (1): 0 31--46, 2016. doi:10.5270/JAIF11-1-204

  33. [33]

    A. Shah, A. G. Wilson, and Z. Ghahramani. Student-t processes as alternatives to gaussian processes. In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS), volume 33 of Proceedings of Machine Learning Research, pages 877--885. PMLR, 2014

  34. [34]

    Simpson, G

    D. Simpson, G. Riutort-Mayol, M. Binois, et al. Marginalising over stationary kernels with B ayesian quadrature. Advances in Neural Information Processing Systems, 2021

  35. [35]

    A. V. Suldin. Wiener measure and its applications to approximation methods. i. Izvestiya Vysshikh Uchebnykh Zavedenii. Matematika, pages 145--158, 1959. in Russian

  36. [36]

    Batch Selection for Parallelisation of Bayesian Quadrature

    E. Wagstaff, S. Hamid, and M. A. Osborne. Batch selection for parallelisation of bayesian quadrature. arXiv preprint arXiv:1812.01553, 2018

  37. [37]

    Wendland

    H. Wendland. Scattered Data Approximation. Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, 2004

  38. [38]

    Wendland and C

    H. Wendland and C. Rieger. Approximate interpolation with applications to selecting smoothing parameters. Numerische Mathematik, 101 0 (4): 0 729--748, 2005. doi:10.1007/s00211-005-0637-y

  39. [39]

    Wu and R

    Z. Wu and R. Schaback. Local error estimates for radial basis function interpolation of scattered data. IMA Journal of Numerical Analysis, 13 0 (1): 0 13--27, 1993

  40. [40]

    Wynne, F.-X

    G. Wynne, F.-X. Briol, and M. Girolami. Convergence guarantees for gaussian process means with misspecified likelihoods and smoothness. Journal of Machine Learning Research, 22 0 (123): 0 1--40, 2021