Optimal Sampling for Kernel Quadrature on Unbounded Domains
Pith reviewed 2026-05-20 00:00 UTC · model grok-4.3
The pith
An explicit n-dependent sampling distribution achieves minimax rates for kernel quadrature on unbounded domains without needing the kernel.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors construct an explicit, n-dependent sampling distribution for randomized kernel quadrature that achieves the minimax rate of n to the power minus alpha over d for the worst-case error over the smoothness class alpha in dimension d. This construction is kernel-agnostic and applies to unbounded sampling measures including the Gaussian and Student-t distributions, thereby providing both theoretical guarantees and a practical recipe for robust rate-optimal quadrature.
What carries the argument
The explicit n-dependent sampling distribution, which is designed to match the minimax worst-case error rate over smoothness classes independently of any kernel.
If this is right
- Randomized quadrature methods can attain optimal rates while remaining insensitive to kernel choice.
- The approach extends rate-optimal performance from compact to unbounded domains such as those with Gaussian or Student-t measures.
- A single sampling recipe suffices for multiple smoothness classes without retuning.
- Practical implementation becomes feasible without requiring kernel-specific point sets.
Where Pith is reading between the lines
- The same construction principle could be tested on other unbounded measures, such as Cauchy distributions, to check whether the minimax rate still holds.
- In applications with uncertain smoothness, the kernel-agnostic property might allow adaptive refinement of n without restarting the design.
- The method may reduce the need for cross-validation over kernels in high-dimensional integration tasks.
Load-bearing premise
There exists an explicit n-dependent sampling distribution whose worst-case error matches the minimax rate over the smoothness class and does so independently of the kernel, for the chosen unbounded measures.
What would settle it
Generate samples from the proposed n-dependent distribution for increasing n, compute the empirical worst-case error for a fixed smoothness class alpha in moderate dimension d, and check whether the observed decay falls short of the rate n to the power minus alpha over d.
Figures
read the original abstract
Kernel quadrature is widely used to approximate integrals of smooth functions, with worst-case error typically decaying at the minimax rate $n^{-\alpha/d}$ for smoothness $\alpha$ in dimension $d$. Existing rate-optimal methods often depend on deterministic point sets tailored to a specific kernel, making them sensitive to misspecification and less robust in practice. In this work, we study randomized quadrature methods with a focus on robustness rather than kernel-specific optimality. We construct an explicit, $n$-dependent sampling distribution that achieves minimax rates for worst-case error over smoothness classes without requiring knowledge of the kernel. This kernel-agnostic design improves robustness while retaining optimal rates. Our analysis includes unbounded sampling measures such as Gaussian and Student-$t$ distributions, extending beyond compact domains. The results provide both theoretical guarantees and a practical recipe for robust, rate-optimal randomized quadrature.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to construct an explicit n-dependent sampling distribution π_n, independent of the kernel, that achieves the minimax worst-case quadrature error rate n^{-α/d} over the unit ball of a smoothness class of order α in dimension d. The analysis extends this construction and rate to unbounded domains equipped with Gaussian and Student-t measures, providing both theoretical guarantees and a practical sampling recipe for robust randomized kernel quadrature.
Significance. If the central claims hold, the work would be a useful contribution to statistical computing by supplying a kernel-agnostic sampling rule that attains optimal rates on non-compact supports. The explicit, n-dependent construction (rather than a fitted or kernel-dependent one) and the extension beyond compact domains are concrete strengths that could improve robustness in applications where the kernel is misspecified.
major comments (2)
- [§5] §5 (unbounded-domain analysis): the argument that the tail contribution to the worst-case error under π_n decays at the full minimax rate n^{-α/d} for all functions in the smoothness class is not yet load-bearing; the sketch leaves open an additive term arising when the sampling density is small relative to the target measure far from the origin, and this term must be shown to be o(n^{-α/d}) uniformly over the unit ball without reintroducing kernel dependence.
- [Theorem 3] Theorem 3 (main rate statement): the reduction from the randomized quadrature error to the minimax rate over the smoothness class assumes that the explicit π_n can be chosen independently of both the kernel and the precise tail decay of the target measure; a concrete counter-example or additional hypothesis on the measure tails would be needed to confirm this holds for Student-t with low degrees of freedom.
minor comments (2)
- [§2] The notation for the smoothness class (e.g., Sobolev or RKHS ball) is introduced only in the abstract and could be restated with a precise norm definition in §2 to improve readability.
- [Figure 2] Figure 2 (empirical rates) would benefit from error bars or multiple random seeds to illustrate variability of the proposed sampler.
Simulated Author's Rebuttal
We thank the referee for their careful reading of the manuscript and for the constructive comments. We respond to each major comment below and indicate how we will revise the paper accordingly.
read point-by-point responses
-
Referee: [§5] §5 (unbounded-domain analysis): the argument that the tail contribution to the worst-case error under π_n decays at the full minimax rate n^{-α/d} for all functions in the smoothness class is not yet load-bearing; the sketch leaves open an additive term arising when the sampling density is small relative to the target measure far from the origin, and this term must be shown to be o(n^{-α/d}) uniformly over the unit ball without reintroducing kernel dependence.
Authors: We agree that the current presentation in §5 is a sketch and requires a more detailed argument to be fully rigorous. In the revision, we will provide an expanded proof that bounds the tail contribution explicitly. Using the definition of the smoothness class (which controls the behavior of functions and their derivatives at infinity) and the explicit form of π_n (which has sufficient mass in the tails relative to the target measure), we show that this additive term is indeed o(n^{-α/d}) uniformly over the unit ball. Importantly, this bound does not depend on the kernel, as it is derived from the worst-case error over the function class alone. revision: yes
-
Referee: [Theorem 3] Theorem 3 (main rate statement): the reduction from the randomized quadrature error to the minimax rate over the smoothness class assumes that the explicit π_n can be chosen independently of both the kernel and the precise tail decay of the target measure; a concrete counter-example or additional hypothesis on the measure tails would be needed to confirm this holds for Student-t with low degrees of freedom.
Authors: The π_n is constructed independently of the kernel, depending only on n, d, and α. For the target measures, including Student-t, the analysis relies on the measure having finite moments of order sufficient for the smoothness class to be embedded appropriately. We will revise Theorem 3 to include an explicit hypothesis on the tail decay of the measure (such as the existence of moments up to a certain order depending on α and d). For Student-t distributions, this holds when the degrees of freedom exceed a threshold determined by α and d, which covers the cases of interest in the paper. We do not believe a counter-example is necessary as the current proof sketch extends to these cases under the stated conditions; however, we will clarify this in the revision. revision: partial
Circularity Check
Explicit n-dependent sampling construction for kernel-agnostic quadrature rates shows no load-bearing circularity.
full rationale
The paper's central claim is an explicit construction of a sampling distribution π_n (n-dependent but kernel-independent) whose worst-case error over Sobolev-type smoothness classes matches the minimax rate n^{-α/d} for unbounded measures such as Gaussian and Student-t. No equations or steps in the provided abstract or claims reduce this construction or the rate achievement to a fitted parameter, self-definition, or self-citation chain; the design is presented as derived from robustness considerations rather than kernel-specific optimization. The analysis for tail control on non-compact supports is framed as part of the theoretical guarantee rather than presupposing the result. This yields a self-contained derivation with independent content, consistent with a minimal circularity score.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The target functions lie in a smoothness class of order alpha in dimension d, for which the minimax rate is n^{-alpha/d}.
Reference graph
Works this paper leans on
- [1]
-
[2]
R. A. Adams. Sobolev Spaces. Academic Press, New York, 1975
work page 1975
-
[3]
F. Bach. On the equivalence between kernel quadrature rules and random feature expansions. Journal of Machine Learning Research, 18 0 (21): 0 1--38, 2017
work page 2017
-
[4]
A. Belhadji, R. Bardenet, and P. Chainais. Kernel quadrature with dpps. In Advances in Neural Information Processing Systems, 2019
work page 2019
-
[5]
A. Berlinet and C. Thomas-Agnan. Reproducing kernel Hilbert spaces in probability and statistics. Springer Science & Business Media, New York, NY, USA, 2011. ISBN 9781441973104
work page 2011
-
[6]
S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, Oxford, UK, 2013
work page 2013
- [7]
-
[8]
F.-X. Briol, C. J. Oates, J. Cockayne, W. Y. Chen, and M. Girolami. On the sampling problem for kernel quadrature. In D. Precup and Y. W. Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 586--595. PMLR, 06--11 Aug 2017
work page 2017
-
[9]
F.-X. Briol, C. J. Oates, M. Girolami, M. A. Osborne, and D. Sejdinovic. Probabilistic integration: A role in statistical computation? Statistical Science, 34 0 (1): 0 1--22, 2019. doi:10.1214/18-STS660
-
[10]
Z. Chen, M. Naslidnyk, and F.-X. Briol. Nested expectations with kernel quadrature. In Proceedings of the 42nd International Conference on Machine Learning, volume 267 of Proceedings of Machine Learning Research, pages 8760--8793. PMLR, 2025
work page 2025
-
[11]
R. Durrett. Probability: Theory and Examples. Cambridge University Press, Cambridge, UK, 5th edition, 2019. ISBN 9781108473682
work page 2019
-
[12]
W. Feller. An Introduction to Probability Theory and Its Applications, Volume II. John Wiley & Sons, New York, NY, USA, 2nd edition, 1971. ISBN 9780471257097
work page 1971
-
[13]
F. L. Fern \'a ndez, L. Martino, V. Elvira, D. Delgado-G \'o mez, and J. L \'o pez-Santiago. Adaptive quadrature schemes for bayesian inference via active learning. IEEE Access, 8: 0 208462--208483, 2020. doi:10.1109/ACCESS.2020.3038333
-
[14]
M. Fisher, C. Oates, C. Powell, and A. Teckentrup. A locally adaptive bayesian cubature method. In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 of Proceedings of Machine Learning Research, pages 1265--1275. PMLR, 2020 a
work page 2020
-
[15]
M. Fisher, C. Oates, C. Powell, and A. Teckentrup. A locally adaptive bayesian cubature method. In S. Chiappa and R. Calandra, editors, Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 of Proceedings of Machine Learning Research, pages 1265--1275. PMLR, 2020 b
work page 2020
-
[16]
T. Gunter, M. A. Osborne, R. Garnett, P. Hennig, and S. J. Roberts. Sampling for inference in probabilistic models with fast bayesian quadrature. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2789--2797. Curran Associates, Inc., 2014
work page 2014
-
[17]
S. Hayakawa, H. Oberhauser, and T. J. Lyons. Positively weighted kernel quadrature via subsampling. In Advances in Neural Information Processing Systems 35, 2022
work page 2022
-
[18]
P. Hennig, M. A. Osborne, and M. Girolami. Probabilistic Numerics: Computation as Machine Learning. Cambridge University Press, 2022. doi:10.1017/9781108671879
-
[19]
M. Kanagawa and P. Hennig. Convergence guarantees for adaptive bayesian quadrature methods. In Advances in Neural Information Processing Systems 32, pages 6234--6245. Curran Associates, Inc., 2019
work page 2019
-
[20]
M. Kanagawa, B. K. Sriperumbudur, and K. Fukumizu. Convergence analysis of deterministic kernel-based quadrature rules in misspecified settings. Foundations of Computational Mathematics, 20 0 (1): 0 155--194, 2020 a
work page 2020
-
[21]
M. Kanagawa, B. K. Sriperumbudur, and K. Fukumizu. Convergence analysis of deterministic kernel-based quadrature rules in misspecified settings. Foundations of Computational Mathematics, 20: 0 155--194, 2020 b . doi:10.1007/s10208-018-09407-7
-
[22]
T. Karvonen and S. S \"a rkk \"a . Classical quadrature rules via gaussian processes. In Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing (MLSP), pages 31--46, Tokyo, Japan, 2017. doi:10.1109/MLSP.2017.8168195
-
[23]
T. Karvonen, M. Kanagawa, and S. S\"arkk\"a. On the positivity and magnitudes of bayesian quadrature weights. Statistics and Computing, 29: 0 1317--1333, 2019. doi:10.1007/s11222-019-09901-0
-
[24]
M. Mahsereci and T. Karvonen. Bayesian quadrature: Gaussian processes for integration. arXiv:2602.16218, 2026. arXiv preprint
-
[25]
P. Massart. Concentration Inequalities and Model Selection, volume 1896 of Lecture Notes in Mathematics. Springer, Berlin, Heidelberg, 2007. ISBN 978-3-540-37575-5
work page 2007
-
[26]
E. Novak. Deterministic and Stochastic Error Bounds in Numerical Analysis, volume 1349 of Lecture Notes in Mathematics. Springer-Verlag, Berlin, Heidelberg, 1988. ISBN 978-3-540-50368-2
work page 1988
-
[27]
M. A. Osborne, R. Garnett, Z. Ghahramani, D. Duvenaud, S. J. Roberts, and C. E. Rasmussen. Active learning of model evidence using bayesian quadrature. In Advances in Neural Information Processing Systems (NeurIPS), volume 25, pages 46--54. Curran Associates, Inc., 2012
work page 2012
-
[28]
u her, F. Tronarp, T. Karvonen, S. S \
J. Pr \"u her, F. Tronarp, T. Karvonen, S. S \"a rkk \"a , and O. Straka. Student-t process quadratures for filtering of non-linear systems with heavy-tailed noise. In 20th International Conference on Information Fusion (FUSION), pages 875--882. IEEE, 2017
work page 2017
-
[29]
C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006
work page 2006
-
[30]
C. P. Robert and G. Casella. Monte Carlo Statistical Methods. Springer, New York, 2nd edition, 2004
work page 2004
-
[31]
G. O. Roberts and J. S. Rosenthal. Harris recurrence of metropolis-within-gibbs and trans-dimensional markov chains. The Annals of Applied Probability, 16 0 (4): 0 2123--2139, Nov. 2006
work page 2006
-
[32]
S. S \"a rkk \"a , J. Hartikainen, L. Svensson, and F. Sandblom. On the relation between gaussian process quadratures and sigma-point methods. Journal of Advances in Information Fusion, 11 0 (1): 0 31--46, 2016. doi:10.5270/JAIF11-1-204
-
[33]
A. Shah, A. G. Wilson, and Z. Ghahramani. Student-t processes as alternatives to gaussian processes. In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS), volume 33 of Proceedings of Machine Learning Research, pages 877--885. PMLR, 2014
work page 2014
-
[34]
D. Simpson, G. Riutort-Mayol, M. Binois, et al. Marginalising over stationary kernels with B ayesian quadrature. Advances in Neural Information Processing Systems, 2021
work page 2021
-
[35]
A. V. Suldin. Wiener measure and its applications to approximation methods. i. Izvestiya Vysshikh Uchebnykh Zavedenii. Matematika, pages 145--158, 1959. in Russian
work page 1959
-
[36]
Batch Selection for Parallelisation of Bayesian Quadrature
E. Wagstaff, S. Hamid, and M. A. Osborne. Batch selection for parallelisation of bayesian quadrature. arXiv preprint arXiv:1812.01553, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
- [37]
-
[38]
H. Wendland and C. Rieger. Approximate interpolation with applications to selecting smoothing parameters. Numerische Mathematik, 101 0 (4): 0 729--748, 2005. doi:10.1007/s00211-005-0637-y
- [39]
-
[40]
G. Wynne, F.-X. Briol, and M. Girolami. Convergence guarantees for gaussian process means with misspecified likelihoods and smoothness. Journal of Machine Learning Research, 22 0 (123): 0 1--40, 2021
work page 2021
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.