Sharp Rates of MMD Empirical Estimation with Power Kernels
Pith reviewed 2026-05-20 08:22 UTC · model grok-4.3
The pith
For Ahlfors-regular measures of exponent beta, the MMD with power kernel between the measure and its best N-point empirical approximation decays at the exact rate N^{-(1/2)(1 + q/beta)}, and no configuration of N points does better than this order.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Given a probability measure omega on R^d satisfying an Ahlfors regularity condition of exponent beta, the sharp two-sided bound E_q(mu_N, omega) ≍ N^{-(1/2)(1 + q/beta)} holds: the lower bound for every N-point empirical measure mu_N, and the upper bound for an optimally chosen mu_N. Here E_q is the energy distance induced by the power kernel K_q(x,y) = -|x-y|^q with q in (0,2).
What carries the argument
The energy distance E_q induced by the power kernel K_q(x,y) = -|x-y|^q for q in (0,2), whose empirical estimation is controlled by the Ahlfors regularity exponent beta of the target measure.
If this is right
- The upper and lower bounds match, so the rate is optimal and cannot be improved by any choice of N points.
- The lower bound holds uniformly over every possible configuration of N points, not only over typical or random ones.
- The quantitative speed fills the gap left by the earlier qualitative narrow-convergence result for minimizers of the energy distance.
- The exponent depends explicitly on both the kernel parameter q and the regularity exponent beta of the measure.
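To make the dependence on q and beta concrete, the short sketch below evaluates the claimed exponent -(1/2)(1 + q/beta) for a few parameter pairs. The function name `rate_exponent` and the sample values are illustrative assumptions, not taken from the paper.

```python
def rate_exponent(q: float, beta: float) -> float:
    """Exponent a in the claimed rate E_q ~ N**a, with a = -(1/2)(1 + q/beta)."""
    if not 0.0 < q < 2.0:
        raise ValueError("the power kernel requires q in (0, 2)")
    if beta <= 0.0:
        raise ValueError("the regularity exponent beta must be positive")
    return -0.5 * (1.0 + q / beta)


if __name__ == "__main__":
    # Hypothetical parameter pairs chosen only for illustration.
    for q, beta in [(1.0, 1.0), (1.0, 2.0), (0.5, 3.0)]:
        print(f"q={q}, beta={beta}: exponent {rate_exponent(q, beta):.4f}")
```

With q fixed, the exponent moves toward -1/2 as beta grows, so the gain over the -1/2 baseline is largest for low-dimensional measures.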
Where Pith is reading between the lines
- The same proof technique might yield analogous sharp rates for other singular kernels whose Fourier transforms decay at comparable rates.
- Because beta encodes the dimension of the support, the result links empirical approximation quality directly to the intrinsic dimension of the measure.
- Numerical verification on self-similar measures with known beta would provide an immediate test of the predicted exponent.
Load-bearing premise
The target probability measure must satisfy an Ahlfors regularity condition of exponent beta.
What would settle it
Compute the asymptotic decay of the minimal energy distance for a concrete Ahlfors-regular measure, such as the uniform distribution on the unit ball in R^d (for which beta = d), and check whether the observed exponent matches -(1/2)(1 + q/beta).
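A minimal numerical sketch of such a check, under simplifying assumptions that are not from the paper: the target is the uniform measure on [0, 1] (so d = 1 and beta = 1, standing in for the unit ball), q = 1, and the N points are placed at cell midpoints. The midpoint grid is only a natural candidate configuration, and the helper names are hypothetical. The energy is evaluated in closed form from the definition of E_q^2, then the decay exponent is fitted on a log-log scale.

```python
import numpy as np


def energy_sq_vs_uniform01(points) -> float:
    """E_1^2 between the empirical measure on `points` (in [0, 1]) and Uniform[0, 1]."""
    x = np.asarray(points, dtype=float)
    # (1/N^2) * sum over i, j of |x_i - x_j|
    self_term = np.abs(x[:, None] - x[None, :]).mean()
    # (1/N) * sum over i of the integral of |x_i - y| over [0, 1],
    # which equals (x_i^2 + (1 - x_i)^2) / 2 for x_i in [0, 1]
    cross_term = ((x**2 + (1.0 - x) ** 2) / 2.0).mean()
    # Double integral of |y - y'| over [0, 1]^2
    target_term = 1.0 / 3.0
    # E^2 = -(1/2) * double integral of |x - y| d(mu - omega) d(mu - omega)
    return -0.5 * (self_term - 2.0 * cross_term + target_term)


def midpoint_grid(n: int) -> np.ndarray:
    """Assumed configuration: midpoints of n equal cells of [0, 1]."""
    return (2.0 * np.arange(1, n + 1) - 1.0) / (2.0 * n)


def fitted_exponent(sizes) -> float:
    """Least-squares slope of log E_1 against log N for the midpoint grids."""
    energies = [np.sqrt(energy_sq_vs_uniform01(midpoint_grid(n))) for n in sizes]
    slope, _intercept = np.polyfit(np.log(sizes), np.log(energies), 1)
    return float(slope)


if __name__ == "__main__":
    print(fitted_exponent([8, 16, 32, 64, 128, 256]))
```

For this configuration a direct computation gives E_1^2 = 1/(12 N^2), so the fitted slope should be close to -1, which agrees with -(1/2)(1 + q/beta) at q = 1, beta = 1. A test of the general statement would need other values of q, higher dimensions, and measures with non-integer beta.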
Original abstract
We establish quantitative rates of convergence for the empirical estimation of probability measures by means of the Maximum Mean Discrepancy (MMD) with power kernel $K_q(x,y) = -|x-y|^q$, $q \in (0,2)$. The resulting discrepancy is the classical energy distance $$\mathcal E_q^2(\mu, \omega) = -\frac{1}{2}\iint_{\mathbb{R}^d \times \mathbb{R}^d} |x-y|^q \, d(\mu - \omega)(x)\, d(\mu - \omega)(y),$$ and we ask how fast the best $N$-point empirical approximation $\inf_{\mu_N \in \mathcal{P}^N}\mathcal{E}_q(\mu_N,\omega)$ decays as $N \to \infty$. Given a probability measure $\omega$ on $\mathbb{R}^d$ satisfying an Ahlfors regularity condition of exponent $\beta$, we prove that the sharp two-sided bound $$\mathcal E_q(\mu_N, \omega) \asymp N^{-\frac{1}{2}\left(1 + \frac{q}{\beta}\right)}$$ holds both for the worst-case empirical measure $\mu_N$ (lower bound, holding for every configuration of $N$ points) and for an optimally chosen empirical measure $\mu_N$ (upper bound). This complements the qualitative consistency result of Fornasier and Hütter [15], who proved narrow convergence of the minimizers of $\mathcal E_q^2(\cdot, \omega)$ over empirical measures without quantitative rates.
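When both measures are empirical, the double integral in the definition of the energy distance reduces to finite sums of pairwise distances. The sketch below is a plain NumPy evaluation of that formula. The function names are hypothetical, and a finite sample `y` standing in for omega is an assumption made here for illustration; the paper works with omega itself.

```python
import numpy as np


def _as_points(a) -> np.ndarray:
    """Return an (n, d) float array; a 1-D input is read as n points in R^1."""
    a = np.asarray(a, dtype=float)
    return a[:, None] if a.ndim == 1 else a


def _mean_pairwise_power(a: np.ndarray, b: np.ndarray, q: float) -> float:
    """Average of |a_i - b_j|^q over all pairs (i, j)."""
    diff = a[:, None, :] - b[None, :, :]
    return float((np.linalg.norm(diff, axis=-1) ** q).mean())


def energy_distance_sq(x, y, q: float = 1.0) -> float:
    """E_q^2 between the empirical measures on the rows of x and of y.

    Expanding -(1/2) * double integral of |s - t|^q d(mu - nu)(s) d(mu - nu)(t)
    gives mean|x - y|^q - (1/2) mean|x - x'|^q - (1/2) mean|y - y'|^q.
    """
    if not 0.0 < q < 2.0:
        raise ValueError("the power kernel requires q in (0, 2)")
    x, y = _as_points(x), _as_points(y)
    cross = _mean_pairwise_power(x, y, q)
    within_x = _mean_pairwise_power(x, x, q)
    within_y = _mean_pairwise_power(y, y, q)
    return cross - 0.5 * within_x - 0.5 * within_y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample_a = rng.normal(size=(200, 2))
    sample_b = rng.normal(loc=0.5, size=(300, 2))
    print(energy_distance_sq(sample_a, sample_b, q=1.0))
```

The cost is quadratic in the number of points, which is acceptable for small experiments but not for large samples.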
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper establishes sharp quantitative rates for the empirical estimation of a probability measure ω on R^d by N-point measures μ_N in the maximum mean discrepancy induced by the power kernel K_q(x,y) = -|x-y|^q for q ∈ (0,2). Under the assumption that ω satisfies an Ahlfors regularity condition of exponent β, it proves the two-sided bound E_q(μ_N, ω) ≍ N^{-(1/2)(1 + q/β)}, which holds both for the worst-case choice of μ_N (lower bound) and for the optimally chosen μ_N (upper bound). This supplies explicit rates that complement the qualitative narrow-convergence result of Fornasier and Hütter.
Significance. If the result holds, it furnishes the first sharp, explicit convergence rates for energy-distance approximation of Ahlfors-regular measures by empirical measures. The two-sided character of the bound, the direct dependence on the regularity exponent β, and the derivation via covering arguments and potential-theoretic comparisons constitute a clean contribution to quantitative potential theory and discrepancy theory.
minor comments (3)
- §2, Definition 2.3: the precise statement of the Ahlfors regularity condition (including the admissible range for β relative to dimension d) should be recalled explicitly before the main theorem, rather than only referenced.
- The proof of the lower bound in §5 invokes a potential-theoretic comparison on balls of radius N^{-1/β}; a short remark clarifying why the constant in the comparison is independent of the particular ball would improve readability.
- Figure 1 (if present) or the numerical illustration in §6: the caption should state the precise values of q and β used in the simulation so that the plotted rate can be directly compared with the theoretical exponent.
Simulated Author's Rebuttal
We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No specific major comments were provided in the report, so we have no individual points to address point-by-point. We will prepare a revised manuscript incorporating any minor editorial suggestions that may arise during the process.
Circularity Check
No significant circularity; the derivation is a self-contained analytic proof.
full rationale
The paper establishes the sharp two-sided rate directly from the Ahlfors regularity assumption of exponent β via an explicit covering argument for the upper bound (placing N points to control local energy at scale N^{-1/β}) and a potential-theoretic comparison for the lower bound (showing any N-point measure leaves discrepancy of the claimed order). The exponent (1/2)(1 + q/β) follows from balancing the quadratic form of the power kernel against β-dimensional volume scaling, with no fitted parameters, self-definitional reductions, or load-bearing self-citations. The cited prior result of Fornasier and Hütter supplies only qualitative consistency and is not used to derive the quantitative rates, leaving the central claim independent of its inputs.
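The balancing of kernel scaling against volume scaling can be illustrated by a standard stratified-sampling computation. This is a heuristic sketch of where the upper-bound exponent comes from, not the paper's proof. It assumes a partition of the support of ω into N cells A_1, ..., A_N with ω(A_i) = 1/N and diam(A_i) ≤ C N^{-1/β}; the random points X_i, the signed measures ν_i, and the constant C are notation introduced here.

```latex
% Assumptions (introduced for this sketch): \omega(A_i) = 1/N,
% \operatorname{diam}(A_i) \le C N^{-1/\beta}, and X_1, \dots, X_N independent
% with X_i distributed according to N\,\omega|_{A_i}.
\[
\mu_N = \frac{1}{N}\sum_{i=1}^{N} \delta_{X_i}, \qquad
\nu_i = \frac{1}{N}\,\delta_{X_i} - \omega|_{A_i}, \qquad
\mu_N - \omega = \sum_{i=1}^{N} \nu_i .
\]
% Each \nu_i has zero expectation and the X_i are independent,
% so the cross terms (i \neq j) vanish in expectation:
\[
\mathbb{E}\,\mathcal{E}_q^2(\mu_N,\omega)
= -\frac{1}{2}\sum_{i=1}^{N} \mathbb{E}\iint |x-y|^q \, d\nu_i(x)\, d\nu_i(y)
= \frac{1}{2}\sum_{i=1}^{N} \iint_{A_i \times A_i} |x-y|^q \, d\omega(x)\, d\omega(y).
\]
% Each cell has mass 1/N and diameter at most C N^{-1/\beta}:
\[
\mathbb{E}\,\mathcal{E}_q^2(\mu_N,\omega)
\le \frac{1}{2}\, N \cdot \frac{1}{N^{2}} \bigl(C N^{-1/\beta}\bigr)^{q}
= \frac{C^{q}}{2}\, N^{-\left(1 + \frac{q}{\beta}\right)} .
\]
```

At least one realization of the points therefore satisfies E_q(μ_N, ω) ≤ (C^{q/2}/√2) N^{-(1/2)(1 + q/β)}, which reproduces the exponent of the upper bound. The matching lower bound for every configuration needs a separate argument and is not covered by this sketch.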
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The probability measure ω satisfies an Ahlfors regularity condition of exponent β.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Theorem 2.10: E_q²(μ,ω) = κ_{d,q}^{-1} ||μ-ω||_{H^{-s}_0}^2 via Fourier representation with weight |ξ|^{-(d+q)}
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Ahlfors regularity (1.3) and equimass partition upper bound (Lemma 3.7)
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Luigi Ambrosio, Nicola Fusco, and Diego Pallara. Functions of bounded variation and free discontinuity problems. Oxford Mathematical Monographs. The Clarendon Press, Oxford University Press, New York, 2000.
- [2] Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré. Gradient Flows in Metric Spaces and in the Space of Probability Measures. Lectures in Mathematics ETH Zürich. Birkhäuser, Basel, 2nd edition, 2008.
- [3] Gennaro Auricchio, Giovanni Brigati, Paolo Giudici, and Giuseppe Toscani. From kinetic theory to AI: A rediscovery of high-dimensional divergences and their properties. Mathematical Models and Methods in Applied Sciences, 2026. Preprint arXiv:2507.11387.
- [4] Gennaro Auricchio, Andrea Codegoni, Stefano Gualandi, Giuseppe Toscani, and Marco Veneroni. The equivalence of Fourier-based and Wasserstein metrics on imaging problems. Atti Accademia Nazionale dei Lincei. Rendiconti Lincei. Matematica e Applicazioni, 31(3):627–649, 2020.
- [5] Luca Brandolini, William W. L. Chen, Leonardo Colzani, Giacomo Gigante, and Giancarlo Travaglini. Discrepancy and numerical integration on metric measure spaces. Journal of Geometric Analysis, 29(1):328–369, 2019.
- [6] Nicolas Chauffert, Philippe Ciuciu, Jonas Kahn, and Pierre Weiss. A projection method on measures sets. Constructive Approximation, 45(1):83–111, February 2017. Preprint arXiv:1509.00229, 2015.
- [7] Xiuyuan Cheng and Yao Xie. Kernel two-sample tests for manifold data. Bernoulli, 30(4):2572–2597, 2024.
- [8] Lénaïc Chizat, Maria Colombo, Roberto Colombo, and Xavier Fernández-Real. Quantitative convergence of Wasserstein gradient flows of kernel mean discrepancies, 2026. Preprint, arXiv:2603.01977.
- [9] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Christopher J. C. Burges, Léon Bottou, Zoubin Ghahramani, and Kilian Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 26, pages 2292–2300, 2013.
- [10] Giorgio Dall'Aglio. Sugli estremi dei momenti delle funzioni di ripartizione doppia. Annali della Scuola Normale Superiore di Pisa - Scienze Fisiche e Matematiche, Ser. 3, 10(1-2):35–74, 1956.
- [11] Steffen Dereich, Michael Scheutzow, and Reik Schottstedt. Constructive quantization: approximation by empirical measures. Annales de l'I.H.P. Probabilités et statistiques, 49(4):1183–1203, 2013.
- [12] Marco Di Francesco, Massimo Fornasier, Jan-Christian Hütter, and Daniel Matthes. Asymptotic behavior of gradient flows driven by nonlocal power repulsion and attraction potentials in one dimension. SIAM Journal on Mathematical Analysis, 46(6):3814–3837, 2014.
- [13] Irene Fonseca and Giovanni Leoni. Modern Methods in the Calculus of Variations: L^p Spaces. Springer Monographs in Mathematics. Springer New York, 2007.
- [14] Massimo Fornasier, Jan Haškovec, and Gabriele Steidl. Consistency of variational continuous-domain quantization via kinetic theory. Applicable Analysis, 92(6):1283–1298, 2013.
- [15] Massimo Fornasier and Jan-Christian Hütter. Consistency of probability measure quantization by means of power repulsion–attraction potentials. Journal of Fourier Analysis and Applications, 22(3):694–749, 2016. Preprint arXiv:1310.1120, 2013.
- [16] Nicolas Fournier and Arnaud Guillin. On the rate of convergence in Wasserstein distance of the empirical measure. Probability Theory and Related Fields, 162(3–4):707–738, 2015.
- [17] Bert Fristedt and Lawrence Gray. A Modern Approach to Probability Theory. Birkhäuser, Boston, 1997.
- [18] Giacomo Gigante and Paul Leopardi. Diameter bounded equal measure partitions of Ahlfors regular metric measure spaces. Discrete Comput. Geom., 57(2):419–430, 2017.
- [19] Siegfried Graf and Harald Luschgy. Foundations of Quantization for Probability Distributions, volume 1730 of Lecture Notes in Mathematics. Springer, Berlin, 2000.
- [20] Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alexander Smola. A kernel two-sample test. Journal of Machine Learning Research, 13(25):723–773, 2012.
- [21] Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alexander J. Smola. A kernel method for the two-sample-problem. In Bernhard Schölkopf, John Platt, and Thomas Hofmann, editors, Advances in Neural Information Processing Systems 19 (NIPS 2006), pages 513–520. MIT Press, December 2006.
- [22] Paul Hagemann, Johannes Hertrich, Fabian Altekrüger, Robert Beinert, Jannis Chemseddine, and Gabriele Steidl. Posterior sampling based on gradient flows of the MMD with negative distance kernel. In The Twelfth International Conference on Learning Representations, 2024.
- [23] Johannes Hertrich, Christian Wald, Fabian Altekrüger, and Paul Hagemann. Generative sliced MMD flows with Riesz kernels. In The Twelfth International Conference on Learning Representations, 2024.
- [24] John E. Hutchinson. Fractals and self-similarity. Indiana Univ. Math. J., 30(5):713–747, 1981.
- [25] Russell Lyons. Distance covariance in metric spaces. The Annals of Probability, 41(5):3284–3305, 2013.
- [26] Thibault Modeste and Clément Dombry. Characterization of translation invariant MMD on R^d and connections with Wasserstein distances. Journal of Machine Learning Research, 25:1–39, 2024.
- [27] Gabriel Peyré and Marco Cuturi. Computational optimal transport. Foundations and Trends in Machine Learning, 11(5–6):355–607, 2019.
- [28] Christian Schmaltz, Pascal Gwosdek, Andrés Bruhn, and Joachim Weickert. Electrostatic halftoning. Computer Graphics Forum, 29(8):2313–2327, December 2010.
- [29] I. J. Schoenberg. Metric spaces and completely monotone functions. Annals of Mathematics, 39(4):811–841, 1938.
- [30] I. J. Schoenberg. Metric spaces and positive definite functions. Transactions of the American Mathematical Society, 44(3):522–536, November 1938.
- [31] Dino Sejdinovic, Bharath Sriperumbudur, Arthur Gretton, and Kenji Fukumizu. Equivalence of distance-based and RKHS-based statistics in hypothesis testing. The Annals of Statistics, 41(5):2263–2291, October 2013.
- [32] Cencheng Shen and Joshua T. Vogelstein. The exact equivalence of distance and kernel methods in hypothesis testing. AStA Advances in Statistical Analysis, 105(3):385–403, 2021.
- [33]
- [34]
- [35] Gábor J. Székely and Maria L. Rizzo. A new test for multivariate normality. Journal of Multivariate Analysis, 93(1):58–80, 2005.
- [36] Gábor J. Székely and Maria L. Rizzo. Energy statistics: A class of statistics based on distances. Journal of Statistical Planning and Inference, 143(8):1249–1272, August 2013.
- [37] Gábor J. Székely and Maria L. Rizzo. The Energy of Data and Distance Correlation, volume 171 of Chapman & Hall/CRC Monographs on Statistics and Applied Probability. Chapman and Hall/CRC Press, Boca Raton, 2023.
- [38] Gábor J. Székely, Maria L. Rizzo, and Nail K. Bakirov. Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6):2769–2794, December 2007.
- [39] Tanja Teuber, Gabriele Steidl, Pascal Gwosdek, Christian Schmaltz, and Joachim Weickert. Dithering by differences of convex functions. SIAM Journal on Imaging Sciences, 4(1):79–108, 2011.
- [40] Holger Wendland. Scattered Data Approximation, volume 17 of Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, Cambridge, 2005.
- [41] Jian Yan and Xianyang Zhang. Kernel two-sample tests in high dimensions: interplay between moment discrepancy and dimension-and-sample orders. Biometrika, 110(2):411–430, 2023.
- [42] Paul L. Zador. Topics in the asymptotic quantization of continuous random variables. Technical report, Bell Laboratories, Murray Hill, NJ, 1966.
(Francesco Colasanto) Department of Mathematics, CIT School, Technical University of Munich, Munich, Germany. Email address: francesco.colasanto@tum.de
(Matteo Focardi) DiMaI U. Dini, Università di Firenze, Florenc...