Kernel Embeddings and the Separation of Measure Phenomenon
Pith reviewed 2026-05-22 15:54 UTC · model grok-4.3
The pith
Kernel covariance embeddings make equality testing of non-atomic measures equivalent to singularity testing of centered Gaussians in an RKHS
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We prove that kernel covariance embeddings lead to information-theoretically perfect separation of distinct continuous probability distributions. Testing for the equality of two non-atomic Borel probability measures on a locally compact uncountable Polish space is equivalent to testing for the singularity between two centered Gaussian measures on the reproducing kernel Hilbert space generated by the embedding kernel. The proof leverages the classical Feldman-Hájek dichotomy and demonstrates that small perturbations of continuous distributions are maximally magnified through their Gaussian embeddings.
What carries the argument
The kernel covariance embedding, which maps each probability measure to a centered Gaussian on the RKHS so that the Feldman-Hájek dichotomy can be applied directly to establish singularity for distinct measures
Load-bearing premise
The kernel must generate an RKHS in which the covariance embeddings of distinct non-atomic measures produce Gaussians whose supports are essentially separate affine subspaces
What would settle it
Two distinct non-atomic continuous measures on a Polish space together with a kernel such that the corresponding embedded centered Gaussians are not singular
Original abstract
We prove that kernel covariance embeddings lead to information-theoretically perfect separation of distinct continuous probability distributions. In statistical terms, we establish that testing for the equality of two non-atomic (Borel) probability measures on a locally compact uncountable Polish space is equivalent to testing for the singularity between two centered Gaussian measures on a reproducing kernel Hilbert space. The corresponding Gaussians are defined via the notion of kernel covariance embedding of a probability measure, and the Hilbert space is that generated by the embedding kernel. Distinguishing singular Gaussians is structurally simpler from an information-theoretic perspective than non-parametric two-sample testing, particularly in complex or high-dimensional domains. This is because singular Gaussians are supported on essentially separate and affine subspaces. Our proof leverages the classical Feldman-Hájek dichotomy, and shows that even a small perturbation of a continuous distribution will be maximally magnified through its Gaussian embedding. This "separation of measure phenomenon" appears to be a blessing of infinite dimensionality, by means of embedding, with the potential to inform the design of efficient inference tools in considerable generality. The elicitation of this phenomenon also appears to crystallize, in a precise and simple mathematical statement, a core mechanism underpinning the empirical effectiveness of kernel methods.
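The "maximal magnification" described in the abstract can be illustrated with a toy calculation that is not taken from the paper. Assume, purely for illustration, two centered Gaussians whose covariances differ by a constant factor, C_Q = scale * C_P, and look only at the first `dim` coordinates. The function names and the choice `scale = 1.01` are hypothetical. Both the squared Hilbert-Schmidt norm of C_P^{-1/2} C_Q C_P^{-1/2} − I and the Kullback-Leibler divergence then grow linearly in `dim`, so neither stays finite in the infinite-dimensional limit, which is the regime in which the Feldman-Hájek dichotomy yields singularity.

```python
import math


def hilbert_schmidt_sq(scale, dim):
    """Squared Hilbert-Schmidt norm of C_P^{-1/2} C_Q C_P^{-1/2} - I.

    Toy assumption: C_Q = scale * C_P, truncated to `dim` coordinates,
    so every one of the `dim` eigenvalues equals (scale - 1).
    """
    return dim * (scale - 1.0) ** 2


def kl_divergence(scale, dim):
    """KL( N(0, scale * C) || N(0, C) ) on `dim` coordinates (C invertible there)."""
    return 0.5 * dim * (scale - 1.0 - math.log(scale))


# A 1% rescaling looks negligible in low dimension but accumulates without bound.
for dim in (10, 100, 1000, 10000):
    print(dim, hilbert_schmidt_sq(1.01, dim), kl_divergence(1.01, dim))
```

The rescaling here is only a stand-in for a "small perturbation"; the covariance embeddings studied in the paper are not related by a constant factor, and the paper's own argument should be consulted for the actual mechanism.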
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to prove that kernel covariance embeddings lead to information-theoretically perfect separation of distinct continuous probability distributions. It establishes that testing for the equality of two non-atomic Borel probability measures on a locally compact uncountable Polish space is equivalent to testing for the singularity between two centered Gaussian measures on the RKHS generated by the embedding kernel. The proof applies the classical Feldman-Hájek dichotomy to these embedded Gaussians, arguing that small perturbations of continuous distributions are maximally magnified, yielding a 'separation of measure phenomenon' that is a blessing of infinite dimensionality and may explain the effectiveness of kernel methods.
Significance. If the equivalence holds under the stated conditions, the result crystallizes a precise mechanism by which kernel embeddings convert non-parametric two-sample testing into the structurally simpler problem of distinguishing singular Gaussians supported on separate affine subspaces. This has potential to inform the design of efficient inference procedures in high-dimensional or complex domains. The manuscript receives credit for grounding the argument in the standard Feldman-Hájek dichotomy and for highlighting an infinite-dimensional phenomenon without introducing free parameters or ad-hoc entities.
major comments (2)
- [Abstract] Abstract (paragraph on the separation of measure phenomenon): the claimed equivalence between equality testing for P ≠ Q and singularity of the embedded centered Gaussians G_P, G_Q via Feldman-Hájek requires that the covariance operators satisfy the necessary conditions for singularity even when supp(P) = supp(Q). The range of C_P is contained in the closure of span{φ(x) : x ∈ supp(P)}, so identical supports imply identical ranges and thus coinciding closed supports for G_P and G_Q; the manuscript must explicitly verify that C_P^{-1/2} C_Q C_P^{-1/2} − I fails to be Hilbert-Schmidt or has spectrum intersecting −1 for all such equal-support pairs, otherwise the information-theoretic equivalence does not hold without further restrictions on the kernel.
- [Abstract and proof] The application of Feldman-Hájek (abstract and proof outline): the precise conditions on the kernel and the Polish space under which the embedded Gaussians are singular for every pair of distinct non-atomic measures are not fully inspectable from the abstract; if these conditions are only verified for measures with disjoint supports, the central claim is load-bearing on an unstated extension to the equal-support case.
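For reference, the criterion the major comments appeal to can be written out as follows. This is the standard textbook form of the Feldman-Hájek theorem for centered Gaussians on a separable Hilbert space H, stated in the referee's notation G_P = N(0, C_P), G_Q = N(0, C_Q); it is not quoted from the manuscript, and the manuscript's exact formulation may differ.

```latex
% Feldman-Hajek dichotomy, centered case (standard form; not quoted from the paper).
\[
  G_P = N(0, C_P), \quad G_Q = N(0, C_Q)
  \quad\Longrightarrow\quad
  \text{either } G_P \sim G_Q \ \text{(equivalent)} \quad\text{or}\quad G_P \perp G_Q \ \text{(singular)}.
\]
% Equivalence holds if and only if both conditions below are satisfied.
\begin{enumerate}
  \item $C_P^{1/2}(H) = C_Q^{1/2}(H)$ (the Cameron--Martin spaces coincide as sets);
  \item $\bigl(C_P^{-1/2} C_Q^{1/2}\bigr)\bigl(C_P^{-1/2} C_Q^{1/2}\bigr)^{*} - I$
        is a Hilbert--Schmidt operator on $\overline{C_P^{1/2}(H)}$.
\end{enumerate}
```

Under this form, singularity for an equal-support pair has to come from a failure of condition 1 or condition 2, which is the verification the first major comment asks for.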
minor comments (2)
- [Abstract] Clarify the exact definition of the covariance embedding operator C_P f = ∫ ⟨f, φ(x)⟩ φ(x) dP(x) and its domain in the RKHS, including any measurability requirements.
- [Throughout] The phrase 'essentially separate and affine subspaces' should be tied explicitly to the Feldman-Hájek support condition in the main text.
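The operator in the first minor comment has a simple empirical counterpart, sketched below under stated assumptions. For a sample x_1, ..., x_n, the plug-in operator (1/n) Σ φ(x_i) ⊗ φ(x_i) has the same nonzero eigenvalues as K / n, where K is the Gram matrix with entries k(x_i, x_j). The Gaussian kernel, the bandwidth, the one-dimensional sample, and the function names are all illustrative choices, not taken from the paper.

```python
import numpy as np


def gaussian_gram(x, y, bandwidth=1.0):
    """Gram matrix k(x_i, y_j) = exp(-(x_i - y_j)^2 / (2 * bandwidth^2)) for 1-D samples."""
    diff = x[:, None] - y[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))


def empirical_embedding_spectrum(sample, bandwidth=1.0):
    """Eigenvalues of K / n, which equal the nonzero eigenvalues of
    the empirical operator (1/n) * sum_i phi(x_i) (x) phi(x_i)."""
    n = sample.shape[0]
    gram = gaussian_gram(sample, sample, bandwidth)
    eigvals = np.linalg.eigvalsh(gram / n)  # returned in ascending order
    # Reverse to descending order and clip tiny negative round-off values.
    return np.clip(eigvals[::-1], 0.0, None)


rng = np.random.default_rng(0)
sample_p = rng.normal(0.0, 1.0, size=200)  # hypothetical sample from P
spectrum_p = empirical_embedding_spectrum(sample_p)
```

Because k(x, x) = 1 for the Gaussian kernel, the eigenvalues sum to 1 up to round-off, which gives a quick sanity check that the empirical operator is trace class with the expected trace.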
Simulated Author's Rebuttal
We thank the referee for the thorough reading and for identifying points that merit clarification. The comments correctly highlight the need to make the treatment of the equal-support case fully explicit. We will revise the manuscript to address this.
Point-by-point responses
- Referee: The claimed equivalence requires that covariance operators satisfy singularity conditions even when supp(P) = supp(Q). Identical supports imply identical ranges and coinciding closed supports for G_P and G_Q; the manuscript must explicitly verify that C_P^{-1/2} C_Q C_P^{-1/2} − I fails to be Hilbert-Schmidt or has spectrum intersecting −1 for all such pairs.
  Authors: We agree that explicit verification is required. The main theorem applies to all distinct non-atomic measures, but the current exposition emphasizes the disjoint-support case for clarity. For the equal-support case we will add a lemma establishing that, because the measures are distinct and non-atomic on an uncountable Polish space, the resulting covariance operators differ by a perturbation that violates the Feldman-Hájek equivalence criteria (the operator C_P^{-1/2} C_Q C_P^{-1/2} − I is never Hilbert-Schmidt and its spectrum intersects −1). This lemma will be inserted before the main proof and the abstract will be updated to reference it.
  Revision: yes
- Referee: The precise conditions on the kernel and Polish space under which the embedded Gaussians are singular for every pair of distinct non-atomic measures are not fully inspectable; if verified only for disjoint supports, the central claim rests on an unstated extension to the equal-support case.
  Authors: The conditions are stated in Theorem 1: the kernel is continuous and characteristic, the space is locally compact and uncountable Polish, and the measures are non-atomic Borel probabilities. The proof outline in the abstract is necessarily brief; the full argument proceeds by first treating disjoint supports (where singularity is immediate) and then handling equal supports via the lemma described above. We will expand the proof section to present both cases sequentially and self-contained, removing any ambiguity about the extension.
  Revision: yes
Circularity Check
No circularity: derivation relies on classical external theorem
The paper's central claim equates equality testing of non-atomic measures to singularity testing of their kernel covariance embeddings via the classical Feldman-Hájek dichotomy for Gaussian measures on Hilbert spaces. This is an application of an independent, externally verifiable mathematical result rather than any self-definitional reduction, fitted-input prediction, or load-bearing self-citation chain. The abstract and derivation explicitly invoke the classical dichotomy without redefining it in terms of the paper's own constructions or renaming known results as novel. The argument remains self-contained against external benchmarks, with no steps that reduce the claimed equivalence to its inputs by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The underlying space is a locally compact uncountable Polish space and the measures are non-atomic Borel probability measures.
- domain assumption: The kernel is positive definite and generates a reproducing kernel Hilbert space in which the covariance embeddings are well-defined centered Gaussians.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  Tag: unclear. Relation between the paper passage and the cited Recognition theorem.
  Passage: "testing for the equality of two non-atomic probability measures ... is equivalent to testing for the singularity between two centered Gaussian measures on a reproducing kernel Hilbert space"
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, reality_from_one_distinction
  Tag: unclear. Relation between the paper passage and the cited Recognition theorem.
  Passage: "Our proof leverages the classical Feldman-Hájek dichotomy"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.