pith.

arxiv: 2505.04613 · v4 · pith:AQDNLDGEnew · submitted 2025-05-07 · 📊 stat.ML · cs.LG · math.ST · stat.TH

Kernel Embeddings and the Separation of Measure Phenomenon

Pith reviewed 2026-05-22 15:54 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · math.ST · stat.TH
keywords kernel embeddings · reproducing kernel Hilbert space · Gaussian measures · singularity · two-sample testing · Feldman-Hájek dichotomy · probability measures · separation of measures

The pith

Kernel covariance embeddings make equality testing of non-atomic measures equivalent to singularity testing of centered Gaussians in an RKHS

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves that embedding non-atomic probability measures through their kernel covariances converts the problem of testing whether two such measures are identical into the problem of testing whether two centered Gaussian measures on the associated reproducing kernel Hilbert space are mutually singular. The equivalence holds on locally compact uncountable Polish spaces: distinct measures are mapped to mutually singular Gaussians, which are supported on essentially separate affine subspaces. A sympathetic reader would care because, from an information-theoretic perspective, distinguishing singular Gaussians is structurally simpler than direct nonparametric two-sample testing, especially in high-dimensional or complex domains. The argument relies on the Feldman-Hájek dichotomy and shows that even a tiny perturbation of a continuous distribution becomes maximally separated after embedding.

Core claim

We prove that kernel covariance embeddings lead to information-theoretically perfect separation of distinct continuous probability distributions. Testing for the equality of two non-atomic Borel probability measures on a locally compact uncountable Polish space is equivalent to testing for the singularity between two centered Gaussian measures on the reproducing kernel Hilbert space generated by the embedding kernel. The proof leverages the classical Feldman-Hájek dichotomy and demonstrates that small perturbations of continuous distributions are maximally magnified through their Gaussian embeddings.
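A toy calculation, offered as an illustration rather than the paper's proof, shows why infinite dimensionality magnifies small perturbations. It assumes two centered Gaussians whose covariances share an eigenbasis, with positive eigenvalues a_j and b_j; in that diagonal special case the Feldman-Hájek criterion reduces to: the Gaussians are equivalent if Σ_j (b_j/a_j − 1)² < ∞ and mutually singular otherwise. Rescaling every eigenvalue by 1 + ε contributes ε² per coordinate, so the series diverges for every ε > 0, whereas a perturbation that fades along the spectrum keeps it finite. The function name `hs_partial_sums`, the decay a_j = j⁻² and ε = 0.01 are hypothetical choices.

```python
import numpy as np

def hs_partial_sums(a, b):
    """Partial sums of sum_j (b_j / a_j - 1)^2 for diagonal covariances.

    a, b: positive covariance eigenvalues in a shared eigenbasis (toy input).
    Divergence of these sums as the truncation grows signals that
    N(0, diag(a)) and N(0, diag(b)) are mutually singular in infinite dimensions.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.cumsum((b / a - 1.0) ** 2)

j = np.arange(1, 10_001)
a = j ** -2.0                      # summable (trace-class) eigenvalue decay, assumed
eps = 0.01

# Perturbation 1: rescale every eigenvalue by (1 + eps); each term is eps^2.
scaled = hs_partial_sums(a, (1 + eps) * a)
# Perturbation 2: rescale eigenvalue j by (1 + eps / j); the terms are summable.
decaying = hs_partial_sums(a, (1 + eps / j) * a)

for n in (10, 100, 1_000, 10_000):
    print(f"d={n:>6}: scaled={scaled[n-1]:.4f}  decaying={decaying[n-1]:.6f}")
```

Along the printed truncations, the `scaled` column grows linearly in d (it equals d·ε²), while the `decaying` column stays below ε²π²/6.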

What carries the argument

The kernel covariance embedding, which maps each probability measure to a centered Gaussian on the RKHS whose covariance is the measure's kernel covariance operator, so that the Feldman-Hájek dichotomy can be applied directly to establish singularity for distinct measures
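To make the embedding concrete for a sample, consider the empirical measure P_n = (1/n) Σ δ_{x_i}. Assuming the uncentered definition quoted in minor comment 1 of the referee report, its kernel covariance operator is C_{P_n} = (1/n) Σ k(·, x_i) ⊗ k(·, x_i), and the random function G = n^{-1/2} Σ z_i k(·, x_i), with z_i i.i.d. N(0, 1), is a centered Gaussian element of the RKHS with exactly this covariance: Cov(G(s), G(t)) = (1/n) Σ k(s, x_i) k(t, x_i). The sketch below assumes a Gaussian kernel on the real line; the helper names are hypothetical and this is not the authors' construction.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=0.5):
    """k(x, y) = exp(-(x - y)^2 / (2 * bandwidth^2)); returns the array of k(x_i, y_j)."""
    return np.exp(-np.subtract.outer(x, y) ** 2 / (2.0 * bandwidth ** 2))

def sample_embedded_gaussian(x, grid, n_draws, rng):
    """Draw paths of G = n^{-1/2} sum_i z_i k(., x_i) evaluated on `grid`.

    Returns an array of shape (n_draws, len(grid)).
    """
    n = len(x)
    z = rng.standard_normal((n_draws, n))      # i.i.d. N(0, 1) weights z_i
    K_grid = gaussian_kernel(x, grid)          # shape (n, len(grid)): k(x_i, t)
    return z @ K_grid / np.sqrt(n)

def embedded_covariance(x, s, t):
    """<C_{P_n} k_s, k_t> = (1/n) sum_i k(s, x_i) k(t, x_i)."""
    return np.mean(gaussian_kernel(x, s) * gaussian_kernel(x, t))

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=100)            # sample from P = Uniform[0, 1]
grid = np.linspace(0.0, 1.0, 5)
paths = sample_embedded_gaussian(x, grid, n_draws=20_000, rng=rng)

# Monte Carlo covariance of G at two grid points versus the closed form.
mc = np.mean(paths[:, 1] * paths[:, 3])
exact = embedded_covariance(x, grid[1], grid[3])
print(mc, exact)
```

The two printed numbers should agree up to Monte Carlo error, which shrinks like 1/√n_draws.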

Load-bearing premise

The kernel must generate an RKHS in which the covariance embeddings of distinct non-atomic measures produce Gaussians whose supports are essentially separate affine subspaces

What would settle it

A counterexample: two distinct non-atomic probability measures on a locally compact uncountable Polish space, together with a kernel satisfying the paper's hypotheses, whose embedded centered Gaussians are not mutually singular

Figures

Figures reproduced from arXiv: 2505.04613 by Kartik G. Waghmare, Leonardo V. Santoro, Victor M. Panaretos.

Figure 1: Since Gaussians are always supported on affine sets, there is structure to the way singularity can …

Figure 2: Gaussian embeddings magnify distributional differences in a structured fashion: distinct measures on X (P, Q on the left) are mapped to mutually singular Gaussian measures on H (N_P, N_Q on the right, where N_P, N_Q are either centered or uncentered Gaussian embeddings of P, Q).

Figure 3: Monte Carlo illustration of the sampling behaviour of MMD and …
Original abstract

We prove that kernel covariance embeddings lead to information-theoretically perfect separation of distinct continuous probability distributions. In statistical terms, we establish that testing for the equality of two non-atomic (Borel) probability measures on a locally compact uncountable Polish space is equivalent to testing for the singularity between two centered Gaussian measures on a reproducing kernel Hilbert space. The corresponding Gaussians are defined via the notion of kernel covariance embedding of a probability measure, and the Hilbert space is that generated by the embedding kernel. Distinguishing singular Gaussians is structurally simpler from an information-theoretic perspective than non-parametric two-sample testing, particularly in complex or high-dimensional domains. This is because singular Gaussians are supported on essentially separate and affine subspaces. Our proof leverages the classical Feldman-Hájek dichotomy, and shows that even a small perturbation of a continuous distribution will be maximally magnified through its Gaussian embedding. This "separation of measure phenomenon" appears to be a blessing of infinite dimensionality, by means of embedding, with the potential to inform the design of efficient inference tools in considerable generality. The elicitation of this phenomenon also appears to crystallize, in a precise and simple mathematical statement, a core mechanism underpinning the empirical effectiveness of kernel methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to prove that kernel covariance embeddings lead to information-theoretically perfect separation of distinct continuous probability distributions. It establishes that testing for the equality of two non-atomic Borel probability measures on a locally compact uncountable Polish space is equivalent to testing for the singularity between two centered Gaussian measures on the RKHS generated by the embedding kernel. The proof applies the classical Feldman-Hájek dichotomy to these embedded Gaussians, arguing that small perturbations of continuous distributions are maximally magnified, yielding a 'separation of measure phenomenon' that is a blessing of infinite dimensionality and may explain the effectiveness of kernel methods.

Significance. If the equivalence holds under the stated conditions, the result crystallizes a precise mechanism by which kernel embeddings convert non-parametric two-sample testing into the structurally simpler problem of distinguishing singular Gaussians supported on separate affine subspaces. This has potential to inform the design of efficient inference procedures in high-dimensional or complex domains. The manuscript receives credit for grounding the argument in the standard Feldman-Hájek dichotomy and for highlighting an infinite-dimensional phenomenon without introducing free parameters or ad-hoc entities.

major comments (2)
  1. [Abstract] Paragraph on the separation of measure phenomenon: the claimed equivalence between equality testing for P ≠ Q and singularity of the embedded centered Gaussians G_P, G_Q via Feldman-Hájek requires the covariance operators to satisfy the singularity conditions even when supp(P) = supp(Q). For a continuous kernel, the range of C_P is dense in the closed span of {φ(x) : x ∈ supp(P)}, so identical supports imply identical closed ranges and thus coinciding closed supports for G_P and G_Q. The manuscript must explicitly verify, for all such equal-support pairs, that C_P^{-1/2} C_Q C_P^{-1/2} − I fails to be Hilbert-Schmidt or has −1 as an eigenvalue (equivalently, that C_Q admits no factorization C_P^{1/2}(I + S)C_P^{1/2} with S self-adjoint, Hilbert-Schmidt and I + S injective); otherwise the information-theoretic equivalence does not hold without further restrictions on the kernel.
  2. [Abstract and proof] The application of Feldman-Hájek (abstract and proof outline): the precise conditions on the kernel and the Polish space under which the embedded Gaussians are singular for every pair of distinct non-atomic measures are not fully inspectable from the abstract; if these conditions are only verified for measures with disjoint supports, the central claim is load-bearing on an unstated extension to the equal-support case.
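The operator in comment 1 can be computed directly in finite dimensions. The sketch below is an assumed illustration, not taken from the manuscript: for positive-definite d×d covariances A and B it forms S = A^{-1/2} B A^{-1/2} − I and reports its Hilbert-Schmidt (Frobenius) norm and smallest eigenvalue. In finite dimensions this norm is always finite, so non-degenerate Gaussians are never singular; the infinite-dimensional obstruction appears as the norm growing without bound along truncations of increasing dimension, as it does below for the rescaling B = (1 + ε)A. The helper names and test matrices are hypothetical.

```python
import numpy as np

def inv_sqrt_psd(A):
    """A^{-1/2} for a symmetric positive-definite matrix, via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V / np.sqrt(w)) @ V.T

def feldman_hajek_operator(A, B):
    """Return S = A^{-1/2} B A^{-1/2} - I, its HS (Frobenius) norm, and its smallest eigenvalue."""
    R = inv_sqrt_psd(A)
    S = R @ B @ R - np.eye(A.shape[0])
    S = (S + S.T) / 2.0                     # symmetrize against round-off
    return S, np.linalg.norm(S, "fro"), np.linalg.eigvalsh(S).min()

rng = np.random.default_rng(1)
eps = 0.05
for d in (10, 100, 500):
    # Random orthonormal basis with eigenvalues j^{-2} (summable as d grows).
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    lam = np.arange(1, d + 1) ** -2.0
    A = (Q * lam) @ Q.T
    B = (1 + eps) * A                       # uniform rescaling of the covariance
    _, hs, min_eig = feldman_hajek_operator(A, B)
    print(d, round(hs, 4), round(min_eig, 4))   # hs = eps * sqrt(d) up to round-off
```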
minor comments (2)
  1. [Abstract] Clarify the exact definition of the covariance embedding operator C_P f = ∫ ⟨f, φ(x)⟩ φ(x) dP(x) and its domain in the RKHS, including any measurability requirements.
  2. [Throughout] The phrase 'essentially separate and affine subspaces' should be tied explicitly to the Feldman-Hájek support condition in the main text.
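For the empirical measure P_n of a sample x_1, …, x_n, the operator in minor comment 1 becomes C_{P_n} f = (1/n) Σ f(x_i) k(·, x_i), using the reproducing property ⟨f, φ(x)⟩ = f(x) with φ(x) = k(·, x). Its nonzero eigenvalues coincide with those of the scaled Gram matrix K/n, where K_ij = k(x_i, x_j), so the spectrum can be computed with ordinary linear algebra. The sketch below assumes a Gaussian kernel and a hypothetical function name; the trace of C_{P_n} equals (1/n) Σ k(x_i, x_i), which is 1 for this kernel.

```python
import numpy as np

def empirical_covariance_spectrum(x, bandwidth=1.0):
    """Eigenvalues (descending) of C_{P_n} for a 1-D sample x, computed via K / n.

    Assumes the Gaussian kernel k(x, y) = exp(-(x - y)^2 / (2 * bandwidth^2)).
    """
    x = np.asarray(x, dtype=float)
    K = np.exp(-np.subtract.outer(x, x) ** 2 / (2.0 * bandwidth ** 2))
    eig = np.linalg.eigvalsh(K / len(x))        # symmetric PSD matrix -> real eigenvalues
    return np.clip(eig[::-1], 0.0, None)        # descending; clip tiny negative round-off

rng = np.random.default_rng(2)
spec_p = empirical_covariance_spectrum(rng.normal(0.0, 1.0, size=300))
spec_q = empirical_covariance_spectrum(rng.normal(0.0, 1.5, size=300))
print(spec_p[:5])
print(spec_q[:5])
print(spec_p.sum(), spec_q.sum())   # trace of C_{P_n}: mean of k(x_i, x_i) = 1
```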

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough reading and for identifying points that merit clarification. The comments correctly highlight the need to make the treatment of the equal-support case fully explicit. We will revise the manuscript to address this.

Point-by-point responses
  1. Referee: The claimed equivalence requires that covariance operators satisfy singularity conditions even when supp(P) = supp(Q). Identical supports imply identical ranges and coinciding closed supports for G_P and G_Q; the manuscript must explicitly verify that C_P^{-1/2} C_Q C_P^{-1/2} − I fails to be Hilbert-Schmidt or has spectrum intersecting −1 for all such pairs.

    Authors: We agree that explicit verification is required. The main theorem applies to all distinct non-atomic measures, but the current exposition emphasizes the disjoint-support case for clarity. For the equal-support case we will add a lemma establishing that, because the measures are distinct and non-atomic on an uncountable Polish space, the resulting covariance operators violate the Feldman-Hájek equivalence criterion: C_P^{-1/2} C_Q C_P^{-1/2} − I either fails to be Hilbert-Schmidt or has −1 as an eigenvalue. This lemma will be inserted before the main proof and the abstract will be updated to reference it. revision: yes

  2. Referee: The precise conditions on the kernel and Polish space under which the embedded Gaussians are singular for every pair of distinct non-atomic measures are not fully inspectable; if verified only for disjoint supports, the central claim rests on an unstated extension to the equal-support case.

    Authors: The conditions are stated in Theorem 1: the kernel is continuous and characteristic, the space is locally compact and uncountable Polish, and the measures are non-atomic Borel probabilities. The proof outline in the abstract is necessarily brief; the full argument proceeds by first treating disjoint supports (where singularity is immediate) and then handling equal supports via the lemma described above. We will expand the proof section to present both cases sequentially and self-contained, removing any ambiguity about the extension. revision: yes
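The equal-support case raised in the report can at least be probed numerically. Assuming the uncentered covariance operator and a bounded kernel, the identity ⟨k(·,x) ⊗ k(·,x), k(·,y) ⊗ k(·,y)⟩_HS = k(x, y)² gives ‖C_P − C_Q‖²_HS = E k(X, X')² − 2 E k(X, Y)² + E k(Y, Y')², with X, X' ~ P and Y, Y' ~ Q independent. The sketch below estimates this with U-statistics for two hypothetical measures with identical support [0, 1], Uniform[0, 1] and Beta(5, 5). A clearly positive value shows that the covariance operators differ despite equal supports; on its own it says nothing about singularity, which is what the Feldman-Hájek argument must supply. Function names and the bandwidth are assumptions.

```python
import numpy as np

def sq_gauss_gram(x, y, bandwidth=0.2):
    """Entrywise k(x_i, y_j)^2 where k(x, y) = exp(-(x - y)^2 / (2 * bandwidth^2))."""
    return np.exp(-np.subtract.outer(x, y) ** 2 / bandwidth ** 2)

def hs_distance_sq(x, y, bandwidth=0.2):
    """Unbiased U-statistic estimate of ||C_P - C_Q||_HS^2 from samples x ~ P, y ~ Q."""
    n, m = len(x), len(y)
    Kxx = sq_gauss_gram(x, x, bandwidth)
    Kyy = sq_gauss_gram(y, y, bandwidth)
    Kxy = sq_gauss_gram(x, y, bandwidth)
    # Drop the i == j terms so the within-sample averages are unbiased.
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_xx + term_yy - 2.0 * Kxy.mean()

rng = np.random.default_rng(3)
x1, x2 = rng.uniform(size=1000), rng.uniform(size=1000)   # both from Uniform[0, 1]
y = rng.beta(5.0, 5.0, size=1000)                         # Beta(5, 5): same support [0, 1]
print(hs_distance_sq(x1, x2), hs_distance_sq(x1, y))
```

The first printed value fluctuates around zero, since both samples come from the same measure; the second is bounded away from zero.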

Circularity Check

0 steps flagged

No circularity: derivation relies on classical external theorem

Full rationale

The paper's central claim equates equality testing of non-atomic measures to singularity testing of their kernel covariance embeddings via the classical Feldman-Hájek dichotomy for Gaussian measures on Hilbert spaces. This is an application of an independent, externally verifiable mathematical result rather than any self-definitional reduction, fitted-input prediction, or load-bearing self-citation chain. The abstract and derivation explicitly invoke the classical dichotomy without redefining it in terms of the paper's own constructions or renaming known results as novel. The argument can therefore be checked against this external benchmark, and no step reduces the claimed equivalence to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The result rests on standard measure-theoretic and RKHS background rather than new free parameters or invented entities. The Feldman-Hájek dichotomy is invoked as an external theorem.

axioms (2)
  • domain assumption The underlying space is a locally compact uncountable Polish space and the measures are non-atomic Borel probability measures.
    Stated in the abstract as the setting where the equivalence holds.
  • domain assumption The kernel is positive definite and generates a reproducing kernel Hilbert space in which the covariance embeddings are well-defined centered Gaussians.
    Implicit in the definition of kernel covariance embedding used throughout the claim.

pith-pipeline@v0.9.0 · 5775 in / 1386 out tokens · 32471 ms · 2026-05-22T15:54:01.510531+00:00 · methodology



Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · 1 internal anchor

[1] A. Kolmogorov. Sulla determinazione empirica di una legge di distribuzione. Giornale dell'Istituto Italiano degli Attuari, 4:89–91, 1933.
[2] Nachman Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337–404, 1950.
[3] Francis Bach. Information theory with kernel methods. IEEE Transactions on Information Theory, 69(2):752–775, 2022.
[4] Charles R. Baker. On covariance operators. Technical report, North Carolina State University, Dept. of Statistics, 1970.
[5] Alain Berlinet and Christine Thomas-Agnan. Reproducing kernel Hilbert spaces in probability and statistics. Springer Science & Business Media, 2011.
[6] Vladimir Igorevich Bogachev. Gaussian measures. Number 62. American Mathematical Soc., 1998.
[7] C. Carmeli, E. De Vito, A. Toigo, and V. Umanità. Vector valued reproducing kernel Hilbert spaces and universality. Analysis and Applications, 8(1):19–61, 2010.
[8] Olivier Chapelle, Vladimir Vapnik, Olivier Bousquet, and Sayan Mukherjee. Choosing multiple parameters for support vector machines. Machine Learning, 46(1):131–159, 2002.
[9] Anirban Chatterjee and Bhaswar B. Bhattacharya. Boosting the power of kernel two-sample tests. Biometrika, page asae048, 2024.
[10] Hao Chen and Jerome H. Friedman. A new graph-based two-sample test for multivariate and object data. Journal of the American Statistical Association, 112(517):397–409, 2017.
[11] Shojaeddin Chenouri and Christopher G. Small. A nonparametric multivariate multisample test based on data depth. Electronic Journal of Statistics, 6:760–782, 2012.
[12] Eric Moulines, Francis Bach, and Zaïd Harchaoui. Testing for homogeneity with kernel Fisher discriminant analysis. Advances in Neural Information Processing Systems, 20, 2007.
[13] Jacob Feldman. Equivalence and perpendicularity of Gaussian processes. Pacific J. Math, 8(4):699–708, 1958.
[14] Ivar Fredholm. Sur une classe d'équations fonctionnelles. Acta Mathematica, 27(1):365–390, 1903.
[15] Kenji Fukumizu, Francis R. Bach, and Michael I. Jordan. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. Journal of Machine Learning Research, 5(Jan):73–99, 2004.
[16] Kenji Fukumizu, Arthur Gretton, Gert Lanckriet, Bernhard Schölkopf, and Bharath K. Sriperumbudur. Kernel choice and classifiability for RKHS embeddings of probability distributions. Advances in Neural Information Processing Systems, 22, 2009.
[17] William Sealy Gosset. The probable error of a mean. Biometrika, 6:1–25, 1908. Published under the pseudonym "Student".
[18] Ulf Grenander. Abstract Inference. Wiley, New York, 1981.
[19] Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alexander Smola. A kernel two-sample test. Journal of Machine Learning Research, 13:723–773, 2012.
[20] Arthur Gretton, Kenji Fukumizu, Choon Teo, Le Song, Bernhard Schölkopf, and Alex Smola. A kernel statistical test of independence. Advances in Neural Information Processing Systems, 20, 2007.
[21] Arthur Gretton, Kenji Fukumizu, Choon H. Teo, Le Song, Bernhard Schölkopf, and Alex Smola. Optimal kernel choice for large-scale two-sample tests. In Advances in Neural Information Processing Systems (NeurIPS), volume 25, 2012.
[22] Omar Hagrass, Bharath Sriperumbudur, and Bing Li. Spectral regularized kernel two-sample tests. The Annals of Statistics, 52(3):1076–1101, 2024.
[23] Jaroslav Hájek. On a property of normal distributions of any stochastic process. Czechoslovak Mathematical Journal, 8(4):610–618, 1958.
[24] Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. arXiv preprint arXiv:1903.12261, 2019.
[25] Norbert Henze. A multivariate two-sample test based on the number of nearest neighbor type coincidences. The Annals of Statistics, 16(2):772–783, 1988.
[26] Qianli Liu, Song Liu, Jian Li, and Dacheng Tao. Learning kernel in maximum mean discrepancy test. Statistical Analysis and Data Mining: The ASA Data Science Journal, 13(6):491–503, 2020.
[27] H. B. Mann and D. R. Whitney. On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, 18:50–60, 1947.
[28] Krikamol Muandet, Kenji Fukumizu, Bharath Sriperumbudur, Bernhard Schölkopf, et al. Kernel mean embedding of distributions: A review and beyond. Foundations and Trends in Machine Learning, 10(1-2):1–141, 2017.
[29] Karl Pearson. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, 50:157–175, 1900.
[30] C. Radhakrishna Rao and V. S. Varadarajan. Discrimination of Gaussian processes. Sankhyā: The Indian Journal of Statistics, Series A, pages 303–330, 1963.
[31] Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do ImageNet classifiers generalize to ImageNet? In International Conference on Machine Learning, pages 5389–5400. PMLR, 2019.
[32] Leonardo V. Santoro and Victor M. Panaretos. Likelihood ratio tests by kernel Gaussian embedding. arXiv preprint arXiv:2508.07982, 2025.
[33] Bernhard Schölkopf and Alexander J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2002.
[34] Antonin Schrab, Ilmun Kim, Mélisande Albert, Béatrice Laurent, Benjamin Guedj, and Arthur Gretton. MMD aggregated two-sample test. Journal of Machine Learning Research, 24(194):1–81, 2023.
[35] Shubhanshu Shekhar, Ilmun Kim, and Aaditya Ramdas. A permutation-free kernel two-sample test. Advances in Neural Information Processing Systems, 35:18168–18180, 2022.
[36] Lawrence Shepp. Gaussian measures in function space. Pacific Journal of Mathematics, 17(1):167–173, 1966.
[37] Barry Simon. Notes on infinite determinants of Hilbert space operators. Advances in Mathematics, 24(3):244–273, 1977.
[38] Barry Simon. Operator theory, volume Part 4 of A Comprehensive Course in Analysis. American Mathematical Society, Providence, RI, 2015.
[39] R. K. Singh and Ashok Kumar. Compact composition operators. J. Austral. Math. Soc. Ser. A, 28(3):309–314, 1979.
[40] Nickolay Smirnov. Table for estimating the goodness of fit of empirical distributions. The Annals of Mathematical Statistics, 19(2):279–281, 1948.
[41] Bharath K. Sriperumbudur, Kenji Fukumizu, Arthur Gretton, Gert R. G. Lanckriet, and Bernhard Schölkopf. On the empirical estimation of integral probability metrics. Electronic Journal of Statistics, 6:1550–1599, 2011.
[42] Ingo Steinwart. On the influence of the kernel on the consistency of support vector machines. Journal of Machine Learning Research, 2(Nov):67–93, 2001.
[43] Antonio Torralba and Alexei A. Efros. Unbiased look at dataset bias. In CVPR 2011, pages 1521–1528. IEEE, 2011.
[44] Kartik G. Waghmare, Tomas Masak, and Victor M. Panaretos. The functional graphical lasso. Annals of Statistics (to appear), arXiv:2306.02347, 2023.
[45] Rand R. Wilcox. Two-sample, bivariate hypothesis testing methods based on Tukey's depth. Multivariate Behavioral Research, 38(2):225–246, 2003.
[46] Frank Wilcoxon. Individual comparisons by ranking methods. Biometrics Bulletin, 1(6):80–83, 1945.
[47] Yijun Zuo and Robert Serfling. General notions of statistical depth function. Annals of Statistics, pages 461–482, 2000.