Kernel Embeddings and the Separation of Measure Phenomenon

Kartik G. Waghmare; Leonardo V. Santoro; Victor M. Panaretos

arXiv: 2505.04613 · v4 · pith:AQDNLDGEnew · submitted 2025-05-07 · stat.ML · cs.LG · math.ST · stat.TH

Pith reviewed 2026-05-22 15:54 UTC · model grok-4.3

classification stat.ML · cs.LG · math.ST · stat.TH

keywords kernel embeddings · reproducing kernel Hilbert space · Gaussian measures · singularity · two-sample testing · Feldman-Hájek dichotomy · probability measures · separation of measures

The pith

Kernel covariance embeddings make equality testing of non-atomic measures equivalent to singularity testing of centered Gaussians in an RKHS

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves that embedding non-atomic probability measures through their kernel covariances converts the problem of testing whether two such measures are identical into the problem of testing whether two centered Gaussian measures in the associated reproducing kernel Hilbert space are mutually singular. This equivalence holds on locally compact uncountable Polish spaces because the embedded Gaussians for distinct measures have supports that are essentially disjoint affine subspaces. A sympathetic reader would care because singularity testing between such Gaussians is structurally simpler and information-theoretically more decisive than direct nonparametric two-sample testing, especially in high-dimensional or complex settings. The argument relies on the Feldman-Hajek dichotomy and shows that even tiny perturbations of a continuous distribution become maximally separated after embedding.

Core claim

We prove that kernel covariance embeddings lead to information-theoretically perfect separation of distinct continuous probability distributions. Testing for the equality of two non-atomic Borel probability measures on a locally compact uncountable Polish space is equivalent to testing for the singularity between two centered Gaussian measures on the reproducing kernel Hilbert space generated by the embedding kernel. The proof leverages the classical Feldman-Hajek dichotomy and demonstrates that small perturbations of continuous distributions are maximally magnified through their Gaussian embeddings.
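The idea that a small perturbation can be "maximally magnified" in infinite dimensions has a classical finite-dimensional shadow, sketched below. This is a toy analogue and not the paper's kernel construction: it compares N(0, C) with N(0, (1 + eps) C) on R^dim and tracks the Bhattacharyya coefficient (which tends to 0 as the two laws become singular) alongside the squared Hilbert-Schmidt norm of C^{-1/2} ((1 + eps) C) C^{-1/2} − I. The function names and the choice eps = 0.01 are illustrative assumptions.

```python
import math


def bhattacharyya_scaled(eps: float, dim: int) -> float:
    """Bhattacharyya coefficient between N(0, C) and N(0, (1 + eps) C) on R^dim.

    For any positive-definite C the value depends only on eps and dim:
    whitening by C^{-1/2} reduces the pair to N(0, I) and N(0, (1 + eps) I),
    whose coordinates are independent. Requires eps > -1.
    """
    # Closed form for two centered univariate Gaussians with variances 1 and 1 + eps.
    per_coordinate = math.sqrt(2.0 * math.sqrt(1.0 + eps) / (2.0 + eps))
    # Independence across coordinates turns the coefficient into a product.
    return per_coordinate ** dim


def hs_norm_sq_scaled(eps: float, dim: int) -> float:
    """Squared Hilbert-Schmidt norm of C^{-1/2} ((1 + eps) C) C^{-1/2} - I = eps * I."""
    return dim * eps ** 2


if __name__ == "__main__":
    eps = 0.01  # a 1% inflation of every variance (illustrative choice)
    for dim in (1, 10**2, 10**4, 10**6):
        print(dim, bhattacharyya_scaled(eps, dim), hs_norm_sq_scaled(eps, dim))
```

As dim grows with eps fixed, the Hilbert-Schmidt quantity grows linearly and the Bhattacharyya coefficient decays geometrically, which is the finite-dimensional trace of the equivalence-or-singularity dichotomy the paper invokes.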

What carries the argument

The kernel covariance embedding, which maps each probability measure to a centered Gaussian on the RKHS so that the Feldman-Hajek dichotomy can be applied directly to establish singularity for distinct measures

Load-bearing premise

The kernel must generate an RKHS in which the covariance embeddings of distinct non-atomic measures produce Gaussians whose supports are essentially separate affine subspaces

What would settle it

Two distinct non-atomic continuous measures on a Polish space together with a kernel such that the corresponding embedded centered Gaussians are not singular

Figures

Figures reproduced from arXiv: 2505.04613 by Kartik G. Waghmare, Leonardo V. Santoro, Victor M. Panaretos.

**Figure 2.** Gaussian embeddings magnify distributional differences in a structured fashion: distinct measures on X (P, Q on the left) are mapped to mutually singular Gaussian measures on H (N_P, N_Q on the right, where N_P, N_Q are either centered or uncentered Gaussian embeddings of P, Q).

**Figure 3.** Monte Carlo illustration of the sampling behaviour of MMD and …

Original abstract

We prove that kernel covariance embeddings lead to information-theoretically perfect separation of distinct continuous probability distributions. In statistical terms, we establish that testing for the equality of two non-atomic (Borel) probability measures on a locally compact uncountable Polish space is equivalent to testing for the singularity between two centered Gaussian measures on a reproducing kernel Hilbert space. The corresponding Gaussians are defined via the notion of kernel covariance embedding of a probability measure, and the Hilbert space is that generated by the embedding kernel. Distinguishing singular Gaussians is structurally simpler from an information-theoretic perspective than non-parametric two-sample testing, particularly in complex or high-dimensional domains. This is because singular Gaussians are supported on essentially separate and affine subspaces. Our proof leverages the classical Feldman-Hájek dichotomy, and shows that even a small perturbation of a continuous distribution will be maximally magnified through its Gaussian embedding. This "separation of measure phenomenon" appears to be a blessing of infinite dimensionality, by means of embedding, with the potential to inform the design of efficient inference tools in considerable generality. The elicitation of this phenomenon also appears to crystallize, in a precise and simple mathematical statement, a core mechanism underpinning the empirical effectiveness of kernel methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper equates equality testing for non-atomic measures to singularity testing for their kernel-embedded Gaussians via Feldman-Hajek, but the shared-support case needs explicit verification in the proof.

The main point is that testing whether two non-atomic Borel measures on a locally compact Polish space are equal is information-theoretically equivalent to testing whether the centered Gaussians defined by their kernel covariance embeddings are singular. The authors use the Feldman-Hajek dichotomy to make this link and argue that the embedding magnifies differences into separate affine supports for the Gaussians.

Referee Report

2 major / 2 minor

Summary. The paper claims to prove that kernel covariance embeddings lead to information-theoretically perfect separation of distinct continuous probability distributions. It establishes that testing for the equality of two non-atomic Borel probability measures on a locally compact uncountable Polish space is equivalent to testing for the singularity between two centered Gaussian measures on the RKHS generated by the embedding kernel. The proof applies the classical Feldman-Hájek dichotomy to these embedded Gaussians, arguing that small perturbations of continuous distributions are maximally magnified, yielding a 'separation of measure phenomenon' that is a blessing of infinite dimensionality and may explain the effectiveness of kernel methods.

Significance. If the equivalence holds under the stated conditions, the result crystallizes a precise mechanism by which kernel embeddings convert non-parametric two-sample testing into the structurally simpler problem of distinguishing singular Gaussians supported on separate affine subspaces. This has potential to inform the design of efficient inference procedures in high-dimensional or complex domains. The manuscript receives credit for grounding the argument in the standard Feldman-Hájek dichotomy and for highlighting an infinite-dimensional phenomenon without introducing free parameters or ad-hoc entities.

major comments (2)

[Abstract] Abstract (paragraph on the separation of measure phenomenon): the claimed equivalence between equality testing for P ≠ Q and singularity of the embedded centered Gaussians G_P, G_Q via Feldman-Hájek requires that the covariance operators satisfy the necessary conditions for singularity even when supp(P) = supp(Q). The range of C_P is contained in the closure of span{φ(x) : x ∈ supp(P)}, so identical supports imply identical ranges and thus coinciding closed supports for G_P and G_Q; the manuscript must explicitly verify that C_P^{-1/2} C_Q C_P^{-1/2} − I fails to be Hilbert-Schmidt or has spectrum intersecting −1 for all such equal-support pairs, otherwise the information-theoretic equivalence does not hold without further restrictions on the kernel.
[Abstract and proof] The application of Feldman-Hájek (abstract and proof outline): the precise conditions on the kernel and the Polish space under which the embedded Gaussians are singular for every pair of distinct non-atomic measures are not fully inspectable from the abstract; if these conditions are only verified for measures with disjoint supports, the central claim is load-bearing on an unstated extension to the equal-support case.
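For reference, the criterion that both major comments appeal to can be written out explicitly. The statement below is the standard textbook form of the Feldman-Hájek dichotomy specialised to centered Gaussians, not a quotation from the paper; the symbols $G_P$, $G_Q$, $C_P$, $C_Q$ follow the referee's notation, while $\mathcal{H}_0$ and $S$ are names introduced here for convenience.

```latex
\textbf{Feldman-H\'ajek dichotomy (centered case).}
Let $\mathcal{H}$ be a separable Hilbert space and let $G_P = N(0, C_P)$ and
$G_Q = N(0, C_Q)$ be centered Gaussian measures on $\mathcal{H}$ with
trace-class covariance operators $C_P$ and $C_Q$. Then exactly one of
\[
  G_P \sim G_Q \quad \text{(mutually absolutely continuous)}
  \qquad \text{or} \qquad
  G_P \perp G_Q \quad \text{(mutually singular)}
\]
holds. Moreover, $G_P \sim G_Q$ if and only if
\begin{enumerate}
  \item $\operatorname{ran}\bigl(C_P^{1/2}\bigr)
        = \operatorname{ran}\bigl(C_Q^{1/2}\bigr) =: \mathcal{H}_0$, and
  \item the operator
  \[
    S = \bigl(C_P^{-1/2} C_Q^{1/2}\bigr)\bigl(C_P^{-1/2} C_Q^{1/2}\bigr)^{*} - I,
    \qquad \text{formally } S = C_P^{-1/2}\, C_Q\, C_P^{-1/2} - I,
  \]
  is Hilbert-Schmidt on $\overline{\mathcal{H}_0}$.
\end{enumerate}
```

In this form, singularity of $G_P$ and $G_Q$ follows as soon as either condition fails, which is the verification the referee asks for in the equal-support case.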

minor comments (2)

[Abstract] Clarify the exact definition of the covariance embedding operator C_P f = ∫ ⟨f, φ(x)⟩ φ(x) dP(x) and its domain in the RKHS, including any measurability requirements.
[Throughout] The phrase 'essentially separate and affine subspaces' should be tied explicitly to the Feldman-Hájek support condition in the main text.
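The operator in the first minor comment has a simple empirical counterpart that can be computed from kernel evaluations alone. The sketch below assumes the uncentered form C_P f = ∫ ⟨f, φ(x)⟩ φ(x) dP(x) with φ(x) = k(·, x), replaces P by a sample average, and uses a Gaussian kernel on the real line; the kernel, its bandwidth, and all function names are assumptions made for illustration, and the paper's own definitions or estimators may differ.

```python
import math
from typing import Callable, List, Sequence, Tuple

Kernel = Callable[[float, float], float]


def gaussian_kernel(x: float, y: float, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel on the real line; the bandwidth is an arbitrary choice."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def apply_empirical_covariance(
    sample: Sequence[float],
    f: Callable[[float], float],
) -> List[Tuple[float, float]]:
    """Return C_hat f as (center, coefficient) pairs for an RKHS function f.

    By the reproducing property <f, phi(x)> = f(x), so
    C_hat f = (1/n) * sum_i f(x_i) k(., x_i).
    """
    n = len(sample)
    return [(x, f(x) / n) for x in sample]


def hs_inner(xs: Sequence[float], ys: Sequence[float],
             k: Kernel = gaussian_kernel) -> float:
    """<C_hat_P, C_hat_Q>_HS = (1 / (n m)) * sum_{i,j} k(x_i, y_j)^2."""
    return sum(k(x, y) ** 2 for x in xs for y in ys) / (len(xs) * len(ys))


def hs_distance_sq(xs: Sequence[float], ys: Sequence[float],
                   k: Kernel = gaussian_kernel) -> float:
    """Squared Hilbert-Schmidt distance between the two empirical operators."""
    return hs_inner(xs, xs, k) - 2.0 * hs_inner(xs, ys, k) + hs_inner(ys, ys, k)


if __name__ == "__main__":
    xs = [0.0, 0.5, 1.0]  # toy sample standing in for P
    ys = [3.0, 3.5, 4.0]  # toy sample standing in for Q
    print(hs_distance_sq(xs, ys))
    print(apply_empirical_covariance(xs, lambda t: gaussian_kernel(t, 0.0)))
```

The Gram-matrix identities rely only on ⟨φ(x) ⊗ φ(x), φ(y) ⊗ φ(y)⟩_HS = k(x, y)², so the same functions work with any positive-definite kernel passed as `k`.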

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough reading and for identifying points that merit clarification. The comments correctly highlight the need to make the treatment of the equal-support case fully explicit. We will revise the manuscript to address this.

Referee: The claimed equivalence requires that covariance operators satisfy singularity conditions even when supp(P) = supp(Q). Identical supports imply identical ranges and coinciding closed supports for G_P and G_Q; the manuscript must explicitly verify that C_P^{-1/2} C_Q C_P^{-1/2} − I fails to be Hilbert-Schmidt or has spectrum intersecting −1 for all such pairs.

Authors: We agree that explicit verification is required. The main theorem applies to all distinct non-atomic measures, but the current exposition emphasizes the disjoint-support case for clarity. For the equal-support case we will add a lemma establishing that, because the measures are distinct and non-atomic on an uncountable Polish space, the resulting covariance operators differ by a perturbation that violates the Feldman-Hájek equivalence criteria (the operator C_P^{-1/2} C_Q C_P^{-1/2} − I is never Hilbert-Schmidt and its spectrum intersects −1). This lemma will be inserted before the main proof and the abstract will be updated to reference it.
revision: yes
Referee: The precise conditions on the kernel and Polish space under which the embedded Gaussians are singular for every pair of distinct non-atomic measures are not fully inspectable; if verified only for disjoint supports, the central claim rests on an unstated extension to the equal-support case.

Authors: The conditions are stated in Theorem 1: the kernel is continuous and characteristic, the space is locally compact and uncountable Polish, and the measures are non-atomic Borel probabilities. The proof outline in the abstract is necessarily brief; the full argument proceeds by first treating disjoint supports (where singularity is immediate) and then handling equal supports via the lemma described above. We will expand the proof section to present both cases in sequence and in self-contained form, removing any ambiguity about the extension.
revision: yes

Circularity Check

0 steps flagged

No circularity: derivation relies on classical external theorem

The paper's central claim equates equality testing of non-atomic measures to singularity testing of their kernel covariance embeddings via the classical Feldman-Hajek dichotomy for Gaussian measures on Hilbert spaces. This is an application of an independent, externally verifiable mathematical result rather than any self-definitional reduction, fitted-input prediction, or load-bearing self-citation chain. The abstract and derivation explicitly invoke the classical dichotomy without redefining it in terms of the paper's own constructions or renaming known results as novel. The argument remains self-contained against external benchmarks, with no steps that reduce the claimed equivalence to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The result rests on standard measure-theoretic and RKHS background rather than new free parameters or invented entities. The Feldman-Hajek dichotomy is invoked as an external theorem.

axioms (2)

domain assumption: The underlying space is a locally compact uncountable Polish space and the measures are non-atomic Borel probability measures.
Stated in the abstract as the setting where the equivalence holds.

domain assumption: The kernel is positive definite and generates a reproducing kernel Hilbert space in which the covariance embeddings are well-defined centered Gaussians.
Implicit in the definition of kernel covariance embedding used throughout the claim.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "testing for the equality of two non-atomic probability measures ... is equivalent to testing for the singularity between two centered Gaussian measures on a reproducing kernel Hilbert space"

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
Paper passage: "Our proof leverages the classical Feldman-Hájek dichotomy"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

Sulla determinazione empirica di una legge di distribuzione.Giornale dell’Istituto Italiano degli Attuari, 4:89–91, 1933

Kolmogorov A. Sulla determinazione empirica di una legge di distribuzione.Giornale dell’Istituto Italiano degli Attuari, 4:89–91, 1933

work page 1933

[2] [2]

Theory of reproducing kernels.Transactions of the American mathematical society, 68(3):337–404, 1950

Nachman Aronszajn. Theory of reproducing kernels.Transactions of the American mathematical society, 68(3):337–404, 1950

work page 1950

[3] [3]

Information theory with kernel methods.IEEE Transactions on Information Theory, 69(2):752–775, 2022

Francis Bach. Information theory with kernel methods.IEEE Transactions on Information Theory, 69(2):752–775, 2022

work page 2022

[4] [4]

On covariance operators

Charles R Baker. On covariance operators. Technical report, North Carolina State University. Dept. of Statistics, 1970

work page 1970

[5] [5]

Springer Science & Business Media, 2011

Alain Berlinet and Christine Thomas-Agnan.Reproducing kernel Hilbert spaces in probability and statis- tics. Springer Science & Business Media, 2011

work page 2011

[6] [6]

Number 62

Vladimir Igorevich Bogachev.Gaussian measures. Number 62. American Mathematical Soc., 1998

work page 1998

[7] [7]

Carmeli, E

C. Carmeli, E. De Vito, A. Toigo, and V. Umanit` a. Vector valued reproducing kernel hilbert spaces and universality.Analysis and Applications, 8(1):19–61, 2010

work page 2010

[8] [8]

Choosing multiple param- eters for support vector machines.Machine learning, 46(1):131–159, 2002

Olivier Chapelle, Vladimir Vapnik, Olivier Bousquet, and Sayan Mukherjee. Choosing multiple param- eters for support vector machines.Machine learning, 46(1):131–159, 2002

work page 2002

[9] [9]

Boosting the power of kernel two-sample tests

Anirban Chatterjee and Bhaswar B Bhattacharya. Boosting the power of kernel two-sample tests. Biometrika, page asae048, 2024

work page 2024

[10] [10]

A new graph-based two-sample test for multivariate and object data.Journal of the American Statistical Association, 112(517):397–409, 2017

Hao Chen and Jerome H Friedman. A new graph-based two-sample test for multivariate and object data.Journal of the American Statistical Association, 112(517):397–409, 2017. 14

work page 2017

[11] [11]

Shojaeddin Chenouri and Christopher G. Small. A Nonparametric Multivariate Multisample Test Based on Data Depth.Electronic Journal of Statistics, 6(none):760 – 782, 2012

work page 2012

[12] [12]

Testing for homogeneity with kernel fisher discrimi- nant analysis.Advances in Neural Information Processing Systems, 20, 2007

Moulines Eric, Francis Bach, and Za¨ ıd Harchaoui. Testing for homogeneity with kernel fisher discrimi- nant analysis.Advances in Neural Information Processing Systems, 20, 2007

work page 2007

[13] [13]

Equivalence and perpendicularity of gaussian processes.Pacific J

Jacob Feldman. Equivalence and perpendicularity of gaussian processes.Pacific J. Math, 8(4):699–708, 1958

work page 1958

[14] [14]

Sur une classe d’´ equations fonctionnelles.Acta Mathematica, 27(1):365–390, 1903

Ivar Fredholm. Sur une classe d’´ equations fonctionnelles.Acta Mathematica, 27(1):365–390, 1903

work page 1903

[15] [15]

Dimensionality reduction for supervised learning with reproducing kernel hilbert spaces.Journal of Machine Learning Research, 5(Jan):73–99, 2004

Kenji Fukumizu, Francis R Bach, and Michael I Jordan. Dimensionality reduction for supervised learning with reproducing kernel hilbert spaces.Journal of Machine Learning Research, 5(Jan):73–99, 2004

work page 2004

[16] [16]

Kernel choice and classifiability for rkhs embeddings of probability distributions.Advances in neural information processing systems, 22, 2009

Kenji Fukumizu, Arthur Gretton, Gert Lanckriet, Bernhard Sch¨ olkopf, and Bharath K Sriperumbudur. Kernel choice and classifiability for rkhs embeddings of probability distributions.Advances in neural information processing systems, 22, 2009

work page 2009

[17] [17]

The probable error of a mean.Biometrika, 6:1–25, 1908

William Sealy Gosset. The probable error of a mean.Biometrika, 6:1–25, 1908. Published under the pseudonym ”Student”

work page 1908

[18] [18]

Wiley, New York, 1981

Ulf Grenander.Abstract Inference. Wiley, New York, 1981

work page 1981

[19] [19]

Borgwardt, Malte J

Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Sch¨ olkopf, and Alexander Smola. A kernel two-sample test.Journal of Machine Learning Research, 13:723–773, 2012

work page 2012

[20] [20]

A kernel statistical test of independence.Advances in neural information processing systems, 20, 2007

Arthur Gretton, Kenji Fukumizu, Choon Teo, Le Song, Bernhard Sch¨ olkopf, and Alex Smola. A kernel statistical test of independence.Advances in neural information processing systems, 20, 2007

work page 2007

[21] [21]

Teo, Le Song, Bernhard Sch¨ olkopf, and Alex Smola

Arthur Gretton, Kenji Fukumizu, Choon H. Teo, Le Song, Bernhard Sch¨ olkopf, and Alex Smola. Optimal kernel choice for large-scale two-sample tests. InAdvances in Neural Information Processing Systems (NeurIPS), volume 25, 2012

work page 2012

[22] [22]

Spectral regularized kernel two-sample tests.The Annals of Statistics, 52(3):1076–1101, 2024

Omar Hagrass, Bharath Sriperumbudur, and Bing Li. Spectral regularized kernel two-sample tests.The Annals of Statistics, 52(3):1076–1101, 2024

work page 2024

[23] [23]

On a property of normal distributions of any stochastic process.Czechoslovak Mathe- matical Journal, 8(4):610–618, 1958

Jaroslav Hajek. On a property of normal distributions of any stochastic process.Czechoslovak Mathe- matical Journal, 8(4):610–618, 1958

[24]

Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. arXiv preprint arXiv:1903.12261, 2019

[25]

Norbert Henze. A multivariate two-sample test based on the number of nearest neighbor type coincidences. The Annals of Statistics, 16(2):772–783, 1988

[26]

Qianli Liu, Song Liu, Jian Li, and Dacheng Tao. Learning kernel in maximum mean discrepancy test. Statistical Analysis and Data Mining: The ASA Data Science Journal, 13(6):491–503, 2020

[27]

H. B. Mann and D. R. Whitney. On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, 18:50–60, 1947

[28]

Krikamol Muandet, Kenji Fukumizu, Bharath Sriperumbudur, Bernhard Schölkopf, et al. Kernel mean embedding of distributions: A review and beyond. Foundations and Trends in Machine Learning, 10(1-2):1–141, 2017

[29]

Karl Pearson. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, 50:157–175, 1900

[30]

C. Radhakrishna Rao and V. S. Varadarajan. Discrimination of Gaussian processes. Sankhyā: The Indian Journal of Statistics, Series A, pages 303–330, 1963

[31]

Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do ImageNet classifiers generalize to ImageNet? In International Conference on Machine Learning, pages 5389–5400. PMLR, 2019

[32]

Leonardo V. Santoro and Victor M. Panaretos. Likelihood ratio tests by kernel Gaussian embedding. arXiv preprint arXiv:2508.07982, 2025

[33]

Bernhard Schölkopf and Alexander J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2002

[34]

Antonin Schrab, Ilmun Kim, Mélisande Albert, Béatrice Laurent, Benjamin Guedj, and Arthur Gretton. MMD aggregated two-sample test. Journal of Machine Learning Research, 24(194):1–81, 2023

[35]

Shubhanshu Shekhar, Ilmun Kim, and Aaditya Ramdas. A permutation-free kernel two-sample test. Advances in Neural Information Processing Systems, 35:18168–18180, 2022

[36]

Lawrence Shepp. Gaussian measures in function space. Pacific Journal of Mathematics, 17(1):167–173, 1966

[37]

Barry Simon. Notes on infinite determinants of Hilbert space operators. Advances in Mathematics, 24(3):244–273, 1977

[38]

Barry Simon. Operator Theory, volume Part 4 of A Comprehensive Course in Analysis. American Mathematical Society, Providence, RI, 2015

[39]

R. K. Singh and Ashok Kumar. Compact composition operators. J. Austral. Math. Soc. Ser. A, 28(3):309–314, 1979

[40]

Nickolay Smirnov. Table for estimating the goodness of fit of empirical distributions. The Annals of Mathematical Statistics, 19(2):279–281, 1948

[41]

Bharath K. Sriperumbudur, Kenji Fukumizu, Arthur Gretton, Gert R. G. Lanckriet, and Bernhard Schölkopf. On the empirical estimation of integral probability metrics. Electronic Journal of Statistics, 6:1550–1599, 2011

[42]

Ingo Steinwart. On the influence of the kernel on the consistency of support vector machines. Journal of Machine Learning Research, 2(Nov):67–93, 2001

[43]

Antonio Torralba and Alexei A. Efros. Unbiased look at dataset bias. In CVPR 2011, pages 1521–1528. IEEE, 2011

[44]

Kartik G. Waghmare, Tomas Masak, and Victor M. Panaretos. The functional graphical lasso. Annals of Statistics (to appear, arXiv:2306.02347), 2023

[45]

Rand R. Wilcox. Two-sample, bivariate hypothesis testing methods based on Tukey's depth. Multivariate Behavioral Research, 38(2):225–246, 2003

[46]

Frank Wilcoxon. Individual comparisons by ranking methods. Biometrics Bulletin, 1(6):80–83, 1945

[47]

Yijun Zuo and Robert Serfling. General notions of statistical depth function. Annals of Statistics, pages 461–482, 2000
