Measuring Differences between Conditional Distributions using Kernel Embeddings

Dino Sejdinovic; Peter Moskvichev; Siu Lun Chau

arxiv: 2605.02260 · v1 · submitted 2026-05-04 · 📊 stat.ML · cs.LG

Measuring Differences between Conditional Distributions using Kernel Embeddings

Peter Moskvichev , Siu Lun Chau , Dino Sejdinovic This is my paper

Pith reviewed 2026-05-08 19:19 UTC · model grok-4.3

classification 📊 stat.ML cs.LG

keywords conditional distributionskernel embeddingsmaximum mean discrepancydoubly robust estimationRKHSstatistical testingconditional dependenceoperator smoothing

0 comments

The pith

Kernel embeddings define a family of metrics for measuring differences between conditional distributions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a unified framework for comparing conditional distributions using kernel methods. It introduces the conditional maximum mean discrepancy (CMMD) as a family of metrics with different levels based on various reproducing kernel Hilbert space embeddings. These levels are connected mathematically through operator-based smoothing, addressing the fragmentation in prior work. A novel doubly robust estimator is presented that stays consistent even if only one of the underlying models is accurate. This matters because accurate comparison of conditional distributions supports statistical testing of dependencies across machine learning applications.

Core claim

The CMMD consists of a family of metrics which we call levels, with three special cases each using a different type of RKHS embedding: CMMD0 (conditional mean operators), CMMD1 (conditional mean embeddings), and CMMD2 (joint mean embeddings). We additionally introduce a general level s CMMD, clarifying the required assumptions, and establishing mathematical connections between the levels through the lens of operator-based smoothing. In addition to reviewing previously proposed estimators, we introduce a novel doubly robust estimator for the CMMD that maintains consistency provided at least one of the underlying models is correctly specified.

What carries the argument

The conditional maximum mean discrepancy (CMMD) family, which quantifies differences between conditional distributions through RKHS embeddings at varying levels connected by operator smoothing.

If this is right

The levels of CMMD provide a hierarchy allowing comparisons under different assumptions on the conditional distributions.
The doubly robust estimator enables consistent measurement of discrepancies without requiring full correctness of both models.
CMMD supports statistical testing that detects complex conditional dependencies, as verified in numerical experiments.
Operator smoothing establishes explicit mathematical links between different embedding-based approaches to conditional comparison.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The smoothing connections between levels suggest the framework could unify other kernel-based conditional tests beyond those reviewed.
The robustness property may extend naturally to settings like causal effect estimation where conditional distributions are central.
Applying the general level s CMMD to high-dimensional or structured data could reveal practical performance differences among levels.

Load-bearing premise

The doubly robust estimator maintains consistency only when at least one of the models for the conditional distribution or the embedding is correctly specified.

What would settle it

An experiment where the estimator fails to converge to the true value when both the conditional distribution model and the embedding model are deliberately misspecified would challenge the consistency claim.

Figures

Figures reproduced from arXiv: 2605.02260 by Dino Sejdinovic, Peter Moskvichev, Siu Lun Chau.

**Figure 1.** Figure 1: Illustration of the three levels of CMMD using different types of RKHS embedding. view at source ↗

**Figure 2.** Figure 2: Rejection rates for CMMD test. Left: Power curve shows that CMMD view at source ↗

**Figure 3.** Figure 3: Rejection rates for level s CMMD test. Left: Under Setting 1, extra smoothing deteriorates test power. Right: Under Setting 2, extra smoothing increases test power up to some limit. increases as the parameter approaches 1. As before, we use a Gaussian kernel for k and ℓ with median heuristic bandwidth. A regularization parameter of λ = 0.1 is applied and tests are conducted with n = 100 samples view at source ↗

**Figure 4.** Figure 4: Left: Plot of difference in CMEs illustrates that the DR estimator gives a closer view at source ↗

**Figure 5.** Figure 5: Rejection rates for CMMD test on MNIST data. Left: Under view at source ↗

**Figure 6.** Figure 6: Plots of simulated data (top) and distribution of test statistics (bottom) under view at source ↗

**Figure 7.** Figure 7: Rejection rates of hypothesis test under Setting 1 (left) and Setting 2 (right). view at source ↗

**Figure 8.** Figure 8: Left: Data sampled from P and CME model µˆY |X. Middle: Data sampled from Q and CME model µˆZ|X. Right: Pseudo-outcomes computed on the combined data and the DR model ∆ˆ k(·, x). Despite individual CME models being misspecified, the DR estimator fitted on the pseudo-outcomes correctly models the true difference between CMEs. Next, we consider two-sample testing under the null hypothesis, that is, condition… view at source ↗

**Figure 9.** Figure 9: Rejection rate under the null hypothesis. All tests, both using standard and view at source ↗

read the original abstract

Comparing conditional distributions is a fundamental challenge in statistics and machine learning, with applications across a wide range of domains. While proposed methods for measuring discrepancies using kernel embeddings of distributions in a reproducing kernel Hilbert space (RKHS) provide powerful non-parametric techniques, the existing literature remains fragmented and lacks a unified theoretical treatment. This paper addresses this gap by establishing a coherent framework for studying kernel-based methods to measure divergence between conditional distributions through what we refer to as conditional maximum mean discrepancy (CMMD). The CMMD consists of a family of metrics which we call levels, with three special cases each using a different type of RKHS embedding: CMMD$_0$ (conditional mean operators), CMMD$_1$ (conditional mean embeddings), and CMMD$_2$ (joint mean embeddings). We additionally introduce a general level $s$ CMMD, clarifying the required assumptions, and establishing mathematical connections between the levels through the lens of operator-based smoothing. In addition to reviewing previously proposed estimators, we introduce a novel doubly robust estimator for the CMMD that maintains consistency provided at least one of the underlying models is correctly specified. We provide numerical experiments demonstrating that the CMMD effectively captures complex conditional dependencies for statistical testing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper unifies scattered kernel conditional discrepancy measures as CMMD levels linked by operator smoothing and adds a doubly robust estimator that stays consistent if at least one nuisance model is correct.

read the letter

The main new piece is the general level-s CMMD together with the explicit links between the special cases (conditional mean operators, mean embeddings, joint embeddings) via smoothing operators. That framing organizes prior work without forcing everything into one rigid form and spells out the assumptions more clearly than most earlier papers. The doubly robust estimator is also concrete: it only needs one of the two models (conditional distribution or embedding) to be correctly specified, which is a standard but useful semiparametric property here. The review of existing estimators and the numerical experiments on synthetic dependence detection round out the contribution without overclaiming.

Referee Report

1 major / 3 minor

Summary. The paper proposes a unified framework called conditional maximum mean discrepancy (CMMD) for measuring differences between conditional distributions via RKHS kernel embeddings. It defines a family of metrics at different 'levels': CMMD0 using conditional mean operators, CMMD1 using conditional mean embeddings, CMMD2 using joint mean embeddings, plus a general level-s version. Mathematical connections between levels are established through operator-based smoothing, assumptions are clarified, prior estimators are reviewed, and a novel doubly robust estimator is introduced whose consistency requires only that at least one of the two nuisance models (conditional distribution or embedding) is correctly specified. Numerical experiments illustrate effectiveness for statistical testing of conditional dependencies.

Significance. If the claimed connections via smoothing operators and the doubly-robust consistency property hold with the stated proofs, the work provides a coherent unification of previously fragmented kernel methods for conditional discrepancies. The doubly robust estimator is a clear practical advance in the semiparametric RKHS setting, and the level-s generalization with explicit assumptions strengthens the theoretical foundation. These elements would make the manuscript a useful reference for nonparametric conditional inference.

major comments (1)

[§4.2] §4.2, Theorem 4 (doubly robust estimator): the consistency argument under the 'at least one correct model' condition is load-bearing for the main contribution; the provided derivation should explicitly bound the cross-term bias using the RKHS norm of the smoothing operator rather than invoking it only asymptotically.

minor comments (3)

[Abstract] Abstract: the description of numerical experiments omits the specific datasets, sample sizes, and baseline methods used; a one-sentence summary would improve clarity.
[§3.1] §3.1, Eq. (7): the notation for the conditional mean operator could be aligned more explicitly with the subsequent level-1 and level-2 definitions to ease comparison.
[§5] §5: the caption of Figure 2 should state the kernel bandwidth selection procedure and the number of Monte Carlo repetitions for reproducibility.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript and for the constructive comment on the doubly robust estimator. We address the point below.

read point-by-point responses

Referee: [§4.2] §4.2, Theorem 4 (doubly robust estimator): the consistency argument under the 'at least one correct model' condition is load-bearing for the main contribution; the provided derivation should explicitly bound the cross-term bias using the RKHS norm of the smoothing operator rather than invoking it only asymptotically.

Authors: We thank the referee for highlighting this point. We agree that the current derivation of the cross-term bias in the proof of Theorem 4 relies on an asymptotic argument and would benefit from an explicit finite-sample bound expressed via the RKHS norm of the smoothing operator (as introduced in the operator-smoothing connections of Section 3). In the revised manuscript we will expand the proof in §4.2 to derive and insert this bound, using the same operator-norm control already established for the level-s family. The revised argument will remain fully rigorous under the stated 'at least one correct model' condition and will not alter the theorem statement or its implications. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper defines the CMMD family via standard RKHS embeddings (conditional mean operators, mean embeddings, joint embeddings) and operator smoothing, then reviews estimators and proposes a doubly-robust one whose consistency property follows directly from semiparametric theory requiring only one correct nuisance model. No derivation step reduces by construction to a fitted parameter, self-referential definition, or load-bearing self-citation; all connections are derived from established kernel and operator theory without renaming known results or smuggling ansatzes.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Review based on abstract only; full details unavailable. Paper likely rests on standard RKHS properties for mean embeddings and operator theory.

axioms (1)

domain assumption Reproducing kernel Hilbert spaces admit well-defined mean embeddings and conditional mean operators for the distributions of interest.
Invoked to define the CMMD levels and their connections via smoothing.

invented entities (1)

CMMD levels (0, 1, 2, and general s) no independent evidence
purpose: Family of metrics to quantify divergence between conditional distributions
Newly introduced family with specific embeddings; no independent evidence provided in abstract.

pith-pipeline@v0.9.0 · 5514 in / 1348 out tokens · 89973 ms · 2026-05-08T19:19:32.975212+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost (J(x) = ½(x+x⁻¹)−1) washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

CMMD_s^2(P_Y|X, Q_Z|X) = ‖Δ C_XX^{s/2}‖^2 = Tr(Δ* Δ C_XX^s) where s ≥ 0. ... intuitively higher levels correspond to greater amounts of smoothing caused by the marginal distribution of X.
IndisputableMonolith/Foundation (forcing chain, parameter-free derivations) reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We introduce a novel doubly robust estimator for the CMMD that maintains consistency provided at least one of the underlying models is correctly specified.
IndisputableMonolith/Foundation/AlexanderDuality (integer dimension forcing) alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Considering fractional levels of smoothing gives rise to a general level s CMMD which we investigate further in this work.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages

[1]

Judea Pearl , journal =

work page
[2]

Improving predictive inference under covariate shift by weighting the log-likelihood function , volume =

Shimodaira, Hidetoshi , journal =. Improving predictive inference under covariate shift by weighting the log-likelihood function , volume =

work page
[3]

Discriminative

Bickel, Steffen and Br. Discriminative. Journal of Machine Learning Research , number =

work page
[4]

Machine Learning in Non-Stationary Environments: Introduction to Covariate Shift Adaptation , year =

Masashi Sugiyama and Motoaki Kawanabe , publisher =. Machine Learning in Non-Stationary Environments: Introduction to Covariate Shift Adaptation , year =

work page
[5]

Wainwright , journal =

Cong Ma and Reese Pathak and Martin J. Wainwright , journal =

work page
[6]

Calibration by

Marx, Charlie and Zalouk, Sofian and Ermon, Stefano , booktitle =. Calibration by

work page
[7]

Generalized kernel two-sample tests , volume =

Song, Hoseung and Chen, Hao , journal =. Generalized kernel two-sample tests , volume =

work page
[8]

Gretton, Arthur and Fukumizu, Kenji and Teo, Choon and Song, Le and Sch. A. Advances in

work page
[9]

A Kernel Test of Goodness of Fit , year =

Chwialkowski, Kacper and Strathmann, Heiko and Gretton, Arthur , booktitle =. A Kernel Test of Goodness of Fit , year =

work page
[10]

Composite

Key, Oscar and Gretton, Arthur and Briol, Fran. Composite. Journal of Machine Learning Research , number =

work page
[11]

Smola, Alex and Gretton, Arthur and Song, Le and Sch. A. Algorithmic

work page
[12]

Advances in Neural Information Processing Systems , title =

Massiani, Pierre-Fran. Advances in Neural Information Processing Systems , title =

work page
[13]

Distance and

Yan, Jian and Li, Zhuoxi and Zhang, Xianyang , note =. Distance and

work page
[14]

, note =

Chatterjee, Anirban and Niu, Ziang and Bhattacharya, Bhaswar B. , note =. A

work page
[15]

Lee, Seongchan and Cha, Suman and Kim, Ilmun , note =. General

work page
[16]

A Two-Sample Conditional Distribution Test Using Conformal Prediction and Weighted Rank Sum , volume =

Xiaoyu Hu and Jing Lei , journal =. A Two-Sample Conditional Distribution Test Using Conformal Prediction and Weighted Rank Sum , volume =

work page
[17]

Conference on Uncertainty in Artificial Intelligence , year =

Boeken, Philip and Mooij, Joris , title =. Conference on Uncertainty in Artificial Intelligence , year =

work page
[18]

Teymur, Onur and Filippi, Sarah , journal =. A

work page
[19]

2026 , note =

Conditional Distributional Treatment Effects: Doubly Robust Estimation and Testing , author=. 2026 , note =

work page 2026
[20]

Smoothing noisy data with spline functions:

Craven, Peter and Wahba, Grace , journal =. Smoothing noisy data with spline functions:

work page
[21]

Biometrika , volume =

Singh, Rahul and Xu, Liyuan and Gretton, Arthur , title =. Biometrika , volume =

work page
[22]

Conditional mean embeddings as regressors , year =

Gr\". Conditional mean embeddings as regressors , year =

work page
[23]

A Measure-Theoretic Approach to Kernel Conditional Mean Embeddings , year =

Park, Junhyung and Muandet, Krikamol , booktitle =. A Measure-Theoretic Approach to Kernel Conditional Mean Embeddings , year =

work page
[24]

Klebanov, Ilja and Schuster, Ingmar and Sullivan, T. J. , journal =. A

work page
[25]

Evaluating

Huang, Ziyi and Lam, Henry and Zhang, Haofeng , note =. Evaluating

work page
[26]

2025 , booktitle =

Moskvichev, Peter and Sejdinovic, Dino , title =. 2025 , booktitle =

work page 2025
[27]

Optimal Rates for Regularized Conditional Mean Embedding Learning , year =

Li, Zhu and Meunier, Dimitri and Mollenhauer, Mattes and Gretton, Arthur , booktitle =. Optimal Rates for Regularized Conditional Mean Embedding Learning , year =

work page
[28]

Muandet, Krikamol and Fukumizu, Kenji and Sriperumbudur, Bharath and Sch. Kernel. 2017 , month = jun, journal =

work page 2017
[29]

Borgwardt and Malte J

Arthur Gretton and Karsten M. Borgwardt and Malte J. Rasch and Bernhard Sch. A Kernel Two-Sample Test , journal =. 2012 , volume =

work page 2012
[30]

Advances in Neural Information Processing Systems , title =

Fukumizu, Kenji and Gretton, Arthur and Sun, Xiaohai and Sch\". Advances in Neural Information Processing Systems , title =

work page
[31]

and Jordan, Michael I

Fukumizu, Kenji and Bach, Francis R. and Jordan, Michael I. , journal =. Dimensionality

work page
[32]

and Fukumizu, Kenji and Lanckriet, Gert R

Sriperumbudur, Bharath K. and Fukumizu, Kenji and Lanckriet, Gert R. G. , year =. Universality,. Journal of Machine Learning Research , volume =

work page
[33]

2009 , booktitle =

Song, Le and Huang, Jonathan and Smola, Alex and Fukumizu, Kenji , title =. 2009 , booktitle =

work page 2009
[34]

Conference on Artificial Intelligence and Statistics , year =

Nonparametric Tree Graphical Models , author =. Conference on Artificial Intelligence and Statistics , year =

work page
[35]

Kernel Embeddings of Conditional Distributions: A Unified Kernel Framework for Nonparametric Inference in Graphical Models , year=

Song, Le and Fukumizu, Kenji and Gretton, Arthur , journal=. Kernel Embeddings of Conditional Distributions: A Unified Kernel Framework for Nonparametric Inference in Graphical Models , year=

work page
[36]

Conditional Generative Moment-Matching Networks , year =

Ren, Yong and Zhu, Jun and Li, Jialian and Luo, Yucen , booktitle =. Conditional Generative Moment-Matching Networks , year =

work page
[37]

International Conference on Learning Representations , year=

Calibration tests beyond classification , author=. International Conference on Learning Representations , year=

work page
[38]

International Conference on Machine Learning , year =

Conditional Distributional Treatment Effect with Kernel Conditional Mean Embeddings and U-Statistic Regression , author =. International Conference on Machine Learning , year =

work page
[39]

and Deane, Charlotte M

Glaser, Pierre and Paul, Steffanie and Hummer, Alissa M. and Deane, Charlotte M. and Marks, Debora S. and Amin, Alan N. , title =. 2024 , booktitle =

work page 2024
[40]

Characteristic and Universal Tensor Product Kernels , journal =

Zolt. Characteristic and Universal Tensor Product Kernels , journal =. 2018 , volume =

work page 2018
[41]

2022 , note =

Regularised Least-Squares Regression with Infinite-Dimensional Output Space , author=. 2022 , note =

work page 2022
[42]

Journal of Machine Learning Research , pages =

Fine, Shai and Scheinberg, Katya , title =. Journal of Machine Learning Research , pages =. 2002 , volume =

work page 2002
[43]

and Rubin, Donald B

Imbens, Guido W. and Rubin, Donald B. , year=. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction , publisher=

work page
[44]

, journal =

Bang, Heejung and Robins, James M. , journal =. Doubly

work page
[45]

Horvitz and Donovan J

Daniel G. Horvitz and Donovan J. Thompson , journal =. A Generalization of Sampling Without Replacement from a Finite Universe , volume =

work page
[46]

2024 , journal=

Doubly Robust Kernel Statistics for Testing Distributional Treatment Effects , author=. 2024 , journal=

work page 2024
[47]

2024 , booktitle =

Shimizu, Eiki and Fukumizu, Kenji and Sejdinovic, Dino , title =. 2024 , booktitle =

work page 2024
[48]

Rosenbaum , journal =

Paul R. Rosenbaum , journal =. Conditional Permutation Tests and the Propensity Score in Observational Studies , volume =

work page
[49]

MNIST handwritten digit database , author=

work page
[50]

Advances in Neural Information Processing Systems , year =

Learning from Distributions via Support Measure Machines , author =. Advances in Neural Information Processing Systems , year =

work page
[51]

Advances in Neural Information Processing Systems , year=

Variational learning on aggregate outputs with Gaussian processes , author=. Advances in Neural Information Processing Systems , year=

work page
[52]

Advances in Neural Information Processing Systems , year=

Deconditional downscaling with gaussian processes , author=. Advances in Neural Information Processing Systems , year=

work page
[53]

Advances in Neural Information Processing Systems , year=

Bayesimp: Uncertainty quantification for causal data fusion , author=. Advances in Neural Information Processing Systems , year=

work page
[54]

Statistical inference for generative models with maximum mean discrepancy , author=

work page
[55]

Advances in Approximate Bayesian Inference , title =

Ch. Advances in Approximate Bayesian Inference , title =

work page
[56]

Journal of Machine Learning Research , volume=

Counterfactual mean embeddings , author=. Journal of Machine Learning Research , volume=

work page
[57]

, note =

Sejdinovic, D. , note =

work page
[58]

Advances in Neural Information Processing Systems , year=

RKHS-SHAP: Shapley values for kernel methods , author=. Advances in Neural Information Processing Systems , year=

work page
[59]

Advances in Neural Information Processing Systems , year=

Explaining the uncertain: Stochastic Shapley values for Gaussian process models , author=. Advances in Neural Information Processing Systems , year=

work page
[60]

Journal of Machine Learning Research , volume=

Learning theory for distribution regression , author=. Journal of Machine Learning Research , volume=

work page
[61]

Transactions of the American mathematical society , volume=

Theory of reproducing kernels , author=. Transactions of the American mathematical society , volume=

work page

[1] [1]

Judea Pearl , journal =

work page

[2] [2]

Improving predictive inference under covariate shift by weighting the log-likelihood function , volume =

Shimodaira, Hidetoshi , journal =. Improving predictive inference under covariate shift by weighting the log-likelihood function , volume =

work page

[3] [3]

Discriminative

Bickel, Steffen and Br. Discriminative. Journal of Machine Learning Research , number =

work page

[4] [4]

Machine Learning in Non-Stationary Environments: Introduction to Covariate Shift Adaptation , year =

Masashi Sugiyama and Motoaki Kawanabe , publisher =. Machine Learning in Non-Stationary Environments: Introduction to Covariate Shift Adaptation , year =

work page

[5] [5]

Wainwright , journal =

Cong Ma and Reese Pathak and Martin J. Wainwright , journal =

work page

[6] [6]

Calibration by

Marx, Charlie and Zalouk, Sofian and Ermon, Stefano , booktitle =. Calibration by

work page

[7] [7]

Generalized kernel two-sample tests , volume =

Song, Hoseung and Chen, Hao , journal =. Generalized kernel two-sample tests , volume =

work page

[8] [8]

Gretton, Arthur and Fukumizu, Kenji and Teo, Choon and Song, Le and Sch. A. Advances in

work page

[9] [9]

A Kernel Test of Goodness of Fit , year =

Chwialkowski, Kacper and Strathmann, Heiko and Gretton, Arthur , booktitle =. A Kernel Test of Goodness of Fit , year =

work page

[10] [10]

Composite

Key, Oscar and Gretton, Arthur and Briol, Fran. Composite. Journal of Machine Learning Research , number =

work page

[11] [11]

Smola, Alex and Gretton, Arthur and Song, Le and Sch. A. Algorithmic

work page

[12] [12]

Advances in Neural Information Processing Systems , title =

Massiani, Pierre-Fran. Advances in Neural Information Processing Systems , title =

work page

[13] [13]

Distance and

Yan, Jian and Li, Zhuoxi and Zhang, Xianyang , note =. Distance and

work page

[14] [14]

, note =

Chatterjee, Anirban and Niu, Ziang and Bhattacharya, Bhaswar B. , note =. A

work page

[15] [15]

Lee, Seongchan and Cha, Suman and Kim, Ilmun , note =. General

work page

[16] [16]

A Two-Sample Conditional Distribution Test Using Conformal Prediction and Weighted Rank Sum , volume =

Xiaoyu Hu and Jing Lei , journal =. A Two-Sample Conditional Distribution Test Using Conformal Prediction and Weighted Rank Sum , volume =

work page

[17] [17]

Conference on Uncertainty in Artificial Intelligence , year =

Boeken, Philip and Mooij, Joris , title =. Conference on Uncertainty in Artificial Intelligence , year =

work page

[18] [18]

Teymur, Onur and Filippi, Sarah , journal =. A

work page

[19] [19]

2026 , note =

Conditional Distributional Treatment Effects: Doubly Robust Estimation and Testing , author=. 2026 , note =

work page 2026

[20] [20]

Smoothing noisy data with spline functions:

Craven, Peter and Wahba, Grace , journal =. Smoothing noisy data with spline functions:

work page

[21] [21]

Biometrika , volume =

Singh, Rahul and Xu, Liyuan and Gretton, Arthur , title =. Biometrika , volume =

work page

[22] [22]

Conditional mean embeddings as regressors , year =

Gr\". Conditional mean embeddings as regressors , year =

work page

[23] [23]

A Measure-Theoretic Approach to Kernel Conditional Mean Embeddings , year =

Park, Junhyung and Muandet, Krikamol , booktitle =. A Measure-Theoretic Approach to Kernel Conditional Mean Embeddings , year =

work page

[24] [24]

Klebanov, Ilja and Schuster, Ingmar and Sullivan, T. J. , journal =. A

work page

[25] [25]

Evaluating

Huang, Ziyi and Lam, Henry and Zhang, Haofeng , note =. Evaluating

work page

[26] [26]

2025 , booktitle =

Moskvichev, Peter and Sejdinovic, Dino , title =. 2025 , booktitle =

work page 2025

[27] [27]

Optimal Rates for Regularized Conditional Mean Embedding Learning , year =

Li, Zhu and Meunier, Dimitri and Mollenhauer, Mattes and Gretton, Arthur , booktitle =. Optimal Rates for Regularized Conditional Mean Embedding Learning , year =

work page

[28] [28]

Muandet, Krikamol and Fukumizu, Kenji and Sriperumbudur, Bharath and Sch. Kernel. 2017 , month = jun, journal =

work page 2017

[29] [29]

Borgwardt and Malte J

Arthur Gretton and Karsten M. Borgwardt and Malte J. Rasch and Bernhard Sch. A Kernel Two-Sample Test , journal =. 2012 , volume =

work page 2012

[30] [30]

Advances in Neural Information Processing Systems , title =

Fukumizu, Kenji and Gretton, Arthur and Sun, Xiaohai and Sch\". Advances in Neural Information Processing Systems , title =

work page

[31] [31]

and Jordan, Michael I

Fukumizu, Kenji and Bach, Francis R. and Jordan, Michael I. , journal =. Dimensionality

work page

[32] [32]

and Fukumizu, Kenji and Lanckriet, Gert R

Sriperumbudur, Bharath K. and Fukumizu, Kenji and Lanckriet, Gert R. G. , year =. Universality,. Journal of Machine Learning Research , volume =

work page

[33] [33]

2009 , booktitle =

Song, Le and Huang, Jonathan and Smola, Alex and Fukumizu, Kenji , title =. 2009 , booktitle =

work page 2009

[34] [34]

Conference on Artificial Intelligence and Statistics , year =

Nonparametric Tree Graphical Models , author =. Conference on Artificial Intelligence and Statistics , year =

work page

[35] [35]

Kernel Embeddings of Conditional Distributions: A Unified Kernel Framework for Nonparametric Inference in Graphical Models , year=

Song, Le and Fukumizu, Kenji and Gretton, Arthur , journal=. Kernel Embeddings of Conditional Distributions: A Unified Kernel Framework for Nonparametric Inference in Graphical Models , year=

work page

[36] [36]

Conditional Generative Moment-Matching Networks , year =

Ren, Yong and Zhu, Jun and Li, Jialian and Luo, Yucen , booktitle =. Conditional Generative Moment-Matching Networks , year =

work page

[37] [37]

International Conference on Learning Representations , year=

Calibration tests beyond classification , author=. International Conference on Learning Representations , year=

work page

[38] [38]

International Conference on Machine Learning , year =

Conditional Distributional Treatment Effect with Kernel Conditional Mean Embeddings and U-Statistic Regression , author =. International Conference on Machine Learning , year =

work page

[39] [39]

and Deane, Charlotte M

Glaser, Pierre and Paul, Steffanie and Hummer, Alissa M. and Deane, Charlotte M. and Marks, Debora S. and Amin, Alan N. , title =. 2024 , booktitle =

work page 2024

[40] [40]

Characteristic and Universal Tensor Product Kernels , journal =

Zolt. Characteristic and Universal Tensor Product Kernels , journal =. 2018 , volume =

work page 2018

[41] [41]

2022 , note =

Regularised Least-Squares Regression with Infinite-Dimensional Output Space , author=. 2022 , note =

work page 2022

[42] [42]

Journal of Machine Learning Research , pages =

Fine, Shai and Scheinberg, Katya , title =. Journal of Machine Learning Research , pages =. 2002 , volume =

work page 2002

[43] [43]

and Rubin, Donald B

Imbens, Guido W. and Rubin, Donald B. , year=. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction , publisher=

work page

[44] [44]

, journal =

Bang, Heejung and Robins, James M. , journal =. Doubly

work page

[45] [45]

Horvitz and Donovan J

Daniel G. Horvitz and Donovan J. Thompson , journal =. A Generalization of Sampling Without Replacement from a Finite Universe , volume =

work page

[46] [46]

2024 , journal=

Doubly Robust Kernel Statistics for Testing Distributional Treatment Effects , author=. 2024 , journal=

work page 2024

[47] [47]

2024 , booktitle =

Shimizu, Eiki and Fukumizu, Kenji and Sejdinovic, Dino , title =. 2024 , booktitle =

work page 2024

[48] [48]

Rosenbaum , journal =

Paul R. Rosenbaum , journal =. Conditional Permutation Tests and the Propensity Score in Observational Studies , volume =

work page

[49] [49]

MNIST handwritten digit database , author=

work page

[50] [50]

Advances in Neural Information Processing Systems , year =

Learning from Distributions via Support Measure Machines , author =. Advances in Neural Information Processing Systems , year =

work page

[51] [51]

Advances in Neural Information Processing Systems , year=

Variational learning on aggregate outputs with Gaussian processes , author=. Advances in Neural Information Processing Systems , year=

work page

[52] [52]

Advances in Neural Information Processing Systems , year=

Deconditional downscaling with gaussian processes , author=. Advances in Neural Information Processing Systems , year=

work page

[53] [53]

Advances in Neural Information Processing Systems , year=

Bayesimp: Uncertainty quantification for causal data fusion , author=. Advances in Neural Information Processing Systems , year=

work page

[54] [54]

Statistical inference for generative models with maximum mean discrepancy , author=

work page

[55] [55]

Advances in Approximate Bayesian Inference , title =

Ch. Advances in Approximate Bayesian Inference , title =

work page

[56] [56]

Journal of Machine Learning Research , volume=

Counterfactual mean embeddings , author=. Journal of Machine Learning Research , volume=

work page

[57] [57]

, note =

Sejdinovic, D. , note =

work page

[58] [58]

Advances in Neural Information Processing Systems , year=

RKHS-SHAP: Shapley values for kernel methods , author=. Advances in Neural Information Processing Systems , year=

work page

[59] [59]

Advances in Neural Information Processing Systems , year=

Explaining the uncertain: Stochastic Shapley values for Gaussian process models , author=. Advances in Neural Information Processing Systems , year=

work page

[60] [60]

Journal of Machine Learning Research , volume=

Learning theory for distribution regression , author=. Journal of Machine Learning Research , volume=

work page

[61] [61]

Transactions of the American mathematical society , volume=

Theory of reproducing kernels , author=. Transactions of the American mathematical society , volume=

work page