Testing Equality of Conditional Distributions via Generative Models
Pith reviewed 2026-06-27 21:23 UTC · model grok-4.3
The pith
Cross-generating responses with conditional generative models constructs a test for equality of two conditional distributions that avoids density ratio estimation and local smoothing.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The population version of this construction yields a conditional discrepancy that characterizes equality of the two conditional distributions under suitable overlap conditions, while the sample version leads to a test statistic defined as the supremum of an RKHS-indexed empirical process with multiplier bootstrap calibration. The proposed procedure attains a double-robustness property with respect to conditional generator estimation errors.
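For orientation, a supremum over the unit ball of an RKHS has a well-known closed form. The display below is the generic identity for two distributions P and Q and a bounded kernel k, with Z, Z' drawn independently from P and W, W' drawn independently from Q. The symbols P, Q, Z, W, and \mu are illustrative assumptions; the paper's statistic indexes an empirical process built from cross-generated responses, so its exact form may differ.

```latex
\sup_{\|f\|_{\mathcal H}\le 1}\Bigl(\mathbb E_P f(Z)-\mathbb E_Q f(W)\Bigr)
  = \|\mu_P-\mu_Q\|_{\mathcal H},
\qquad
\|\mu_P-\mu_Q\|_{\mathcal H}^2
  = \mathbb E\,k(Z,Z') - 2\,\mathbb E\,k(Z,W) + \mathbb E\,k(W,W').
```

Here \mu_P and \mu_Q denote the kernel mean embeddings of P and Q. The right-hand expression involves only kernel evaluations, which is what makes such suprema computable from Gram matrices.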
What carries the argument
The cross-generation step that applies each sample's learned conditional generator to the covariate values observed in the other sample, producing comparable responses for direct comparison.
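The cross-generation step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the helper `fit_linear_gaussian_generator` is hypothetical and stands in for whatever conditional generative model is actually trained, and the simulated data, sample sizes, and coefficients are made up for the example.

```python
import numpy as np

def fit_linear_gaussian_generator(x, y):
    """Toy conditional generator (an assumption, not the paper's model):
    least-squares mean plus Gaussian noise with the residual standard deviation."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid_std = np.std(y - design @ coef, axis=0)

    def generate(x_new, rng):
        # Draw one synthetic response for every row of x_new.
        design_new = np.column_stack([np.ones(len(x_new)), x_new])
        mean = design_new @ coef
        return mean + resid_std * rng.standard_normal(mean.shape)

    return generate

rng = np.random.default_rng(0)
n1, n2, p = 300, 250, 3
beta = np.array([1.0, -0.5, 0.25])

# Two samples with shifted covariates but the same conditional law of y given x.
x1 = rng.normal(size=(n1, p))
x2 = rng.normal(loc=0.3, size=(n2, p))
y1 = x1 @ beta + rng.normal(size=n1)
y2 = x2 @ beta + rng.normal(size=n2)

g1 = fit_linear_gaussian_generator(x1, y1)
g2 = fit_linear_gaussian_generator(x2, y2)

# Cross-generation: each generator is evaluated at the OTHER sample's covariates.
y1_at_x2 = g1(x2, rng)  # comparable with the observed y2
y2_at_x1 = g2(x1, rng)  # comparable with the observed y1
```

After this step, `(x2, y1_at_x2)` and `(x2, y2)` share the same covariate values, so the responses can be compared without reweighting or local smoothing.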
If this is right
- The test statistic converges to a known limiting distribution under the null and diverges under the alternative.
- Multiplier bootstrap consistently approximates the null distribution of the test statistic.
- The test is consistent for detecting differences between the two conditional distributions.
- The double-robustness property ensures the test remains valid when either generator is estimated at a suitable rate.
- The procedure applies directly to multivariate responses without requiring dimension reduction.
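To make the multiplier-bootstrap calibration concrete, here is a simplified stand-in rather than the paper's statistic: a plain kernel two-sample statistic (the RKHS unit-ball supremum of a mean difference) calibrated with Gaussian multipliers. The function names, the Gaussian kernel, the bandwidth, and the scaling are all assumptions made for illustration, and the sketch ignores generator estimation error entirely.

```python
import numpy as np

def gaussian_gram(a, b, bandwidth):
    """Gaussian-kernel Gram matrix between the rows of a and the rows of b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd_multiplier_test(z1, z2, bandwidth=1.0, n_boot=500, rng=None):
    """Scaled kernel mean-difference statistic with a multiplier-bootstrap p-value."""
    rng = np.random.default_rng() if rng is None else rng
    n, m = len(z1), len(z2)
    pooled = np.vstack([z1, z2])
    gram = gaussian_gram(pooled, pooled, bandwidth)
    scale = np.sqrt(n * m / (n + m))

    # Kernel trick: the RKHS norm of a weighted sum of feature maps is sqrt(w' K w).
    weights = np.concatenate([np.full(n, 1.0 / n), np.full(m, -1.0 / m)])
    stat = scale * np.sqrt(max(weights @ gram @ weights, 0.0))

    boot = np.empty(n_boot)
    for b in range(n_boot):
        xi = rng.standard_normal(n)
        eta = rng.standard_normal(m)
        # Centered multipliers reweight each sample's centered feature maps.
        w_b = np.concatenate([(xi - xi.mean()) / n, -(eta - eta.mean()) / m])
        boot[b] = scale * np.sqrt(max(w_b @ gram @ w_b, 0.0))

    p_value = (1.0 + np.sum(boot >= stat)) / (1.0 + n_boot)
    return stat, p_value

rng = np.random.default_rng(1)
same = mmd_multiplier_test(rng.normal(size=(150, 2)), rng.normal(size=(150, 2)), rng=rng)
shifted = mmd_multiplier_test(rng.normal(size=(150, 2)), rng.normal(loc=1.0, size=(150, 2)), rng=rng)
```

In the shifted case the statistic should sit far in the tail of the bootstrap draws, giving a small p-value; in the matched case the p-value is a draw from an approximately uniform distribution.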
Where Pith is reading between the lines
- The same cross-generation idea could be used to test other conditional properties such as equality of conditional means or quantiles.
- The method might extend naturally to settings with censored or missing responses by modifying the generator training step.
- Because of double robustness, the approach could serve as a building block for semi-parametric inference on conditional distributions when nuisance generators are fitted with flexible machine learning tools.
Load-bearing premise
The two sets of covariates must overlap sufficiently in support so that responses generated from one set remain comparable to responses observed in the other.
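One common way to formalize a premise of this kind is a bounded density-ratio condition. The display below is a sketch under assumed notation (p_1 and p_2 for the covariate densities of the two samples, \mathcal X for their support, c for the overlap constant); it is not quoted from the paper, whose exact condition may be weaker or one-sided.

```latex
\operatorname{supp}(p_1) = \operatorname{supp}(p_2) = \mathcal X,
\qquad
c \;\le\; \frac{p_1(x)}{p_2(x)} \;\le\; \frac{1}{c}
\quad \text{for all } x \in \mathcal X \text{ and some } c \in (0, 1].
```

Under such a condition, every covariate region that carries mass in one sample also carries mass in the other, so a generator trained on one sample is never asked to extrapolate to regions it has not seen.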
What would settle it
A simulation study in which the two conditional distributions are known to be identical, generators are estimated consistently, and the test nevertheless rejects the null at a rate substantially above the nominal level.
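A size check of this kind can be organized as a small Monte Carlo harness. The sketch below is an assumption-laden illustration: `toy_p_value` is a hypothetical placeholder test that uses the true regression function and would be replaced by the procedure under study, and the data-generating process, sample size, and replication count are arbitrary choices.

```python
import math
import numpy as np

def toy_p_value(x1, y1, x2, y2):
    """Placeholder test (not the paper's): two-sided z-test on the residuals y - x."""
    r1, r2 = y1 - x1, y2 - x2
    se = math.sqrt(r1.var(ddof=1) / len(r1) + r2.var(ddof=1) / len(r2))
    z = (r1.mean() - r2.mean()) / se
    return math.erfc(abs(z) / math.sqrt(2.0))

def empirical_size(p_value_fn, n_rep=500, n=200, alpha=0.05, seed=0):
    """Fraction of null replications in which the test rejects at level alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_rep):
        # Covariate distributions differ, but y given x has the same law in both samples.
        x1 = rng.normal(0.0, 1.0, size=n)
        x2 = rng.normal(0.5, 1.0, size=n)
        y1 = x1 + rng.normal(size=n)
        y2 = x2 + rng.normal(size=n)
        rejections += p_value_fn(x1, y1, x2, y2) < alpha
    return rejections / n_rep

size = empirical_size(toy_p_value)
```

A rejection fraction that stays well above `alpha` as `n` and `n_rep` grow, beyond what Monte Carlo error can explain, would be the kind of evidence described above.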
Original abstract
We study the problem of testing whether two conditional distributions are equal using generative models. The proposed method learns a conditional generator from each sample and uses it to create responses at covariate values observed in the other sample, allowing generated and observed responses to be compared directly. By aligning covariates through cross-generation, the approach avoids conditional density-ratio estimation and local smoothing over high-dimensional covariates. The population version of this construction yields a conditional discrepancy that characterizes equality of the two conditional distributions under suitable overlap conditions, while the sample version leads to a test statistic defined as the supremum of an RKHS-indexed empirical process with multiplier bootstrap calibration. A computationally efficient algorithm for evaluating the statistic and its bootstrap analogue is developed based on alternating maximization and the kernel trick. Theoretically, we derive the limiting distribution of the test statistic under both the null and alternative hypotheses, prove bootstrap validity and consistency of the resulting test, and show that the proposed procedure attains a double-robustness property with respect to conditional generator estimation errors. Simulations and real data applications suggest that the proposed method performs well for multivariate responses and high-dimensional covariates.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a method for testing equality of two conditional distributions by learning conditional generators from each sample and cross-generating responses at the other's observed covariates. This yields a population conditional discrepancy that characterizes equality under overlap conditions, and a sample test statistic as the supremum of an RKHS-indexed empirical process calibrated by multiplier bootstrap. An efficient algorithm uses alternating maximization and the kernel trick. The manuscript derives limiting distributions under null and alternative, proves bootstrap validity and consistency, establishes double robustness to generator estimation errors, and reports favorable simulation and real-data performance for multivariate responses and high-dimensional covariates.
Significance. If the results hold, the work is significant for offering a smoothing-free and conditional density-ratio-free procedure for testing conditional distribution equality, with the double-robustness property providing practical flexibility in generator estimation. The RKHS supremum construction and bootstrap calibration are technically appealing strengths.
major comments (2)
- [Abstract / population construction] Abstract and population discrepancy section: the claim that the conditional discrepancy characterizes equality of the two conditional distributions is tied to 'suitable overlap conditions,' but the precise mathematical form of these conditions (e.g., the required common support or density lower bound between the two covariate distributions) is not stated explicitly; without this, it is impossible to verify necessity and sufficiency for the characterizing property.
- [Theoretical results on limiting distribution and double robustness] Double-robustness claim (theoretical results): the abstract asserts double robustness with respect to conditional generator estimation errors, yet the specific convergence rates required of the two generators (and how their errors interact) for the limiting distribution and bootstrap validity to remain valid are not detailed; this is load-bearing for the asymptotic guarantees.
minor comments (2)
- [Algorithm section] The description of the alternating maximization algorithm would benefit from explicit pseudocode or convergence criteria to aid reproducibility.
- [Simulations] Simulation section: the reported settings for high-dimensional covariates should include the dimension values and sample sizes explicitly in a table for clarity.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive report. We address the two major comments below and will revise the manuscript accordingly to improve clarity on the stated points.
- Referee: [Abstract / population construction] Abstract and population discrepancy section: the claim that the conditional discrepancy characterizes equality of the two conditional distributions is tied to 'suitable overlap conditions,' but the precise mathematical form of these conditions (e.g., the required common support or density lower bound between the two covariate distributions) is not stated explicitly; without this, it is impossible to verify necessity and sufficiency for the characterizing property.
  Authors: We agree that the overlap conditions require an explicit statement. The population discrepancy section will be revised to include the precise conditions: the two covariate distributions must share common support with a uniform positive lower bound on the density ratio (or equivalent overlap measure) to ensure the characterizing property holds with necessity and sufficiency. This will be added to the main text and referenced in the abstract.
  Revision: yes
- Referee: [Theoretical results on limiting distribution and double robustness] Double-robustness claim (theoretical results): the abstract asserts double robustness with respect to conditional generator estimation errors, yet the specific convergence rates required of the two generators (and how their errors interact) for the limiting distribution and bootstrap validity to remain valid are not detailed; this is load-bearing for the asymptotic guarantees.
  Authors: The double-robustness result is established in the theoretical section by showing that the cross-generation error terms vanish in the limit under product-rate conditions on the two generator estimators. However, the referee is correct that the abstract does not detail these rates. We will revise the abstract to briefly state the required rates (e.g., each generator error o_p(n^{-1/4}) with their product o_p(n^{-1/2})) and add a pointer to the relevant theorem for the interaction of the errors.
  Revision: yes
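The arithmetic behind the example rates in this response can be spelled out. The display below is a sketch under assumed notation: \hat g_1, \hat g_2 are the estimated generators, g_1, g_2 their targets, and \|\cdot\| an unspecified error norm. None of these symbols are taken from the paper.

```latex
\|\hat g_1 - g_1\| = o_p(n^{-1/4}), \quad
\|\hat g_2 - g_2\| = o_p(n^{-1/4})
\;\Longrightarrow\;
\sqrt{n}\,\|\hat g_1 - g_1\|\,\|\hat g_2 - g_2\|
  = \sqrt{n}\cdot o_p(n^{-1/2}) = o_p(1).
```

A product-rate condition also allows unequal rates. For instance, errors of order O_p(n^{-1/3}) and O_p(n^{-1/5}) have product O_p(n^{-8/15}), which is o_p(n^{-1/2}) because 8/15 > 1/2, even though the second error alone is slower than n^{-1/4}.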
Circularity Check
No circularity detected; derivation is self-contained.
The population conditional discrepancy is constructed directly from cross-generation of responses using the two conditional generators; its characterizing property (zero iff conditionals equal, under overlap) follows from the definition of the discrepancy measure rather than from any fitted parameter or self-citation. The sample statistic is the sup of an RKHS-indexed empirical process with multiplier bootstrap; limiting distribution, validity, consistency, and double-robustness are derived from standard empirical-process arguments once generators are treated as fixed. No load-bearing self-citation, no fitted input renamed as prediction, and no ansatz smuggled via prior work appear in the abstract or described chain. The overlap condition is an explicit assumption required for the characterization, not a hidden definitional loop.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math: Standard results on empirical processes and multiplier bootstrap in RKHS hold for the constructed statistic.
- domain assumption: Suitable overlap conditions on the covariate distributions hold.