General Frameworks for Conditional Two-Sample Testing

Ilmun Kim; Seongchan Lee; Suman Cha

arXiv: 2410.16636 · v2 · submitted 2024-10-22 · stat.ML · cs.LG · math.ST · stat.TH

Pith reviewed 2026-05-23 19:18 UTC · model grok-4.3

classification stat.ML · cs.LG · math.ST · stat.TH

keywords conditional two-sample testing · conditional independence testing · density ratio estimation · hardness result · black-box conversion · domain adaptation · algorithmic fairness

The pith

Two frameworks make conditional two-sample testing feasible for targeted distribution classes: one converts conditional independence tests into conditional two-sample tests, and the other reduces the problem to marginal two-sample testing via density ratios.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper first proves a hardness result: without assumptions on the distributions, no valid test can achieve significant power against any single alternative in conditional two-sample testing. It then presents two frameworks that achieve validity and power by targeting specific distribution classes. The first converts any conditional independence test into a conditional two-sample test through a black-box procedure that preserves the original test's asymptotic properties. The second reduces the conditional problem to comparing marginal distributions after estimating density ratios, which permits direct use of existing marginal two-sample methods such as classification-based or kernel-based approaches. These constructions matter for applications like domain adaptation and algorithmic fairness, where one must compare groups while controlling for confounders, and the paper illustrates their finite-sample behavior through simulations.

Core claim

Conditional two-sample testing is hard in general because no valid test can have significant power against any single alternative without assumptions. Under targeted classes of distributions, two frameworks succeed: the first converts any conditional independence test into a conditional two-sample test in a black-box manner while preserving asymptotic properties; the second transforms the task into marginal distribution comparison using estimated density ratios and demonstrates this with classification and kernel methods. Simulation studies confirm the frameworks' behavior in finite samples.

What carries the argument

Black-box conversion from any conditional independence test to a conditional two-sample test that preserves asymptotics, together with density ratio estimation that reduces the conditional problem to marginal two-sample testing.
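As a rough illustration of the black-box conversion, the sketch below pools the two samples, attaches a binary group label, and hands the pooled data to a conditional independence test. Everything named here is an assumption made for illustration: the function names, the scalar confounder, the linear regressions, and the choice of a simplified residual-product statistic (in the style of the generalised covariance measure) as the plugged-in CI test. It is not the paper's algorithm, and it omits any additional steps the paper may use when forming the pooled sample.

```python
import math
import random


def ols_residuals(y, z):
    """Residuals from a simple linear regression of y on z with intercept."""
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    szy = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    slope = szy / szz
    return [yi - my - slope * (zi - mz) for zi, yi in zip(z, y)]


def gcm_style_ci_pvalue(x, a, z):
    """Hypothetical CI test: two-sided normal p-value for X indep. of A given Z,
    based on the normalized mean of residual products (linear fits assumed)."""
    rx, ra = ols_residuals(x, z), ols_residuals(a, z)
    prod = [u * v for u, v in zip(rx, ra)]
    n = len(prod)
    mean = sum(prod) / n
    var = sum((p - mean) ** 2 for p in prod) / n
    stat = math.sqrt(n) * mean / math.sqrt(var)
    return math.erfc(abs(stat) / math.sqrt(2))


def conditional_two_sample_pvalue(x1, z1, x2, z2, ci_test=gcm_style_ci_pvalue):
    """Pool both groups, label them with a binary A, and call any CI test."""
    x = list(x1) + list(x2)
    z = list(z1) + list(z2)
    a = [0.0] * len(x1) + [1.0] * len(x2)
    return ci_test(x, a, z)


def simulate(shift, n=400, seed=0):
    """Toy data: X = Z + noise in group 1, X = Z + shift + noise in group 2."""
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    x1 = [zi + rng.gauss(0, 1) for zi in z1]
    x2 = [zi + shift + rng.gauss(0, 1) for zi in z2]
    return x1, z1, x2, z2


p_null = conditional_two_sample_pvalue(*simulate(shift=0.0))
p_alt = conditional_two_sample_pvalue(*simulate(shift=2.0))
print("p-value with equal conditionals:", p_null)
print("p-value with shifted conditional:", p_alt)
```

Any other CI test with the same `(x, a, z)` signature could be passed as `ci_test`; the wrapper never looks inside it, which is the sense in which the conversion is black-box.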

If this is right

Any existing conditional independence test can be repurposed directly for conditional two-sample testing while retaining its theoretical guarantees.
Standard marginal two-sample testing procedures become applicable once density ratios are estimated from data.
The methods inherit the validity and power properties of the base tests within the targeted distribution classes.
Simulation studies can be used to check finite-sample performance of the converted or reduced tests.
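The density-ratio reduction can be illustrated with a toy reweighting check. The sketch below is an assumption-laden example, not the paper's procedure: it uses a known Gaussian density ratio for the confounder instead of an estimated one, compares only a single functional (the mean of the response), and all names (`true_ratio`, `weighted_mean`, `simulate_group`) are hypothetical. Under equal conditional distributions, reweighting the second group by the confounder density ratio should bring its weighted mean close to the first group's mean, whereas the unweighted mean stays off because the confounder distributions differ.

```python
import math
import random


def true_ratio(z, mu1=0.0, mu2=0.5):
    """Density ratio N(mu1, 1) / N(mu2, 1) at z.
    Known in closed form here only because the data are simulated."""
    return math.exp(-0.5 * (z - mu1) ** 2 + 0.5 * (z - mu2) ** 2)


def weighted_mean(values, weights):
    """Self-normalized importance-weighted mean."""
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


def simulate_group(mu, n, rng):
    """Confounder Z ~ N(mu, 1); response X = Z + noise (same rule in both groups)."""
    z = [rng.gauss(mu, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    return x, z


rng = random.Random(1)
x1, z1 = simulate_group(0.0, 5000, rng)  # group 1
x2, z2 = simulate_group(0.5, 5000, rng)  # group 2, shifted confounder

weights = [true_ratio(zi) for zi in z2]
mean1 = sum(x1) / len(x1)
naive_gap = abs(sum(x2) / len(x2) - mean1)
reweighted_gap = abs(weighted_mean(x2, weights) - mean1)

print("gap without reweighting:", naive_gap)
print("gap after reweighting:  ", reweighted_gap)
```

In practice the ratio would have to be estimated, and the comparison would use a full marginal two-sample test (for example a classification-based or kernel-based one) instead of a difference in means.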

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The frameworks could simplify testing for group differences in fairness settings by allowing reuse of conditional independence tools to control for confounders.
If density ratio estimation error grows with dimension, the second framework's power may degrade faster than the first in high-dimensional data.
The same reduction ideas might apply to other conditional testing problems by swapping the base test or the marginal comparator.

Load-bearing premise

The frameworks must target specific classes of distributions to achieve both validity and power.

What would settle it

A concrete counterexample in which the black-box conversion fails to preserve the type I error control or asymptotic power of the original conditional independence test would disprove the first framework.

Figures

Figures reproduced from arXiv: 2410.16636 by Ilmun Kim, Seongchan Lee, Suman Cha.

**Figure 2.** Rejection rates for Scenario 2 under null and alternative hypotheses, shown for both unbounded …

**Figure 3.** Rejection rates for Scenario 3 under null and alternative hypotheses, shown for both unbounded …

**Figure 4.** Performance comparison of DRT methods on diamonds and superconductivity datasets using LL …

**Figure 5.** Log-scaled mean squared errors of marginal density ratio …

**Figure 6.** Rejection rates of CIT methods on the diamonds and superconductivity datasets under null and …

Original abstract

We study the problem of conditional two-sample testing, which aims to determine whether two populations have the same distribution after accounting for confounding factors. This problem commonly arises in various applications, such as domain adaptation and algorithmic fairness, where comparing two groups is essential while controlling for confounding variables. We begin by establishing a hardness result for conditional two-sample testing, demonstrating that no valid test can have significant power against any single alternative without proper assumptions. We then introduce two general frameworks that implicitly or explicitly target specific classes of distributions for their validity and power. Our first framework allows us to convert any conditional independence test into a conditional two-sample test in a black-box manner, while preserving the asymptotic properties of the original conditional independence test. The second framework transforms the problem into comparing marginal distributions with estimated density ratios, which allows us to leverage existing methods for marginal two-sample testing. We demonstrate this idea in a concrete manner with classification and kernel-based methods. Finally, simulation studies are conducted to illustrate the proposed frameworks in finite-sample scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper reduces conditional two-sample testing to existing CI or marginal tests via two clean frameworks after a hardness result, with the black-box conversion as the main practical addition.

The main thing here is the pair of reductions. The first turns any conditional independence test into a conditional two-sample test in a black-box way while keeping the original asymptotics. The second estimates density ratios so the problem becomes a standard marginal two-sample test. Both are presented after a clear hardness result showing that power requires targeting specific distribution classes. That framing is new and directly useful for applications like fairness auditing and domain adaptation. The concrete examples with classification and kernel methods, plus the finite-sample simulations, make the ideas easy to try out. The derivations appear standard and the reductions avoid obvious circularity or self-reference. One soft spot is that the density-ratio route will inherit whatever error comes from ratio estimation, especially in higher dimensions, though the paper limits its claims to the targeted classes. The citation pattern is appropriate and does not lean on unverified self-references. This work is for statisticians and ML researchers who already use CI or two-sample tests and need to handle confounders. It is solid enough on the methods to merit a serious referee who can check the power calculations and finite-sample behavior in detail.

Referee Report

0 major / 4 minor

Summary. The manuscript establishes a hardness result showing that nonparametric conditional two-sample testing admits no valid test with nontrivial power against a fixed alternative without distributional assumptions. It then presents two frameworks that target specific distribution classes. The first reduces the conditional two-sample problem to conditional independence testing (X ⊥ A | Z, with A a binary group indicator) via a direct equivalence, so that any valid CI test can be applied in black-box fashion while inheriting its asymptotics. The second re-expresses the problem as a marginal two-sample test after estimating density ratios. The frameworks are instantiated with classification and kernel methods and assessed via finite-sample simulations.
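The equivalence mentioned in the summary can be written out as follows. The labels 0 and 1 for the two groups are a notational assumption made here, and the statement is restricted to values of z at which both groups have positive probability; the paper's own formulation may differ in these details.

```latex
H_0:\; P_{X \mid Z = z,\, A = 0} \;=\; P_{X \mid Z = z,\, A = 1}
\quad \text{for almost every } z
\qquad\Longleftrightarrow\qquad
X \perp\!\!\!\perp A \mid Z .
```

Read from right to left, conditional independence says that the law of X given (Z, A) does not depend on A, which is exactly the statement that the two groups share the same conditional distribution of X given Z.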

Significance. If the claimed equivalence and asymptotic preservation hold, the work supplies a general, modular route to conditional two-sample testing that reuses existing CI and two-sample procedures. This is relevant to domain adaptation and fairness applications. The explicit alignment with the hardness result (targeting restricted distribution classes) and the black-box character of the first framework are constructive features.

minor comments (4)

§2 (hardness result): the alternative class against which power is precluded should be stated precisely in a formal theorem rather than described at a high level, so that the necessity of the subsequent assumptions is fully transparent.
§4.1 (first framework): the reduction step that maps the conditional two-sample null to a CI null should state explicitly the measure-theoretic conditions under which the equivalence is measure-preserving, even if these are standard.
§5 (density-ratio framework): the error propagation from density-ratio estimation into the marginal test statistic is only sketched; a short lemma bounding the additional bias term would strengthen the asymptotic claim.
Simulations: the reported power curves would benefit from an additional panel showing type-I error under the null for each method, to confirm that the black-box conversion does not inflate size in finite samples.
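To make the §5 comment on error propagation concrete, here is one way the extra term from ratio estimation can be isolated. This is an illustrative decomposition under assumptions chosen here, not a lemma from the paper: groups are labelled 0 and 1, r is the ratio of the confounder densities of group 0 to group 1, \hat r is an estimate treated as fixed (for instance, fitted on separate data), h is a bounded test function, and (X_i, Z_i), i = 1, ..., n, are drawn from group 1. Under the null, E_1[r(Z) h(X, Z)] = E_0[h(X, Z)].

```latex
\frac{1}{n}\sum_{i=1}^{n} \hat r(Z_i)\, h(X_i, Z_i) - \mathbb{E}_0[h(X, Z)]
= \underbrace{\frac{1}{n}\sum_{i=1}^{n} r(Z_i)\, h(X_i, Z_i) - \mathbb{E}_1[r(Z)\, h(X, Z)]}_{\text{sampling fluctuation}}
+ \underbrace{\frac{1}{n}\sum_{i=1}^{n} \bigl(\hat r(Z_i) - r(Z_i)\bigr)\, h(X_i, Z_i)}_{\text{ratio-estimation term}},
\qquad
\left| \frac{1}{n}\sum_{i=1}^{n} \bigl(\hat r(Z_i) - r(Z_i)\bigr)\, h(X_i, Z_i) \right|
\le \sqrt{\frac{1}{n}\sum_{i=1}^{n} \bigl(\hat r(Z_i) - r(Z_i)\bigr)^{2}}\;
    \sqrt{\frac{1}{n}\sum_{i=1}^{n} h(X_i, Z_i)^{2}} .
```

The bound is just the Cauchy-Schwarz inequality, so the additional term is controlled by the empirical mean squared error of the ratio estimate, which is the quantity a lemma of the requested kind would need to bound.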

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our work, the recognition of its relevance to domain adaptation and fairness, and the recommendation of minor revision. The report correctly captures the hardness result, the two frameworks, and their modular character. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper establishes a hardness result showing that nonparametric conditional two-sample testing requires assumptions for nontrivial power, then presents two frameworks: one that converts any conditional independence test to a conditional two-sample test via black-box equivalence while preserving asymptotics, and a second that re-expresses the problem using density-ratio reweighting to reduce to marginal two-sample testing. These are explicit methodological reductions based on distributional equivalences (X ⊥ A | Z with A binary), not self-definitions, fitted parameters renamed as predictions, or self-citation chains. No load-bearing ansatz, uniqueness theorem from the same authors, or renaming of known results is present; the central claims remain independent of the paper's own fitted quantities or prior self-references.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard regularity conditions for asymptotic validity of the converted tests and on the feasibility of density ratio estimation; no new free parameters or invented entities are mentioned.

axioms (1)

Domain assumption: regularity conditions sufficient for the asymptotic properties of conditional independence and marginal two-sample tests to carry over. Invoked when the paper states that asymptotic properties are preserved.

Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages · 1 internal anchor

[1]

Andrews, D. W. K. (1997). A Conditional Kolmogorov Test . Econometrica , 65(5):1097--1128

work page 1997
[2]

Barocas, S., Hardt, M., and Narayanan, A. (2023). Fairness and Machine Learning: Limitations and Opportunities . The MIT Press

work page 2023
[3]

B., Wang, Y., Barber, R

Berrett, T. B., Wang, Y., Barber, R. F., and Samworth, R. J. (2020). The conditional permutation test for independence while controlling for confounders. Journal of the Royal Statistical Society Series B: Statistical Methodology , 82(1):175--197

work page 2020
[4]

Boeken, P. A. and Mooij, J. M. (2021). A bayesian nonparametric conditional two-sample test with an application to local causal discovery. In de Campos, C. and Maathuis, M. H., editors, Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence , volume 161 of Proceedings of Machine Learning Research , pages 1565--1575. PMLR

work page 2021
[5]

Candes, E., Fan, Y., Janson, L., and Lv, J. (2018). Panning for gold: ‘Model-X’ knockoffs for high dimensional controlled variable selection. Journal of the Royal Statistical Society Series B: Statistical Methodology , 80(3):551--577

work page 2018
[6]

Chakraborty, A., Zhang, J., and Katsevich, E. (2024). Doubly robust and computationally efficient high-dimensional variable selection. arXiv preprint arXiv:2409.09512

work page arXiv 2024
[7]

Chatterjee, A., Niu, Z., and Bhattacharya, B. B. (2024). A kernel-based conditional two-sample test using nearest neighbors (with applications to calibration, regression curves, and simulation-based inference). arXiv preprint arXiv:2407.16550

work page arXiv 2024
[8]

and Lei, J

Chen, Y. and Lei, J. (2024). De-Biased Two-Sample U-Statistics With Application To Conditional Distribution Testing . arXiv preprint arXiv:2402.00164

work page arXiv 2024
[9]

Choi, K., Liao, M., and Ermon, S. (2021). Featurized density ratio estimation. In Uncertainty in Artificial Intelligence , pages 172--182

work page 2021
[10]

Choi, K., Meng, C., Song, Y., and Ermon, S. (2022). Density ratio estimation via infinitesimal classification. In International Conference on Artificial Intelligence and Statistics , pages 2552--2573

work page 2022
[11]

and Romano, J

Chung, E. and Romano, J. P. (2013). Exact and asymptotically robust permutation tests. The Annals of Statistics , 41(2):484--507

work page 2013
[12]

Dai, B., Shen, X., and Pan, W. (2022). Significance tests of feature relevance for a black-box learner. IEEE transactions on neural networks and learning systems , 35(2):1898--1911

work page 2022
[13]

Doran, G., Muandet, K., Zhang, K., and Sch\" o lkopf, B. (2014). A permutation-based kernel conditional independence test. In Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence , UAI'14, page 132–141, Arlington, Virginia, USA. AUAI Press

work page 2014
[14]

and Lin, S.-K

Fan, J. and Lin, S.-K. (1998). Test of significance when data are curves. Journal of the American Statistical Association , 93(443):1007--1021

work page 1998
[15]

Fan, Y., Li, Q., and Min, I. (2006). A nonparametric bootstrap test of conditional distributions. Econometric Theory , 22(4):587--613

work page 2006
[16]

Fukumizu, K., Gretton, A., Sun, X., and Sch \"o lkopf, B. (2007). Kernel measures of conditional dependence. Advances in Neural Information Processing Systems , 20:489–496

work page 2007
[17]

Givens, G. H. and Hoeting, J. A. (2012). Computational statistics . John Wiley & Sons, Hoboken, NJ, USA, 2 edition

work page 2012
[18]

M., Rasch, M

Gretton, A., Borgwardt, K. M., Rasch, M. J., Sch \"o lkopf, B., and Smola, A. (2012). A kernel two-sample test. Journal of Machine Learning Research , 13(25):723--773

work page 2012
[19]

and Hart, J

Hall, P. and Hart, J. D. (1990). Bootstrap test for difference between means in nonparametric regression. Journal of the American Statistical Association , 85(412):1039--1049

work page 1990
[20]

Hamidieh, K. (2018). A data-driven statistical model for predicting the critical temperature of a superconductor. Computational Materials Science , 154:346–354

work page 2018
[21]

Hardt, M., Price, E., Price, E., and Srebro, N. (2016). Equality of opportunity in supervised learning. In Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., and Garnett, R., editors, Advances in Neural Information Processing Systems , volume 29, page 3323–3331. Curran Associates, Inc

work page 2016
[22]

Hediger, S., Michel, L., and N \"a f, J. (2022). On the use of random forest for two-sample testing. Computational Statistics & Data Analysis , 170:107435

work page 2022
[23]

and Lei, J

Hu, X. and Lei, J. (2024). A two-sample conditional distribution test using conformal prediction and weighted rank sum. Journal of the American Statistical Association , 119(546):1136--1154

work page 2024
[24]

Kanamori, T., Hido, S., and Sugiyama, M. (2009). A least-squares approach to direct importance estimation. The Journal of Machine Learning Research , 10:1391--1445

work page 2009
[25]

Kanamori, T., Suzuki, T., and Sugiyama, M. (2010). Theoretical analysis of density ratio estimation. IEICE transactions on fundamentals of electronics, communications and computer sciences , 93(4):787--798

work page 2010
[26]

Kim, I., Balakrishnan, S., and Wasserman, L. (2022a). Minimax optimality of permutation tests. The Annals of Statistics , 50(1):225--251

work page
[27]

B., and Lei, J

Kim, I., Lee, A. B., and Lei, J. (2019). Global and local two-sample tests via regression. Electronic Journal of Statistics , 13(2):5253--5305

work page 2019
[28]

Kim, I., Neykov, M., Balakrishnan, S., and Wasserman, L. (2022b). Local permutation tests for conditional independence. The Annals of Statistics , 50(6):3388--3414

work page
[29]

Kim, I., Neykov, M., Balakrishnan, S., and Wasserman, L. (2023). Conditional Independence Testing for Discrete Distributions: Beyond ^2 -and G -tests . arXiv preprint arXiv:2308.05373

work page arXiv 2023
[30]

Kim, I., Ramdas, A., Singh, A., and Wasserman, L. (2021). Classification accuracy as a proxy for two-sample testing. The Annals of Statistics , 49(1):411--434

work page 2021
[31]

and Hino, H

Kimura, M. and Hino, H. (2024). A short survey on importance weighting for machine learning. arXiv preprint arXiv:2403.10175

work page arXiv 2024
[32]

Kulasekera, K. (1995). Comparison of regression curves using quasi-residuals. Journal of the American Statistical Association , 90(431):1085--1093

work page 1995
[33]

and Wang, J

Kulasekera, K. and Wang, J. (1997). Smoothing parameter selection for power optimality in testing of regression curves. Journal of the American Statistical Association , 92(438):500--511

work page 1997
[34]

Li, S., Zhang, Y., Zhu, H., Wang, C., Shu, H., Chen, Z., Sun, Z., and Yang, Y. (2023). K-nearest-neighbor local sampling based conditional independence testing. Advances in Neural Information Processing Systems , 36:23321--23344

work page 2023
[35]

Liu, F., Xu, W., Lu, J., Zhang, G., Gretton, A., and Sutherland, D. J. (2020). Learning deep kernels for non-parametric two-sample tests. In International Conference on Machine Learning , pages 6316--6326

work page 2020
[36]

Liu, M., Katsevich, E., Janson, L., and Ramdas, A. (2022). Fast and powerful conditional randomization testing via distillation. Biometrika , 109(2):277--293

work page 2022
[37]

Liu, S., Takeda, A., Suzuki, T., and Fukumizu, K. (2017). Trimmed density ratio estimation. Advances in Neural Information Processing Systems , 30:4521–4531

work page 2017
[38]

and Oquab, M

Lopez-Paz, D. and Oquab, M. (2017). Revisiting Classifier Two-Sample Tests . In International Conference on Learning Representations

work page 2017
[39]

R., Kim, I., Shah, R

Lundborg, A. R., Kim, I., Shah, R. D., and Samworth, R. J. (2022). The Projected Covariance Measure for assumption-lean variable significance testing . arXiv preprint arXiv:2211.02039 (accepted to the Annals of Statistics)

work page arXiv 2022
[40]

Mulzer, W. (2018). Five proofs of Chernoff's bound with applications . arXiv preprint arXiv:1801.03365

work page internal anchor Pith review Pith/arXiv arXiv 2018
[41]

and Dette, H

Neumeyer, N. and Dette, H. (2003). Nonparametric comparison of regression curves: an empirical process approach. The Annals of Statistics , 31(3):880--920

work page 2003
[42]

Neykov, M., Balakrishnan, S., and Wasserman, L. (2021). Minimax optimal conditional independence testing. The Annals of Statistics , 49(4):2151--2177

work page 2021
[43]

Neykov, M., Wasserman, L., Kim, I., and Balakrishnan, S. (2023). Nearly Minimax Optimal Wasserstein Conditional Independence Testing . arXiv preprint arXiv:2308.08672

work page arXiv 2023
[44]

C., Jiménez-Gamero, M

Pardo-Fernández, J. C., Jiménez-Gamero, M. D., and El Ghouch, A. (2015). Tests for the equality of conditional variance functions in nonparametric regression. Electronic Journal of Statistics , 9(2)

work page 2015
[45]

J., and Gretton, A

Pogodin, R., Schrab, A., Li, Y., Sutherland, D. J., and Gretton, A. (2024). Practical Kernel Tests of Conditional Independence . arXiv preprint arXiv:2402.13196

work page arXiv 2024
[46]

Rhodes, B., Xu, K., and Gutmann, M. U. (2020). Telescoping density-ratio estimation. Advances in Neural Information Processing Systems , 33:4905--4916

work page 2020
[47]

o rrmann, J., and B \

Scheidegger, C., H \"o rrmann, J., and B \"u hlmann, P. (2022). The weighted generalised covariance measure. Journal of Machine Learning Research , 23(273):1--68

work page 2022
[48]

Schrab, A., Kim, I., Albert, M., Laurent, B., Guedj, B., and Gretton, A. (2023). MMD aggregated two-sample test . Journal of Machine Learning Research , 24(194):1--81

work page 2023
[49]

Shah, R. D. and Peters, J. (2020). The hardness of conditional independence testing and the generalised covariance measure. The Annals of Statistics , 48(3):1514--1538

work page 2020
[50]

Shimodaira, H. (2000). Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference , 90(2):227--244

work page 2000
[51]

V., Zhang, K., and Visweswaran, S

Strobl, E. V., Zhang, K., and Visweswaran, S. (2019). Approximate kernel-based conditional independence tests for fast non-parametric causal discovery. Journal of Causal Inference , 7(1):20180017

work page 2019
[52]

Sugiyama, M., Krauledat, M., and M \"u ller, K.-R. (2007a). Covariate shift adaptation by importance weighted cross validation. Journal of Machine Learning Research , 8(35):985--1005

work page
[53]

Sugiyama, M., Nakajima, S., Kashima, H., Buenau, P., and Kawanabe, M. (2007b). Direct importance estimation with model selection and its application to covariate shift adaptation. Advances in Neural Information Processing Systems , 20

work page
[54]

Sugiyama, M., Suzuki, T., and Kanamori, T. (2010). Density ratio estimation: A comprehensive review. RIMS Kokyuroku , pages 10--31

work page 2010
[55]

Sugiyama, M., Suzuki, T., and Kanamori, T. (2012). Density Ratio Estimation in Machine Learning . Cambridge University Press

work page 2012
[56]

Tansey, W., Veitch, V., Zhang, H., Rabadan, R., and Blei, D. M. (2022). The holdout randomization test for feature selection in black box models. Journal of Computational and Graphical Statistics , 31(1):151--162

work page 2022
[57]

Tsuboi, Y., Kashima, H., Hido, S., Bickel, S., and Sugiyama, M. (2009). Direct density ratio estimation for large-scale covariate shift adaptation. Journal of Information Processing , 17:138--155

work page 2009
[58]

J., VonHandorf, A., Viel, K

Virolainen, S. J., VonHandorf, A., Viel, K. C. M. F., Weirauch, M. T., and Kottyan, L. C. (2022). Gene–environment interactions and their impact on human health. Genes & Immunity , 24(1):1–11

work page 2022
[59]

D., Gilbert, P

Williamson, B. D., Gilbert, P. B., Simon, N. R., and Carone, M. (2023). A general framework for inference on algorithm-agnostic variable importance. Journal of the American Statistical Association , 118(543):1645--1658

work page 2023
[60]

M., and Baccarelli, A

Wu, H., Eckhardt, C. M., and Baccarelli, A. A. (2023). Molecular mechanisms of environmental exposures and human disease. Nature Reviews Genetics , 24(5):332–344

work page 2023
[61]

and Zhang, X

Yan, J. and Zhang, X. (2022). A nonparametric two-sample conditional distribution test. arXiv preprint arXiv:2210.08149

work page arXiv 2022
[62]

Zaremba, W., Gretton, A., and Blaschko, M. (2013). B-test: A non-parametric, low variance kernel two-sample test. Advances in Neural Information Processing Systems , 26

work page 2013
[63]

Zhang, K., Huang, B., Zhang, J., Glymour, C., and Sch \"o lkopf, B. (2017). Causal discovery from nonstationary/heterogeneous data: skeleton estimation and orientation determination. Proceedings of the 26th International Joint Conference on Artificial Intelligence , pages 1347--1353

work page 2017
[64]

Zhang, K., Peters, J., Janzing, D., and Sch\" o lkopf, B. (2011). Kernel-Based Conditional Independence Test and Application in Causal Discovery . In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence , UAI'11, pages 804--813, Arlington, Virginia, USA. AUAI Press

work page 2011
[65]

Zheng, J. X. (2000). A consistent test of conditional parametric distributions. Econometric Theory , 16(5):667--691

work page 2000
[66]

and Hastie, T

Zhu, J. and Hastie, T. (2005). Kernel logistic regression and the import vector machine. Journal of Computational and Graphical Statistics , 14(1):185--205

work page 2005

[1] [1]

Andrews, D. W. K. (1997). A Conditional Kolmogorov Test . Econometrica , 65(5):1097--1128

work page 1997

[2] [2]

Barocas, S., Hardt, M., and Narayanan, A. (2023). Fairness and Machine Learning: Limitations and Opportunities . The MIT Press

work page 2023

[3] [3]

B., Wang, Y., Barber, R

Berrett, T. B., Wang, Y., Barber, R. F., and Samworth, R. J. (2020). The conditional permutation test for independence while controlling for confounders. Journal of the Royal Statistical Society Series B: Statistical Methodology , 82(1):175--197

work page 2020

[4] [4]

Boeken, P. A. and Mooij, J. M. (2021). A bayesian nonparametric conditional two-sample test with an application to local causal discovery. In de Campos, C. and Maathuis, M. H., editors, Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence , volume 161 of Proceedings of Machine Learning Research , pages 1565--1575. PMLR

work page 2021

[5] [5]

Candes, E., Fan, Y., Janson, L., and Lv, J. (2018). Panning for gold: ‘Model-X’ knockoffs for high dimensional controlled variable selection. Journal of the Royal Statistical Society Series B: Statistical Methodology , 80(3):551--577

work page 2018

[6] [6]

Chakraborty, A., Zhang, J., and Katsevich, E. (2024). Doubly robust and computationally efficient high-dimensional variable selection. arXiv preprint arXiv:2409.09512

work page arXiv 2024

[7] [7]

Chatterjee, A., Niu, Z., and Bhattacharya, B. B. (2024). A kernel-based conditional two-sample test using nearest neighbors (with applications to calibration, regression curves, and simulation-based inference). arXiv preprint arXiv:2407.16550

work page arXiv 2024

[8] [8]

and Lei, J

Chen, Y. and Lei, J. (2024). De-Biased Two-Sample U-Statistics With Application To Conditional Distribution Testing . arXiv preprint arXiv:2402.00164

work page arXiv 2024

[9] [9]

Choi, K., Liao, M., and Ermon, S. (2021). Featurized density ratio estimation. In Uncertainty in Artificial Intelligence , pages 172--182

work page 2021

[10] [10]

Choi, K., Meng, C., Song, Y., and Ermon, S. (2022). Density ratio estimation via infinitesimal classification. In International Conference on Artificial Intelligence and Statistics , pages 2552--2573

work page 2022

[11] [11]

and Romano, J

Chung, E. and Romano, J. P. (2013). Exact and asymptotically robust permutation tests. The Annals of Statistics , 41(2):484--507

work page 2013

[12] [12]

Dai, B., Shen, X., and Pan, W. (2022). Significance tests of feature relevance for a black-box learner. IEEE transactions on neural networks and learning systems , 35(2):1898--1911

work page 2022

[13] [13]

Doran, G., Muandet, K., Zhang, K., and Sch\" o lkopf, B. (2014). A permutation-based kernel conditional independence test. In Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence , UAI'14, page 132–141, Arlington, Virginia, USA. AUAI Press

work page 2014

[14] [14]

and Lin, S.-K

Fan, J. and Lin, S.-K. (1998). Test of significance when data are curves. Journal of the American Statistical Association , 93(443):1007--1021

work page 1998

[15] [15]

Fan, Y., Li, Q., and Min, I. (2006). A nonparametric bootstrap test of conditional distributions. Econometric Theory , 22(4):587--613

work page 2006

[16] [16]

Fukumizu, K., Gretton, A., Sun, X., and Sch \"o lkopf, B. (2007). Kernel measures of conditional dependence. Advances in Neural Information Processing Systems , 20:489–496

work page 2007

[17] [17]

Givens, G. H. and Hoeting, J. A. (2012). Computational statistics . John Wiley & Sons, Hoboken, NJ, USA, 2 edition

work page 2012

[18] [18]

M., Rasch, M

Gretton, A., Borgwardt, K. M., Rasch, M. J., Sch \"o lkopf, B., and Smola, A. (2012). A kernel two-sample test. Journal of Machine Learning Research , 13(25):723--773

work page 2012

[19] [19]

and Hart, J

Hall, P. and Hart, J. D. (1990). Bootstrap test for difference between means in nonparametric regression. Journal of the American Statistical Association , 85(412):1039--1049

work page 1990

[20] [20]

Hamidieh, K. (2018). A data-driven statistical model for predicting the critical temperature of a superconductor. Computational Materials Science , 154:346–354

work page 2018

[21] [21]

Hardt, M., Price, E., Price, E., and Srebro, N. (2016). Equality of opportunity in supervised learning. In Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., and Garnett, R., editors, Advances in Neural Information Processing Systems , volume 29, page 3323–3331. Curran Associates, Inc

work page 2016

[22] [22]

Hediger, S., Michel, L., and N \"a f, J. (2022). On the use of random forest for two-sample testing. Computational Statistics & Data Analysis , 170:107435

work page 2022

[23] [23]

and Lei, J

Hu, X. and Lei, J. (2024). A two-sample conditional distribution test using conformal prediction and weighted rank sum. Journal of the American Statistical Association , 119(546):1136--1154

work page 2024

[24] [24]

Kanamori, T., Hido, S., and Sugiyama, M. (2009). A least-squares approach to direct importance estimation. The Journal of Machine Learning Research , 10:1391--1445

work page 2009

[25] [25]

Kanamori, T., Suzuki, T., and Sugiyama, M. (2010). Theoretical analysis of density ratio estimation. IEICE transactions on fundamentals of electronics, communications and computer sciences , 93(4):787--798

work page 2010

[26] [26]

Kim, I., Balakrishnan, S., and Wasserman, L. (2022a). Minimax optimality of permutation tests. The Annals of Statistics , 50(1):225--251

work page

[27] [27]

B., and Lei, J

Kim, I., Lee, A. B., and Lei, J. (2019). Global and local two-sample tests via regression. Electronic Journal of Statistics , 13(2):5253--5305

work page 2019

[28] [28]

Kim, I., Neykov, M., Balakrishnan, S., and Wasserman, L. (2022b). Local permutation tests for conditional independence. The Annals of Statistics , 50(6):3388--3414

[29]

Kim, I., Neykov, M., Balakrishnan, S., and Wasserman, L. (2023). Conditional Independence Testing for Discrete Distributions: Beyond χ²- and G-tests. arXiv preprint arXiv:2308.05373.

[30]

Kim, I., Ramdas, A., Singh, A., and Wasserman, L. (2021). Classification accuracy as a proxy for two-sample testing. The Annals of Statistics, 49(1):411--434.

[31]

Kimura, M. and Hino, H. (2024). A short survey on importance weighting for machine learning. arXiv preprint arXiv:2403.10175.

[32]

Kulasekera, K. (1995). Comparison of regression curves using quasi-residuals. Journal of the American Statistical Association, 90(431):1085--1093.

[33]

Kulasekera, K. and Wang, J. (1997). Smoothing parameter selection for power optimality in testing of regression curves. Journal of the American Statistical Association, 92(438):500--511.

[34]

Li, S., Zhang, Y., Zhu, H., Wang, C., Shu, H., Chen, Z., Sun, Z., and Yang, Y. (2023). K-nearest-neighbor local sampling based conditional independence testing. Advances in Neural Information Processing Systems, 36:23321--23344.

[35]

Liu, F., Xu, W., Lu, J., Zhang, G., Gretton, A., and Sutherland, D. J. (2020). Learning deep kernels for non-parametric two-sample tests. In International Conference on Machine Learning, pages 6316--6326.

[36]

Liu, M., Katsevich, E., Janson, L., and Ramdas, A. (2022). Fast and powerful conditional randomization testing via distillation. Biometrika, 109(2):277--293.

[37]

Liu, S., Takeda, A., Suzuki, T., and Fukumizu, K. (2017). Trimmed density ratio estimation. Advances in Neural Information Processing Systems, 30:4521--4531.

[38]

Lopez-Paz, D. and Oquab, M. (2017). Revisiting Classifier Two-Sample Tests. In International Conference on Learning Representations.

[39]

Lundborg, A. R., Kim, I., Shah, R. D., and Samworth, R. J. (2022). The Projected Covariance Measure for assumption-lean variable significance testing. arXiv preprint arXiv:2211.02039 (accepted to the Annals of Statistics).

[40]

Mulzer, W. (2018). Five proofs of Chernoff's bound with applications. arXiv preprint arXiv:1801.03365.

[41]

Neumeyer, N. and Dette, H. (2003). Nonparametric comparison of regression curves: an empirical process approach. The Annals of Statistics, 31(3):880--920.

[42]

Neykov, M., Balakrishnan, S., and Wasserman, L. (2021). Minimax optimal conditional independence testing. The Annals of Statistics, 49(4):2151--2177.

[43]

Neykov, M., Wasserman, L., Kim, I., and Balakrishnan, S. (2023). Nearly Minimax Optimal Wasserstein Conditional Independence Testing. arXiv preprint arXiv:2308.08672.

[44]

Pardo-Fernández, J. C., Jiménez-Gamero, M. D., and El Ghouch, A. (2015). Tests for the equality of conditional variance functions in nonparametric regression. Electronic Journal of Statistics, 9(2).

[45]

Pogodin, R., Schrab, A., Li, Y., Sutherland, D. J., and Gretton, A. (2024). Practical Kernel Tests of Conditional Independence. arXiv preprint arXiv:2402.13196.

[46]

Rhodes, B., Xu, K., and Gutmann, M. U. (2020). Telescoping density-ratio estimation. Advances in Neural Information Processing Systems, 33:4905--4916.

[47]

Scheidegger, C., Hörrmann, J., and Bühlmann, P. (2022). The weighted generalised covariance measure. Journal of Machine Learning Research, 23(273):1--68.

[48]

Schrab, A., Kim, I., Albert, M., Laurent, B., Guedj, B., and Gretton, A. (2023). MMD aggregated two-sample test. Journal of Machine Learning Research, 24(194):1--81.

[49]

Shah, R. D. and Peters, J. (2020). The hardness of conditional independence testing and the generalised covariance measure. The Annals of Statistics, 48(3):1514--1538.

[50]

Shimodaira, H. (2000). Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90(2):227--244.

[51]

Strobl, E. V., Zhang, K., and Visweswaran, S. (2019). Approximate kernel-based conditional independence tests for fast non-parametric causal discovery. Journal of Causal Inference, 7(1):20180017.

[52]

Sugiyama, M., Krauledat, M., and Müller, K.-R. (2007a). Covariate shift adaptation by importance weighted cross validation. Journal of Machine Learning Research, 8(35):985--1005.

[53]

Sugiyama, M., Nakajima, S., Kashima, H., Buenau, P., and Kawanabe, M. (2007b). Direct importance estimation with model selection and its application to covariate shift adaptation. Advances in Neural Information Processing Systems, 20.

[54]

Sugiyama, M., Suzuki, T., and Kanamori, T. (2010). Density ratio estimation: A comprehensive review. RIMS Kokyuroku, pages 10--31.

[55]

Sugiyama, M., Suzuki, T., and Kanamori, T. (2012). Density Ratio Estimation in Machine Learning. Cambridge University Press.

[56]

Tansey, W., Veitch, V., Zhang, H., Rabadan, R., and Blei, D. M. (2022). The holdout randomization test for feature selection in black box models. Journal of Computational and Graphical Statistics, 31(1):151--162.

[57]

Tsuboi, Y., Kashima, H., Hido, S., Bickel, S., and Sugiyama, M. (2009). Direct density ratio estimation for large-scale covariate shift adaptation. Journal of Information Processing, 17:138--155.

[58]

Virolainen, S. J., VonHandorf, A., Viel, K. C. M. F., Weirauch, M. T., and Kottyan, L. C. (2022). Gene–environment interactions and their impact on human health. Genes & Immunity, 24(1):1--11.

[59]

Williamson, B. D., Gilbert, P. B., Simon, N. R., and Carone, M. (2023). A general framework for inference on algorithm-agnostic variable importance. Journal of the American Statistical Association, 118(543):1645--1658.

[60]

Wu, H., Eckhardt, C. M., and Baccarelli, A. A. (2023). Molecular mechanisms of environmental exposures and human disease. Nature Reviews Genetics, 24(5):332--344.

[61]

Yan, J. and Zhang, X. (2022). A nonparametric two-sample conditional distribution test. arXiv preprint arXiv:2210.08149.

[62]

Zaremba, W., Gretton, A., and Blaschko, M. (2013). B-test: A non-parametric, low variance kernel two-sample test. Advances in Neural Information Processing Systems, 26.

[63]

Zhang, K., Huang, B., Zhang, J., Glymour, C., and Schölkopf, B. (2017). Causal discovery from nonstationary/heterogeneous data: skeleton estimation and orientation determination. Proceedings of the 26th International Joint Conference on Artificial Intelligence, pages 1347--1353.

[64]

Zhang, K., Peters, J., Janzing, D., and Schölkopf, B. (2011). Kernel-Based Conditional Independence Test and Application in Causal Discovery. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, UAI'11, pages 804--813, Arlington, Virginia, USA. AUAI Press.

[65]

Zheng, J. X. (2000). A consistent test of conditional parametric distributions. Econometric Theory, 16(5):667--691.

[66]

Zhu, J. and Hastie, T. (2005). Kernel logistic regression and the import vector machine. Journal of Computational and Graphical Statistics, 14(1):185--205.
