Deep-testing: the case of dependence detection

Gery Geenens; Ivan Muyun Zou; Pierre Lafaye de Micheaux

arXiv: 2604.26558 · v1 · submitted 2026-04-29 · stat.ML · cs.LG · stat.ME

Pith reviewed 2026-05-07 12:40 UTC · model grok-4.3

Classification: stat.ML · cs.LG · stat.ME

Keywords: independence testing, deep learning, hypothesis testing, simulation study, dependence detection, neural network classifier, test statistic, power comparison

The pith

A neural network trained on simulated null and alternative samples produces a test statistic that achieves the highest overall power for independence testing against nineteen competing methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes to treat hypothesis testing as a classification task solved by deep learning. A neural network is trained on simulated samples drawn under the null hypothesis and under alternatives, then the learned map from sample to classification score becomes the test statistic. Applied as a proof of concept to the problem of testing independence, the procedure is compared in a large-scale simulation study against nineteen existing methods across many complex dependence structures. If successful, the approach supplies a flexible way to construct powerful tests without needing closed-form expressions for the null distribution or the alternative. A reader would care because it suggests deep learning can be transferred from image classification to core statistical inference tasks.

Core claim

Deep-testing approaches the classical problem of hypothesis testing by training a deep neural network on simulated data satisfying the null and alternative hypotheses; the resulting classification map serves as the test statistic and leverages the network's strong discriminating power to produce a highly powerful test. As a proof of concept the method is applied to independence testing, where a large-scale simulation study shows that deep-testing attains the highest overall power among nineteen competing procedures across a broad range of complex dependence structures.

What carries the argument

The classification map learned by a deep neural network trained on simulated samples from the null and alternative hypotheses, used directly as the test statistic.

If this is right

The learned classifier can serve as a test statistic for independence without requiring explicit formulas for the null distribution.
High power is maintained across a wide variety of dependence structures that are difficult for traditional tests.
The procedure offers a general template that can be applied to other hypothesis-testing problems by changing the simulation protocol.
Performance gains arise from the network's ability to extract discriminating features directly from the sample geometry.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same training strategy could be used to construct tests for multivariate dependence or for conditional independence by adjusting the simulation design.
One would still need to verify that the p-value obtained from the network output is calibrated under the null on data sets whose marginal distributions differ from those used in training.
Hybrid methods that combine the network-based statistic with classical rank-based tests might improve robustness when sample sizes are small.
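
One hedged way to probe the marginal-distribution concern above is to feed a classifier ranks rather than raw values. The helper below (`pseudo_observations`, a hypothetical name) is an assumption for illustration, not a step the paper is stated to use. It shows that rank-based inputs are unchanged by a strictly increasing transformation of a margin, so a statistic computed from them alone does not depend on the continuous marginals.

```python
import math
import random

def pseudo_observations(values):
    """Map each value to rank / (n + 1); assumes no ties (continuous data)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return [r / (n + 1) for r in ranks]

rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(10)]
x_transformed = [math.exp(3.0 * v) for v in x]   # strictly increasing map of the margin

u = pseudo_observations(x)
u_transformed = pseudo_observations(x_transformed)
print(u == u_transformed)
```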

Load-bearing premise

A neural network trained on simulated null and alternative samples will produce a test statistic whose null distribution can be reliably calibrated and that generalizes to yield valid and powerful tests on real data.

What would settle it

Applying the trained classifier to fresh samples generated under independence, with the same marginal distributions as those used in training, and checking whether the empirical rejection rate matches the nominal significance level.
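
The check described here amounts to comparing an empirical rejection rate with the nominal level while allowing for Monte Carlo error. A minimal sketch follows; the function name `size_check` and the three-standard-error tolerance are illustrative assumptions, not the paper's protocol.

```python
import math

def size_check(rejections, alpha, n_se=3.0):
    """Compare the empirical rejection rate under the null with the nominal level.

    rejections: list of booleans, one per simulated null sample.
    Returns (rate, standard_error, within_band).
    """
    m = len(rejections)
    rate = sum(rejections) / m
    # Binomial standard error of the rejection rate if the true size equals alpha.
    se = math.sqrt(alpha * (1.0 - alpha) / m)
    return rate, se, abs(rate - alpha) <= n_se * se

# 1000 hypothetical null replications with 50, then 100, rejections at alpha = 0.05.
print(size_check([True] * 50 + [False] * 950, 0.05))
print(size_check([True] * 100 + [False] * 900, 0.05))
```

A rate outside the band suggests the threshold is miscalibrated for the null model being simulated; a rate inside it is consistent with, but does not prove, correct size.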

Figures

Figures reproduced from arXiv: 2604.26558 by Gery Geenens, Ivan Muyun Zou, Pierre Lafaye de Micheaux.

**Figure 1.** Bivariate histogram representation (right) of a typical 'parabola' …

**Figure 2.** Typical samples of size n = 400 generated from the training models 1–10; from top left to bottom right: Linear, Diamond, Triangle, Crescent, Points, Exponential, Circles, Cross, Wedge, Cubic.

**Figure 4.** Convolutional neural network architecture.

**Figure 5.** Feed-forward neural network architecture.

**Figure 6.** Neural network architecture All-CNN-MLP using both dependence indicators (and sample size) and images as input features (Scenario 3). … TensorFlow and Keras as the backend. All three networks were implemented in R using keras, with reticulate providing the interface to Python's tensorflow.keras backend. They were trained under the same configuration: Adam optimiser (learning_rate = 10^-3, β1 = 0.9, β2 = 0.999, …

**Figure 7.** Representative samples of size n = 400 generated from six novel dependence patterns; from top left to bottom right: Laplace, Ishigami, Tree Ring, Variance, Infinity, Pi. … each such indicator, and near-exact critical values for each of them may be deduced. This means that, by construction, all the procedures under comparison (both our three deep-tests and the tests based on the individual indicators) have …

**Figure 8.** (Monte-Carlo) Power of the proposed 3 deep-testing procedures and …

**Figure 9.** (Monte-Carlo approximated) Power of the proposed 3 deep …

Original abstract

Deep learning methods have proved highly effective for classification and image recognition problems. In this paper, we ask whether this success can be transferred to hypothesis testing: if a neural network can distinguish, for example, an image of a handwritten digit from another, can it also distinguish an "image of a sample" (such as a scatter plot) generated under a given statistical model from one generated outside that model? Motivated by this idea, we propose a novel procedure called deep-testing, which approaches the classical inferential problem of hypothesis testing through deep learning. More specifically, the test statistic is a classification map learned by a deep neural network from simulated data satisfying the null and alternative hypotheses, leveraging its strong discriminating power to construct a highly powerful test. As a proof of concept, we apply deep-testing to the problem of independence testing, arguably one of the most important problems in statistics. In a large-scale simulation study, deep-testing achieves the highest overall power against nineteen competing methods across a broad range of complex dependence structures, confirming the viability of the proposed approach.
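
To make the "image of a sample" idea concrete, here is a minimal sketch that turns a bivariate sample into a small grid of bin frequencies, the kind of array a convolutional network could take as input. The function name `sample_to_image`, the 8 × 8 grid, and the min-max scaling are assumptions for illustration; the paper's own representation (its Figure 1) may be built differently.

```python
import random

def sample_to_image(x, y, bins=8):
    """Return a bins x bins grid of relative frequencies for the points (x_i, y_i)."""
    def bin_index(v, lo, hi):
        if hi == lo:                       # degenerate margin: everything in bin 0
            return 0
        k = int((v - lo) / (hi - lo) * bins)
        return min(k, bins - 1)            # the maximum falls in the last bin

    x_lo, x_hi, y_lo, y_hi = min(x), max(x), min(y), max(y)
    grid = [[0.0] * bins for _ in range(bins)]
    for a, b in zip(x, y):
        row = bins - 1 - bin_index(b, y_lo, y_hi)   # row 0 is the top of the image
        col = bin_index(a, x_lo, x_hi)
        grid[row][col] += 1.0 / len(x)
    return grid

rng = random.Random(2)
x = [rng.uniform(-1.0, 1.0) for _ in range(400)]
y = [a * a + rng.gauss(0.0, 0.05) for a in x]        # a noisy parabola
image = sample_to_image(x, y)
for row in image:
    print(" ".join(f"{cell:.2f}" for cell in row))
```

Printed row by row, the non-zero cells trace the U shape of the parabola, which is the spatial pattern an image classifier would pick up.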

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague

The paper frames independence testing as training a neural net on simulated null and alternative samples to produce a test statistic, with simulations claiming top power against 19 methods, but the results depend on whether training alternatives are held out from evaluation.

The core idea is to simulate data under independence and under dependence, train a deep network to tell the two apart, and then use the network output as the test statistic on real data. This replaces the usual analytic derivation with a supervised learning step on external simulations, which is a direct way to handle complex dependence where closed-form tests are hard to derive.

The main piece of evidence is the large simulation study across many dependence structures, with a comparison to nineteen existing methods; it shows the approach can deliver higher power in those settings. That part is executed at a reasonable scale and gives a concrete sense of where the gains appear.

The soft spot is the separation between training and evaluation. If the dependence patterns used to train the network are too close to those used to measure power, the reported advantage could partly reflect the network fitting simulation-specific details rather than learning a general detector. The abstract does not make the separation explicit, so the power ranking needs verification that the evaluation cases are genuinely out of sample. Null calibration also requires checking, since the statistic is learned rather than fixed.

This is for readers working on nonparametric testing or on ways to embed machine learning inside statistical procedures. Someone looking for new tools for high-dimensional or nonlinear dependence would get practical value from the simulations and the implementation sketch. It deserves peer review because the framing is fresh and the empirical comparison is broad enough to be worth referee scrutiny, even if the training protocol will need tightening.

Referee Report

2 major / 2 minor

Summary. The paper proposes deep-testing, a hypothesis testing framework that trains a deep neural network on simulated data drawn from the null and alternative distributions to learn a classification map serving as the test statistic. As a proof of concept, the method is applied to independence testing; a large-scale simulation study reports that deep-testing attains the highest overall power among nineteen competing procedures across a range of complex dependence structures.

Significance. If the simulation results are obtained under strict separation between training and evaluation distributions and the procedure maintains valid type-I error control, the work demonstrates that deep-learning classifiers can be repurposed as powerful, flexible test statistics for nonparametric problems where analytic forms are unavailable. The empirical breadth of the study provides concrete evidence that the approach is viable for dependence detection, though its broader utility hinges on generalization beyond the simulated regimes.

major comments (2)

[Simulation study] Simulation study section: the manuscript does not explicitly state whether the dependence structures (or their parameterizations) used to generate the training alternatives are disjoint from those used to evaluate power. Without this separation, the reported power ranking could reflect the network's ability to exploit simulation-specific artifacts rather than a general advantage over the nonparametric competitors.
[Method] Method section (around the definition of the test statistic): it is unclear how the threshold for the learned classifier output is calibrated to guarantee finite-sample or asymptotic type-I error control. Because the network is trained on external simulated data, the null distribution of the resulting statistic is not automatically pivotal and requires a separate calibration step whose details are not provided.

minor comments (2)

[Abstract] The abstract lists 'nineteen competing methods' without naming them or providing a reference table; adding this information would allow readers to assess the breadth of the comparison immediately.
[Notation] Notation for the network output (e.g., the precise mapping from classifier probability to test statistic) should be introduced with an equation number in the methods section to facilitate later discussion of calibration.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive comments. We address the major comments point by point below.

Referee: [Simulation study] Simulation study section: the manuscript does not explicitly state whether the dependence structures (or their parameterizations) used to generate the training alternatives are disjoint from those used to evaluate power. Without this separation, the reported power ranking could reflect the network's ability to exploit simulation-specific artifacts rather than a general advantage over the nonparametric competitors.

Authors: We thank the referee for this observation. Upon checking, the training alternatives were generated from dependence structures and parameterizations that are disjoint from the evaluation set to avoid any potential for the network to exploit simulation-specific features. We will revise the manuscript to explicitly state this separation in the Simulation study section, including a description of the distinct sets used for training and evaluation.

Revision: yes

Referee: [Method] Method section (around the definition of the test statistic): it is unclear how the threshold for the learned classifier output is calibrated to guarantee finite-sample or asymptotic type-I error control. Because the network is trained on external simulated data, the null distribution of the resulting statistic is not automatically pivotal and requires a separate calibration step whose details are not provided.

Authors: We agree that additional details on threshold calibration are necessary. The procedure involves simulating a large number of samples under the null hypothesis after training, computing the classifier outputs on these samples, and determining the threshold as the appropriate quantile to achieve the desired type-I error rate. This ensures finite-sample control. We will update the Method section to provide a complete description of this calibration process, along with theoretical justification for its validity.

Revision: yes
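
The calibration step described in this response can be sketched as follows. The absolute Pearson correlation stands in for the trained network's score, and the names `calibrate_threshold` and `simulate_null`, the sample size, and the number of replications are all hypothetical. Note that a simulated quantile controls the level only for the null model being simulated, and only up to Monte Carlo error.

```python
import math
import random

def abs_pearson(sample):
    """Stand-in statistic; a trained classifier's score would go here."""
    x, y = sample
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy / math.sqrt(sxx * syy))

def simulate_null(rng, n):
    """Hypothetical null model: two independent standard normal margins."""
    return ([rng.gauss(0.0, 1.0) for _ in range(n)],
            [rng.gauss(0.0, 1.0) for _ in range(n)])

def calibrate_threshold(statistic, rng, n, alpha, n_sim):
    """Empirical (1 - alpha) quantile of the statistic over simulated null samples."""
    scores = sorted(statistic(simulate_null(rng, n)) for _ in range(n_sim))
    k = math.ceil((1.0 - alpha) * (n_sim + 1))       # conservative order statistic
    return scores[min(k, n_sim) - 1]

rng = random.Random(3)
threshold = calibrate_threshold(abs_pearson, rng, n=50, alpha=0.05, n_sim=2000)

# Rejection rate on fresh null samples; it should sit near alpha.
fresh = [abs_pearson(simulate_null(rng, 50)) > threshold for _ in range(2000)]
rate = sum(fresh) / len(fresh)
print(threshold, rate)
```

The threshold has to be recomputed for each sample size, since the null distribution of the score generally changes with n.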

Circularity Check

0 steps flagged

No significant circularity in deep-testing procedure or simulation claims

The paper defines deep-testing by training a neural network classifier on independently generated simulated samples drawn from the null (independence) and chosen alternatives, then uses the resulting classification map as the test statistic. Power is assessed via a separate large-scale Monte Carlo study that applies the trained statistic to fresh draws from a range of dependence structures and compares rejection rates against 19 other methods. No equation or claim reduces by construction to a parameter fitted on the same data being tested, no self-citation supplies a load-bearing uniqueness result, and the training simulations are external to any real-data application. The derivation is therefore self-contained as a standard empirical simulation-based procedure.
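
The Monte Carlo power comparison described here can be written compactly. The notation below is introduced for illustration and is not taken from the paper: $T$ is a test statistic, $c_\alpha$ its critical value at level $\alpha$, and $S_1, \dots, S_M$ are fresh samples drawn from one dependence structure.

```latex
\hat{\pi} \;=\; \frac{1}{M} \sum_{m=1}^{M} \mathbf{1}\{\, T(S_m) > c_\alpha \,\},
\qquad
\operatorname{se}(\hat{\pi}) \;=\; \sqrt{\frac{\hat{\pi}\,(1 - \hat{\pi})}{M}} \;\le\; \frac{1}{2\sqrt{M}} .
```

As a worked example with a hypothetical M = 10,000 replications, the standard error is at most 0.005, so power differences of a few percentage points between methods would be distinguishable from simulation noise.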

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that neural networks can reliably discriminate between distributions when trained on simulated samples, and that the resulting classifier yields a valid test when applied to real data. No free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

Domain assumption: A deep neural network trained on simulated samples can learn a discriminating map between null and alternative distributions that generalizes to produce a powerful test statistic on real data.
This is the central modeling assumption required for the method to work.

Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages

[1]

and Stegun, I

Abramowitz, M. and Stegun, I. A. (1965).Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. US Department of Commerce, National Bureau of Standards, Applied Mathematics Series 55

work page 1965
[2]

Albawi, S., Mohammed, T . A. and Al-Zawi, S. (2017). Understanding of a convolutional neural network. InProceedings of the 2017 International Conference on Engineering and Technology (ICET). 16

work page 2017
[3]

and Liang, Y

Allen-Zhu, Z., Li, Y. and Liang, Y. (2019).Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers.Advances in Neural Information Processing Systems, 32

work page 2019
[4]

S., Sohl-Dickstein, J

Bahri, Y., Kadmon, J., Pennington, J., Schoenholz, S. S., Sohl-Dickstein, J. and Ganguli, S. (2020).Statisti- cal Mechanics of Deep Learning.Annual Review of Condensed Matter Physics, 11, 501–528

work page 2020
[5]

L., Long, P

Bartlett, P . L., Long, P . M., Lugosi, G. and Tsigler, A. (2020).Benign Overfitting in Linear Regression. Proceedings of the National Academy of Sciences, 117, 30063–30070

work page 2020
[6]

L., Montanari, A

Bartlett, P . L., Montanari, A. and Rakhlin, A. (2021). Deep Learning: a Statistical Viewpoint.Acta Numer- ica, 30, 87–201

work page 2021
[7]

and Mandal, S

Belkin, M., Hsu, D., Ma, S. and Mandal, S. (2019). Reconciling modern machine-learning practice and the classical bias–variance trade-off.Proc. Natl. Acad. Sci. USA, 116, 15849–15854

work page 2019
[8]

and van der Schaar, M

Bellot, A. and van der Schaar, M. (2019).Conditional Independence Testing using Generative Adversarial Networks.Advances in Neural Information Processing Systems, 32

work page 2019
[9]

Berrett, T . B. and Samworth, R. J. (2019).Nonpara- metric independence testing via mutual information. Biometrika, 106, 547–566

work page 2019
[10]

(1950).On a Measure of Dependence between Two Random Variables.Annals of Mathemat- ical statistics, 21(4), 593–600

Blomqvist, N. (1950).On a Measure of Dependence between Two Random Variables.Annals of Mathemat- ical statistics, 21(4), 593–600

work page 1950
[11]

and Friedman, J

Breiman, L. and Friedman, J. H. (1985).Estimating Optimal Transformations for Multiple Regression and Correlation.Journal of the American Statistical Association, 80(391), 580–598

work page 1985
[12]

Cover, T . M. and Thomas, J. A. (2006).Elements of Information Theory, 2nd ed. Wiley, New York

work page 2006
[13]

Dawid, A. P . (1979).Conditional independence in sta- tistical theory.Journal of the Royal Statistical Society: Series B, 41, 1–15

work page 1979
[14]

(1979).La fonction de dépendance em- pirique et ses propriétés

Deheuvels, P . (1979).La fonction de dépendance em- pirique et ses propriétés. Un test non paramétrique d’indépendance.Bulletins de l’Académie Royale de Bel- gique, 65, 274–292

work page 1979
[15]

and Shepp, L

Dembo, A., Kagan, A. and Shepp, L. A. (2001).Remarks on the Maximum Correlation Coefficient.Bernoulli, 7(2), 343–350

work page 2001
[16]

and Lugosi, G

Devroye, L., Györfi, L. and Lugosi, G. (1996).A Proba- bilistic Theory of Pattern Recognition. Springer, New York

work page 1996
[17]

and Kotz, S

Drouet Mari, D. and Kotz, S. (2001).Correlation and Dependence. Imperial College Press

work page 2001
[18]

and Zhong, Y

Fan, J., Ma, C. and Zhong, Y. (2021).A Selective Overview of Deep Learning.Statistical Science, 36, 264–290

work page 2021
[19]

(1941).Das statistische Problem der Ko- rrelation als Variations- und Eigenwertproblem und sein Zusammenhang mit der Ausgleichsrechnung

Gebelein, H. (1941).Das statistische Problem der Ko- rrelation als Variations- und Eigenwertproblem und sein Zusammenhang mit der Ausgleichsrechnung. Zeitschrift für Angewandte Mathematik und Mechanik, 21, 364–379

work page 1941
[20]

Towards a universal representation of statistical dependence

Geenens, G. (2023).Towards a universal repre- sentation of statistical dependence.arXiv preprint arXiv:2302.08151

work page arXiv 2023
[21]

and Lafaye de Micheaux, P

Geenens, G. and Lafaye de Micheaux, P . (2022).The Hellinger Correlation.Journal of the American Statis- tical Association, 117, 639–653

work page 2022
[22]

and Boies, J

Genest, C. and Boies, J. C. (2003).Detecting Depen- dence with Kendall Plots.The American Statistician, 57(4), 275–284

work page 2003
[23]

and Polyanskiy, Y

Gerber, P .R., Han, Y. and Polyanskiy, Y. (2023).Mini- max optimal testing via classification.Proceedings of Machine Learning Research, 195:1–38

work page 2023
[24]

and Courville, A

Goodfellow, I., Bengio, Y. and Courville, A. (2016).Deep Learning. MIT Press

work page 2016
[25]

and Schölkopf, B

Gretton, A., Bousquet, O., Smola, A. and Schölkopf, B. (2005).Measuring Statistical Dependence with Hilbert-Schmidt Norms. InProceedings of the 16th In- ternational Conference on Algorithmic Learning Theory, 63–77

work page 2005
[26]

H., Song, L., Schölkopf, B

Gretton, A., Fukumizu, K., Teo, C. H., Song, L., Schölkopf, B. and Smola, A. J. (2008).A kernel sta- tistical test of independence. InAdvances in Neural Information Processing Systems, 585–592

work page 2008
[27]

and Elisseeff, A

Guyon, I. and Elisseeff, A. (2003).An introduction to variable and feature selection.Journal of Machine Learning Research, 3, 1157–1182

work page 2003
[28]

Härdle, W . K. and Simar, L. (2007).Applied Multivari- ate Analysis, 2nd ed. Springer

work page 2007
[29]

H., Rouhani, M., Fayyaz, M

Hasanpour, S. H., Rouhani, M., Fayyaz, M. and Sabokrou, M. (2016).Lets keep it simple, using simple architectures to outperform deeper and more com- plex architectures.arXiv preprintarXiv:1608.06037

work page arXiv 2016
[30]

and Friedman, J

Hastie, T ., Tibshirani, R. and Friedman, J. (2013).The Elements of Statistical Learning: Data Mining, Infer- ence, and Prediction. Springer

work page 2013
[31]

and Strimmer, K

Hausser, J. and Strimmer, K. (2015).Estimation of Entropy, Mutual Information and Related Quantities. Rpackage, version 1.2.1

work page 2015
[32]

and Gorfine, M

Heller, R., Heller, Y. and Gorfine, M. (2012).A Consis- tent Multivariate Test of Association Based on Ranks of Distances.arXiv preprintarXiv:1201.3522

work page arXiv 2012
[33]

(1948).A Non-Parametric Test of In- dependence.Annals of Mathematical statistics, 19(4), 546–557

Hoeffding, W . (1948).A Non-Parametric Test of In- dependence.Annals of Mathematical statistics, 19(4), 546–557

work page 1948
[34]

(2014).Dependence Modeling with Copulas

Joe, H. (2014).Dependence Modeling with Copulas. Chapman and Hall/CRC, Boca Raton, FL

work page 2014
[35]

Kallenberg, W . C. M. and Ledwina, T . (1999).Data- Driven Rank Tests for Independence.Journal of the American Statistical Association, 94(445), 285–301

work page 1999
[36]

Kendall, M. G. (1938).A new measure of rank corre- lation.Biometrika, 30(1–2), 81–89

work page 1938
[37]

Kendall, M. G. and Buckland, W . R. (1971).A Dictio- nary of Statistical Terms, 3rd ed. Hafner, New York

work page 1971
[38]

Kinney, J. B. and Atwal, G. S. (2014).Equitability, 17 mutual information and the maximal information coefficient.Proceedings of the National Academy of Sciences, 111, 3354–3359

work page 2014
[39]

Kozachenko, L. F . and Leonenko, N. N. (1987).Sample estimate of the entropy of a random vector.Problems of Information Transmission, 23, 95–101

work page 1987
[40]

and Tran, V

Lafaye de Micheaux, P . and Tran, V . (2016).PoweR: A Reproducible Research Tool to Ease Monte Carlo Power Simulation Studies for Goodness-of-fit Tests inR.Journal of Statistical Software, 69(3)

work page 2016
[41]

and Haffner, P

LeCun, Y., Bottou, L., Bengio, Y. and Haffner, P . (1998). Gradient-based learning applied to document recog- nition.Proceedings of the IEEE, 86, 2278–2324

work page 1998
[42]

and Hinton, G

LeCun, Y., Bengio, Y. and Hinton, G. (2015).Deep learning.Nature, 521, 436–444

work page 2015
[43]

Lehmann, E. L. and Romano, J. P . (2005).Testing Statistical Hypotheses. Springer, New York

work page 2005
[44]

Li, J. J. and Tong, X. (2020).Statistical hypothesis testing versus machine learning binary classification: Distinctions and guidelines.Patterns, 1(7)

work page 2020
[45]

and Nazarathy, Y

Liquet, B., Moka, S. and Nazarathy, Y. (2024).Mathe- matical Engineering of Deep Learning. CRC Press

work page 2024
[46]

Linfoot, E. H. (1957).An Informational Measure of Correlation.Information and Control, 1(1), 85–89

work page 1957
[47]

and Schölkopf, B

Lopez-Paz, D., Hennig, P . and Schölkopf, B. (2013).The Randomised Dependence Coefficient.arXiv preprint arXiv:1304.7717

work page arXiv 2013
[48]

Nelsen, R. B. (2006).An Introduction to Copulas. Springer, New York

work page 2006
[49]

B., Quesada-Molina, J

Nelsen, R. B., Quesada-Molina, J. J., Rodriguez-Lallena, J. A. and Ubeda-Flores, M. (2003).Kendall distribution functions.statistics and Probability Letters, 65, 263– 268

work page 2003
[50]

and Pearson, E

Neyman, J. and Pearson, E. S. (1933).IX. On the problem of the most efficient tests of statistical hypotheses.Philosophical Transactions of the Royal Society of London. Series A, 231, 289–337

work page 1933
[51]

and Pohl, K

Paschali, M., Zhao, Q., Adeli, E. and Pohl, K. M. (2022).Bridging the gap between deep learning and hypothesis-driven analysis via permutation testing. In Rekik, I., Adeli, E., Park, S. H. and Cintas, C. (eds), Predictive Intelligence in Medicine. PRIME 2022.Lecture Notes in Computer Science, 13564, 13–23. Springer, Cham

work page 2022
[52]

and Shekhar, S

Pandeva, T ., Forré, P ., Ramdas, A. and Shekhar, S. (2024).Deep anytime-valid hypothesis testing.Pro- ceedings of the 27th International Conference on Arti- ficial Intelligence and Statistics (AISTATS),Proceedings of Machine Learning Research, 238, 622–630

work page 2024
[53]

and Peters, J

Pfister, N., Bühlmann, P ., Schölkopf, B. and Peters, J. (2018).Kernel-Based Tests for Joint Independence. Journal of the Royal Statistical Society: Series B, 80, 5– 31

work page 2018
[54]

and Peters, J

Pfister, N. and Peters, J. (2019).dHSIC: Independence Testing via Hilbert-Schmidt Independence Criterion. Rpackage, version 2.1

work page 2019
[55]

(1959).On Measures of Dependence

Rényi, A. (1959).On Measures of Dependence. Acta Mathematica Academiae Scientiarum Hungaricae, 10(3–4), 441–451

work page 1959
[56]

N., Reshef, Y

Reshef, D. N., Reshef, Y. A., Finucane, H. K., Grossman, S. R., McVean, G., Turnbaugh, P . J., Lander, E. S., Mitzenmacher, M. and Sabeti, P . C. (2011).Detecting Novel Associations in Large Data Sets.Science, 334, 1518–1524

work page 2011
[57]

A., Reshef, D

Reshef, Y. A., Reshef, D. N., Finucane, H. K., Sabeti, P . C. and Mitzenmacher, M. (2016).Measuring depen- dence powerfully and equitably.Journal of Machine Learning Research, 17, 1–63

work page 2016
[58]

and Tong, X

Rigollet, P . and Tong, X. (2011).Neyman–Pearson classification, convexity and stochastic constraints. Journal of Machine Learning Research, 12, 2831–2855

work page 2011
[59]

Sakib, S., Ahmed, N., Kabir, A. J. and Ahmed, H. (2018). An Overview of Convolutional Neural Network: Its Architecture and Applications.Preprints, 2018110546

work page 2018
[60]

(1984).On measures of concordance

Scarsini, M. (1984).On measures of concordance. Stochastica, 8, 201–218

work page 1984
[61]

and Wolff, E

Schweizer, B. and Wolff, E. F . (1981).On Nonparamet- ric Measures of Dependence for Random Variables. Annals of statistics, 9, 879–885

work page 1981
[62]

(2003).Mathematical statistics

Shao, J. (2003).Mathematical statistics. Springer, New York

work page 2003
[63]

and Zhang, J

Shao, X. and Zhang, J. (2014).Martingale Difference Correlation and Its Use in High-Dimensional Variable Screening.Journal of the American Statistical Associa- tion, 109(507), 1302–1318

work page 2014
[64]

Sejnowski, T . J. (2020).The unreasonable effectiveness of deep learning in artificial intelligence.Proceedings of the National Academy of Sciences, 117, 30033–30038

work page 2020
[65]

(1904).The Proof and Measurement of Association between Two Things.The American Journal of Psychology, 15(1), 72–101

Spearman, C. (1904).The Proof and Measurement of Association between Two Things.The American Journal of Psychology, 15(1), 72–101

work page 1904
[66]

and Cheng, G

Suh, N. and Cheng, G. (2025).A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models.Annual Review of statistics and Its Application, 12, 177–207

work page 2025
[67]

J., Rizzo, M

Székely, G. J., Rizzo, M. L. and Bakirov, N. K. (2007). Measuring and testing independence by correlation of distances.Annals of statistics, 35, 2769–2794

work page 2007
[68]

(2013).A plug-in approach to Neyman– Pearson classification.Journal of Machine Learning Research, 14, 3011–3040

Tong, X. (2013).A plug-in approach to Neyman– Pearson classification.Journal of Machine Learning Research, 14, 3011–3040

work page 2013
[69]

and Zhao, A

Tong, X., Feng, Y. and Zhao, A. (2016).A survey on Neyman–Pearson classification and suggestions for future research.WIREs Computational statistics, 8, 64– 81

work page 2016
[70]

and Feng, Y

Tong, X., Xia, L., Wang, J. and Feng, Y. (2020).Neyman– Pearson classification: parametrics and sample size requirement.Journal of Machine Learning Research, 21, 1–48

work page 2020
[71]

and Hutson, A

Vexler, A., Chen, X. and Hutson, A. D. (2017).Depen- dence and Independence Structure and Inference. Statistical Methods in Medical Research, 26(5), 2114– 18 2132

work page 2017
[72]

and Zhong, P .-S

Yang, Y., Zhang, K. and Zhong, P .-S. (2025).Testing conditional independence with deep neural network based binary expansion testing (DeepBET).Proceed- ings of the 28th International Conference on Artificial Intelligence and Statistics (AISTATS),Proceedings of Machine Learning Research, 258, 4690–4698

work page 2025
[73]

− r 1−V (ℓ)2 k , r 1−V (ℓ)2 k #, ifV (ℓ) k <c (ℓ) −1, the mixture 0.5U

Xu, N., Liu, F . and Sutherland, D. J. (2026).Learning representations for independence testing.Transac- tions on Machine Learning Research. 19 APPENDIX A. Generation of the training sets Here we provide the essential algorithms – that is, the data-generating-process (DGP) formulae – used to generate independent samples (‘units’) ofnpairs of i.i.d. observ...

work page arXiv 2026

[1] [1]

and Stegun, I

Abramowitz, M. and Stegun, I. A. (1965).Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. US Department of Commerce, National Bureau of Standards, Applied Mathematics Series 55

work page 1965

[2] [2]

Albawi, S., Mohammed, T . A. and Al-Zawi, S. (2017). Understanding of a convolutional neural network. InProceedings of the 2017 International Conference on Engineering and Technology (ICET). 16

work page 2017

[3] [3]

and Liang, Y

Allen-Zhu, Z., Li, Y. and Liang, Y. (2019).Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers.Advances in Neural Information Processing Systems, 32

work page 2019

[4] [4]

S., Sohl-Dickstein, J

Bahri, Y., Kadmon, J., Pennington, J., Schoenholz, S. S., Sohl-Dickstein, J. and Ganguli, S. (2020).Statisti- cal Mechanics of Deep Learning.Annual Review of Condensed Matter Physics, 11, 501–528

work page 2020

[5] [5]

L., Long, P

Bartlett, P . L., Long, P . M., Lugosi, G. and Tsigler, A. (2020).Benign Overfitting in Linear Regression. Proceedings of the National Academy of Sciences, 117, 30063–30070

work page 2020

[6] [6]

L., Montanari, A

Bartlett, P . L., Montanari, A. and Rakhlin, A. (2021). Deep Learning: a Statistical Viewpoint.Acta Numer- ica, 30, 87–201

work page 2021

[7] [7]

and Mandal, S

Belkin, M., Hsu, D., Ma, S. and Mandal, S. (2019). Reconciling modern machine-learning practice and the classical bias–variance trade-off.Proc. Natl. Acad. Sci. USA, 116, 15849–15854

work page 2019

[8] [8]

and van der Schaar, M

Bellot, A. and van der Schaar, M. (2019).Conditional Independence Testing using Generative Adversarial Networks.Advances in Neural Information Processing Systems, 32

work page 2019

[9] [9]

Berrett, T . B. and Samworth, R. J. (2019).Nonpara- metric independence testing via mutual information. Biometrika, 106, 547–566

work page 2019

[10] Blomqvist, N. (1950). On a Measure of Dependence between Two Random Variables. Annals of Mathematical Statistics, 21(4), 593–600.

[11] Breiman, L. and Friedman, J. H. (1985). Estimating Optimal Transformations for Multiple Regression and Correlation. Journal of the American Statistical Association, 80(391), 580–598.

[12] Cover, T. M. and Thomas, J. A. (2006). Elements of Information Theory, 2nd ed. Wiley, New York.

[13] Dawid, A. P. (1979). Conditional independence in statistical theory. Journal of the Royal Statistical Society: Series B, 41, 1–15.

[14] Deheuvels, P. (1979). La fonction de dépendance empirique et ses propriétés. Un test non paramétrique d’indépendance. Bulletins de l’Académie Royale de Belgique, 65, 274–292.

[15] Dembo, A., Kagan, A. and Shepp, L. A. (2001). Remarks on the Maximum Correlation Coefficient. Bernoulli, 7(2), 343–350.

[16] Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer, New York.

[17] Drouet Mari, D. and Kotz, S. (2001). Correlation and Dependence. Imperial College Press.

[18] Fan, J., Ma, C. and Zhong, Y. (2021). A Selective Overview of Deep Learning. Statistical Science, 36, 264–290.

[19] Gebelein, H. (1941). Das statistische Problem der Korrelation als Variations- und Eigenwertproblem und sein Zusammenhang mit der Ausgleichsrechnung. Zeitschrift für Angewandte Mathematik und Mechanik, 21, 364–379.

[20] Geenens, G. (2023). Towards a universal representation of statistical dependence. arXiv preprint arXiv:2302.08151.

[21] Geenens, G. and Lafaye de Micheaux, P. (2022). The Hellinger Correlation. Journal of the American Statistical Association, 117, 639–653.

[22] Genest, C. and Boies, J. C. (2003). Detecting Dependence with Kendall Plots. The American Statistician, 57(4), 275–284.

[23] Gerber, P. R., Han, Y. and Polyanskiy, Y. (2023). Minimax optimal testing via classification. Proceedings of Machine Learning Research, 195, 1–38.

[24] Goodfellow, I., Bengio, Y. and Courville, A. (2016). Deep Learning. MIT Press.

[25] Gretton, A., Bousquet, O., Smola, A. and Schölkopf, B. (2005). Measuring Statistical Dependence with Hilbert-Schmidt Norms. In Proceedings of the 16th International Conference on Algorithmic Learning Theory, 63–77.

[26] Gretton, A., Fukumizu, K., Teo, C. H., Song, L., Schölkopf, B. and Smola, A. J. (2008). A kernel statistical test of independence. In Advances in Neural Information Processing Systems, 585–592.

[27] Guyon, I. and Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157–1182.

[28] Härdle, W. K. and Simar, L. (2007). Applied Multivariate Analysis, 2nd ed. Springer.

[29] Hasanpour, S. H., Rouhani, M., Fayyaz, M. and Sabokrou, M. (2016). Lets keep it simple, using simple architectures to outperform deeper and more complex architectures. arXiv preprint arXiv:1608.06037.

[30] Hastie, T., Tibshirani, R. and Friedman, J. (2013). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.

[31] Hausser, J. and Strimmer, K. (2015). Estimation of Entropy, Mutual Information and Related Quantities. R package, version 1.2.1.

[32] Heller, R., Heller, Y. and Gorfine, M. (2012). A Consistent Multivariate Test of Association Based on Ranks of Distances. arXiv preprint arXiv:1201.3522.

[33] Hoeffding, W. (1948). A Non-Parametric Test of Independence. Annals of Mathematical Statistics, 19(4), 546–557.

[34] Joe, H. (2014). Dependence Modeling with Copulas. Chapman and Hall/CRC, Boca Raton, FL.

[35] Kallenberg, W. C. M. and Ledwina, T. (1999). Data-Driven Rank Tests for Independence. Journal of the American Statistical Association, 94(445), 285–301.

[36] Kendall, M. G. (1938). A new measure of rank correlation. Biometrika, 30(1–2), 81–89.

[37] Kendall, M. G. and Buckland, W. R. (1971). A Dictionary of Statistical Terms, 3rd ed. Hafner, New York.

[38] Kinney, J. B. and Atwal, G. S. (2014). Equitability, mutual information and the maximal information coefficient. Proceedings of the National Academy of Sciences, 111, 3354–3359.

[39] Kozachenko, L. F. and Leonenko, N. N. (1987). Sample estimate of the entropy of a random vector. Problems of Information Transmission, 23, 95–101.

[40] Lafaye de Micheaux, P. and Tran, V. (2016). PoweR: A Reproducible Research Tool to Ease Monte Carlo Power Simulation Studies for Goodness-of-fit Tests in R. Journal of Statistical Software, 69(3).

[41] LeCun, Y., Bottou, L., Bengio, Y. and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86, 2278–2324.

[42] LeCun, Y., Bengio, Y. and Hinton, G. (2015). Deep learning. Nature, 521, 436–444.

[43] Lehmann, E. L. and Romano, J. P. (2005). Testing Statistical Hypotheses. Springer, New York.

[44] Li, J. J. and Tong, X. (2020). Statistical hypothesis testing versus machine learning binary classification: Distinctions and guidelines. Patterns, 1(7).

[45] Liquet, B., Moka, S. and Nazarathy, Y. (2024). Mathematical Engineering of Deep Learning. CRC Press.

[46] Linfoot, E. H. (1957). An Informational Measure of Correlation. Information and Control, 1(1), 85–89.

[47] Lopez-Paz, D., Hennig, P. and Schölkopf, B. (2013). The Randomised Dependence Coefficient. arXiv preprint arXiv:1304.7717.

[48] Nelsen, R. B. (2006). An Introduction to Copulas. Springer, New York.

[49] Nelsen, R. B., Quesada-Molina, J. J., Rodriguez-Lallena, J. A. and Ubeda-Flores, M. (2003). Kendall distribution functions. Statistics and Probability Letters, 65, 263–268.

[50] Neyman, J. and Pearson, E. S. (1933). IX. On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London. Series A, 231, 289–337.

[51] Paschali, M., Zhao, Q., Adeli, E. and Pohl, K. M. (2022). Bridging the gap between deep learning and hypothesis-driven analysis via permutation testing. In Rekik, I., Adeli, E., Park, S. H. and Cintas, C. (eds), Predictive Intelligence in Medicine. PRIME 2022. Lecture Notes in Computer Science, 13564, 13–23. Springer, Cham.

[52] Pandeva, T., Forré, P., Ramdas, A. and Shekhar, S. (2024). Deep anytime-valid hypothesis testing. Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS), Proceedings of Machine Learning Research, 238, 622–630.

[53] Pfister, N., Bühlmann, P., Schölkopf, B. and Peters, J. (2018). Kernel-Based Tests for Joint Independence. Journal of the Royal Statistical Society: Series B, 80, 5–31.

[54] Pfister, N. and Peters, J. (2019). dHSIC: Independence Testing via Hilbert-Schmidt Independence Criterion. R package, version 2.1.

[55] Rényi, A. (1959). On Measures of Dependence. Acta Mathematica Academiae Scientiarum Hungaricae, 10(3–4), 441–451.

[56] Reshef, D. N., Reshef, Y. A., Finucane, H. K., Grossman, S. R., McVean, G., Turnbaugh, P. J., Lander, E. S., Mitzenmacher, M. and Sabeti, P. C. (2011). Detecting Novel Associations in Large Data Sets. Science, 334, 1518–1524.

[57] Reshef, Y. A., Reshef, D. N., Finucane, H. K., Sabeti, P. C. and Mitzenmacher, M. (2016). Measuring dependence powerfully and equitably. Journal of Machine Learning Research, 17, 1–63.

[58] Rigollet, P. and Tong, X. (2011). Neyman–Pearson classification, convexity and stochastic constraints. Journal of Machine Learning Research, 12, 2831–2855.

[59] Sakib, S., Ahmed, N., Kabir, A. J. and Ahmed, H. (2018). An Overview of Convolutional Neural Network: Its Architecture and Applications. Preprints, 2018110546.

[60] Scarsini, M. (1984). On measures of concordance. Stochastica, 8, 201–218.

[61] Schweizer, B. and Wolff, E. F. (1981). On Nonparametric Measures of Dependence for Random Variables. Annals of Statistics, 9, 879–885.

[62] Shao, J. (2003). Mathematical Statistics. Springer, New York.

[63] Shao, X. and Zhang, J. (2014). Martingale Difference Correlation and Its Use in High-Dimensional Variable Screening. Journal of the American Statistical Association, 109(507), 1302–1318.

[64] Sejnowski, T. J. (2020). The unreasonable effectiveness of deep learning in artificial intelligence. Proceedings of the National Academy of Sciences, 117, 30033–30038.

[65] Spearman, C. (1904). The Proof and Measurement of Association between Two Things. The American Journal of Psychology, 15(1), 72–101.

[66] Suh, N. and Cheng, G. (2025). A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models. Annual Review of Statistics and Its Application, 12, 177–207.

[67] Székely, G. J., Rizzo, M. L. and Bakirov, N. K. (2007). Measuring and testing independence by correlation of distances. Annals of Statistics, 35, 2769–2794.

[68] Tong, X. (2013). A plug-in approach to Neyman–Pearson classification. Journal of Machine Learning Research, 14, 3011–3040.

[69] Tong, X., Feng, Y. and Zhao, A. (2016). A survey on Neyman–Pearson classification and suggestions for future research. WIREs Computational Statistics, 8, 64–81.

[70] Tong, X., Xia, L., Wang, J. and Feng, Y. (2020). Neyman–Pearson classification: parametrics and sample size requirement. Journal of Machine Learning Research, 21, 1–48.

[71] Vexler, A., Chen, X. and Hutson, A. D. (2017). Dependence and Independence Structure and Inference. Statistical Methods in Medical Research, 26(5), 2114–2132.

[72] Yang, Y., Zhang, K. and Zhong, P.-S. (2025). Testing conditional independence with deep neural network based binary expansion testing (DeepBET). Proceedings of the 28th International Conference on Artificial Intelligence and Statistics (AISTATS), Proceedings of Machine Learning Research, 258, 4690–4698.

[73] Xu, N., Liu, F. and Sutherland, D. J. (2026). Learning representations for independence testing. Transactions on Machine Learning Research.

APPENDIX A. Generation of the training sets

Here we provide the essential algorithms – that is, the data-generating-process (DGP) formulae – used to generate independent samples (‘units’) of n pairs of i.i.d. observ...