arxiv: 2604.26558 · v1 · submitted 2026-04-29 · stat.ML · cs.LG · stat.ME

Deep-testing: the case of dependence detection

Pith reviewed 2026-05-07 12:40 UTC · model grok-4.3

classification stat.ML · cs.LG · stat.ME
keywords independence testing · deep learning · hypothesis testing · simulation study · dependence detection · neural network classifier · test statistic · power comparison

The pith

A neural network trained on simulated null and alternative samples produces a test statistic that achieves the highest overall power for independence testing against nineteen competing methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes to treat hypothesis testing as a classification task solved by deep learning. A neural network is trained on simulated samples drawn under the null hypothesis and under alternatives, then the learned map from sample to classification score becomes the test statistic. Applied as a proof of concept to the problem of testing independence, the procedure is compared in a large-scale simulation study against nineteen existing methods across many complex dependence structures. If successful, the approach supplies a flexible way to construct powerful tests without needing closed-form expressions for the null distribution or the alternative. A reader would care because it suggests deep learning can be transferred from image classification to core statistical inference tasks.

Core claim

Deep-testing approaches the classical problem of hypothesis testing by training a deep neural network on simulated data satisfying the null and alternative hypotheses; the resulting classification map serves as the test statistic and leverages the network's strong discriminating power to produce a highly powerful test. As a proof of concept the method is applied to independence testing, where a large-scale simulation study shows that deep-testing attains the highest overall power among nineteen competing procedures across a broad range of complex dependence structures.

What carries the argument

The classification map learned by a deep neural network trained on simulated samples from the null and alternative hypotheses, used directly as the test statistic.
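
As a rough illustration of this idea, the sketch below trains a classifier on simulated null and alternative samples and then uses its output score as a test statistic. Everything in it is an assumption made for illustration: a logistic regression on a 4 x 4 rank histogram stands in for the deep network, the alternative is a hypothetical noisy linear model, and the names `simulate`, `features`, `train_classifier` and `score` do not come from the paper.

```python
import math
import random

def simulate(n, dependent, rng, noise=0.3):
    """One sample of n pairs: independent uniforms, or Y = X + Gaussian noise."""
    pairs = []
    for _ in range(n):
        x = rng.random()
        y = x + rng.gauss(0.0, noise) if dependent else rng.random()
        pairs.append((x, y))
    return pairs

def ranks(values):
    """Ranks rescaled into (0, 1); assumes continuous data without ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    for r, i in enumerate(order):
        out[i] = (r + 0.5) / len(values)
    return out

def features(pairs, bins=4):
    """Flattened rank histogram; each cell averages 1 under independence."""
    u = ranks([x for x, _ in pairs])
    v = ranks([y for _, y in pairs])
    cells = [0.0] * (bins * bins)
    for ui, vi in zip(u, v):
        cells[int(vi * bins) * bins + int(ui * bins)] += bins * bins / len(pairs)
    return cells

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(weights, bias, pairs):
    """Classifier output in (0, 1); larger values point towards dependence."""
    return sigmoid(bias + sum(w * f for w, f in zip(weights, features(pairs))))

def train_classifier(n_units=200, n=100, epochs=20, lr=0.05, seed=0):
    """Fit the stand-in classifier by stochastic gradient descent on the log-loss."""
    rng = random.Random(seed)
    # Each training unit is a whole sample; label 0 = null, label 1 = alternative.
    data = [(features(simulate(n, i % 2 == 1, rng)), i % 2) for i in range(n_units)]
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            err = sigmoid(bias + sum(w * f for w, f in zip(weights, x))) - label
            weights = [w - lr * err * f for w, f in zip(weights, x)]
            bias -= lr * err
    return weights, bias

weights, bias = train_classifier()
rng = random.Random(1)
print("score on an independent sample:", round(score(weights, bias, simulate(100, False, rng)), 3))
print("score on a dependent sample:", round(score(weights, bias, simulate(100, True, rng)), 3))
```

The score alone is not yet a test: a rejection threshold still has to be chosen so that the type-I error is controlled at the desired level.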

If this is right

  • The learned classifier can serve as a test statistic for independence without requiring explicit formulas for the null distribution.
  • High power is maintained across a wide variety of dependence structures that are difficult for traditional tests.
  • The procedure offers a general template that can be applied to other hypothesis-testing problems by changing the simulation protocol.
  • Performance gains arise from the network's ability to extract discriminating features directly from the sample geometry.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same training strategy could be used to construct tests for multivariate dependence or for conditional independence by adjusting the simulation design.
  • One would still need to verify that the p-value obtained from the network output is calibrated under the null on data sets whose marginal distributions differ from those used in training.
  • Hybrid methods that combine the network-based statistic with classical rank-based tests might improve robustness when sample sizes are small.
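
On the second point, one standard device is to feed the classifier ranks rather than raw values: with continuous marginals, the joint distribution of the ranks under independence does not depend on those marginals. The sketch below only illustrates that invariance. Whether the paper preprocesses its samples this way is not stated in the material reproduced here, and `to_ranks` and `rank_pairs` are hypothetical helpers.

```python
import math
import random

def to_ranks(values):
    """Return the rank (1..n) of each value; assumes no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks

def rank_pairs(pairs):
    """Replace each coordinate by its rank within its own margin."""
    rx = to_ranks([x for x, _ in pairs])
    ry = to_ranks([y for _, y in pairs])
    return list(zip(rx, ry))

rng = random.Random(0)
sample = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
# Strictly increasing transformations change the marginals but not the ranks.
transformed = [(math.exp(x), y ** 3) for x, y in sample]
print(rank_pairs(sample) == rank_pairs(transformed))
```

Any statistic computed from the ranked pairs therefore takes the same value on `sample` and on `transformed`, so a threshold calibrated under one set of continuous marginals carries over to another.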

Load-bearing premise

A neural network trained on simulated null and alternative samples will produce a test statistic whose null distribution can be reliably calibrated and that generalizes to yield valid and powerful tests on real data.

What would settle it

Apply the trained classifier to fresh samples generated under the null hypothesis of independence, drawn from the same distributions used in training, and check whether the empirical rejection rate matches the nominal significance level up to Monte Carlo error.
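
A check of this kind could be coded along the following lines. The sketch is generic: `statistic`, `threshold` and `simulate_null` are placeholders, and the example call uses a classical Fisher z statistic with its approximate normal critical value purely as a stand-in for the trained network, which is not reproduced here.

```python
import math
import random

def empirical_rejection_rate(statistic, threshold, simulate_null, n, n_rep=2000, seed=0):
    """Fraction of fresh null samples on which the test rejects."""
    rng = random.Random(seed)
    rejections = sum(statistic(simulate_null(n, rng)) > threshold for _ in range(n_rep))
    return rejections / n_rep

def compatible_with_level(rate, alpha, n_rep, z=3.0):
    """True if the rate lies within z binomial standard errors of alpha."""
    se = math.sqrt(alpha * (1 - alpha) / n_rep)
    return abs(rate - alpha) <= z * se

def simulate_null(n, rng):
    """Independent standard normal pairs (an illustrative null model)."""
    return [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

def fisher_z(pairs):
    """|atanh(r)| * sqrt(n - 3), approximately |N(0, 1)| under this null."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    r = sxy / math.sqrt(sxx * syy)
    return abs(math.atanh(r)) * math.sqrt(n - 3)

rate = empirical_rejection_rate(fisher_z, 1.96, simulate_null, n=50)
print(rate, compatible_with_level(rate, alpha=0.05, n_rep=2000))
```

A rejection rate far outside the Monte Carlo band around the nominal level would indicate that the threshold is miscalibrated for the null model being simulated.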

Figures

Figures reproduced from arXiv: 2604.26558 by Gery Geenens, Ivan Muyun Zou, Pierre Lafaye de Micheaux.

Figure 1: Bivariate histogram representation (right) of a typical ‘parabola’
Figure 2: Typical samples of size n = 400 generated from the training models 1–10; from top left to bottom right: Linear, Diamond, Triangle, Crescent, Points, Exponential, Circles, Cross, Wedge, Cubic
Figure 4: Convolutional neural network architecture
Figure 5: Feed-forward neural network architecture
Figure 6: Neural network architecture All-CNN-MLP using both dependence indicators (and sample size) and images as input features (Scenario 3). TensorFlow and Keras as the backend. All three networks were implemented in R using keras, with reticulate providing the interface to Python’s tensorflow.keras backend. They were trained under the same configuration: Adam optimiser (learning_rate = 10^-3, β1 = 0.9, β2 = 0.999…
Figure 7: Representative samples of size n = 400 generated from six novel dependence patterns; from top left to bottom right: Laplace, Ishigami, Tree Ring, Variance, Infinity, Pi. each such indicator, and near-exact critical values for each of them may be deduced. This means that, by construction, all the procedures under comparison, both our three deep-tests and the tests based on the individual indicators, have …
Figure 8: (Monte-Carlo) Power of the proposed 3 deep-testing procedures and
Figure 9: (Monte-Carlo approximated) Power of the proposed 3 deep
original abstract

Deep learning methods have proved highly effective for classification and image recognition problems. In this paper, we ask whether this success can be transferred to hypothesis testing: if a neural network can distinguish, for example, an image of a handwritten digit from another, can it also distinguish an "image of a sample" (such as a scatter plot) generated under a given statistical model from one generated outside that model? Motivated by this idea, we propose a novel procedure called deep-testing, which approaches the classical inferential problem of hypothesis testing through deep learning. More specifically, the test statistic is a classification map learned by a deep neural network from simulated data satisfying the null and alternative hypotheses, leveraging its strong discriminating power to construct a highly powerful test. As a proof of concept, we apply deep-testing to the problem of independence testing, arguably one of the most important problems in statistics. In a large-scale simulation study, deep-testing achieves the highest overall power against nineteen competing methods across a broad range of complex dependence structures, confirming the viability of the proposed approach.
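
To make the "image of a sample" idea concrete, the sketch below bins a bivariate sample into a small grid of counts, which is the kind of bivariate histogram that Figure 1 refers to. The grid size, the min-max rescaling, the noisy parabola and the function name `sample_to_image` are assumptions for illustration; the paper's actual preprocessing is not described in the text reproduced here.

```python
import random

def sample_to_image(pairs, bins=8):
    """Bin a sample of (x, y) pairs into a bins x bins grid of counts.

    Assumes the sample is not degenerate (x and y each take at least two values).
    """
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    image = [[0] * bins for _ in range(bins)]
    for x, y in pairs:
        # Rescale each coordinate to [0, 1], then map it to a cell index.
        col = min(int((x - x_lo) / (x_hi - x_lo) * bins), bins - 1)
        row = min(int((y - y_lo) / (y_hi - y_lo) * bins), bins - 1)
        image[bins - 1 - row][col] += 1  # row 0 is the top of the picture
    return image

rng = random.Random(0)
# A hypothetical 'parabola' pattern: Y = X^2 plus a little noise.
parabola = []
for _ in range(400):
    x = rng.uniform(-1, 1)
    parabola.append((x, x * x + rng.gauss(0, 0.05)))

for row in sample_to_image(parabola):
    print(" ".join(f"{count:3d}" for count in row))
```

The resulting grid can be passed to an image classifier in the same way as a greyscale picture, with cell counts playing the role of pixel intensities.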

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes deep-testing, a hypothesis testing framework that trains a deep neural network on simulated data drawn from the null and alternative distributions to learn a classification map serving as the test statistic. As a proof of concept, the method is applied to independence testing; a large-scale simulation study reports that deep-testing attains the highest overall power among nineteen competing procedures across a range of complex dependence structures.

Significance. If the simulation results are obtained under strict separation between training and evaluation distributions and the procedure maintains valid type-I error control, the work demonstrates that deep-learning classifiers can be repurposed as powerful, flexible test statistics for nonparametric problems where analytic forms are unavailable. The empirical breadth of the study provides concrete evidence that the approach is viable for dependence detection, though its broader utility hinges on generalization beyond the simulated regimes.

major comments (2)
  1. [Simulation study] Simulation study section: the manuscript does not explicitly state whether the dependence structures (or their parameterizations) used to generate the training alternatives are disjoint from those used to evaluate power. Without this separation, the reported power ranking could reflect the network's ability to exploit simulation-specific artifacts rather than a general advantage over the nonparametric competitors.
  2. [Method] Method section (around the definition of the test statistic): it is unclear how the threshold for the learned classifier output is calibrated to guarantee finite-sample or asymptotic type-I error control. Because the network is trained on external simulated data, the null distribution of the resulting statistic is not automatically pivotal and requires a separate calibration step whose details are not provided.
minor comments (2)
  1. [Abstract] The abstract lists 'nineteen competing methods' without naming them or providing a reference table; adding this information would allow readers to assess the breadth of the comparison immediately.
  2. [Notation] Notation for the network output (e.g., the precise mapping from classifier probability to test statistic) should be introduced with an equation number in the methods section to facilitate later discussion of calibration.
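
One possible notation, offered only to illustrate what such an equation could look like and not taken from the paper: write S_n for the sample of n pairs, f with fitted parameters θ̂ for the trained network, α for the nominal level, and P_{H_0} for probability under the null model used for calibration.

```latex
\begin{align}
  T(S_n) &= f_{\hat\theta}(S_n) \in [0, 1],
    \qquad S_n = \{(X_i, Y_i)\}_{i=1}^{n}, \\
  \text{reject } H_0 &\iff T(S_n) > c_\alpha,
    \qquad c_\alpha = \inf\bigl\{ c : \mathbb{P}_{H_0}\bigl( T(S_n) > c \bigr) \le \alpha \bigr\}.
\end{align}
```

With the mapping written this way, the calibration question raised in the second major comment becomes the question of how c_α is computed or approximated.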

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive comments. We address the major comments point by point below.

point-by-point responses
  1. Referee: [Simulation study] Simulation study section: the manuscript does not explicitly state whether the dependence structures (or their parameterizations) used to generate the training alternatives are disjoint from those used to evaluate power. Without this separation, the reported power ranking could reflect the network's ability to exploit simulation-specific artifacts rather than a general advantage over the nonparametric competitors.

    Authors: We thank the referee for this observation. Upon checking, the training alternatives were generated from dependence structures and parameterizations that are disjoint from the evaluation set to avoid any potential for the network to exploit simulation-specific features. We will revise the manuscript to explicitly state this separation in the Simulation study section, including a description of the distinct sets used for training and evaluation. revision: yes

  2. Referee: [Method] Method section (around the definition of the test statistic): it is unclear how the threshold for the learned classifier output is calibrated to guarantee finite-sample or asymptotic type-I error control. Because the network is trained on external simulated data, the null distribution of the resulting statistic is not automatically pivotal and requires a separate calibration step whose details are not provided.

    Authors: We agree that additional details on threshold calibration are necessary. The procedure involves simulating a large number of samples under the null hypothesis after training, computing the classifier outputs on these samples, and determining the threshold as the appropriate quantile to achieve the desired type-I error rate. This ensures finite-sample control. We will update the Method section to provide a complete description of this calibration process, along with theoretical justification for its validity. revision: yes
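
The calibration step described in this response can be sketched as follows. This is a generic Monte Carlo recipe under stated assumptions, not the authors' code: `statistic` stands for the trained classifier's score, `simulate_null` for a simulator of independent samples, and the absolute Pearson correlation is used below only as a cheap placeholder statistic.

```python
import math
import random

def calibrate_threshold(statistic, simulate_null, n, alpha=0.05, n_sim=2000, seed=0):
    """Critical value taken from n_sim simulated null statistics.

    Uses the k-th smallest simulated value with k = ceil((1 - alpha) * (n_sim + 1)),
    so that rejecting when the statistic exceeds it has level at most alpha,
    provided the data really come from the simulated null and ties are negligible.
    """
    k = math.ceil((1 - alpha) * (n_sim + 1))
    if k > n_sim:
        raise ValueError("n_sim is too small for this alpha")
    rng = random.Random(seed)
    null_stats = sorted(statistic(simulate_null(n, rng)) for _ in range(n_sim))
    return null_stats[k - 1]

def simulate_null(n, rng):
    """Independent Uniform(0, 1) pairs (an illustrative null model)."""
    return [(rng.random(), rng.random()) for _ in range(n)]

def abs_correlation(pairs):
    """Absolute Pearson correlation, a placeholder for the network score."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return abs(sxy) / math.sqrt(sxx * syy)

threshold = calibrate_threshold(abs_correlation, simulate_null, n=100)
print(f"reject independence when the statistic exceeds {threshold:.3f}")
```

The level guarantee holds only for the null model that is simulated; if the statistic's null distribution changes with the marginals of the data, the threshold has to be recalibrated or the statistic made invariant to them.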

Circularity Check

0 steps flagged

No significant circularity in deep-testing procedure or simulation claims

full rationale

The paper defines deep-testing by training a neural network classifier on independently generated simulated samples drawn from the null (independence) and chosen alternatives, then uses the resulting classification map as the test statistic. Power is assessed via a separate large-scale Monte Carlo study that applies the trained statistic to fresh draws from a range of dependence structures and compares rejection rates against 19 other methods. No equation or claim reduces by construction to a parameter fitted on the same data being tested, no self-citation supplies a load-bearing uniqueness result, and the training simulations are external to any real-data application. The derivation is therefore self-contained as a standard empirical simulation-based procedure.
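
The Monte Carlo power assessment described here amounts to counting rejections over fresh draws from an alternative. A minimal sketch, assuming a threshold calibrated beforehand under the null; the noisy linear alternative, the placeholder statistic and all function names are illustrative and are not taken from the paper's study design.

```python
import math
import random

def abs_correlation(pairs):
    """Absolute Pearson correlation, a placeholder for any test statistic."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return abs(sxy) / math.sqrt(sxx * syy)

def simulate(n, rng, strength):
    """Y = strength * X + noise; strength = 0 gives independence."""
    pairs = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        pairs.append((x, strength * x + rng.gauss(0, 1)))
    return pairs

def null_quantile(statistic, n, alpha=0.05, n_sim=2000, seed=1):
    """Monte Carlo critical value from samples simulated under independence."""
    rng = random.Random(seed)
    stats = sorted(statistic(simulate(n, rng, 0.0)) for _ in range(n_sim))
    return stats[math.ceil((1 - alpha) * (n_sim + 1)) - 1]

def monte_carlo_power(statistic, threshold, n, strength, n_rep=1000, seed=0):
    """Rejection rate over n_rep fresh samples drawn at the given strength."""
    rng = random.Random(seed)
    hits = sum(statistic(simulate(n, rng, strength)) > threshold for _ in range(n_rep))
    return hits / n_rep

threshold = null_quantile(abs_correlation, n=50)
for strength in (0.0, 0.2, 0.4, 0.8):
    power = monte_carlo_power(abs_correlation, threshold, n=50, strength=strength)
    print(f"strength={strength:.1f}  estimated power={power:.3f}")
```

Running the same loop for several statistics, each with its own calibrated threshold, gives the kind of rejection-rate comparison the rationale describes; at strength 0 the estimate is an empirical type-I error rather than a power.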

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that neural networks can reliably discriminate between distributions when trained on simulated samples, and that the resulting classifier yields a valid test when applied to real data. No free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption A deep neural network trained on simulated samples can learn a discriminating map between null and alternative distributions that generalizes to produce a powerful test statistic on real data.
    This is the central modeling assumption required for the method to work.


