
arxiv: 2401.16407 · v2 · submitted 2024-01-29 · 📊 stat.ML · cs.LG · eess.IV · eess.SP

Is K-fold cross validation the best model selection method for Machine Learning?

Pith reviewed 2026-05-24 04:42 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · eess.IV · eess.SP
keywords K-fold cross-validation · PAC-Bayesian bounds · model selection · machine learning · false positives · neuroimaging · concentration inequalities · actual risk

The pith

K-fold CUBV uses PAC-Bayesian bounds on linear classifiers to validate machine learning accuracy while reducing excess false positives on small or heterogeneous data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes K-fold CUBV, a statistical test that augments standard K-fold cross-validation with upper bounds on actual risk derived from concentration inequalities and PAC-Bayesian analysis. It targets the problems of partitioning small samples and learning from mixed data sources, which produce unreliable accuracy estimates and replication failures in machine learning. By bounding uncertain predictions in the worst case, the method supplies a frequentist-style check that works directly with classification measures like accuracy. Evaluation on simulated data and neuroimaging examples indicates the approach detects effects reliably without inflating false positives compared with classical CV or permutation tests.

Core claim

The paper derives Probably Approximately Correct-Bayesian upper bounds for linear classifiers combined with K-fold CV, then uses these to estimate actual risk via the worst-case bound on uncertain predictions; performance on simulated and neuroimaging datasets shows K-fold CUBV as a robust criterion for detecting effects and validating accuracy values from machine learning and classical CV schemes while avoiding excess false positives.

What carries the argument

K-fold CUBV, the combination of K-fold cross-validation with PAC-Bayesian upper bounds on actual risk that applies concentration inequalities to bound uncertain predictions by their worst-case value.
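
For orientation, PAC-Bayesian bounds of the kind referred to here usually take the generic shape below. This is one commonly quoted form (McAllester's bound in the version tightened by Maurer, usually stated for n ≥ 8), given as a sketch and not as the bound the paper derives for K-fold CUBV. The symbols are assumptions introduced for illustration: P is a prior over classifiers fixed before seeing the data, Q is any posterior, n is the number of samples, δ is the confidence parameter, R(Q) is the actual risk and \hat{R}_n(Q) is the empirical risk.

```latex
% Generic PAC-Bayesian bound (McAllester/Maurer form); illustrative only,
% not the specific K-fold CUBV bound derived in the paper.
\[
  \Pr\!\left[\,\forall Q:\;
    R(Q) \;\le\; \hat{R}_n(Q)
      + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}
  \,\right] \;\ge\; 1 - \delta
\]
```

The penalty term shrinks as n grows and grows as Q moves away from P, which is why the tightness of such bounds on small samples matters for the claims that follow.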

If this is right

  • K-fold CUBV supplies confidence intervals for accuracy values obtained directly from machine learning classifications.
  • The method reduces excess false positives when validating models on small-sample or heterogeneous sources.
  • It enables a frequentist-style analysis inside machine learning pipelines without requiring parametric assumptions on accuracy.
  • Classical CV schemes can be checked against the K-fold CUBV bound to confirm whether reported accuracy reflects genuine effects.
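
The check described in the last bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: it swaps the PAC-Bayesian bound for a one-sided Hoeffding margin, uses a nearest-centroid rule as a stand-in linear classifier, and treats the n samples as if they were independent test points, which K-fold CV does not guarantee. All function names are hypothetical.

```python
import math
import random


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Nearest-centroid rule for labels 0/1 (a simple linear classifier)."""
    cents = {}
    for label in (0, 1):
        pts = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(col) / len(pts) for col in zip(*pts)]
    return cents


def predict(cents, x):
    """Assign x to the class with the closest centroid."""
    dist = {lab: sum((a - b) ** 2 for a, b in zip(x, c))
            for lab, c in cents.items()}
    return min(dist, key=dist.get)


def cv_error(X, y, k=5, seed=0):
    """Mean held-out error over k folds."""
    errs = []
    for fold in kfold_indices(len(X), k, seed):
        held = set(fold)
        Xtr = [x for i, x in enumerate(X) if i not in held]
        ytr = [t for i, t in enumerate(y) if i not in held]
        cents = fit_centroids(Xtr, ytr)
        errs.append(sum(predict(cents, X[i]) != y[i] for i in fold) / len(fold))
    return sum(errs) / k


def hoeffding_margin(n, delta=0.05):
    """One-sided Hoeffding margin: a stand-in for the paper's PAC-Bayes term."""
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def upper_bounded_decision(cv_err, n, delta=0.05, chance=0.5):
    """Report an effect only if the worst-case error stays below chance."""
    worst = min(1.0, cv_err + hoeffding_margin(n, delta))
    return worst, worst < chance
```

With n = 40 and δ = 0.05 the margin is about 0.19, so under this sketch a CV error of 0.40 would not count as an effect even though the raw accuracy of 0.60 is above chance.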

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The bounding technique might extend beyond linear classifiers if similar concentration inequalities can be derived for other model families.
  • Integration into existing cross-validation routines could change how practitioners report statistical significance in applied machine learning.
  • Comparison with permutation tests on the same datasets could clarify whether the PAC-Bayesian bound adds information beyond resampling.
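
For the permutation-test comparison in the last bullet, a label-permutation test on a cross-validated score can be sketched as below. It is an illustration only: the scoring function (leave-one-out accuracy of a nearest-class-mean rule on a single feature) and both function names are hypothetical stand-ins, not the classifiers or CV schemes used in the paper. The +1 in the numerator and denominator follows the point made in reference [28] that permutation p-values should never be zero.

```python
import random


def loo_nearest_mean_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-class-mean rule on 1-D features."""
    correct = 0
    for i in range(len(X)):
        means = {}
        for label in set(y):
            vals = [x for j, (x, t) in enumerate(zip(X, y))
                    if j != i and t == label]
            means[label] = sum(vals) / len(vals)
        pred = min(means, key=lambda lab: abs(X[i] - means[lab]))
        correct += pred == y[i]
    return correct / len(X)


def permutation_p_value(score_fn, X, y, n_perm=200, seed=0):
    """Share of label shufflings that score at least as well as the real labels."""
    rng = random.Random(seed)
    observed = score_fn(X, y)
    hits = 0
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)          # break the link between features and labels
        if score_fn(X, y_perm) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # never returns exactly zero
```

Running the same data through this test and through a bound-based check would show directly whether the two criteria disagree, and on which sample sizes.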

Load-bearing premise

The PAC-Bayesian upper bounds for linear classifiers stay useful and not overly conservative when applied to real heterogeneous datasets.

What would settle it

The robustness claim would be falsified by a heterogeneous dataset on which the K-fold CUBV bounds are so conservative that the test misses known effects that standard K-fold CV detects without excess false positives.

Figures

Figures reproduced from arXiv: 2401.16407 by A Ortiz, F Segovia, J Ramirez, J. Suckling, Juan M Gorriz, R. Martin Clemente.

Figure 1: Left Column: Null distribution of accuracy values using K-fold CV (in green font) obtained from sampling …
Figure 2: Performance of K-fold CV in common experimental designs. Typical large biobanks include data across …
Figure 3: Examples of performance, FP rates and MC performance evaluation across independent (multi-sample) …
Figure 4: Examples of performance, FP rates and MC performance evaluation in single sample experiments.
Figure 5: The accuracy values (average and standard deviation) obtained in K-fold CV versus complexity ( …
Figure 6: The accuracy values (average and standard deviation) obtained in CUBV versus complexity ( …
Figure 7: Performance of nested CV, naive CV and the proposed K-fold CUBV test. We show the model-driven upper …
Figure 8: Performance of nested CV, naive CV and the proposed K-fold CUBV test. We show the model-driven upper …
Figure 9: Data complexity and VC dimension in n = 2. In two dimensions the number of non-intersecting convex hulls is, in general, $2^h$ for a set of points or distant clusters with cardinality less than the Radon number (n + 2) [58]. Assuming balanced sources we have in 2D only up to $\binom{h}{h/2,\,h/2} \sim N_c = 6$ separable simulations whilst the number of non-separable simulations grows with order $\sim \frac{2^{N_c+1}}{\sqrt{2\pi N_c}}$ …
Figure 10: We generate realistic datasets [37] including several modes by selecting a different number of clusters or …
Figure 11: Analysis of the ideal case. Top-up: statistical power of K-fold and CUBV (top-down) CV permutation …
Figure 12: Analysis of the non-ideal case (Nc = 4 and n = 2). Left column: samples and d values -top-; statistical power of CV permutation tests - middle up K-fold CV, middle-down CUBV; MC performance of K-fold CV -bottom up- and CUBV detection -bottom down- using a balanced dataset. Right column: the same measures using an imbalanced sample with r = 1/3 per cluster in each group.
Figure 13: Distribution of accuracy values (M = 100) vs. sample size and d for the non-ideal case. We show a n = 2 classification problem sampling from Nc = 4 Gaussian pdfs (2 per cluster) using an imbalanced dataset (r = 1/3) and d = {0, 1, 2, 4}. Note the biased regions within the green area.
Figure 14: Examples of classical permutation tests based on regular CV and CUBV decisions depending on sample …
Figure 15: The analysis depicted in figure 5 is replicated here using a single realization. Observe the theoretical …
Figure 16: The same analysis as in figure 6 using a single realisation
Figure 17: Examples, power and detection analysis in single-mode pdf using a single sample realization
Figure 18: Examples, power and detection analysis in multi-mode pdf using a single sample realization
Figure 19: Cohen’s distance obtained from binary classes (whole datasets) versus dimension (PLS features). Note that …
Figure 20: Examples of data analyzed in this section with several dimensions and problems. Each classification …
Figure 21: Accuracy values for selected CV methods as a function of number of dimensions. Note the CUBV tech…
Figure 22: Top: MC evaluation of K-fold CV in real datasets by averaging the results of Problems 1, 2 and 3. Note …
Figure 23: Normalized cumulative sum of (1 − β) values ($P_c := \frac{\sum_{j=1}^{(N,n)} (1-\beta_j)}{\#\text{experiments}}$) for N and n versus dimension/sample size, respectively. Top: null experiment; bottom: Problem 1
Figure 24: Normalized cumulative sum of (1 − β) values for N and n versus dimension/sample size, respectively. Top: Problem 2; bottom: Problem 3.
Original abstract

As a technique that can compactly represent complex patterns, machine learning has significant potential for predictive inference. K-fold cross-validation (CV) is the most common approach to ascertaining the likelihood that a machine learning outcome is generated by chance, and it frequently outperforms conventional hypothesis testing. This improvement uses measures directly obtained from machine learning classifications, such as accuracy, that do not have a parametric description. To approach a frequentist analysis within machine learning pipelines, a permutation test or simple statistics from data partitions (i.e., folds) can be added to estimate confidence intervals. Unfortunately, neither parametric nor non-parametric tests solve the inherent problems of partitioning small sample-size datasets and learning from heterogeneous data sources. The fact that machine learning strongly depends on the learning parameters and the distribution of data across folds recapitulates familiar difficulties around excess false positives and replication. A novel statistical test based on K-fold CV and the Upper Bound of the actual risk (K-fold CUBV) is proposed, where uncertain predictions of machine learning with CV are bounded by the worst case through the evaluation of concentration inequalities. Probably Approximately Correct-Bayesian upper bounds for linear classifiers in combination with K-fold CV are derived and used to estimate the actual risk. The performance with simulated and neuroimaging datasets suggests that K-fold CUBV is a robust criterion for detecting effects and validating accuracy values obtained from machine learning and classical CV schemes, while avoiding excess false positives.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript questions whether K-fold cross-validation is the best model selection method for machine learning and proposes K-fold CUBV, which combines K-fold CV with PAC-Bayesian upper bounds on the actual risk derived for linear classifiers. The central claim is that this approach yields a robust criterion for detecting effects and validating ML accuracies on small-sample and heterogeneous data (simulated and neuroimaging datasets) while controlling excess false positives better than standard CV or permutation tests.

Significance. If the derived bounds are shown to be sufficiently tight in practice and the method demonstrably improves false-positive control without loss of power, the work could strengthen validation practices in applied ML domains such as neuroimaging. The use of standard concentration inequalities to produce explicit upper bounds on risk is a methodological strength that aligns with PAC-Bayesian theory.

major comments (2)
  1. [Results (neuroimaging experiments)] The robustness claim rests on the PAC-Bayesian bounds remaining useful (not overly conservative) on heterogeneous neuroimaging data. The results section on the neuroimaging experiments does not report the numerical values of the derived upper bounds relative to the observed empirical accuracies or risks, so it is impossible to verify whether the bounds stay within a factor of 2–3 of the empirical performance or inflate substantially as is common for PAC-Bayes on non-stationary, high-dimensional data.
  2. [Method (bound derivation)] The derivation of the PAC-Bayesian upper bounds is stated to be for linear classifiers, yet the abstract and title frame the contribution for general machine learning pipelines. The manuscript does not clarify how (or whether) the bounds extend to non-linear models that are standard in the evaluated neuroimaging tasks, which is load-bearing for the claim that K-fold CUBV improves upon classical CV schemes.
minor comments (2)
  1. [Methods] Notation for the concentration inequalities and the precise definition of the K-fold CUBV statistic could be introduced with an explicit equation early in the methods section rather than relying on the abstract description.
  2. [Experiments (simulated data)] The simulated-data experiments would benefit from an explicit statement of the data-generating process parameters and the exact form of the linear classifier used, to allow direct reproduction of the reported false-positive rates.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

Thank you for the opportunity to respond to the referee's report. We address each major comment point-by-point below, indicating where we agree and will revise the manuscript.

  1. Referee: [Results (neuroimaging experiments)] The robustness claim rests on the PAC-Bayesian bounds remaining useful (not overly conservative) on heterogeneous neuroimaging data. The results section on the neuroimaging experiments does not report the numerical values of the derived upper bounds relative to the observed empirical accuracies or risks, so it is impossible to verify whether the bounds stay within a factor of 2–3 of the empirical performance or inflate substantially as is common for PAC-Bayes on non-stationary, high-dimensional data.

    Authors: We agree that the numerical values of the PAC-Bayesian upper bounds relative to empirical accuracies are needed to evaluate tightness on heterogeneous data. In the revised manuscript we will add a table (or supplementary table) reporting these values for each neuroimaging dataset, allowing direct assessment of whether the bounds remain within a reasonable factor of the observed risks. revision: yes

  2. Referee: [Method (bound derivation)] The derivation of the PAC-Bayesian upper bounds is stated to be for linear classifiers, yet the abstract and title frame the contribution for general machine learning pipelines. The manuscript does not clarify how (or whether) the bounds extend to non-linear models that are standard in the evaluated neuroimaging tasks, which is load-bearing for the claim that K-fold CUBV improves upon classical CV schemes.

    Authors: The derivation in the methods section is explicitly for linear classifiers using the corresponding concentration inequalities. The title poses a general question about model selection, but the concrete contribution and bounds are for linear models. We will revise the abstract to state this scope clearly and add a short discussion paragraph noting that extensions to non-linear models would require different inequalities and are left for future work. This removes any ambiguity without overclaiming generality. revision: yes
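
The table promised in response 1 could be produced with a helper along these lines. This is a hypothetical sketch: the function name, the row layout and the "vacuous" flag (a risk bound at or above 0.5, which carries no information for a balanced binary problem) are assumptions for illustration, and any numbers fed to it here would be toy values, not results from the paper.

```python
def tightness_report(rows, chance=0.5):
    """Summarise how far each upper bound sits above the empirical risk.

    rows: iterable of (dataset_name, empirical_risk, upper_bound),
          with both risks expressed as error rates in [0, 1].
    """
    report = []
    for name, emp, bound in rows:
        if not (0.0 <= emp <= 1.0 and 0.0 <= bound <= 1.0):
            raise ValueError(f"risks must lie in [0, 1] for {name!r}")
        report.append({
            "dataset": name,
            "gap": bound - emp,                        # absolute slack
            "ratio": bound / emp if emp > 0 else None,  # undefined at zero risk
            "vacuous": bound >= chance,                # no better than guessing
        })
    return report
```

A ratio column makes the referee's "factor of 2–3" criterion directly checkable, while the gap column stays meaningful when the empirical risk is zero.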

Circularity Check

0 steps flagged

No circularity: bounds derived from standard concentration inequalities; evaluation is empirical

full rationale

The paper states that PAC-Bayesian upper bounds for linear classifiers combined with K-fold CV are derived from concentration inequalities and then applied to estimate actual risk. This is a standard mathematical derivation step whose inputs are the classifier, the prior, and the empirical risk on the folds; the resulting bound is not defined in terms of the target accuracy or the final performance metric. The subsequent claim that K-fold CUBV is robust rests on reported performance on simulated and neuroimaging datasets, which constitutes external empirical validation rather than a reduction of the bound to its own inputs. No self-citation, fitted-parameter renaming, or self-definitional step is present in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based on abstract only: the approach rests on standard concentration inequalities and PAC-Bayesian analysis applied to the CV setting; no new entities are introduced and no free parameters are explicitly fitted beyond the choice of K and the bound parameters implicit in the inequalities.

axioms (1)
  • domain assumption: Concentration inequalities and PAC-Bayesian analysis can be applied to bound the actual risk of linear classifiers when combined with K-fold cross-validation partitions.
    Invoked in the abstract description of the derived bounds used to estimate actual risk.



Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 3 internal anchors

  1. [1]

    National Academies of Sciences, Engineering, and Medicine. (2019). Reproducibility and Replicability in Science. Washington, DC: The National Academies Press. https://doi.org/10.17226/25303

  2. [2]

    A. Eklund, et al. Cluster failure: Inflated false positives for fMRI. Proceedings of the National Academy of Sciences Jul 2016, 113 (28) 7900-7905

  3. [3]

    S. Noble, et al. Cluster failure or power failure? Evaluating sensitivity in cluster-level inference. NeuroImage, 209, 116468, 2020

  4. [4]

    K.J. Friston, et al. Statistical Parametric Maps in functional imaging: A general linear approach. Hum. Brain Mapp. 2:189-210 (1995)

  5. [5]

    K.J. Friston, et al. Classical and Bayesian inference in neuroimaging: theory. NeuroImage, 16 (2) (2002), pp. 465-483

  6. [6]

    J.D. Rosenblatt, et al. Revisiting multi-subject random effects in fMRI: Advocating prevalence estimation. NeuroImage 84 (2014): 113-121

  7. [7]

    MT Ribeiro, et al. Model-agnostic interpretability of machine learning. arXiv preprint arXiv:1606.05386. 2016

  8. [8]

    Y. LeCun et al. Deep learning. Nature 521, 436–444 (2015).

  9. [9]

    P Grohs, et al. Mathematical Aspects of Deep Learning. Cambridge University Press. ISBN 9781009025096. https://doi.org/10.1017/9781009025096

  10. [10]

    L. van der Maaten et al. Visualizing Data using t-SNE. Journal of Machine Learning Research 2008 vol 9, num 86, 2579–2605

  11. [11]

    J. Mourão-Miranda, et al. Classifying brain states and determining the discriminating activation patterns: Support vector machine on functional MRI data. NeuroImage, 28, 980-995. (2005)

  12. [12]

    Y. Zhang et al. Multivariate lesion-symptom mapping using support vector regression. Hum Brain Mapp. 2014 Dec;35(12):5861-76

  13. [13]

    JM Gorriz, et al. A connection between pattern classification by machine learning and statistical inference with the General Linear Model. IEEE Journal of Biomedical and Health Informatics 2021

  14. [14]

    JM Gorriz, et al. A hypothesis-driven method based on machine learning for neuroimaging data analysis. Neurocomputing Volume 510, 21 October 2022, Pages 159-171

  15. [15]

    Z Wang, et al. Support vector machine learning-based fMRI data group analysis. NeuroImage 36 (4), 1139-1151. 2007

  16. [16]

    Z Wang. A hybrid SVM–GLM approach for fMRI data analysis. Neuroimage 46 (3), 608-615. 2009

  17. [17]

    Jollans L, et al. Quantifying performance of machine learning methods for neuroimaging data. Neuroimage. 2019 Oct 1;199:351-365

  18. [18]

    M.J. McKeown et al. Independent component analysis of functional MRI: what is signal and what is noise? Curr Opin Neurobiol. 2003 Oct; 13(5): 620–629

  19. [19]

    Gorgen, K., et al. The same analysis approach: Practical protection against the pitfalls of novel neuroimaging analysis methods. NeuroImage, 180, 19-30. 2018

  20. [20]

    G. Varoquaux. Cross-validation failure: Small sample sizes lead to large error bars. NeuroImage 180 (2018) 68-77

  21. [21]

    G. Gallavotti. Ergodicity, ensembles, irreversibility in Boltzmann and beyond. Springer, March 1995, Journal of Statistical Physics 78(5):1571-1589

  22. [22]

    R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence (IJCAI), pp 1–7, 1995

  23. [23]

    Allen, D. (1974). The relationship between variable selection and data augmentation and a method of prediction. Technometrics, 16:125-7

  24. [24]

    Geisser, S. (1975). The predictive sample reuse method with applications. Journal of the American Statistical Association, 70(350):320-328

  25. [25]

    Bates, S., et al. (2023). Cross-Validation: What Does It Estimate and How Well Does It Do It? Journal of the American Statistical Association, 1–12

  26. [26]

    Rodriguez, J.D. (2020). Sensitivity Analysis of k-Fold Cross Validation in Prediction Error Estimation. IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 32, No. 3

  27. [27]

    J.M. Górriz, et al. A Machine Learning Approach to Reveal the NeuroPhenotypes of Autisms. International Journal of Neural Systems, 1850058. 2019

  28. [28]

    B. Phipson et al. Permutation P-values Should Never Be Zero: Calculating Exact P-values When Permutations Are Randomly Drawn. Statistical Applications in Genetics and Molecular Biology: Vol. 9: Iss. 1, Article 39. (2010)

  29. [29]

    Vapnik, V. N. (1998). Statistical Learning Theory. Wiley-Interscience

  30. [30]

    S. Boucheron et al. Concentration Inequalities: A Nonasymptotic Theory of Independence. ISBN: 9780199535255. Oxford University Press

  31. [31]

    R.S.J. Frackowiak, et al. Human Brain Function (Second Edition). Chap. 44. Introduction to Random Field Theory. ISBN 978-0-12-264841-0 Academic Press. 867-879, 2004

  32. [32]

    T.E. Nichols. Multiple testing corrections, nonparametric methods, and random field theory. NeuroImage 62 (2012) 811-815

  33. [33]

    Efron, B.; et al. (1993). An Introduction to the Bootstrap. Boca Raton, FL: Chapman & Hall/CRC. ISBN 0-412-04231-2

  34. [34]

    A. Sarica, et al. A machine learning neuroimaging challenge for automated diagnosis of Alzheimer’s disease. Editorial on special issue: Machine learning on MCI, vol 302, Journal of Neuroscience Methods. 2018.

  35. [35]

    C.C. Jack, Jr., et al. NIA-AA Research Framework: Toward a biological definition of Alzheimer’s disease. Alzheimers Dement. 2018 Apr; 14(4): 535–562

  36. [36]

    J.M. Gorriz, et al. Artificial intelligence within the interplay between natural and artificial computation: Advances in data science, trends and applications. Neurocomputing Volume 410, 14 October 2020, Pages 237-270

  37. [37]

    J.M. Górriz, et al. On the computation of distribution-free performance bounds: Application to small sample sizes in neuroimaging. Pattern Recognition 93, 1-13, 2019

  38. [38]

    J.M. Gorriz, et al. Statistical Agnostic Mapping: A framework in neuroimaging based on concentration inequalities. Information Fusion Volume 66, February 2021, Pages 198-212

  39. [39]

    C.J.C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2 (2) (1998), pp. 121-167

  40. [40]

    H. Chernoff. A measure of asymptotic efficiency of tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics, 23:493–507, 1952

  41. [41]

    C. McDiarmid. On the method of bounded differences. In Surveys in Combinatorics, pages 148–188. Cambridge University Press, 1989

  42. [42]

    V. Vapnik. Estimation of Dependences Based on Empirical Data. Springer-Verlag. 1982 ISBN 0-387-90733-5

  43. [43]

    D. Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation Volume 100, Issue 1, September 1992, Pages 78-150

  44. [44]

    Olivier Catoni. Pac-bayesian supervised classification: the thermodynamics of statistical learning. arXiv preprint arXiv:0712.0248, 2007

  45. [45]

    D. McAllester. A PAC-Bayesian Tutorial with A Dropout Bound. arXiv 10.48550/ARXIV.1307.2118, 2013

  46. [46]

    Donsker, Monroe D.; Varadhan, SR Srinivasa (1983). "Asymptotic evaluation of certain Markov process expectations for large time. IV". Communications on Pure and Applied Mathematics. 36 (2): 183–212. doi:10.1002/cpa.3160360204

  47. [47]

    K.J. Friston. Sample size and the fallacies of classical inference. NeuroImage 81 (2013) 503–504

  48. [48]

    E.T. Bullmore et al. Global, voxel, and cluster tests, by theory and permutation, for a difference between two groups of structural MR images of the brain. IEEE Trans Med Imaging (1999) Jan;18(1):32-42

  49. [49]

    P.T. Reiss, et al. Cross-validation and hypothesis testing in neuroimaging: an irenic comment on the exchange between Friston and Lindquist et al. Neuroimage. 2015 August 1; 116: 248-254

  50. [50]

    C. Jimenez-Mesa et al. A non-parametric statistical inference framework for Deep Learning in current neuroimaging. Information Fusion Volume 91, March 2023, Pages 598-611

  51. [51]

    S.M. Kay. Fundamentals of Statistical Signal Processing: Detection theory. Prentice-Hall PTR, 1998. ISBN 013504135X, 9780135041352

  52. [52]

    Zhang YD, et al. Advances in multimodal data fusion in neuroimaging: Overview, challenges, and novel orientation. Inf Fusion. 2020 Dec;64:149-187

  53. [53]

    J.N. Acosta et al. Multimodal biomedical AI. Nat Med 28, 1773–1784 (2022)

  54. [54]

    C.S. Hyatt et al. The quandary of covarying: A brief review and empirical examination of covariate use in structural neuroimaging studies on psychological variables. Neuroimage 205, 116225

  55. [55]

    M. Leming, et al. Ensemble Deep Learning on Large, Mixed-Site fMRI Datasets in Autism and Other Tasks. International Journal of Neural Systems. Vol. 30, No. 07, 2050012. 2020

  56. [56]

    J.D. Rosenblatt, et al. Better-than-chance classification for signal detection. Biostatistics (2016)

  57. [57]

    Cover, Thomas M.. “Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition.” IEEE Trans. Electron. Comput. 14 (1965): 326-334

  58. [58]

    H. Tverberg, A Generalization of Radon’s Theorem, Journal of the London Mathematical Society, Volume s1-41, Issue 1, 1966, Pages 123-128.