Is K-fold cross validation the best model selection method for Machine Learning?

A Ortiz; F Segovia; J Ramirez; J. Suckling; Juan M Gorriz; R. Martin Clemente

arXiv: 2401.16407 · v2 · submitted 2024-01-29 · 📊 stat.ML · cs.LG · eess.IV · eess.SP

Pith reviewed 2026-05-24 04:42 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · eess.IV · eess.SP

keywords K-fold cross-validation · PAC-Bayesian bounds · model selection · machine learning · false positives · neuroimaging · concentration inequalities · actual risk

The pith

K-fold CUBV uses PAC-Bayesian bounds on linear classifiers to validate machine learning accuracy while reducing excess false positives on small or heterogeneous data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes K-fold CUBV, a statistical test that augments standard K-fold cross-validation with upper bounds on actual risk derived from concentration inequalities and PAC-Bayesian analysis. It targets the problems of partitioning small samples and learning from mixed data sources, which produce unreliable accuracy estimates and replication failures in machine learning. By bounding uncertain predictions in the worst case, the method supplies a frequentist-style check that works directly with classification measures like accuracy. Evaluation on simulated data and neuroimaging examples indicates the approach detects effects reliably without inflating false positives compared with classical CV or permutation tests.

Core claim

The paper derives Probably Approximately Correct-Bayesian upper bounds for linear classifiers combined with K-fold CV, then uses these to estimate actual risk via the worst-case bound on uncertain predictions; performance on simulated and neuroimaging datasets shows K-fold CUBV as a robust criterion for detecting effects and validating accuracy values from machine learning and classical CV schemes while avoiding excess false positives.

What carries the argument

K-fold CUBV, the combination of K-fold cross-validation with PAC-Bayesian upper bounds on actual risk that applies concentration inequalities to bound uncertain predictions by their worst-case value.
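
The idea of pairing a cross-validated error with a worst-case margin can be sketched in Python as follows. This is only an illustration under loud assumptions: the margin is a plain one-sided Hoeffding term rather than the PAC-Bayesian bound derived in the paper, the pooled held-out errors are treated as if they were i.i.d. (a simplification), the nearest-centroid rule is a stand-in linear classifier, and every function name here (`make_data`, `kfold_error`, `hoeffding_margin`, `upper_bounded_decision`) is hypothetical.

```python
import math
import random

def make_data(n_per_class, d, rng):
    """Two 2-D Gaussian classes whose means are d apart along the first axis."""
    X, y = [], []
    for label, shift in ((0, -d / 2.0), (1, d / 2.0)):
        for _ in range(n_per_class):
            X.append((rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0)))
            y.append(label)
    return X, y

def fit_centroids(X, y):
    """Nearest-centroid rule: a simple linear classifier (stand-in, not the paper's).
    Assumes both classes are present in the training data."""
    cents = {}
    for label in (0, 1):
        pts = [x for x, t in zip(X, y) if t == label]
        cents[label] = tuple(sum(c) / len(pts) for c in zip(*pts))
    return cents

def predict(cents, x):
    """Assign x to the class with the closest centroid."""
    dist = {k: sum((a - b) ** 2 for a, b in zip(x, c)) for k, c in cents.items()}
    return min(dist, key=dist.get)

def kfold_error(X, y, k, rng):
    """Pooled held-out error rate over k folds."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    errors = 0
    for f in range(k):
        fold = idx[f::k]
        held = set(fold)
        Xtr = [X[i] for i in idx if i not in held]
        ytr = [y[i] for i in idx if i not in held]
        cents = fit_centroids(Xtr, ytr)
        errors += sum(predict(cents, X[i]) != y[i] for i in fold)
    return errors / len(X)

def hoeffding_margin(n, delta):
    """One-sided Hoeffding deviation for n bounded losses at confidence 1 - delta."""
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n))

def upper_bounded_decision(X, y, k=5, delta=0.05, seed=0):
    """Declare an effect only if the worst-case error stays below chance (0.5)."""
    rng = random.Random(seed)
    err = kfold_error(X, y, k, rng)
    worst = min(1.0, err + hoeffding_margin(len(X), delta))
    return {"cv_error": err, "worst_case_error": worst, "effect": worst < 0.5}
```

Calling `upper_bounded_decision(*make_data(50, 4.0, random.Random(1)))` returns the cross-validated error, its inflated worst-case value, and the resulting decision. The design choice to compare the upper bound, not the raw estimate, against chance is what makes the check conservative on small samples.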

If this is right

K-fold CUBV supplies confidence intervals for accuracy values obtained directly from machine learning classifications.
The method reduces excess false positives when validating models on small-sample or heterogeneous sources.
It enables a frequentist-style analysis inside machine learning pipelines without requiring parametric assumptions on accuracy.
Classical CV schemes can be checked against the K-fold CUBV bound to confirm whether reported accuracy reflects genuine effects.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The bounding technique might extend beyond linear classifiers if similar concentration inequalities can be derived for other model families.
Integration into existing cross-validation routines could change how practitioners report statistical significance in applied machine learning.
Comparison with permutation tests on the same datasets could clarify whether the PAC-Bayesian bound adds information beyond resampling.
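
For readers who want to run the permutation-test baseline mentioned above on their own data, a minimal Python sketch follows. It is an assumption-laden illustration, not the paper's procedure: the classifier is a hypothetical one-dimensional midpoint-of-means rule, the function names are invented for this example, and the p-value uses the (b + 1)/(m + 1) correction associated with reference [28].

```python
import random

def cv_accuracy(x, y, k, rng):
    """K-fold accuracy of a 1-D midpoint-of-means rule (illustrative linear classifier)."""
    idx = list(range(len(x)))
    rng.shuffle(idx)
    correct = 0
    for f in range(k):
        test = idx[f::k]
        held = set(test)
        train = [i for i in idx if i not in held]
        c0 = [x[i] for i in train if y[i] == 0]
        c1 = [x[i] for i in train if y[i] == 1]
        if not c0 or not c1:
            continue  # degenerate fold: its test points count as errors
        mu0, mu1 = sum(c0) / len(c0), sum(c1) / len(c1)
        thr = (mu0 + mu1) / 2.0
        for i in test:
            # Predict class 1 on the side of the threshold where its mean lies.
            pred = int((x[i] > thr) == (mu1 > mu0))
            correct += pred == y[i]
    return correct / len(x)

def permutation_p_value(x, y, k=5, n_perm=200, seed=0):
    """Label-permutation test on the cross-validated accuracy."""
    rng = random.Random(seed)
    observed = cv_accuracy(x, y, k, rng)
    exceed = 0
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)  # break any link between features and labels
        if cv_accuracy(x, y_perm, k, rng) >= observed:
            exceed += 1
    # The +1 terms keep the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

Running both this test and an upper-bound check on the same sample is one way to see whether the two criteria disagree, which is the comparison the extension above proposes.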

Load-bearing premise

The PAC-Bayesian upper bounds for linear classifiers stay useful and not overly conservative when applied to real heterogeneous datasets.

What would settle it

The robustness claim would be falsified by a heterogeneous dataset on which K-fold CUBV produces bounds so conservative that it misses known effects that standard K-fold CV detects without excess false positives.

Figures

Figures reproduced from arXiv: 2401.16407 by A Ortiz, F Segovia, J Ramirez, J. Suckling, Juan M Gorriz, R. Martin Clemente.

**Figure 2.** Performance of K-fold CV in common experimental designs. Typical large biobanks include data across …

**Figure 3.** Examples of performance, FP rates and MC performance evaluation across independent (multi-sample) …

**Figure 4.** Examples of performance, FP rates and MC performance evaluation in single sample experiments.

**Figure 5.** The accuracy values (average and standard deviation) obtained in K-fold CV versus complexity ( …

**Figure 6.** The accuracy values (average and standard deviation) obtained in CUBV versus complexity ( …

**Figure 7.** Performance of nested CV, naive CV and the proposed K-fold CUBV test. We show the model-driven upper …

**Figure 8.** Performance of nested CV, naive CV and the proposed K-fold CUBV test. We show the model-driven upper …

**Figure 9.** Data complexity and VC dimension in n = 2. In two dimensions the number of non-intersecting convex hulls is, in general, $2^h$ for a set of points or distant clusters with cardinality less than the Radon number (n + 2) [58]. Assuming balanced sources we have in 2D only up to $\binom{h}{h/2,\,h/2} \sim N_c = 6$ separable simulations whilst the number of non-separable simulations grows with order $\sim \frac{2^{N_c+1}}{\sqrt{2\pi N_c}}$

**Figure 10.** We generate realistic datasets [37] including several modes by selecting a different number of clusters or …

**Figure 11.** Analysis of the ideal case. Top-up: statistical power of K-fold and CUBV (top-down) CV permutation …

**Figure 12.** Analysis of the non-ideal case (Nc = 4 and n = 2). Left column: samples and d values -top-; statistical power of CV permutation tests - middle up K-fold CV, middle-down CUBV; MC performance of K-fold CV -bottom up- and CUBV detection -bottom down- using a balanced dataset. Right column: the same measures using an imbalanced sample with r = 1/3 per cluster in each group.

**Figure 13.** Distribution of accuracy values (M = 100) vs. sample size and d for the non-ideal case. We show a n = 2 classification problem sampling from Nc = 4 Gaussian pdfs (2 per cluster) using an imbalanced dataset (r = 1/3) and d = {0, 1, 2, 4}. Note the biased regions within the green area.

**Figure 14.** Examples of classical permutation tests based on regular CV and CUBV decisions depending on sample …

**Figure 15.** The analysis depicted in figure 5 is replicated here using a single realization. Observe the theoretical …

**Figure 16.** The same analysis as in figure 6 using a single realisation

**Figure 17.** Examples, power and detection analysis in single-mode pdf using a single sample realization

**Figure 18.** Examples, power and detection analysis in multi-mode pdf using a single sample realization

**Figure 19.** Cohen’s distance obtained from binary classes (whole datasets) versus dimension (PLS features). Note that …

**Figure 20.** Examples of data analyzed in this section with several dimensions and problems. Each classification …

**Figure 21.** Accuracy values for selected CV methods as a function of number of dimensions. Note the CUBV tech…

**Figure 22.** Top: MC evaluation of K-fold CV in real datasets by averaging the results of Problems 1, 2 and 3. Note …

**Figure 23.** Normalized cumulative sum of (1 − β) values ($P_c := \frac{\sum_{j=1}^{(N,n)} (1-\beta_j)}{\#\text{experiments}}$) for N and n versus dimension/sample size, respectively. Top: null experiment; bottom: Problem 1

**Figure 24.** Normalized cumulative sum of (1 − β) values for N and n versus dimension/sample size, respectively. Top: Problem 2; bottom: Problem 3.

Abstract

As a technique that can compactly represent complex patterns, machine learning has significant potential for predictive inference. K-fold cross-validation (CV) is the most common approach to ascertaining the likelihood that a machine learning outcome is generated by chance, and it frequently outperforms conventional hypothesis testing. This improvement uses measures directly obtained from machine learning classifications, such as accuracy, that do not have a parametric description. To approach a frequentist analysis within machine learning pipelines, a permutation test or simple statistics from data partitions (i.e., folds) can be added to estimate confidence intervals. Unfortunately, neither parametric nor non-parametric tests solve the inherent problems of partitioning small sample-size datasets and learning from heterogeneous data sources. The fact that machine learning strongly depends on the learning parameters and the distribution of data across folds recapitulates familiar difficulties around excess false positives and replication. A novel statistical test based on K-fold CV and the Upper Bound of the actual risk (K-fold CUBV) is proposed, where uncertain predictions of machine learning with CV are bounded by the worst case through the evaluation of concentration inequalities. Probably Approximately Correct-Bayesian upper bounds for linear classifiers in combination with K-fold CV are derived and used to estimate the actual risk. The performance with simulated and neuroimaging datasets suggests that K-fold CUBV is a robust criterion for detecting effects and validating accuracy values obtained from machine learning and classical CV schemes, while avoiding excess false positives.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript questions whether K-fold cross-validation is the best model selection method for machine learning and proposes K-fold CUBV, which combines K-fold CV with PAC-Bayesian upper bounds on the actual risk derived for linear classifiers. The central claim is that this approach yields a robust criterion for detecting effects and validating ML accuracies on small-sample and heterogeneous data (simulated and neuroimaging datasets) while controlling excess false positives better than standard CV or permutation tests.

Significance. If the derived bounds are shown to be sufficiently tight in practice and the method demonstrably improves false-positive control without loss of power, the work could strengthen validation practices in applied ML domains such as neuroimaging. The use of standard concentration inequalities to produce explicit upper bounds on risk is a methodological strength that aligns with PAC-Bayesian theory.

major comments (2)

[Results (neuroimaging experiments)] The robustness claim rests on the PAC-Bayesian bounds remaining useful (not overly conservative) on heterogeneous neuroimaging data. The results section on the neuroimaging experiments does not report the numerical values of the derived upper bounds relative to the observed empirical accuracies or risks, so it is impossible to verify whether the bounds stay within a factor of 2–3 of the empirical performance or inflate substantially as is common for PAC-Bayes on non-stationary, high-dimensional data.
[Method (bound derivation)] The derivation of the PAC-Bayesian upper bounds is stated to be for linear classifiers, yet the abstract and title frame the contribution for general machine learning pipelines. The manuscript does not clarify how (or whether) the bounds extend to non-linear models that are standard in the evaluated neuroimaging tasks, which is load-bearing for the claim that K-fold CUBV improves upon classical CV schemes.

minor comments (2)

[Methods] Notation for the concentration inequalities and the precise definition of the K-fold CUBV statistic could be introduced with an explicit equation early in the methods section rather than relying on the abstract description.
[Experiments (simulated data)] The simulated-data experiments would benefit from an explicit statement of the data-generating process parameters and the exact form of the linear classifier used, to allow direct reproduction of the reported false-positive rates.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the opportunity to respond to the referee's report. We address each major comment point-by-point below, indicating where we agree and will revise the manuscript.

Referee: [Results (neuroimaging experiments)] The robustness claim rests on the PAC-Bayesian bounds remaining useful (not overly conservative) on heterogeneous neuroimaging data. The results section on the neuroimaging experiments does not report the numerical values of the derived upper bounds relative to the observed empirical accuracies or risks, so it is impossible to verify whether the bounds stay within a factor of 2–3 of the empirical performance or inflate substantially as is common for PAC-Bayes on non-stationary, high-dimensional data.

Authors: We agree that the numerical values of the PAC-Bayesian upper bounds relative to empirical accuracies are needed to evaluate tightness on heterogeneous data. In the revised manuscript we will add a table (or supplementary table) reporting these values for each neuroimaging dataset, allowing direct assessment of whether the bounds remain within a reasonable factor of the observed risks. revision: yes
Referee: [Method (bound derivation)] The derivation of the PAC-Bayesian upper bounds is stated to be for linear classifiers, yet the abstract and title frame the contribution for general machine learning pipelines. The manuscript does not clarify how (or whether) the bounds extend to non-linear models that are standard in the evaluated neuroimaging tasks, which is load-bearing for the claim that K-fold CUBV improves upon classical CV schemes.

Authors: The derivation in the methods section is explicitly for linear classifiers using the corresponding concentration inequalities. The title poses a general question about model selection, but the concrete contribution and bounds are for linear models. We will revise the abstract to state this scope clearly and add a short discussion paragraph noting that extensions to non-linear models would require different inequalities and are left for future work. This removes any ambiguity without overclaiming generality. revision: yes

Circularity Check

0 steps flagged

No circularity: bounds derived from standard concentration inequalities; evaluation is empirical

The paper states that PAC-Bayesian upper bounds for linear classifiers combined with K-fold CV are derived from concentration inequalities and then applied to estimate actual risk. This is a standard mathematical derivation step whose inputs are the classifier, the prior, and the empirical risk on the folds; the resulting bound is not defined in terms of the target accuracy or the final performance metric. The subsequent claim that K-fold CUBV is robust rests on reported performance on simulated and neuroimaging datasets, which constitutes external empirical validation rather than a reduction of the bound to its own inputs. No self-citation, fitted-parameter renaming, or self-definitional step is present in the derivation chain.
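
To make the inputs named above concrete, one widely cited generic PAC-Bayesian bound (the McAllester form with Maurer's constant) is shown below. It is offered as orientation only and is not claimed to be the bound derived in the paper. The symbols are assumptions of this sketch: $P$ is a prior over classifiers fixed before seeing the sample $S$ of size $n \ge 8$, $Q$ is any posterior, $R(Q)$ and $\hat{R}_S(Q)$ are the actual and empirical risks of the randomized classifier for a loss in $[0,1]$, and $\delta \in (0,1)$ is the confidence parameter.

```latex
\[
\Pr_{S \sim \mathcal{D}^n}\!\left[\ \forall Q:\quad
R(Q) \;\le\; \hat{R}_S(Q)
+ \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}
\ \right] \;\ge\; 1 - \delta
\]
```

The right-hand side depends only on the empirical risk, the prior, the posterior, and the sample size, which is consistent with the observation that the bound is not defined in terms of the accuracy it is later used to validate.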

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based on abstract only: the approach rests on standard concentration inequalities and PAC-Bayesian analysis applied to the CV setting; no new entities are introduced and no free parameters are explicitly fitted beyond the choice of K and the bound parameters implicit in the inequalities.

axioms (1)

domain assumption: Concentration inequalities and PAC-Bayesian analysis can be applied to bound the actual risk of linear classifiers when combined with K-fold cross-validation partitions.
Invoked in the abstract's description of the derived bounds used to estimate actual risk.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 3 internal anchors

[1] National Academies of Sciences, Engineering, and Medicine. (2019). Reproducibility and Replicability in Science. Washington, DC: The National Academies Press. https://doi.org/10.17226/25303

[2] A. Eklund, et al. Cluster failure: Inflated false positives for fMRI. Proceedings of the National Academy of Sciences Jul 2016, 113 (28) 7900-7905

[3] S. Noble, et al. Cluster failure or power failure? Evaluating sensitivity in cluster-level inference. NeuroImage, 209, 116468, 2020

[4] K.J. Friston, et al. Statistical Parametric Maps in functional imaging: A general linear approach. Hum. Brain Mapp. 2:189-210 (1995)

[5] K.J. Friston, et al. Classical and Bayesian inference in neuroimaging: theory. NeuroImage, 16 (2) (2002), pp. 465-483

[6] J.D. Rosenblatt, et al. Revisiting multi-subject random effects in fMRI: Advocating prevalence estimation. NeuroImage 84 (2014): 113-121

[7] M.T. Ribeiro, et al. Model-agnostic interpretability of machine learning. arXiv preprint arXiv:1606.05386. 2016

[8] Y. LeCun et al. Deep learning. Nature 521, 436–444 (2015)

[9] P. Grohs, et al. Mathematical Aspects of Deep Learning. Cambridge University Press. ISBN 9781009025096. https://doi.org/10.1017/9781009025096

[10] L. van der Maaten et al. Visualizing Data using t-SNE. Journal of Machine Learning Research 2008 vol 9, num 86, 2579–2605

[11] J. Mourão-Miranda, et al. Classifying brain states and determining the discriminating activation patterns: Support vector machine on functional MRI data. NeuroImage, 28, 980-995. (2005)

[12] Y. Zhang et al. Multivariate lesion-symptom mapping using support vector regression. Hum Brain Mapp. 2014 Dec;35(12):5861-76

[13] J.M. Gorriz, et al. A connection between pattern classification by machine learning and statistical inference with the General Linear Model. IEEE Journal of Biomedical and Health Informatics 2021

[14] J.M. Gorriz, et al. A hypothesis-driven method based on machine learning for neuroimaging data analysis. Neurocomputing Volume 510, 21 October 2022, Pages 159-171

[15] Z. Wang, et al. Support vector machine learning-based fMRI data group analysis. NeuroImage 36 (4), 1139-1151. 2007

[16] Z. Wang. A hybrid SVM–GLM approach for fMRI data analysis. Neuroimage 46 (3), 608-615. 2009

[17] L. Jollans, et al. Quantifying performance of machine learning methods for neuroimaging data. Neuroimage. 2019 Oct 1;199:351-365

[18] M.J. McKeown et al. Independent component analysis of functional MRI: what is signal and what is noise? Curr Opin Neurobiol. 2003 Oct; 13(5): 620–629

[19] K. Gorgen, et al. The same analysis approach: Practical protection against the pitfalls of novel neuroimaging analysis methods. NeuroImage, 180, 19-30. 2018

[20] G. Varoquaux. Cross-validation failure: Small sample sizes lead to large error bars. NeuroImage 180 (2018) 68-77

[21] G. Gallavotti. Ergodicity, ensembles, irreversibility in Boltzmann and beyond. Journal of Statistical Physics 78(5):1571-1589, Springer, March 1995

[22] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence (IJCAI), pp 1–7, 1995

[23] D. Allen. The relationship between variable selection and data augmentation and a method of prediction. Technometrics, 16:125-7, 1974

[24] S. Geisser. The predictive sample reuse method with applications. Journal of the American Statistical Association, 70(350):320-328, 1975

[25] S. Bates, et al. Cross-Validation: What Does It Estimate and How Well Does It Do It? Journal of the American Statistical Association, 1–12, 2023

[26] J.D. Rodriguez. Sensitivity Analysis of k-Fold Cross Validation in Prediction Error Estimation. IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 32, No. 3, 2020

[27] J.M. Górriz, et al. A Machine Learning Approach to Reveal the NeuroPhenotypes of Autisms. International journal of neural systems, 1850058. 2019

[28] B. Phipson et al. Permutation P-values Should Never Be Zero: Calculating Exact P-values When Permutations Are Randomly Drawn. Statistical Applications in Genetics and Molecular Biology: Vol. 9: Iss. 1, Article 39. (2010)

[29] V.N. Vapnik. Statistical Learning Theory. Wiley-Interscience, 1998

[30] S. Boucheron et al. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press. ISBN: 9780199535255

[31] R.S.J. Frackowiak, et al. Human Brain Function (Second Edition). Chap. 44. Introduction to Random Field Theory. ISBN 978-0-12-264841-0 Academic Press. 867-879, 2004

[32] T.E. Nichols. Multiple testing corrections, nonparametric methods, and random field theory. NeuroImage 62 (2012) 811-815

[33] B. Efron, et al. An Introduction to the Bootstrap. Boca Raton, FL: Chapman & Hall/CRC, 1993. ISBN 0-412-04231-2

[34] A. Sarica, et al. A machine learning neuroimaging challenge for automated diagnosis of Alzheimer’s disease. Editorial on special issue: Machine learning on MCI, vol 302, Journal of Neuroscience Methods. 2018

[35] C.C. Jack Jr., et al. NIA-AA Research Framework: Toward a biological definition of Alzheimer’s disease. Alzheimers Dement. 2018 Apr; 14(4): 535–562

[36] J.M. Gorriz, et al. Artificial intelligence within the interplay between natural and artificial computation: Advances in data science, trends and applications. Neurocomputing Volume 410, 14 October 2020, 237-270

[37] J.M. Górriz, et al. On the computation of distribution-free performance bounds: Application to small sample sizes in neuroimaging. Pattern Recognition 93, 1-13, 2019

[38] J.M. Gorriz, et al. Statistical Agnostic Mapping: A framework in neuroimaging based on concentration inequalities. Information Fusion Volume 66, February 2021, Pages 198-212

[39] C.J.C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2 (2) (1998), pp. 121-167

[40] H. Chernoff. A measure of asymptotic efficiency of tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics, 23:493–507, 1952

[41] C. McDiarmid. On the method of bounded differences. In Surveys in Combinatorics, pages 148–188. Cambridge University Press, 1989

[42] V. Vapnik. Estimation dependencies based on Empirical Data. Springer-Verlag. 1982. ISBN 0-387-90733-5

[43] D. Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation Volume 100, Issue 1, September 1992, Pages 78-150

[44] O. Catoni. PAC-Bayesian supervised classification: the thermodynamics of statistical learning. arXiv preprint arXiv:0712.0248, 2007

[45] D. McAllester. A PAC-Bayesian Tutorial with A Dropout Bound. arXiv, doi:10.48550/arXiv.1307.2118, 2013

[46] M.D. Donsker, S.R.S. Varadhan. Asymptotic evaluation of certain Markov process expectations for large time. IV. Communications on Pure and Applied Mathematics. 36 (2): 183–212, 1983. doi:10.1002/cpa.3160360204

[47] K.J. Friston. Sample size and the fallacies of classical inference. NeuroImage 81 (2013) 503–504

[48] E.T. Bullmore et al. Global, voxel, and cluster tests, by theory and permutation, for a difference between two groups of structural MR images of the brain. IEEE Trans Med Imaging (1999) Jan;18(1):32-42

[49] P.T. Reiss, et al. Cross-validation and hypothesis testing in neuroimaging: an irenic comment on the exchange between Friston and Lindquist et al. Neuroimage. 2015 August 1; 116: 248-254

[50] C. Jimenez-Mesa et al. A non-parametric statistical inference framework for Deep Learning in current neuroimaging. Information Fusion Volume 91, March 2023, Pages 598-611

[51] S.M. Kay. Fundamentals of Statistical Signal Processing: Detection theory. Prentice-Hall PTR, 1998. ISBN 013504135X, 9780135041352

[52] Y.D. Zhang, et al. Advances in multimodal data fusion in neuroimaging: Overview, challenges, and novel orientation. Inf Fusion. 2020 Dec;64:149-187

[53] J.N. Acosta et al. Multimodal biomedical AI. Nat Med 28, 1773–1784 (2022)

[54] C.S. Hyatt et al. The quandary of covarying: A brief review and empirical examination of covariate use in structural neuroimaging studies on psychological variables. Neuroimage 205, 116225

[55] M. Leming, et al. Ensemble Deep Learning on Large, Mixed-Site fMRI Datasets in Autism and Other Tasks. International Journal of Neural Systems. Vol. 30, No. 07, 2050012. 2020

[56] J.D. Rosenblatt, et al. Better-than-chance classification for signal detection. Biostatistics (2016)

[57] T.M. Cover. Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition. IEEE Trans. Electron. Comput. 14 (1965): 326-334

[58] H. Tverberg. A Generalization of Radon’s Theorem. Journal of the London Mathematical Society, Volume s1-41, Issue 1, 1966, Pages 123-128

[1] [1]

National Academies of Sciences, Engineering, and Medicine. (2019). Reproducibility and Replicability in Sci- ence. Washington, DC: The National Academies Press. https://doi.org/10.17226/25303

work page doi:10.17226/25303 2019

[2] [2]

Cluster failure: Inflated false positives for fMRI

A.Eklund, et al. Cluster failure: Inflated false positives for fMRI. Proceedings of the National Academy of Sci- ences Jul 2016, 113 (28) 7900-7905

work page 2016

[3] [3]

Noble, et al

S. Noble, et al. Cluster failure or power failure? Evaluating sensitivity in cluster-level inference. NeuroImage, 209, 116468,2020

work page 2020

[4] [4]

Statistical Parametric Maps in functional imaging: A general linear approach Hum

K.J.Friston, et al. Statistical Parametric Maps in functional imaging: A general linear approach Hum. Brain Mapp. 2:189-210 (1995)

work page 1995

[5] [5]

Classical and Bayesian inference in neuroimaging: theory NeuroImage, 16 (2) (2002), pp

K.J.Friston, et al. Classical and Bayesian inference in neuroimaging: theory NeuroImage, 16 (2) (2002), pp. 465- 483

work page 2002

[6] [6]

Rosenblatt, et al

J.D. Rosenblatt, et al. Revisiting multi-subject random effects in fMRI: Advocating prevalence estimation. Neu- roImage 84 (2014): 113-121

work page 2014

[7] M.T. Ribeiro, et al. Model-agnostic interpretability of machine learning. arXiv preprint arXiv:1606.05386. 2016

[8] Y. LeCun et al. Deep learning. Nature 521, 436–444 (2015).

Footnote 5: Please see the column in Nature about this issue: https://www.nature.com/articles/d41586-019-02960-3

[9] P. Grohs, et al. Mathematical Aspects of Deep Learning. Cambridge University Press. ISBN 9781009025096. https://doi.org/10.1017/9781009025096

[10] L. van der Maaten et al. Visualizing Data using t-SNE. Journal of Machine Learning Research, 2008, vol. 9, num. 86, 2579–2605

[11] J. Mourão-Miranda, et al. Classifying brain states and determining the discriminating activation patterns: Support vector machine on functional MRI data. NeuroImage, 28, 980-995 (2005)

[12] Y. Zhang et al. Multivariate lesion-symptom mapping using support vector regression. Hum Brain Mapp. 2014 Dec;35(12):5861-76

[13] J.M. Gorriz, et al. A connection between pattern classification by machine learning and statistical inference with the General Linear Model. IEEE Journal of Biomedical and Health Informatics, 2021

[14] J.M. Gorriz, et al. A hypothesis-driven method based on machine learning for neuroimaging data analysis. Neurocomputing, Volume 510, 21 October 2022, Pages 159-171

[15] Z. Wang, et al. Support vector machine learning-based fMRI data group analysis. NeuroImage 36 (4), 1139-1151. 2007

[16] Z. Wang. A hybrid SVM–GLM approach for fMRI data analysis. NeuroImage 46 (3), 608-615. 2009

[17] L. Jollans, et al. Quantifying performance of machine learning methods for neuroimaging data. NeuroImage. 2019 Oct 1;199:351-365

[18] M.J. McKeown et al. Independent component analysis of functional MRI: what is signal and what is noise? Curr Opin Neurobiol. 2003 Oct; 13(5): 620–629

[19] K. Gorgen, et al. The same analysis approach: Practical protection against the pitfalls of novel neuroimaging analysis methods. NeuroImage, 180, 19-30. 2018

[20] G. Varoquaux. Cross-validation failure: Small sample sizes lead to large error bars. NeuroImage 180 (2018) 68-77

[21] G. Gallavotti. Ergodicity, ensembles, irreversibility in Boltzmann and beyond. Journal of Statistical Physics 78(5):1571-1589, Springer, March 1995

[22] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence (IJCAI), pp 1–7, 1995

[23] D. Allen (1974). The relationship between variable selection and data augmentation and a method of prediction. Technometrics, 16:125-127

[24] S. Geisser (1975). The predictive sample reuse method with applications. Journal of the American Statistical Association, 70(350):320-328

[25] S. Bates, et al. (2023). Cross-Validation: What Does It Estimate and How Well Does It Do It? Journal of the American Statistical Association, 1–12

[26] J.D. Rodriguez (2010). Sensitivity Analysis of k-Fold Cross Validation in Prediction Error Estimation. IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 32, No. 3

[27] J.M. Górriz, et al. A Machine Learning Approach to Reveal the NeuroPhenotypes of Autisms. International Journal of Neural Systems, 1850058. 2019

[28] B. Phipson et al. Permutation P-values Should Never Be Zero: Calculating Exact P-values When Permutations Are Randomly Drawn. Statistical Applications in Genetics and Molecular Biology, Vol. 9, Iss. 1, Article 39 (2010)

[29] V.N. Vapnik (1998). Statistical Learning Theory. Wiley-Interscience

[30] S. Boucheron et al. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press. ISBN 9780199535255

[31] R.S.J. Frackowiak, et al. Human Brain Function (Second Edition). Chap. 44. Introduction to Random Field Theory. Academic Press, 867-879, 2004. ISBN 978-0-12-264841-0

[32] T.E. Nichols. Multiple testing corrections, nonparametric methods, and random field theory. NeuroImage 62 (2012) 811-815

[33] B. Efron, et al. (1993). An Introduction to the Bootstrap. Boca Raton, FL: Chapman & Hall/CRC. ISBN 0-412-04231-2

[34] A. Sarica, et al. A machine learning neuroimaging challenge for automated diagnosis of Alzheimer’s disease. Editorial on special issue: Machine learning on MCI, vol 302, Journal of Neuroscience Methods. 2018.

[35] C.R. Jack Jr., et al. NIA-AA Research Framework: Toward a biological definition of Alzheimer’s disease. Alzheimers Dement. 2018 Apr; 14(4): 535–562

[36] J.M. Gorriz, et al. Artificial intelligence within the interplay between natural and artificial computation: Advances in data science, trends and applications. Neurocomputing, Volume 410, 14 October 2020, Pages 237-270

[37] J.M. Górriz, et al. On the computation of distribution-free performance bounds: Application to small sample sizes in neuroimaging. Pattern Recognition 93, 1-13, 2019

[38] J.M. Gorriz, et al. Statistical Agnostic Mapping: A framework in neuroimaging based on concentration inequalities. Information Fusion, Volume 66, February 2021, Pages 198-212

[39] C.J.C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2 (2) (1998), pp. 121-167

[40] H. Chernoff. A measure of asymptotic efficiency of tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics, 23:493–507, 1952

[41] C. McDiarmid. On the method of bounded differences. In Surveys in Combinatorics, pages 148–188. Cambridge University Press, 1989

[42] V. Vapnik. Estimation of Dependences Based on Empirical Data. Springer-Verlag. 1982. ISBN 0-387-90733-5

[43] D. Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, Volume 100, Issue 1, September 1992, Pages 78-150

[44] O. Catoni. PAC-Bayesian supervised classification: the thermodynamics of statistical learning. arXiv preprint arXiv:0712.0248, 2007

[45] D. McAllester. A PAC-Bayesian Tutorial with A Dropout Bound. arXiv:1307.2118, 2013. doi:10.48550/arXiv.1307.2118

[46] M.D. Donsker, S.R.S. Varadhan (1983). Asymptotic evaluation of certain Markov process expectations for large time. IV. Communications on Pure and Applied Mathematics, 36 (2): 183–212. doi:10.1002/cpa.3160360204

[47] K.J. Friston. Sample size and the fallacies of classical inference. NeuroImage 81 (2013) 503–504

[48] E.T. Bullmore et al. Global, voxel, and cluster tests, by theory and permutation, for a difference between two groups of structural MR images of the brain. IEEE Trans Med Imaging (1999) Jan;18(1):32-42

[49] P.T. Reiss, et al. Cross-validation and hypothesis testing in neuroimaging: an irenic comment on the exchange between Friston and Lindquist et al. NeuroImage. 2015 August 1; 116: 248-254

[50] C. Jimenez-Mesa et al. A non-parametric statistical inference framework for Deep Learning in current neuroimaging. Information Fusion, Volume 91, March 2023, Pages 598-611

[51] S.M. Kay. Fundamentals of Statistical Signal Processing: Detection Theory. Prentice-Hall PTR, 1998. ISBN 013504135X, 9780135041352

[52] Y.D. Zhang, et al. Advances in multimodal data fusion in neuroimaging: Overview, challenges, and novel orientation. Inf Fusion. 2020 Dec;64:149-187

[53] J.N. Acosta et al. Multimodal biomedical AI. Nat Med 28, 1773–1784 (2022)

[54] C.S. Hyatt et al. The quandary of covarying: A brief review and empirical examination of covariate use in structural neuroimaging studies on psychological variables. NeuroImage 205, 116225

[55] M. Leming, et al. Ensemble Deep Learning on Large, Mixed-Site fMRI Datasets in Autism and Other Tasks. International Journal of Neural Systems, Vol. 30, No. 07, 2050012. 2020

[56] J.D. Rosenblatt, et al. Better-than-chance classification for signal detection. Biostatistics (2016)

[57] T.M. Cover. Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition. IEEE Trans. Electron. Comput. 14 (1965): 326-334

[58] H. Tverberg. A Generalization of Radon’s Theorem. Journal of the London Mathematical Society, Volume s1-41, Issue 1, 1966, Pages 123-128.