
arXiv: 2606.26781 · v1 · pith:5NPU2PFKnew · submitted 2026-06-25 · 📊 stat.ME · math.ST · stat.TH

Multiple testing

Pith reviewed 2026-06-26 03:16 UTC · model grok-4.3

classification 📊 stat.ME · math.ST · stat.TH
keywords multiple hypothesis testing, error criteria, testing procedures, family-wise error rate, false discovery rate, R packages

The pith

This text introduces multiple hypothesis testing by covering error criteria and testing procedures with R package references.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper serves as lecture notes providing an introduction to multiple hypothesis testing. It explains various error criteria used when testing many hypotheses simultaneously. It also covers common testing procedures and points to relevant R packages for implementation. The material was developed for a PhD-level course.

Core claim

The text provides an introduction to multiple hypothesis testing. It covers various error criteria and testing procedures, and includes references to relevant R packages.

What carries the argument

The argument is carried by multiple testing procedures that control an error rate, such as the family-wise error rate (FWER) or the false discovery rate (FDR), when many hypotheses are tested at once.
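
To make the distinction between the two error rates concrete, here is a minimal Python sketch of one standard procedure for each: the Bonferroni correction, which controls the family-wise error rate, and the Benjamini-Hochberg step-up procedure, which controls the false discovery rate under independence or positive dependence of the p-values. The function names and the p-values are illustrative assumptions and are not taken from the paper, which points to R packages instead.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (family-wise error rate control)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Step-up procedure: reject the k smallest p-values, where k is the
    largest rank such that the k-th smallest p-value is <= k * alpha / m."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k = rank  # keep the largest rank that passes
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


# Hypothetical p-values, chosen only for illustration.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print("Bonferroni rejections:", sum(bonferroni(pvals)))
print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg(pvals)))
```

On these inputs the false discovery rate procedure rejects two hypotheses and Bonferroni rejects one, which reflects the usual trade-off: a weaker error criterion buys more rejections.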

If this is right

  • Users gain the ability to select appropriate error control when performing many simultaneous tests.
  • Practical implementation is supported by the referenced R packages.
  • The material supports teaching of multiple testing concepts at an advanced level.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The notes could serve as a foundation for researchers entering fields that require high-dimensional testing.
  • They highlight the need to match error criteria to the scientific goal of the analysis.
  • Similar lecture notes might be adapted for other statistical topics with software examples.

Load-bearing premise

The descriptions of error criteria and testing procedures accurately reflect established methods in the statistical literature.

What would settle it

A demonstration that one of the described procedures fails to control the stated error rate under the conditions given in the text.
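
One way such a demonstration could start is a Monte Carlo check of the realized error rate. The sketch below, in Python, estimates the family-wise error rate of the Bonferroni correction when all null hypotheses are true and the p-values are independent and uniform. The setting (20 hypotheses, level 0.05, 20000 replications) and all names are assumptions chosen for illustration, not taken from the paper. A simulation of this kind can expose a failure but cannot prove that control holds in general.

```python
import random


def bonferroni_rejects_any(pvalues, alpha):
    """True if Bonferroni rejects at least one hypothesis."""
    m = len(pvalues)
    return any(p <= alpha / m for p in pvalues)


def estimate_fwer(m=20, alpha=0.05, n_sim=20000, seed=1):
    """Estimate P(at least one false rejection) when all m nulls are true
    and the p-values are independent Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        pvals = [rng.random() for _ in range(m)]
        if bonferroni_rejects_any(pvals, alpha):
            hits += 1  # every rejection is false because all nulls are true
    return hits / n_sim


fwer_hat = estimate_fwer()
print(f"estimated FWER: {fwer_hat:.4f} (nominal level 0.05)")
```

An estimate clearly above the nominal level, beyond Monte Carlo error, would be the kind of evidence the paragraph above asks for. Under independence the exact value here is 1 - (1 - 0.05/20)^20, roughly 0.049, so the estimate should land slightly below 0.05.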

Figures

Figures reproduced from arXiv: 2606.26781 by Jesse Hemerik.

Figure 1: Non-inferiority testing. The null hypothesis corresponds to effects worse …
Figure 2: Equivalence testing. “Equivalence” means that …
Figure 3: Histogram of 5! = 120 test statistics based on permuted versions of the …
Figure 4: Histogram of 2^6 = 64 test statistics based on sign-flipped versions of the dataset on maize plants.
Figure 5: Illustration of the global test using Simes’ inequality …
Figure 6: The sorted values of the smallest 50 p-values among all 279 p-values for …
Figure 7: The sorted p-values corresponding to the numerical predictors of the …
Figure 8: A hypothesis is a set of distributions. This Venn diagram represents …
Figure 9: This figure shows all intersection hypotheses in case there are three …
Figure 10: Example data on car models (columns: cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb).
Figure 11: Simultaneously permuting the columns corresponding to all variables …
Figure 12: For each of the 120 permutations we computed the maximum of the …
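
The caption of Figure 4 refers to 2^6 = 64 sign-flipped versions of a dataset with six observations. As a rough Python sketch of how such a test can be computed, the code below enumerates all 64 sign patterns for six paired differences. The data values are hypothetical (they are not the maize data), and the choice of the sum as test statistic and of a one-sided p-value are assumptions made for illustration. The test relies on the differences being symmetric about zero under the null hypothesis.

```python
from itertools import product


def sign_flip_test(diffs):
    """One-sided sign-flipping test based on the sum of the differences.

    Returns the p-value and the number of sign patterns enumerated.
    """
    observed = sum(diffs)
    # All 2^n sign patterns, including the identity (all signs +1).
    stats = [
        sum(s * d for s, d in zip(signs, diffs))
        for signs in product((1, -1), repeat=len(diffs))
    ]
    # Proportion of sign-flipped statistics at least as large as observed.
    p_value = sum(t >= observed for t in stats) / len(stats)
    return p_value, len(stats)


# Hypothetical paired differences in integer units (not the maize data).
diffs = [61, -12, 29, 40, 7, 35]
p_value, n_flips = sign_flip_test(diffs)
print(f"{n_flips} sign patterns, p-value = {p_value}")
```

Because the identity pattern is always included, the p-value can never be smaller than 1/64 with six observations.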
Original abstract

This text provides an introduction to multiple hypothesis testing. It covers various error criteria and testing procedures, and includes references to relevant R packages. An earlier version of this text served as the lecture notes for a PhD-level course on multiple testing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript is an expository introduction to multiple hypothesis testing. It covers error criteria (FWER, FDR and variants), standard procedures (Bonferroni, Holm, Benjamini-Hochberg and related step-up/step-down methods), and points readers to R packages for implementation. The text originated as PhD-level lecture notes.
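
As an illustration of the step-down idea mentioned in the summary, here is a minimal Python sketch of Holm's procedure, compared with the plain Bonferroni correction on the same inputs. The function name and the p-values are illustrative assumptions, not taken from the manuscript.

```python
def holm(pvalues, alpha=0.05):
    """Holm's step-down procedure (family-wise error rate control).

    Visit p-values from smallest to largest; at step j (starting at 0)
    compare with alpha / (m - j) and stop at the first failure.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for step, i in enumerate(order):
        if pvalues[i] <= alpha / (m - step):
            reject[i] = True
        else:
            break  # all larger p-values are retained as well
    return reject


# Hypothetical p-values, chosen only for illustration.
pvals = [0.01, 0.2, 0.02, 0.005]
alpha = 0.05
bonferroni_only = [p <= alpha / len(pvals) for p in pvals]
print("Bonferroni:", bonferroni_only)
print("Holm:      ", holm(pvals, alpha))
```

Holm rejects everything Bonferroni rejects and, in this example, one additional hypothesis (p = 0.02), because the threshold relaxes after each rejection.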

Significance. If the descriptions match the established literature, the manuscript could function as a compact teaching aid for graduate students. Because it advances no new methods, proofs, or empirical results, its contribution to the research literature in statistical methodology is minimal.

minor comments (2)
  1. Add a table of contents or explicit section numbering to improve usability as standalone lecture notes.
  2. Include version numbers or last-update dates for the cited R packages (e.g., multtest, qvalue) so readers can reproduce the examples.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for reviewing our manuscript. We agree that it is an expository introduction based on PhD lecture notes, covering established error criteria and procedures along with R package references, without introducing new methods or results.

Point-by-point responses
  1. Referee: If the descriptions match the established literature, the manuscript could function as a compact teaching aid for graduate students. Because it advances no new methods, proofs, or empirical results, its contribution to the research literature in statistical methodology is minimal.

    Authors: We concur that the manuscript does not advance new methodology, proofs, or empirical findings, as its scope is limited to summarizing standard approaches and directing readers to implementations. This aligns with its origin as lecture notes intended for instructional use rather than original research. We maintain that such consolidated expository resources can still offer pedagogical value for students and practitioners seeking an accessible overview.

    Revision: no

Circularity Check

0 steps flagged

No circularity: purely expository introduction with no derivations or predictions

Full rationale

The manuscript is an expository introduction to multiple hypothesis testing methods drawn from the established statistical literature. It covers error criteria, procedures, and R packages but contains no derivations, predictions, fitted parameters, or novel claims. The reader's weakest assumption (accurate reflection of standard methods) is external to the paper and does not create internal circularity. No load-bearing steps reduce to self-definition, self-citation chains, or fitted inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an expository text introducing standard concepts in multiple testing with no new mathematical derivations, parameters, or entities introduced.


