Predictive Power Analysis of Multiple Test Procedures Under Arbitrary Dependence
Pith reviewed 2026-05-21 11:32 UTC · model grok-4.3
The pith
A Bayesian method performs predictive power analysis for multiple testing procedures under arbitrary p-value dependence.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
This study introduces a new and congenial method for Bayesian predictive power analysis, supporting power calculation and sample size determination for any given planned future study that applies a multiple testing procedure (MTP). The novel MTP predictive power analysis method is based on a joint prior distribution defining a scale matrix mixture of asymmetric multivariate normal mean-variance mixture distributions. This prior factorizes into a general prior distribution for effect sizes and a uniform prior distribution for correlation matrices, which represent arbitrary dependencies between the p-values of test statistics under their alternative hypotheses. The new method also yields p-value weights that can be used to assess significance-chasing biases (e.g., publication bias, p-hacking) and to minimize their relative impact.
What carries the argument
The joint prior distribution defining a scale matrix mixture of asymmetric multivariate normal mean-variance mixture distributions, factorized into a prior for effect sizes and a uniform prior for correlation matrices.
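Written out, and only as a hedged reading of the factorization above, the joint prior and the resulting predictive power for a planned study with per-test sample size $n$ would take the standard Bayesian "assurance" form below. The symbols $\boldsymbol{\delta}$ (effect sizes), $R$ (correlation matrix), $\mathcal{E}_m$ (set of $m \times m$ correlation matrices) and $\operatorname{PP}(n)$ are our notation, not the paper's, and the integral abstracts away the scale-matrix mixture and asymmetric mean-variance mixture details.

```latex
\begin{aligned}
\pi(\boldsymbol{\delta}, R) &= \pi(\boldsymbol{\delta})\,\pi(R),
  \qquad \pi(R) \propto \mathbf{1}\{R \in \mathcal{E}_m\},\\
\mathcal{E}_m &= \{R \in \mathbb{R}^{m \times m} : R = R^{\top},\ R \succeq 0,\ R_{jj} = 1 \text{ for all } j\},\\
\operatorname{PP}(n) &= \int_{\mathcal{E}_m} \int \Pr\bigl(\text{the MTP rejects} \mid \boldsymbol{\delta}, R, n\bigr)\,
  \pi(\boldsymbol{\delta})\,\pi(R)\,d\boldsymbol{\delta}\,dR.
\end{aligned}
```

Here "the MTP rejects" stands for whichever power definition is targeted (for example, at least one rejection, or the expected proportion of true alternatives rejected).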
If this is right
- The method supports power analysis and sample size planning for any MTP that controls FWER or FDR, including Bonferroni, Holm, Benjamini-Yekutieli, and DP-MTP.
- P-value weights produced by the analysis can be used to assess and reduce the impact of significance-chasing biases without assuming independence of p-values.
- The simulation-based procedure applies directly to planning replication or interim studies that involve multiple hypothesis tests.
- The approach works for any planned future study by combining prior information on effect sizes with the uniform correlation prior.
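For orientation, the three classical procedures named in the first bullet can be sketched in a few lines of NumPy. This is a textbook sketch, not code from the paper or from the bnpMTP package; the function names are hypothetical, and DP-MTP is omitted because its construction is not described here.

```python
import numpy as np


def bonferroni(p, alpha=0.05):
    """Reject H_j when p_j <= alpha / m (FWER control under any dependence)."""
    p = np.asarray(p, dtype=float)
    return p <= alpha / p.size


def holm(p, alpha=0.05):
    """Holm step-down: the k-th smallest p-value (k = 0..m-1) is compared to alpha / (m - k)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    reject = np.zeros(m, dtype=bool)
    for k, j in enumerate(np.argsort(p)):
        if p[j] > alpha / (m - k):
            break  # stop at the first non-rejection; all larger p-values are retained
        reject[j] = True
    return reject


def benjamini_yekutieli(p, alpha=0.05):
    """BY step-up: FDR control under arbitrary dependence via the factor c(m) = sum_{i<=m} 1/i."""
    p = np.asarray(p, dtype=float)
    m = p.size
    c_m = np.sum(1.0 / np.arange(1, m + 1))
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / (m * c_m)
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k_max = np.flatnonzero(below)[-1]  # largest rank whose p-value meets its threshold
        reject[order[: k_max + 1]] = True  # step-up: reject everything up to that rank
    return reject
```

For example, with p = [0.3, 0.011, 0.005, 0.02] at α = 0.05, Bonferroni and BY reject the second and third hypotheses, while Holm's step-down also rejects the fourth.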
Where Pith is reading between the lines
- The derived p-value weights could be applied in meta-analyses to adjust for potential publication bias across multiple studies.
- Extending the uniform correlation prior to high-dimensional settings might support power planning in genomics or neuroimaging where tests are numerous and dependent.
- Combining this predictive power method with the DP-MTP could yield a fully Bayesian pipeline for both testing and study design under arbitrary dependence.
Load-bearing premise
The uniform prior distribution for correlation matrices representing arbitrary dependencies between p-values adequately captures the unknown dependence structure.
What would settle it
The central claim would be challenged by a concrete replication or interim study whose observed power deviates substantially from the power the method predicted, with the comparison made after the true correlation matrix has been estimated from the collected data.
Original abstract
Many statistical problems can be addressed by applying a multiple testing procedure (MTP) that controls either the Family-wise Error Rate (FWER) or False Discovery Rate (FDR) under unknown arbitrarily-interdependent $p$-values, without explicitly modeling these inter-correlations. They include the FWER-controlling Bonferroni (1936) MTP and Holm (1979) MTP; the FDR-controlling Benjamini and Yekutieli (2001) MTP; and the DP-MTP (Karabatsos, 2025), based on a Dirichlet process (DP) prior distribution supporting the entire space of MTPs that control either the FWER or FDR. For such an MTP, this study introduces a new and congenial method for Bayesian predictive power analysis, for power calculation and sample size determination for any given planned future (e.g., replication or interim) study. This novel MTP predictive power analysis method is based on a joint prior distribution defining a scale matrix mixture of asymmetric multivariate normal mean-variance mixture distributions, factorized as a general prior distribution for effect sizes (e.g., obtained from expert judgment or results of prior studies), and a uniform prior distribution for correlation matrices representing arbitrary dependencies between $p$-values of test statistics of given multiple hypothesis tests under their alternative hypotheses. The new MTP power analysis method also results in $p$-value weights which can be used to minimize the relative impacts of and assess for significance-chasing biases (e.g., publication bias, $p$-hacking, etc.) in multiple testing, without needing to assume that $p$-values (effect sizes) are independent. The new simulation-based MTP predictive power analysis method is illustrated through the analysis of $p$-values obtained by a famous study of lead exposure and re-analyzed by the previous MTP literature, using R package bnpMTP.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a Bayesian predictive power analysis method for multiple testing procedures (MTPs) that control the FWER or FDR under arbitrary dependence among p-values. It builds on the DP-MTP (Karabatsos 2025) by defining a joint prior that factorizes into a general effect-size prior (from expert judgment or prior studies) and a uniform prior on correlation matrices representing arbitrary dependencies between test statistics under the alternatives. The method yields predictive power calculations and sample-size recommendations for future studies (e.g., replications), as well as p-value weights for assessing significance-chasing biases, without assuming independence. The approach is illustrated by re-analyzing p-values from a lead-exposure study with the bnpMTP R package.
Significance. If the central derivations hold and the uniform prior is rigorously constructed, the work could provide a practical tool for power analysis and sample-size planning in settings with dependent multiple tests, such as environmental epidemiology or genomics. It extends the DP-MTP framework with predictive elements and bias-assessment weights. The simulation-based implementation and use of an existing package are positive features, though the reliance on external priors for effect sizes creates a circularity that must be addressed for the predictive claims to be convincing.
major comments (2)
- [Methods, joint prior distribution] Methods section on the joint prior: The claim that the uniform prior on correlation matrices represents 'arbitrary dependence' without explicit modeling is load-bearing for the predictive power results. The manuscript must specify the exact construction (e.g., sampling on the manifold of PSD matrices with unit diagonal, or via a valid measure such as the uniform distribution induced by the Cholesky factor or hyperspherical coordinates) to ensure positive semi-definiteness and uniformity; otherwise the integrated posterior predictive distributions for power and p-value weights become sensitive to an unstated implementation choice.
- [Illustration section] Illustration and validation: The re-analysis of the lead-exposure data and any accompanying simulations should include explicit checks against known analytic cases (e.g., independent tests or simple equicorrelated structures) and report the effective sample size or convergence diagnostics for the Monte Carlo integration over the scale-matrix mixture. Without such validation, it is unclear whether the reported power and sample-size results are correctly derived or restate the input priors.
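One concrete construction of the kind the first comment asks for is sketched below. It is a swapped-in technique, not the hyperspherical-coordinate sampler the authors describe: rejection sampling that proposes each off-diagonal entry from Uniform(-1, 1) and keeps the matrix only if it is positive definite. The accepted draws are uniform (with respect to Lebesgue measure on the off-diagonal entries) over the set of correlation matrices, which is the LKJ distribution with shape parameter 1. The function name is hypothetical.

```python
import numpy as np


def sample_uniform_correlation(m, rng, max_tries=100_000):
    """Draw an m x m correlation matrix uniformly over the set of valid correlation
    matrices (LKJ with eta = 1) by rejection sampling.

    Each off-diagonal entry is proposed from Uniform(-1, 1); the proposal is accepted
    only if the symmetric, unit-diagonal matrix is positive definite.
    """
    iu = np.triu_indices(m, k=1)
    for _ in range(max_tries):
        R = np.eye(m)
        R[iu] = rng.uniform(-1.0, 1.0, size=len(iu[0]))
        R = R + np.triu(R, k=1).T  # mirror the upper triangle into the lower triangle
        try:
            np.linalg.cholesky(R)  # succeeds only for a positive-definite matrix
            return R
        except np.linalg.LinAlgError:
            continue  # outside the set of correlation matrices: reject and redraw
    raise RuntimeError("no positive-definite proposal accepted; use an onion or vine sampler")
```

The acceptance rate falls quickly as m grows, because valid correlation matrices occupy a shrinking fraction of the cube of proposals, so this is only practical for a handful of tests. For larger m, the onion or vine methods of Lewandowski, Kurowicka and Joe (2009) target the same distribution directly. A quick sanity check: for m = 3 each off-diagonal entry should have mean 0 and variance 1/4.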
minor comments (2)
- [Abstract] The abstract refers to a 'scale matrix mixture of asymmetric multivariate normal mean-variance mixture distributions' without a brief definition or reference; add one sentence of clarification in the main text for readers unfamiliar with the construction.
- [Software and computation] Ensure that the R package bnpMTP version and any custom code for the uniform correlation prior are cited or made available so that the simulation-based method is fully reproducible.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments, which help clarify key aspects of our Bayesian predictive power analysis for multiple testing procedures under arbitrary dependence. We address each major comment below and will incorporate revisions to strengthen the manuscript.
Point-by-point responses
- Referee: [Methods, joint prior distribution] Methods section on the joint prior: The claim that the uniform prior on correlation matrices represents 'arbitrary dependence' without explicit modeling is load-bearing for the predictive power results. The manuscript must specify the exact construction (e.g., sampling on the manifold of PSD matrices with unit diagonal, or via a valid measure such as the uniform distribution induced by the Cholesky factor or hyperspherical coordinates) to ensure positive semi-definiteness and uniformity; otherwise the integrated posterior predictive distributions for power and p-value weights become sensitive to an unstated implementation choice.
  Authors: We agree that an explicit specification of the uniform prior construction is essential for rigor and reproducibility. In the revised Methods section, we will add a detailed description of the sampling procedure using hyperspherical coordinates on the manifold of correlation matrices (ensuring unit diagonal and positive semi-definiteness). This will explicitly show how the uniform measure is induced and how it integrates with the scale-matrix mixture to yield the predictive power and p-value weights, addressing any sensitivity concerns. Revision: yes.
- Referee: [Illustration section] Illustration and validation: The re-analysis of the lead-exposure data and any accompanying simulations should include explicit checks against known analytic cases (e.g., independent tests or simple equicorrelated structures) and report the effective sample size or convergence diagnostics for the Monte Carlo integration over the scale-matrix mixture. Without such validation, it is unclear whether the reported power and sample-size results are correctly derived or restate the input priors.
  Authors: We thank the referee for highlighting the need for validation. In the revised illustration section, we will include direct comparisons of the computed predictive powers and sample-size recommendations against closed-form analytic results for the independent-tests case and for equicorrelated structures. We will also report effective sample sizes and standard Monte Carlo convergence diagnostics (e.g., Gelman-Rubin statistics and trace diagnostics) for the integrations over the scale-matrix mixture. These additions will confirm that the results are properly derived from the joint prior rather than simply echoing the input effect-size distributions. Revision: yes.
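The independent-tests check promised above can be illustrated with a small sketch, under assumptions not stated in the paper: one-sided z-tests with Z ~ N(mu, R), Bonferroni as the MTP, and "power" meaning the probability of at least one rejection. Under independence this has the closed form 1 - prod_j Phi(c - mu_j) with c = Phi^{-1}(1 - alpha/m), which a Monte Carlo estimate with R = I should reproduce within its standard error. Function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def bonferroni_any_power_exact(mu, alpha=0.05):
    """Closed form for independent one-sided z-tests: P(at least one Bonferroni rejection)."""
    mu = np.asarray(mu, dtype=float)
    c = NormalDist().inv_cdf(1 - alpha / mu.size)  # Bonferroni critical value
    miss = np.array([NormalDist().cdf(c - mu_j) for mu_j in mu])  # P(Z_j <= c) for each test
    return 1.0 - np.prod(miss)


def bonferroni_any_power_mc(mu, R, alpha=0.05, n_sim=200_000, rng=None):
    """Monte Carlo estimate of the same quantity when Z ~ N(mu, R).

    Returns (estimate, standard error); the draws are independent, so the binomial
    standard error measures the Monte Carlo accuracy directly.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    c = NormalDist().inv_cdf(1 - alpha / mu.size)
    z = rng.multivariate_normal(mu, R, size=n_sim)
    est = (z > c).any(axis=1).mean()
    return est, np.sqrt(est * (1 - est) / n_sim)
```

For instance, with mu = [2.0, 2.5, 3.0] and R equal to the 3 x 3 identity, the closed form gives about 0.96 and the simulation should agree to within a few standard errors; substituting an equicorrelated or randomly drawn R then shows how much dependence moves the answer. With independent draws like these, the standard error (rather than MCMC diagnostics such as Gelman-Rubin) is the natural accuracy measure.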
Circularity Check
No significant circularity; derivation is self-contained Bayesian extension
full rationale
The paper defines a joint prior (general effect-size prior times uniform prior on correlation matrices) and uses it to compute posterior predictive power and sample-size quantities for a planned future study. This is the standard Bayesian predictive mechanism and does not reduce any claimed prediction to its inputs by construction. The reference to Karabatsos (2025) supplies the DP-MTP framework but is not invoked as a uniqueness theorem or to justify the new power-analysis formulas; the current contribution (predictive distributions, p-value weights, simulation method) adds independent content. The uniform prior on correlation matrices is an explicit modeling choice whose consequences can be checked against external benchmarks or alternative dependence structures. No self-definitional, fitted-input, or ansatz-smuggling steps appear in the provided derivation chain.
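The "standard Bayesian predictive mechanism" described above can be made concrete with a minimal Monte Carlo sketch. It rests on simplifying assumptions that are ours, not the paper's: a normal effect-size prior, one-sided z-tests whose means scale with sqrt(n), a uniform correlation prior drawn by rejection sampling, Bonferroni as the MTP, and power defined as the probability of at least one rejection. It does not reproduce the scale-matrix mixture of asymmetric multivariate normal mean-variance mixtures or the bnpMTP package, and all function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def uniform_corr(m, rng):
    """Uniform (LKJ eta = 1) correlation matrix by rejection; adequate for small m."""
    iu = np.triu_indices(m, k=1)
    while True:
        R = np.eye(m)
        R[iu] = rng.uniform(-1.0, 1.0, size=len(iu[0]))
        R = R + np.triu(R, k=1).T  # symmetrize
        if np.linalg.eigvalsh(R).min() > 0:  # keep only positive-definite proposals
            return R


def predictive_power(n, m, prior_mean, prior_sd, alpha=0.05, n_draws=2000, rng=None):
    """Predictive power (assurance) of Bonferroni, averaged over the joint prior.

    Each draw: delta_j ~ N(prior_mean, prior_sd^2) (assumed effect-size prior),
    R ~ uniform over correlation matrices, Z ~ N(delta * sqrt(n), R).
    Power here means P(at least one rejection).
    """
    rng = np.random.default_rng() if rng is None else rng
    c = NormalDist().inv_cdf(1 - alpha / m)  # Bonferroni critical value
    hits = 0
    for _ in range(n_draws):
        delta = rng.normal(prior_mean, prior_sd, size=m)
        R = uniform_corr(m, rng)
        z = rng.multivariate_normal(delta * np.sqrt(n), R)
        hits += bool((z > c).any())
    return hits / n_draws


def smallest_n(target, m, prior_mean, prior_sd, n_grid, **kw):
    """Smallest n on the grid whose predictive power reaches the target (None if none does)."""
    for n in n_grid:
        if predictive_power(n, m, prior_mean, prior_sd, **kw) >= target:
            return n
    return None
```

A call such as `smallest_n(0.8, m=3, prior_mean=0.3, prior_sd=0.1, n_grid=range(10, 301, 10), rng=np.random.default_rng(1))` scans candidate sample sizes in order. Because each power value is itself a Monte Carlo estimate, the scan can be slightly non-monotone near the target; more draws or a fixed seed make the recommendation more stable.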
Axiom & Free-Parameter Ledger
free parameters (1)
- parameters of the scale matrix mixture distribution
axioms (1)
- Domain assumption: the uniform prior over correlation matrices represents the entire space of possible arbitrary dependencies among p-values under the alternative.