Predictive Power Analysis of Multiple Test Procedures Under Arbitrary Dependence

George Karabatsos

arxiv: 2603.07312 · v3 · pith:JGJSSF5Qnew · submitted 2026-03-07 · 📊 stat.ME

Pith reviewed 2026-05-21 11:32 UTC · model grok-4.3

classification 📊 stat.ME

keywords Bayesian predictive power analysis · multiple testing procedures · arbitrary dependence · p-value weights · sample size determination · FWER control · FDR control · Dirichlet process MTP

The pith

A Bayesian method performs predictive power analysis for multiple testing procedures under arbitrary p-value dependence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a Bayesian approach to analyze the predictive power of multiple testing procedures that control family-wise error rate or false discovery rate when p-values may be interdependent in unknown ways. It constructs this analysis using a joint prior that mixes effect size distributions with a uniform prior over all possible correlation matrices to account for arbitrary dependencies. This allows researchers to calculate power and determine sample sizes for planned future studies such as replications, and to derive weights for p-values that help evaluate biases like p-hacking. A sympathetic reader would care because many real applications involve correlated tests, and this method avoids needing to model the exact correlations explicitly while providing practical planning tools.

Core claim

This study introduces a new and congenial method for Bayesian predictive power analysis, for power calculation and sample size determination for any given planned future study. The novel MTP predictive power analysis method is based on a joint prior distribution defining a scale matrix mixture of asymmetric multivariate normal mean-variance mixture distributions, factorized as a general prior distribution for effect sizes and a uniform prior distribution for correlation matrices representing arbitrary dependencies between p-values of test statistics under their alternative hypotheses. The new method also results in p-value weights which can be used to minimize the relative impacts of, and assess for, significance-chasing biases (e.g., publication bias, p-hacking) in multiple testing, without needing to assume that p-values (effect sizes) are independent.

What carries the argument

The joint prior distribution defining a scale matrix mixture of asymmetric multivariate normal mean-variance mixture distributions, factorized into a prior for effect sizes and a uniform prior for correlation matrices.
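
Read schematically, the factorization and the predictive power it induces can be written as below. The symbols are introduced here for illustration and are not taken from the paper: $\boldsymbol{\delta}$ is the vector of effect sizes, $\mathbf{R}$ the $m \times m$ correlation matrix, $n$ the planned sample size, and $S$ the number of Monte Carlo draws. The sketch leaves out the asymmetric mean-variance mixture detail, which the abstract names but does not spell out.

```latex
\begin{align*}
\pi(\boldsymbol{\delta}, \mathbf{R})
  &= \pi(\boldsymbol{\delta})\,\pi(\mathbf{R}),
  \qquad \pi(\mathbf{R}) \propto 1
  \ \text{on the set of } m \times m \text{ correlation matrices}, \\
\mathrm{Power}(n)
  &= \iint \Pr\{\text{MTP rejects} \mid \boldsymbol{\delta}, \mathbf{R}, n\}\,
     \pi(\boldsymbol{\delta})\,\pi(\mathbf{R})\,
     d\boldsymbol{\delta}\, d\mathbf{R} \\
  &\approx \frac{1}{S} \sum_{s=1}^{S}
     \mathbb{1}\{\text{MTP rejects given } (\boldsymbol{\delta}^{(s)}, \mathbf{R}^{(s)}, n)\}.
\end{align*}
```

Sample size determination then amounts to finding the smallest $n$ for which $\mathrm{Power}(n)$ reaches a chosen target.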

If this is right

The method supports power analysis and sample size planning for any MTP that controls FWER or FDR, including Bonferroni, Holm, Benjamini-Yekutieli, and DP-MTP.
P-value weights produced by the analysis can be used to assess and reduce the impact of significance-chasing biases without assuming independence of p-values.
The simulation-based procedure applies directly to planning replication or interim studies that involve multiple hypothesis tests.
The approach works for any planned future study by combining prior information on effect sizes with the uniform correlation prior.
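
For readers less familiar with the procedures named above, here is a minimal Python sketch of the three classical ones (Bonferroni, Holm, Benjamini-Yekutieli). It is a textbook illustration, not code from the paper or from bnpMTP; the function names are hypothetical, and the DP-MTP is not covered.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (FWER control under any dependence)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm(pvals, alpha=0.05):
    """Holm step-down: walk the sorted p-values and stop at the first failure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):  # rank 0 is the smallest p-value
        if pvals[i] > alpha / (m - rank):
            break
        reject[i] = True
    return reject


def benjamini_yekutieli(pvals, alpha=0.05):
    """BY step-up: BH thresholds divided by c(m) = sum_{k=1..m} 1/k."""
    m = len(pvals)
    c_m = sum(1.0 / k for k in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0  # largest rank whose p-value clears its threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / (m * c_m):
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject
```

Each function returns one Boolean per hypothesis, in the order the p-values were supplied, so the three procedures can be compared on the same input.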

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The derived p-value weights could be applied in meta-analyses to adjust for potential publication bias across multiple studies.
Extending the uniform correlation prior to high-dimensional settings might support power planning in genomics or neuroimaging where tests are numerous and dependent.
Combining this predictive power method with the DP-MTP could yield a fully Bayesian pipeline for both testing and study design under arbitrary dependence.

Load-bearing premise

The uniform prior distribution for correlation matrices representing arbitrary dependencies between p-values adequately captures the unknown dependence structure.
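
To make the premise concrete, the sketch below shows one way a uniform draw over correlation matrices could feed a Monte Carlo predictive power estimate. Everything in it is an assumption made for illustration: the uniform draw uses the C-vine construction of the LKJ distribution with shape 1 (a standard construction, not necessarily the one the paper uses), the effect-size prior is taken to be independent normal, the tests are one-sided z-tests, and Holm is the MTP. All function and parameter names are hypothetical.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal, used to turn z-statistics into p-values


def sample_uniform_correlation(d, rng):
    """Draw a d x d correlation matrix from LKJ(eta=1), i.e. uniform over
    positive definite correlation matrices, via the C-vine construction."""
    beta = 1.0 + (d - 1) / 2.0
    P = [[0.0] * d for _ in range(d)]  # partial correlations
    S = [[float(i == j) for j in range(d)] for i in range(d)]
    for k in range(d - 1):
        beta -= 0.5
        for i in range(k + 1, d):
            P[k][i] = 2.0 * rng.betavariate(beta, beta) - 1.0
            p = P[k][i]
            for l in range(k - 1, -1, -1):  # partial -> raw correlation
                p = (p * math.sqrt((1 - P[l][i] ** 2) * (1 - P[l][k] ** 2))
                     + P[l][i] * P[l][k])
            S[k][i] = S[i][k] = p
    return S


def cholesky(S):
    """Lower-triangular L with L L^T = S (S assumed positive definite)."""
    d = len(S)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L


def holm_rejections(pvals, alpha):
    """Number of hypotheses rejected by Holm's step-down procedure."""
    m = len(pvals)
    count = 0
    for rank, p in enumerate(sorted(pvals)):
        if p > alpha / (m - rank):
            break
        count += 1
    return count


def predictive_power(n, prior_means, prior_sd, alpha=0.05, draws=2000, seed=0):
    """Average fraction of hypotheses rejected, integrating over both priors."""
    rng = random.Random(seed)
    m = len(prior_means)
    rejected = 0
    for _ in range(draws):
        # Effect-size prior (assumed independent normal for this sketch).
        delta = [rng.gauss(mu, prior_sd) for mu in prior_means]
        # Dependence prior: uniform over correlation matrices.
        L = cholesky(sample_uniform_correlation(m, rng))
        e = [rng.gauss(0.0, 1.0) for _ in range(m)]
        z = [math.sqrt(n) * delta[i] + sum(L[i][k] * e[k] for k in range(i + 1))
             for i in range(m)]
        pvals = [1.0 - PHI.cdf(zi) for zi in z]  # one-sided p-values
        rejected += holm_rejections(pvals, alpha)
    return rejected / (draws * m)


def smallest_n(target, candidates, **kwargs):
    """First candidate sample size whose predictive power reaches the target."""
    for n in candidates:
        if predictive_power(n, **kwargs) >= target:
            return n
    return None
```

A call such as `smallest_n(0.8, [10, 50, 400], prior_means=[0.3, 0.3, 0.3], prior_sd=0.05)` returns the first candidate that reaches 80% average power, or `None` if none does. Because the seed is fixed, every candidate `n` is evaluated on the same random draws, which keeps the comparison across sample sizes stable.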

What would settle it

A concrete replication or interim study in which the observed power deviates substantially from the power predicted by the method after estimating the true correlation matrix from the collected data would challenge the central claim.

Original abstract

Many statistical problems can be addressed by applying a multiple testing procedure (MTP) that controls either the Family-wise Error Rate (FWER) or False Discovery Rate (FDR) under unknown arbitrarily-interdependent $p$-values, without explicitly modeling these inter-correlations. They include the FWER-controlling Bonferroni (1936) MTP and Holm (1979) MTP; the FDR-controlling Benjamini and Yekutieli (2001) MTP; and the DP-MTP (Karabatsos, 2025), based on a Dirichlet process (DP) prior distribution supporting the entire space of MTPs that control either the FWER or FDR. For such an MTP, this study introduces a new and congenial method for Bayesian predictive power analysis, for power calculation and sample size determination for any given planned future (e.g., replication or interim) study. This novel MTP predictive power analysis method is based on a joint prior distribution defining a scale matrix mixture of asymmetric multivariate normal mean-variance mixture distributions, factorized as a general prior distribution for effect sizes (e.g., obtained from expert judgment or results of prior studies), and a uniform prior distribution for correlation matrices representing arbitrary dependencies between $p$-values of test statistics of given multiple hypothesis tests under their alternative hypotheses. The new MTP power analysis method also results in $p$-value weights which can be used to minimize the relative impacts of and assess for significance-chasing biases (e.g., publication bias, $p$-hacking, etc.) in multiple testing, without needing to assume that $p$-values (effect sizes) are independent. The new simulation-based MTP predictive power analysis method is illustrated through the analysis of $p$-values obtained by a famous study of lead exposure and re-analyzed by the previous MTP literature, using R package bnpMTP.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper offers a Bayesian predictive power method for MTPs under arbitrary dependence by factoring a joint prior into effect sizes and a uniform correlation-matrix distribution, extending the author's prior DP-MTP work. It aims at power and sample-size calculations for planned future studies while also producing p-value weights to check for biases like p-hacking without assuming independence. The lead-exposure example with the bnpMTP package shows a concrete application to real p-values from earlier literature.

The factorization itself is the clearest new element relative to standard MTP power tools, and it keeps the dependence part separate so users can plug in effect-size information from experts or past data. That separation makes the approach more flexible than methods that force independence or a fixed correlation structure. The simulation-based nature fits the goal of handling arbitrary dependence without explicit modeling.

The uniform prior on correlation matrices is the load-bearing piece for the arbitrary-dependence claim. It must be defined on the manifold of positive semi-definite matrices with unit diagonal to be truly uniform and valid; if the code instead samples unconstrained and projects or rejects, the resulting power numbers become sensitive to an unstated choice rather than genuinely arbitrary. The abstract does not show derivations, checks against known MTP cases, or sensitivity runs, so it is hard to judge how well the power calculations hold up in practice. Reliance on the 2025 DP-MTP and on chosen effect-size priors also means the outputs inherit whatever assumptions sit in those inputs.

This is aimed at methodological statisticians who work on multiple-testing power analysis and want a Bayesian route that avoids strong independence assumptions. A reader planning replications or interim analyses with several tests would see direct value in the p-value weights and the predictive setup.

It deserves peer review because the core construction addresses a practical need in applied multiple testing, even though the authors will need to add explicit prior-construction details and validation results.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a Bayesian predictive power analysis method for multiple testing procedures (MTPs) controlling FWER or FDR under arbitrary dependence among p-values. It builds on the DP-MTP (Karabatsos 2025) by defining a joint prior as a general effect-size prior (from expert judgment or prior studies) factorized with a uniform prior on correlation matrices to represent arbitrary dependencies between test statistics under alternatives. The method yields predictive power calculations and sample-size recommendations for future studies (e.g., replications) as well as p-value weights to assess significance-chasing biases, without assuming independence. The approach is illustrated via re-analysis of lead-exposure p-values using the bnpMTP R package.

Significance. If the central derivations hold and the uniform prior is rigorously constructed, the work could provide a practical tool for power analysis and sample-size planning in settings with dependent multiple tests, such as environmental epidemiology or genomics. It extends the DP-MTP framework with predictive elements and bias-assessment weights. The simulation-based implementation and use of an existing package are positive features, though the reliance on external priors for effect sizes creates a circularity that must be addressed for the predictive claims to be convincing.

major comments (2)

[Methods, joint prior distribution] Methods section on the joint prior: The claim that the uniform prior on correlation matrices represents 'arbitrary dependence' without explicit modeling is load-bearing for the predictive power results. The manuscript must specify the exact construction (e.g., sampling on the manifold of PSD matrices with unit diagonal, or via a valid measure such as the uniform distribution induced by the Cholesky factor or hyperspherical coordinates) to ensure positive semi-definiteness and uniformity; otherwise the integrated posterior predictive distributions for power and p-value weights become sensitive to an unstated implementation choice.
[Illustration section] Illustration and validation: The re-analysis of the lead-exposure data and any accompanying simulations should include explicit checks against known analytic cases (e.g., independent tests or simple equicorrelated structures) and report the effective sample size or convergence diagnostics for the Monte Carlo integration over the scale-matrix mixture. Without such validation, it is unclear whether the reported power and sample-size results are correctly derived or restate the input priors.
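
The independent-tests check requested in the second comment can be sketched as follows. The setup is an assumption made for illustration, not the paper's: `m` independent one-sided z-tests share a common standardized effect `delta` at sample size `n`, and Bonferroni is the MTP, so the per-test power has the closed form 1 - Φ(z_{1-α/m} - δ√n). The Monte Carlo estimate is returned with its binomial standard error so that agreement can be judged against simulation noise.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def analytic_bonferroni_power(delta, n, m, alpha=0.05):
    """Closed-form per-test power of one-sided z-tests under Bonferroni."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / m)
    return 1.0 - STD_NORMAL.cdf(z_crit - delta * math.sqrt(n))


def simulated_bonferroni_power(delta, n, m, alpha=0.05, draws=20000, seed=1):
    """Monte Carlo estimate of the same quantity, with its standard error."""
    rng = random.Random(seed)
    shift = delta * math.sqrt(n)  # mean of each z-statistic under the alternative
    rejections = 0
    for _ in range(draws):
        for _ in range(m):  # tests are independent in this benchmark
            z = rng.gauss(shift, 1.0)
            p = 1.0 - STD_NORMAL.cdf(z)
            if p <= alpha / m:
                rejections += 1
    total = draws * m
    power = rejections / total
    se = math.sqrt(power * (1.0 - power) / total)
    return power, se
```

If the simulated value sits more than a few standard errors from the analytic one, the simulation code rather than the prior is the first thing to suspect. The same comparison could be repeated for an equicorrelated structure once a reference value is available.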

minor comments (2)

[Abstract] The abstract refers to a 'scale matrix mixture of asymmetric multivariate normal mean-variance mixture distributions' without a brief definition or reference; add one sentence of clarification in the main text for readers unfamiliar with the construction.
[Software and computation] Ensure that the R package bnpMTP version and any custom code for the uniform correlation prior are cited or made available so that the simulation-based method is fully reproducible.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which help clarify key aspects of our Bayesian predictive power analysis for multiple testing procedures under arbitrary dependence. We address each major comment below and will incorporate revisions to strengthen the manuscript.

Referee: [Methods, joint prior distribution] Methods section on the joint prior: The claim that the uniform prior on correlation matrices represents 'arbitrary dependence' without explicit modeling is load-bearing for the predictive power results. The manuscript must specify the exact construction (e.g., sampling on the manifold of PSD matrices with unit diagonal, or via a valid measure such as the uniform distribution induced by the Cholesky factor or hyperspherical coordinates) to ensure positive semi-definiteness and uniformity; otherwise the integrated posterior predictive distributions for power and p-value weights become sensitive to an unstated implementation choice.

Authors: We agree that an explicit specification of the uniform prior construction is essential for rigor and reproducibility. In the revised Methods section, we will add a detailed description of the sampling procedure using hyperspherical coordinates on the manifold of correlation matrices (ensuring unit diagonal and positive semi-definiteness). This will explicitly show how the uniform measure is induced and how it integrates with the scale-matrix mixture to yield the predictive power and p-value weights, addressing any sensitivity concerns.
Revision: yes
Referee: [Illustration section] Illustration and validation: The re-analysis of the lead-exposure data and any accompanying simulations should include explicit checks against known analytic cases (e.g., independent tests or simple equicorrelated structures) and report the effective sample size or convergence diagnostics for the Monte Carlo integration over the scale-matrix mixture. Without such validation, it is unclear whether the reported power and sample-size results are correctly derived or restate the input priors.

Authors: We thank the referee for highlighting the need for validation. In the revised illustration section, we will include direct comparisons of the computed predictive powers and sample-size recommendations against closed-form analytic results for the independent-tests case and for equicorrelated structures. We will also report effective sample sizes and standard Monte Carlo convergence diagnostics (e.g., Gelman-Rubin statistics and trace diagnostics) for the integrations over the scale-matrix mixture. These additions will confirm that the results are properly derived from the joint prior rather than simply echoing the input effect-size distributions.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained Bayesian extension

The paper defines a joint prior (general effect-size prior times uniform prior on correlation matrices) and uses it to compute posterior predictive power and sample-size quantities for a planned future study. This is the standard Bayesian predictive mechanism and does not reduce any claimed prediction to its inputs by construction. The reference to Karabatsos (2025) supplies the DP-MTP framework but is not invoked as a uniqueness theorem or to justify the new power-analysis formulas; the current contribution (predictive distributions, p-value weights, simulation method) adds independent content. The uniform prior on correlation matrices is an explicit modeling choice whose consequences can be checked against external benchmarks or alternative dependence structures. No self-definitional, fitted-input, or ansatz-smuggling steps appear in the provided derivation chain.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the choice of a uniform prior over correlation matrices to represent arbitrary dependence and on the DP prior from the author's earlier work; these are domain assumptions rather than derived quantities.

free parameters (1)

parameters of the scale matrix mixture distribution
Chosen or elicited from expert judgment or prior studies to define the joint prior on effect sizes and correlations.

axioms (1)

domain assumption: Uniform prior over correlation matrices represents the entire space of possible arbitrary dependencies among p-values under the alternative
Invoked to allow the method to handle unknown inter-correlations without explicit modeling.
