arxiv: 2603.07312 · v3 · pith:JGJSSF5Qnew · submitted 2026-03-07 · 📊 stat.ME

Predictive Power Analysis of Multiple Test Procedures Under Arbitrary Dependence

Pith reviewed 2026-05-21 11:32 UTC · model grok-4.3

classification 📊 stat.ME
keywords Bayesian predictive power analysis · multiple testing procedures · arbitrary dependence · p-value weights · sample size determination · FWER control · FDR control · Dirichlet process MTP

The pith

A Bayesian method performs predictive power analysis for multiple testing procedures under arbitrary p-value dependence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a Bayesian approach to analyze the predictive power of multiple testing procedures that control family-wise error rate or false discovery rate when p-values may be interdependent in unknown ways. It constructs this analysis using a joint prior that mixes effect size distributions with a uniform prior over all possible correlation matrices to account for arbitrary dependencies. This allows researchers to calculate power and determine sample sizes for planned future studies such as replications, and to derive weights for p-values that help evaluate biases like p-hacking. A sympathetic reader would care because many real applications involve correlated tests, and this method avoids needing to model the exact correlations explicitly while providing practical planning tools.

Core claim

This study introduces a new and congenial method for Bayesian predictive power analysis, for power calculation and sample size determination for any given planned future study. The novel MTP predictive power analysis method is based on a joint prior distribution defining a scale matrix mixture of asymmetric multivariate normal mean-variance mixture distributions, factorized as a general prior distribution for effect sizes and a uniform prior distribution for correlation matrices representing arbitrary dependencies between p-values of test statistics under their alternative hypotheses. The new method also results in p-value weights which can be used to minimize the relative impacts of, and to assess for, significance-chasing biases in multiple testing.

What carries the argument

The joint prior distribution defining a scale matrix mixture of asymmetric multivariate normal mean-variance mixture distributions, factorized into a prior for effect sizes and a uniform prior for correlation matrices.

If this is right

  • The method supports power analysis and sample size planning for any MTP that controls FWER or FDR, including Bonferroni, Holm, Benjamini-Yekutieli, and DP-MTP.
  • P-value weights produced by the analysis can be used to assess and reduce the impact of significance-chasing biases without assuming independence of p-values.
  • The simulation-based procedure applies directly to planning replication or interim studies that involve multiple hypothesis tests.
  • The approach works for any planned future study by combining prior information on effect sizes with the uniform correlation prior.
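
The three classical procedures named in the list above can be sketched in a few lines. This is a minimal illustration using the textbook definitions of Bonferroni, Holm, and Benjamini-Yekutieli; the function names are hypothetical and are not the bnpMTP API, and the DP-MTP is not covered here.

```python
import numpy as np

def bonferroni(p, alpha=0.05):
    """FWER control: reject every p-value at or below alpha / m."""
    p = np.asarray(p, dtype=float)
    return p <= alpha / p.size

def holm(p, alpha=0.05):
    """FWER control, step-down: compare sorted p-values with
    alpha/m, alpha/(m-1), ..., alpha and stop at the first failure."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha / (m - np.arange(m))
    passed = p[order] <= thresholds
    k = m if passed.all() else int(np.argmin(passed))  # index of first failure
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

def benjamini_yekutieli(p, alpha=0.05):
    """FDR control under arbitrary dependence, step-up: find the largest k
    with p_(k) <= k * alpha / (m * c_m), where c_m = 1 + 1/2 + ... + 1/m."""
    p = np.asarray(p, dtype=float)
    m = p.size
    c_m = np.sum(1.0 / np.arange(1, m + 1))
    order = np.argsort(p)
    thresholds = np.arange(1, m + 1) * alpha / (m * c_m)
    below = np.nonzero(p[order] <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size:
        reject[order[: below[-1] + 1]] = True
    return reject

# Illustrative p-values only (not taken from the paper).
pvals = [0.001, 0.012, 0.03, 0.04, 0.5]
print(bonferroni(pvals), holm(pvals), benjamini_yekutieli(pvals))
```

None of the three needs a model of the correlations between p-values, which is the property the power analysis relies on.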

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The derived p-value weights could be applied in meta-analyses to adjust for potential publication bias across multiple studies.
  • Extending the uniform correlation prior to high-dimensional settings might support power planning in genomics or neuroimaging where tests are numerous and dependent.
  • Combining this predictive power method with the DP-MTP could yield a fully Bayesian pipeline for both testing and study design under arbitrary dependence.

Load-bearing premise

The uniform prior distribution for correlation matrices representing arbitrary dependencies between p-values adequately captures the unknown dependence structure.
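
One standard way to draw from a uniform distribution over correlation matrices is the C-vine construction of Lewandowski, Kurowicka, and Joe (the LKJ distribution with shape parameter 1). The sketch below assumes that construction; this page does not say which sampler the paper uses, so the function name and the choice of method are assumptions for illustration.

```python
import numpy as np

def sample_uniform_correlation(d, rng):
    """Draw one d x d correlation matrix from LKJ(eta=1), i.e. uniformly
    over valid correlation matrices, via the C-vine method."""
    beta = 1.0 + (d - 1) / 2.0          # eta + (d - 1) / 2 with eta = 1
    partial = np.zeros((d, d))          # partial correlations on the vine
    corr = np.eye(d)
    for k in range(d - 1):
        beta -= 0.5
        for i in range(k + 1, d):
            # Partial correlation of (k, i) given variables 0..k-1,
            # Beta(beta, beta) rescaled to (-1, 1).
            partial[k, i] = 2.0 * rng.beta(beta, beta) - 1.0
            r = partial[k, i]
            # Convert the partial correlation back to a raw correlation.
            for l in range(k - 1, -1, -1):
                r = (r * np.sqrt((1 - partial[l, i] ** 2) * (1 - partial[l, k] ** 2))
                     + partial[l, i] * partial[l, k])
            corr[k, i] = corr[i, k] = r
    return corr

rng = np.random.default_rng(0)
R = sample_uniform_correlation(4, rng)
print(np.round(R, 3))
print("smallest eigenvalue:", np.linalg.eigvalsh(R).min())
```

Note that "uniform over matrices" does not mean each pairwise correlation is uniform on (-1, 1): as the dimension grows, the marginal distribution of each entry concentrates around zero, which is one reason the premise deserves scrutiny.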

What would settle it

The central claim would be challenged by a concrete replication or interim study in which, once the correlation matrix has been estimated from the collected data, the observed power deviates substantially from the power the method predicted.

Original abstract

Many statistical problems can be addressed by applying a multiple testing procedure (MTP) that controls either the Family-wise Error Rate (FWER) or False Discovery Rate (FDR) under unknown arbitrarily-interdependent $p$-values, without explicitly modeling these inter-correlations. They include the FWER-controlling Bonferroni (1936) MTP and Holm (1979) MTP; the FDR-controlling Benjamini and Yekutieli (2001) MTP; and the DP-MTP (Karabatsos, 2025), based on a Dirichlet process (DP) prior distribution supporting the entire space of MTPs that control either the FWER or FDR. For such an MTP, this study introduces a new and congenial method for Bayesian predictive power analysis, for power calculation and sample size determination for any given planned future (e.g., replication or interim) study. This novel MTP predictive power analysis method is based on a joint prior distribution defining a scale matrix mixture of asymmetric multivariate normal mean-variance mixture distributions, factorized as a general prior distribution for effect sizes (e.g., obtained from expert judgment or results of prior studies), and a uniform prior distribution for correlation matrices representing arbitrary dependencies between $p$-values of test statistics of given multiple hypothesis tests under their alternative hypotheses. The new MTP power analysis method also results in $p$-value weights which can be used to minimize the relative impacts of and assess for significance-chasing biases (e.g., publication bias, $p$-hacking, etc.) in multiple testing, without needing to assume that $p$-values (effect sizes) are independent. The new simulation-based MTP predictive power analysis method is illustrated through the analysis of $p$-values obtained by a famous study of lead exposure and re-analyzed by the previous MTP literature, using R package bnpMTP.
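
The simulation idea described in the abstract can be sketched roughly as follows. This is a simplified illustration, not the paper's method: it assumes independent normal priors on standardized effect sizes (in place of the asymmetric mean-variance mixture), z-statistics with mean sqrt(n) times the effect, an LKJ(eta=1) draw for the correlation matrix, and the Holm procedure. All function names and numeric inputs are hypothetical.

```python
import numpy as np
from math import erfc, sqrt

def sample_uniform_correlation(d, rng):
    """LKJ(eta=1) draw via the C-vine method (assumed sampler)."""
    beta = 1.0 + (d - 1) / 2.0
    partial, corr = np.zeros((d, d)), np.eye(d)
    for k in range(d - 1):
        beta -= 0.5
        for i in range(k + 1, d):
            partial[k, i] = 2.0 * rng.beta(beta, beta) - 1.0
            r = partial[k, i]
            for l in range(k - 1, -1, -1):
                r = (r * np.sqrt((1 - partial[l, i] ** 2) * (1 - partial[l, k] ** 2))
                     + partial[l, i] * partial[l, k])
            corr[k, i] = corr[i, k] = r
    return corr

def holm_reject(p, alpha):
    """Holm step-down rejections for a vector of p-values."""
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= alpha / (m - np.arange(m))
    k = m if passed.all() else int(np.argmin(passed))
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

def predictive_power(prior_mean, prior_sd, n, alpha=0.05, draws=2000, seed=0):
    """Monte Carlo estimate of the per-hypothesis rejection probability,
    averaged over the effect-size prior and the correlation prior."""
    rng = np.random.default_rng(seed)
    prior_mean = np.asarray(prior_mean, dtype=float)
    prior_sd = np.asarray(prior_sd, dtype=float)
    m = prior_mean.size
    hits = np.zeros(m)
    for _ in range(draws):
        delta = rng.normal(prior_mean, prior_sd)        # effect sizes from the prior
        corr = sample_uniform_correlation(m, rng)       # arbitrary dependence
        z = rng.multivariate_normal(np.sqrt(n) * delta, corr)
        p = np.array([erfc(abs(v) / sqrt(2.0)) for v in z])  # two-sided p-values
        hits += holm_reject(p, alpha)
    return hits / draws

# Hypothetical planning inputs: three tests, prior effect 0.3 with sd 0.05.
for n in (20, 50, 200):
    print(n, np.round(predictive_power([0.3] * 3, [0.05] * 3, n, draws=500), 3))
```

Sample size determination then amounts to increasing n until the estimated power reaches a chosen target, with the Monte Carlo error of each estimate kept in view.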

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a Bayesian predictive power analysis method for multiple testing procedures (MTPs) controlling FWER or FDR under arbitrary dependence among p-values. It builds on the DP-MTP (Karabatsos 2025) by defining a joint prior as a general effect-size prior (from expert judgment or prior studies) factorized with a uniform prior on correlation matrices to represent arbitrary dependencies between test statistics under alternatives. The method yields predictive power calculations and sample-size recommendations for future studies (e.g., replications) as well as p-value weights to assess significance-chasing biases, without assuming independence. The approach is illustrated via re-analysis of lead-exposure p-values using the bnpMTP R package.

Significance. If the central derivations hold and the uniform prior is rigorously constructed, the work could provide a practical tool for power analysis and sample-size planning in settings with dependent multiple tests, such as environmental epidemiology or genomics. It extends the DP-MTP framework with predictive elements and bias-assessment weights. The simulation-based implementation and use of an existing package are positive features, though the reliance on external priors for effect sizes creates a circularity that must be addressed for the predictive claims to be convincing.

major comments (2)
  1. [Methods, joint prior distribution] Methods section on the joint prior: The claim that the uniform prior on correlation matrices represents 'arbitrary dependence' without explicit modeling is load-bearing for the predictive power results. The manuscript must specify the exact construction (e.g., sampling on the manifold of PSD matrices with unit diagonal, or via a valid measure such as the uniform distribution induced by the Cholesky factor or hyperspherical coordinates) to ensure positive semi-definiteness and uniformity; otherwise the integrated posterior predictive distributions for power and p-value weights become sensitive to an unstated implementation choice.
  2. [Illustration section] Illustration and validation: The re-analysis of the lead-exposure data and any accompanying simulations should include explicit checks against known analytic cases (e.g., independent tests or simple equicorrelated structures) and report the effective sample size or convergence diagnostics for the Monte Carlo integration over the scale-matrix mixture. Without such validation, it is unclear whether the reported power and sample-size results are correctly derived or restate the input priors.
minor comments (2)
  1. [Abstract] The abstract refers to a 'scale matrix mixture of asymmetric multivariate normal mean-variance mixture distributions' without a brief definition or reference; add one sentence of clarification in the main text for readers unfamiliar with the construction.
  2. [Software and computation] Ensure that the R package bnpMTP version and any custom code for the uniform correlation prior are cited or made available so that the simulation-based method is fully reproducible.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which help clarify key aspects of our Bayesian predictive power analysis for multiple testing procedures under arbitrary dependence. We address each major comment below and will incorporate revisions to strengthen the manuscript.

point-by-point responses
  1. Referee: [Methods, joint prior distribution] Methods section on the joint prior: The claim that the uniform prior on correlation matrices represents 'arbitrary dependence' without explicit modeling is load-bearing for the predictive power results. The manuscript must specify the exact construction (e.g., sampling on the manifold of PSD matrices with unit diagonal, or via a valid measure such as the uniform distribution induced by the Cholesky factor or hyperspherical coordinates) to ensure positive semi-definiteness and uniformity; otherwise the integrated posterior predictive distributions for power and p-value weights become sensitive to an unstated implementation choice.

    Authors: We agree that an explicit specification of the uniform prior construction is essential for rigor and reproducibility. In the revised Methods section, we will add a detailed description of the sampling procedure using hyperspherical coordinates on the manifold of correlation matrices (ensuring unit diagonal and positive semi-definiteness). This will explicitly show how the uniform measure is induced and how it integrates with the scale-matrix mixture to yield the predictive power and p-value weights, addressing any sensitivity concerns. revision: yes

  2. Referee: [Illustration section] Illustration and validation: The re-analysis of the lead-exposure data and any accompanying simulations should include explicit checks against known analytic cases (e.g., independent tests or simple equicorrelated structures) and report the effective sample size or convergence diagnostics for the Monte Carlo integration over the scale-matrix mixture. Without such validation, it is unclear whether the reported power and sample-size results are correctly derived or restate the input priors.

    Authors: We thank the referee for highlighting the need for validation. In the revised illustration section, we will include direct comparisons of the computed predictive powers and sample-size recommendations against closed-form analytic results for the independent-tests case and for equicorrelated structures. We will also report effective sample sizes and standard Monte Carlo convergence diagnostics (e.g., Gelman-Rubin statistics and trace diagnostics) for the integrations over the scale-matrix mixture. These additions will confirm that the results are properly derived from the joint prior rather than simply echoing the input effect-size distributions. revision: yes
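
The kind of check the referee asks for can be illustrated on the simplest analytic case: independent two-sided z-tests under Bonferroni, where per-test power has a closed form. The sketch below compares that closed form with a Monte Carlo estimate and reports the Monte Carlo standard error. The fixed effect size, sample size, and function names are hypothetical, and this is not the validation the authors promise, only an example of its shape.

```python
import numpy as np
from statistics import NormalDist

def analytic_bonferroni_power(delta, n, m, alpha=0.05):
    """Closed-form power of one two-sided z-test at level alpha / m,
    with standardized effect delta and sample size n."""
    nd = NormalDist()
    crit = nd.inv_cdf(1.0 - alpha / (2.0 * m))
    mu = float(np.sqrt(n) * delta)
    return nd.cdf(mu - crit) + nd.cdf(-mu - crit)

def simulated_bonferroni_power(delta, n, m, alpha=0.05, draws=20000, seed=1):
    """Monte Carlo power for m independent tests, plus its standard error."""
    rng = np.random.default_rng(seed)
    crit = NormalDist().inv_cdf(1.0 - alpha / (2.0 * m))
    z = rng.normal(np.sqrt(n) * delta, 1.0, size=(draws, m))
    reject = np.abs(z) > crit
    estimate = reject.mean()
    # Binomial standard error; valid here because all draws are independent.
    std_err = np.sqrt(estimate * (1.0 - estimate) / reject.size)
    return estimate, std_err

exact = analytic_bonferroni_power(delta=0.3, n=50, m=5)
estimate, std_err = simulated_bonferroni_power(delta=0.3, n=50, m=5)
print(f"analytic {exact:.4f}  simulated {estimate:.4f}  MC s.e. {std_err:.4f}")
```

Agreement within a few standard errors on cases like this one is a necessary check on the simulation code, though it says nothing about whether the uniform correlation prior is appropriate.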

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained Bayesian extension

The paper defines a joint prior (general effect-size prior times uniform prior on correlation matrices) and uses it to compute posterior predictive power and sample-size quantities for a planned future study. This is the standard Bayesian predictive mechanism and does not reduce any claimed prediction to its inputs by construction. The reference to Karabatsos (2025) supplies the DP-MTP framework but is not invoked as a uniqueness theorem or to justify the new power-analysis formulas; the current contribution (predictive distributions, p-value weights, simulation method) adds independent content. The uniform prior on correlation matrices is an explicit modeling choice whose consequences can be checked against external benchmarks or alternative dependence structures. No self-definitional, fitted-input, or ansatz-smuggling steps appear in the provided derivation chain.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the choice of a uniform prior over correlation matrices to represent arbitrary dependence and on the DP prior from the author's earlier work; these are domain assumptions rather than derived quantities.

free parameters (1)
  • parameters of the scale matrix mixture distribution
    Chosen or elicited from expert judgment or prior studies to define the joint prior on effect sizes and correlations.
axioms (1)
  • domain assumption Uniform prior over correlation matrices represents the entire space of possible arbitrary dependencies among p-values under the alternative
    Invoked to allow the method to handle unknown inter-correlations without explicit modeling.

pith-pipeline@v0.9.0 · 5871 in / 1468 out tokens · 60680 ms · 2026-05-21T11:32:19.454306+00:00 · methodology
