Family-wise Error Rate Control with E-values
Pith reviewed 2026-05-23 05:15 UTC · model grok-4.3
The pith
E-value closed testing strongly controls post-hoc family-wise error rates and provides anytime-valid guarantees in sequential settings.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
E-value-based closed testing strongly controls the post-hoc family-wise error rate in the static setting and inherits anytime-valid and always-valid FWER-controlling properties in the sequential setting. The graphical approach, extended to use the weighted average of e-values as the local test statistic, is strictly more powerful than the weighted Bonferroni procedure that treats inverse e-values as p-values. Polynomial-time algorithms exist via dynamic programming for any directed acyclic graph, and tailored algorithms exist for the special cases of the e-Holm and e-Fallback procedures.
What carries the argument
The e-value-based closed testing framework, which applies the closure principle directly to collections of e-values rather than p-values and thereby transfers FWER control.
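To make the closure principle concrete, here is a minimal brute-force sketch in Python. It assumes, purely for illustration, that every intersection hypothesis is tested with the equal-weight arithmetic mean of its e-values and rejected when that mean reaches 1/alpha. The function name `closed_test_mean_e` and this choice of local test are assumptions, not the paper's algorithm, and the cost is exponential in the number of hypotheses.

```python
from itertools import combinations


def closed_test_mean_e(e_values, alpha=0.05):
    """Brute-force closed testing with an equal-weight mean as the local e-value.

    Hypothetical illustration: H_i is rejected only if every subset I that
    contains i has mean(e_j for j in I) >= 1/alpha.
    """
    m = len(e_values)
    threshold = 1.0 / alpha
    rejected = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        all_local_tests_reject = True
        # k is the number of other hypotheses intersected with H_i.
        for k in range(m):
            for extra in combinations(others, k):
                subset = (i,) + extra
                local_e = sum(e_values[j] for j in subset) / len(subset)
                if local_e < threshold:
                    all_local_tests_reject = False
                    break
            if not all_local_tests_reject:
                break
        if all_local_tests_reject:
            rejected.append(i)
    return rejected
```

With `alpha=0.05` the threshold is 20, so `closed_test_mean_e([100, 50, 2])` rejects the first two hypotheses, while `closed_test_mean_e([30, 5])` rejects nothing because the pair's mean of 17.5 falls below the threshold.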
If this is right
- Any collection of valid e-values can be turned into a post-hoc FWER-controlling procedure without further model assumptions.
- The same procedure remains valid at every stopping time in a sequential experiment.
- The graphical extension yields higher power than converting e-values to p-values for the same local tests.
- Dynamic programming computes the e-value graphical procedure in polynomial time for arbitrary DAGs.
- Tailored algorithms exist for the e-Holm and e-Fallback special cases.
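The power claim in the list above can be illustrated with a small Python comparison of the two local tests for a single intersection hypothesis. The function names and the example numbers are hypothetical. The weights are assumed to be fixed, non-negative, and to sum to one, and the converted p-value is taken to be 1/e_i, so the Bonferroni condition 1/e_i <= w_i * alpha is written in the equivalent form w_i * e_i >= 1/alpha.

```python
def weighted_average_e_test(e_values, weights, alpha=0.05):
    """Reject when the weighted average of the e-values reaches 1/alpha."""
    combined = sum(w * e for w, e in zip(weights, e_values))
    return combined >= 1.0 / alpha


def inverse_e_bonferroni_test(e_values, weights, alpha=0.05):
    """Weighted Bonferroni on p_i = 1/e_i.

    p_i <= w_i * alpha is equivalent to w_i * e_i >= 1/alpha, so the test
    rejects only if a single weighted e-value reaches the threshold alone.
    """
    return any(w * e >= 1.0 / alpha for w, e in zip(weights, e_values))


# Hypothetical example: two moderately large e-values with equal weights.
e_values = [30.0, 30.0]
weights = [0.5, 0.5]
average_rejects = weighted_average_e_test(e_values, weights)       # 30 >= 20
bonferroni_rejects = inverse_e_bonferroni_test(e_values, weights)  # 15 < 20
```

In this example the weighted average rejects and the Bonferroni conversion does not. The reverse cannot happen, because a sum of non-negative terms is at least as large as its largest term.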
Where Pith is reading between the lines
- The framework could be applied directly in universal-inference settings where only e-values are available for irregular models.
- It opens the possibility of combining e-values from different sequential experiments while preserving overall FWER control.
- Researchers could examine whether the reported power advantage appears in finite-sample simulations with dependent test statistics.
Load-bearing premise
The supplied e-values must satisfy the validity conditions (expectation at most 1 under the null in the static setting, the supermartingale property in the sequential setting) that let the closed-testing argument carry over without extra dependence restrictions.
What would settle it
A concrete counter-example in which a collection of individually valid e-values fails the supermartingale property and the resulting closed-testing procedure produces FWER greater than alpha on a fixed collection of hypotheses.
Original abstract
The closure principle is a standard tool for achieving strong family-wise error rate (FWER) control in multiple testing problems. We develop an e-value-based closed testing framework that inherits nice properties of e-values, which are common in settings of sequential hypothesis testing or universal inference for irregular parametric models. We prove that e-value-based closed testing strongly controls the post-hoc FWER in the static setting, and has stronger anytime-valid and always-valid FWER-controlling properties in the sequential setting. Furthermore, we extend the celebrated graphical approach for FWER control (Bretz et al. 2009), using the weighted average of e-values for the local test, a strictly more powerful approach than weighted Bonferroni local tests with inverse e-values as p-values. In general, the computational cost for closed testing can be exponential in the number of hypotheses. Although the computational shortcuts for the p-value-based graphical approach are not applicable, we develop an efficient polynomial-time algorithm using dynamic programming for e-value-based graphical approaches with any directed acyclic graph, and tailored algorithms for the e-Holm procedure previously studied by Vovk and Wang (2021) and the e-Fallback procedure.
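To show why the exponential cost is avoidable in simple cases, the sketch below handles the equal-weight arithmetic-mean local test, which is assumed here to correspond to the e-Holm setting. It is not claimed to be the paper's tailored algorithm, and the name `mean_e_closed_test_shortcut` is hypothetical. It relies on one observation: among all subsets of a given size that contain hypothesis i, the mean is smallest when the remaining members are the smallest other e-values, so only m nested subsets per hypothesis need checking.

```python
def mean_e_closed_test_shortcut(e_values, alpha=0.05):
    """Quadratic-time closed testing for the equal-weight mean local test.

    Hypothetical sketch: for each i, add the other e-values in increasing
    order and check that the running mean never drops below 1/alpha.
    """
    m = len(e_values)
    threshold = 1.0 / alpha
    # Indices ordered from smallest to largest e-value.
    order = sorted(range(m), key=lambda j: e_values[j])
    rejected = []
    for i in range(m):
        running_sum = e_values[i]
        size = 1
        ok = running_sum >= threshold
        for j in order:
            if not ok:
                break
            if j == i:
                continue
            # Worst-case subset of this size: i plus the smallest others.
            running_sum += e_values[j]
            size += 1
            ok = running_sum / size >= threshold
        if ok:
            rejected.append(i)
    return rejected
```

For example, `mean_e_closed_test_shortcut([100, 50, 2])` returns `[0, 1]` at `alpha=0.05`, and `mean_e_closed_test_shortcut([40, 40, 1, 1])` returns `[]` because adding the two small e-values pulls the worst-case mean down to 14.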
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops an e-value-based closed testing framework for strong family-wise error rate (FWER) control. It proves that this framework achieves strong post-hoc FWER control in the static setting and stronger anytime-valid and always-valid FWER control in the sequential setting. It extends the graphical multiple testing procedure by using weighted averages of e-values as local tests (claimed to be strictly more powerful than weighted Bonferroni tests based on inverse e-values as p-values), and introduces a polynomial-time dynamic programming algorithm for the graphical approach on any DAG along with tailored algorithms for the e-Holm and e-Fallback procedures.
Significance. If the central claims hold, the work meaningfully extends closed testing and graphical methods to the e-value setting, inheriting e-values' advantages for sequential analysis and universal inference while delivering stronger validity guarantees than standard p-value approaches. The algorithmic contributions directly address the exponential cost of closed testing, enhancing practicality. The stress-test concern regarding supermartingale validity and dependence restrictions does not land: linearity of expectation ensures the weighted average remains a valid e-value (E[weighted avg] ≤ 1) under any intersection null and arbitrary dependence, and the supermartingale property is likewise preserved by linearity when each component is a supermartingale.
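The linearity argument mentioned above can be written out in two lines. The notation is assumed for illustration and does not come from the paper: I is an index set, the weights w_i are fixed, non-negative, and sum to one over I, and each e_i is a non-negative random variable with expectation at most 1 under H_i.

```latex
% Assumed notation (not from the source): fixed weights w_i >= 0 with
% \sum_{i \in I} w_i = 1, and e-values e_i >= 0 with E_P[e_i] <= 1 under H_i.
\begin{align*}
\mathbb{E}_P\Big[\sum_{i \in I} w_i e_i\Big]
  &= \sum_{i \in I} w_i \, \mathbb{E}_P[e_i]
   \le \sum_{i \in I} w_i = 1
   \qquad \text{for every } P \in H_I = \bigcap_{i \in I} H_i, \\
P\Big(\sum_{i \in I} w_i e_i \ge \tfrac{1}{\alpha}\Big)
  &\le \alpha \, \mathbb{E}_P\Big[\sum_{i \in I} w_i e_i\Big] \le \alpha
   \qquad \text{(Markov's inequality)}.
\end{align*}
```

No step uses the joint distribution of the e_i, which is why the local test stays valid under arbitrary dependence.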
major comments (2)
- [Abstract / graphical approach extension] Abstract and graphical extension section: the claim that weighted averages of e-values yield a 'strictly more powerful' local test than weighted Bonferroni with inverse e-values as p-values is central to the extension's value, yet no explicit dominance theorem, power comparison, or counterexample-free argument is referenced to establish when and why the improvement occurs while preserving validity.
- [Sequential setting and algorithm sections] Sequential setting proofs: while linearity guarantees validity of the local e-value tests, the manuscript must explicitly verify that the dynamic programming algorithm and weighting scheme preserve the supermartingale property (E[· | filtration] ≤ previous value) without introducing implicit restrictions on the joint distribution or filtration that would prevent direct transfer of the always-valid FWER control from the p-value case.
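As a hedged sketch of the kind of dominance argument the first major comment asks for, the two rejection regions can be compared directly. The notation is assumed for illustration: fixed weights w_i that are non-negative and sum to one, positive e-values e_i, level alpha in (0, 1), and converted p-values p_i = 1/e_i.

```latex
% Assumed notation (not from the source): w_i >= 0, \sum_{i \in I} w_i = 1,
% e_i > 0, and p_i = 1/e_i as the converted p-value.
\begin{align*}
\text{weighted Bonferroni rejects } H_I
  &\iff \exists\, i \in I:\ \frac{1}{e_i} \le w_i \alpha
   \iff \max_{i \in I} w_i e_i \ge \frac{1}{\alpha}, \\
\text{weighted average rejects } H_I
  &\iff \sum_{i \in I} w_i e_i \ge \frac{1}{\alpha}, \\
\sum_{i \in I} w_i e_i
  &\ge \max_{i \in I} w_i e_i .
\end{align*}
```

The last inequality is strict whenever at least two of the products w_i e_i are positive, so the weighted-average rejection region contains the Bonferroni one and is strictly larger outside such degenerate cases.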
minor comments (3)
- [Abstract / Introduction] The abstract is information-dense; early introduction of notation (e.g., definition of weighted average e-value, distinction between post-hoc vs. standard FWER) would improve readability.
- [References] Full bibliographic details for Bretz et al. (2009) and Vovk and Wang (2021) should appear in the reference list with consistent formatting.
- [Algorithm section] Figure or pseudocode for the dynamic programming algorithm would clarify the polynomial-time claim and its dependence on the DAG structure.
Simulated Author's Rebuttal
We thank the referee for the careful reading and the recommendation of minor revision. The comments are helpful for strengthening the presentation of the graphical extension and the sequential properties. We address each major comment below and will incorporate clarifications in the revised version.
Point-by-point responses
- Referee: [Abstract / graphical approach extension] Abstract and graphical extension section: the claim that weighted averages of e-values yield a 'strictly more powerful' local test than weighted Bonferroni with inverse e-values as p-values is central to the extension's value, yet no explicit dominance theorem, power comparison, or counterexample-free argument is referenced to establish when and why the improvement occurs while preserving validity.
  Authors: We agree that an explicit comparison strengthens the claim. Validity of the weighted-average local e-value follows immediately from linearity of expectation under any dependence. For the power comparison, the weighted e-value test rejects when the weighted average reaches the threshold 1/alpha, whereas the inverse-e weighted Bonferroni test rejects only when a single weighted e-value reaches that threshold on its own. We will add a short proposition in Section 4 showing that the e-value local test rejects whenever the inverse-e Bonferroni test does (with identical rejection regions only in degenerate cases) while preserving validity, thereby establishing the strict improvement in non-degenerate settings.
  revision: yes
- Referee: [Sequential setting and algorithm sections] Sequential setting proofs: while linearity guarantees validity of the local e-value tests, the manuscript must explicitly verify that the dynamic programming algorithm and weighting scheme preserve the supermartingale property (E[· | filtration] ≤ previous value) without introducing implicit restrictions on the joint distribution or filtration that would prevent direct transfer of the always-valid FWER control from the p-value case.
  Authors: We thank the referee for this suggestion. Because the dynamic programming recursion and the graph-based weighting consist solely of linear combinations (with fixed, non-random weights) of the underlying e-values, the conditional-expectation property is inherited directly: if each component process is a supermartingale, any fixed linear combination remains a supermartingale. The algorithm introduces no data-dependent reweighting or additional measurability requirements. We will insert a brief lemma or remark in the sequential section making this preservation explicit and confirming that the always-valid FWER control transfers without further restrictions on the filtration or dependence structure.
  revision: yes
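The preservation step described in the second response can be sketched as follows. The notation is assumed for illustration and is not taken from the paper: each (M^i_t) is a non-negative supermartingale with respect to a common filtration (F_t) under every P in H_I, with starting value M^i_0 at most 1, and the weights w_i are fixed, non-negative, and sum to one.

```latex
% Assumed notation (not from the source): non-negative supermartingales
% (M^i_t) adapted to (\mathcal{F}_t) with M^i_0 <= 1, fixed weights w_i >= 0
% summing to 1 over I.
\begin{align*}
\mathbb{E}_P\Big[\sum_{i \in I} w_i M^i_t \,\Big|\, \mathcal{F}_{t-1}\Big]
  &= \sum_{i \in I} w_i \, \mathbb{E}_P\big[M^i_t \mid \mathcal{F}_{t-1}\big]
   \le \sum_{i \in I} w_i M^i_{t-1}, \\
P\Big(\exists\, t \ge 0:\ \sum_{i \in I} w_i M^i_t \ge \tfrac{1}{\alpha}\Big)
  &\le \alpha \sum_{i \in I} w_i M^i_0 \le \alpha
   \qquad \text{(Ville's inequality)}.
\end{align*}
```

The first line needs the weights to be non-random, or at least measurable with respect to F_{t-1}; data-dependent weights chosen at time t would break the equality.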
Circularity Check
No circularity: proofs transfer closure principle to e-values under external validity conditions
full rationale
The derivation applies the standard closure principle to e-values satisfying supermartingale conditions, proves post-hoc/anytime/always-valid FWER control, and extends the Bretz et al. (2009) graphical approach via weighted e-value averages plus new DP algorithms. All steps rely on independent mathematical properties of e-values and prior external results (Bretz et al. 2009; Vovk and Wang 2021); no equation reduces by construction to a fitted input, self-definition, or load-bearing self-citation chain.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: E-values are valid (non-negative with expectation at most 1 under the null; non-negative supermartingales in the sequential setting)
- domain assumption: Closed testing principle applies to e-values to achieve FWER control
Forward citations
Cited by 3 Pith papers
- The E-measure
  E-measures generalize E-values to intersection-closed hypothesis classes, yielding uniform evidence bounds, automatic familywise evidence control without multiplicity correction, and a frequentist E-prior to E-posteri...
- Generalized Boundary FDR Control under Arbitrary Dependence: An Approach on Closure Principle
  Domino guarantees k-bFDR control under arbitrary dependence via the closure principle, extending boundary FDR methods to general settings for both p-values and e-values.
- Weighted Holm Procedures: Theory, Properties, and Recommendations
  The weighted Holm procedure (WHP) based on ordered weighted p-values is uniformly more powerful than the weighted alternative Holm procedure (WAP) based on ordered raw p-values, with stronger optimality properties und...