Family-wise Error Rate Control with E-values

Lihua Lei; Will Hartog

arXiv: 2501.09015 · v4 · pith:UULVCLLK · submitted 2025-01-15 · 📊 stat.ME

Pith reviewed 2026-05-23 05:15 UTC · model grok-4.3

classification 📊 stat.ME

keywords: e-values · closed testing · family-wise error rate · multiple testing · graphical approach · sequential testing · dynamic programming

The pith

E-value closed testing strongly controls post-hoc family-wise error rates and provides anytime-valid guarantees in sequential settings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a closed testing procedure that replaces p-values with e-values to control the family-wise error rate across multiple hypotheses. It proves strong post-hoc FWER control in fixed-sample settings and shows that the same construction yields anytime-valid and always-valid control when data arrive sequentially. The authors also adapt the graphical multiple-testing method so that local tests use weighted averages of e-values, which they prove is strictly more powerful than converting e-values to p-values and applying weighted Bonferroni. Efficient dynamic-programming algorithms are supplied that keep the procedures polynomial-time, even though general closed testing can require time exponential in the number of hypotheses. These results matter for analysts who already possess valid e-values from sequential or universal-inference settings and want to combine them into a single FWER-controlling report.

Core claim

E-value-based closed testing strongly controls the post-hoc family-wise error rate in the static setting and inherits anytime-valid and always-valid FWER-controlling properties in the sequential setting. Extending the graphical approach by taking the weighted average of e-values as the local test statistic is strictly more powerful than the weighted Bonferroni procedure that uses inverse e-values as p-values. Polynomial-time algorithms exist via dynamic programming for any directed acyclic graph and for the special cases of the e-Holm and e-Fallback procedures.

What carries the argument

The e-value-based closed testing framework, which applies the closure principle directly to collections of e-values rather than p-values and thereby transfers FWER control.
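A minimal Python sketch of the closure principle applied to e-values, assuming (our choice, for illustration only) that the local test for an intersection H_I uses the unweighted mean of the e-values in I. The function names `mean_local_e`, `closed_adjusted_e`, and `closed_reject` are hypothetical, and the loop over all 2^K - 1 intersections is exactly the exponential cost the paper's algorithms are designed to avoid. Because each rejection only compares one adjusted e-value with 1/α, the level can be picked after looking at the data, which is our reading of the post-hoc guarantee.

```python
from itertools import combinations


def mean_local_e(subset, e):
    """Local e-value for H_I: the unweighted mean of e_j over j in I (one possible choice)."""
    return sum(e[j] for j in subset) / len(subset)


def closed_adjusted_e(e, local_e=mean_local_e):
    """Brute-force e-value closed testing over all 2^K - 1 non-empty intersections.

    The adjusted e-value of H_i is the smallest local e-value among intersections
    containing i, so H_i is rejected at level alpha iff it is at least 1/alpha.
    """
    k = len(e)
    adjusted = [float("inf")] * k
    for size in range(1, k + 1):
        for subset in combinations(range(k), size):
            value = local_e(subset, e)
            for i in subset:
                adjusted[i] = min(adjusted[i], value)
    return adjusted


def closed_reject(e, alpha, local_e=mean_local_e):
    """Rejection decisions at a level alpha that may be chosen after seeing the data."""
    return [a >= 1 / alpha for a in closed_adjusted_e(e, local_e)]


e = [40.0, 25.0, 2.0]
print(closed_adjusted_e(e))    # [21.0, 13.5, 2.0]
print(closed_reject(e, 0.05))  # [True, False, False]
print(closed_reject(e, 0.10))  # [True, True, False]
```

Passing a different `local_e`, such as a weighted average, gives other members of the same family; each local statistic only needs expectation at most 1 under H_I.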

If this is right

Any collection of valid e-values can be turned into a post-hoc FWER-controlling procedure without further model assumptions.
The same procedure remains valid at every stopping time in a sequential experiment.
The graphical extension yields higher power than converting e-values to p-values for the same local tests.
Dynamic programming computes the e-value graphical procedure in polynomial time for arbitrary DAGs.
Tailored algorithms exist for the e-Holm and e-Fallback special cases.
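For the equal-weight case, the kind of speedup a tailored algorithm can exploit is easy to illustrate. This sketch assumes the e-Holm local test is the unweighted mean of the e-values in I (our reading of Vovk and Wang (2021), not code from the paper): for a fixed intersection size k, the smallest mean among sets containing H_i adds the k - 1 smallest other e-values, so each adjusted e-value needs K candidates instead of 2^(K-1). The helper names are hypothetical, and the script checks the shortcut against brute force on random inputs.

```python
import random
from itertools import combinations


def e_holm_adjusted(e):
    """O(K^2) shortcut for closed testing with unweighted-mean local tests.

    For a fixed intersection size k, the smallest mean over sets containing i is
    attained by adding the k - 1 smallest other e-values, so only K candidate
    sets per hypothesis need to be checked instead of 2^(K-1).
    """
    order = sorted(range(len(e)), key=lambda j: e[j])
    adjusted = []
    for i in range(len(e)):
        running = best = e[i]  # k = 1: the singleton {i}
        others = (e[j] for j in order if j != i)  # other e-values, ascending
        for k, x in enumerate(others, start=2):
            running += x
            best = min(best, running / k)
        adjusted.append(best)
    return adjusted


def brute_force_adjusted(e):
    """Exponential reference: minimum mean over every intersection containing i."""
    adjusted = [float("inf")] * len(e)
    for size in range(1, len(e) + 1):
        for subset in combinations(range(len(e)), size):
            value = sum(e[j] for j in subset) / size
            for i in subset:
                adjusted[i] = min(adjusted[i], value)
    return adjusted


rng = random.Random(1)
for _ in range(100):
    e = [rng.expovariate(0.2) for _ in range(rng.randint(1, 8))]
    fast, slow = e_holm_adjusted(e), brute_force_adjusted(e)
    assert all(abs(a - b) <= 1e-9 * max(1.0, b) for a, b in zip(fast, slow))
print(e_holm_adjusted([40.0, 25.0, 2.0]))  # [21.0, 13.5, 2.0]
```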

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework could be applied directly in universal-inference settings where only e-values are available for irregular models.
It opens the possibility of combining e-values from different sequential experiments while preserving overall FWER control.
Researchers could examine whether the reported power advantage appears in finite-sample simulations with dependent test statistics.

Load-bearing premise

The supplied e-values must satisfy the supermartingale or other validity conditions that let the closed-testing argument carry over without extra dependence restrictions.
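The reason the supermartingale condition carries the sequential claims is Ville's inequality, a standard result stated here as background rather than as the paper's own proof. If (M_t) is a non-negative supermartingale under the intersection null H_I with E[M_0] ≤ 1, then

```latex
\mathbb{P}_{H_I}\left(\exists\, t \ge 0 : M_t \ge \frac{1}{\alpha}\right) \le \alpha\, \mathbb{E}_{H_I}[M_0] \le \alpha .
```

So a local test that rejects H_I as soon as its e-process reaches 1/α has type I error at most α at every stopping time. In closed testing, any false rejection requires the local test for the intersection of all true nulls to fire, which is how, on our reading, this bound becomes anytime-valid FWER control.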

What would settle it

A concrete counter-example in which a collection of individually valid e-values fails the supermartingale property and the resulting closed-testing procedure produces FWER greater than alpha on a fixed collection of hypotheses.

Original abstract

The closure principle is a standard tool for achieving strong family-wise error rate (FWER) control in multiple testing problems. We develop an e-value-based closed testing framework that inherits nice properties of e-values, which are common in settings of sequential hypothesis testing or universal inference for irregular parametric models. We prove that e-value-based closed testing strongly controls the post-hoc FWER in the static setting, and has stronger anytime-valid and always-valid FWER-controlling properties in the sequential setting. Furthermore, we extend the celebrated graphical approach for FWER control (Bretz et al. 2009), using the weighted average of e-values for the local test, a strictly more powerful approach than weighted Bonferroni local tests with inverse e-values as p-values. In general, the computational cost for closed testing can be exponential in the number of hypotheses. Although the computational shortcuts for the p-value-based graphical approach are not applicable, we develop an efficient polynomial-time algorithm using dynamic programming for e-value-based graphical approaches with any directed acyclic graph, and tailored algorithms for the e-Holm procedure previously studied by Vovk and Wang (2021) and the e-Fallback procedure.
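To make the graphical extension concrete, the following Python sketch computes intersection weights with the weight-and-edge update of Bretz et al. (2009) and then applies a weighted-average e-value local test to every intersection. It is an exponential-time reference written under our own assumptions (graphs stored as nested dicts; the helper names `remove_node`, `intersection_weights`, and `graphical_adjusted_e` are hypothetical). It is not the polynomial-time dynamic program the abstract describes, whose details are not reproduced here.

```python
from itertools import combinations


def remove_node(w, g, j):
    """Bretz et al. (2009) update: drop hypothesis j, passing on its weight and edges."""
    nodes = [l for l in w if l != j]
    new_w = {l: w[l] + w[j] * g[j][l] for l in nodes}
    new_g = {l: {} for l in nodes}
    for l in nodes:
        for k in nodes:
            denom = 1.0 - g[l][j] * g[j][l]
            if l == k or denom <= 0.0:
                new_g[l][k] = 0.0
            else:
                new_g[l][k] = (g[l][k] + g[l][j] * g[j][k]) / denom
    return new_w, new_g


def intersection_weights(w, g, subset):
    """Weights for H_I, obtained by removing every hypothesis outside I."""
    for j in list(w):
        if j not in subset:
            w, g = remove_node(w, g, j)
    return w


def graphical_adjusted_e(e, w, g):
    """Exponential-time e-graphical closed testing with weighted-average local tests."""
    k = len(e)
    adjusted = [float("inf")] * k
    for size in range(1, k + 1):
        for subset in combinations(range(k), size):
            w_i = intersection_weights(w, g, set(subset))
            local = sum(w_i[j] * e[j] for j in subset)
            for i in subset:
                adjusted[i] = min(adjusted[i], local)
    return adjusted


# Holm graph on three hypotheses: equal weights, outgoing edges split evenly.
K = 3
w_holm = {i: 1.0 / K for i in range(K)}
g_holm = {i: {j: (0.0 if i == j else 1.0 / (K - 1)) for j in range(K)} for i in range(K)}
print(graphical_adjusted_e([40.0, 25.0, 2.0], w_holm, g_holm))  # approximately [21.0, 13.5, 2.0]

# Fixed-sequence graph: all weight on H_0, which passes it to H_1 once removed.
w_seq = {0: 1.0, 1: 0.0}
g_seq = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 0.0}}
print(graphical_adjusted_e([30.0, 50.0], w_seq, g_seq))  # [30.0, 30.0]
```

With the Holm graph every intersection receives weights 1/|I|, so the result matches mean-based closed testing up to rounding; with the fixed-sequence graph the adjusted e-value of H_1 is min(e_0, e_1), so H_1 cannot be rejected unless H_0 is.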

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a closed-testing procedure with e-values that adds post-hoc FWER control and anytime-valid sequential properties, plus a DP algorithm for the graphical extension.

The main thing to know is that they replace p-values with e-values inside the closed testing principle and show it still delivers strong FWER control. In the static case this works post-hoc; in the sequential case it gives anytime-valid and always-valid guarantees. They also replace the usual weighted Bonferroni local test with a weighted average of e-values inside the graphical approach, which they claim is strictly more powerful, and they supply a dynamic-programming algorithm that runs in polynomial time on any DAG for the e-value version of the graphical method. The e-Holm and e-Fallback shortcuts get tailored algorithms too. These pieces are presented as new and not reducible to the p-value results they cite. The conceptual transfer looks clean on paper, and the computational contribution is concrete. The soft spot is the transfer of the supermartingale property. Linearity of expectation takes care of the marginal validity of the weighted-average local test under any dependence, but the sequential claims require the local tests to remain supermartingales under every intersection null. The abstract asserts the proofs exist, yet without the derivations it is not possible to see whether extra model restrictions slipped in. If those proofs are tight, the claims stand; if not, the sequential advantage shrinks. This is aimed at people already working with e-values in multiple testing or sequential analysis. A reader who knows the Bretz graphical method and Vovk-Wang e-Holm will see the differences immediately. The work is grounded enough to deserve a serious referee.

Referee Report

2 major / 3 minor

Summary. The paper develops an e-value-based closed testing framework for strong family-wise error rate (FWER) control. It proves that this framework achieves strong post-hoc FWER control in the static setting and stronger anytime-valid and always-valid FWER control in the sequential setting. It extends the graphical multiple testing procedure by using weighted averages of e-values as local tests (claimed to be strictly more powerful than weighted Bonferroni tests based on inverse e-values as p-values), and introduces a polynomial-time dynamic programming algorithm for the graphical approach on any DAG along with tailored algorithms for the e-Holm and e-Fallback procedures.

Significance. If the central claims hold, the work meaningfully extends closed testing and graphical methods to the e-value setting, inheriting e-values' advantages for sequential analysis and universal inference while delivering stronger validity guarantees than standard p-value approaches. The algorithmic contributions directly address the exponential cost of closed testing, which makes the procedures practical. The stress-test concern regarding supermartingale validity and dependence restrictions does not land: linearity of expectation ensures the weighted average remains a valid e-value (E[weighted avg] ≤ 1) under any intersection null and arbitrary dependence, and, because the weights are non-negative, the supermartingale property is likewise preserved by linearity when each component is a supermartingale with respect to a common filtration.
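The linearity argument can be written out as follows, assuming non-negative weights w_j with Σ_{j∈I} w_j ≤ 1, e-values E_j that are valid under every H_j with j ∈ I (hence under H_I), and, in the sequential case, processes M_{j,t} that are supermartingales under H_I with respect to one common filtration (F_t):

```latex
\mathbb{E}_{H_I}\Big[\sum_{j \in I} w_j E_j\Big] = \sum_{j \in I} w_j\, \mathbb{E}_{H_I}[E_j] \le \sum_{j \in I} w_j \le 1,
\qquad
\mathbb{E}_{H_I}\Big[\sum_{j \in I} w_j M_{j,t+1} \,\Big|\, \mathcal{F}_t\Big] = \sum_{j \in I} w_j\, \mathbb{E}_{H_I}[M_{j,t+1} \mid \mathcal{F}_t] \le \sum_{j \in I} w_j M_{j,t}.
```

The first display needs no assumption on the joint dependence of the E_j. The second uses w_j ≥ 0 to keep each inequality pointing the same way, and it would fail for a linear combination with negative coefficients.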

major comments (2)

[Abstract / graphical approach extension] Abstract and graphical extension section: the claim that weighted averages of e-values yield a 'strictly more powerful' local test than weighted Bonferroni with inverse e-values as p-values is central to the extension's value, yet no explicit dominance theorem, power comparison, or counterexample-free argument is referenced to establish when and why the improvement occurs while preserving validity.
[Sequential setting and algorithm sections] Sequential setting proofs: while linearity guarantees validity of the local e-value tests, the manuscript must explicitly verify that the dynamic programming algorithm and weighting scheme preserve the supermartingale property (E[· | filtration] ≤ previous value) without introducing implicit restrictions on the joint distribution or filtration that would prevent direct transfer of the always-valid FWER control from the p-value case.

minor comments (3)

[Abstract / Introduction] The abstract is information-dense; early introduction of notation (e.g., definition of weighted average e-value, distinction between post-hoc vs. standard FWER) would improve readability.
[References] Full bibliographic details for Bretz et al. (2009) and Vovk and Wang (2021) should appear in the reference list with consistent formatting.
[Algorithm section] Figure or pseudocode for the dynamic programming algorithm would clarify the polynomial-time claim and its dependence on the DAG structure.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and the recommendation of minor revision. The comments are helpful for strengthening the presentation of the graphical extension and the sequential properties. We address each major comment below and will incorporate clarifications in the revised version.

Referee: [Abstract / graphical approach extension] Abstract and graphical extension section: the claim that weighted averages of e-values yield a 'strictly more powerful' local test than weighted Bonferroni with inverse e-values as p-values is central to the extension's value, yet no explicit dominance theorem, power comparison, or counterexample-free argument is referenced to establish when and why the improvement occurs while preserving validity.

Authors: We agree that an explicit comparison strengthens the claim. Validity of the weighted-average local e-value follows immediately from linearity of expectation under any dependence. For the power comparison, the weighted e-value test rejects when the weighted sum Σ w_i e_i reaches the threshold 1/α, while weighted Bonferroni with p_i = 1/e_i rejects only when a single term w_i e_i reaches it; because the weights and e-values are non-negative, the sum is never smaller than its largest term. We will add a short proposition in Section 4 showing that the e-value local test rejects whenever the inverse-e Bonferroni test does (with equality only in degenerate cases) while preserving validity, thereby establishing the strict improvement in non-degenerate settings. revision: yes
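The power comparison rests on a one-line inequality, Σ_j w_j e_j ≥ max_j w_j e_j for non-negative weights and e-values. A short Python check, with hypothetical function names and α = 0.05 chosen only for illustration, shows an input where only the weighted average rejects and confirms on random draws that it rejects whenever weighted Bonferroni with p_j = 1/e_j does:

```python
import random


def weighted_average_rejects(e, w, alpha):
    """Local e-test for H_I: reject when sum_j w_j * e_j >= 1/alpha."""
    return sum(wj * ej for wj, ej in zip(w, e)) >= 1 / alpha


def weighted_bonferroni_rejects(e, w, alpha):
    """Weighted Bonferroni with p_j = 1/e_j.

    p_j <= w_j * alpha is equivalent to w_j * e_j >= 1/alpha when e_j > 0,
    so the test fires only if a single term reaches the threshold.
    """
    return max(wj * ej for wj, ej in zip(w, e)) >= 1 / alpha


alpha, w = 0.05, [0.5, 0.5]
print(weighted_average_rejects([20.0, 20.0], w, alpha))     # True
print(weighted_bonferroni_rejects([20.0, 20.0], w, alpha))  # False

# Dominance: with non-negative e-values and weights, the sum is at least its largest term.
rng = random.Random(0)
for _ in range(10_000):
    e = [rng.expovariate(0.1) for _ in range(3)]
    if weighted_bonferroni_rejects(e, [0.2, 0.3, 0.5], alpha):
        assert weighted_average_rejects(e, [0.2, 0.3, 0.5], alpha)
```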
Referee: [Sequential setting and algorithm sections] Sequential setting proofs: while linearity guarantees validity of the local e-value tests, the manuscript must explicitly verify that the dynamic programming algorithm and weighting scheme preserve the supermartingale property (E[· | filtration] ≤ previous value) without introducing implicit restrictions on the joint distribution or filtration that would prevent direct transfer of the always-valid FWER control from the p-value case.

Authors: We thank the referee for this suggestion. Because the dynamic programming recursion and the graph-based weighting consist solely of linear combinations (with fixed, non-random, non-negative weights) of the underlying e-values, the conditional-expectation property is inherited directly: if each component process is a supermartingale with respect to a common filtration, any fixed non-negative linear combination remains a supermartingale. The algorithm introduces no data-dependent reweighting or additional measurability requirements. We will insert a brief lemma or remark in the sequential section making this preservation explicit and confirming that the always-valid FWER control transfers without further restrictions on the filtration or dependence structure. revision: yes
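The preservation claim can be probed numerically. The sketch below is an illustration under assumptions of our own (two independent Bernoulli(0.5) streams, likelihood-ratio e-processes against a hypothetical alternative q = 0.7, equal weights, a finite horizon); the function name `simulate_crossing_rate` is hypothetical. It estimates how often the averaged e-process for the global null ever reaches 1/α, which Ville's inequality bounds by α. Since any false rejection in closed testing requires that local test to fire, the probability it estimates upper-bounds the sequential FWER in this toy setting.

```python
import random


def simulate_crossing_rate(n_runs=2000, horizon=200, alpha=0.05, q=0.7, seed=0):
    """Estimate P(average e-process ever reaches 1/alpha) when both nulls are true.

    Each stream is Bernoulli(0.5) under its null, and its e-process is the
    likelihood-ratio martingale of Bernoulli(q) against Bernoulli(0.5).
    The local e-process for the global intersection is their equal-weight average.
    """
    rng = random.Random(seed)
    up, down = q / 0.5, (1 - q) / 0.5  # per-step factors for outcomes 1 and 0
    crossings = 0
    for _ in range(n_runs):
        m = [1.0, 1.0]
        for _ in range(horizon):
            for j in range(2):
                m[j] *= up if rng.random() < 0.5 else down
            if 0.5 * m[0] + 0.5 * m[1] >= 1 / alpha:
                crossings += 1
                break
    return crossings / n_runs


rate = simulate_crossing_rate()
print(f"estimated crossing rate: {rate:.3f} (Ville's bound: 0.05)")
```

A Monte Carlo estimate cannot prove validity; it is only consistent with the average remaining a martingale when both streams are martingales for the joint filtration, which is the condition this response relies on.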

Circularity Check

0 steps flagged

No circularity: proofs transfer closure principle to e-values under external validity conditions

The derivation applies the standard closure principle to e-values satisfying supermartingale conditions, proves post-hoc/anytime/always-valid FWER control, and extends the Bretz et al. (2009) graphical approach via weighted e-value averages plus new DP algorithms. All steps rely on independent mathematical properties of e-values and prior external results (Bretz 2009; Vovk & Wang 2021); no equation reduces by construction to a fitted input, self-definition, or load-bearing self-citation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based solely on the abstract; no free parameters, invented entities, or ad-hoc axioms identified beyond standard domain assumptions about e-values.

axioms (2)

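domain assumption E-values are valid (non-negative with expectation at most 1 under the null; in the sequential setting, e-processes that are non-negative supermartingales, or dominated by them, under the null)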
Core property invoked for inheriting nice properties in the framework.
domain assumption Closed testing principle applies to e-values to achieve FWER control
Foundational for transferring control properties to the new setting.

pith-pipeline@v0.9.0 · 5728 in / 1333 out tokens · 52580 ms · 2026-05-23T05:15:14.688751+00:00 · methodology

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The E-measure
math.ST 2026-04 unverdicted novelty 7.0

E-measures generalize E-values to intersection-closed hypothesis classes, yielding uniform evidence bounds, automatic familywise evidence control without multiplicity correction, and a frequentist E-prior to E-posteri...
Generalized Boundary FDR Control under Arbitrary Dependence: An Approach on Closure Principle
stat.ME 2026-05 unverdicted novelty 6.0

Domino guarantees k-bFDR control under arbitrary dependence via the closure principle, extending boundary FDR methods to general settings for both p-values and e-values.
Weighted Holm Procedures: Theory, Properties, and Recommendations
stat.ME 2026-04 conditional novelty 5.0

The weighted Holm procedure (WHP) based on ordered weighted p-values is uniformly more powerful than the weighted alternative Holm procedure (WAP) based on ordered raw p-values, with stronger optimality properties und...