Anytime-valid simultaneous lower confidence bounds for the true discovery proportion

Friederike Preusse

arXiv: 2505.17803 · v3 · submitted 2025-05-23 · 📊 stat.ME

Pith reviewed 2026-05-19 13:52 UTC · model grok-4.3

classification 📊 stat.ME

keywords anytime-valid inference · simultaneous confidence bounds · true discovery proportion · closed testing · multiple testing · sequential analysis · optional stopping

The pith

Combining closed testing with safe anytime-valid inference yields lower bounds on the true discovery proportion that hold at every time point and for every subset of hypotheses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a procedure for multiple hypothesis testing that supplies lower confidence bounds on the proportion of true discoveries. These bounds stay valid no matter when data collection stops and cover all possible groups of hypotheses at once. The bounds can be refreshed after each new observation while preserving their statistical guarantees. This property supports applications where data is expensive to gather, such as neuroscience experiments. A computational shortcut keeps the method practical even with many hypotheses under test.

Core claim

By merging the closed testing framework with safe anytime-valid inference, the authors construct lower confidence bounds for the true discovery proportion. These bounds remain valid at every observation time point and are simultaneous across all subsets of hypotheses. The underlying hypotheses stay fixed, but the subsets of interest can be chosen or changed at any time. The construction permits sequential updating of the bounds and optional stopping without loss of validity.

What carries the argument

Integration of closed testing with safe anytime-valid inference to produce simultaneous, time-uniform lower bounds on the true discovery proportion.
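
The way the two ingredients fit together can be written out in a few lines. The following is a sketch under assumptions, not the paper's own notation: it assumes the local test of an intersection hypothesis H_I rejects at time n when its e-process E^{[n]}_I reaches 1/α, and the symbols \mathcal{X}^{[n]} (rejected intersections) and t^{[n]}(S) (bound on the number of true nulls in S) are introduced here for illustration only.

```latex
\begin{align*}
  % Ville-type inequality: thresholding an e-process at 1/alpha is a time-uniform level-alpha test
  \sup_{P \in H_I} P\bigl(\exists\, n \ge 1 : E^{[n]}_I \ge 1/\alpha\bigr) &\le \alpha, \\
  % closed testing at time n: H_I is rejected only if every superset J is locally rejected
  \mathcal{X}^{[n]} &= \bigl\{\, I \ne \emptyset : E^{[n]}_J \ge 1/\alpha \ \text{for all } J \supseteq I \,\bigr\}, \\
  % size of the largest non-rejected subset of S (taken as 0 if every nonempty subset is rejected)
  t^{[n]}(S) &= \max\bigl\{\, |I| : I \subseteq S,\ I \notin \mathcal{X}^{[n]} \,\bigr\}, \\
  % resulting lower confidence bound for the true discovery proportion of S
  \mathrm{TDP}(S) &\ge \frac{|S| - t^{[n]}(S)}{|S|}
  \quad \text{for all } n \text{ and all } S \text{ simultaneously, with probability at least } 1 - \alpha .
\end{align*}
```

The last line holds because every bound is correct on the event that the e-process of the set of all true nulls never reaches 1/α, and the first line bounds the probability of the complementary event by α.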

If this is right

The bounds can be recomputed after every new observation while retaining their coverage guarantee.
Data collection may stop at any time chosen by the analyst without invalidating the results.
Any subset of hypotheses can be examined after seeing the data and still receive a valid lower bound.
The computational shortcut makes the method feasible when the number of hypotheses is large.
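
The optional-stopping consequence in the list above can be illustrated for a single hypothesis. The sketch below is not taken from the paper: it assumes unit-variance Gaussian data, a point null of mean zero, and a likelihood-ratio e-process with a fixed alternative `delta`; the function name and all parameter values are hypothetical. It compares stopping at the first crossing of 1/α with naively repeating a fixed-level one-sided z-test after every observation.

```python
import numpy as np


def optional_stopping_error_rates(reps=2000, n_max=200, delta=0.3,
                                  alpha=0.05, seed=1):
    """Estimate false rejection rates under the null when peeking at every n.

    Assumed setting (illustration only): X_1, X_2, ... i.i.d. N(0, 1) under
    the null; e-process = likelihood ratio of N(delta, 1) against N(0, 1).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, size=(reps, n_max))  # data generated under the null
    s = np.cumsum(x, axis=1)                      # running sums S_n
    n = np.arange(1, n_max + 1)

    # Log of the likelihood-ratio e-process: delta * S_n - n * delta^2 / 2.
    log_e = delta * s - 0.5 * n * delta**2
    # Reject as soon as the e-process reaches 1/alpha at any time up to n_max.
    e_rate = np.mean(np.any(log_e >= np.log(1.0 / alpha), axis=1))

    # Naive alternative: one-sided z-test at level alpha after every observation.
    z = s / np.sqrt(n)
    z_crit = 1.6448536269514722  # upper 5% point of N(0, 1), matches alpha = 0.05
    naive_rate = np.mean(np.any(z >= z_crit, axis=1))
    return e_rate, naive_rate


if __name__ == "__main__":
    e_rate, naive_rate = optional_stopping_error_rates()
    print(f"e-process rule: {e_rate:.3f}, repeated z-test: {naive_rate:.3f}")
```

Under these assumptions the e-process rule stays near or below α however often the analyst looks, while the repeated z-test rejects far more often than its nominal level. Note that `z_crit` is hard-coded for α = 0.05, so the comparison is only meaningful at that level.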

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could support adaptive experiments that decide whether to continue based on interim lower bounds.
It may extend naturally to other sequential multiple-testing metrics such as false discovery proportions.
The same structure might apply to online decision problems where new tests arrive over time.

Load-bearing premise

The hypotheses under test are fixed in advance and do not change as new data arrive over time.

What would settle it

Simulate data from a known mixture of true and false null hypotheses, apply the procedure sequentially, stop at an arbitrary time, and verify whether the reported lower bound falls below the actual true discovery proportion more often than the nominal error rate allows.
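
A minimal sketch of such a check follows. It rests on assumptions that are not from the paper: independent unit-variance Gaussian streams, point nulls of mean zero, likelihood-ratio e-processes with a fixed alternative `delta`, the average of individual e-values as the e-value of an intersection, and brute-force closed testing over all subsets (feasible only for a handful of hypotheses, and not the paper's shortcut). All function names are hypothetical.

```python
import numpy as np


def closed_testing_rejections(e_values, alpha):
    """Return closed[mask] for every subset of hypotheses, encoded as a bitmask.

    Local test (assumption): reject H_I when the mean e-value over I is at
    least 1/alpha. Closed testing rejects H_I only if all supersets of I
    are rejected by their local tests.
    """
    m = len(e_values)
    n_sets = 1 << m
    local = [False] * n_sets
    for mask in range(1, n_sets):
        members = [e_values[i] for i in range(m) if (mask >> i) & 1]
        local[mask] = sum(members) / len(members) >= 1.0 / alpha
    closed = [False] * n_sets
    for mask in range(1, n_sets):
        closed[mask] = all(local[sup] for sup in range(mask, n_sets)
                           if (sup & mask) == mask)
    return closed


def tdp_lower_bound(closed, subset_mask):
    """(|S| - t) / |S|, with t the size of the largest non-rejected I within S."""
    size = bin(subset_mask).count("1")
    largest_unrejected = 0
    sub = subset_mask
    while sub:  # enumerate all nonempty submasks of subset_mask
        if not closed[sub]:
            largest_unrejected = max(largest_unrejected, bin(sub).count("1"))
        sub = (sub - 1) & subset_mask
    return (size - largest_unrejected) / size


def simulate_error_rate(reps=200, n_max=30, m=4, n_false=2, effect=1.0,
                        delta=0.5, alpha=0.05, seed=0):
    """Fraction of runs in which some bound exceeds the true TDP at some time."""
    rng = np.random.default_rng(seed)
    means = np.array([effect] * n_false + [0.0] * (m - n_false))
    false_mask = (1 << n_false) - 1  # the first n_false nulls are false
    violations = 0
    for _ in range(reps):
        log_e = np.zeros(m)
        violated = False
        for _ in range(n_max):
            x = rng.normal(means, 1.0)              # one new observation per stream
            log_e += delta * x - 0.5 * delta**2     # update likelihood-ratio e-processes
            closed = closed_testing_rejections(np.exp(log_e).tolist(), alpha)
            for s in range(1, 1 << m):
                true_tdp = bin(s & false_mask).count("1") / bin(s).count("1")
                if tdp_lower_bound(closed, s) > true_tdp + 1e-12:
                    violated = True
                    break
            if violated:
                break
        violations += violated
    return violations / reps


if __name__ == "__main__":
    # Hand-set e-values: two strong signals, two uninformative streams.
    closed = closed_testing_rejections([100.0, 100.0, 1.0, 1.0], alpha=0.05)
    print(tdp_lower_bound(closed, 0b1111))  # 0.5
```

Checking every time point and every subset is stricter than stopping at one arbitrary time, so an estimate from `simulate_error_rate` that stays at or below α (up to Monte Carlo error) is consistent with the claimed guarantee in this toy setting. A rate clearly above α would indicate a problem with the construction or with the sketch's assumptions.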

Original abstract

We propose a method that combines the closed testing framework with the concept of safe anytime-valid inference (SAVI) to compute lower confidence bounds for the true discovery proportion in a multiple testing setting. The proposed procedure provides confidence bounds that are valid at every observation time point and that are simultaneous for all possible subsets of hypotheses. While the hypotheses are assumed to be fixed over time, the subsets of interest may vary. Anytime-valid simultaneous confidence bounds allow us to sequentially update the bounds over time and allow for optional stopping. This is a desirable property in practical applications such as neuroscience, where data acquisition is costly and time-consuming. We also present a computational shortcut which makes the application of the proposed procedure feasible when the number of hypotheses under consideration is large. We illustrate the performance of the proposed method in a simulation study and give some practical guidelines on the implementation of the proposed procedure.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This paper combines closed testing with SAVI to produce anytime-valid simultaneous lower bounds on the true discovery proportion, plus a computational shortcut that makes it practical for large numbers of hypotheses.

The main takeaway is that this paper delivers a method for lower confidence bounds on the true discovery proportion that are valid at any stopping time and simultaneously for every possible subset of hypotheses. It achieves this by layering the closed testing procedure on top of safe anytime-valid inference. The new piece is the specific way these two frameworks are put together, along with the shortcut that avoids enumerating all subsets when the number of hypotheses grows. That shortcut is what makes the approach usable in practice rather than just theoretical.

The paper also includes a simulation study that checks the performance and some guidelines for implementation. These elements show that the authors thought about how someone would actually apply the bounds in a setting like neuroscience, where data collection is expensive and you want to be able to stop early without invalidating the inference.

The argument for validity seems to rest on standard properties of closed testing for simultaneity and SAVI for the time aspect, with the hypotheses fixed while subsets can be chosen later. That distinction is stated clearly, so there is no circularity or hidden assumption about the dependence structure. The simulation likely explores different scenarios, though I would look closely at how well it handles cases with positive dependence among the tests, as that can affect how tight the bounds end up being.

Overall the construction looks sound and the practical additions are helpful. This kind of work is aimed at statisticians focused on multiple testing and sequential analysis. A reader who needs to control error rates in streaming data or adaptive experiments will find concrete value here. It is the sort of paper that deserves a serious referee because the core idea is well-motivated, the technical steps are laid out, and the application is relevant. I would recommend sending it for peer review.

Referee Report

1 major / 3 minor

Summary. The manuscript proposes a procedure that combines the closed testing framework with safe anytime-valid inference (SAVI) to construct lower confidence bounds for the true discovery proportion (TDP). These bounds are valid at every observation time point and hold simultaneously for all possible subsets of hypotheses. Hypotheses remain fixed while subsets of interest may vary over time, enabling sequential updating and optional stopping. A computational shortcut is introduced to make the method feasible for large numbers of hypotheses, with performance illustrated through a simulation study and accompanied by practical implementation guidelines.

Significance. If the validity arguments hold, the work provides a useful advance for sequential multiple testing by delivering anytime-valid and simultaneous TDP bounds that support flexible, post-hoc subset analysis without pre-specifying stopping times. The synthesis of closed testing (for simultaneity) and SAVI (for time-uniform validity) directly addresses needs in costly data-acquisition settings such as neuroscience. The computational shortcut and simulation evidence further support practical deployment, strengthening the contribution to adaptive inference methodology.

major comments (1)

[§4.2] §4.2 (computational shortcut): the claim that the shortcut preserves exact simultaneous validity under closed testing requires an explicit lemma or argument showing that the reduced enumeration does not introduce any gap in coverage for arbitrary subsets; without this, the feasibility claim for large p rests on an unverified preservation property.

minor comments (3)

[§5] The simulation study would benefit from explicit reporting of coverage at multiple stopping times to illustrate the anytime-valid property beyond fixed-sample results.
[Throughout] Notation for the TDP estimator and its bounds could be unified across sections to avoid minor inconsistencies in subscript usage between the main text and the appendix.
[§1] A brief comparison table contrasting the proposed bounds with existing sequential TDP methods (e.g., those based on martingale or e-value approaches) would improve context in the introduction.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment and the constructive major comment. We address it directly below.

Referee: [§4.2] §4.2 (computational shortcut): the claim that the shortcut preserves exact simultaneous validity under closed testing requires an explicit lemma or argument showing that the reduced enumeration does not introduce any gap in coverage for arbitrary subsets; without this, the feasibility claim for large p rests on an unverified preservation property.

Authors: We agree that the current presentation of the computational shortcut in §4.2 would benefit from an explicit argument establishing that exact simultaneous validity is preserved. In the revised manuscript we will add a new lemma (Lemma 4.1) showing that the reduced enumeration of intersections still guarantees the required coverage for every possible subset. The lemma proceeds by verifying that, for any subset, the closed-testing p-value computed via the shortcut is bounded above by the p-value obtained from the full enumeration; the shortcut then inherits simultaneous validity from the underlying closed-testing and SAVI construction. This addition removes the unverified step while leaving the reduction in computational complexity intact.

Revision: yes

Circularity Check

0 steps flagged

Derivation combines established external frameworks without circular reduction

The paper integrates the closed testing framework (for simultaneous bounds over all subsets) with safe anytime-valid inference (SAVI) to obtain lower confidence bounds on the true discovery proportion that remain valid at every time point and under optional stopping. The abstract and construction explicitly treat hypotheses as fixed while allowing subsets to vary, and the computational shortcut is presented as preserving exact validity without introducing new assumptions on dependence or uniformity. No quoted equations or steps reduce a claimed result to a fitted parameter renamed as a prediction, a self-definition, or a load-bearing self-citation chain; the central claims rest on independent prior literature rather than tautological re-expression of the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The procedure rests on the combination of two prior frameworks plus the assumption that hypotheses remain fixed while subsets can change; no new entities or free parameters are introduced in the abstract.

axioms (1)

domain assumption: Hypotheses are fixed over time while subsets of interest may vary.
Explicitly stated as the setting for the procedure.

pith-pipeline@v0.9.0 · 5668 in / 1116 out tokens · 54207 ms · 2026-05-19T13:52:18.189140+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "We propose a method that combines the closed testing framework with the concept of safe anytime-valid inference (SAVI) to compute lower confidence bounds for the true discovery proportion"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "An e-process corresponding to H_I is a nonnegative process (E^[n]_I) adapted to some filtration with E_P[E^[ν]_I] ≤ 1 for any F-stopping time ν"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.