pith.

arxiv: 2505.17803 · v3 · submitted 2025-05-23 · 📊 stat.ME

Anytime-valid simultaneous lower confidence bounds for the true discovery proportion

Pith reviewed 2026-05-19 13:52 UTC · model grok-4.3

classification 📊 stat.ME
keywords anytime-valid inference · simultaneous confidence bounds · true discovery proportion · closed testing · multiple testing · sequential analysis · optional stopping

The pith

Combining closed testing with safe anytime-valid inference yields lower bounds on the true discovery proportion that hold at every time point and for every subset of hypotheses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a procedure for multiple hypothesis testing that supplies lower confidence bounds on the proportion of true discoveries. These bounds stay valid no matter when data collection stops and cover all possible groups of hypotheses at once. The bounds can be refreshed after each new observation while preserving their statistical guarantees. This property supports applications where data is expensive to gather, such as neuroscience experiments. A computational shortcut keeps the method practical even with many hypotheses under test.

Core claim

By merging the closed testing framework with safe anytime-valid inference, the authors construct lower confidence bounds for the true discovery proportion. These bounds remain valid at every observation time point and are simultaneous across all subsets of hypotheses. The underlying hypotheses stay fixed, but the subsets of interest can be chosen or changed at any time. The construction permits sequential updating of the bounds and optional stopping without loss of validity.
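The closed-testing half of this construction can be made concrete on a handful of hypotheses. The sketch below is ours, not the authors' code, and rests on stated assumptions: each hypothesis H_i comes with an e-value e_i (in the anytime-valid setting, the current value of an e-process), every intersection hypothesis H_I is tested by the arithmetic mean of its e-values against 1/α (one valid e-merging choice among several), and the function `closed_testing_tdp_bound` (a hypothetical name) returns d(S)/|S|, where d(S) is the smallest number of non-nulls in S compatible with every intersection that closed testing fails to reject. It enumerates all 2^m intersections, so it only illustrates the definition for small m.

```python
from itertools import combinations


def closed_testing_tdp_bound(e, S, alpha):
    """Brute-force closed-testing lower bound on TDP(S) (illustrative sketch).

    e     -- list of e-values, e[i] for hypothesis i (assumed input format)
    S     -- iterable of hypothesis indices of interest
    alpha -- error level; the local test of H_I rejects if mean(e[I]) >= 1/alpha
    """
    m = len(e)
    S = frozenset(S)
    if not S:
        return 0.0
    # All non-empty intersection hypotheses H_I, with I a subset of {0, ..., m-1}.
    subsets = [frozenset(c) for k in range(1, m + 1)
               for c in combinations(range(m), k)]
    # Local tests: arithmetic-mean e-merging compared with 1/alpha.
    locally_rejected = {I for I in subsets
                        if sum(e[i] for i in I) / len(I) >= 1 / alpha}

    def rejected_by_closed_testing(I):
        # H_I is rejected iff every superset J of I is rejected locally.
        return all(J in locally_rejected for J in subsets if I <= J)

    # d(S) = min |S minus I| over I not rejected; the empty I (never rejected) gives |S|.
    d = len(S)
    for I in subsets:
        if not rejected_by_closed_testing(I):
            d = min(d, len(S - I))
    return d / len(S)
```

For example, `closed_testing_tdp_bound([50, 40, 0.5, 0.2], {0, 1, 2, 3}, alpha=0.1)` returns `0.5`: the intersection {2, 3} survives closed testing, so at least 2 of the 4 hypotheses are certified as true discoveries, while for S = {0, 1} the bound is `1.0`.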

What carries the argument

Integration of closed testing with safe anytime-valid inference to produce simultaneous, time-uniform lower bounds on the true discovery proportion.
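In symbols, and in notation of our own (the paper's may differ): let N ⊆ {1, …, m} be the unknown set of true nulls and, for each intersection hypothesis H_I, let E_{I,t} be a test supermartingale under H_I (an anytime-valid e-process started at 1) that rejects H_I locally at time t once E_{I,t} ≥ 1/α. Closed testing turns these rejections into a count d_t(S) of certified discoveries in S, and the guarantee reads:

```latex
\begin{aligned}
\mathrm{TDP}(S) &= \frac{|S \setminus N|}{|S|}, \qquad
d_t(S) = \min\bigl\{\, |S \setminus I| : I \subseteq \{1,\dots,m\},\ H_I \text{ not rejected by closed testing at time } t \,\bigr\}, \\
\Pr\bigl(\exists\, t : E_{N,t} \ge 1/\alpha\bigr) &\le \alpha \quad \text{(Ville's inequality)}, \\
\Pr\Bigl(\forall t,\ \forall S \neq \emptyset : \frac{d_t(S)}{|S|} \le \mathrm{TDP}(S)\Bigr) &\ge 1 - \alpha .
\end{aligned}
```

The last line follows from the second: on the event that H_N is never rejected locally, it is never rejected by closed testing either, so I = N is always admissible in the minimum and d_t(S) ≤ |S \ N| holds for every time t and every subset S at once.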

If this is right

  • The bounds can be recomputed after every new observation while retaining their guaranteed coverage level.
  • Data collection may stop at any time chosen by the analyst without invalidating the results.
  • Any subset of hypotheses can be examined after seeing the data and still receive a valid lower bound.
  • The computational shortcut makes the method feasible when the number of hypotheses is large.
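The sequential updating behind these points is cheap when each hypothesis carries its own test martingale. A minimal sketch, assuming (our choice, not necessarily the paper's) i.i.d. N(μ_i, 1) observations per hypothesis, the null μ_i = 0, and a fixed alternative `mu1` used only to build the likelihood ratio: each new observation multiplies the current e-value by one likelihood ratio, and the current e-values can be passed to a closed-testing bound at whatever time the analyst decides to stop. `GaussianTestMartingale` and `update_all` are hypothetical names.

```python
import math


class GaussianTestMartingale:
    """Hypothetical e-process for H0: mu = 0 with X_t ~ N(mu, 1) i.i.d.

    It is the running product of likelihood ratios N(x; mu1, 1) / N(x; 0, 1)
    for a fixed alternative mu1, kept on the log scale. Under H0 it is a
    nonnegative martingale with initial value 1 (a test martingale).
    """

    def __init__(self, mu1=0.5):
        self.mu1 = mu1
        self.log_value = 0.0  # log(1): every e-process starts at 1

    def update(self, x):
        # One new observation multiplies the martingale by one likelihood ratio.
        self.log_value += self.mu1 * x - self.mu1 ** 2 / 2
        return self.value

    @property
    def value(self):
        return math.exp(self.log_value)


def update_all(processes, new_observations):
    """Feed one new observation per hypothesis; return the current e-values."""
    return [p.update(x) for p, x in zip(processes, new_observations)]
```

For example, with three processes built as `GaussianTestMartingale(0.5)`, calling `update_all` on the observations `[1.0, 0.0, -1.0]` moves each log e-value by 0.5x − 0.125. Because every process is a nonnegative martingale starting at 1 under its null, Ville's inequality keeps the threshold 1/α valid uniformly over time, which is what licenses optional stopping.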

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could support adaptive experiments that decide whether to continue based on interim lower bounds.
  • It immediately gives simultaneous upper bounds on the false discovery proportion, since FDP(S) = 1 − TDP(S), and may extend to other sequential multiple-testing metrics.
  • The same structure might apply to online decision problems where new tests arrive over time.

Load-bearing premise

The hypotheses under test are fixed in advance and do not change as new data arrive over time.

What would settle it

Simulate data from a known mixture of true and false null hypotheses, apply the procedure sequentially, stop at an arbitrary (possibly data-dependent) time, and check whether the reported lower bound exceeds the actual true discovery proportion more often than the nominal error rate allows.
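A minimal version of this check, under assumptions of our own choosing: m = 10 independent Gaussian streams, 6 true nulls (mean 0) and 4 non-nulls (mean 1), one likelihood-ratio e-process per hypothesis with alternative `mu1 = 0.5`, and intersection hypotheses tested by the arithmetic mean of e-values, which admits an O(m log m) closed-testing formula (not claimed to be the paper's shortcut). The stopping rule is deliberately adversarial: each run stops the first time the bound for S = all hypotheses exceeds the true TDP, so the reported rate is the time-uniform error rate for that S. Only one subset is checked, so this probes anytime validity but not simultaneity over subsets. `tdp_lower_bound` and `simulate_error_rate` are hypothetical names.

```python
import math
import random


def tdp_lower_bound(e, S, alpha):
    """Closed-testing bound d(S)/|S| for arithmetic-mean e-merging, without enumeration.

    d(S) = |S| - k*, where k* is the largest k such that the k smallest e-values in S,
    together with every e-value outside S that lies below 1/alpha, have mean < 1/alpha.
    """
    S = set(S)
    if not S:
        return 0.0
    c = 1.0 / alpha
    inside = sorted(e[i] - c for i in S)
    partial = sum(min(e[j] - c, 0.0) for j in range(len(e)) if j not in S)
    best_k = 0
    for k, v in enumerate(inside, start=1):
        partial += v
        if partial < 0:  # some non-rejected intersection contains k elements of S
            best_k = k
    return (len(S) - best_k) / len(S)


def simulate_error_rate(n_reps=200, m=10, n_null=6, mu_alt=1.0, mu1=0.5,
                        alpha=0.1, horizon=200, seed=1):
    """Fraction of runs in which the bound for S = all hypotheses ever exceeds the
    true TDP; each run stops at the first such time (adversarial optional stopping)."""
    rng = random.Random(seed)
    S = range(m)
    true_tdp = (m - n_null) / m  # hypotheses 0..n_null-1 are the true nulls
    errors = 0
    for _ in range(n_reps):
        log_e = [0.0] * m  # Gaussian likelihood-ratio e-processes, log scale
        for _t in range(horizon):
            for i in range(m):
                x = rng.gauss(0.0 if i < n_null else mu_alt, 1.0)
                log_e[i] += mu1 * x - mu1 ** 2 / 2
            bound = tdp_lower_bound([math.exp(v) for v in log_e], S, alpha)
            if bound > true_tdp:  # stop the moment the reported bound is wrong
                errors += 1
                break
    return errors / n_reps
```

Ville's inequality bounds the true error probability by α, so a rate clearly above α across many runs would point to a bug rather than bad luck.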

Original abstract

We propose a method that combines the closed testing framework with the concept of safe anytime-valid inference (SAVI) to compute lower confidence bounds for the true discovery proportion in a multiple testing setting. The proposed procedure provides confidence bounds that are valid at every observation time point and that are simultaneous for all possible subsets of hypotheses. While the hypotheses are assumed to be fixed over time, the subsets of interest may vary. Anytime-valid simultaneous confidence bounds allow us to sequentially update the bounds over time and allow for optional stopping. This is a desirable property in practical applications such as neuroscience, where data acquisition is costly and time-consuming. We also present a computational shortcut which makes the application of the proposed procedure feasible when the number of hypotheses under consideration is large. We illustrate the performance of the proposed method in a simulation study and give some practical guidelines on the implementation of the proposed procedure.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 3 minor

Summary. The manuscript proposes a procedure that combines the closed testing framework with safe anytime-valid inference (SAVI) to construct lower confidence bounds for the true discovery proportion (TDP). These bounds are valid at every observation time point and hold simultaneously for all possible subsets of hypotheses. Hypotheses remain fixed while subsets of interest may vary over time, enabling sequential updating and optional stopping. A computational shortcut is introduced to make the method feasible for large numbers of hypotheses, with performance illustrated through a simulation study and accompanied by practical implementation guidelines.

Significance. If the validity arguments hold, the work provides a useful advance for sequential multiple testing by delivering anytime-valid and simultaneous TDP bounds that support flexible, post-hoc subset analysis without pre-specifying stopping times. The synthesis of closed testing (for simultaneity) and SAVI (for time-uniform validity) directly addresses needs in costly data-acquisition settings such as neuroscience. The computational shortcut and simulation evidence further support practical deployment, strengthening the contribution to adaptive inference methodology.

major comments (1)
  1. [§4.2] Computational shortcut: the claim that the shortcut preserves exact simultaneous validity under closed testing requires an explicit lemma or argument showing that the reduced enumeration does not introduce any gap in coverage for arbitrary subsets; without this, the feasibility claim for large p rests on an unverified preservation property.
minor comments (3)
  1. [§5] The simulation study would benefit from explicit reporting of coverage at multiple stopping times to illustrate the anytime-valid property beyond fixed-sample results.
  2. [Throughout] Notation for the TDP estimator and its bounds could be unified across sections to avoid minor inconsistencies in subscript usage between the main text and the appendix.
  3. [§1] A brief comparison table contrasting the proposed bounds with existing sequential TDP methods (e.g., those based on martingale or e-value approaches) would improve context in the introduction.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment and the constructive major comment. We address it directly below.

Point-by-point responses
  1. Referee: [§4.2] Computational shortcut: the claim that the shortcut preserves exact simultaneous validity under closed testing requires an explicit lemma or argument showing that the reduced enumeration does not introduce any gap in coverage for arbitrary subsets; without this, the feasibility claim for large p rests on an unverified preservation property.

    Authors: We agree that the current presentation of the computational shortcut in §4.2 would benefit from an explicit argument establishing preservation of exact simultaneous validity. In the revised manuscript we will add a new lemma (Lemma 4.1) showing that the reduced enumeration of intersections still guarantees the required coverage for every possible subset. The lemma verifies that, for every subset, the closed-testing p-value computed via the shortcut is never smaller than the p-value obtained from the full enumeration; hence every rejection made by the shortcut is also made by full closed testing, and simultaneous validity is inherited directly from the underlying closed-testing and SAVI construction. This addition removes the unverified step while leaving the computational complexity reduction intact. revision: yes
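The kind of argument such a lemma needs can be illustrated on one concrete shortcut. As a sketch under our own assumption (not necessarily the construction in §4.2), suppose each non-empty intersection H_I is tested by the arithmetic mean of the current e-values, rejecting when |I|^{-1} Σ_{i∈I} e_i ≥ 1/α, and write e^S_{(1)} ≤ e^S_{(2)} ≤ ⋯ for the ordered e-values in S. Closed testing then reduces to a sorted scan:

```latex
\begin{aligned}
&H_I \text{ is not rejected by closed testing}
  \iff \exists\, J \supseteq I :\ \sum_{j \in J} \bigl(e_j - 1/\alpha\bigr) < 0, \\
&d(S) = |S| - \max\Bigl(\{0\} \cup \Bigl\{ k \in \{1,\dots,|S|\} :
  \sum_{i=1}^{k} \bigl(e^{S}_{(i)} - 1/\alpha\bigr)
  + \sum_{j \notin S} \min\bigl(e_j - 1/\alpha,\, 0\bigr) < 0 \Bigr\}\Bigr).
\end{aligned}
```

For a fixed number k of elements of S, the sum in the second line is smallest when those k are the smallest e-values in S and every outside e-value below 1/α is included, so the scan returns the same d(S) as enumerating all 2^m intersections. In this case the shortcut reproduces the full-enumeration bound exactly, so validity is inherited rather than approximated, while the cost drops to O(m log m) per subset.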

Circularity Check

0 steps flagged

Derivation combines established external frameworks without circular reduction

Rationale

The paper integrates the closed testing framework (for simultaneous bounds over all subsets) with safe anytime-valid inference (SAVI) to obtain lower confidence bounds on the true discovery proportion that remain valid at every time point and under optional stopping. The abstract and construction explicitly treat hypotheses as fixed while allowing subsets to vary, and the computational shortcut is presented as preserving exact validity without introducing new assumptions on dependence or uniformity. No quoted equations or steps reduce a claimed result to a fitted parameter renamed as a prediction, a self-definition, or a load-bearing self-citation chain; the central claims rest on independent prior literature rather than tautological re-expression of the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The procedure rests on the combination of two prior frameworks plus the assumption that hypotheses remain fixed while subsets can change; no new entities or free parameters are introduced in the abstract.

axioms (1)
  • domain assumption: Hypotheses are fixed over time while subsets of interest may vary
    Explicitly stated as the setting for the procedure.

pith-pipeline@v0.9.0 · 5668 in / 1116 out tokens · 54207 ms · 2026-05-19T13:52:18.189140+00:00 · methodology

