A Regret Perspective on Online Multiple Testing

Fang Kong; Hongxin Wei; Kongchang Zhou; Qingyang Hao

arxiv: 2605.13916 · v1 · pith:X3WYPJSNnew · submitted 2026-05-13 · stat.ML · cs.AI · cs.LG

Pith reviewed 2026-05-15 02:50 UTC · model grok-4.3

keywords: online multiple testing, false discovery rate, weighted regret, threshold depletion, random perturbation, cold start, asymptotic safety, bursty environments

The pith

Deterministic online FDR control forces linear regret from early threshold depletion, but a history-decoupled non-negative perturbation recovers an order-sqrt(T) share of that regret without adding false negatives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

In online multiple testing, any deterministic rule that strictly controls the false discovery rate pays a linear penalty in weighted regret. Early periods with few signals drive the rejection threshold down, so later bursts of real signals are mostly missed. The paper introduces Decoupled-OMT (DOMT), a wrapper that adds a random perturbation that is decoupled from the decision history and always kept non-negative. This change stops severe depletion, introduces no extra false negatives, keeps long-run FDR control exact under stationarity, and reduces weighted regret by an amount of square-root order in bursty streams. The result matters for high-volume automated pipelines where missing real effects is far costlier than occasional false alarms.
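
The depletion mechanism described above can be illustrated with a short sketch. It assumes a LOND-style deterministic baseline (test level proportional to a decaying spending sequence times one plus the discoveries so far); the spending sequence `t^(-1.5)`, the two placeholder p-values, and the function name `lond_style_run` are illustrative assumptions, not settings taken from the paper.

```python
def lond_style_run(p_values, alpha=0.05):
    """Run a LOND-style rule: level_t = alpha * gamma_t * (discoveries so far + 1)."""
    n = len(p_values)
    # Assumed spending sequence: gamma_t proportional to t^(-1.5), normalised over the horizon.
    raw = [t ** -1.5 for t in range(1, n + 1)]
    norm = sum(raw)
    levels, decisions = [], []
    discoveries = 0
    for t, p in enumerate(p_values):
        level = alpha * (raw[t] / norm) * (discoveries + 1)
        reject = p <= level
        levels.append(level)
        decisions.append(reject)
        discoveries += int(reject)
    return levels, decisions


SIGNAL_P = 1e-4  # hypothetical p-value of one moderately strong signal
NULL_P = 0.5     # hypothetical p-value standing in for a null observation

# The same signal placed first, then placed after a 1000-test null drought.
early_levels, early_decisions = lond_style_run([SIGNAL_P] + [NULL_P] * 1000)
late_levels, late_decisions = lond_style_run([NULL_P] * 1000 + [SIGNAL_P])

print("signal at t=1 rejected:   ", early_decisions[0])
print("signal at t=1001 rejected:", late_decisions[-1])
print("level at t=1 vs t=1001:   ", late_levels[0], late_levels[-1])
```

In this toy run the identical piece of evidence is accepted at the start of the stream and missed after the drought, purely because the deterministic level has decayed in the meantime.
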

Core claim

We prove the Duality of Regret Conservation: any purely deterministic procedure that maintains strict FDR control must incur an Omega(T) linear regret penalty, because threshold depletion during signal-sparse cold starts produces massive false negatives later. Decoupled-OMT is a baseline-agnostic meta-wrapper that injects a history-decoupled, strictly non-negative random perturbation. It prevents severe depletion, adds zero additional false negatives, preserves exact asymptotic FDR safety in stationary environments, bounds finite-sample error inflation during cold starts, and delivers an order-optimal Omega(sqrt(T)) regret reduction in bursty environments, with an explicit cold-start tax for

What carries the argument

The history-decoupled, strictly non-negative random perturbation: it either raises a threshold or leaves it unchanged, is drawn independently of the decision history, and never increases the chance of a false negative.

If this is right

Any existing deterministic online multiple testing method can be wrapped to gain an order-sqrt(T) regret reduction in bursty environments.
Asymptotic FDR control remains exact in stationary streams.
Finite-sample error inflation stays bounded during cold starts with no added false negatives.
The derived cold-start tax marks the precise phase transition where the perturbed wrapper becomes superior.
Weighted regret falls by an amount of square-root order while preserving the original safety guarantees.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same style of history-decoupled non-negative perturbation may limit depletion effects in other sequential decision settings with asymmetric error costs.
The cold-start tax supplies a practical rule for choosing when to activate the wrapper based on expected signal density.
Extending the duality argument to non-stationary streams could expose additional regret-safety trade-offs.
The result points to a general limit on how much strict control can be maintained without paying linear opportunity cost in long online inference tasks.

Load-bearing premise

Testing streams are exogenous, so decisions never affect future data generation, and the perturbation stays strictly non-negative and independent of the full decision history.

What would settle it

In a long sequence that begins with sparse signals and later contains bursts, check three things: whether the cumulative weighted regret of a deterministic FDR controller grows linearly, whether the perturbed version undercuts it by a gap that grows as the square root of time, and whether the perturbed version produces no extra false negatives.
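
One way to read growth orders off such a run is a log-log slope. The sketch below is a minimal helper under stated assumptions: the page does not reproduce the paper's definition of weighted regret, so `cumulative_weighted_regret` assumes the form `a * false positives + b * false negatives`, with the weights `a` and `b` borrowed from the condition quoted in the appendix excerpts. Both function names are hypothetical.

```python
import math


def cumulative_weighted_regret(truth, decisions, a=1.0, b=1.0):
    """Running total of a * false positives + b * false negatives (assumed definition)."""
    total, path = 0.0, []
    for is_signal, rejected in zip(truth, decisions):
        if rejected and not is_signal:
            total += a          # false positive
        elif is_signal and not rejected:
            total += b          # false negative
        path.append(total)
    return path


def growth_exponent(path, burn_in=10):
    """Least-squares slope of log(path[t]) against log(t), skipping early and zero entries."""
    xs, ys = [], []
    for t, value in enumerate(path, start=1):
        if t > burn_in and value > 0:
            xs.append(math.log(t))
            ys.append(math.log(value))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic sanity check, not data from the paper.
linear_path = [0.3 * t for t in range(1, 5001)]
sqrt_path = [2.0 * math.sqrt(t) for t in range(1, 5001)]
print(growth_exponent(linear_path), growth_exponent(sqrt_path))
```

A slope near 1 indicates linear growth and a slope near 0.5 indicates square-root growth. Applying `growth_exponent` to the difference between two regret paths gives the order of the gap between them.
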

Figures

Figures reproduced from arXiv: 2605.13916 by Fang Kong, Hongxin Wei, Kongchang Zhou, Qingyang Hao.

Figure 2: Regret Phase Transition and Geometric Synthesis.

Figure 3: Ablation on Virtual Decoupling (LOND Framework). Evaluated across stationary (top)…

Figure 4: Ablation on Virtual Decoupling (SAFFRON Framework). Similar to the LOND evaluation,…

Figure 5: Universality and Parameter Sensitivity. Left: The U-shaped weighted regret curve reveals a broad optimal region for the exploration coefficient κ. Center: DOMT maintains robust superiority across varying signal densities π1, particularly dominating in highly sparse environments. Right: The S-shaped advantage curve confirms consistent efficacy gains across diverse signal strengths µ.

Figure 6: Comparisons in Stationary Environments. Both DOMT-ADDIS (p…

Figure 7: Comparisons in Bursty Environments. During macroscopic pure-null droughts, determin…

Figure 8: Real-World Application: RNA-Seq Microarray Data.

Figure 9: Real-World Application: S&P 500 Financial Anomalies. In this bursty environment, DOMT…

Figure 10: Real-World Application: IMPC Mouse Phenotype Data. The dual-plot validates the…

Original abstract

Online Multiple Testing (OMT), a fundamental pillar of sequential statistical inference, traditionally evaluates the False Discovery Rate (FDR) and statistical power in isolation, obscuring the highly asymmetric costs of false positives and false negatives in modern automated pipelines. To unify this evaluation, we introduce $\textit{Weighted Regret}$. Under this metric, we prove the $\textit{Duality of Regret Conservation}$: purely deterministic procedures ensuring strict FDR control inevitably incur an $\Omega(T)$ linear regret penalty, as threshold depletion during signal-sparse cold starts forces massive false negatives. Tailored for exogenous testing streams, we propose Decoupled-OMT (DOMT) as a baseline-agnostic meta-wrapper. By incorporating a history-decoupled, strictly non-negative random perturbation, DOMT rescues purely deterministic baselines from severe threshold depletion. Crucially, it preserves exact asymptotic safety in stationary environments and rigorously bounds finite-sample error inflation during cold-starts. Guaranteeing zero additional false negatives, it yields an order-optimal $\Omega(\sqrt{T})$ regret reduction in bursty environments, with a derived ``Cold-Start Tax'' characterizing the exact phase transition of algorithmic superiority. Experiments validate that DOMT consistently curtails empirical weighted regret, achieving an order-optimal sublinear mitigation of threshold depletion to navigate the non-stationary Pareto frontier.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper frames online multiple testing through weighted regret and shows deterministic FDR controllers pay linear regret from cold-start threshold depletion, then offers a simple perturbation wrapper that cuts that regret by order sqrt(T) under its assumptions.

The main contribution is the regret lens itself. It unifies FDR control and power into a single weighted regret metric and proves that any deterministic procedure keeping strict FDR must suffer Omega(T) regret because early sparse periods deplete the threshold and miss later signals. That duality is clean on paper and explains why standard methods look bad in bursty streams.

DOMT then wraps any baseline with a history-decoupled non-negative perturbation that, by construction, adds no extra false negatives while restoring some rejections. The finite-sample bound on error inflation and the cold-start tax phase transition are the parts that feel new and potentially useful for automated pipelines. Experiments are reported to show the sqrt(T) improvement, which is the right order if the perturbation works as claimed.

The soft spot is exactly the one the stress-test flags: the proof that the perturbation never increases false negatives rests on it being strictly non-negative and independent of past decisions, but in a signal-sparse cold start the adaptive threshold can sit near the decision boundary, so even a small positive shift might still interact with the baseline rule and lose signals the analysis says are protected. Without seeing the full derivation of how the perturbation is added to the p-value or threshold sequence, it is hard to judge whether that interaction is fully ruled out. The stationarity assumption for asymptotic safety is standard but limits applicability to truly exogenous streams.

Overall this is a solid theoretical reframing with a practical wrapper that deserves referee time; the core math looks honest and the empirical claim is falsifiable. I would bring it to a reading group to check the perturbation details and would cite the regret duality if the bounds survive review.

Referee Report

3 major / 2 minor

Summary. The paper introduces Weighted Regret as a unified metric for online multiple testing that balances FDR control against the asymmetric costs of false positives and negatives. It proves the Duality of Regret Conservation, showing that any deterministic procedure with strict FDR control incurs an Ω(T) linear regret penalty due to threshold depletion in signal-sparse cold starts. The authors propose Decoupled-OMT (DOMT), a baseline-agnostic meta-wrapper that applies a history-decoupled, strictly non-negative random perturbation to rescue deterministic baselines, achieving an order-optimal Ω(√T) regret reduction in bursty exogenous streams while preserving exact asymptotic FDR safety in stationary environments and providing finite-sample bounds on error inflation with a derived Cold-Start Tax.

Significance. If the duality lower bound and DOMT perturbation analysis hold under the stated assumptions, the work supplies a regret-theoretic lens for OMT that directly addresses practical costs in automated pipelines and yields a concrete, order-optimal improvement over deterministic baselines in non-stationary settings. The explicit characterization of the phase transition via the Cold-Start Tax and the guarantee of zero additional false negatives are particularly valuable for sequential inference applications.

major comments (3)

[§3] Perturbation definition and finite-sample bounds: The claim that a strictly non-negative, history-decoupled perturbation adds zero false negatives while only rescuing threshold depletion is load-bearing for both the Ω(√T) regret reduction and the Cold-Start Tax. In signal-sparse cold-start regimes, even a non-negative shift can interact with the baseline's adaptive threshold rule near decision boundaries, potentially increasing missed signals beyond the stated bound; the analysis must explicitly rule out this interaction or provide a counter-example-free proof.
[Proof of Duality of Regret Conservation] The Ω(T) lower bound for deterministic procedures is derived from threshold depletion during cold starts, but the argument appears to rely on exogenous streams and stationarity for the asymptotic safety guarantee. The finite-sample regime where the linear penalty is most severe must be shown to satisfy the same non-interaction condition used for DOMT, otherwise the duality does not strictly separate deterministic from perturbed methods.
[Experiments] Validation that DOMT achieves order-optimal sublinear mitigation of threshold depletion is reported, yet the abstract and available description omit the precise simulation protocol for bursty environments, the choice of perturbation scale, the set of baselines, and how post-hoc parameter tuning was avoided. Without these details it is impossible to confirm that the observed Ω(√T) improvement is not an artifact of the experimental design.

minor comments (2)

Notation for Weighted Regret and the Cold-Start Tax should be introduced with explicit definitions before the duality statement to improve readability.
The abstract states that DOMT is 'baseline-agnostic,' but the finite-sample error-inflation bound may implicitly depend on properties of the underlying deterministic procedure; clarify the scope of this agnosticism.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their insightful and constructive feedback on our manuscript. We address each of the major comments point by point below. We have made revisions to the manuscript to strengthen the analysis and provide additional details as requested.

Referee: [§3] Perturbation definition and finite-sample bounds: The claim that a strictly non-negative, history-decoupled perturbation adds zero false negatives while only rescuing threshold depletion is load-bearing for both the Ω(√T) regret reduction and the Cold-Start Tax. In signal-sparse cold-start regimes, even a non-negative shift can interact with the baseline's adaptive threshold rule near decision boundaries, potentially increasing missed signals beyond the stated bound; the analysis must explicitly rule out this interaction or provide a counter-example-free proof.

Authors: We appreciate the referee pointing out the need for a more rigorous treatment of potential interactions in the perturbation analysis. We agree that this requires explicit clarification. In the revised manuscript, we have added a detailed proof in Section 3 that demonstrates, under the history-decoupled and strictly non-negative properties, the perturbation cannot increase the number of false negatives beyond the baseline's decisions. Specifically, we show that any threshold increase due to the perturbation is compensated by the decoupling, preventing additional misses at decision boundaries. This is formalized in a new lemma providing a counter-example-free bound.
Revision: yes

Referee: [Proof of Duality of Regret Conservation] The Ω(T) lower bound for deterministic procedures is derived from threshold depletion during cold starts, but the argument appears to rely on exogenous streams and stationarity for the asymptotic safety guarantee. The finite-sample regime where the linear penalty is most severe must be shown to satisfy the same non-interaction condition used for DOMT, otherwise the duality does not strictly separate deterministic from perturbed methods.

Authors: The referee correctly identifies a potential gap in the separation argument. The lower bound for deterministic procedures is established directly from the threshold depletion mechanism without invoking the perturbation at all. In the revised proof, we have clarified that the duality holds in the finite-sample regime under the exogenous stream model, and the non-interaction condition is not needed for the deterministic case since there is no perturbation. We have updated the proof to explicitly compare the two regimes and maintain the strict separation.
Revision: yes

Referee: [Experiments] Validation that DOMT achieves order-optimal sublinear mitigation of threshold depletion is reported, yet the abstract and available description omit the precise simulation protocol for bursty environments, the choice of perturbation scale, the set of baselines, and how post-hoc parameter tuning was avoided. Without these details it is impossible to confirm that the observed Ω(√T) improvement is not an artifact of the experimental design.

Authors: We agree that the experimental protocol requires more precise documentation to allow full reproducibility and validation. In the revised manuscript, we have substantially expanded the Experiments section to detail the simulation protocol for bursty environments, specify the perturbation scale used (consistent with the theoretical √T scaling), list all baselines evaluated, and confirm that parameters were set based on theoretical considerations without post-hoc tuning. We have also included the full set of simulation parameters in a supplementary table.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper derives the Duality of Regret Conservation lower bound and the Ω(√T) regret reduction for DOMT from explicit modeling assumptions on exogenous streams, stationarity, and the perturbation's strictly non-negative history-decoupled properties. These are stated as inputs to the analysis (e.g., the perturbation is defined to add zero additional false negatives by construction of its sign and independence), rather than being fitted or redefined from the regret outputs themselves. No equations reduce the main claims to self-citations, ansatzes smuggled via prior work, or renaming of known patterns; the finite-sample bounds follow from the stated perturbation rules without tautological collapse. The derivation remains self-contained against the given assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claims rest on standard statistical assumptions for FDR control plus new constructs (weighted regret, history-decoupled perturbation) introduced without external validation in the provided abstract.

axioms (2)

domain assumption: Testing streams are exogenous (decisions do not influence future observations). Stated as the setting for which DOMT is tailored.
domain assumption: Stationarity of the environment for asymptotic FDR safety. Required for the claim that DOMT preserves exact asymptotic safety.

invented entities (2)

Weighted Regret (no independent evidence). Purpose: unify FDR control and statistical power under a single cost metric. New evaluation criterion introduced to capture asymmetric error costs.
Decoupled-OMT (DOMT) (no independent evidence). Purpose: meta-wrapper that adds a history-decoupled non-negative perturbation to rescue deterministic baselines. New algorithm proposed to achieve an Ω(√T) regret reduction.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · 1 internal anchor

[1] Anastasios Nikolas Angelopoulos, Stephen Bates, Adam Fisch, Lihua Lei, and Tal Schuster. Conformal risk control. In ICLR, 2024.
[2] Ruicheng Ao, Hongyu Chen, David Simchi-Levi, and Feng Zhu. Online local false discovery rate control: A resource allocation approach. CoRR, abs/2402.11425, 2024.
[3] Yajie Bao, Yuyang Huo, Haojie Ren, and Changliang Zou. Cap: A general algorithm for online selective conformal prediction with fcr control. Journal of Machine Learning Research, 26(287):1–74, 2025.
[4] Osbert Bastani, Varun Gupta, Christopher Jung, Georgy Noarov, Ramya Ramalingam, and Aaron Roth. Practical adversarial multivalid conformal prediction. In NeurIPS, 2022.
[5] Stephen Bates, Anastasios Nikolas Angelopoulos, Lihua Lei, Jitendra Malik, and Michael I. Jordan. Distribution-free, risk-controlling prediction sets. J. ACM, 68:43:1–43:34, 2021.
[6] Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1):289–300, 1995.
[7] Shiyun Chen and Shiva Prasad Kasiviswanathan. Contextual online false discovery rate control. In AISTATS, 2019.
[8] Khai Xiang Chiong and Joonhwi Joo. The minimax-regret decision framework for online a/b tests. SSRN Electronic Journal, 2023.
[9] Clara Delecroix, Egbert H. van Nes, Martin W. Scheffer, and Ingrid A van de Leemput. Monitoring resilience in bursts. Proceedings of the National Academy of Sciences of the United States of America, 121, 2023.
[10] Sebastian Döhler, Iqraa Meah, and Etienne Roquain. Online multiple testing with super-uniformity reward. Electronic Journal of Statistics, 18(1):1293–1354, 2024.
[11] Bradley Efron. Large-scale inference: empirical Bayes methods for estimation, testing, and prediction, volume 1. Cambridge University Press, 2012.
[12] Lasse Fischer, Marta Bofill Roig, and Werner Brannath. The online closure principle. The Annals of Statistics, 52(2):817–841, 2024.
[13] Lasse Fischer, Ziyu Xu, and Aaditya Ramdas. An online generalization of the (e-)Benjamini-Hochberg procedure. arXiv preprint arXiv:2407.20683, 2024.
[14] Dean P Foster and Robert A Stine. α-investing: a procedure for sequential control of expected false discoveries. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70, 2008.
[15] Bowen Gang, Wenguang Sun, and Weinan Wang. Structure–adaptive sequential testing for online false discovery rate control. Journal of the American Statistical Association, 118:732–745, 2020.
[16] Isaac Gibbs and Emmanuel Candes. Adaptive conformal inference under distribution shift. In NeurIPS, 2021.
[17] Tudor Groza, Federico López-Gómez, Hamed Haseli Mashhadi, Violeta Muñoz-Fuentes, Osman Nuri Günes, Robert J. Wilson, Pilar Cacheiro, Anthony Frost, Piia Keskivali-Bond, Bora Vardal, Aaron McCoy, Tsz Kwan Cheng, Luis A. Santos, Sara E. Wells, Damian Smedley, Ann-Marie Mallon, and Helen E. Parkinson. The international mouse phenotyping consortium: compreh...
[18] Adel Javanmard and Andrea Montanari. Online rules for control of false discovery rate and false discovery exceedance. The Annals of Statistics, 46(2):526–554, 2018.
[19] Ramesh Johari, Hannah Li, Inessa Liskovich, and Gabriel Y. Weintraub. Experimental design in two-sided platforms: An analysis of bias. Management Science, 68(10):7069–7089, 2022.
[20] Sahil Kale and Antonio Luca Alfeo. Lie to me: Knowledge graphs for robust hallucination self-detection in llms. ArXiv, abs/2512.23547, 2025.
[21] Qi Kuang, Bowen Gang, and Yin Xia. Score: A unified framework for overshoot refund in online fdr control. arXiv preprint arXiv:2601.20386, 2026.
[22] Tze Leung Lai and Herbert E. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6:4–22, 1985.
[23] Minjae Lee, Yoonjae Jung, and Sangdon Park. From bandit regret to fdr control: Online selective generation with feedback unlocking. In Agentic AI in the Wild: From Hallucinations to Reliable Autonomy, 2026.
[24] Yonghoon Lee, Meshi Bashari, Edgar Dobriban, and Yaniv Romano. Synthetic-powered multiple testing with fdr control. ArXiv, abs/2602.16690, 2026.
[25] Jingyu Liu and Yanglei Song. Sequential multiple testing: A second-order asymptotic analysis. arXiv preprint arXiv:2603.04685, 2026.
[26] Aleksandr Podkopaev and Aaditya Ramdas. Tracking the risk of a deployed model and detecting harmful distribution shifts. In ICLR, 2022.
[27] Aaditya Ramdas, Fanny Yang, Martin J Wainwright, and Michael I Jordan. Online control of the false discovery rate with decaying memory. In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, NeurIPS, 2017.
[28] Aaditya Ramdas, Tijana Zrnic, Martin J. Wainwright, and Michael I. Jordan. Saffron: an adaptive algorithm for online control of the false discovery rate. In ICML, 2018.
[29] David S. Robertson, James M. S. Wason, and Aaditya Ramdas. Online multiple hypothesis testing. Statistical science : a review journal of the Institute of Mathematical Statistics, 38:557–575, 2022.
[30] David S Robertson, James MS Wason, Franz König, Martin Posch, and Thomas Jaki. Online error rate control for platform trials. Statistics in Medicine, 42(14):2475–2495, 2023.
[31] Leonard J. Savage. The theory of statistical decision. Journal of the American Statistical Association, 46:55–67, 1951.
[32] Juliet Popper Shaffer et al. Multiple hypothesis testing. Annual review of psychology, 46(1):561–584, 1995.
[33] Dinesh Singh, Phillip G. Febbo, Kenneth N. Ross, Donald G. Jackson, Judith B. Manola, Christine Ladd, Pablo Tamayo, Andrew A. Renshaw, Anthony V. D'Amico, Jerome P. Richie, Eric S. Lander, Massimo Loda, Philip W. Kantoff, Todd R. Golub, and William R. Sellers. Gene expression correlates of clinical prostate cancer behavior. Cancer cell, 1(2):203–9, 2002.
[34] Jinjin Tian and Aaditya Ramdas. ADDIS: an adaptive discarding algorithm for online FDR control with conservative nulls. In NeurIPS, 2019.
[35] Jinjin Tian and Aaditya Ramdas. Online control of the familywise error rate. Statistical Methods in Medical Research, 30(4):976–993, 2021.
[36] Ziyu Xu and Aaditya Ramdas. A unified framework for bandit multiple testing. In NeurIPS, 2021.
[37] Ziyu Xu and Aaditya Ramdas. Dynamic algorithms for online multiple testing. In Mathematical and Scientific Machine Learning, pages 955–986, 2022.
[38] Ziyu Xu and Aaditya Ramdas. Online multiple testing with e-values. In AISTATS, 2023.
[39] Ziyu Xu and Aaditya Ramdas. More powerful multiple testing under dependence via randomization. In AISTATS, 2024.
[40] Fanny Yang, Aaditya Ramdas, Kevin G. Jamieson, and Martin J. Wainwright. A framework for multi-a(rmed)/b(andit) testing with online fdr control. In NeurIPS, 2017.
[41] Yifan Zhang, Zijian Wei, Haojie Ren, and Changliang Zou. e-gai: e-value-based generalized α-investing for online false discovery rate control. In ICML, 2025.
[42] Zinan Zhao and Wenguang Sun. False discovery rate control for structured multiple testing: Asymmetric rules and conformal q-values. Journal of the American Statistical Association, 120:805–817, 2023.
[43] Tijana Zrnic, Aaditya Ramdas, and Michael I Jordan. Asynchronous online testing of multiple hypotheses. Journal of Machine Learning Research, 22(33):1–39, 2021.
Organization of the Appendix. To facilitate navigation and comprehensive review, the appendix is organized as follows: Appendix A (Related Work): It positions our Weighted Regret perspective an...
[44] and ADDIS [34]. To counter the “alpha-death” inherent in macroscopic cold-starts [9], various mechanisms have emerged. The SURE framework exploits conservative null structures to effectively reward the algorithm [10], while SupLORD employs dynamic scheduling to adjust testing levels based on wealth [37]. Concurrently, structure-adaptive methods like SAST ...
[45] Exploration Cost ($\Delta V_T$): In the first phase ($t \le \rho T$), the environment is purely null, incurring a marginal penalty rate of $1 \cdot \mathbb{E}[\xi_t]$. In the second phase ($t > \rho T$), nulls occur with probability $\pi$, incurring a rate of $\pi\,\mathbb{E}[\xi_t]$.
$$\Delta V_T \simeq \int_0^{\rho T} \frac{\kappa\alpha}{2\sqrt{t}}\,dt + \int_{\rho T}^{T} \frac{\pi\kappa\alpha}{2\sqrt{t}}\,dt = \kappa\alpha\sqrt{T}\left(\sqrt{\rho} + \pi(1-\sqrt{\rho})\right). \tag{57}$$
[46] “Cold-Start Tax”. Discovery Dividend ($\Delta M_T$): Signals ($Y_t = 1$) occur only in the second phase with probability $1-\pi$. Given the detectability $\mu$, the marginal recovery rate is $(1-\pi)\mu\,\mathbb{E}[\xi_t]$.
$$\Delta M_T \simeq \int_{\rho T}^{T} \frac{(1-\pi)\mu\kappa\alpha}{2\sqrt{t}}\,dt = (1-\pi)\mu\kappa\alpha\sqrt{T}\,(1-\sqrt{\rho}). \tag{58}$$
Step 2: Solving for the Critical Threshold $M^*$. The DOMT framework achieves a net reduction in weighted regret if $b\,\Delta M_T > a\,\Delta V_T$, or equiva...
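
The closed forms in (57) and (58) can be sanity-checked numerically. The sketch below replaces each integral with the discrete sum it approximates, taking the per-step expected noise as `kappa * alpha / (2 * sqrt(t))`, which is read off the integrands rather than quoted from a definition on this page. All function names and the parameter values are illustrative assumptions.

```python
import math


def expected_noise(t, kappa, alpha):
    """E[xi_t] = kappa * alpha / (2 * sqrt(t)), as implied by the integrand of (57)."""
    return kappa * alpha / (2.0 * math.sqrt(t))


def exploration_cost_sum(T, rho, pi, kappa, alpha):
    """Discrete version of (57): pure nulls up to rho*T, nulls with probability pi afterwards."""
    split = round(rho * T)
    first = sum(expected_noise(t, kappa, alpha) for t in range(1, split + 1))
    second = sum(pi * expected_noise(t, kappa, alpha) for t in range(split + 1, T + 1))
    return first + second


def exploration_cost_closed(T, rho, pi, kappa, alpha):
    """Right-hand side of (57)."""
    return kappa * alpha * math.sqrt(T) * (math.sqrt(rho) + pi * (1.0 - math.sqrt(rho)))


def discovery_dividend_sum(T, rho, pi, mu, kappa, alpha):
    """Discrete version of (58): signals occur only after rho*T, with probability 1 - pi."""
    split = round(rho * T)
    return sum((1.0 - pi) * mu * expected_noise(t, kappa, alpha)
               for t in range(split + 1, T + 1))


def discovery_dividend_closed(T, rho, pi, mu, kappa, alpha):
    """Right-hand side of (58)."""
    return (1.0 - pi) * mu * kappa * alpha * math.sqrt(T) * (1.0 - math.sqrt(rho))


# Hypothetical parameter values chosen only for the check.
params = dict(T=100_000, rho=0.3, pi=0.9, kappa=1.0, alpha=0.05)
print(exploration_cost_sum(**params), exploration_cost_closed(**params))
print(discovery_dividend_sum(mu=0.5, **params), discovery_dividend_closed(mu=0.5, **params))
```

Both quantities scale as the square root of T, so the sign of `b * dM - a * dV` for given weights does not depend on the horizon in this approximation.
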
[47] The baseline parameter generation is predictable ($V_{t-1}$-measurable): $\lambda_t^{\text{base}} = f(\delta_1^{\text{base}}, \ldots, \delta_{t-1}^{\text{base}})$.
[48] The virtual decision utilizes the current evidence without future noise, making it $V_t$-measurable: $\delta_t^{\text{base}} = \mathbf{1}\{e_t \ge 1/\lambda_t^{\text{base}}\}$.
[49] The virtual wealth $W^{\text{base}}_t$ used to establish safety is updated exclusively by these virtual decisions and predictable parameters. Because the exploration noise $\xi_t$ is causally decoupled from the virtual state, the parameter $\lambda^{\text{base}}_t$ retains its strict predictability under the original uncorrupted filtration $\mathcal{V}_t$. Consequently, the virtual process $M^{\text{base}}_t$ safe...

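The decoupling between the virtual baseline state and the exploration noise can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the actual decision rule $\mathbb{1}\{e_t \ge 1/(\lambda^{\text{base}}_t + \xi_t)\}$, the noise scale $\epsilon_t = \kappa\alpha/\sqrt{t}$ (inferred from $\mathbb{E}[\xi_t] = \kappa\alpha/(2\sqrt{t})$ with uniform noise), and the baseline rule `toy_baseline` are all hypothetical choices made for the example.

```python
import random


def toy_baseline(virtual_hist, alpha=0.1):
    # Hypothetical predictable rule f: the level shrinks with the number of
    # rounds since the last virtual rejection (a stand-in for threshold depletion).
    gap = 0
    for d in reversed(virtual_hist):
        if d:
            break
        gap += 1
    return alpha / (gap + 1) ** 2


def domt_run(e_values, f=toy_baseline, alpha=0.1, kappa=1.0, seed=0):
    rng = random.Random(seed)
    virtual_hist, actual = [], []
    for t, e_t in enumerate(e_values, start=1):
        lam_base = f(virtual_hist)            # uses past virtual decisions only
        delta_base = e_t >= 1.0 / lam_base    # virtual decision, no noise involved
        eps_t = kappa * alpha / t ** 0.5      # ASSUMED noise scale
        xi_t = rng.uniform(0.0, eps_t)        # non-negative, independent of history
        delta = e_t >= 1.0 / (lam_base + xi_t)  # ASSUMED perturbed decision
        virtual_hist.append(delta_base)       # the noise never enters the virtual state
        actual.append(delta)
    return virtual_hist, actual


if __name__ == "__main__":
    e_vals = [0.5, 3.0, 12.0, 1.0, 40.0, 8.0, 1000.0]
    virtual, actual = domt_run(e_vals, seed=0)
    print("virtual:", virtual)
    print("actual: ", actual)
```

Because the perturbation is non-negative, every virtual rejection is also an actual rejection, and because the virtual history ignores the noise, it is identical for every random seed.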
[50] The 100% Transient FDP Phenomenon: During infinite pure-noise cold-starts ($Y_{t \le T_0} \equiv 0$), the baseline discoveries are strictly zero. Any exploratory false positive triggers an instantaneous localized empirical FDP of $V_t/R_t = 100\%$ (since $V_t = R_t$). We explicitly clarify that this is not a failure of FDR control, but a topological necessity of $0/0 \to 1/1$. DOMT consc...

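The jump from 0/0 to 1/1 can be seen in a small running-FDP computation. The sketch assumes the common convention $\mathrm{FDP}_t = V_t / \max(R_t, 1)$; the function name and the toy stream are illustrative only.

```python
def running_fdp(decisions, labels):
    """Empirical FDP after each round; labels use 1 for signal, 0 for null."""
    v = r = 0
    out = []
    for d, y in zip(decisions, labels):
        if d:
            r += 1          # one more rejection
            if y == 0:
                v += 1      # the rejection was a false discovery
        out.append(v / max(r, 1))
    return out


if __name__ == "__main__":
    # Five null rounds with one exploratory rejection, then four true discoveries.
    decisions = [0, 0, 1, 0, 0, 1, 1, 1, 1]
    labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]
    print(running_fdp(decisions, labels))
```

The empirical FDP sits at 100% for as long as the lone exploratory rejection is the only discovery, then decays once real signals are rejected.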
[51] The Singularity of Extreme Signals: Theorem 2 dictates that deterministic baselines suffer $\Omega(T)$ missed discoveries due to threshold depletion. However, this absolute penalty critically hinges on the global Lipschitz condition $G(x) \le Lx$, which mathematically confines the environment to “weak-to-moderate” signals. If the real-world signal distribution exhibits ...

[52] Contextual DOMT: Rather than applying an indiscriminate, uniform spatial noise distribution $\xi_t \sim \mathrm{Uniform}[0, \epsilon_t]$ across the entire sequence, we envision an attention-based or covariate-guided exploration policy $\kappa_\theta(x_t)$. By training a secondary machine learning model to estimate the localized prior probability of anomalies, DOMT could dynamically compress the spa...

[53] Complex Topologies (DAGs and PRDS): Modern inference increasingly involves complex topologies, such as testing Directed Acyclic Graphs (DAGs) in gene regulatory networks or tracking spatial contagion in neuroimaging (e.g., fMRI) and financial markets. Traditional FDR control often breaks down or suffers from severe starvation under arbitrary dependencies...

[1] Anastasios Nikolas Angelopoulos, Stephen Bates, Adam Fisch, Lihua Lei, and Tal Schuster. Conformal risk control. In ICLR, 2024.

[2] Ruicheng Ao, Hongyu Chen, David Simchi-Levi, and Feng Zhu. Online local false discovery rate control: A resource allocation approach. CoRR, abs/2402.11425, 2024.

[3] Yajie Bao, Yuyang Huo, Haojie Ren, and Changliang Zou. CAP: A general algorithm for online selective conformal prediction with FCR control. Journal of Machine Learning Research, 26(287):1–74, 2025.

[4] Osbert Bastani, Varun Gupta, Christopher Jung, Georgy Noarov, Ramya Ramalingam, and Aaron Roth. Practical adversarial multivalid conformal prediction. In NeurIPS, 2022.

[5] Stephen Bates, Anastasios Nikolas Angelopoulos, Lihua Lei, Jitendra Malik, and Michael I. Jordan. Distribution-free, risk-controlling prediction sets. J. ACM, 68:43:1–43:34, 2021.

[6] Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1):289–300, 1995.

[7] Shiyun Chen and Shiva Prasad Kasiviswanathan. Contextual online false discovery rate control. In AISTATS, 2019.

[8] Khai Xiang Chiong and Joonhwi Joo. The minimax-regret decision framework for online A/B tests. SSRN Electronic Journal, 2023.

[9] Clara Delecroix, Egbert H. van Nes, Martin W. Scheffer, and Ingrid A van de Leemput. Monitoring resilience in bursts. Proceedings of the National Academy of Sciences of the United States of America, 121, 2023.

[10] Sebastian Döhler, Iqraa Meah, and Etienne Roquain. Online multiple testing with super-uniformity reward. Electronic Journal of Statistics, 18(1):1293–1354, 2024.

[11] Bradley Efron. Large-scale inference: empirical Bayes methods for estimation, testing, and prediction, volume 1. Cambridge University Press, 2012.

[12] Lasse Fischer, Marta Bofill Roig, and Werner Brannath. The online closure principle. The Annals of Statistics, 52(2):817–841, 2024.

[13] Lasse Fischer, Ziyu Xu, and Aaditya Ramdas. An online generalization of the (e-)Benjamini-Hochberg procedure. arXiv preprint arXiv:2407.20683, 2024.

[14] Dean P Foster and Robert A Stine. α-investing: a procedure for sequential control of expected false discoveries. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70, 2008.

[15] Bowen Gang, Wenguang Sun, and Weinan Wang. Structure–adaptive sequential testing for online false discovery rate control. Journal of the American Statistical Association, 118:732–745, 2020.

[16] Isaac Gibbs and Emmanuel Candes. Adaptive conformal inference under distribution shift. In NeurIPS, 2021.

[17] Tudor Groza, Federico López-Gómez, Hamed Haseli Mashhadi, Violeta Muñoz-Fuentes, Osman Nuri Günes, Robert J. Wilson, Pilar Cacheiro, Anthony Frost, Piia Keskivali-Bond, Bora Vardal, Aaron McCoy, Tsz Kwan Cheng, Luis A. Santos, Sara E. Wells, Damian Smedley, Ann-Marie Mallon, and Helen E. Parkinson. The international mouse phenotyping consortium: compreh...

[18] Adel Javanmard and Andrea Montanari. Online rules for control of false discovery rate and false discovery exceedance. The Annals of Statistics, 46(2):526–554, 2018.

[19] Ramesh Johari, Hannah Li, Inessa Liskovich, and Gabriel Y. Weintraub. Experimental design in two-sided platforms: An analysis of bias. Management Science, 68(10):7069–7089, 2022.

[20] Sahil Kale and Antonio Luca Alfeo. Lie to me: Knowledge graphs for robust hallucination self-detection in LLMs. ArXiv, abs/2512.23547, 2025.

[21] Qi Kuang, Bowen Gang, and Yin Xia. Score: A unified framework for overshoot refund in online FDR control. arXiv preprint arXiv:2601.20386, 2026.

[22] Tze Leung Lai and Herbert E. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6:4–22, 1985.

[23] Minjae Lee, Yoonjae Jung, and Sangdon Park. From bandit regret to FDR control: Online selective generation with feedback unlocking. In Agentic AI in the Wild: From Hallucinations to Reliable Autonomy, 2026.

[24] Yonghoon Lee, Meshi Bashari, Edgar Dobriban, and Yaniv Romano. Synthetic-powered multiple testing with FDR control. ArXiv, abs/2602.16690, 2026.

[25] Jingyu Liu and Yanglei Song. Sequential multiple testing: A second-order asymptotic analysis. arXiv preprint arXiv:2603.04685, 2026.

[26] Aleksandr Podkopaev and Aaditya Ramdas. Tracking the risk of a deployed model and detecting harmful distribution shifts. In ICLR, 2022.

[27] Aaditya Ramdas, Fanny Yang, Martin J Wainwright, and Michael I Jordan. Online control of the false discovery rate with decaying memory. In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, NeurIPS, 2017.

[28] Aaditya Ramdas, Tijana Zrnic, Martin J. Wainwright, and Michael I. Jordan. SAFFRON: an adaptive algorithm for online control of the false discovery rate. In ICML, 2018.

[29] David S. Robertson, James M. S. Wason, and Aaditya Ramdas. Online multiple hypothesis testing. Statistical Science: a review journal of the Institute of Mathematical Statistics, 38:557–575, 2022.

[30] David S Robertson, James MS Wason, Franz König, Martin Posch, and Thomas Jaki. Online error rate control for platform trials. Statistics in Medicine, 42(14):2475–2495, 2023.

[31] Leonard J. Savage. The theory of statistical decision. Journal of the American Statistical Association, 46:55–67, 1951.

[32] Juliet Popper Shaffer et al. Multiple hypothesis testing. Annual Review of Psychology, 46(1):561–584, 1995.

[33] Dinesh Singh, Phillip G. Febbo, Kenneth N. Ross, Donald G. Jackson, Judith B. Manola, Christine Ladd, Pablo Tamayo, Andrew A. Renshaw, Anthony V. D’Amico, Jerome P. Richie, Eric S. Lander, Massimo Loda, Philip W. Kantoff, Todd R. Golub, and William R. Sellers. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1(2):203–9, 2002.

[34] Jinjin Tian and Aaditya Ramdas. ADDIS: an adaptive discarding algorithm for online FDR control with conservative nulls. In NeurIPS, 2019.

[35] Jinjin Tian and Aaditya Ramdas. Online control of the familywise error rate. Statistical Methods in Medical Research, 30(4):976–993, 2021.

[36] Ziyu Xu and Aaditya Ramdas. A unified framework for bandit multiple testing. In NeurIPS, 2021.

[37] Ziyu Xu and Aaditya Ramdas. Dynamic algorithms for online multiple testing. In Mathematical and Scientific Machine Learning, pages 955–986, 2022.

[38] Ziyu Xu and Aaditya Ramdas. Online multiple testing with e-values. In AISTATS, 2023.

[39] Ziyu Xu and Aaditya Ramdas. More powerful multiple testing under dependence via randomization. In AISTATS, 2024.

[40] Fanny Yang, Aaditya Ramdas, Kevin G. Jamieson, and Martin J. Wainwright. A framework for Multi-A(rmed)/B(andit) testing with online FDR control. In NeurIPS, 2017.

[41] Yifan Zhang, Zijian Wei, Haojie Ren, and Changliang Zou. e-GAI: e-value-based generalized α-investing for online false discovery rate control. In ICML, 2025.
