A Regret Perspective on Online Multiple Testing
Pith reviewed 2026-05-15 02:50 UTC · model grok-4.3
The pith
Deterministic online FDR control forces linear regret from early threshold depletion, but a history-decoupled non-negative perturbation cuts that regret by an order-sqrt(T) amount without adding false negatives.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We prove the Duality of Regret Conservation: any purely deterministic procedure that maintains strict FDR control must incur an Omega(T) linear regret penalty, because threshold depletion during signal-sparse cold starts produces massive false negatives later. Decoupled-OMT is a baseline-agnostic meta-wrapper that injects a history-decoupled, strictly non-negative random perturbation. It prevents severe depletion, adds zero additional false negatives, preserves exact asymptotic FDR safety in stationary environments, bounds finite-sample error inflation during cold starts, and delivers an order-optimal Omega(sqrt(T)) regret reduction in bursty environments, with an explicit cold-start tax for
What carries the argument
The history-decoupled strictly non-negative random perturbation, which raises or leaves thresholds unchanged based only on past data and never increases the chance of false negatives.
If this is right
- Any existing deterministic online multiple testing method can be wrapped to obtain an order-sqrt(T) regret reduction in bursty environments.
- Asymptotic FDR control remains exact in stationary streams.
- Finite-sample error inflation stays bounded during cold starts with no added false negatives.
- The derived cold-start tax marks the precise phase transition where the perturbed wrapper becomes superior.
- Regret falls by an order-sqrt(T) amount relative to the deterministic baseline while the original safety guarantees are preserved.
Where Pith is reading between the lines
- The same style of history-decoupled non-negative perturbation may limit depletion effects in other sequential decision settings with asymmetric error costs.
- The cold-start tax supplies a practical rule for choosing when to activate the wrapper based on expected signal density.
- Extending the duality argument to non-stationary streams could expose additional regret-safety trade-offs.
- The result points to a general limit on how much strict control can be maintained without paying linear opportunity cost in long online inference tasks.
Load-bearing premise
Testing streams are exogenous, so decisions never affect future data generation, and the perturbation stays strictly non-negative and independent of the full decision history.
What would settle it
In a long sequence that begins with sparse signals and later contains bursts, check whether the cumulative weighted regret of a deterministic FDR controller grows linearly while the perturbed version undercuts it by a margin growing like the square root of time and produces no extra false negatives.
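One way to run this check numerically is to track cumulative weighted regret for both procedures on the same stream. The sketch below assumes weighted regret is the additive cost a * (false positives) + b * (false negatives), with the weights a and b playing the roles they have in the paper's net-benefit condition b∆M_T > a∆V_T. The function name and default weights are illustrative and not taken from the paper.

```python
from itertools import accumulate


def cumulative_weighted_regret(decisions, truths, a=1.0, b=1.0):
    """Running total of a * false positives + b * false negatives.

    decisions[t] is True when hypothesis t was rejected;
    truths[t] is True when hypothesis t is a real signal.
    The additive form and the weights a, b are assumptions of this sketch.
    """
    per_round = [
        a * (d and not y) + b * (y and not d)  # one of the two terms is always zero
        for d, y in zip(decisions, truths)
    ]
    return list(accumulate(per_round))


if __name__ == "__main__":
    truths = [False, False, True, True]
    baseline = [False, False, False, True]   # misses one signal
    perturbed = [True, False, True, True]    # one early false positive, no misses
    print(cumulative_weighted_regret(baseline, truths, a=1.0, b=4.0))
    print(cumulative_weighted_regret(perturbed, truths, a=1.0, b=4.0))
```

Plotting the two curves against T and against sqrt(T) on a sparse-then-bursty stream would indicate which growth rate each procedure follows.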
Original abstract
Online Multiple Testing (OMT), a fundamental pillar of sequential statistical inference, traditionally evaluates the False Discovery Rate (FDR) and statistical power in isolation, obscuring the highly asymmetric costs of false positives and false negatives in modern automated pipelines. To unify this evaluation, we introduce $\textit{Weighted Regret}$. Under this metric, we prove the $\textit{Duality of Regret Conservation}$: purely deterministic procedures ensuring strict FDR control inevitably incur an $\Omega(T)$ linear regret penalty, as threshold depletion during signal-sparse cold starts forces massive false negatives. Tailored for exogenous testing streams, we propose Decoupled-OMT (DOMT) as a baseline-agnostic meta-wrapper. By incorporating a history-decoupled, strictly non-negative random perturbation, DOMT rescues purely deterministic baselines from severe threshold depletion. Crucially, it preserves exact asymptotic safety in stationary environments and rigorously bounds finite-sample error inflation during cold-starts. Guaranteeing zero additional false negatives, it yields an order-optimal $\Omega(\sqrt{T})$ regret reduction in bursty environments, with a derived ``Cold-Start Tax'' characterizing the exact phase transition of algorithmic superiority. Experiments validate that DOMT consistently curtails empirical weighted regret, achieving an order-optimal sublinear mitigation of threshold depletion to navigate the non-stationary Pareto frontier.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Weighted Regret as a unified metric for online multiple testing that balances FDR control against the asymmetric costs of false positives and negatives. It proves the Duality of Regret Conservation, showing that any deterministic procedure with strict FDR control incurs an Ω(T) linear regret penalty due to threshold depletion in signal-sparse cold starts. The authors propose Decoupled-OMT (DOMT), a baseline-agnostic meta-wrapper that applies a history-decoupled, strictly non-negative random perturbation to rescue deterministic baselines, achieving an order-optimal Ω(√T) regret reduction in bursty exogenous streams while preserving exact asymptotic FDR safety in stationary environments and providing finite-sample bounds on error inflation with a derived Cold-Start Tax.
Significance. If the duality lower bound and DOMT perturbation analysis hold under the stated assumptions, the work supplies a regret-theoretic lens for OMT that directly addresses practical costs in automated pipelines and yields a concrete, order-optimal improvement over deterministic baselines in non-stationary settings. The explicit characterization of the phase transition via the Cold-Start Tax and the guarantee of zero additional false negatives are particularly valuable for sequential inference applications.
major comments (3)
- [§3] §3 (perturbation definition and finite-sample bounds): The claim that a strictly non-negative, history-decoupled perturbation adds zero false negatives while only rescuing threshold depletion is load-bearing for both the Ω(√T) regret reduction and the Cold-Start Tax. In signal-sparse cold-start regimes, even a non-negative shift can interact with the baseline's adaptive threshold rule near decision boundaries, potentially increasing missed signals beyond the stated bound; the analysis must explicitly rule out this interaction or provide a counter-example-free proof.
- [Proof of Duality of Regret Conservation] Proof of Duality of Regret Conservation: The Ω(T) lower bound for deterministic procedures is derived from threshold depletion during cold starts, but the argument appears to rely on exogenous streams and stationarity for the asymptotic safety guarantee. The finite-sample regime where the linear penalty is most severe must be shown to satisfy the same non-interaction condition used for DOMT, otherwise the duality does not strictly separate deterministic from perturbed methods.
- [Experiments] Experimental section: Validation that DOMT achieves order-optimal sublinear mitigation of threshold depletion is reported, yet the abstract and available description omit the precise simulation protocol for bursty environments, the choice of perturbation scale, the set of baselines, and how post-hoc parameter tuning was avoided. Without these details it is impossible to confirm that the observed Ω(√T) improvement is not an artifact of the experimental design.
minor comments (2)
- Notation for Weighted Regret and the Cold-Start Tax should be introduced with explicit definitions before the duality statement to improve readability.
- The abstract states that DOMT is 'baseline-agnostic,' but the finite-sample error-inflation bound may implicitly depend on properties of the underlying deterministic procedure; clarify the scope of this agnosticism.
Simulated Author's Rebuttal
We thank the referee for their insightful and constructive feedback on our manuscript. We address each of the major comments point by point below. We have made revisions to the manuscript to strengthen the analysis and provide additional details as requested.
Point-by-point responses
- Referee: [§3] §3 (perturbation definition and finite-sample bounds): The claim that a strictly non-negative, history-decoupled perturbation adds zero false negatives while only rescuing threshold depletion is load-bearing for both the Ω(√T) regret reduction and the Cold-Start Tax. In signal-sparse cold-start regimes, even a non-negative shift can interact with the baseline's adaptive threshold rule near decision boundaries, potentially increasing missed signals beyond the stated bound; the analysis must explicitly rule out this interaction or provide a counter-example-free proof.
  Authors: We appreciate the referee pointing out the need for a more rigorous treatment of potential interactions in the perturbation analysis. We agree that this requires explicit clarification. In the revised manuscript, we have added a detailed proof in Section 3 that demonstrates, under the history-decoupled and strictly non-negative properties, the perturbation cannot increase the number of false negatives beyond the baseline's decisions. Specifically, we show that any threshold increase due to the perturbation is compensated by the decoupling, preventing additional misses at decision boundaries. This is formalized in a new lemma providing a counter-example-free bound.
  Revision: yes
- Referee: [Proof of Duality of Regret Conservation] Proof of Duality of Regret Conservation: The Ω(T) lower bound for deterministic procedures is derived from threshold depletion during cold starts, but the argument appears to rely on exogenous streams and stationarity for the asymptotic safety guarantee. The finite-sample regime where the linear penalty is most severe must be shown to satisfy the same non-interaction condition used for DOMT, otherwise the duality does not strictly separate deterministic from perturbed methods.
  Authors: The referee correctly identifies a potential gap in the separation argument. The lower bound for deterministic procedures is established directly from the threshold depletion mechanism without invoking the perturbation at all. In the revised proof, we have clarified that the duality holds in the finite-sample regime under the exogenous stream model, and the non-interaction condition is not needed for the deterministic case since there is no perturbation. We have updated the proof to explicitly compare the two regimes and maintain the strict separation.
  Revision: yes
- Referee: [Experiments] Experimental section: Validation that DOMT achieves order-optimal sublinear mitigation of threshold depletion is reported, yet the abstract and available description omit the precise simulation protocol for bursty environments, the choice of perturbation scale, the set of baselines, and how post-hoc parameter tuning was avoided. Without these details it is impossible to confirm that the observed Ω(√T) improvement is not an artifact of the experimental design.
  Authors: We agree that the experimental protocol requires more precise documentation to allow full reproducibility and validation. In the revised manuscript, we have substantially expanded the Experiments section to detail the simulation protocol for bursty environments, specify the perturbation scale used (consistent with the theoretical √T scaling), list all baselines evaluated, and confirm that parameters were set based on theoretical considerations without post-hoc tuning. We have also included the full set of simulation parameters in a supplementary table.
  Revision: yes
Circularity Check
No significant circularity in derivation chain
Full rationale
The paper derives the Duality of Regret Conservation lower bound and the Ω(√T) regret reduction for DOMT from explicit modeling assumptions on exogenous streams, stationarity, and the perturbation's strictly non-negative history-decoupled properties. These are stated as inputs to the analysis (e.g., the perturbation is defined to add zero additional false negatives by construction of its sign and independence), rather than being fitted or redefined from the regret outputs themselves. No equations reduce the main claims to self-citations, ansatzes smuggled via prior work, or renaming of known patterns; the finite-sample bounds follow from the stated perturbation rules without tautological collapse. The derivation remains self-contained against the given assumptions.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Testing streams are exogenous (decisions do not influence future observations)
- domain assumption: Stationarity of the environment for asymptotic FDR safety
invented entities (2)
- Weighted Regret: no independent evidence
- Decoupled-OMT (DOMT): no independent evidence
Reference graph
Works this paper leans on
- [1] Anastasios Nikolas Angelopoulos, Stephen Bates, Adam Fisch, Lihua Lei, and Tal Schuster. Conformal risk control. In ICLR, 2024.
- [2] Ruicheng Ao, Hongyu Chen, David Simchi-Levi, and Feng Zhu. Online local false discovery rate control: A resource allocation approach. CoRR, abs/2402.11425, 2024.
- [3] Yajie Bao, Yuyang Huo, Haojie Ren, and Changliang Zou. Cap: A general algorithm for online selective conformal prediction with fcr control. Journal of Machine Learning Research, 26(287):1–74, 2025.
- [4] Osbert Bastani, Varun Gupta, Christopher Jung, Georgy Noarov, Ramya Ramalingam, and Aaron Roth. Practical adversarial multivalid conformal prediction. In NeurIPS, 2022.
- [5] Stephen Bates, Anastasios Nikolas Angelopoulos, Lihua Lei, Jitendra Malik, and Michael I. Jordan. Distribution-free, risk-controlling prediction sets. J. ACM, 68:43:1–43:34, 2021.
- [6] Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1):289–300, 1995.
- [7] Shiyun Chen and Shiva Prasad Kasiviswanathan. Contextual online false discovery rate control. In AISTATS, 2019.
- [8] Khai Xiang Chiong and Joonhwi Joo. The minimax-regret decision framework for online a/b tests. SSRN Electronic Journal, 2023.
- [9] Clara Delecroix, Egbert H. van Nes, Martin W. Scheffer, and Ingrid A van de Leemput. Monitoring resilience in bursts. Proceedings of the National Academy of Sciences of the United States of America, 121, 2023.
- [10] Sebastian Döhler, Iqraa Meah, and Etienne Roquain. Online multiple testing with super-uniformity reward. Electronic Journal of Statistics, 18(1):1293–1354, 2024.
- [11] Bradley Efron. Large-scale inference: empirical Bayes methods for estimation, testing, and prediction, volume 1. Cambridge University Press, 2012.
- [12] Lasse Fischer, Marta Bofill Roig, and Werner Brannath. The online closure principle. The Annals of Statistics, 52(2):817–841, 2024.
- [13] Lasse Fischer, Ziyu Xu, and Aaditya Ramdas. An online generalization of the (e-)benjamini-hochberg procedure. arXiv preprint arXiv:2407.20683, 2024.
- [14] Dean P Foster and Robert A Stine. α-investing: a procedure for sequential control of expected false discoveries. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70, 2008.
- [15] Bowen Gang, Wenguang Sun, and Weinan Wang. Structure–adaptive sequential testing for online false discovery rate control. Journal of the American Statistical Association, 118:732–745, 2020.
- [16] Isaac Gibbs and Emmanuel Candes. Adaptive conformal inference under distribution shift. In NeurIPS, 2021.
- [17] Tudor Groza, Federico López-Gómez, Hamed Haseli Mashhadi, Violeta Muñoz-Fuentes, Osman Nuri Günes, Robert J. Wilson, Pilar Cacheiro, Anthony Frost, Piia Keskivali-Bond, Bora Vardal, Aaron McCoy, Tsz Kwan Cheng, Luis A. Santos, Sara E. Wells, Damian Smedley, Ann-Marie Mallon, and Helen E. Parkinson. The international mouse phenotyping consortium: compreh...
- [18] Adel Javanmard and Andrea Montanari. Online rules for control of false discovery rate and false discovery exceedance. The Annals of Statistics, 46(2):526–554, 2018.
- [19]
- [20] Sahil Kale and Antonio Luca Alfeo. Lie to me: Knowledge graphs for robust hallucination self-detection in llms. ArXiv, abs/2512.23547, 2025.
- [21] Qi Kuang, Bowen Gang, and Yin Xia. Score: A unified framework for overshoot refund in online fdr control. arXiv preprint arXiv:2601.20386, 2026.
- [22] Tze Leung Lai and Herbert E. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6:4–22, 1985.
- [23] Minjae Lee, Yoonjae Jung, and Sangdon Park. From bandit regret to fdr control: Online selective generation with feedback unlocking. In Agentic AI in the Wild: From Hallucinations to Reliable Autonomy, 2026.
- [24] Yonghoon Lee, Meshi Bashari, Edgar Dobriban, and Yaniv Romano. Synthetic-powered multiple testing with fdr control. ArXiv, abs/2602.16690, 2026.
- [25] Jingyu Liu and Yanglei Song. Sequential multiple testing: A second-order asymptotic analysis. arXiv preprint arXiv:2603.04685, 2026.
- [26] Aleksandr Podkopaev and Aaditya Ramdas. Tracking the risk of a deployed model and detecting harmful distribution shifts. In ICLR, 2022.
- [27] Aaditya Ramdas, Fanny Yang, Martin J Wainwright, and Michael I Jordan. Online control of the false discovery rate with decaying memory. In I. Guyon, U. von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, NeurIPS, 2017.
- [28] Aaditya Ramdas, Tijana Zrnic, Martin J. Wainwright, and Michael I. Jordan. Saffron: an adaptive algorithm for online control of the false discovery rate. In ICML, 2018.
- [29] David S. Robertson, James M. S. Wason, and Aaditya Ramdas. Online multiple hypothesis testing. Statistical science: a review journal of the Institute of Mathematical Statistics, 38:557–575, 2022.
- [30] David S Robertson, James MS Wason, Franz König, Martin Posch, and Thomas Jaki. Online error rate control for platform trials. Statistics in Medicine, 42(14):2475–2495, 2023.
- [31] Leonard J. Savage. The theory of statistical decision. Journal of the American Statistical Association, 46:55–67, 1951.
- [32] Juliet Popper Shaffer et al. Multiple hypothesis testing. Annual review of psychology, 46(1):561–584, 1995.
- [33] Dinesh Singh, Phillip G. Febbo, Kenneth N. Ross, Donald G. Jackson, Judith B. Manola, Christine Ladd, Pablo Tamayo, Andrew A. Renshaw, Anthony V. D’Amico, Jerome P. Richie, Eric S. Lander, Massimo Loda, Philip W. Kantoff, Todd R. Golub, and William R. Sellers. Gene expression correlates of clinical prostate cancer behavior. Cancer cell, 1(2):203–9, 2002.
- [34] Jinjin Tian and Aaditya Ramdas. ADDIS: an adaptive discarding algorithm for online FDR control with conservative nulls. In NeurIPS, 2019.
- [35] Jinjin Tian and Aaditya Ramdas. Online control of the familywise error rate. Statistical Methods in Medical Research, 30(4):976–993, 2021.
- [36] Ziyu Xu and Aaditya Ramdas. A unified framework for bandit multiple testing. In NeurIPS, 2021.
- [37] Ziyu Xu and Aaditya Ramdas. Dynamic algorithms for online multiple testing. In Mathematical and Scientific Machine Learning, pages 955–986, 2022.
- [38] Ziyu Xu and Aaditya Ramdas. Online multiple testing with e-values. In AISTATS, 2023.
- [39] Ziyu Xu and Aaditya Ramdas. More powerful multiple testing under dependence via randomization. In AISTATS, 2024.
- [40] Fanny Yang, Aaditya Ramdas, Kevin G. Jamieson, and Martin J. Wainwright. A framework for multi-a(rmed)/b(andit) testing with online fdr control. In NeurIPS, 2017.
- [41] Yifan Zhang, Zijian Wei, Haojie Ren, and Changliang Zou. e-gai: e-value-based generalized α-investing for online false discovery rate control. In ICML, 2025.
- [42] Zinan Zhao and Wenguang Sun. False discovery rate control for structured multiple testing: Asymmetric rules and conformal q-values. Journal of the American Statistical Association, 120:805–817, 2023.
- [43] Tijana Zrnic, Aaditya Ramdas, and Michael I Jordan. Asynchronous online testing of multiple hypotheses. Journal of Machine Learning Research, 22(33):1–39, 2021.
Organization of the Appendix
To facilitate navigation and comprehensive review, the appendix is organized as follows:
- Appendix A (Related Work): It positions our Weighted Regret perspective an...
- [44] and ADDIS [34]. To counter the “alpha-death” inherent in macroscopic cold-starts [9], various mechanisms have emerged. The SURE framework exploits conservative null structures to effectively reward the algorithm [10], while SupLORD employs dynamic scheduling to adjust testing levels based on wealth [37]. Concurrently, structure-adaptive methods like SAST ...
- [45] Exploration Cost ($\Delta V_T$): In the first phase ($t \le \rho T$), the environment is purely null, incurring a marginal penalty rate of $1 \cdot \mathbb{E}[\xi_t]$. In the second phase ($t > \rho T$), nulls occur with probability $\pi$, incurring a rate of $\pi\,\mathbb{E}[\xi_t]$.
  $$\Delta V_T \simeq \int_0^{\rho T} \frac{\kappa\alpha}{2\sqrt{t}}\,dt + \int_{\rho T}^{T} \frac{\pi\kappa\alpha}{2\sqrt{t}}\,dt = \kappa\alpha\sqrt{T}\left(\sqrt{\rho} + \pi(1-\sqrt{\rho})\right). \tag{57}$$
- [46] Discovery Dividend ($\Delta M_T$): Signals ($Y_t = 1$) occur only in the second phase with probability $1-\pi$. Given the detectability $\mu$, the marginal recovery rate is $(1-\pi)\mu\,\mathbb{E}[\xi_t]$.
  $$\Delta M_T \simeq \int_{\rho T}^{T} \frac{(1-\pi)\mu\kappa\alpha}{2\sqrt{t}}\,dt = (1-\pi)\mu\kappa\alpha\sqrt{T}\,(1-\sqrt{\rho}). \tag{58}$$
  Step 2: Solving for the Critical Threshold $M^*$. The DOMT framework achieves a net reduction in weighted regret if $b\,\Delta M_T > a\,\Delta V_T$, or equiva...
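The closed forms in (57) and (58) can be sanity-checked against the discrete per-round sums they approximate. The sketch below takes the per-round mean perturbation to be E[ξ_t] = κα/(2√t), which is read off from the integrands rather than stated explicitly in the excerpt. The function names and the parameter values used in the demo are hypothetical.

```python
import math


def exploration_cost(T, rho, pi, kappa, alpha):
    """Closed form of Eq. (57)."""
    return kappa * alpha * math.sqrt(T) * (math.sqrt(rho) + pi * (1 - math.sqrt(rho)))


def discovery_dividend(T, rho, pi, mu, kappa, alpha):
    """Closed form of Eq. (58)."""
    return (1 - pi) * mu * kappa * alpha * math.sqrt(T) * (1 - math.sqrt(rho))


def discrete_sums(T, rho, pi, mu, kappa, alpha):
    """Sum the per-round rates directly instead of integrating them."""
    cut = int(rho * T)  # last round of the purely null first phase
    dv = dm = 0.0
    for t in range(1, T + 1):
        mean_xi = kappa * alpha / (2 * math.sqrt(t))  # assumed E[xi_t]
        if t <= cut:
            dv += mean_xi                      # every round is null
        else:
            dv += pi * mean_xi                 # nulls occur with probability pi
            dm += (1 - pi) * mu * mean_xi      # signals recovered at rate mu
    return dv, dm


def wrapper_helps(a, b, T, rho, pi, mu, kappa, alpha):
    """Net-benefit condition from the text: b * Delta M_T > a * Delta V_T."""
    dv = exploration_cost(T, rho, pi, kappa, alpha)
    dm = discovery_dividend(T, rho, pi, mu, kappa, alpha)
    return b * dm > a * dv


if __name__ == "__main__":
    args = dict(T=200_000, rho=0.25, pi=0.9, mu=0.5, kappa=1.0, alpha=0.05)
    print(discrete_sums(**args))
    print(exploration_cost(200_000, 0.25, 0.9, 1.0, 0.05),
          discovery_dividend(**args))
```

Because both quantities scale with κα√T, the condition in `wrapper_helps` does not depend on T, κ, or α under these approximations; only ρ, π, μ, and the ratio b/a matter.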
- [47] The baseline parameter generation is predictable ($\mathcal{V}_{t-1}$-measurable): $\lambda^{\mathrm{base}}_t = f(\delta^{\mathrm{base}}_1, \ldots, \delta^{\mathrm{base}}_{t-1})$
- [48] The virtual decision utilizes the current evidence without future noise, making it $\mathcal{V}_t$-measurable: $\delta^{\mathrm{base}}_t = \mathbb{1}\{e_t \ge 1/\lambda^{\mathrm{base}}_t\}$
- [49] The virtual wealth $W^{\mathrm{base}}_t$ used to establish safety is updated exclusively by these virtual decisions and predictable parameters. Because the exploration noise $\xi_t$ is causally decoupled from the virtual state, the parameter $\lambda^{\mathrm{base}}_t$ retains its strict predictability under the original uncorrupted filtration $\mathcal{V}_t$. Consequently, the virtual process $M^{\mathrm{base}}_t$ safe...
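The three conditions above can be illustrated with a small wrapper loop. This is a minimal sketch under stated assumptions: the reported decision is taken to compare e_t against 1/(λ_t + ξ_t), the noise is drawn as ξ_t ~ Uniform[0, κα/√t], and the wealth rule is a toy stand-in for a deterministic baseline. None of these specifics, nor the function name, come from the paper, and the toy rule is not claimed to control FDR.

```python
import random


def run_decoupled(e_values, alpha=0.05, kappa=1.0, seed=0):
    """Return (virtual, actual) rejection lists for a stream of e-values."""
    rng = random.Random(seed)
    wealth = alpha / 2.0  # hypothetical initial virtual wealth
    virtual, actual = [], []
    for t, e in enumerate(e_values, start=1):
        # Predictable parameter: depends on past virtual decisions only.
        lam = 0.1 * wealth
        # Exploration noise: non-negative and drawn without looking at history.
        xi = rng.uniform(0.0, kappa * alpha / t ** 0.5)
        d_virtual = e >= 1.0 / lam          # noise-free baseline decision
        d_actual = e >= 1.0 / (lam + xi)    # assumed form of the reported decision
        # Virtual wealth is updated by the virtual decision alone;
        # the small floor (hypothetical) keeps lam strictly positive.
        reward = alpha / 2.0 if d_virtual else 0.0
        wealth = max(wealth + reward - lam, 1e-9)
        virtual.append(d_virtual)
        actual.append(d_actual)
    return virtual, actual


if __name__ == "__main__":
    stream = [0.5, 3.0, 500.0, 30.0, 1.0, 1000.0]
    print(run_decoupled(stream, seed=0))
    print(run_decoupled(stream, seed=1))
```

Two properties are visible in this toy version. Since ξ_t ≥ 0 lowers the rejection bar 1/(λ_t + ξ_t), every virtual rejection is also an actual rejection, so no discovery is lost. And because the noise never feeds back into `wealth`, the virtual decision path is identical for every seed.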
- [50] The 100% Transient FDP Phenomenon: During infinite pure-noise cold-starts ($Y_{t \le T_0} \equiv 0$), the baseline discoveries are strictly zero. Any exploratory false positive triggers an instantaneous localized empirical FDP of $V_t/R_t = 100\%$ (since $V_t = R_t$). We explicitly clarify that this is not a failure of FDR control, but a topological necessity of $0/0 \to 1/1$. DOMT consc...
- [51] The Singularity of Extreme Signals: Theorem 2 dictates that deterministic baselines suffer $\Omega(T)$ missed discoveries due to threshold depletion. However, this absolute penalty critically hinges on the global Lipschitz condition $G(x) \le Lx$, which mathematically confines the environment to “weak-to-moderate” signals. If the real-world signal distribution exhibits ...
- [52] Contextual DOMT: Rather than applying an indiscriminate, uniform spatial noise distribution $\xi_t \sim \mathrm{Uniform}[0, \epsilon_t]$ across the entire sequence, we envision an attention-based or covariate-guided exploration policy $\kappa_\theta(x_t)$. By training a secondary machine learning model to estimate the localized prior probability of anomalies, DOMT could dynamically compress the spa...
- [53] Complex Topologies (DAGs and PRDS): Modern inference increasingly involves complex topologies, such as testing Directed Acyclic Graphs (DAGs) in gene regulatory networks or tracking spatial contagion in neuroimaging (e.g., fMRI) and financial markets. Traditional FDR control often breaks down or suffers from severe starvation under arbitrary dependencies...