arxiv: 2605.13916 · v1 · pith:X3WYPJSN · submitted 2026-05-13 · 📊 stat.ML · cs.AI · cs.LG

A Regret Perspective on Online Multiple Testing

Pith reviewed 2026-05-15 02:50 UTC · model grok-4.3

classification 📊 stat.ML · cs.AI · cs.LG
keywords online multiple testing · false discovery rate · weighted regret · threshold depletion · random perturbation · cold start · asymptotic safety · bursty environments

The pith

Deterministic online FDR control forces linear regret from early threshold depletion, but a history-decoupled non-negative perturbation reduces it to order sqrt(T) without adding false negatives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

In online multiple testing, any deterministic rule that strictly controls the false discovery rate pays a linear penalty in weighted regret. Early periods with few signals drive the rejection threshold down, so later bursts of real signals are mostly missed. The paper introduces Decoupled-OMT, a wrapper that adds a random perturbation that is always non-negative and is drawn independently of the procedure's decision history. This change prevents severe depletion, introduces no additional false negatives, keeps asymptotic FDR control exact under stationarity, and brings regret down to square-root order in bursty streams. The result matters for high-volume automated pipelines, where missing real effects is far costlier than occasional false alarms.

Core claim

We prove the Duality of Regret Conservation: any purely deterministic procedure that maintains strict FDR control must incur an Omega(T) linear regret penalty, because threshold depletion during signal-sparse cold starts produces massive false negatives later. Decoupled-OMT is a baseline-agnostic meta-wrapper that injects a history-decoupled, strictly non-negative random perturbation. It prevents severe depletion, adds zero additional false negatives, preserves exact asymptotic FDR safety in stationary environments, bounds finite-sample error inflation during cold starts, and delivers an order-optimal Omega(sqrt(T)) regret reduction in bursty environments, with an explicit cold-start tax characterizing the exact phase transition of algorithmic superiority.

What carries the argument

The history-decoupled, strictly non-negative random perturbation: it can only raise the rejection threshold or leave it unchanged, it is drawn without reference to past decisions, and so it never turns a baseline rejection into a false negative.
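To see why a non-negative perturbation cannot create false negatives, it helps to look at a single test. The sketch below assumes p-value thresholding and a perturbation ξ_t ~ Uniform[0, κα/√t]: the uniform shape comes from the appendix excerpt on Contextual DOMT, and the scale is inferred from the mean κα/(2√t) used in Eq. (57). The function names and the fixed, heavily depleted baseline level of 1e-6 are hypothetical stand-ins, not the authors' implementation.

```python
import math
import random
from typing import Tuple


def perturbation(t: int, kappa: float, alpha: float, rng: random.Random) -> float:
    """Draw xi_t ~ Uniform[0, kappa * alpha / sqrt(t)] (assumed schedule, t >= 1).

    Only t and a dedicated RNG are used; no past decision enters the draw.
    """
    return rng.uniform(0.0, kappa * alpha / math.sqrt(t))


def wrapped_decision(p_value: float, base_level: float, xi: float) -> Tuple[bool, bool]:
    """Return (baseline rejects, wrapped rejects) for one p-value."""
    if xi < 0:
        raise ValueError("the perturbation must be non-negative")
    return p_value <= base_level, p_value <= base_level + xi


data_rng, noise_rng = random.Random(0), random.Random(1)
for t in range(1, 1001):
    base, wrapped = wrapped_decision(data_rng.random(), 1e-6,
                                     perturbation(t, kappa=1.0, alpha=0.05, rng=noise_rng))
    # A non-negative xi only enlarges the rejection region: base implies wrapped.
    assert wrapped or not base
```

The design point is that the rejection region with the perturbation always contains the baseline's region, so any signal the baseline catches is also caught by the wrapper; the perturbation can add false positives, which is the exploration cost, but never misses.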

If this is right

  • Any existing deterministic online multiple testing method can be wrapped to achieve sublinear regret in bursty environments.
  • Asymptotic FDR control remains exact in stationary streams.
  • Finite-sample error inflation stays bounded during cold starts with no added false negatives.
  • The derived cold-start tax marks the precise phase transition where the perturbed wrapper becomes superior.
  • Regret drops from linear to square-root order while preserving the original safety guarantees.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same style of history-decoupled non-negative perturbation may limit depletion effects in other sequential decision settings with asymmetric error costs.
  • The cold-start tax supplies a practical rule for choosing when to activate the wrapper based on expected signal density.
  • Extending the duality argument to non-stationary streams could expose additional regret-safety trade-offs.
  • The result points to a general limit on how much strict control can be maintained without paying linear opportunity cost in long online inference tasks.

Load-bearing premise

Testing streams are exogenous, so decisions never affect future data generation, and the perturbation stays strictly non-negative and independent of the full decision history.

What would settle it

In a long sequence that begins with sparse signals and later contains bursts, check whether cumulative weighted regret of a deterministic FDR controller grows linearly while the perturbed version grows only as the square root of time and produces no extra false negatives.
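The check described above can be sketched as a small simulation. Every modeling choice here is an assumption for illustration: LOND [18] as the deterministic baseline with the summable sequence γ_t = 6/(π²t²); Gaussian one-sided p-values with signal mean μ; a pure-null cold start for t ≤ ρT followed by signals with probability 1 − π; the perturbation ξ_t ~ Uniform[0, κα/√t]; and hypothetical weights a = 1 (false discovery) and b = 5 (miss). The LOND state is updated only with baseline decisions, mimicking the virtual decoupling described in the appendix excerpts. This is a sketch of the regret accounting, not the authors' DOMT and not a demonstration of FDR control.

```python
import math
import random
from statistics import NormalDist


def lond_level(t: int, discoveries: int, alpha: float) -> float:
    """LOND test level alpha * gamma_t * (D + 1), with gamma_t = 6 / (pi^2 t^2).

    gamma_t sums to 1 over t >= 1; this particular sequence is an illustrative choice.
    """
    gamma_t = 6.0 / (math.pi ** 2 * t ** 2)
    return alpha * gamma_t * (discoveries + 1)


def simulate(T: int = 20_000, rho: float = 0.5, pi_null: float = 0.8,
             mu: float = 3.0, alpha: float = 0.05, kappa: float = 1.0,
             a: float = 1.0, b: float = 5.0, seed: int = 0) -> dict:
    """Pure-null cold start for t <= rho*T, then each test is a signal w.p. 1 - pi_null."""
    data_rng = random.Random(seed)        # generates the exogenous test stream
    noise_rng = random.Random(seed + 1)   # perturbation stream; never reads any decision
    std_normal = NormalDist()
    virtual_discoveries = 0               # LOND state, driven by baseline decisions only
    counts = {"base_fd": 0, "base_md": 0, "pert_fd": 0, "pert_md": 0}

    for t in range(1, T + 1):
        is_signal = t > rho * T and data_rng.random() > pi_null
        z = data_rng.gauss(mu if is_signal else 0.0, 1.0)
        p = 1.0 - std_normal.cdf(z)                       # one-sided z-test p-value
        level = lond_level(t, virtual_discoveries, alpha)
        xi = noise_rng.uniform(0.0, kappa * alpha / math.sqrt(t))  # xi_t >= 0
        decisions = {"base": p <= level, "pert": p <= level + xi}
        virtual_discoveries += int(decisions["base"])     # perturbed rejections never feed back

        for name, rejected in decisions.items():
            counts[name + "_fd"] += int(rejected and not is_signal)
            counts[name + "_md"] += int(not rejected and is_signal)

    counts["base_regret"] = a * counts["base_fd"] + b * counts["base_md"]
    counts["pert_regret"] = a * counts["pert_fd"] + b * counts["pert_md"]
    return counts


if __name__ == "__main__":
    for horizon in (5_000, 10_000, 20_000):
        r = simulate(T=horizon)
        print(horizon, r["base_regret"], r["pert_regret"])
```

Printing regret at several horizons lets one eyeball whether the baseline grows roughly linearly and the wrapped run more slowly. The only property guaranteed by construction is that the wrapped run never misses a signal the baseline caught, because both share the same stream and the same virtual LOND level.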

Figures

Figures reproduced from arXiv: 2605.13916 by Fang Kong, Hongxin Wei, Kongchang Zhou, Qingyang Hao.

Figure 1. Performance Dynamics across Environments.
Figure 2. Regret Phase Transition and Geometric Synthesis.
Figure 3. Ablation on Virtual Decoupling (LOND Framework). Evaluated across stationary (top) …
Figure 4. Ablation on Virtual Decoupling (SAFFRON Framework). Similar to the LOND evaluation, …
Figure 5. Universality and Parameter Sensitivity. Left: The U-shaped weighted regret curve reveals a broad optimal region for the exploration coefficient κ. Center: DOMT maintains robust superiority across varying signal densities π1, particularly dominating in highly sparse environments. Right: The S-shaped advantage curve confirms consistent efficacy gains across diverse signal strengths µ.
Figure 6. Comparisons in Stationary Environments. Both DOMT-ADDIS (p…
Figure 7. Comparisons in Bursty Environments. During macroscopic pure-null droughts, determin…
Figure 8. Real-World Application: RNA-Seq Microarray Data.
Figure 9. Real-World Application: S&P 500 Financial Anomalies. In this bursty environment, DOMT…
Figure 10. Real-World Application: IMPC Mouse Phenotype Data. The dual-plot validates the…
Original abstract

Online Multiple Testing (OMT), a fundamental pillar of sequential statistical inference, traditionally evaluates the False Discovery Rate (FDR) and statistical power in isolation, obscuring the highly asymmetric costs of false positives and false negatives in modern automated pipelines. To unify this evaluation, we introduce Weighted Regret. Under this metric, we prove the Duality of Regret Conservation: purely deterministic procedures ensuring strict FDR control inevitably incur an Ω(T) linear regret penalty, as threshold depletion during signal-sparse cold starts forces massive false negatives. Tailored for exogenous testing streams, we propose Decoupled-OMT (DOMT) as a baseline-agnostic meta-wrapper. By incorporating a history-decoupled, strictly non-negative random perturbation, DOMT rescues purely deterministic baselines from severe threshold depletion. Crucially, it preserves exact asymptotic safety in stationary environments and rigorously bounds finite-sample error inflation during cold starts. Guaranteeing zero additional false negatives, it yields an order-optimal Ω(√T) regret reduction in bursty environments, with a derived “Cold-Start Tax” characterizing the exact phase transition of algorithmic superiority. Experiments validate that DOMT consistently curtails empirical weighted regret, achieving an order-optimal sublinear mitigation of threshold depletion to navigate the non-stationary Pareto frontier.
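The abstract names Weighted Regret but does not spell out its formula. A minimal sketch, assuming the linear form suggested by the appendix excerpt in the reference graph (which compares b·ΔM_T with a·ΔV_T): each false discovery costs a, each missed signal costs b, and an oracle that rejects exactly the true signals costs nothing. The function name and the default weights a = 1, b = 5 are hypothetical.

```python
from typing import Sequence


def weighted_regret(decisions: Sequence[int], truths: Sequence[int],
                    a: float = 1.0, b: float = 5.0) -> float:
    """Assumed linear form: a * (false discoveries) + b * (missed discoveries).

    decisions[t] is 1 if H_t was rejected; truths[t] is 1 if H_t is a real signal.
    An oracle that rejects exactly the signals scores 0, so the total doubles as
    regret against that oracle. The weights a and b are illustrative.
    """
    if len(decisions) != len(truths):
        raise ValueError("decisions and truths must have the same length")
    false_discoveries = sum(1 for d, y in zip(decisions, truths) if d and not y)
    missed_discoveries = sum(1 for d, y in zip(decisions, truths) if y and not d)
    return a * false_discoveries + b * missed_discoveries


# One false discovery (index 3) and one missed signal (index 1): 1*1 + 5*1.
print(weighted_regret([1, 0, 0, 1], [1, 1, 0, 0]))  # prints 6.0
```

Setting b much larger than a encodes the asymmetry the abstract emphasizes: a missed real effect is charged several times more than a false alarm.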

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Weighted Regret as a unified metric for online multiple testing that balances FDR control against the asymmetric costs of false positives and negatives. It proves the Duality of Regret Conservation, showing that any deterministic procedure with strict FDR control incurs an Ω(T) linear regret penalty due to threshold depletion in signal-sparse cold starts. The authors propose Decoupled-OMT (DOMT), a baseline-agnostic meta-wrapper that applies a history-decoupled, strictly non-negative random perturbation to rescue deterministic baselines, achieving an order-optimal Ω(√T) regret reduction in bursty exogenous streams while preserving exact asymptotic FDR safety in stationary environments and providing finite-sample bounds on error inflation with a derived Cold-Start Tax.

Significance. If the duality lower bound and DOMT perturbation analysis hold under the stated assumptions, the work supplies a regret-theoretic lens for OMT that directly addresses practical costs in automated pipelines and yields a concrete, order-optimal improvement over deterministic baselines in non-stationary settings. The explicit characterization of the phase transition via the Cold-Start Tax and the guarantee of zero additional false negatives are particularly valuable for sequential inference applications.

major comments (3)
  1. [§3] §3 (perturbation definition and finite-sample bounds): The claim that a strictly non-negative, history-decoupled perturbation adds zero false negatives while only rescuing threshold depletion is load-bearing for both the Ω(√T) regret reduction and the Cold-Start Tax. In signal-sparse cold-start regimes, even a non-negative shift can interact with the baseline's adaptive threshold rule near decision boundaries, potentially increasing missed signals beyond the stated bound; the analysis must explicitly rule out this interaction or provide a counter-example-free proof.
  2. [Proof of Duality of Regret Conservation] Proof of Duality of Regret Conservation: The Ω(T) lower bound for deterministic procedures is derived from threshold depletion during cold starts, but the argument appears to rely on exogenous streams and stationarity for the asymptotic safety guarantee. The finite-sample regime where the linear penalty is most severe must be shown to satisfy the same non-interaction condition used for DOMT, otherwise the duality does not strictly separate deterministic from perturbed methods.
  3. [Experiments] Experimental section: Validation that DOMT achieves order-optimal sublinear mitigation of threshold depletion is reported, yet the abstract and available description omit the precise simulation protocol for bursty environments, the choice of perturbation scale, the set of baselines, and how post-hoc parameter tuning was avoided. Without these details it is impossible to confirm that the observed Ω(√T) improvement is not an artifact of the experimental design.
minor comments (2)
  1. Notation for Weighted Regret and the Cold-Start Tax should be introduced with explicit definitions before the duality statement to improve readability.
  2. The abstract states that DOMT is 'baseline-agnostic,' but the finite-sample error-inflation bound may implicitly depend on properties of the underlying deterministic procedure; clarify the scope of this agnosticism.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their insightful and constructive feedback on our manuscript. We address each of the major comments point by point below. We have made revisions to the manuscript to strengthen the analysis and provide additional details as requested.

Point-by-point responses
  1. Referee: [§3] Perturbation definition and finite-sample bounds (major comment 1).

    Authors: We appreciate the referee pointing out the need for a more rigorous treatment of potential interactions in the perturbation analysis. We agree that this requires explicit clarification. In the revised manuscript, we have added a detailed proof in Section 3 that demonstrates, under the history-decoupled and strictly non-negative properties, the perturbation cannot increase the number of false negatives beyond the baseline's decisions. Specifically, we show that any threshold increase due to the perturbation is compensated by the decoupling, preventing additional misses at decision boundaries. This is formalized in a new lemma providing a counter-example-free bound. revision: yes

  2. Referee: [Proof of Duality of Regret Conservation] Separation of deterministic and perturbed methods in the finite-sample regime (major comment 2).

    Authors: The referee correctly identifies a potential gap in the separation argument. The lower bound for deterministic procedures is established directly from the threshold depletion mechanism without invoking the perturbation at all. In the revised proof, we have clarified that the duality holds in the finite-sample regime under the exogenous stream model, and the non-interaction condition is not needed for the deterministic case since there is no perturbation. We have updated the proof to explicitly compare the two regimes and maintain the strict separation. revision: yes

  3. Referee: [Experiments] Missing simulation protocol, perturbation scale, baselines, and tuning details (major comment 3).

    Authors: We agree that the experimental protocol requires more precise documentation to allow full reproducibility and validation. In the revised manuscript, we have substantially expanded the Experiments section to detail the simulation protocol for bursty environments, specify the perturbation scale used (consistent with the theoretical √T scaling), list all baselines evaluated, and confirm that parameters were set based on theoretical considerations without post-hoc tuning. We have also included the full set of simulation parameters in a supplementary table. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper derives the Duality of Regret Conservation lower bound and the Ω(√T) regret reduction for DOMT from explicit modeling assumptions on exogenous streams, stationarity, and the perturbation's strictly non-negative history-decoupled properties. These are stated as inputs to the analysis (e.g., the perturbation is defined to add zero additional false negatives by construction of its sign and independence), rather than being fitted or redefined from the regret outputs themselves. No equations reduce the main claims to self-citations, ansatzes smuggled via prior work, or renaming of known patterns; the finite-sample bounds follow from the stated perturbation rules without tautological collapse. The derivation remains self-contained against the given assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claims rest on standard statistical assumptions for FDR control plus new constructs (weighted regret, history-decoupled perturbation) introduced without external validation in the provided abstract.

axioms (2)
  • domain assumption Testing streams are exogenous (decisions do not influence future observations)
    Stated as the setting for which DOMT is tailored.
  • domain assumption Stationarity of the environment for asymptotic FDR safety
    Required for the claim that DOMT preserves exact asymptotic safety.
invented entities (2)
  • Weighted Regret no independent evidence
    purpose: Unify FDR control and statistical power under a single cost metric
    New evaluation criterion introduced to capture asymmetric error costs.
  • Decoupled-OMT (DOMT) no independent evidence
    purpose: Meta-wrapper that adds history-decoupled non-negative perturbation to rescue deterministic baselines
    New algorithm proposed to achieve sublinear regret.

pith-pipeline@v0.9.0 · 5534 in / 1601 out tokens · 41085 ms · 2026-05-15T02:50:20.563875+00:00 · methodology

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · 1 internal anchor

  1. [1]

    Anastasios Nikolas Angelopoulos, Stephen Bates, Adam Fisch, Lihua Lei, and Tal Schuster. Conformal risk control. In ICLR, 2024

  2. [2]

    Ruicheng Ao, Hongyu Chen, David Simchi-Levi, and Feng Zhu. Online local false discovery rate control: A resource allocation approach. CoRR, abs/2402.11425, 2024

  3. [3]

    Yajie Bao, Yuyang Huo, Haojie Ren, and Changliang Zou. Cap: A general algorithm for online selective conformal prediction with FCR control. Journal of Machine Learning Research, 26(287):1–74, 2025

  4. [4]

    Osbert Bastani, Varun Gupta, Christopher Jung, Georgy Noarov, Ramya Ramalingam, and Aaron Roth. Practical adversarial multivalid conformal prediction. In NeurIPS, 2022

  5. [5]

    Stephen Bates, Anastasios Nikolas Angelopoulos, Lihua Lei, Jitendra Malik, and Michael I. Jordan. Distribution-free, risk-controlling prediction sets. J. ACM, 68:43:1–43:34, 2021

  6. [6]

    Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1):289–300, 1995

  7. [7]

    Shiyun Chen and Shiva Prasad Kasiviswanathan. Contextual online false discovery rate control. In AISTATS, 2019

  8. [8]

    Khai Xiang Chiong and Joonhwi Joo. The minimax-regret decision framework for online A/B tests. SSRN Electronic Journal, 2023

  9. [9]

    Clara Delecroix, Egbert H. van Nes, Martin W. Scheffer, and Ingrid A van de Leemput. Monitoring resilience in bursts. Proceedings of the National Academy of Sciences of the United States of America, 121, 2023

  10. [10]

    Sebastian Döhler, Iqraa Meah, and Etienne Roquain. Online multiple testing with super-uniformity reward. Electronic Journal of Statistics, 18(1):1293–1354, 2024

  11. [11]

    Bradley Efron. Large-scale inference: empirical Bayes methods for estimation, testing, and prediction, volume 1. Cambridge University Press, 2012

  12. [12]

    Lasse Fischer, Marta Bofill Roig, and Werner Brannath. The online closure principle. The Annals of Statistics, 52(2):817–841, 2024

  13. [13]

    Lasse Fischer, Ziyu Xu, and Aaditya Ramdas. An online generalization of the (e-) Benjamini-Hochberg procedure. arXiv preprint arXiv:2407.20683, 2024

  14. [14]

    Dean P Foster and Robert A Stine. α-investing: a procedure for sequential control of expected false discoveries. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70, 2008

  15. [15]

    Bowen Gang, Wenguang Sun, and Weinan Wang. Structure–adaptive sequential testing for online false discovery rate control. Journal of the American Statistical Association, 118:732–745, 2020

  16. [16]

    Isaac Gibbs and Emmanuel Candes. Adaptive conformal inference under distribution shift. In NeurIPS, 2021

  17. [17]

    Tudor Groza, Federico López-Gómez, Hamed Haseli Mashhadi, Violeta Muñoz-Fuentes, Osman Nuri Günes, Robert J. Wilson, Pilar Cacheiro, Anthony Frost, Piia Keskivali-Bond, Bora Vardal, Aaron McCoy, Tsz Kwan Cheng, Luis A. Santos, Sara E. Wells, Damian Smedley, Ann-Marie Mallon, and Helen E. Parkinson. The international mouse phenotyping consortium: compreh...

  18. [18]

    Adel Javanmard and Andrea Montanari. Online rules for control of false discovery rate and false discovery exceedance. The Annals of Statistics, 46(2):526–554, 2018

  19. [19]

    Ramesh Johari, Hannah Li, Inessa Liskovich, and Gabriel Y. Weintraub. Experimental design in two-sided platforms: An analysis of bias. Management Science, 68(10):7069–7089, 2022

  20. [20]

    Sahil Kale and Antonio Luca Alfeo. Lie to me: Knowledge graphs for robust hallucination self-detection in LLMs. ArXiv, abs/2512.23547, 2025

  21. [21]

    Qi Kuang, Bowen Gang, and Yin Xia. Score: A unified framework for overshoot refund in online FDR control. arXiv preprint arXiv:2601.20386, 2026

  22. [22]

    Tze Leung Lai and Herbert E. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6:4–22, 1985

  23. [23]

    Minjae Lee, Yoonjae Jung, and Sangdon Park. From bandit regret to FDR control: Online selective generation with feedback unlocking. In Agentic AI in the Wild: From Hallucinations to Reliable Autonomy, 2026

  24. [24]

    Yonghoon Lee, Meshi Bashari, Edgar Dobriban, and Yaniv Romano. Synthetic-powered multiple testing with FDR control. ArXiv, abs/2602.16690, 2026

  25. [25]

    Jingyu Liu and Yanglei Song. Sequential multiple testing: A second-order asymptotic analysis. arXiv preprint arXiv:2603.04685, 2026

  26. [26]

    Aleksandr Podkopaev and Aaditya Ramdas. Tracking the risk of a deployed model and detecting harmful distribution shifts. In ICLR, 2022

  27. [27]

    Aaditya Ramdas, Fanny Yang, Martin J Wainwright, and Michael I Jordan. Online control of the false discovery rate with decaying memory. In I. Guyon, U. von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, NeurIPS, 2017

  28. [28]

    Aaditya Ramdas, Tijana Zrnic, Martin J. Wainwright, and Michael I. Jordan. SAFFRON: an adaptive algorithm for online control of the false discovery rate. In ICML, 2018

  29. [29]

    David S. Robertson, James M. S. Wason, and Aaditya Ramdas. Online multiple hypothesis testing. Statistical Science: a review journal of the Institute of Mathematical Statistics, 38:557–575, 2022

  30. [30]

    David S Robertson, James MS Wason, Franz König, Martin Posch, and Thomas Jaki. Online error rate control for platform trials. Statistics in Medicine, 42(14):2475–2495, 2023

  31. [31]

    Leonard J. Savage. The theory of statistical decision. Journal of the American Statistical Association, 46:55–67, 1951

  32. [32]

    Juliet Popper Shaffer et al. Multiple hypothesis testing. Annual Review of Psychology, 46(1):561–584, 1995

  33. [33]

    Dinesh Singh, Phillip G. Febbo, Kenneth N. Ross, Donald G. Jackson, Judith B. Manola, Christine Ladd, Pablo Tamayo, Andrew A. Renshaw, Anthony V. D'Amico, Jerome P. Richie, Eric S. Lander, Massimo Loda, Philip W. Kantoff, Todd R. Golub, and William R. Sellers. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1(2):203–9, 2002

  34. [34]

    Jinjin Tian and Aaditya Ramdas. ADDIS: an adaptive discarding algorithm for online FDR control with conservative nulls. In NeurIPS, 2019

  35. [35]

    Jinjin Tian and Aaditya Ramdas. Online control of the familywise error rate. Statistical Methods in Medical Research, 30(4):976–993, 2021

  36. [36]

    Ziyu Xu and Aaditya Ramdas. A unified framework for bandit multiple testing. In NeurIPS, 2021

  37. [37]

    Ziyu Xu and Aaditya Ramdas. Dynamic algorithms for online multiple testing. In Mathematical and Scientific Machine Learning, pages 955–986, 2022

  38. [38]

    Ziyu Xu and Aaditya Ramdas. Online multiple testing with e-values. In AISTATS, 2023

  39. [39]

    Ziyu Xu and Aaditya Ramdas. More powerful multiple testing under dependence via randomization. In AISTATS, 2024

  40. [40]

    Fanny Yang, Aaditya Ramdas, Kevin G. Jamieson, and Martin J. Wainwright. A framework for multi-A(rmed)/B(andit) testing with online FDR control. In NeurIPS, 2017

  41. [41]

    Yifan Zhang, Zijian Wei, Haojie Ren, and Changliang Zou. e-GAI: e-value-based generalized α-investing for online false discovery rate control. In ICML, 2025

  42. [42]

    Zinan Zhao and Wenguang Sun. False discovery rate control for structured multiple testing: Asymmetric rules and conformal q-values. Journal of the American Statistical Association, 120:805–817, 2023

  43. [43]

    Tijana Zrnic, Aaditya Ramdas, and Michael I Jordan. Asynchronous online testing of multiple hypotheses. Journal of Machine Learning Research, 22(33):1–39, 2021

    Organization of the Appendix. To facilitate navigation and comprehensive review, the appendix is organized as follows: • Appendix A (Related Work): It positions our Weighted Regret perspective an...

  44. [44]

    alpha-death

    and ADDIS [34]. To counter the “alpha-death” inherent in macroscopic cold-starts [9], various mechanisms have emerged. The SURE framework exploits conservative null structures to effectively reward the algorithm [10], while SupLORD employs dynamic scheduling to adjust testing levels based on wealth [37]. Concurrently, structure-adaptive methods like SAST ...

  45. [45]

    Exploration Cost ($\Delta V_T$): In the first phase ($t \le \rho T$), the environment is purely null, incurring a marginal penalty rate of $1 \cdot \mathbb{E}[\xi_t]$. In the second phase ($t > \rho T$), nulls occur with probability $\pi$, incurring a rate of $\pi\,\mathbb{E}[\xi_t]$.

    $$\Delta V_T \simeq \int_0^{\rho T} \frac{\kappa\alpha}{2\sqrt{t}}\,dt + \int_{\rho T}^{T} \frac{\pi\kappa\alpha}{2\sqrt{t}}\,dt = \kappa\alpha\sqrt{T}\left(\sqrt{\rho} + \pi\,(1-\sqrt{\rho})\right). \qquad (57)$$

  46. [46]

    Cold-Start Tax

    Discovery Dividend ($\Delta M_T$): Signals ($Y_t = 1$) occur only in the second phase with probability $1-\pi$. Given the detectability $\mu$, the marginal recovery rate is $(1-\pi)\mu\,\mathbb{E}[\xi_t]$.

    $$\Delta M_T \simeq \int_{\rho T}^{T} \frac{(1-\pi)\mu\kappa\alpha}{2\sqrt{t}}\,dt = (1-\pi)\mu\kappa\alpha\sqrt{T}\,(1-\sqrt{\rho}). \qquad (58)$$

    Step 2: Solving for the Critical Threshold $M^*$. The DOMT framework achieves a net reduction in weighted regret if $b\,\Delta M_T > a\,\Delta V_T$, or equiva...

  47. [47]

    The baseline parameter generation is predictable ($\mathcal{V}_{t-1}$-measurable): $\lambda^{\text{base}}_t = f(\delta^{\text{base}}_1, \ldots, \delta^{\text{base}}_{t-1})$

  48. [48]

    The virtual decision utilizes the current evidence without future noise, making it $\mathcal{V}_t$-measurable: $\delta^{\text{base}}_t = \mathbb{1}\{e_t \ge 1/\lambda^{\text{base}}_t\}$

  49. [49]

    wealth pollution

    The virtual wealth $W^{\text{base}}_t$ used to establish safety is updated exclusively by these virtual decisions and predictable parameters. Because the exploration noise $\xi_t$ is causally decoupled from the virtual state, the parameter $\lambda^{\text{base}}_t$ retains its strict predictability under the original uncorrupted filtration $\mathcal{V}_t$. Consequently, the virtual process $M^{\text{base}}_t$ safe...

  50. [50]

    broken-window

    The 100% Transient FDP Phenomenon: During infinite pure-noise cold-starts ($Y_{t \le T_0} \equiv 0$), the baseline discoveries are strictly zero. Any exploratory false positive triggers an instantaneous localized empirical FDP of $V_t/R_t = 100\%$ (since $V_t = R_t$). We explicitly clarify that this is not a failure of FDR control, but a topological necessity of $0/0 \to 1/1$. DOMT consc...

  51. [51]

    weak-to-moderate

    The Singularity of Extreme Signals: Theorem 2 dictates that deterministic baselines suffer Ω(T) missed discoveries due to threshold depletion. However, this absolute penalty critically hinges on the global Lipschitz condition $G(x) \le Lx$, which mathematically confines the environment to “weak-to-moderate” signals. If the real-world signal distribution exhibits ...

  52. [52]

    Contextual DOMT: Rather than applying an indiscriminate, uniform spatial noise distribution $\xi_t \sim \mathrm{Uniform}[0, \epsilon_t]$ across the entire sequence, we envision an attention-based or covariate-guided exploration policy $\kappa_\theta(x_t)$. By training a secondary machine learning model to estimate the localized prior probability of anomalies, DOMT could dynamically compress the spa...

  53. [53]

    stochastic tunneling

    Complex Topologies (DAGs and PRDS): Modern inference increasingly involves complex topologies, such as testing Directed Acyclic Graphs (DAGs) in gene regulatory networks or tracking spatial contagion in neuroimaging (e.g., fMRI) and financial markets. Traditional FDR control often breaks down or suffers from severe starvation under arbitrary dependencies...
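The exploration cost (57) and discovery dividend (58) in the appendix excerpts above can be checked numerically. The sketch below codes both closed forms, the net-reduction condition b·ΔM_T > a·ΔV_T stated in the excerpt, and a discrete sum of the per-step rates E[ξ_t] = κα/(2√t) for comparison with the integral. Dividing the condition by κα√T shows that its sign depends only on ρ, π, μ and b/a; the break-even ratio below is derived from the two displayed equations and is not the truncated definition of M*. All function names are hypothetical.

```python
import math


def exploration_cost(T: float, rho: float, pi_null: float,
                     kappa: float, alpha: float) -> float:
    """Eq. (57): Delta V_T = kappa*alpha*sqrt(T)*(sqrt(rho) + pi*(1 - sqrt(rho)))."""
    return kappa * alpha * math.sqrt(T) * (math.sqrt(rho) + pi_null * (1 - math.sqrt(rho)))


def discovery_dividend(T: float, rho: float, pi_null: float, mu: float,
                       kappa: float, alpha: float) -> float:
    """Eq. (58): Delta M_T = (1 - pi)*mu*kappa*alpha*sqrt(T)*(1 - sqrt(rho))."""
    return (1 - pi_null) * mu * kappa * alpha * math.sqrt(T) * (1 - math.sqrt(rho))


def net_regret_reduction(T: float, rho: float, pi_null: float, mu: float,
                         kappa: float, alpha: float, a: float, b: float) -> float:
    """b*Delta M_T - a*Delta V_T; a positive value means the wrapper lowers weighted regret."""
    return (b * discovery_dividend(T, rho, pi_null, mu, kappa, alpha)
            - a * exploration_cost(T, rho, pi_null, kappa, alpha))


def breakeven_cost_ratio(rho: float, pi_null: float, mu: float) -> float:
    """Value of b/a at which b*Delta M_T = a*Delta V_T; kappa*alpha*sqrt(T) cancels."""
    return (math.sqrt(rho) + pi_null * (1 - math.sqrt(rho))) / (
        (1 - pi_null) * mu * (1 - math.sqrt(rho)))


def exploration_cost_discrete(T: int, rho: float, pi_null: float,
                              kappa: float, alpha: float) -> float:
    """Sum of per-step rates (1 or pi) * E[xi_t], with E[xi_t] = kappa*alpha/(2*sqrt(t))."""
    total = 0.0
    for t in range(1, T + 1):
        rate = 1.0 if t <= rho * T else pi_null
        total += rate * kappa * alpha / (2.0 * math.sqrt(t))
    return total


# Illustrative values only: rho = 0.25, pi = 0.5, mu = 0.8.
print(breakeven_cost_ratio(0.25, 0.5, 0.8))
```

Because both ΔV_T and ΔM_T scale as κα√T, raising κ makes the wrapper explore more aggressively without changing whether it pays off in this simplified two-phase model; that decision rests on the cost ratio b/a relative to the break-even value.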