
arxiv: 2605.20726 · v1 · pith:FIMI4DSXnew · submitted 2026-05-20 · 📊 stat.ME · cs.LG · stat.ML

Everywhere Valid Bounds on False Discovery Proportions in Conformal Inference

Pith reviewed 2026-05-21 02:44 UTC · model grok-4.3

classification: stat.ME, cs.LG, stat.ML
keywords: conformal inference, false discovery proportion, multiple testing, post hoc selection, distribution-free bounds, high-probability bounds, outlier detection

The pith

Finite-sample bounds on the false discovery proportion hold simultaneously for all rejection thresholds in conformal inference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops finite-sample, distribution-free upper bounds on the false discovery proportion that remain valid no matter which rejection threshold is selected after inspecting the data. Standard approaches control only the expected value of the FDP and lose their guarantees under post hoc threshold choice. The new bounds are obtained by constructing a high-probability envelope around the empirical distribution function of null conformal p-values, using samples drawn from their joint distribution. The envelope can be shaped to concentrate tightness in the rejection regions of greatest interest. The framework is applied to derive simultaneous bounds for outlier detection and for conformal selection procedures.

Core claim

The paper establishes finite-sample, distribution-free upper bounds on the FDP that hold simultaneously over all possible rejection thresholds, enabling arbitrary post hoc selection of the threshold. Simultaneous validity is achieved by constructing a high-probability envelope for the empirical distribution function of null conformal p-values by sampling from their joint distribution. The framework allows practitioners to modulate the envelope's shape, thereby producing tight bounds in rejection regions of primary interest, and applies this to derive simultaneous FDP upper bounds for both outlier detection and conformal selection.

What carries the argument

High-probability envelope for the empirical distribution function of null conformal p-values, built by sampling from their joint distribution.
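The envelope idea can be sketched in a few lines. What follows is a minimal Python illustration under stated assumptions, not the paper's Algorithm 1: it assumes continuous scores (no ties), draws B all-null replicates of the conformal p-values, and uses a one-sided Kolmogorov-Smirnov-type statistic with a conservative order-statistic quantile. The function names and the quantile rule are choices made for this example.

```python
import bisect
import math
import random

def sample_null_ranks(n, m, rng):
    """One all-null draw of conformal ranks r_j = (n + 1) * p_j, each in 1..n+1."""
    calib = sorted(rng.random() for _ in range(n))   # calibration scores
    ranks = []
    for _ in range(m):
        t = rng.random()                              # null test score, same law
        n_below = bisect.bisect_left(calib, t)        # calibration scores < t
        ranks.append(1 + n - n_below)                 # 1 + #{i : S_i >= T_j}
    return ranks

def ecdf_on_grid(ranks, n):
    """ECDF of the p-values at grid points t_k = k / (n + 1), k = 1..n+1."""
    counts = [0] * (n + 2)
    for r in ranks:
        counts[r] += 1
    ecdf, running = [], 0
    for k in range(1, n + 2):
        running += counts[k]
        ecdf.append(running / len(ranks))
    return ecdf

def mc_ks_envelope(n, m, delta, B, rng):
    """Grid and one-sided KS-type envelope t_k + q, capped at 1."""
    grid = [k / (n + 1) for k in range(1, n + 2)]
    stats = []
    for _ in range(B):
        ecdf = ecdf_on_grid(sample_null_ranks(n, m, rng), n)
        stats.append(max(f - t for f, t in zip(ecdf, grid)))
    stats.sort()
    k = math.ceil((1 - delta) * (B + 1))              # conservative quantile index
    q = stats[k - 1] if k <= B else float("inf")      # too few draws: trivial envelope
    return grid, [min(1.0, t + q) for t in grid]

grid, upper = mc_ks_envelope(n=100, m=100, delta=0.1, B=100, rng=random.Random(0))
```

Because the p-values only take values on the grid k / (n + 1), checking the statistic at those points covers every threshold in [0, 1]. Replacing the unweighted difference with a weighted one would change where the envelope is tight, which appears to be the role of the MC-HC and MC-THC variants named in the figure captions.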

If this is right

  • The bounds support arbitrary post hoc selection of the rejection threshold while preserving statistical validity.
  • The same envelope construction yields valid bounds for outlier detection and for conformal selection.
  • Modulating the envelope shape produces tighter bounds in the rejection regions of primary practical interest.
  • Synthetic and real-data experiments confirm that the bounds are valid yet substantially less conservative than existing methods.
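To make the first consequence concrete, here is a hedged Python sketch of how an envelope could be turned into FDP bounds at every threshold. It uses a deliberately conservative simplification that is an assumption of this example: the unknown number of nulls is replaced by the total number of test points m. The paper's figure captions mention tighter refinements, which are not reproduced here. The toy ranks and the envelope values are hypothetical.

```python
def fdp_upper_bounds(ranks, n, upper):
    """Simultaneous FDP bounds at thresholds t_k = k / (n + 1), k = 1..n+1.

    ranks: observed conformal ranks r_j = (n + 1) * p_j for all m test points.
    upper: envelope values on the same grid (from any valid construction).
    """
    m = len(ranks)
    bounds = []
    for k in range(1, n + 2):
        rejections = sum(r <= k for r in ranks)          # R(t_k)
        # Conservative step: at most m * upper[k-1] null p-values fall below t_k.
        bounds.append(min(1.0, m * upper[k - 1] / max(rejections, 1)))
    return bounds

def largest_certified_threshold(bounds, n, target):
    """Largest grid threshold whose FDP bound is at most `target`, or None."""
    ok = [k for k, b in enumerate(bounds, start=1) if b <= target]
    return max(ok) / (n + 1) if ok else None

# Toy inputs (hypothetical numbers, for illustration only).
n = 9
ranks = [1, 1, 1, 1, 2, 5, 7, 8, 9, 10]
upper = [min(1.0, k / (n + 1) + 0.1) for k in range(1, n + 2)]
bounds = fdp_upper_bounds(ranks, n, upper)
chosen = largest_certified_threshold(bounds, n, target=0.55)
```

The threshold is picked after inspecting every bound. That is only legitimate because the envelope is assumed to hold for all thresholds at once.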

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The simultaneous validity could support more flexible exploratory analyses in settings where thresholds must be chosen after seeing preliminary results.
  • The sampling-based envelope might extend to other post-selection problems that involve data-dependent choices beyond standard conformal p-values.
  • In applied work the reduced conservatism could improve the power of selection procedures without sacrificing coverage.

Load-bearing premise

The construction requires the ability to sample from the joint distribution of the null conformal p-values.
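One way to see why this premise is plausible: conformal p-values depend on the scores only through their ranks, so any strictly increasing transform of the scores leaves them unchanged. The short Python check below illustrates that invariance. The helper name and the Gaussian scores are assumptions of this example.

```python
import math
import random

def conformal_pvalues(calib_scores, test_scores):
    """p_j = (1 + #{i : S_i >= T_j}) / (n + 1); larger score = more atypical."""
    n = len(calib_scores)
    return [(1 + sum(s >= t for s in calib_scores)) / (n + 1) for t in test_scores]

rng = random.Random(1)
calib = [rng.gauss(0.0, 1.0) for _ in range(50)]
test = [rng.gauss(0.0, 1.0) for _ in range(10)]

p_raw = conformal_pvalues(calib, test)
# A strictly increasing transform changes the score distribution but not the ranks.
p_transformed = conformal_pvalues([math.exp(s) for s in calib],
                                  [math.exp(t) for t in test])
```

Since the two p-value vectors coincide, one can simulate the null joint distribution with any convenient continuous score distribution, for example uniform scores.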

What would settle it

Repeated simulations in which the observed FDP at a threshold chosen post hoc exceeds the reported bound more often than the nominal error probability, or in which the envelope fails to cover the realized null p-value distribution.
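A falsification check of this kind could look like the following Python sketch. It is an illustrative setup, not the paper's experiment: Gaussian inlier scores, mean-shifted outlier scores, a one-sided KS-type envelope redrawn in every trial, and the conservative substitution of m for the unknown null count are all assumptions of this example. Scanning every grid threshold stands in for the worst possible post hoc choice.

```python
import bisect
import math
import random

def conformal_ranks(calib_sorted, test_scores):
    """r_j = 1 + #{i : S_i >= T_j}, so the conformal p-value is r_j / (n + 1)."""
    n = len(calib_sorted)
    return [1 + n - bisect.bisect_left(calib_sorted, t) for t in test_scores]

def counts_below(ranks, n):
    """Number of ranks <= k for k = 1..n+1."""
    hist = [0] * (n + 2)
    for r in ranks:
        hist[r] += 1
    out, running = [], 0
    for k in range(1, n + 2):
        running += hist[k]
        out.append(running)
    return out

def ks_envelope(n, m, delta, B, rng):
    """One-sided KS-type envelope on the grid k / (n + 1), from B null draws."""
    grid = [k / (n + 1) for k in range(1, n + 2)]
    stats = []
    for _ in range(B):
        calib = sorted(rng.random() for _ in range(n))
        ranks = conformal_ranks(calib, [rng.random() for _ in range(m)])
        stats.append(max(c / m - t for c, t in zip(counts_below(ranks, n), grid)))
    stats.sort()
    k = math.ceil((1 - delta) * (B + 1))
    q = stats[k - 1] if k <= B else float("inf")
    return [min(1.0, t + q) for t in grid]

def one_trial(n, m, m0, shift, delta, B, rng):
    """Return True if the true FDP exceeds the bound at some threshold."""
    upper = ks_envelope(n, m, delta, B, rng)
    calib = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
    null_ranks = conformal_ranks(calib, [rng.gauss(0.0, 1.0) for _ in range(m0)])
    alt_ranks = conformal_ranks(calib, [rng.gauss(shift, 1.0) for _ in range(m - m0)])
    V = counts_below(null_ranks, n)                 # false rejections per threshold
    R = counts_below(null_ranks + alt_ranks, n)     # all rejections per threshold
    for v, r, u in zip(V, R, upper):
        fdp = v / max(r, 1)
        bound = min(1.0, m * u / max(r, 1))
        if fdp > bound + 1e-12:
            return True
    return False

rng = random.Random(2026)
trials, delta = 100, 0.1
violations = sum(one_trial(50, 50, 40, 2.0, delta, 100, rng) for _ in range(trials))
violation_rate = violations / trials
```

A violation rate clearly above delta across many trials would count as evidence against validity. A rate far below delta would instead point to conservatism in this simplified bound.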

Figures

Figures reproduced from arXiv: 2605.20726 by Emmanuel J. Candès, Ying Jin, Ziang Song.

Figure 1: Simultaneous FDP bounds in a drug-target interaction task. (a) One realization of the true FDP (blue) and simultaneous upper bounds constructed by our method with the following statistics: Truncated Higher Criticism (MC-THC), Higher Criticism (MC-HC), and Kolmogorov-Smirnov (MC-KS). The dashed line is the upper bound adapted from [GBR24]. (b) Residuals (upper bound minus true FDP) across 100 independent ex…
Figure 2: Upper bounds on $\widehat{F}_{n,m}(t)$ (n = m = 100, δ = 0.1) constructed via Algorithm 1 with B = 100. The gray curves represent 100 independent realizations of $\widehat{F}_{n,m}$. The colored curves represent different envelope constructions: MC-KS (Kolmogorov–Smirnov statistic), MC-BJ (Berk–Jones statistic), MC-HC (Higher-Criticism statistic), and MC-THC (truncated Higher-Criticism statistic). The Baseline curve corresponds to th…
Figure 3: Empirical coverage of the FDP bound. The plot displays the difference between our FDP upper bound (using MC-THC) and the true FDP across 100 replications of the outlier detection task (n = m = 1000, signal strength a = 0.2, target 1 − δ = 0.9). The curves remain above zero in 96% of the trials, demonstrating validity. The variance of the bound decreases as the rejection threshold t increases. Validity of F…
Figure 4: Impact of refinement strategies across signal strengths. FDP upper bounds in outlier detection with fixed purity (90%) and varying signal strength (a ∈ {0.1, 0.2, 0.5}). Columns correspond to different FDP envelopes. The curves compare three refinement levels: bounding the null count $\widehat{m}_0$, the self-refinement step from Proposition 4.5, and the combined strategy. The combined approach consistently yields th…
Figure 5: Impact of refinement strategies across purity levels. FDP upper bounds with fixed signal strength (a = 0.2) and varying inlier purity ({70%, 80%, 90%}). As the proportion of outliers increases (lower purity), the benefit of the $\widehat{m}_0$ tightening becomes more pronounced compared to the self-refinement step alone. 5 Controlling FDP/precision in conformal selection. In this section, we demonstrate how our method…
Figure 6: Failure of post hoc BH levels as FDP certificates. Starting from …
Figure 7: The risk of post hoc parameter selection. A realization of …
Figure 8: Upper bounds on $\widehat{F}_{n,m}(t)$ (n = m = 100, δ = 0.1) constructed via Algorithm 1 with B = 100 with different shape parameter β. The gray curves represent 100 independent realizations of $\widehat{F}_{n,m}$. C Applications to i.i.d. p-values. C.1 Conformal vs. i.i.d. p-values: what changes and why. Our envelope-based technique from Section 3 also applies to i.i.d. p-values $U_1, \dots, U_m \sim \mathrm{Unif}[0, 1]$. This setting can be viewed…
Figure 9: ECDF envelopes: Conformal vs. i.i.d. p-values. Left Column: Empirical CDFs of conformal p-values with fixed calibration size n = 100. Note that as m increases, the variance does not vanish due to the persistent randomness of the finite calibration set. Right Column: Empirical CDFs of i.i.d. uniform p-values. The distribution concentrates tightly around the diagonal y = x as m → ∞. This contrast highlights …
Figure 10: Plots of $\rho_n(t)$ for several values of n. Since $c_{n,m}(t) = m^{-1} + (1 - m^{-1})\rho_n(t)$, this also illustrates the t-dependence of $c_{n,m}(t)$. The dependence is substantial only for very small n but becomes much less pronounced as n grows. C.3 Constructing calibration-conditional valid p-values. We revisit the calibration-conditional p-values of [BCL+23] and show how our envelope construction provides a simple and tigh…
Figure 11: Different CCV p-values adjustments. The blue curves display 100 independent realizations of sorted i.i.d. uniform p-values (order statistics) with sample size n = 1000, plotted against their normalized rank i/n. This setup replicates the validation framework of …
Figure 12: Detailed view of CCV p-value adjustments in the lower tail. A zoomed-in perspective of …
Original abstract

Modern applications of conformal inference to multiple testing problems, such as outlier detection and candidate selection, often involve selecting test samples whose conformal p-values fall below a threshold. The quality of such methods is often measured by the false discovery proportion (FDP), defined as the fraction of incorrect selections. Existing approaches typically control the expected value of the FDP, using methods such as the Benjamini-Hochberg procedure. This approach fails to provide high-probability bounds on the realized false discovery proportion and invalidates statistical guarantees if the rejection threshold is selected after inspecting the data. This paper establishes finite-sample, distribution-free upper bounds on the FDP that hold simultaneously over all possible rejection thresholds, enabling arbitrary post hoc selection of the threshold. Simultaneous validity is achieved by constructing a high-probability envelope for the empirical distribution function of null conformal p-values by sampling from their joint distribution. Furthermore, our framework allows practitioners to modulate the envelope's shape, thereby producing tight bounds in rejection regions of primary interest. We use this flexible approach to derive simultaneous FDP upper bounds for both outlier detection and conformal selection. We demonstrate through synthetic and real-data experiments that the resulting bounds are both valid and substantially less conservative than those derived from existing approaches.
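The quantities the abstract refers to can be written out as follows. The notation is an assumption made here for illustration (calibration scores $S_i$, test scores $T_j$, null index set $\mathcal{H}_0$, bound $\overline{\mathrm{FDP}}$); the paper may use different symbols and score conventions.

```latex
% Assumed notation: S_1..S_n calibration scores, T_1..T_m test scores,
% \mathcal{H}_0 the set of null test indices, t a rejection threshold.
\[
p_j = \frac{1 + \sum_{i=1}^{n} \mathbf{1}\{S_i \ge T_j\}}{n+1},
\qquad
\mathrm{FDP}(t) = \frac{\bigl|\{j \in \mathcal{H}_0 : p_j \le t\}\bigr|}
                       {\max\bigl\{1,\ \bigl|\{j : p_j \le t\}\bigr|\bigr\}}
\]
% Expectation-only control, as provided by procedures such as Benjamini-Hochberg at level \alpha:
\[
\mathbb{E}\bigl[\mathrm{FDP}\bigr] \le \alpha
\]
% Simultaneous high-probability control, the type of guarantee described in the abstract:
\[
\mathbb{P}\Bigl(\mathrm{FDP}(t) \le \overline{\mathrm{FDP}}(t)\ \text{for all } t \in [0,1]\Bigr) \ge 1 - \delta
\]
```

The last display is what licenses a data-dependent choice of t: the event inside the probability covers every threshold at once, so it also covers whichever one is picked afterwards.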

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper claims to establish finite-sample, distribution-free upper bounds on the false discovery proportion (FDP) that hold simultaneously over all possible rejection thresholds in conformal inference. This is achieved by constructing a high-probability envelope on the empirical distribution function of null conformal p-values through exact sampling from their joint distribution (which is uniform and rank-based, hence distribution-free). The framework further permits modulating the envelope shape to tighten bounds in regions of interest and is applied to derive simultaneous FDP bounds for outlier detection and conformal selection, with supporting synthetic and real-data experiments.

Significance. If the central construction holds, the work is significant for providing high-probability (rather than expectation-only) control on realized FDP while preserving validity under arbitrary post-hoc threshold choice. This directly addresses a practical limitation of procedures like Benjamini-Hochberg in conformal multiple-testing settings and leverages the exact samplability of conformal null p-values to obtain non-asymptotic, distribution-free guarantees.

minor comments (3)
  1. Clarify in the main text (near the envelope construction) whether the modulation of envelope shape is performed in a data-independent manner or if any data-dependent tuning is used; the coverage statement must remain unaffected.
  2. In the experimental section, report the exact number of Monte Carlo samples used for envelope construction and include a sensitivity check showing that the reported bounds stabilize with increasing sample size.
  3. Add a brief remark on computational cost of the sampling procedure relative to standard conformal p-value computation, especially for large calibration sets.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive review and accurate summary of our manuscript. We appreciate the recognition of the significance of our finite-sample, distribution-free simultaneous bounds on the FDP and the recommendation for minor revision. Since the report lists no major comments, there is nothing to address point by point at this stage. We will incorporate any minor suggestions from the editor or further review in the revised manuscript.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The derivation constructs simultaneous finite-sample distribution-free upper bounds on the FDP by building a high-probability envelope around the ECDF of null conformal p-values. This envelope is obtained by direct Monte Carlo sampling from the exact joint distribution of those p-values, which is known to be uniform and distribution-free under the null due to the rank-based definition of conformal p-values. The sampling step uses only the known null properties and does not depend on fitted parameters from the observed data, post-hoc threshold selection, or any self-referential quantities. No load-bearing self-citations, ansatzes smuggled via prior work, or reductions of predictions to fitted inputs appear in the central argument; the construction remains self-contained against the external benchmark of conformal p-value uniformity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The method rests on standard conformal inference assumptions about the null p-values together with the ability to sample from their joint distribution.

axioms (1)
  • domain assumption: Null conformal p-values admit sampling from their joint distribution under the null.
    Required to construct the high-probability envelope for the empirical distribution function.

pith-pipeline@v0.9.0 · 5752 in / 1189 out tokens · 29996 ms · 2026-05-21T02:44:44.629235+00:00 · methodology


Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages

  1. Transductive conformal inference with adaptive scores. International Conference on Artificial Intelligence and Statistics, 2024.
  2. The relationship between Precision-Recall and ROC curves. Proceedings of the 23rd International Conference on Machine Learning.
  3. Simultaneous false discovery proportion bounds via knockoffs and closed testing. Journal of the Royal Statistical Society Series B: Statistical Methodology, 2024.
  4. Continuous Multivariate Distributions, Volume 1: Models and Applications. 2000.
  5. The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. The Annals of Probability, 1990.
  6. Large-scale inference: empirical Bayes methods for estimation, testing, and prediction. 2012.
  7. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv preprint arXiv:2010.16061.
  8. Selective classification for deep neural networks. Advances in Neural Information Processing Systems.
  9. Online Selective Conformal Prediction with Asymmetric Rules: A Permutation Test Approach. arXiv preprint arXiv:2602.10018.
  10. Higher criticism: p-values and criticism.
  11. Support vector method for novelty detection. Advances in Neural Information Processing Systems.
  12. A central limit theorem for the Benjamini-Hochberg false discovery proportion under a factor model. Bernoulli, 2024.
  13. Adaptive novelty detection with false discovery rate guarantee. The Annals of Statistics, 2024.
  14. Comprehensive analysis of kinase inhibitor selectivity. Nature Biotechnology, 2011.
  15. A review of novelty detection. Signal Processing, 2014.
  16. Algorithmic learning in a random world. 2005.
  17. Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. Journal of the Royal Statistical Society Series B: Statistical Methodology, 2004.
  18. Higher criticism for detecting sparse heterogeneous mixtures.
  19. Control of the false discovery proportion for independently tested null hypotheses. Journal of Probability and Statistics, 2012.
  20. Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. The Annals of Mathematical Statistics, 1956.
  21. Conformal prediction under covariate shift. Advances in Neural Information Processing Systems.
  22. Parallel weighted random sampling. ACM Transactions on Mathematical Software (TOMS), 2022.
  23. Weighted sampling without replacement. Brazilian Journal of Probability and Statistics, 2018.
  24. Rate of convergence of empirical measures for exchangeable sequences. Mathematica Slovaca, 2017.
  25. Machine learning meets false discovery rate. arXiv preprint arXiv:2208.06685.
  26. Estimating means of bounded random variables by betting. arXiv preprint arXiv:2010.09686.
  27. Testing for outliers with conformal p-values. The Annals of Statistics, 2023.
  28. Selection by prediction with conformal p-values. Journal of Machine Learning Research.
  29. Model-free selective inference under covariate shift via weighted conformal p-values. arXiv preprint arXiv:2307.09291.
  30. Jelle J. Goeman and Aldo Solari. Statistical Science, 2011.
  31. Exceedance control of the false discovery proportion. Journal of the American Statistical Association, 2006.
  32. Simultaneous control of all false discovery proportions in large-scale multiple hypothesis testing. Biometrika, 2019.
  33. Permutation-based true discovery guarantee by sum tests. Journal of the Royal Statistical Society Series B: Statistical Methodology, 2023.
  34. Simultaneous false discovery proportion bounds via knockoffs and closed testing. arXiv preprint arXiv:2212.12822.
  35. Christopher Genovese and Larry Wasserman. The Annals of Statistics, 2004.
  36. False discovery control for multiple tests of association under general dependence. Scandinavian Journal of Statistics, 2006.
  37. Permutation-based simultaneous confidence bounds for the false discovery proportion. Biometrika, 2019.
  38. Jelle J. Goeman, Jesse Hemerik and Aldo Solari. The Annals of Statistics, 2021.
  39. Eugene Katsevich and Aaditya Ramdas. The Annals of Statistics, 2020.
  40. Gilles Blanchard, Pierre Neuvial and Etienne Roquain. The Annals of Statistics, 2020.
  41. The limits of distribution-free conditional predictive inference. Information and Inference: A Journal of the IMA, 2021.
  42. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 1995.
  43. Optimized conformal selection: Powerful selective inference after conformity score optimization. arXiv preprint arXiv:2411.17983.
  44. Diversifying Conformal Selections. arXiv preprint arXiv:2506.16229.
  45. ACS: An interactive framework for conformal selection. arXiv preprint arXiv:2507.15825.
  46. TxConformal: Controlling False Discoveries in AI-Driven Therapeutic Discovery. bioRxiv, 2026.