pith · machine review for the scientific record

arxiv: 2604.11305 · v2 · submitted 2026-04-13 · 💻 cs.LG · cs.IT · math.IT · stat.ML

Recognition: unknown

Beyond Fixed False Discovery Rates: Post-Hoc Conformal Selection with E-Variables

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 15:24 UTC · model grok-4.3

classification 💻 cs.LG · cs.IT · math.IT · stat.ML
keywords post-hoc conformal selection · false discovery rate · e-variables · e-Benjamini-Hochberg · conformal prediction · multiple testing · machine learning

The pith

Post-hoc conformal selection generates a path of selection sets with FDP estimates whose average upper-bounds the true FDR

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard conformal selection fixes the target false discovery rate before seeing any test data, which blocks users from adjusting how aggressively they select based on the strength of evidence that actually appears. This paper introduces post-hoc conformal selection, or PH-CS, which instead produces an entire path of candidate selection sets, each paired with its own data-driven estimate of the false discovery proportion. A user can then pick whichever point on that path best matches a downstream utility function without needing to commit to an FDR level ahead of time. The construction rests on conformal e-variables and the e-Benjamini-Hochberg procedure, and the central guarantee is that the ratio of estimated to true false discovery proportion is on average at most 1, so the average estimated FDP serves as a first-order valid upper bound on the true FDR. The same framework extends to controlling a general risk rather than a binary quality threshold.

Core claim

PH-CS produces a monotone path of selection sets, each with an accompanying FDP estimate. After the path is generated, any operating point chosen by maximizing a user-specified utility satisfies a finite-sample post-hoc reliability guarantee: the average, over the randomness of the procedure, of the ratio between the chosen estimated FDP and the true FDP is at most 1, so the average estimated FDP is, to first order, a valid upper bound on the true FDR.

What carries the argument

Post-hoc conformal selection (PH-CS) built from conformal e-variables and the e-Benjamini-Hochberg (e-BH) procedure, which generates the full path of candidate sets and their data-driven FDP estimates
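The abstract names the e-BH procedure as the engine behind the path but does not spell out the construction. As a rough sketch, assuming the standard base e-BH rule (select the k* hypotheses with the largest e-values, where k* = max{k : e_(k) ≥ m/(kα)}) and using the nominal level α as a stand-in for the paper's data-driven FDP estimate:

```python
import numpy as np

def e_bh(e_values, alpha):
    """Base e-BH step: with m e-values, select the k* hypotheses with the
    largest e-values, where k* = max{k : e_(k) >= m / (k * alpha)}."""
    e = np.asarray(e_values, dtype=float)
    m = e.size
    order = np.argsort(-e)                    # indices by descending e-value
    ks = np.arange(1, m + 1)
    ok = e[order] >= m / (ks * alpha)
    if not ok.any():
        return np.array([], dtype=int)
    return np.sort(order[: ks[ok].max()])     # selected hypothesis indices

def selection_path(e_values, alphas):
    """Sweep a grid of levels to get a nested path of candidate sets, each
    tagged with its level -- a stand-in for the paper's data-driven FDP
    estimate, whose exact form is not given in the abstract."""
    return [(a, e_bh(e_values, a)) for a in sorted(alphas)]
```

For instance, `e_bh([10, 5, 0.5, 0.1], 0.5)` selects the first two hypotheses, and sweeping α produces nested sets, which is what lets a user pick an operating point anywhere on the path after the fact.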

If this is right

  • Users can inspect the realized distribution of test statistics and then choose how many candidates to pursue, subject only to a utility function they specify after seeing the data.
  • The method extends directly to controlling any pre-specified risk measure rather than a simple binary quality threshold.
  • Because the guarantee is finite-sample and holds on average, it remains valid even when the user selects the operating point adaptively.
  • Experiments show that PH-CS can meet user-imposed utility constraints while producing FDP estimates whose reliability matches or exceeds that of fixed-FDR conformal selection.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The average-based nature of the guarantee may permit higher power than worst-case post-hoc procedures in settings where exchangeability holds only approximately.
  • The path construction could be combined with other conformal or e-value methods to produce post-hoc control for composite hypotheses or structured selection problems.
  • In domains such as genomics or neuroimaging, the ability to defer the FDR decision until after seeing the empirical distribution may reduce wasted follow-up resources on weak signals.

Load-bearing premise

Calibration and test data satisfy the exchangeability conditions that make the conformal e-variables valid, and the e-BH procedure is applied without additional violations.
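The exchangeability premise can be made concrete with a deliberately crude toy conformal e-variable (not the paper's construction): e = (n+1)·1{test score is the strict maximum of all n+1 scores}. Under exchangeability the test point is the maximum with probability 1/(n+1), so E[e] = 1, which is exactly the validity property the guarantee leans on.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 9                                        # calibration points per trial
trials = 100_000
scores = rng.normal(size=(trials, n + 1))    # exchangeable scores per trial

# Toy e-variable: (n+1) * 1{test score is the strict max of all n+1 scores}.
# Exchangeability makes the test point the max with prob 1/(n+1), so E[e] = 1.
is_max = scores[:, n] > scores[:, :n].max(axis=1)
e = (n + 1) * is_max.astype(float)
print(e.mean())                              # close to 1 on average
```

If the calibration and test draws were not exchangeable (say, a shifted test distribution), this mean would drift away from 1 and the downstream guarantee would no longer be underwritten.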

What would settle it

Repeated independent trials in which the average ratio of estimated FDP to true FDP exceeded 1 would contradict the claimed finite-sample guarantee.
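A minimal Monte Carlo sketch of that test, using plain e-BH (not PH-CS, whose internals the abstract does not give) with valid null e-values: over repeated trials, the average realized FDP should not exceed the nominal level. The e-value form exp(z − 1/2) and the signal strength are illustrative choices.

```python
import numpy as np

def e_bh(e_values, alpha):
    """e-BH: select the k* largest e-values, k* = max{k : e_(k) >= m/(k*alpha)}."""
    e = np.asarray(e_values, float)
    m = e.size
    order = np.argsort(-e)
    ok = e[order] >= m / (np.arange(1, m + 1) * alpha)
    return order[: np.flatnonzero(ok).max() + 1] if ok.any() else np.array([], int)

rng = np.random.default_rng(1)
alpha, m, m_null, trials = 0.2, 50, 40, 500
fdps = []
for _ in range(trials):
    z = rng.normal(size=m)
    z[m_null:] += 8.0                  # strong signal so e-BH actually selects
    e = np.exp(z - 0.5)                # E[e] = 1 for the null coordinates
    sel = e_bh(e, alpha)
    false_disc = np.sum(sel < m_null)  # nulls are indices 0..m_null-1
    fdps.append(false_disc / max(len(sel), 1))
print(np.mean(fdps))                   # the guarantee requires E[FDP] <= alpha
```

An average FDP persistently above the nominal level across such repetitions is the kind of evidence that would falsify the claimed finite-sample bound; here the bound is respected.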

Figures

Figures reproduced from arXiv: 2604.11305 by Meiyi Zhu, Osvaldo Simeone.

Figure 1. Illustration of the PH-CS problem and framework. A user has access to labeled calibration …
Figure 2. Histograms of the selected set size (top row) and FDP (bottom row) under the constrained …
Figure 3. Histograms of the selected set size (top row) and FDP (bottom row) under the constrained …
Figure 4. Histograms of the realized utility under the additive trade-off utility (7) on synthetic data …
Figure 5. Histograms of the realized utility (left), selected set size (middle), and FDP (right) under …
Figure 6. Scatter plots of the realized FDP, FDP(R_PH-CS, Y_test) in (4), versus the estimated level α_PH-CS (25) over 100 random seeds under the constrained-size utility (6) on the Shuttle dataset (left) and the additive trade-off utility (7) on the Recruitment dataset (right). Contour lines show the kernel density of the joint distribution. The black dashed line is the reference at which estimated and true FDP coin…
Figure 7. Histograms of the selected set size (top row) and FDP (bottom row) under the constrained …
Figure 8. Histogram of the realized utility under the additive trade-off utility (7) on synthetic data …
Figure 9. Histograms of the realized utility under the additive trade-off utility (7) on the Recruitment …
Original abstract

Conformal selection (CS) uses calibration data to identify test inputs whose unobserved outcomes are likely to satisfy a pre-specified minimal quality requirement, while controlling the false discovery rate (FDR). Existing methods fix the target FDR level before observing data, which prevents the user from adapting the balance between number of selected test inputs and FDR to downstream needs and constraints based on the available data. For example, in genomics or neuroimaging, researchers often inspect the distribution of test statistics, and decide how aggressively to pursue candidates based on observed evidence strength and available follow-up resources. To address this limitation, we introduce post-hoc CS (PH-CS), which generates a path of candidate selection sets, each paired with a data-driven false discovery proportion (FDP) estimate. PH-CS lets the user select any operating point on this path by maximizing a user-specified utility, arbitrarily balancing selection size and FDR. Building on conformal e-variables and the e-Benjamini-Hochberg (e-BH) procedure, PH-CS is proved to provide a finite-sample post-hoc reliability guarantee whereby the ratio between estimated FDP level and true FDP is, on average, upper bounded by $1$, so that the average estimated FDP is, to first order, a valid upper bound on the true FDR. PH-CS is extended to control quality defined in terms of a general risk. Experiments on synthetic and real-world datasets demonstrate that, unlike CS, PH-CS can consistently satisfy user-imposed utility constraints while producing reliable FDP estimates and maintaining competitive FDR control.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces post-hoc conformal selection (PH-CS) to overcome the fixed FDR limitation in standard conformal selection. It generates a path of candidate selection sets paired with data-driven FDP estimates derived from conformal e-variables and the e-BH procedure. Users can then select any point on this path by maximizing a utility function that trades off selection size against estimated FDP. The central theoretical claim is a finite-sample guarantee that the expected ratio of estimated FDP to true FDP is at most 1, which the authors state implies that the average estimated FDP is, to first order, a valid upper bound on the true FDR. The method is extended to general risk measures, and experiments on synthetic and real data show competitive FDR control and utility satisfaction compared to fixed-level CS.

Significance. If the post-hoc guarantee is rigorously established, the work would meaningfully advance conformal selection by enabling data-adaptive, post-inspection FDR control, which is practically relevant for domains such as genomics and neuroimaging. The finite-sample validity via e-variables and the provision of a full path of estimates are notable strengths. However, the logical step connecting the ratio bound to an upper bound on FDR requires explicit justification, as the current framing leaves the central reliability claim open to question.

major comments (2)
  1. [Abstract] The claim states that E[est_FDP / true_FDP] ≤ 1 implies the average estimated FDP is a valid upper bound on the true FDR (to first order). This does not follow in general because E[est_FDP] = E[(est_FDP/true_FDP) × true_FDP], and dependence between the ratio and true_FDP can alter the direction of the inequality between E[est_FDP] and E[true_FDP]. The manuscript should either derive the desired expectation inequality directly or clarify why the ratio bound suffices for the stated FDR control interpretation.
  2. [Abstract] The proof of the post-hoc reliability guarantee (referenced via conformal e-variables and e-BH) is described only at a high level in the abstract. Without the explicit derivation steps, assumptions on exchangeability, and handling of the covariance term, it is not possible to verify whether the finite-sample guarantee actually supports the FDR upper-bound interpretation or remains limited to the ratio bound alone.
minor comments (2)
  1. [Abstract] The abstract uses the phrase 'to first order' without defining what approximation or asymptotic regime is intended; this should be made precise in the main text.
  2. Notation for est_FDP and true_FDP should be introduced with explicit definitions and distinguished from the random variables they represent.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. The comments highlight important nuances in the interpretation of our post-hoc guarantee, and we will revise the manuscript to address them directly. Below we respond point by point to the major comments.

Point-by-point responses
  1. Referee: [Abstract] The claim states that E[est_FDP / true_FDP] ≤ 1 implies the average estimated FDP is a valid upper bound on the true FDR (to first order). This does not follow in general because E[est_FDP] = E[(est_FDP/true_FDP) × true_FDP], and dependence between the ratio and true_FDP can alter the direction of the inequality between E[est_FDP] and E[true_FDP]. The manuscript should either derive the desired expectation inequality directly or clarify why the ratio bound suffices for the stated FDR control interpretation.

    Authors: We appreciate the referee's precise observation on the distinction between the ratio bound and the expectation bound. The dependence between est_FDP and true_FDP can indeed prevent a direct implication from E[ratio] ≤ 1 to E[est_FDP] ≤ E[true_FDP]. In the current manuscript the phrase 'to first order' was intended to indicate an approximate practical validity when FDP variability is moderate, but we agree this requires clarification. In the revision we will either derive a direct bound on E[est_FDP] under the exchangeability and conformal assumptions of the paper, or explicitly state the conditions (e.g., bounded relative variance of true_FDP) under which the ratio guarantee yields the desired FDR interpretation. We will also add a short discussion of the covariance term in Section 3. revision: yes

  2. Referee: [Abstract] The proof of the post-hoc reliability guarantee (referenced via conformal e-variables and e-BH) is described only at a high level in the abstract. Without the explicit derivation steps, assumptions on exchangeability, and handling of the covariance term, it is not possible to verify whether the finite-sample guarantee actually supports the FDR upper-bound interpretation or remains limited to the ratio bound alone.

    Authors: The abstract is intentionally concise, while the complete derivation appears in Section 3. There we construct conformal e-variables for each candidate threshold under the exchangeability of calibration and test scores, apply the e-BH procedure to produce the path of FDP estimates, and invoke the martingale property of the e-variables (via optional stopping) to obtain the finite-sample bound on the expected ratio. We will revise the abstract to briefly list these elements and the exchangeability assumption. On the covariance term, the e-variable construction ensures the conditional expectation is 1 under the null independently of covariance with the true FDP, which is why the ratio bound holds without an additional covariance control step. revision: yes
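The gap flagged in the referee's first major comment can be seen with a two-outcome toy distribution (numbers invented purely for illustration): both distributions below satisfy E[est/true] ≤ 1, yet the ordering of E[est_FDP] and E[true_FDP] flips between them, so the ratio bound alone does not fix the direction of the expectation inequality.

```python
import numpy as np

def summarize(pairs):
    """pairs: equally likely (true_FDP, est_FDP) outcomes.
    Returns E[est/true], E[est], E[true]."""
    true_fdp, est_fdp = np.array(pairs, dtype=float).T
    return (est_fdp / true_fdp).mean(), est_fdp.mean(), true_fdp.mean()

# Distribution 1: E[est/true] = 0.5 <= 1, and E[est] < E[true].
r1, e1, t1 = summarize([(0.5, 0.25), (0.5, 0.25)])

# Distribution 2: E[est/true] = (0.1 + 1.9)/2 = 1 <= 1,
# yet E[est] = 0.0955 > E[true] = 0.055.
r2, e2, t2 = summarize([(0.01, 0.001), (0.1, 0.19)])
```

This is exactly the covariance effect the referee names: the ratio couples with the magnitude of true_FDP, so extra conditions (or a direct bound on E[est_FDP]) are needed for the FDR interpretation.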

Circularity Check

0 steps flagged

No significant circularity; derivation relies on independent prior results

full rationale

The paper derives its finite-sample post-hoc reliability guarantee for the ratio bound on estimated vs. true FDP directly from the established validity properties of conformal e-variables (under exchangeability) and the e-BH procedure. These are external, independently developed results not constructed or fitted within this manuscript. No step reduces a claimed prediction or first-principles result to a self-definition, a fitted parameter renamed as output, or a load-bearing self-citation chain. The central claim remains non-tautological and externally falsifiable via the cited e-value theory. The 'to first order' qualifier on the FDR implication is an interpretive note rather than a definitional reduction, and the derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The method extends conformal prediction and e-BH frameworks; specific axioms are not detailed in the abstract but are implied by the use of e-variables.

axioms (2)
  • domain assumption Exchangeability between calibration and test data for conformal validity
    Required for e-variables to provide valid conformal scores
  • domain assumption Validity properties of the e-Benjamini-Hochberg procedure
    Used to obtain the post-hoc reliability guarantee

pith-pipeline@v0.9.0 · 5586 in / 1276 out tokens · 65468 ms · 2026-05-10T15:24:55.802100+00:00 · methodology

discussion (0)

