Beyond Fixed False Discovery Rates: Post-Hoc Conformal Selection with E-Variables
Pith reviewed 2026-05-10 15:24 UTC · model grok-4.3
The pith
Post-hoc conformal selection generates paths of sets with FDP estimates whose average upper-bounds the true FDR
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PH-CS produces a monotone path of selection sets together with accompanying FDP estimates. After the path is generated, any operating point chosen by maximizing a user-specified utility satisfies a finite-sample post-hoc reliability guarantee: averaged over the randomness of the procedure, the ratio between the chosen estimated FDP and the true FDP is at most 1, making the average estimated FDP a valid upper bound on the true FDR to first order.
What carries the argument
Post-hoc conformal selection (PH-CS) built from conformal e-variables and the e-Benjamini-Hochberg (e-BH) procedure, which generates the full path of candidate sets and their data-driven FDP estimates
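One simple way to picture the path idea: sort the e-values in decreasing order and, for each candidate set consisting of the top-k e-values, report the e-BH-style estimate m/(k·e_[k]). The sketch below is a generic illustration of this construction, not the paper's exact procedure; `ebh_path`, the toy e-values, and the utility k − λ·k·FDP̂ are all hypothetical choices.

```python
import numpy as np

def ebh_path(e_values):
    """Generic e-BH selection path: for each k, take the top-k e-values
    and report the implied FDP estimate m / (k * e_[k]), clipped at 1.
    A sketch of the path idea, not the paper's exact construction."""
    e = np.sort(np.asarray(e_values, dtype=float))[::-1]  # descending
    m = len(e)
    ks = np.arange(1, m + 1)
    fdp_hat = np.minimum(1.0, m / (ks * e))  # data-driven FDP estimates
    return ks, fdp_hat

# toy example: a few strong signals (large e-values) among many nulls
rng = np.random.default_rng(0)
e_vals = np.concatenate([rng.exponential(1.0, 45),
                         [20.0, 30.0, 50.0, 80.0, 120.0]])
ks, fdp_hat = ebh_path(e_vals)

# the user picks an operating point post hoc, e.g. by maximizing a utility
# that rewards selections but penalizes estimated false discoveries
lam = 5.0
utility = ks - lam * ks * fdp_hat  # ks * fdp_hat = estimated false discoveries
k_star = ks[np.argmax(utility)]
```

The point of the path is that `lam` (or any other utility) can be chosen after inspecting `fdp_hat`, rather than fixing a target FDR level up front.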
If this is right
- Users can inspect the realized distribution of test statistics and then choose how many candidates to pursue, subject only to a utility function they specify after seeing the data.
- The method extends directly to controlling any pre-specified risk measure rather than a simple binary quality threshold.
- Because the guarantee is finite-sample and holds on average, it remains valid even when the user selects the operating point adaptively.
- Experiments show that PH-CS can meet user-imposed utility constraints while producing FDP estimates whose reliability matches or exceeds that of fixed-FDR conformal selection.
Where Pith is reading between the lines
- The average-based nature of the guarantee may permit higher power than worst-case post-hoc procedures in settings where exchangeability holds only approximately.
- The path construction could be combined with other conformal or e-value methods to produce post-hoc control for composite hypotheses or structured selection problems.
- In domains such as genomics or neuroimaging, the ability to defer the FDR decision until after seeing the empirical distribution may reduce wasted follow-up resources on weak signals.
Load-bearing premise
Calibration and test data satisfy the exchangeability conditions that make the conformal e-variables valid, and the e-BH procedure is applied without additional violations.
What would settle it
Repeated independent trials in which the average ratio of estimated FDP to true FDP exceeds 1 would contradict the claimed finite-sample guarantee.
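A falsification harness along these lines is easy to sketch. Here `run_procedure` is a placeholder for PH-CS at the user's chosen operating point (demonstrated with a dummy), and the treatment of trials with zero true FDP is an assumption the paper would need to pin down.

```python
import numpy as np

def check_ratio_guarantee(run_procedure, n_trials=1000, seed=0):
    """Average est_FDP / true_FDP over repeated independent trials.
    An empirical average well above 1 (beyond Monte Carlo error) would
    contradict the claimed finite-sample guarantee."""
    rng = np.random.default_rng(seed)
    ratios = []
    for _ in range(n_trials):
        est_fdp, true_fdp = run_procedure(rng)
        if true_fdp > 0:  # how the paper handles FDP = 0 is left open here
            ratios.append(est_fdp / true_fdp)
    return float(np.mean(ratios))

def dummy_procedure(rng):
    # stand-in for PH-CS: returns (estimated FDP, true FDP) for one trial
    true_fdp = rng.uniform(0.05, 0.3)
    est_fdp = true_fdp * rng.uniform(0.5, 1.5)  # toy ratio with mean 1
    return est_fdp, true_fdp

avg_ratio = check_ratio_guarantee(dummy_procedure)
```

Swapping `dummy_procedure` for an actual PH-CS implementation would turn this into the test described above.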
Original abstract
Conformal selection (CS) uses calibration data to identify test inputs whose unobserved outcomes are likely to satisfy a pre-specified minimal quality requirement, while controlling the false discovery rate (FDR). Existing methods fix the target FDR level before observing data, which prevents the user from adapting the balance between number of selected test inputs and FDR to downstream needs and constraints based on the available data. For example, in genomics or neuroimaging, researchers often inspect the distribution of test statistics, and decide how aggressively to pursue candidates based on observed evidence strength and available follow-up resources. To address this limitation, we introduce post-hoc CS (PH-CS), which generates a path of candidate selection sets, each paired with a data-driven false discovery proportion (FDP) estimate. PH-CS lets the user select any operating point on this path by maximizing a user-specified utility, arbitrarily balancing selection size and FDR. Building on conformal e-variables and the e-Benjamini-Hochberg (e-BH) procedure, PH-CS is proved to provide a finite-sample post-hoc reliability guarantee whereby the ratio between estimated FDP level and true FDP is, on average, upper bounded by $1$, so that the average estimated FDP is, to first order, a valid upper bound on the true FDR. PH-CS is extended to control quality defined in terms of a general risk. Experiments on synthetic and real-world datasets demonstrate that, unlike CS, PH-CS can consistently satisfy user-imposed utility constraints while producing reliable FDP estimates and maintaining competitive FDR control.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces post-hoc conformal selection (PH-CS) to overcome the fixed FDR limitation in standard conformal selection. It generates a path of candidate selection sets paired with data-driven FDP estimates derived from conformal e-variables and the e-BH procedure. Users can then select any point on this path by maximizing a utility function that trades off selection size against estimated FDP. The central theoretical claim is a finite-sample guarantee that the expected ratio of estimated FDP to true FDP is at most 1, which the authors state implies that the average estimated FDP is, to first order, a valid upper bound on the true FDR. The method is extended to general risk measures, and experiments on synthetic and real data show competitive FDR control and utility satisfaction compared to fixed-level CS.
Significance. If the post-hoc guarantee is rigorously established, the work would meaningfully advance conformal selection by enabling data-adaptive, post-inspection FDR control, which is practically relevant for domains such as genomics and neuroimaging. The finite-sample validity via e-variables and the provision of a full path of estimates are notable strengths. However, the logical step connecting the ratio bound to an upper bound on FDR requires explicit justification, as the current framing leaves the central reliability claim open to question.
major comments (2)
- [Abstract] The claim states that E[est_FDP / true_FDP] ≤ 1 implies the average estimated FDP is a valid upper bound on the true FDR (to first order). This does not follow in general because E[est_FDP] = E[(est_FDP/true_FDP) × true_FDP], and dependence between the ratio and true_FDP can alter the direction of the inequality between E[est_FDP] and E[true_FDP]. The manuscript should either derive the desired expectation inequality directly or clarify why the ratio bound suffices for the stated FDR control interpretation.
- [Abstract] The proof of the post-hoc reliability guarantee (referenced via conformal e-variables and e-BH) is described only at a high level in the abstract. Without the explicit derivation steps, assumptions on exchangeability, and handling of the covariance term, it is not possible to verify whether the finite-sample guarantee actually supports the FDR upper-bound interpretation or remains limited to the ratio bound alone.
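The gap the first major comment identifies can be stated with one covariance identity (our notation; the paper may define the ratio differently):

```latex
\mathbb{E}\big[\widehat{\mathrm{FDP}}\big]
  = \mathbb{E}\!\left[\frac{\widehat{\mathrm{FDP}}}{\mathrm{FDP}}\right]
    \mathbb{E}\big[\mathrm{FDP}\big]
  + \mathrm{Cov}\!\left(\frac{\widehat{\mathrm{FDP}}}{\mathrm{FDP}},\,
    \mathrm{FDP}\right).
```

With $\mathbb{E}[\widehat{\mathrm{FDP}}/\mathrm{FDP}] \le 1$ and $\mathbb{E}[\mathrm{FDP}] = \mathrm{FDR}$, this gives $\mathbb{E}[\widehat{\mathrm{FDP}}] \le \mathrm{FDR} + \mathrm{Cov}(\cdot,\cdot)$; the desired bound $\mathbb{E}[\widehat{\mathrm{FDP}}] \le \mathrm{FDR}$ then holds only if the covariance term is nonpositive or negligible, which is presumably what the qualifier "to first order" is meant to absorb.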
minor comments (2)
- [Abstract] The abstract uses the phrase 'to first order' without defining what approximation or asymptotic regime is intended; this should be made precise in the main text.
- Notation for est_FDP and true_FDP should be introduced with explicit definitions and distinguished from the random variables they represent.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. The comments highlight important nuances in the interpretation of our post-hoc guarantee, and we will revise the manuscript to address them directly. Below we respond point by point to the major comments.
Point-by-point responses
-
Referee: [Abstract] The claim states that E[est_FDP / true_FDP] ≤ 1 implies the average estimated FDP is a valid upper bound on the true FDR (to first order). This does not follow in general because E[est_FDP] = E[(est_FDP/true_FDP) × true_FDP], and dependence between the ratio and true_FDP can alter the direction of the inequality between E[est_FDP] and E[true_FDP]. The manuscript should either derive the desired expectation inequality directly or clarify why the ratio bound suffices for the stated FDR control interpretation.
Authors: We appreciate the referee's precise observation on the distinction between the ratio bound and the expectation bound. The dependence between est_FDP and true_FDP can indeed prevent a direct implication from E[ratio] ≤ 1 to E[est_FDP] ≤ E[true_FDP]. In the current manuscript the phrase 'to first order' was intended to indicate an approximate practical validity when FDP variability is moderate, but we agree this requires clarification. In the revision we will either derive a direct bound on E[est_FDP] under the exchangeability and conformal assumptions of the paper, or explicitly state the conditions (e.g., bounded relative variance of true_FDP) under which the ratio guarantee yields the desired FDR interpretation. We will also add a short discussion of the covariance term in Section 3. revision: yes
-
Referee: [Abstract] The proof of the post-hoc reliability guarantee (referenced via conformal e-variables and e-BH) is described only at a high level in the abstract. Without the explicit derivation steps, assumptions on exchangeability, and handling of the covariance term, it is not possible to verify whether the finite-sample guarantee actually supports the FDR upper-bound interpretation or remains limited to the ratio bound alone.
Authors: The abstract is intentionally concise, while the complete derivation appears in Section 3. There we construct conformal e-variables for each candidate threshold under the exchangeability of calibration and test scores, apply the e-BH procedure to produce the path of FDP estimates, and invoke the martingale property of the e-variables (via optional stopping) to obtain the finite-sample bound on the expected ratio. We will revise the abstract to briefly list these elements and the exchangeability assumption. On the covariance term, the e-variable construction ensures the conditional expectation is 1 under the null independently of covariance with the true FDP, which is why the ratio bound holds without an additional covariance control step. revision: yes
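The rebuttal's appeal to e-variables having null expectation 1 under exchangeability can be sanity-checked with a deliberately minimal construction (a toy, not the paper's conformal e-variable): with n calibration scores and one exchangeable test score, e = (n+1)·1{test is the maximum} satisfies E[e] = (n+1)·1/(n+1) = 1 when ties have probability zero.

```python
import numpy as np

def toy_conformal_e(calib, test):
    """Minimal conformal e-variable: e = (n+1) * 1{test score is the largest}.
    Under exchangeability (and no ties), P(test is max) = 1/(n+1), so E[e] = 1.
    A toy sanity check, not the paper's e-variable construction."""
    n = len(calib)
    return (n + 1) * float(test > np.max(calib))

rng = np.random.default_rng(1)
n, trials = 20, 20000
es = []
for _ in range(trials):
    scores = rng.normal(size=n + 1)  # exchangeable null scores
    es.append(toy_conformal_e(scores[:n], scores[n]))
mean_e = float(np.mean(es))  # concentrates near 1 under the null
```

Breaking exchangeability (e.g., drawing the test score from a shifted distribution) makes `mean_e` drift away from 1, which is the failure mode the load-bearing premise above is guarding against.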
Circularity Check
No significant circularity; derivation relies on independent prior results
Full rationale
The paper derives its finite-sample post-hoc reliability guarantee for the ratio bound on estimated vs. true FDP directly from the established validity properties of conformal e-variables (under exchangeability) and the e-BH procedure. These are external, independently developed results not constructed or fitted within this manuscript. No step reduces a claimed prediction or first-principles result to a self-definition, a fitted parameter renamed as output, or a load-bearing self-citation chain. The central claim remains non-tautological and externally falsifiable via the cited e-value theory. The 'to first order' qualifier on the FDR implication is an interpretive note rather than a definitional reduction, and the derivation chain is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Exchangeability between calibration and test data for conformal validity
- domain assumption Validity properties of the e-Benjamini-Hochberg procedure