Making Conformal Predictors Robust in Healthcare Settings: a Case Study on EEG Classification
Pith reviewed 2026-05-15 20:23 UTC · model grok-4.3
The pith
Personalized calibration strategies for conformal predictors improve coverage by over 20 percentage points in EEG seizure classification under patient distribution shifts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In EEG seizure classification, where inter-patient distribution shifts violate the exchangeability assumptions of standard conformal prediction, using calibration sets drawn from the same patient as the test example raises empirical coverage by more than 20 percentage points while preserving comparable prediction-set sizes.
What carries the argument
Patient-specific calibration sets inserted into the conformal prediction pipeline to correct for distribution shifts between training and deployment patients.
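A minimal Python sketch of the mechanism follows. It makes three assumptions: a softmax classifier, the common nonconformity score 1 - p(true label), and calibration windows grouped by a patient identifier. The function names (lac_score, conformal_threshold, pooled_threshold, patient_thresholds) are hypothetical. This illustrates the idea and is not the PyHealth API or the paper's exact procedure.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# One calibration example: (softmax probabilities over classes, true label index).
Example = Tuple[Sequence[float], int]


def lac_score(probs: Sequence[float], label: int) -> float:
    """Nonconformity score: 1 minus the predicted probability of the true label."""
    return 1.0 - probs[label]


def conformal_threshold(scores: Sequence[float], alpha: float) -> float:
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the only valid set is all classes.
        return math.inf
    return sorted(scores)[k - 1]


def prediction_set(probs: Sequence[float], threshold: float) -> Set[int]:
    """Every class whose score is at most the calibrated threshold."""
    return {c for c, p in enumerate(probs) if 1.0 - p <= threshold}


def pooled_threshold(cal: Dict[str, List[Example]], alpha: float) -> float:
    """Baseline: one threshold from calibration windows pooled across all patients."""
    scores = [lac_score(p, y) for pairs in cal.values() for p, y in pairs]
    return conformal_threshold(scores, alpha)


def patient_thresholds(cal: Dict[str, List[Example]], alpha: float) -> Dict[str, float]:
    """Personalized: a separate threshold from each patient's own calibration windows."""
    return {
        pid: conformal_threshold([lac_score(p, y) for p, y in pairs], alpha)
        for pid, pairs in cal.items()
    }
```

At test time, a window from patient p would be scored against patient_thresholds(cal, alpha)[p] instead of the single pooled threshold. A patient with too few calibration windows gets an infinite threshold, so the prediction set becomes the full label set. This is one way small calibration sets show up in practice.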
Load-bearing premise
Patient-specific calibration data is available at deployment time and the personalization step itself does not create new coverage failures under further unseen shifts.
What would settle it
Coverage falling below the nominal guarantee when the personalized method is applied to new patients who supply no calibration examples of their own or who encounter distribution shifts absent from both training and calibration data.
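One way to run that check is to report coverage for each held-out patient rather than pooling over all test windows. A pooled average can hide individual patients who fall well below 1 - alpha. The sketch below assumes prediction sets have already been produced. coverage_report and its input layout are hypothetical. Its below_nominal flag is a raw comparison; a real analysis would back it with an interval or test, for example a binomial confidence interval on each patient's coverage.

```python
from typing import Dict, List, Set, Tuple


def coverage_report(
    results: Dict[str, List[Tuple[Set[int], int]]], alpha: float
) -> Dict[str, dict]:
    """Per-patient empirical coverage and mean set size from (prediction set, true label) pairs."""
    report = {}
    for pid, pairs in results.items():
        n = len(pairs)
        if n == 0:
            continue  # no test windows for this patient; nothing to evaluate
        coverage = sum(1 for pred_set, y in pairs if y in pred_set) / n
        report[pid] = {
            "n": n,
            "coverage": coverage,
            "mean_set_size": sum(len(pred_set) for pred_set, _ in pairs) / n,
            # Raw comparison only: with small n, coverage fluctuates around its expectation.
            "below_nominal": coverage < 1 - alpha,
        }
    return report
```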
Original abstract
Quantifying uncertainty in clinical predictions is critical for high-stakes diagnosis tasks. Conformal prediction offers a principled approach by providing prediction sets with theoretical coverage guarantees. However, in practice, patient distribution shifts violate the i.i.d. assumptions underlying standard conformal methods, leading to poor coverage in healthcare settings. In this work, we evaluate several conformal prediction approaches on EEG seizure classification, a task with known distribution shift challenges and label uncertainty. We demonstrate that personalized calibration strategies can improve coverage by over 20 percentage points while maintaining comparable prediction set sizes. Our implementation is available via PyHealth, an open-source healthcare AI framework: https://github.com/sunlabuiuc/PyHealth.
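For reference, the standard split-conformal construction behind the coverage guarantee in the abstract is shown below. It assumes the common score 1 - p̂_y(x); the paper may use a different score. Here s_(k) is the k-th smallest calibration score, and q̂ = ∞ (every class is included) when k > n.

```latex
\begin{aligned}
s_i &= 1 - \hat{p}_{Y_i}(X_i), \qquad i = 1, \dots, n \\
\hat{q} &= s_{(k)}, \qquad k = \left\lceil (n+1)(1-\alpha) \right\rceil \\
\mathcal{C}(x) &= \{\, y : 1 - \hat{p}_{y}(x) \le \hat{q} \,\} \\
\Pr\big(Y_{n+1} \in \mathcal{C}(X_{n+1})\big) &\ge 1 - \alpha
\end{aligned}
```

The bound is marginal: it averages over calibration draws and test points. It also requires the n calibration pairs and the test pair to be exchangeable, which is exactly what inter-patient shift breaks.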
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper evaluates several conformal prediction methods on EEG seizure classification, highlighting failures of standard approaches under patient distribution shifts that violate i.i.d. assumptions. It reports that personalized calibration strategies improve coverage by over 20 percentage points relative to baselines while maintaining comparable prediction set sizes, with an open-source implementation in PyHealth.
Significance. If the reported coverage gains are shown to be robust, statistically significant, and not artifacts of split choices or limited calibration data, the work would provide a concrete, deployable technique for adapting conformal prediction to non-stationary healthcare data. The open implementation in PyHealth is a positive contribution for reproducibility in the field.
major comments (2)
- [Abstract and §4 (Experiments)] The headline claim of a >20 pp coverage improvement is given without the number of patients, the size and source of the patient-specific calibration sets, the exact baselines compared, or any test of statistical significance. These details are load-bearing for judging whether the gain is reliable rather than split-dependent.
- [§3 (Method) and §5 (Discussion)] The personalization step assumes that patient-specific calibration data remains exchangeable with future test points from the same patient. No experiments address intra-patient non-stationarity (e.g., evolving seizure patterns or electrode drift), which could invalidate coverage guarantees when calibration data is limited to short recordings.
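A simple diagnostic for this concern is sketched below. It assumes a patient's windows can be ordered by recording time: calibrate on the earliest windows and measure coverage on the later ones. The helper names are hypothetical. Under exchangeability the expected coverage is at least 1 - alpha, so a large shortfall on the chronological split points to drift within the recording.

```python
import math
from typing import Sequence


def conformal_threshold(scores: Sequence[float], alpha: float) -> float:
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else sorted(scores)[k - 1]


def chronological_coverage(scores: Sequence[float], n_cal: int, alpha: float) -> float:
    """Calibrate on the first n_cal windows of one recording and test on the rest.

    `scores` are the true-label nonconformity scores of a single patient's windows,
    sorted by recording time.
    """
    cal, test = scores[:n_cal], scores[n_cal:]
    if not test:
        raise ValueError("n_cal leaves no test windows")
    threshold = conformal_threshold(cal, alpha)
    return sum(1 for s in test if s <= threshold) / len(test)
```

Comparing the result with the coverage from a random split of the same recording helps separate drift from small-sample noise. Both splits use the same number of calibration windows, but only the chronological one is sensitive to ordering.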
minor comments (1)
- [Abstract] The abstract mentions 'label uncertainty' but does not clarify how it interacts with the conformal score function; a brief description in §2 would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our work regarding conformal prediction in healthcare settings. We have carefully considered the comments and made revisions to enhance the reporting of experimental details and to elaborate on the methodological assumptions. Our point-by-point responses are provided below.
Point-by-point responses
- Referee: [Abstract and §4 (Experiments)] The headline claim of a >20 pp coverage improvement is given without the number of patients, the size and source of the patient-specific calibration sets, the exact baselines compared, or any test of statistical significance. These details are load-bearing for judging whether the gain is reliable rather than split-dependent.
Authors: We agree with the referee that these specifics are essential for a thorough evaluation. Accordingly, we have revised the abstract to include information on the number of patients in the study, the size and source of the patient-specific calibration sets, the exact baselines used for comparison, and the statistical tests performed to assess significance. In §4, we have added detailed descriptions of the data splits, patient counts, calibration set sizes, and p-values from appropriate statistical tests to demonstrate that the coverage improvements are robust and not dependent on specific split choices. These changes ensure the claims are well-supported.
Revision: yes
- Referee: [§3 (Method) and §5 (Discussion)] The personalization step assumes that patient-specific calibration data remains exchangeable with future test points from the same patient. No experiments address intra-patient non-stationarity (e.g., evolving seizure patterns or electrode drift), which could invalidate coverage guarantees when calibration data is limited to short recordings.
Authors: We appreciate this observation regarding the core assumption of our personalization strategy. The approach in §3 does assume exchangeability between the patient-specific calibration data and future test points from the same patient. We have updated §5 to explicitly address this assumption and discuss potential violations due to intra-patient non-stationarity, including examples like evolving seizure patterns and electrode drift. We acknowledge that with calibration data limited to short recordings, coverage guarantees could be affected. Although we were unable to conduct additional experiments on this due to the nature of the available EEG dataset (which lacks extensive longitudinal recordings), we have strengthened the discussion to highlight this as a limitation and suggest avenues for future research, such as adaptive conformal methods.
Revision: partial
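The sketch below illustrates the adaptive conformal methods the authors suggest, using adaptive conformal inference (ACI; Gibbs and Candès, 2021). ACI updates the working miscoverage level online via alpha_(t+1) = alpha_t + gamma * (alpha - err_t). This is a swapped-in technique, not something the paper reports. The sketch is a simplified variant that recalibrates on all past windows of one patient, and the function names and the gamma and warmup parameters are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def empirical_quantile(scores: Sequence[float], level: float) -> float:
    """Threshold at a given level; level >= 1 means 'include every class'."""
    if level >= 1.0 or not scores:
        return math.inf
    if level <= 0.0:
        return -math.inf  # empty prediction set
    ordered = sorted(scores)
    k = max(1, math.ceil(level * len(ordered)))
    return ordered[k - 1]


def adaptive_conformal(
    scores: Sequence[float], alpha: float, gamma: float, warmup: int
) -> Tuple[List[int], List[float]]:
    """Adaptive conformal inference over one patient's time-ordered true-label scores.

    After each window the working level alpha_t moves by gamma * (alpha - err_t):
    it shrinks after a miss (larger sets) and grows slowly after a hit (smaller sets).
    Returns the per-step miss indicators and the alpha_t used at each step.
    """
    alpha_t = alpha
    history = list(scores[:warmup])
    misses: List[int] = []
    alphas: List[float] = []
    for s in scores[warmup:]:
        threshold = empirical_quantile(history, 1.0 - alpha_t)
        err = 1 if s > threshold else 0  # 1 when the true label falls outside the set
        misses.append(err)
        alphas.append(alpha_t)
        alpha_t += gamma * (alpha - err)
        history.append(s)
    return misses, alphas
```

Consider a steadily drifting score sequence. A threshold fixed after the warm-up would miss every later window, whereas ACI keeps its long-run miss rate bounded around alpha, at the cost of occasionally returning every class. Its guarantee is a long-run frequency statement over the stream, not the exchangeability-based guarantee of split conformal prediction.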
Circularity Check
No circularity in empirical evaluation of conformal methods
Full rationale
The paper is an empirical evaluation of existing conformal prediction techniques on EEG seizure data. It reports observed coverage gains from personalized calibration without presenting any mathematical derivation, uniqueness theorem, or ansatz that reduces the claimed improvements to fitted parameters or self-citations by construction. Results are framed as experimental outcomes on a specific dataset, with no load-bearing self-referential steps in the reported chain.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: conformal prediction guarantees hold only under i.i.d. or exchangeable data.
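The assumption in the ledger can be stated precisely. The display below gives three things:

- the standard definition of exchangeability;
- the weaker patient-level version that personalized calibration relies on (an assumption about this setting, not a result from the paper);
- the minimum calibration-set size n_p for which the split-conformal threshold is finite, with Z_i = (X_i, Y_i) and Z^p_i the windows of patient p.

```latex
\begin{aligned}
&(Z_1, \dots, Z_{n+1}) \overset{d}{=} (Z_{\pi(1)}, \dots, Z_{\pi(n+1)})
  \quad \text{for every permutation } \pi \\
&(Z^{p}_1, \dots, Z^{p}_{n_p+1}) \overset{d}{=} (Z^{p}_{\pi(1)}, \dots, Z^{p}_{\pi(n_p+1)})
  \quad \text{for every permutation } \pi \text{ (patient } p \text{ only)} \\
&\left\lceil (n_p+1)(1-\alpha) \right\rceil \le n_p
  \iff n_p \ge \frac{1-\alpha}{\alpha}
\end{aligned}
```

For example, alpha = 0.1 needs at least 9 calibration windows per patient, and alpha = 0.05 needs at least 19. With fewer, the prediction set is the full label set.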
Reference graph
Works this paper leans on
[1] Chatterjee, A., Razin, S.S., Wu, J., Laghuvarapu, S., Pradeepkumar, J., Sun, J.: Making conformal predictors robust in healthcare settings: a case study on EEG classification (2026), https://arxiv.org/abs/2602.19483
[2] Ge, W., Jing, J., An, S., Herlopian, A., Ng, M., Struck, A.F., Appavu, B., Johnson, E.L., Osman, G., Haider, H.A., et al.: Deep active learning for interictal ictal injury continuum EEG patterns. Journal of Neuroscience Methods 351, 108966 (2021)
[3] Ghosh, S., Belkhouja, T., Yan, Y., Doppa, J.R.: Improving uncertainty quantification of deep classifiers via neighborhood conformal prediction: Novel algorithm and theoretical analysis. Proceedings of the AAAI Conference on Artificial Intelligence 37(6), 7722–7730 (Jun 2023)
[4] Harati, A., Golmohammadi, M., Lopez, S., Obeid, I., Picone, J.: Improved EEG event classification using differential energy. IEEE Signal Processing in Medicine and Biology Symposium 2015 (2015)
[5] Laghuvarapu, S., Lin, Z., Sun, J.: CoDrug: Conformal drug property prediction with density estimation under covariate shift. Advances in Neural Information Processing Systems 36, 37728–37747 (2023)
[6] Lopez, S., Suarez, G., Jungreis, D., Obeid, I., Picone, J.: Automated identification of abnormal adult EEGs. IEEE Signal Processing in Medicine and Biology Symposium 2015 (2015)
[7] Obeid, I., Picone, J.: The Temple University Hospital EEG data corpus. Frontiers in Neuroscience 10, 196 (2016)
[8] Papadopoulos, H., Vovk, V., Gammerman, A.: Conformal prediction with neural networks. In: 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2007). vol. 2, pp. 388–395. IEEE (2007)
[9] Pradeepkumar, J., Piao, X., Chen, Z., Sun, J.: Tokenizing single-channel EEG with time-frequency motif learning (2026), https://arxiv.org/abs/2502.16060
[10] Shah, V., von Weltin, E., Lopez, S., McHugh, J.R., Veloso, L., Golmohammadi, M., Obeid, I., Picone, J.: The Temple University Hospital seizure detection corpus. Frontiers in Neuroinformatics 12 (2018)
[11] Tibshirani, R.J., Foygel Barber, R., Candes, E., Ramdas, A.: Conformal prediction under covariate shift. In: Advances in Neural Information Processing Systems. vol. 32 (2019)
[12] Wu, Z., Yao, H., Liebovitz, D., Sun, J.: An iterative self-learning framework for medical domain generalization. In: Advances in Neural Information Processing Systems. vol. 36, pp. 54833–54854 (2023)
[13] Yang, C., Westover, M.B., Sun, J.: ManyDG: Many-domain generalization for healthcare applications. In: The 11th International Conference on Learning Representations, ICLR 2023 (2023)
[14] Yang, C., Xiao, D., Westover, M.B., Sun, J.: Self-supervised EEG representation learning for automatic sleep staging. JMIR AI (2023)