arxiv: 2511.11935 · v2 · submitted 2025-11-14 · 💻 cs.LG

SurvBench: A Standardised Preprocessing Pipeline for Multi-Modal Electronic Health Record Survival Analysis

Pith reviewed 2026-05-17 21:35 UTC · model grok-4.3

classification 💻 cs.LG
keywords preprocessing pipeline · survival analysis · electronic health records · multi-modal data · deep learning · data standardization · critical care data

The pith

A configurable preprocessing pipeline converts raw electronic health records into consistent tensors for comparing survival models across studies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper identifies that inconsistent and undocumented preprocessing steps such as cohort definition, time discretization, missingness handling, and censoring rules prevent fair comparisons of deep learning survival models on electronic health record data. It introduces a pipeline that turns raw data into model-ready tensors while exposing every decision through configuration files. The system supports multiple input modalities including time-series measurements, static patient information, diagnostic codes, and report embeddings, and it manages both single-risk and competing-risk endpoints. If adopted, performance differences reported between models would more likely stem from modeling choices rather than hidden variations in how the input data were prepared.

Core claim

The pipeline converts raw data exports into model-ready tensors for survival analysis. It supports multiple critical care sources and four input modalities: time-series vitals and laboratory values, static demographics, ICD codes, and radiology report embeddings. Every preprocessing decision is controlled through YAML configuration files, and imputation, scaling, and feature filtering are fit on the training fold only.

What carries the argument

The preprocessing pipeline that standardizes cohort definitions, time discretization, missingness handling with binary masks, and censoring rules for single-risk and competing-risk survival endpoints.
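
To make the time-discretization step concrete, here is a minimal sketch of how irregular measurements might be aggregated into fixed-width bins. It is not SurvBench's code: the function name `discretise`, the mean aggregation, and the use of `None` for empty bins are assumptions made purely for illustration.

```python
from collections import defaultdict
from statistics import mean


def discretise(observations, bin_size_hours, n_bins):
    """Aggregate irregular (time_in_hours, value) pairs into fixed-width bins.

    Returns a list of length n_bins holding the mean of each bin,
    or None where no measurement fell into the bin.
    """
    buckets = defaultdict(list)
    for t, value in observations:
        idx = int(t // bin_size_hours)
        if 0 <= idx < n_bins:  # drop observations outside the window
            buckets[idx].append(value)
    return [mean(buckets[i]) if i in buckets else None for i in range(n_bins)]


# Hypothetical heart-rate readings at irregular times (hours since admission)
heart_rate = [(0.2, 80.0), (0.7, 84.0), (2.5, 90.0), (7.0, 100.0)]
binned = discretise(heart_rate, bin_size_hours=1.0, n_bins=4)
print(binned)  # [82.0, None, 90.0, None]
```

Both the bin width and the aggregation rule change what a downstream model sees, which is why the review treats them as decisions that need to be exposed rather than buried.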

If this is right

  • Model comparisons can isolate the effect of architecture and training choices without preprocessing differences confounding concordance metrics.
  • Imputation and scaling statistics are derived exclusively from the training portion of each split to prevent information leakage.
  • Missing values are accompanied by explicit binary mask tensors so models can learn from missingness patterns directly.
  • The same configuration can be reused to produce harmonized external validation sets across different data sources.
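
The leakage and masking points above can be sketched in a few lines. This is an illustrative example under stated assumptions, not the pipeline's implementation: mean imputation, z-score scaling, the helper names `fit_stats` and `transform`, and the mask convention (1 means observed) are all hypothetical choices.

```python
from statistics import mean, pstdev


def fit_stats(train_column):
    """Compute mean and standard deviation from observed training values only."""
    observed = [v for v in train_column if v is not None]
    mu = mean(observed)
    sigma = pstdev(observed) or 1.0  # guard against a constant feature
    return mu, sigma


def transform(column, mu, sigma):
    """Return (scaled_values, mask); mask is 1 where the value was observed."""
    mask = [0 if v is None else 1 for v in column]
    filled = [mu if v is None else v for v in column]  # mean imputation
    scaled = [(v - mu) / sigma for v in filled]
    return scaled, mask


train = [1.0, None, 3.0]
test = [None, 5.0]

mu, sigma = fit_stats(train)                       # statistics come from train only
train_x, train_mask = transform(train, mu, sigma)
test_x, test_mask = transform(test, mu, sigma)     # reused on test, never refit

print(train_x, train_mask)  # [-1.0, 0.0, 1.0] [1, 0, 1]
print(test_x, test_mask)    # [0.0, 3.0] [0, 1]
```

The imputed test value lands at exactly zero after scaling, so without the mask a model could not tell it apart from a genuinely average measurement.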

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Widespread use could shorten the time researchers spend on data preparation and shift effort toward model innovation.
  • The configuration-driven design makes it straightforward to test how small changes in cohort or censoring rules affect final model rankings.
  • Community extensions could add new modalities or data sources while preserving the same controlled comparison environment.

Load-bearing premise

That the specific preprocessing choices encoded in the pipeline represent the appropriate standard the field should adopt rather than one reasonable option among many.

What would settle it

A controlled experiment in which two otherwise identical survival models produce materially different performance rankings when trained on data prepared under alternative but equally defensible preprocessing rules.
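
Rankings in such an experiment would most likely be read off a concordance metric. As a reference point, the sketch below computes Harrell's concordance index in plain Python. The function name and the toy data are illustrative only, and the paper may well use a different variant (for example a time-dependent or cause-specific version).

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when subject i has an observed event
    and a strictly shorter time than subject j.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]          # 0 marks a censored subject
risks = [0.9, 0.4, 0.5, 0.1]   # higher score means higher predicted risk
c_index = concordance_index(times, events, risks)
print(c_index)  # 0.8
```

Because the set of comparable pairs depends on which subjects are censored and when, a change in censoring rules alone can move this number without any change to the model.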

Original abstract

Deep-learning survival models for electronic health record (EHR) data are hard to compare across papers because the upstream preprocessing step, which includes cohort definition, time discretisation, missingness handling, and censoring rules, is typically undocumented and inconsistent. A reported difference in concordance between two mortality models can therefore reflect any of these choices rather than a modelling contribution. We present SurvBench, an open-source preprocessing pipeline that converts raw PhysioNet exports into model-ready tensors for survival analysis. SurvBench covers four critical-care databases (MIMIC-IV, eICU, MC-MED, HiRID) and four input modalities: time-series vitals and laboratory values, static demographics, International Classification of Diseases (ICD) codes, and radiology report embeddings. Every preprocessing decision is controlled through YAML configuration. Imputation, scaling, and feature filtering are fit on the training fold only. Missingness is recorded as a binary mask alongside each feature tensor. The pipeline handles single-risk endpoints (in-hospital and in-ICU mortality) and competing-risks endpoints (a three-way emergency-department admission pathway, with home discharge treated as administrative censoring). We also provide support for harmonised cross-dataset external validation between eICU and MIMIC-IV. SurvBench is publicly available at https://github.com/munibmesinovic/SurvBench, providing a robust platform that future deep-learning EHR survival work, especially nascent multi-modal approaches, can be measured against under matched preprocessing.
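
The competing-risks setup in the abstract can be illustrated with a small label-encoding sketch. The abstract does not name the three pathways, so `pathway_a`, `pathway_b`, and `pathway_c` are placeholders. The integer codes, the function `encode_endpoint`, and the extra rule that censors events beyond a fixed horizon are assumptions as well, not details taken from SurvBench.

```python
# Placeholder outcome names; the source does not list the three pathways.
EVENT_CODES = {"pathway_a": 1, "pathway_b": 2, "pathway_c": 3}
CENSORED = 0


def encode_endpoint(outcome, hours_to_outcome, horizon_hours):
    """Map a raw outcome to (duration, event_code) for a competing-risks model."""
    if outcome == "home_discharge":
        # administrative censoring: follow-up ends, no event is recorded
        return min(hours_to_outcome, horizon_hours), CENSORED
    if hours_to_outcome > horizon_hours:
        # assumed rule: events beyond the window are censored at the horizon
        return horizon_hours, CENSORED
    return hours_to_outcome, EVENT_CODES[outcome]


print(encode_endpoint("pathway_b", 5.0, 24.0))        # (5.0, 2)
print(encode_endpoint("home_discharge", 3.0, 24.0))   # (3.0, 0)
print(encode_endpoint("pathway_a", 30.0, 24.0))       # (24.0, 0)
```

Treating home discharge as censoring rather than as a fourth event is exactly the kind of choice that changes reported metrics, so it belongs in the configuration rather than in code.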

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper presents SurvBench, an open-source preprocessing pipeline that converts raw PhysioNet exports from four critical-care databases (MIMIC-IV, eICU, MC-MED, HiRID) into model-ready tensors for survival analysis. It supports four modalities (time-series vitals/labs, static demographics, ICD codes, radiology embeddings), single-risk and competing-risk endpoints, missingness masks, train-only imputation/scaling, and cross-dataset harmonization, with every preprocessing decision controlled through YAML configuration files.

Significance. If the YAML interface fully exposes all decisions as claimed, this artifact would meaningfully advance reproducibility in deep-learning EHR survival work by supplying an explicit, configurable baseline that future studies can adopt for matched comparisons, particularly in multi-modal settings. The open-source release and concrete engineering choices (train-only fits, masks, competing-risk support) are strengths that could reduce the impact of undocumented preprocessing on reported model differences.

major comments (1)
  1. [Abstract] Abstract: the central claim that 'Every preprocessing decision is controlled through YAML configuration' is load-bearing for the standardisation objective. To substantiate that the pipeline eliminates the reproducibility barrier, the manuscript must demonstrate that cohort inclusion/exclusion logic, time-binning rules, censoring definitions (including for the three-way competing-risks pathway), missingness mask generation, and cross-modality alignment are all fully parameterised in the YAML schema without hard-coded or dataset-specific defaults that remain outside user control.
minor comments (1)
  1. Consider adding an explicit table or appendix that enumerates all YAML keys, their defaults, and the corresponding preprocessing step they control, to make the configuration surface immediately verifiable by readers.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback and for recognizing the potential of SurvBench to improve reproducibility in multi-modal EHR survival analysis. We agree that explicitly demonstrating the full YAML parameterization is necessary to substantiate the central claim and have prepared revisions accordingly.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that 'Every preprocessing decision is controlled through YAML configuration' is load-bearing for the standardisation objective. To substantiate that the pipeline eliminates the reproducibility barrier, the manuscript must demonstrate that cohort inclusion/exclusion logic, time-binning rules, censoring definitions (including for the three-way competing-risks pathway), missingness mask generation, and cross-modality alignment are all fully parameterised in the YAML schema without hard-coded or dataset-specific defaults that remain outside user control.

    Authors: We agree this demonstration is essential. In the revised manuscript we will add a new subsection in Methods (and an appendix table) that explicitly maps each listed decision to its YAML key(s): cohort inclusion/exclusion under 'cohort_selection' (with flags for each database and user-overridable criteria), time-binning under 'time_discretization' (bin_size, alignment, aggregation), censoring definitions for both single-risk and the three-way competing-risks pathway (including home-discharge handling) under 'endpoint_definition', missingness mask generation under 'missingness' (mask creation and propagation rules), and cross-modality alignment under 'modality_alignment' (temporal syncing and feature harmonization). We will include verbatim excerpts from the default config files for MIMIC-IV and eICU, show how every parameter is exposed without hard-coded fallbacks, and reference the exact Python functions that read these keys. The open-source repository already implements this design; the revision will make the mapping transparent in the paper itself.

    Revision: yes
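
One way to picture the mapping promised in this response is a configuration object with one section per decision, plus a check that refuses to run when a section is absent. The sketch below is hypothetical: the five section names and the `bin_size`, `alignment`, and `aggregation` keys are quoted from the simulated rebuttal, while every value, every other sub-key, and the helper `missing_sections` are invented for illustration and are not taken from the SurvBench repository. A Python dict stands in for the parsed YAML file.

```python
REQUIRED_SECTIONS = (
    "cohort_selection",
    "time_discretization",
    "endpoint_definition",
    "missingness",
    "modality_alignment",
)

# Section names and the three time_discretization keys come from the
# rebuttal; every value and every other sub-key is a hypothetical placeholder.
config = {
    "cohort_selection": {"min_age": 18},
    "time_discretization": {
        "bin_size": 1.0,
        "alignment": "admission",
        "aggregation": "mean",
    },
    "endpoint_definition": {
        "type": "competing_risks",
        "censor_on": ["home_discharge"],
    },
    "missingness": {"create_mask": True},
    "modality_alignment": {"temporal_sync": True},
}


def missing_sections(cfg):
    """Return required sections absent from cfg, so no step runs on a silent default."""
    return [name for name in REQUIRED_SECTIONS if name not in cfg]


print(missing_sections(config))                    # []
print(missing_sections({"cohort_selection": {}}))  # the four remaining sections
```

Failing loudly on an absent section is one possible way to back the "no hard-coded fallbacks" claim, since a default that lives only in code is precisely what the referee is asking the authors to rule out.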

Circularity Check

0 steps flagged

No circularity: software pipeline defined by external configuration, no derivation chain

Full rationale

The manuscript describes an open-source preprocessing pipeline whose behavior is explicitly governed by user-supplied YAML files rather than any internal equations or fitted quantities. No mathematical derivations, predictions, or self-referential definitions appear in the abstract or described contribution. Imputation, scaling, and feature filtering are stated to be fit on the training fold only, but this is a standard data-split practice and does not create a closed loop within the paper's own claims. The central assertion—that every preprocessing decision is controllable via configuration—is an engineering interface claim, not a logical reduction that collapses back onto its own inputs by construction. No self-citations or uniqueness theorems are invoked as load-bearing premises. The work is therefore self-contained as a software artifact.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The pipeline rests on standard domain assumptions about the structure of PhysioNet exports and on user-chosen configuration values rather than on any new scientific axioms or fitted parameters.

axioms (1)
  • domain assumption: Raw exports from MIMIC-IV, eICU, MC-MED, and HiRID follow documented PhysioNet schemas for time-series, static demographics, ICD codes, and radiology reports.
    The pipeline description assumes these schemas exist and are stable; no new data model is invented.

pith-pipeline@v0.9.0 · 5569 in / 1439 out tokens · 66719 ms · 2026-05-17T21:35:31.668609+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ECG-biometrics-bench: A Unified Framework for Reproducible Benchmarking of ECG Biometrics

    cs.LG · 2026-05 · unverdicted · novelty 7.0

    ECG-biometrics-bench standardizes evaluation to expose the Random Split Fallacy, where intra-session splits inflate ECG biometric performance, revealing temporal drift degradation that is not model-specific and can be...

Reference graph

Works this paper leans on

112 extracted references · 112 canonical work pages · cited by 1 Pith paper

  1. [1]

    Journal of the Royal Statistical Society: Series B (Methodological)34(2), 187–220 (1972)

    Cox, D.R.: Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological)34(2), 187–220 (1972)

  2. [2]

    Journal of the American Statistical Association53(282), 457–481 (1958)

    Kaplan, E.L., Meier, P.: Nonparametric estimation from incomplete observations. Journal of the American Statistical Association53(282), 457–481 (1958)

  3. [3]

    Journal of Machine Learning Research20(129), 1–30 (2019)

    Kvamme, H., Borgan, Ø., Scheel, I.: Time-to-event prediction with neural networks and Cox regression. Journal of Machine Learning Research20(129), 1–30 (2019)

  4. [4]

    In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)

    Lee, C., Zame, W.R., Yoon, J., Schaar, M.: DeepHit: A deep learning approach to survival analysis with competing risks. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)

  5. [5]

    BMC Medical Research Methodology18(1), 24 (2018) https://doi.org/10.1186/ s12874-018-0482-1

    Katzman, J.L., Shaham, U., Cloninger, A., Bates, J., Jiang, T., Kluger, Y.: DeepSurv: Personalised treatment recommender system using a Cox proportional hazards deep neural 19 network. BMC Medical Research Methodology18(1), 24 (2018) https://doi.org/10.1186/ s12874-018-0482-1

  6. [6]

    DySurv: dynamic deep learning model for survival analysis with conditional variational inference,

    Mesinovic, M., Watkinson, P., Zhu, T.: Dysurv: dynamic deep learning model for survival analysis with conditional variational inference. Journal of the American Medical Informatics Association, 271 (2024) https://doi.org/10.1093/jamia/ocae271

  7. [7]

    In: Temporal Graph Learning Workshop@ KDD 2025 (2025)

    Mesinovic, M., Watkinson, P., Zhu, T.: Multi-modal interpretable graph for competing risk prediction with electronic health records. In: Temporal Graph Learning Workshop@ KDD 2025 (2025)

  8. [8]

    Scientific Data , volume =

    Harutyunyan, H., Khachatrian, H., Kale, D.C., Ver Steeg, G., Galstyan, A.: Multitask learning and benchmarking with clinical time series data. Scientific Data6(1), 96 (2019) https://doi. org/10.1038/s41597-019-0103-9

  9. [9]

    Scalable and accurate deep learning with electronic health records,

    Rajkomar, A., Oren, E., Chen, K., Dai, A.M., Hajaj, N., Hardt, M., Liu, P.J., Liu, X., Marcus, J., Sun, M., Sundberg, P., Yee, H., Zhang, K., Zhang, Y., Flores, G., Duggan, G.E., Irvine, J., Le, Q., Litsch, K., Mossin, A., Tansuwan, J., Wang, D., Wexler, J., Wilson, J., Ludwig, D., Vol- chenboum, S.L., Chou, K., Pearson, M., Madabushi, S., Shah, N.H., But...

  10. [10]

    Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis

    Shickel, B., Tighe, P.J., Bihorac, A., Rashidi, P.: Deep EHR: A survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE Journal of Biomedical and Health Informatics22(5), 1589–1604 (2018) https://doi.org/10.1109/JBHI.2017.2767063

  11. [11]

    Recurrent

    Che, Z., Purushotham, S., Cho, K., Sontag, D., Liu, Y.: Recurrent neural networks for mul- tivariate time series with missing values. Scientific Reports8(1), 6085 (2018) https://doi.org/ 10.1038/s41598-018-24271-9

  12. [12]

    In: International Conference on Learning Representations (2016)

    Lipton, Z.C., Kale, D.C., Elkan, C., Wetzel, R.: Learning to diagnose with LSTM recurrent neural networks. In: International Conference on Learning Representations (2016)

  13. [13]

    eGEMs1(3), 7 (2013) https://doi.org/10.13063/ 2327-9214.1035

    Wells, B.J., Chagin, K.M., Nowacki, A.S., Kattan, M.W.: Strategies for handling missing data in electronic health record derived data. eGEMs1(3), 7 (2013) https://doi.org/10.13063/ 2327-9214.1035

  14. [14]

    In: Proceedings of the AAAI Conference on Artificial Intelligence, pp

    Luo, Y., Xin, Y., Joshi, R., Celi, L., Szolovits, P.: Predicting ICU mortality risk by grouping temporal trends from a multivariate panel of physiologic measurements. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 42–50 (2016)

  15. [15]

    Big Data Analytics1(1), 9 (2016) https://doi.org/10.1186/ s41044-016-0014-0

    Garc´ ıa, S., Ram´ ırez-Gallego, S., Luengo, J., Ben´ ıtez, J.M., Herrera, F.: Big data prepro- cessing: Methods and prospects. Big Data Analytics1(1), 9 (2016) https://doi.org/10.1186/ s41044-016-0014-0

  16. [16]

    Proceedings of the IEEE104(2), 444–466 (2016) https://doi.org/10.1109/JPROC.2015.2501978

    Johnson, A.E.W., Ghassemi, M.M., Nemati, S., Niehaus, K.E., Clifton, D.A., Clifford, G.D.: Machine learning and decision support in critical care. Proceedings of the IEEE104(2), 444–466 (2016) https://doi.org/10.1109/JPROC.2015.2501978

  17. [17]

    MIMIC-III, a freely accessible critical care database

    Johnson, A.E.W., Pollard, T.J., Shen, L., Lehman, L.-w.H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi, L.A., Mark, R.G.: MIMIC-III, a freely accessible critical care database. Scientific Data3, 160035 (2016) https://doi.org/10.1038/sdata.2016.35

  18. [18]

    Statistics in Medicine26(11), 2389–2430 (2007) https://doi.org/10.1002/sim.2712

    Putter, H., Fiocco, M., Geskus, R.B.: Tutorial in biostatistics: Competing risks and multi-state models. Statistics in Medicine26(11), 2389–2430 (2007) https://doi.org/10.1002/sim.2712

  19. [19]

    Statistics in Medicine31(11-12), 1074–1088 (2012) https://doi.org/10

    Andersen, P.K., Keiding, N.: Interpretability and importance of functionals in competing risks and multistate models. Statistics in Medicine31(11-12), 1074–1088 (2012) https://doi.org/10. 1002/sim.4385 20

  20. [20]

    Journal of Machine Learning Research17(1), 2797–2819 (2016)

    Wiens, J., Guttag, J., Horvitz, E.: Patient risk stratification with time-varying parameters: A multitask learning approach. Journal of Machine Learning Research17(1), 2797–2819 (2016)

  21. [21]

    In: Machine Learning for Healthcare Conference, pp

    Nestor, B., McDermott, M.B.A., Boag, W., Berner, G., Naumann, T., Hughes, M.C., Ghassemi, M., Szolovits, P.: Feature robustness in non-stationary health records: Caveats to deploy- able model performance in common clinical machine learning tasks. In: Machine Learning for Healthcare Conference, pp. 381–405 (2019)

  22. [22]

    In: NeurIPS 2019 Workshop on Machine Learning for Health (ML4H) (2019).https://arxiv.org/abs/1909.02832

    Ren, Y., Yang, M., Li, Y., He, L., Liu, W.: DeepWeiSurv: A Weibull-based deep learning model for survival analysis. In: NeurIPS 2019 Workshop on Machine Learning for Health (ML4H) (2019).https://arxiv.org/abs/1909.02832

  23. [23]

    Yhdego, J

    Aastha, Zare, A., He, Z.-J., L. P., L.P., Gotze, T., Li, H., V. S., V.S., Li, X., Sun, J.: Deep- Compete: A deep learning model for competing risks. In: 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 933–938 (2021). https://doi.org/10.1109/ BIBM52615.2021.9669528 . IEEE

  24. [24]

    In: Pro- ceedings of the 31st ACM International Conference on Information & Knowledge Management (CIKM), pp

    Huang, Z., Zhang, A., Hu, Y., Wang, L., Sun, J., Chen, Y.: TransformerJM: A Transformer- based joint model for multivariate longitudinal data and competing risks survival. In: Pro- ceedings of the 31st ACM International Conference on Information & Knowledge Management (CIKM), pp. 799–808 (2022). https://doi.org/10.1145/3511808.3557345

  25. [25]

    In: Proceedings of the 2023 SIAM International Conference on Data Mining (SDM), pp

    Zhang, H., Wu, Z., Zhao, J.: CAT-Surv: A categorical time-discretization approach for survival analysis. In: Proceedings of the 2023 SIAM International Conference on Data Mining (SDM), pp. 406–414 (2023). https://doi.org/10.1137/1.9781611977653.ch46 . SIAM

  26. [26]

    Artificial Intelligence Review57(3), 65 (2024) https://doi.org/10.1007/ s10462-023-10681-3

    Wiegrebe, S., Kopper, P., Sonabend, R., Bender, A., R¨ ugamer, D.: Deep learning for sur- vival analysis: a review. Artificial Intelligence Review57(3), 65 (2024) https://doi.org/10.1007/ s10462-023-10681-3

  27. [27]

    JAMA323(4), 305–306 (2020) https://doi.org/10.1001/jama.2019.20866

    Beam, A.L., Manrai, A.K., Ghassemi, M.: Challenges to the reproducibility of machine learning models in health care. JAMA323(4), 305–306 (2020) https://doi.org/10.1001/jama.2019.20866

  28. [28]

    Science 334(6060), 1226–1227 (Dec 2011)

    Peng, R.D.: Reproducible research in computational science. Science334(6060), 1226–1227 (2011) https://doi.org/10.1126/science.1213847

  29. [29]

    NPJ Digital Medicine3(1), 41 (2020) https: //doi.org/10.1038/s41746-020-0253-3

    Sendak, M.P., Gao, M., Brajer, N., Balu, S.: Presenting machine learning model information to clinical end users with model facts labels. NPJ Digital Medicine3(1), 41 (2020) https: //doi.org/10.1038/s41746-020-0253-3

  30. [30]

    Scientific Data10(1), 1 (2023) https://doi.org/10

    Johnson, A.E.W., Bulgarelli, L., Shen, L., Gayles, A., Shammout, A., Horng, S., Pollard, T.J., Hao, S., Moody, B., Gow, B., Lehman, L.-w.H., Celi, L.A., Mark, R.G.: MIMIC-IV, a freely accessible electronic health record dataset. Scientific Data10(1), 1 (2023) https://doi.org/10. 1038/s41597-022-01899-x

  31. [31]

    In: Machine Learning for Health, pp

    Gupta, M., Gallamoza, B., Cutrona, N., Dhakal, P., Poulain, R., Beheshti, R.: An extensive data processing pipeline for MIMIC-IV. In: Machine Learning for Health, pp. 311–325. PMLR, ??? (2022)

  32. [32]

    McDermott, M.B.A., Nestor, B., Kim, E., Zhang, W., Goldenberg, A., Szolovits, P., Ghassemi, M.: Comprehensive comparative study of multi-label classification methods (2021)

  33. [33]

    and Johnson, Alistair E

    Pollard, T.J., Johnson, A.E.W., Raffa, J.D., Celi, L.A., Mark, R.G., Badawi, O.: The eICU Col- laborative Research Database, a freely available multi-centre database for critical care research. Scientific Data5, 180178 (2018) https://doi.org/10.1038/sdata.2018.178

  34. [34]

    Scientific 21 Reports9(1), 15665 (2019) https://doi.org/10.1038/s41598-019-51810-8

    Kim, Y., Shachar, S.S., Gayvert, K., Li, P.P., Gannavarapu, A., Lempicki, M., Claassen, J., Castro, M., Steinherz, P., Lis, R., Levy-Lahad, E., Meiner, V., Daly, M.B., Tischkowitz, M., Offit, K., Robson, M., Domchek, S.M., Walsh, M.F., Levine, D.A.: Development and validation of a scoring system for the differential diagnosis of diabetes type in pediatric...

  35. [35]

    The Lancet Respiratory Medicine3(1), 42–52 (2015) https://doi.org/ 10.1016/S2213-2600(14)70239-5

    Pirracchio, R., Petersen, M.L., Carone, M., Rigon, M.R., Chevret, S., Laan, M.J.: Mortal- ity prediction in intensive care units with the Super ICU Learner Algorithm (SICULA): A population-based study. The Lancet Respiratory Medicine3(1), 42–52 (2015) https://doi.org/ 10.1016/S2213-2600(14)70239-5

  36. [36]

    BMC Medical Research Methodology 14, 137 (2014) https://doi.org/10.1186/1471-2288-14-137

    Ploeg, T., Austin, P.C., Steyerberg, E.W.: Modern modelling techniques are data hungry: A simulation study for predicting dichotomous endpoints. BMC Medical Research Methodology 14, 137 (2014) https://doi.org/10.1186/1471-2288-14-137

  37. [37]

    IEEE Journal of Biomedical and Health Informatics24(11), 3268–3275 (2020) https://doi.org/10.1109/JBHI.2020.2984931

    Darabi, S., Kachuee, M., Fazeli, S., Sartipi, M.: TAPER: Time-aware patient EHR rep- resentation. IEEE Journal of Biomedical and Health Informatics24(11), 3268–3275 (2020) https://doi.org/10.1109/JBHI.2020.2984931

  38. [38]

    BMJ361, 1479 (2018) https: //doi.org/10.1136/bmj.k1479

    Agniel, D., Kohane, I.S., Weber, G.M.: Biases in electronic health record data due to processes within the healthcare system: Retrospective observational study. BMJ361, 1479 (2018) https: //doi.org/10.1136/bmj.k1479

  39. [39]

    Nature Medicine26(3), 364–373 (2020) https://doi.org/10.1038/s41591-020-0789-4

    Hyland, S.L., Faltys, M., H¨ user, M., Lyu, X., Gumbsch, T., Esteban, C., Bock, C., Horn, M., Moor, M., Rieck, B., Zimmermann, M., Bodenham, D., Borgwardt, K., R¨ atsch, G., Merz, T.M.: Early prediction of circulatory failure in the intensive care unit using machine learning. Nature Medicine26(3), 364–373 (2020) https://doi.org/10.1038/s41591-020-0789-4

  40. [40]

    Journal of Biomedical Informatics83, 112–134 (2018) https://doi.org/10

    Purushotham, S., Meng, C., Che, Z., Liu, Y.: Benchmarking deep learning models on large healthcare datasets. Journal of Biomedical Informatics83, 112–134 (2018) https://doi.org/10. 1016/j.jbi.2018.04.007

  41. [41]

    NPJ Digital Medicine4(1), 86 (2021) https://doi.org/10.1038/s41746-021-00455-y

    Rasmy, L., Xiang, Y., Xie, Z., Tao, C., Zhi, D.: Med-BERT: Pretrained contextualized em- beddings on large-scale structured electronic health records for disease prediction. NPJ Digital Medicine4(1), 86 (2021) https://doi.org/10.1038/s41746-021-00455-y

  42. [42]

    In: Advances in Neural Information Processing Systems, pp

    Choi, E., Bahadori, M.T., Sun, J., Kulas, J., Schuetz, A., Stewart, W.: RETAIN: An inter- pretable predictive model for healthcare using reverse time attention mechanism. In: Advances in Neural Information Processing Systems, pp. 3504–3512 (2016)

  43. [43]

    Huang, K., Altosaar, J., Ranganath, R.: ClinicalBERT: Modeling clinical notes and predicting hospital readmission (2019)

  44. [44]

    In: Proceedings of the 2nd Clinical Natural Language Processing Workshop, pp

    Alsentzer, E., Murphy, J., Boag, W., Weng, W.-H., Jindi, D., Naumann, T., McDermott, M.: Publicly available clinical BERT embeddings. In: Proceedings of the 2nd Clinical Natural Language Processing Workshop, pp. 72–78 (2019)

  45. [45]

    Circulation101(23), 215–220 (2000)

    Goldberger, A.L., Amaral, L.A.N., Glass, L., Hausdorff, J.M., Ivanov, P.C., Mark, R.G., Mietus, J.E., Moody, G.B., Peng, C.-K., Stanley, H.E.: PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation101(23), 215–220 (2000)

  46. [46]

    In: 2020 IEEE-EMBS International Con- ference on Biomedical and Health Informatics (BHI), pp

    Sheikhalishahi, S., K¨ ok, I., Luo, Y., Peelen, L., Slooter, A.J.C., G, K.C.: Benchmarking critical care datasets: A comparison of eICU and MIMIC-III. In: 2020 IEEE-EMBS International Con- ference on Biomedical and Health Informatics (BHI), pp. 1–4 (2020). https://doi.org/10.1109/ BHI48114.2020.9189679 . IEEE

  47. [47]

    In: International Conference on Machine Learning, pp

    Futoma, J., Hariharan, S., Heller, K.: Learning to detect sepsis with a multitask Gaussian process RNN classifier. In: International Conference on Machine Learning, pp. 1174–1182 (2017)

  48. [48]

    Scientific Data12(1), 1094 (2025) https://doi.org/ 10.1038/s41597-025-05419-5 22

    Kansal, A., Chen, E., Jin, B.T., Rajpurkar, P., Kim, D.A.: MC-MED, multimodal clinical monitoring in the emergency department. Scientific Data12(1), 1094 (2025) https://doi.org/ 10.1038/s41597-025-05419-5 22

  49. [49]

    PhysioNet (2025)

    Kansal, A., Chen, E., Jin, T., Rajpurkar, P., Kim, D.: MC-MED, multimodal clinical monitor- ing in the emergency department (version 1.0.1). PhysioNet (2025). https://doi.org/10.13026/ wvyw-g663 . https://physionet.org/content/mc-med/1.0.1/

  50. [50]

    Journal of the American Medical Informatics Association20(3), 494–502 (2013) https://doi

    Malin, B., El Emam, K.: Detecting identity disclosure in high-dimensional (biomedical) data. Journal of the American Medical Informatics Association20(3), 494–502 (2013) https://doi. org/10.1136/amiajnl-2012-001031

  51. [51]

    Critical Care18(4), 421 (2014) https://doi.org/10.1186/cc13993

    Verburg, I.W.M., Atlen, J., Keizer, N.F., Escoredo-Luks, N.L., H¨ okby, N., Jonge, E., Meurs, M.M.: Impact of arbitrary ICU admission and discharge criteria on outcome analysis. Critical Care18(4), 421 (2014) https://doi.org/10.1186/cc13993

  52. [52]

    Journal of Clinical Anesthesia 13(8), 563–568 (2001) https://doi.org/10.1016/s0952-8180(01)00344-9

    Rosenberg, A.L., Ho, V., Lema, J.V., O’Brien, J.F.: Severity of illness and the timing of admis- sion and discharge for patients who die in the intensive care unit. Journal of Clinical Anesthesia 13(8), 563–568 (2001) https://doi.org/10.1016/s0952-8180(01)00344-9

  53. [53]

    Anaesthesia54(6), 558–564 (1999) https://doi.org/10.1046/j.1365-2044.1999.00843.x

    Goldhill, D.R., Sumner, A.: Mortality and length of stay in intensive care: a comparison of three units. Anaesthesia54(6), 558–564 (1999) https://doi.org/10.1046/j.1365-2044.1999.00843.x

  54. [54]

    JMIR Medical Informatics4(3), 28 (2016) https://doi.org/10.2196/medinform.5909

    Desautels, T., Calvert, J., Hoffman, J., Jay, M., Kerem, Y., Shieh, L., Shimabukuro, D., Chet- tipally, U., Feldman, M.D., Barton, C., Wales, D.J., Das, R.: Prediction of sepsis in the intensive care unit with minimal electronic health record data: A machine learning approach. JMIR Medical Informatics4(3), 28 (2016) https://doi.org/10.2196/medinform.5909

  55. [55]

    Critical Care Medicine 34(11), 2735–2741 (2006) https://doi.org/10.1097/01.CCM.0000240974.74548.D8

    Kramer, A.A., Zimmerman, J.E.: A comparison of three methods for predicting discharge status: The importance of both patient case mix and provider-level effects. Critical Care Medicine 34(11), 2735–2741 (2006) https://doi.org/10.1097/01.CCM.0000240974.74548.D8

  56. [56]

    Intensive Care Medicine38(10), 1654–1661 (2012) https://doi.org/10.1007/ s00134-012-2629-6

    Fuchs, L., Chronaki, C.E., Park, S., Novack, V., Baumfeld, Y., Scott, D., McLennan, S., Tal- mor, D., Celi, L.: ICU admission characteristics and mortality rates among elderly and very elderly patients. Intensive Care Medicine38(10), 1654–1661 (2012) https://doi.org/10.1007/ s00134-012-2629-6

  57. [57]

    Epidemiology20(4), 555–561 (2009) https://doi.org/10.1097/EDE.0b013e3181a39056

    Wolbers, M., Koller, M.T., Witteman, J.C.M., Steyerberg, E.W.: Prognostic models with com- peting risks: Methods and application to coronary risk prediction. Epidemiology20(4), 555–561 (2009) https://doi.org/10.1097/EDE.0b013e3181a39056

  58. [58]

    Medical Care48(6 Suppl), 96–105 (2010) https://doi.org/10.1097/MLR

    Varadhan, R., Weiss, C.O., Segal, J.B., Wu, A.W., Scharfstein, D., Boyd, C.: Evaluating health outcomes in the presence of competing risks: A review of statistical methods and clinical applications. Medical Care48(6 Suppl), 96–105 (2010) https://doi.org/10.1097/MLR. 0b013e3181d99107

  59. [59]

    Orthopedics36(1), 15–21 (2013) https://doi.org/10.3928/01477447-20121217-08

    Cushing, T.A., Bruce, S.E., Kaimraj, M., Gaskill, T.R., Cripe, M.J., Gaski, G.E., Johnson, W.J., Salyers, E., Zody, R.D., Siders, C.A.,et al.: Comparison of pediatric and adult open-fracture management. Orthopedics36(1), 15–21 (2013) https://doi.org/10.3928/01477447-20121217-08

  60. [60]

    Current Pediatric Reviews11(3), 195–200 (2015) https: //doi.org/10.2174/1573396311666150722104113

    Martin, B., Kling, P., Gonzalez, R., Sgromolo, T., Pil-Kim, C.: Pediatric-specific metabolic and hematologic responses to trauma. Current Pediatric Reviews11(3), 195–200 (2015) https: //doi.org/10.2174/1573396311666150722104113

  61. [61]

    Annals of Emergency Medicine58(1), 33–40 (2011) https://doi.org/10.1016/j.annemergmed.2010.08.040

    Welch, S.J., Asplin, B.R., Stone-Griffith, S., Davidson, S.J., Augustine, J., Schuur, J.: Emer- gency department operational metrics, measures and definitions: Results of the Second Performance Measures and Benchmarking Summit. Annals of Emergency Medicine58(1), 33–40 (2011) https://doi.org/10.1016/j.annemergmed.2010.08.040

  62. [62]

    Rowe, B.H., McRae, A.D., Yaghoubi, M., Forgie, P.J., Shao, J., Johnson, C., Holroyd, B.R., Yoon, P., O’Brien, D.M., Oh, P.: Characteristics of patients who leave emergency departments without being seen. Academic Emergency Medicine 18(8), 848–855 (2011) https://doi.org/10.1111/j.1553-2712.2011.01131.x

  63. [63]

    Pintilie, M.: Analysing Competing Risks Data. John Wiley & Sons (2006). https://doi.org/10.1002/0470870716

  64. [64]

    Fine, J.P., Gray, R.J.: A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association 94(446), 496–509 (1999) https://doi.org/10.1080/01621459.1999.10474144

  65. [65]

    Mesinovic, M., Cui, C., Zenati, M.A., Cannesson, M., Burdjalov, V.K.: MM-GraphSurv: Multi-modal graph-based survival analysis for critical care. In: International Conference on Artificial Intelligence and Statistics (2024)

  66. [66]

    Vincent, J.-L., De Mendonça, A., Cantraine, F., Moreno, R., Takala, J., Suter, P.M., V-A, D.B., Thijs, L.G., Sprung, C.L.: The SOFA (sepsis-related organ failure assessment) score to describe organ dysfunction/failure. Critical Care Medicine 24(4), 650–654 (1996) https://doi.org/10.1097/00003246-199604000-00016

  67. [67]

    Zimmerman, J.E., Kramer, A.A., McNair, D.S., Malila, F.M.: Acute Physiology and Chronic Health Evaluation (APACHE) IV: Hospital mortality assessment for today’s critically ill patients. Critical Care Medicine 34(5), 1297–1310 (2006) https://doi.org/10.1097/01.CCM.0000215112.84523.F0

  68. [68]

    Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8), 1735–1780 (1997) https://doi.org/10.1162/neco.1997.9.8.1735

  69. [69]

    Bai, S., Kolter, J.Z., Koltun, V.: An empirical evaluation of generic convolutional and recurrent networks for sequence modeling (2018)

  70. [70]

    Potdar, K., Pardawala, T.S., Pai, C.D.: A comparative study of categorical data encoding techniques for supervised machine learning. In: 2017 International Conference on Pervasive Computing (ICPC), pp. 1–7 (2017). https://doi.org/10.1109/PERVASIVE.2017.83. IEEE

  71. [71]

    Obermeyer, Z., Powers, B., Vogeli, C., Mullainathan, S.: Dissecting racial bias in an algorithm used to manage the health of populations. Science 366(6464), 447–453 (2019) https://doi.org/10.1126/science.aax2342

  72. [72]

    Rajkomar, A., Hardt, M., Howell, M.D., Corrado, G., Chin, M.H.: Ensuring fairness in machine learning to advance health equity. Annals of Internal Medicine 169(12), 866–872 (2018) https://doi.org/10.7326/M18-1990

  73. [73]

    Batista, G.E.A.P.A., Prati, R.C., Monard, M.C.: A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter 6(1), 20–29 (2004) https://doi.org/10.1145/1007730.1007735

  74. [74]

    Japkowicz, N., Stephen, S.: The class imbalance problem: A systematic study. Intelligent Data Analysis 6(5), 429–449 (2002) https://doi.org/10.3233/IDA-2002-6504

  75. [75]

    Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. Journal of Machine Learning Research 3, 1157–1182 (2003)

  76. [76]

    Saeys, Y., Inza, I., Larrañaga, P.: A review of feature selection techniques in bioinformatics. Bioinformatics 23(19), 2507–2517 (2007) https://doi.org/10.1093/bioinformatics/btm344

  77. [77]

    Ambroise, C., McLachlan, G.J.: Selection bias in gene extraction on the basis of microarray gene-expression data. Proceedings of the National Academy of Sciences 99(10), 6562–6566 (2002) https://doi.org/10.1073/pnas.102102699

  78. [78]

    Reunanen, J.: Overfitting in making comparisons between variable selection methods. Journal of Machine Learning Research 3, 1371–1382 (2003)

  79. [79]

    Stodden, V., McNutt, M., Bailey, D.H., Deelman, E., Gil, Y., Hanson, B., Heroux, M.A., Ioannidis, J.P.A., Taufer, M.: Enhancing reproducibility for computational methods. Science 354(6317), 1240–1241 (2016) https://doi.org/10.1126/science.aah6168

  80. [80]

    Jensen, P.B., Jensen, L.J., Brunak, S.: Mining electronic health records: Towards better research applications and clinical care. Nature Reviews Genetics 13(6), 395–405 (2012) https://doi.org/10.1038/nrg3208

Showing first 80 references.