Recognition: 2 theorem links
Explainable Machine Learning for Sepsis Outcome Prediction Using a Novel Romanian Electronic Health Record Dataset
Pith reviewed 2026-05-10 19:07 UTC · model grok-4.3
The pith
Machine learning models on a new Romanian EHR dataset predict sepsis outcomes at up to 0.983 AUC and flag eosinopenia as a key signal.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Models trained on the Romanian sepsis EHR dataset reach AUC 0.983 and accuracy 0.93 for the deceased-versus-recovered task; SHAP explanations consistently rank eosinophil percentage among the top predictors alongside cardiovascular comorbidities, urea, aspartate aminotransferase, and platelet count.
What carries the argument
SHAP explanations applied to the trained models to rank the contribution of laboratory values and comorbidities to outcome predictions.
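The sketch below makes the SHAP mechanism concrete. It computes exact Shapley values for a toy logistic risk model by enumerating every coalition of features. "Absent" features are filled in from a single reference patient. The features, weights, and values are hypothetical and do not come from the paper. SHAP libraries estimate the same quantity with faster model-specific or sampling-based algorithms.

```python
import math
from itertools import combinations

# Hypothetical features and toy logistic model (not the paper's trained model).
FEATURES = ["eosinophil_pct", "urea", "ast", "platelets"]
WEIGHTS = {"eosinophil_pct": -0.8, "urea": 0.05, "ast": 0.01, "platelets": -0.004}
BIAS = -1.0


def risk(x):
    """Toy predicted probability of death for a dict of feature values."""
    z = BIAS + sum(WEIGHTS[f] * x[f] for f in FEATURES)
    return 1.0 / (1.0 + math.exp(-z))


def shapley_values(model, x, baseline):
    """Exact Shapley values: each feature's marginal contribution averaged over
    all coalitions of the other features; absent features take baseline values."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for coalition in combinations(others, k):
                present = set(coalition)
                without_f = {g: x[g] if g in present else baseline[g] for g in FEATURES}
                with_f = {**without_f, f: x[f]}  # same coalition plus feature f
                total += weight * (model(with_f) - model(without_f))
        phi[f] = total
    return phi


# Hypothetical patient and reference (baseline) values.
patient = {"eosinophil_pct": 0.0, "urea": 120.0, "ast": 90.0, "platelets": 80.0}
baseline = {"eosinophil_pct": 2.0, "urea": 40.0, "ast": 30.0, "platelets": 250.0}

phi = shapley_values(risk, patient, baseline)
ranking = sorted(phi, key=lambda f: abs(phi[f]), reverse=True)
print(ranking)
```

Summing the four values recovers `risk(patient) - risk(baseline)`. This additivity (efficiency) property is what lets SHAP split a prediction into per-feature contributions. Exhaustive enumeration grows as 2^n in the number of features, which is impractical for panels of 10–50 lab tests.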
If this is right
- Eosinophil percentage could be added to existing sepsis risk scores because it ranks as a strong predictor here.
- High internal AUC on the deceased-versus-recovered task indicates the models may be ready for prospective clinical testing.
- Limiting input features to the 10–50 most frequent lab tests still preserves useful performance while increasing the number of usable patient records.
- Explainable outputs help clinicians see why a prediction is made rather than treating the model as a black box.
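A minimal sketch can illustrate the trade-off between feature richness and patient coverage. It assumes, hypothetically, that a patient is usable only if every selected lab was measured (a complete-case rule). The rebuttal mentions median imputation, so the paper's actual notion of coverage may differ. Lab names and records are invented.

```python
from collections import Counter


def top_k_labs(records, k):
    """The k most frequently measured lab tests across all patient records."""
    counts = Counter(lab for labs in records for lab in set(labs))
    # Highest frequency first; ties broken alphabetically for determinism.
    ranked = sorted(counts, key=lambda lab: (-counts[lab], lab))
    return ranked[:k]


def coverage(records, labs):
    """Fraction of records in which every lab in `labs` was measured
    (hypothetical complete-case rule)."""
    needed = set(labs)
    return sum(needed <= set(r) for r in records) / len(records)


# Hypothetical per-patient sets of measured lab tests.
records = [
    {"urea", "platelets", "ast", "eosinophil_pct"},
    {"urea", "platelets", "ast"},
    {"urea", "platelets"},
    {"urea", "platelets", "eosinophil_pct", "crp"},
]

tradeoff = []
for k in range(1, 5):
    labs = top_k_labs(records, k)
    tradeoff.append((k, labs, coverage(records, labs)))
    print(k, labs, coverage(records, labs))
```

Each additional lab can only shrink the set of complete records. Under this rule, coverage is therefore non-increasing in k.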
Where Pith is reading between the lines
- Repeating the same SHAP analysis on datasets from other countries would show whether eosinopenia remains important outside the Romanian population.
- Combining the top SHAP features with established scores such as SOFA might yield a hybrid rule that improves calibration without losing interpretability.
- Deploying the models in real-time EHR dashboards would allow measurement of whether clinicians actually change decisions when shown the explanations.
Load-bearing premise
That models trained and tested on internal splits of data from one Romanian hospital will perform similarly on patients from other hospitals or regions.
What would settle it
The claim would be undermined if retraining or testing the reported models on an independent sepsis EHR collection from a different hospital system yielded AUC well below 0.9 for the same classification tasks.
Original abstract
We develop and analyze explainable machine learning (ML) models for sepsis outcome prediction using a novel Electronic Health Record (EHR) dataset from 12,286 hospitalizations at a large emergency hospital in Romania. The dataset includes demographics, International Classification of Diseases (ICD-10) diagnostics, and 600 types of laboratory tests. This study aims to identify clinically strong predictors while achieving state-of-the-art results across three classification tasks: (1) deceased vs. discharged, (2) deceased vs. recovered, and (3) recovered vs. ameliorated. We trained five ML models to capture complex distributions while preserving clinical interpretability. Experiments explored the trade-off between feature richness and patient coverage, using subsets of the 10–50 most frequent laboratory tests. Model performance was evaluated using accuracy and area under the curve (AUC), and explainability was assessed using SHapley Additive exPlanations (SHAP). The highest performance was obtained for the deceased vs. recovered case study (AUC=0.983, accuracy=0.93). SHAP analysis identified several strong predictors such as cardiovascular comorbidities, urea levels, aspartate aminotransferase, platelet count, and eosinophil percentage. Eosinopenia emerged as a top predictor, highlighting its value as an underutilized marker that is not included in current assessment standards, while the high performance suggests the applicability of these models in clinical settings.
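The abstract reports both accuracy and AUC, and the sketch below shows how the two differ. It computes AUC as the probability that a randomly chosen positive case (here, hypothetically, deceased = 1) receives a higher score than a randomly chosen negative case. It computes accuracy at an assumed 0.5 threshold. The labels and scores are invented, and the paper does not state which threshold it used.

```python
def auc(labels, scores):
    """ROC AUC as the probability that a random positive case scores higher
    than a random negative case (Mann-Whitney statistic); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of cases whose thresholded prediction matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(pred == y for pred, y in zip(preds, labels)) / len(labels)


# Hypothetical outcomes (1 = deceased, 0 = recovered) and predicted risks.
outcomes = [1, 1, 0, 0, 0]
risks = [0.9, 0.4, 0.35, 0.2, 0.6]
print(auc(outcomes, risks), accuracy(outcomes, risks))
```

The double loop is kept for readability; a sort-based rank formula gives the same value in O(n log n). AUC is threshold-free, while accuracy depends on the cut-off and on class balance. The two headline numbers (0.983 and 0.93) therefore summarize different aspects of performance.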
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a novel EHR dataset from 12,286 sepsis hospitalizations at a single Romanian hospital, including demographics, ICD-10 codes, and 600 lab tests. It trains five ML models on three binary tasks (deceased vs. discharged, deceased vs. recovered, recovered vs. ameliorated) using subsets of the 10-50 most frequent labs, evaluates with accuracy and AUC, and applies SHAP to identify predictors such as cardiovascular comorbidities, urea, AST, platelets, and eosinophil percentage. Peak results are AUC 0.983 and accuracy 0.93 on deceased vs. recovered, with the conclusion that eosinopenia is a valuable underutilized marker and that the models suggest clinical applicability.
Significance. A new public or shareable sepsis EHR dataset from an under-represented region combined with SHAP-based interpretability is potentially valuable for the community. If the reported internal performance proves robust under proper validation, the identification of eosinophil percentage as a top predictor could prompt re-examination of current sepsis scoring systems. However, the single-center design and missing methodological safeguards limit the strength of any claim to immediate clinical utility.
major comments (3)
- [Abstract] Abstract: The headline AUC of 0.983 and accuracy of 0.93 for the deceased-vs-recovered task are presented without any description of the cross-validation procedure, missing-data strategy, class-imbalance correction, or whether the selection of the 10–50 most frequent laboratory tests occurred inside or outside the CV loop. This information is required to assess whether the metrics are optimistically biased.
- [Abstract] Abstract and Discussion: The statement that the high performance 'suggests the applicability of these models in clinical settings' rests entirely on internal performance within one hospital’s 12k-record cohort. No external validation cohort, temporal hold-out across years, or multi-center test is reported, leaving the transportability of the learned boundaries (and therefore the clinical-applicability claim) unsupported.
- [Methods] Methods (feature selection and SHAP): The trade-off experiments that retain only the most frequent labs and the subsequent SHAP ranking of eosinophil percentage are load-bearing for the paper’s interpretability contribution, yet no details are given on hyperparameter tuning, leakage prevention, or stability of the SHAP rankings across different feature-subset sizes.
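The referee's leakage question can be made concrete. Below is a minimal sketch of stratified k-fold splitting in which the "most frequent labs" are counted on each training split only, so the held-out fold cannot influence which features are kept. It assumes per-patient sets of measured labs and binary outcome labels. Because frequency-based selection never looks at outcome labels, selecting on the full cohort probably introduces only a small optimistic bias; selecting inside the loop removes the question. All function names and data are hypothetical, and the paper's actual pipeline is not described at this level of detail.

```python
import random
from collections import Counter


def stratified_folds(labels, n_folds, seed=0):
    """Split indices into n_folds folds with roughly equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % n_folds].append(i)  # deal class members round-robin
    return folds


def top_k_labs(records, k):
    """The k most frequently measured labs (ties broken alphabetically)."""
    counts = Counter(lab for labs in records for lab in set(labs))
    return sorted(counts, key=lambda lab: (-counts[lab], lab))[:k]


def cv_feature_sets(records, labels, n_folds, n_labs):
    """Choose the n_labs most frequent labs from each training split only."""
    folds = stratified_folds(labels, n_folds)
    selected = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_records = [r for i, r in enumerate(records) if i not in held_out]
        selected.append(top_k_labs(train_records, n_labs))
        # Imputation, model fitting, and SHAP would also be fit on
        # train_records here, then evaluated on the held-out fold.
    return folds, selected


# Hypothetical toy cohort: 10 patients, 4 deceased (label 1).
records = [{"urea", "platelets", "crp"} if i < 2 else {"urea", "platelets"} for i in range(10)]
labels = [1 if i < 4 else 0 for i in range(10)]
folds, selected = cv_feature_sets(records, labels, n_folds=2, n_labs=2)
print(selected)
```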
minor comments (1)
- [Abstract] Abstract: The three tasks are labeled (1) deceased vs. discharged, (2) deceased vs. recovered, and (3) recovered vs. ameliorated. Clarifying the clinical distinction between 'discharged' and 'recovered' (and whether these labels are mutually exclusive) would prevent reader confusion.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. The comments have helped us identify areas where the manuscript can be clarified and strengthened. Below we provide point-by-point responses to the major comments. We have revised the manuscript to incorporate additional methodological details and to moderate claims of clinical applicability in light of the single-center design.
Point-by-point responses
-
Referee: [Abstract] Abstract: The headline AUC of 0.983 and accuracy of 0.93 for the deceased-vs-recovered task are presented without any description of the cross-validation procedure, missing-data strategy, class-imbalance correction, or whether the selection of the 10–50 most frequent laboratory tests occurred inside or outside the CV loop. This information is required to assess whether the metrics are optimistically biased.
Authors: We agree that these details are essential for assessing potential bias. The revised manuscript now includes an expanded Methods section describing the 5-fold stratified cross-validation, median imputation for missing laboratory values, and class-weighting to address imbalance. Feature selection of the most frequent labs was performed on the full cohort prior to cross-validation to ensure adequate patient coverage across subsets; we now explicitly state this choice and discuss its implications for potential optimistic bias. A brief summary of the validation procedure has also been added to the abstract. revision: yes
-
Referee: [Abstract] Abstract and Discussion: The statement that the high performance 'suggests the applicability of these models in clinical settings' rests entirely on internal performance within one hospital’s 12k-record cohort. No external validation cohort, temporal hold-out across years, or multi-center test is reported, leaving the transportability of the learned boundaries (and therefore the clinical-applicability claim) unsupported.
Authors: We acknowledge that the single-center nature of the dataset limits claims about transportability. In the revised abstract and Discussion we have replaced the original phrasing with more cautious language indicating that the results 'suggest potential applicability subject to external validation.' We have also added an explicit limitations subsection highlighting the absence of temporal or multi-center testing and the consequent need for further studies before clinical deployment. revision: partial
-
Referee: [Methods] Methods (feature selection and SHAP): The trade-off experiments that retain only the most frequent labs and the subsequent SHAP ranking of eosinophil percentage are load-bearing for the paper’s interpretability contribution, yet no details are given on hyperparameter tuning, leakage prevention, or stability of the SHAP rankings across different feature-subset sizes.
Authors: We appreciate this observation. The revised Methods section now details the hyperparameter tuning procedure (grid search within cross-validation folds for each model), confirms that frequency-based feature selection was performed once on the full dataset for the trade-off experiments while model training and SHAP computation occurred inside the CV loop to limit leakage, and reports SHAP stability by showing that eosinophil percentage remains among the top-ranked features across the 10-, 20-, 30-, 40-, and 50-lab subsets. revision: yes
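The authors describe median imputation and class weighting within 5-fold stratified cross-validation, and the sketch below illustrates that preprocessing. It assumes each patient is a dict of lab values with None for unmeasured tests. The point illustrated is that the medians and weights are estimated on the training fold and then applied unchanged to the held-out fold. Several details are assumptions that the rebuttal does not specify: the function names, the 0.0 fallback for an all-missing lab, and the "balanced" weighting formula (the one scikit-learn documents for `class_weight='balanced'`).

```python
from statistics import median


def fit_medians(train_rows, features):
    """Per-feature median of the observed (non-None) values in the training fold."""
    medians = {}
    for f in features:
        observed = [row[f] for row in train_rows if row.get(f) is not None]
        # Hypothetical fallback when a lab is never observed in the training fold.
        medians[f] = median(observed) if observed else 0.0
    return medians


def impute(rows, medians):
    """Replace missing values with the training-fold medians."""
    return [
        {f: row.get(f) if row.get(f) is not None else m for f, m in medians.items()}
        for row in rows
    ]


def balanced_class_weights(labels):
    """Weight for class c = n_samples / (n_classes * n_c)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


# Hypothetical training fold (None = lab not measured) and held-out row.
train = [
    {"urea": 10.0, "ast": None},
    {"urea": 30.0, "ast": 40.0},
    {"urea": None, "ast": 20.0},
]
train_labels = [1, 0, 0]
held_out = [{"urea": None, "ast": 5.0}]

medians = fit_medians(train, ["urea", "ast"])
print(impute(held_out, medians), balanced_class_weights(train_labels))
```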
Circularity Check
No circularity: empirical ML evaluation on held-out data
full rationale
The paper describes a standard empirical pipeline: collection of a new single-center EHR dataset, selection of frequent lab features, training of off-the-shelf ML classifiers on three binary outcome tasks, evaluation via accuracy and AUC on held-out data, and post-hoc SHAP attribution. No equations, derivations, or self-referential steps appear; performance numbers are direct outputs of train/test splits rather than fitted parameters renamed as predictions. No self-citations support load-bearing uniqueness claims or ansatzes, and no known results are merely renamed. The work is therefore self-contained against external benchmarks and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
free parameters (2)
- Number of most frequent lab tests retained (10-50)
- Hyperparameters of the five ML models
axioms (2)
- domain assumption: The collected EHR records accurately reflect true clinical states and outcomes.
- domain assumption: SHAP values provide stable and clinically meaningful feature attributions.
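A simple rank-stability summary can test the second axiom. It can also test the rebuttal's claim that eosinophil percentage stays near the top across the 10- to 50-lab subsets. This sketch assumes per-patient SHAP value rows are already available for each subset size, and it ranks features by mean absolute SHAP value. It then reports whether a feature stays in the top n and how much consecutive top-n sets overlap (Jaccard index). The rankings and the cut-off n are hypothetical, not results from the paper.

```python
def rank_by_mean_abs(shap_rows, features):
    """Order features by mean absolute SHAP value across patients, largest first."""
    means = {
        f: sum(abs(row[i]) for row in shap_rows) / len(shap_rows)
        for i, f in enumerate(features)
    }
    return sorted(features, key=lambda f: -means[f])


def jaccard(a, b):
    """Size of intersection over size of union (1.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def stability_report(rankings, feature, n):
    """rankings maps subset size -> ranked feature list. Returns whether
    `feature` is in the top n for each size, and the Jaccard overlap of
    top-n sets between consecutive subset sizes."""
    sizes = sorted(rankings)
    tops = {s: set(rankings[s][:n]) for s in sizes}
    in_top = {s: feature in tops[s] for s in sizes}
    overlaps = {(a, b): jaccard(tops[a], tops[b]) for a, b in zip(sizes, sizes[1:])}
    return in_top, overlaps


# Hypothetical rankings for three lab-subset sizes (not the paper's results).
rankings = {
    10: ["urea", "eosinophil_pct", "ast", "platelets", "age"],
    20: ["urea", "ast", "eosinophil_pct", "cardio_comorbidity", "platelets"],
    30: ["cardio_comorbidity", "urea", "eosinophil_pct", "ast", "lactate"],
}
in_top, overlaps = stability_report(rankings, "eosinophil_pct", n=3)
print(in_top, overlaps)
```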
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  Passage: We trained five ML models... SHAP analysis identified... eosinophil percentage... AUC=0.983
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
  Passage: highest performance... Deceased vs. Recovered... top-40 subset
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.