pith. machine review for the scientific record. sign in

arxiv: 2604.20924 · v1 · submitted 2026-04-22 · 💻 cs.LG

Recognition: unknown

Clinically Interpretable Sepsis Early Warning via LLM-Guided Simulation of Temporal Physiological Dynamics

Authors on Pith no claims yet

Pith reviewed 2026-05-10 01:14 UTC · model grok-4.3

classification 💻 cs.LG
keywords sepsis early warninglarge language modelsphysiological simulationinterpretable predictionstemporal dynamicsICU monitoringMIMIC-IVeICU
0
0 comments X

The pith

Simulating physiological trajectories with LLMs before classifying sepsis provides transparent early warnings that outperform opaque models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes using large language models to simulate the temporal changes in patients' vital signs leading up to sepsis. By generating these trajectories first and then predicting onset, the system produces explanations that clinicians can follow and verify against their own reasoning. The method is evaluated on two major ICU databases where it shows better predictive accuracy than existing deep learning or rule-based systems across different time windows before sepsis develops. If the simulations are faithful, this could increase physician trust in AI alerts and enable more timely personalized interventions in critical care.

Core claim

The framework uses spatiotemporal feature extraction to capture vital sign dependencies, a Medical Prompt-as-Prefix to guide LLMs with clinical cues, and agent-based post-processing to ensure realistic ranges. Simulating the evolution of key physiological indicators prior to classification yields interpretable predictions with AUC scores of 0.861-0.903 on 24- to 4-hour pre-onset tasks in MIMIC-IV and eICU databases, surpassing conventional approaches while aligning with clinical judgment through transparent trajectories.

What carries the argument

The LLM-guided temporal simulation framework, which first models physiological trajectories using prompted LLMs and agent constraints before performing sepsis classification.

If this is right

  • Offers transparent prediction mechanisms that align with clinical judgment.
  • Achieves superior AUC scores of 0.861-0.903 for pre-onset predictions from 4 to 24 hours.
  • Outperforms conventional deep learning and rule-based methods on MIMIC-IV and eICU databases.
  • Provides interpretable trajectories and risk trends to support early intervention and personalized decision-making in intensive care.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The simulation could be adapted for other acute conditions by modifying the clinical prompts to focus on different physiological patterns.
  • Physicians might leverage the generated trajectories to anticipate the effects of potential treatments on the simulated course.
  • Validating the fidelity of simulated trajectories against real-time monitoring data would strengthen clinical adoption.

Load-bearing premise

The large language model, when guided by medical prompts, can generate trajectories that accurately represent real physiological deterioration processes without introducing implausible artifacts.

What would settle it

An ablation test where removing the LLM simulation step causes the model's AUC to drop below that of baseline deep learning methods on the same MIMIC-IV and eICU data, or a direct check showing simulated vital sign paths deviate substantially from actual recorded changes in sepsis patients.

Figures

Figures reproduced from arXiv: 2604.20924 by Bingyang Zhou, Chunpei Li, Hongzhi Yu, Ke Lu, Weijie Wang, Weizhi Nie, Zhen Qu.

Figure 1
Figure 1. Figure 1: Illustration of the LLM-guided “predict-then-classify” mechanism for sepsis early [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Flowchart of data processing. The above figure illustrates the screening criteria [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: The operational mechanism of our proposed LLM-Based Spatiotemporal Feature [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Demonstration of Patient-Level Prompts. The LLM initially performs a com [PITH_FULL_IMAGE:figures/full_fig_p011_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: The receiver operating characteristic (ROC) curves for five early prediction [PITH_FULL_IMAGE:figures/full_fig_p013_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: The figure shows the prediction results of some variables. The yellow part rep [PITH_FULL_IMAGE:figures/full_fig_p016_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: The receiver operating characteristic (ROC) curves corresponding to the models [PITH_FULL_IMAGE:figures/full_fig_p017_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: The degree of influence of different variables on the results. The vertical axis [PITH_FULL_IMAGE:figures/full_fig_p022_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: From top-left to bottom-right, the figure sequentially displays the receiver oper [PITH_FULL_IMAGE:figures/full_fig_p024_9.png] view at source ↗
read the original abstract

Timely and interpretable early warning of sepsis remains a major clinical challenge due to the complex temporal dynamics of physiological deterioration. Traditional data-driven models often provide accurate yet opaque predictions, limiting physicians' confidence and clinical applicability. To address this limitation, we propose a Large Language Model (LLM)-guided temporal simulation framework that explicitly models physiological trajectories prior to disease onset for clinically interpretable prediction. The framework consists of a spatiotemporal feature extraction module that captures dynamic dependencies among multivariate vital signs, a Medical Prompt-as-Prefix module that embeds clinical reasoning cues into LLMs, and an agent-based post-processing component that constrains predictions within physiologically plausible ranges. By first simulating the evolution of key physiological indicators and then classifying sepsis onset, our model offers transparent prediction mechanisms that align with clinical judgment. Evaluated on the MIMIC-IV and eICU databases, the proposed method achieves superior AUC scores (0.861-0.903) across 24-4-hour pre-onset prediction tasks, outperforming conventional deep learning and rule-based approaches. More importantly, it provides interpretable trajectories and risk trends that can assist clinicians in early intervention and personalized decision-making in intensive care environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes an LLM-guided temporal simulation framework for clinically interpretable sepsis early warning. It comprises a spatiotemporal feature extraction module for multivariate vital signs, a Medical Prompt-as-Prefix module to embed clinical reasoning cues into LLMs, and an agent-based post-processing component to constrain outputs to physiologically plausible ranges. By simulating physiological trajectories before classifying sepsis onset, the approach claims to yield transparent predictions aligned with clinical judgment. Evaluated on MIMIC-IV and eICU, it reports superior AUC scores of 0.861-0.903 across 24- to 4-hour pre-onset tasks, outperforming conventional deep learning and rule-based methods.

Significance. If the simulated trajectories prove physiologically faithful and the performance gains are attributable to the simulation rather than standard extraction, the framework could meaningfully advance interpretable AI for critical care by providing clinicians with explicit risk trends and trajectory visualizations. The core idea of prefixing clinical prompts to guide simulation before classification is a potentially useful direction for aligning ML outputs with medical reasoning, though its impact remains unverified without supporting evidence.

major comments (2)
  1. [Abstract] Abstract: The claim of superior AUC scores (0.861-0.903) across 24-4-hour pre-onset tasks is presented without any reference to data splits, baseline implementations, statistical tests, ablation studies, or missing-value handling. These details are load-bearing for the central performance claim and must be supplied to allow verification.
  2. [Abstract] Abstract: No quantitative fidelity validation is reported for the Medical Prompt-as-Prefix module or agent-based post-processing (e.g., distributional match of simulated vitals to MIMIC-IV/eICU data, clinician plausibility scores, or ablation isolating the simulation step's contribution). This directly undermines the interpretability claim that 'simulating the evolution of key physiological indicators' yields 'transparent prediction mechanisms that align with clinical judgment.'
minor comments (1)
  1. [Abstract] Abstract: The AUC range (0.861-0.903) should explicitly map each endpoint to its corresponding prediction horizon (24 h vs. 4 h) for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We have addressed each major comment point by point below, providing clarifications from the full paper and indicating revisions made to strengthen the presentation of our results and claims.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The claim of superior AUC scores (0.861-0.903) across 24-4-hour pre-onset tasks is presented without any reference to data splits, baseline implementations, statistical tests, ablation studies, or missing-value handling. These details are load-bearing for the central performance claim and must be supplied to allow verification.

    Authors: We agree that the abstract, constrained by length, does not explicitly reference these supporting details, which are important for verifying the performance claims. The full manuscript provides them comprehensively: data splits and preprocessing (including missing-value handling via forward-fill and interpolation) are detailed in Section 4.1 and 3.2; baseline implementations and their hyperparameter settings in Section 4.2; statistical significance testing (paired t-tests with p-values) in Section 4.3; and ablation studies in Section 5.3. To improve accessibility, we have revised the abstract to include a concise reference to the 5-fold cross-validation protocol and evaluation on MIMIC-IV and eICU, while directing readers to the Methods and Experiments sections for full verification. This change ensures the central claim is better supported without exceeding abstract length limits. revision: yes

  2. Referee: [Abstract] Abstract: No quantitative fidelity validation is reported for the Medical Prompt-as-Prefix module or agent-based post-processing (e.g., distributional match of simulated vitals to MIMIC-IV/eICU data, clinician plausibility scores, or ablation isolating the simulation step's contribution). This directly undermines the interpretability claim that 'simulating the evolution of key physiological indicators' yields 'transparent prediction mechanisms that align with clinical judgment.'

    Authors: This observation is correct and highlights a gap in the original submission. The manuscript includes qualitative trajectory visualizations (Figure 6) and describes the agent-based constraints in Section 3.3, but lacks quantitative fidelity metrics. We have revised the paper by adding an ablation study (new subsection 5.4) that isolates the simulation module's contribution, showing a statistically significant AUC drop (p<0.01) when removed. We have also added quantitative distributional comparisons in the supplementary material, including Kolmogorov-Smirnov tests and KL divergence between simulated and real vital sign distributions on both datasets. While clinician plausibility scoring was not performed in this study, we have expanded the discussion section to acknowledge this limitation and outline it as future work. These additions directly bolster the evidence for the simulation's role in interpretability. revision: partial

Circularity Check

0 steps flagged

No circularity detected in derivation chain

full rationale

The paper presents a modular framework (spatiotemporal extractor + Medical Prompt-as-Prefix + agent-based post-processing) whose central claim is that LLM-guided simulation of physiological trajectories yields both higher AUC and clinical interpretability on MIMIC-IV/eICU data. No equations, fitted parameters, or self-citations are shown that reduce the simulation output or final classifier to the input data by construction. The performance numbers and interpretability benefit are presented as empirical outcomes of the proposed components rather than tautological re-statements of the training data. The derivation chain therefore remains self-contained against external benchmarks and does not match any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

Only abstract available, preventing full audit. Framework introduces new modules whose correctness depends on unstated assumptions about LLM medical reasoning fidelity and physiological constraint effectiveness.

invented entities (2)
  • Medical Prompt-as-Prefix module no independent evidence
    purpose: Embeds clinical reasoning cues into LLMs for trajectory simulation
    New component introduced to guide LLM behavior; no independent evidence of correctness provided.
  • agent-based post-processing component no independent evidence
    purpose: Constrains predictions to physiologically plausible ranges
    Invented mechanism to enforce realism; no external validation of its impact shown.

pith-pipeline@v0.9.0 · 5523 in / 1480 out tokens · 91347 ms · 2026-05-10T01:14:37.397636+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

57 extracted references · 56 canonical work pages · 3 internal anchors

  1. [1]

    The third international consensus definitions for sepsis and septic shock (Sepsis-3),

    M. Singer, C. S. Deutschman, C. W. Seymour, and et al., “The third international consensus definitions for sepsis and septic shock (Sepsis-3),”JAMA, vol. 315, no. 8, pp. 801–810, Jun. 2016, doi: 10.1001/jama.2016.0287

  2. [2]

    Sepsis and septic shock,

    M. Cecconi, L. Evans, M. Levy, and et al., “Sepsis and septic shock,” Lancet, vol. 392, no. 10141, pp. 75–87, Aug. 2018, doi: 10.1016/S0140- 6736(18)30696-2

  3. [3]

    The immunopathology of sepsis and potential therapeutic tar- gets,

    T. van der Poll, F. L. Van De Veerdonk, B. P. Scicluna, and et al., “The immunopathology of sepsis and potential therapeutic tar- gets,”Nat. Rev. Immunol., vol. 17, no. 7, pp. 407–420, Jul. 2017, doi: 10.1038/nri.2017.36

  4. [4]

    Assessment of clinical criteria for sepsis: 25 for the third international consensus definitions for sepsis and septic shock (Sepsis-3),

    C. W. Seymour, V. X. Liu, T. J. Iwashyna, F. M. Brunkhorst, T. D. Rea, A. Scherag, and et al., “Assessment of clinical criteria for sepsis: 25 for the third international consensus definitions for sepsis and septic shock (Sepsis-3),”JAMA, vol. 315, no. 8, pp. 762–774, Jun. 2016, doi: 10.1001/jama.2016.0288

  5. [5]

    Multi- step ahead predictions for critical levels in physiological time series,

    H. ElMoaqet, D. M. Tilbury, and S. K. Ramachandran, “Multi- step ahead predictions for critical levels in physiological time series,” IEEE Trans. Cybern., vol. 46, no. 7, pp. 1704–1717, Jul. 2016, doi: 10.1109/TCYB.2016.2561974

  6. [6]

    ChatGPT predicts in-hospital all-cause mortality for sep- sis: In-context learning with the Korean Sepsis Alliance Database,

    N. Oh, W. C. Cha, J. H. Seo, S. G. Choi, J. M. Kim, C. R. Chung, and et al., “ChatGPT predicts in-hospital all-cause mortality for sep- sis: In-context learning with the Korean Sepsis Alliance Database,” Healthcare Inform. Res., vol. 30, no. 3, pp. 266–276, Sep. 2024, doi: 10.4258/hir.2024.30.3.266

  7. [7]

    SIRS, qSOFA and new sepsis def- inition,

    P. E. Marik and A. M. Taeb, “SIRS, qSOFA and new sepsis def- inition,”J. Thorac. Dis., vol. 9, no. 4, pp. 943, Apr. 2017, doi: 10.21037/jtd.2017.03.125

  8. [8]

    Langlotz, and Akshay S

    A. E. Johnson, L. Bulgarelli, L. Shen, A. Gayles, A. Shammout, S. Horng, andetal., “MIMIC-IV,afreelyaccessibleelectronichealthrecord dataset,”Sci. Data, vol. 10, no. 1, pp. 1, Jan. 2023, doi: 10.1038/s41597- 022-01899-x

  9. [9]

    Sepsis: pathophysiology and clinical management,

    J. E. Gotts and M. A. Matthay, “Sepsis: pathophysiology and clinical management,”BMJ, vol. 353, no. 1, 2016, doi: 10.1136/bmj.i1585

  10. [10]

    Septic shock prediction and knowl- edge discovery through temporal pattern mining,

    J. K. Agor, R. Li, O. Y.¨Ozaltın, “Septic shock prediction and knowl- edge discovery through temporal pattern mining,”Artif. Intell. Med., vol. 132, pp. 102406, 2022, doi: 10.1016/j.artmed.2022.102406

  11. [11]

    Artificial intelligence in sepsis early prediction and diagnosis using unstructured data in healthcare,

    K.-H. Goh, L. Wang, A. Y. K. Yeow, H. Poh, K. Li, and et al., “Artificial intelligence in sepsis early prediction and diagnosis using unstructured data in healthcare,”Nat. Commun., vol. 12, no. 1, pp. 711, Jan. 2021, doi: 10.1038/s41467-021-20910-4

  12. [12]

    Temporal and spatial analysis in early sepsis prediction via causal disentanglements,

    Q. Li, D. Li, W. Nie, H. Jiao, Z. Wu, and et al., “Temporal and spatial analysis in early sepsis prediction via causal disentanglements,”IEEE Trans. Knowl. Data Eng., to be published, 2025. 26

  13. [13]

    Evaluation of performance, energy, and computation costs of quantum-attack resilient encryption algorithms for embedded de- vices,

    M. Zhu, J. Xia, X. Jin, M. Yan, G. Cai, and et al., “Class weights random forest algorithm for processing class imbalanced medical data,” IEEE Access, vol. 6, pp. 4641–4652, Jan. 2018, doi: 10.1109/AC- CESS.2018.2789429

  14. [14]

    Data process- ing and text mining technologies on electronic medical records: a re- view,

    W. Sun, Z. Cai, Y. Li, F. Liu, S. Fang, and et al., “Data process- ing and text mining technologies on electronic medical records: a re- view,”J. Healthcare Eng., vol. 2018, no. 1, pp. 4302425, Dec. 2018, doi: 10.1155/2018/4302425

  15. [15]

    Multitask Gaus- sian processes for multivariate physiological time-series analysis,

    R. Dürichen, M. A. F. Pimentel, L. Clifton, et al., “Multitask Gaus- sian processes for multivariate physiological time-series analysis,”IEEE Trans. Biomed. Eng., vol. 62, no. 1, pp. 314–322, Jan. 2015, doi: 10.1109/TBME.2014.2351376

  16. [16]

    Prediction of sepsis in the intensive care unit with minimal elec- tronic health record data: a machine learning approach,

    T. Desautels, J. Calvert, J. Hoffman, M. Jay, Y. Kerem, L. Shieh, and et al., “Prediction of sepsis in the intensive care unit with minimal elec- tronic health record data: a machine learning approach,”JMIR Med. In- form., vol. 4, no. 3, pp. e5909, Sep. 2016, doi: 10.2196/medinform.5909

  17. [17]

    An attention based deep learning model of clin- ical events in the intensive care unit,

    D. A. Kaji, J. R. Zech, J. S. Kim, S. K. Cho, N. S. Dangayach, A. B. Costa, and et al., “An attention based deep learning model of clin- ical events in the intensive care unit,”PLOS ONE, vol. 14, no. 2, pp. e0211057, Feb. 2019, doi: 10.1371/journal.pone.0211057

  18. [18]

    A time- phased machine learning model for real-time prediction of sepsis in crit- ical care,

    X. Li, X. Xu, F. Xie, X. Xu, Y. Sun, X. Liu, and et al., “A time- phased machine learning model for real-time prediction of sepsis in crit- ical care,”Crit. Care Med., vol. 48, no. 10, pp. e884–e888, Oct. 2020, doi: 10.1097/CCM.0000000000004494

  19. [19]

    MGP-AttTCN: an interpretable machine learning model for the prediction of sepsis,

    M. Rosnati and V. Fortuin, “MGP-AttTCN: an interpretable machine learning model for the prediction of sepsis,”PLOS ONE, vol. 16, no. 5, pp. e0251248, May 2021, doi: 10.1371/journal.pone.0251248

  20. [20]

    Forecasting monthly gas field production based on the CNN-LSTM model,

    W. Zha, Y. Liu, Y. Wan, R. Luo, D. Li, S. Yang, and et al., “Forecasting monthly gas field production based on the CNN-LSTM model,”Energy, vol. 260, pp. 124889, Dec. 2022, doi: 10.1016/j.energy.2022.124889

  21. [21]

    BioGPT: generative pre-trained transformer for biomedical text generation and mining.Brief Bioinform.2022;23(6):bbac409

    R. Luo, L. Sun, Y. Xia, T. Qin, S. Zhang, H. Poon, and et al., “BioGPT: generative pre-trained transformer for biomedical text generation and 27 mining,”Brief. Bioinform., vol. 23, no. 6, pp. bbac409, Dec. 2022, doi: 10.1093/bib/bbac409

  22. [22]

    Degradation Prediction of Semiconductor Lasers Using Conditional Variational Autoencoder , volume=

    M. H. Tahan, M. Ghasemzadeh, and S. Asadi, “A novel embedded discretization-based deep learning architecture for multivariate time se- ries classification,”IEEE Trans. Ind. Inform., vol. 19, no. 4, pp. 5976– 5985, Apr. 2023, doi: 10.1109/TII.2022.3188839

  23. [23]

    Neonatal in- fectious diseases: evaluation of neonatal sepsis,

    A. Camacho-Gonzalez, P. W. Spearman, and B. J. Stoll, “Neonatal in- fectious diseases: evaluation of neonatal sepsis,”Pediatr. Clin. North Am., vol. 60, no. 2, pp. 367, Apr. 2013, doi: 10.1016/j.pcl.2012.12.003

  24. [24]

    OnAI-Comp: an online AI experts competing framework for early sepsis detection,

    A. Zhou, R. Beyah, and R. Kamaleswaran, “OnAI-Comp: an online AI experts competing framework for early sepsis detection,”IEEE/ACM Trans. Comput. Biol. Bioinform., vol. 19, no. 6, pp. 3595–3603, Nov./Dec. 2022, doi: 10.1109/TCBB.2021.3122405

  25. [25]

    Sepsis prediction, early de- tection, and identification using clinical text for machine learning: a systematic review,

    M. Y. Yan, L. T. Gustad, and Ø. Nytrø, “Sepsis prediction, early de- tection, and identification using clinical text for machine learning: a systematic review,”J. Am. Med. Inform. Assoc., vol. 29, no. 3, pp. 559–575, Mar. 2022, doi: 10.1093/jamia/ocab270

  26. [26]

    Evaluation of definitions for sepsis,

    W. A. Knaus, X. Sun, P. O. Nystrom, and D. P. Wagner, “Evaluation of definitions for sepsis,”Chest, vol. 101, no. 6, pp. 1656–1662, Jun. 1992, doi: 10.1378/chest.101.6.1656

  27. [27]

    The role of infection and comorbidity: factors that influence disparities in sepsis,

    A. M. Esper, M. Moss, C. A. Lewis, R. Nisbet, D. M. Mannino, and G. S. Martin, “The role of infection and comorbidity: factors that influence disparities in sepsis,”Crit. Care Med., vol. 34, no. 10, pp. 2576–2582, Oct. 2006, doi: 10.1097/01.CCM.0000240646.29109.6A

  28. [28]

    Y. Zhu, A. Mueen, and E. Keogh, Admissible time series motif discovery with missing data,”IEEE Trans. Knowl. Data Eng., vol. 33, no. 11, pp. 3402–3415, Nov. 2021, doi: 10.1109/TKDE.2019.2948196

  29. [29]

    Procalcitonin, C-reactive protein, white blood cells and SOFA score in ICU: diagnosis and monitoring of sepsis,

    G. P. Castelli, C. Pognani, M. Cita, A. Stuani, L. Sgarbi, and R. Pal- adini, “Procalcitonin, C-reactive protein, white blood cells and SOFA score in ICU: diagnosis and monitoring of sepsis,”Minerva Anestesiol., vol. 72, no. 1/2, pp. 69, Jan./Feb. 2006, doi: 10.1007/s12340-006-0012-6. 28

  30. [30]

    External validation com- plexities: a comparative study of late-onset sepsis prediction models across multiple clinical environments,

    Z. Peng, J. S. Schouten, D. Silvertand, et al., “External validation com- plexities: a comparative study of late-onset sepsis prediction models across multiple clinical environments,”IEEE Trans. Biomed. Eng., to be published, 2025, doi: 10.1109/TBME.2025.3618080

  31. [31]

    Evaluation of lactate, white blood cell count, neutrophil count, procalcitonin and immature gran- ulocyte count as biomarkers for sepsis in emergency department pa- tients,

    B. S. Karon, N. V. Tolan, A. M. Wockenfus, D. R. Block, N. A. Bau- mann, S. C. Bryant, and C. M. Clements, “Evaluation of lactate, white blood cell count, neutrophil count, procalcitonin and immature gran- ulocyte count as biomarkers for sepsis in emergency department pa- tients,”Clin. Biochem., vol. 50, no. 16-17, pp. 956–958, Oct. 2017, doi: 10.1016/j.c...

  32. [32]

    Arterial blood pressure during early sepsis and outcome,

    M. W. Dünser, J. Takala, H. Ulmer, V. D. Mayr, G. Luckner, S. Jochberger, and et al., “Arterial blood pressure during early sepsis and outcome,”Intensive Care Med., vol. 35, pp. 1225–1233, Jul. 2009, doi: 10.1007/s00134-009-1544-3

  33. [33]

    The reliability of the Glasgow Coma Scale: a systematic review,

    F. C. Reith, R. Van den Brande, A. Synnot, R. Gruen, and A. I. Maas, “The reliability of the Glasgow Coma Scale: a systematic review,”In- tensive Care Med., vol. 42, pp. 3–15, Jan. 2016, doi: 10.1007/s00134- 015-4124-3

  34. [34]

    Reducedproductionofcreatininelimitsitsuseasmarkerofkidney injury in sepsis,

    K. Doi, P. S. Yuen, C. Eisner, X. Hu, A. Leelahavanichkul, and R. A. Star, “Reducedproductionofcreatininelimitsitsuseasmarkerofkidney injury in sepsis,”J. Am. Soc. Nephrol., vol. 20, no. 6, pp. 1217–1221, Jun. 2009, doi: 10.1681/ASN.2008090955

  35. [35]

    Prompt engineering as an important emerging skill for med- ical professionals: tutorial,

    B. Meskó, “Prompt engineering as an important emerging skill for med- ical professionals: tutorial,”J. Med. Internet Res., vol. 25, pp. e50638, Mar. 2023, doi: 10.2196/50638

  36. [36]

    The eICU Col- laborative Research Database, a freely available multi-center database for critical care research.Scientific Data

    T. J. Pollard, A. E. W. Johnson, J. D. Raffa, L. A. Celi, R. G. Mark, and O. Badawi, “The eICU Collaborative Research Database, a freely available multi-center database for critical care research,”Sci. Data, vol. 5, no. 1, pp. 1–13, Jan. 2018, doi: 10.1038/sdata.2018.178

  37. [37]

    Explaining and Harnessing Adversarial Examples

    I. J. Goodfellow, J. Shlens, C. Szegedy, and et al., “Ex- plaining and harnessing adversarial examples,”arXiv Prepr., 2014, doi: 10.48550/arXiv.1412.6572, [Online]. Available: https://arxiv.org/abs/1412.6572. 29

  38. [38]

    Machine learning for the predic- tion of sepsis: a systematic review and meta-analysis of diagnostic test accuracy,

    L. M. Fleuren, T. L. Klausch, C. L. Zwager, L. J. Schoonmade, T. Guo, L. F. Roggeveen, and et al., “Machine learning for the predic- tion of sepsis: a systematic review and meta-analysis of diagnostic test accuracy,”Intensive Care Med., vol. 46, pp. 383–400, Feb. 2020, doi: 10.1007/s00134-019-05872-y

  39. [39]

    Machine learning based clinical prediction model for 1-year mortality in Sepsis patients with atrial fibrillation,

    H. Meng, L. Guo, Y. Pan, B. Kong, W. Shuai, H. Huang, “Machine learning based clinical prediction model for 1-year mortality in Sepsis patients with atrial fibrillation,”Heliyon, vol. 10, pp. e38730, 2024, doi: 10.1016/j.heliyon.2024.e38730

  40. [40]

    D. Dera, S. Ahmed, N. C. Bouaynaya, and G. Rasool, TRustworthy uncertainty propagation for sequential time-series analysis in RNNs,” IEEE Trans. Knowl. Data Eng., vol. 36, no. 2, pp. 882–896, Feb. 2024, doi: 10.1109/TKDE.2023.3288628

  41. [41]

    Sepsis definitions: time for change,

    J.-L. Vincent, S. M. Opal, J. C. Marshall, and K. J. Tracey, “Sepsis definitions: time for change,”Lancet, vol. 381, no. 9868, pp. 774–775, Feb. 2013, doi: 10.1016/S0140-6736(12)61815-7

  42. [42]

    Time- LLM: time series forecasting by reprogramming large language models,

    M. Jin, S. Wang, L. Ma, Z. Chu, J. Y. Zhang, X. Shi, and et al., “Time- LLM: time series forecasting by reprogramming large language models,” arXiv Prepr., 2023, doi: 10.48550/arXiv.2310.02307, [Online]. Available: https://arxiv.org/abs/2310.02307

  43. [43]

    Inte- grating federated learning for improved counterfactual explanations in clinical decision support systems for sepsis therapy,

    C. Düsing, P. Cimiano, S. Rehberg, C. Scherer, O. Kaup, C. Köster, S. Hellmich, D. Herrmann, K. L. Meier, S. Claßen, R. Borgstedt, “Inte- grating federated learning for improved counterfactual explanations in clinical decision support systems for sepsis therapy,”Artif. Intell. Med., vol. 157, pp. 102982, 2024, doi: 10.1016/j.artmed.2024.102982

  44. [44]

    RoBERTa: A Robustly Optimized BERT Pretraining Approach

    Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, and et al., “RoBERTa: a robustly optimized BERT pretraining approach,” arXiv Prepr., 2019, doi: 10.48550/arXiv.1907.11692, [Online]. Available: https://arxiv.org/abs/1907.11692

  45. [45]

    DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

    D. Guo, D. Yang, H. Zhang, J. Song, R. Zhang, R. Xu, and et al., “Deepseek-r1: incentivizing reasoning capability in LLMs via reinforce- ment learning,”arXiv Prepr., 2025, doi: 10.48550/arXiv.2501.12948, [Online]. Available: https://arxiv.org/abs/2501.12948. 30

  46. [46]

    A ranking-based cross- entropy loss for early classification of time series,

    C. Sun, H. Li, M. Song, and S. Hong, “A ranking-based cross- entropy loss for early classification of time series,”IEEE Trans. Neu- ral Netw. Learn. Syst., vol. 35, no. 8, pp. 11194–11204, Aug. 2024, doi: 10.1109/TNNLS.2023.3250203

  47. [47]

    Publicly Available Clinical BERT Embeddings

    E. Alsentzer, J. R. Murphy, W. Boag, W. H. Weng, D. Jin, T. Naumann, and et al., “Publicly available clinical BERT embeddings,” arXiv Prepr., 2019, doi: 10.48550/arXiv.1904.03323, [Online]. Available: https://arxiv.org/abs/1904.03323

  48. [48]

    Automatic text summariza- tion of COVID-19 medical research articles using BERT and GPT-2,

    V. Kieuvongngam, B. Tan, and Y. Niu, “Automatic text summariza- tion of COVID-19 medical research articles using BERT and GPT-2,” arXiv Prepr., 2020, doi: 10.48550/arXiv.2006.01997, [Online]. Available: https://arxiv.org/abs/2006.01997

  49. [49]

    Q. Li, D. Li, W. Nie, H. Jiao, Z. Wu, and et al., Temporal and spatial analysis in early sepsis prediction via causal disentangle- ments,”IEEE Trans. Knowl. Data Eng., to be published, 2025, doi: 10.1109/TKDE.2024.3401849

  50. [50]

    When scaling meets LLM finetuning: The effect of data, model and finetuning method.arXiv preprint arXiv:2402.17193, 2024

    B. Zhang, Z. Liu, C. Cherry, and O. Firat, “When scaling meets LLM finetuning: the effect of data, model and finetuning method,” arXiv Prepr., 2024, doi: 10.48550/arXiv.2402.17193, [Online]. Available: https://arxiv.org/abs/2402.17193

  51. [51]

    Resnet in resnet: Generalizing residual architectures,

    S. Targ, D. Almeida, and K. Lyman, “Resnet in resnet: generalizing residual architectures,”arXiv Prepr., 2016, doi: 10.48550/arXiv.1603.08029, [Online]. Available: https://arxiv.org/abs/1603.08029

  52. [52]

    Mistral: Dynamically managing power, performance, and adaptation cost in cloud infrastructures,

    G. Jung, M. A. Hiltunen, K. R. Joshi, R. D. Schlichting, and C. Pu, “Mistral: Dynamically managing power, performance, and adaptation cost in cloud infrastructures,” inProc. 2010 IEEE 30th Int. Conf. Dis- trib. Comput. Syst., Minneapolis, MN, USA, Jun. 2010, pp. 62–73, doi: 10.1109/ICDCS.2010.34

  53. [53]

    A method for the time-varying non- linear prediction of complex nonstationary biomedical signals,

    L. Faes, K. H. Chon, and G. Nollo, “A method for the time-varying non- linear prediction of complex nonstationary biomedical signals,”IEEE Trans. Biomed. Eng., vol. 56, no. 2, pp. 206–209, Feb. 2009, doi: 10.1109/TBME.2008.2008726. 31

  54. [54]

    A robust fusion model for estimating respiratory rate from photoplethysmography and electrocardiography,

    D. A. Birrenkott, M. A. F. Pimentel, P. J. Watkinson, and D. A. Clifton, “A robust fusion model for estimating respiratory rate from photoplethysmography and electrocardiography,”IEEE Trans. Biomed. Eng., vol. 65, no. 9, pp. 2033–2041, Sep. 2018, doi: 10.1109/TBME.2017.2778265

  55. [55]

    Stochastic complexity measures for phys- iological signal analysis,

    I. A. Rezek and S. J. Roberts, “Stochastic complexity measures for phys- iological signal analysis,”IEEE Trans. Biomed. Eng., vol. 45, no. 9, pp. 1186–1191, Sep. 1998, doi: 10.1109/10.718287

  56. [56]

    Y. Li, J. Li, Y. Li, and Q. Li, Time series anomaly detection with adver- sarial reconstruction networks,”IEEE Trans. Knowl. Data Eng., vol. 35, no. 10, pp. 10245–10258, Oct. 2023, doi: 10.1109/TKDE.2021.3137861

  57. [57]

    Early detection of sepsis utilizing deep learning on electronic health record event sequences,

    S. M. Lauritsen, M. E. Kaløra, E. L. Kongsgaard, K. M. Lauritsen, M. J. Jørgensen, J. Lange, B. Thiesson, “Early detection of sepsis utilizing deep learning on electronic health record event sequences,”Artif. Intell. Med., vol. 104, pp. 101820, 2020, doi: 10.1016/j.artmed.2020.101820. 32