arxiv: 2604.20924 · v1 · submitted 2026-04-22 · 💻 cs.LG

Recognition: unknown

Clinically Interpretable Sepsis Early Warning via LLM-Guided Simulation of Temporal Physiological Dynamics

Weizhi Nie , Zhen Qu , Weijie Wang , Chunpei Li , Ke Lu , Bingyang Zhou , Hongzhi Yu

Pith reviewed 2026-05-10 01:14 UTC · model grok-4.3

classification 💻 cs.LG

keywords: sepsis early warning · large language models · physiological simulation · interpretable predictions · temporal dynamics · ICU monitoring · MIMIC-IV · eICU

The pith

Simulating physiological trajectories with LLMs before classifying sepsis provides transparent early warnings that outperform opaque models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes using large language models to simulate the temporal changes in patients' vital signs leading up to sepsis. By generating these trajectories first and then predicting onset, the system produces explanations that clinicians can follow and verify against their own reasoning. The method is evaluated on two major ICU databases where it shows better predictive accuracy than existing deep learning or rule-based systems across different time windows before sepsis develops. If the simulations are faithful, this could increase physician trust in AI alerts and enable more timely personalized interventions in critical care.

Core claim

The framework uses spatiotemporal feature extraction to capture vital sign dependencies, a Medical Prompt-as-Prefix to guide LLMs with clinical cues, and agent-based post-processing to ensure realistic ranges. Simulating the evolution of key physiological indicators prior to classification yields interpretable predictions with AUC scores of 0.861-0.903 on 24- to 4-hour pre-onset tasks in MIMIC-IV and eICU databases, surpassing conventional approaches while aligning with clinical judgment through transparent trajectories.

What carries the argument

The LLM-guided temporal simulation framework, which first models physiological trajectories using prompted LLMs and agent constraints before performing sepsis classification.
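The simulate-then-classify ordering can be sketched in a few lines. This is only an illustration of the control flow under loud assumptions: `extrapolate` is a linear stand-in for the prompted LLM, `clamp` is a stand-in for the agent-based post-processing, and the bounds in `PLAUSIBLE_RANGES` and the alarm rule in `risk_score` are hypothetical values chosen for the example, not taken from the paper.

```python
from typing import Dict, List, Tuple

Trajectory = Dict[str, List[float]]

# Hypothetical plausibility bounds (illustrative values, not from the paper).
PLAUSIBLE_RANGES: Dict[str, Tuple[float, float]] = {
    "heart_rate": (20.0, 250.0),  # beats per minute
    "resp_rate": (4.0, 70.0),     # breaths per minute
    "map": (20.0, 200.0),         # mean arterial pressure, mmHg
}


def extrapolate(history: Trajectory, steps: int) -> Trajectory:
    """Stand-in for the LLM simulator: continue each series along its last slope."""
    future = {}
    for name, values in history.items():
        slope = values[-1] - values[-2] if len(values) > 1 else 0.0
        future[name] = [values[-1] + slope * (k + 1) for k in range(steps)]
    return future


def clamp(trajectory: Trajectory) -> Trajectory:
    """Stand-in for agent-based post-processing: clip values to the bounds above."""
    return {
        name: [min(max(v, PLAUSIBLE_RANGES[name][0]), PLAUSIBLE_RANGES[name][1])
               for v in values]
        for name, values in trajectory.items()
    }


def risk_score(trajectory: Trajectory) -> float:
    """Toy classifier: fraction of simulated steps that meet a crude alarm rule."""
    n = len(trajectory["heart_rate"])
    hits = sum(
        1 for i in range(n)
        if trajectory["heart_rate"][i] > 100 and trajectory["map"][i] < 65
    )
    return hits / n


def simulate_then_classify(history: Trajectory, steps: int, simulator=extrapolate):
    """Stage 1: simulate and constrain the trajectory. Stage 2: score it."""
    simulated = clamp(simulator(history, steps))
    return simulated, risk_score(simulated)


history = {"heart_rate": [88.0, 96.0], "resp_rate": [18.0, 20.0], "map": [74.0, 70.0]}
simulated, score = simulate_then_classify(history, steps=4)
```

Because the classifier only sees the simulated trajectory, a clinician can inspect `simulated` to see which projected values drove the score, which is the kind of transparency the paper claims for its own, far richer, pipeline.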

If this is right

Offers transparent prediction mechanisms that align with clinical judgment.
Achieves superior AUC scores of 0.861-0.903 for pre-onset predictions from 4 to 24 hours.
Outperforms conventional deep learning and rule-based methods on MIMIC-IV and eICU databases.
Provides interpretable trajectories and risk trends to support early intervention and personalized decision-making in intensive care.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The simulation could be adapted for other acute conditions by modifying the clinical prompts to focus on different physiological patterns.
Physicians might leverage the generated trajectories to anticipate the effects of potential treatments on the simulated course.
Validating the fidelity of simulated trajectories against real-time monitoring data would strengthen clinical adoption.

Load-bearing premise

The large language model, when guided by medical prompts, can generate trajectories that accurately represent real physiological deterioration processes without introducing implausible artifacts.

What would settle it

An ablation test where removing the LLM simulation step causes the model's AUC to drop below that of baseline deep learning methods on the same MIMIC-IV and eICU data, or a direct check showing simulated vital sign paths deviate substantially from actual recorded changes in sepsis patients.
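The ablation comparison described above comes down to computing AUC for two score sets on the same labels. A minimal sketch follows, using the rank-based definition of AUC (the probability that a random positive case scores higher than a random negative one). The labels and scores are invented toy values, and the names `full_model` and `ablated_model` are hypothetical; none of this reproduces the paper's results.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented purely for illustration.
labels = [1, 1, 1, 0, 0, 0]
full_model = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]     # hypothetical: with the simulation step
ablated_model = [0.6, 0.3, 0.4, 0.5, 0.2, 0.1]  # hypothetical: simulation step removed

auc_full = auc(labels, full_model)
auc_ablated = auc(labels, ablated_model)
drop = auc_full - auc_ablated  # a positive value means the ablation hurt performance
```

On real data the same comparison would be run per prediction horizon and accompanied by a significance test, since a small AUC gap on one split does not by itself show that the simulation step matters.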

Figures

Figures reproduced from arXiv: 2604.20924 by Bingyang Zhou, Chunpei Li, Hongzhi Yu, Ke Lu, Weijie Wang, Weizhi Nie, Zhen Qu.

**Figure 2.** Flowchart of data processing. The above figure illustrates the screening criteria ...

**Figure 3.** The operational mechanism of our proposed LLM-Based Spatiotemporal Feature ...

**Figure 4.** Demonstration of Patient-Level Prompts. The LLM initially performs a com...

**Figure 5.** The receiver operating characteristic (ROC) curves for five early prediction ...

**Figure 6.** The figure shows the prediction results of some variables. The yellow part rep...

**Figure 7.** The receiver operating characteristic (ROC) curves corresponding to the models ...

**Figure 8.** The degree of influence of different variables on the results. The vertical axis ...

**Figure 9.** From top-left to bottom-right, the figure sequentially displays the receiver oper...

Original abstract

Timely and interpretable early warning of sepsis remains a major clinical challenge due to the complex temporal dynamics of physiological deterioration. Traditional data-driven models often provide accurate yet opaque predictions, limiting physicians' confidence and clinical applicability. To address this limitation, we propose a Large Language Model (LLM)-guided temporal simulation framework that explicitly models physiological trajectories prior to disease onset for clinically interpretable prediction. The framework consists of a spatiotemporal feature extraction module that captures dynamic dependencies among multivariate vital signs, a Medical Prompt-as-Prefix module that embeds clinical reasoning cues into LLMs, and an agent-based post-processing component that constrains predictions within physiologically plausible ranges. By first simulating the evolution of key physiological indicators and then classifying sepsis onset, our model offers transparent prediction mechanisms that align with clinical judgment. Evaluated on the MIMIC-IV and eICU databases, the proposed method achieves superior AUC scores (0.861-0.903) across 24-4-hour pre-onset prediction tasks, outperforming conventional deep learning and rule-based approaches. More importantly, it provides interpretable trajectories and risk trends that can assist clinicians in early intervention and personalized decision-making in intensive care environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper tries to improve sepsis early warning by simulating physiological trajectories with LLMs, but the abstract gives no evidence that those simulations are faithful to real data.

The main contribution is a framework that first extracts spatiotemporal features from vital signs, then uses an LLM with medical prompts to simulate how those signs might evolve, and finally applies agent-based rules to keep the simulated paths plausible before predicting sepsis onset. The authors report AUCs of 0.861-0.903 on MIMIC-IV and eICU for predictions 4 to 24 hours ahead of onset, and claim that this beats standard deep learning and rule-based baselines while giving clinicians visible trajectories to inspect.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes an LLM-guided temporal simulation framework for clinically interpretable sepsis early warning. It comprises a spatiotemporal feature extraction module for multivariate vital signs, a Medical Prompt-as-Prefix module to embed clinical reasoning cues into LLMs, and an agent-based post-processing component to constrain outputs to physiologically plausible ranges. By simulating physiological trajectories before classifying sepsis onset, the approach claims to yield transparent predictions aligned with clinical judgment. Evaluated on MIMIC-IV and eICU, it reports superior AUC scores of 0.861-0.903 across 24- to 4-hour pre-onset tasks, outperforming conventional deep learning and rule-based methods.

Significance. If the simulated trajectories prove physiologically faithful and the performance gains are attributable to the simulation rather than standard extraction, the framework could meaningfully advance interpretable AI for critical care by providing clinicians with explicit risk trends and trajectory visualizations. The core idea of prefixing clinical prompts to guide simulation before classification is a potentially useful direction for aligning ML outputs with medical reasoning, though its impact remains unverified without supporting evidence.

major comments (2)

[Abstract] Abstract: The claim of superior AUC scores (0.861-0.903) across 24-4-hour pre-onset tasks is presented without any reference to data splits, baseline implementations, statistical tests, ablation studies, or missing-value handling. These details are load-bearing for the central performance claim and must be supplied to allow verification.
[Abstract] Abstract: No quantitative fidelity validation is reported for the Medical Prompt-as-Prefix module or agent-based post-processing (e.g., distributional match of simulated vitals to MIMIC-IV/eICU data, clinician plausibility scores, or ablation isolating the simulation step's contribution). This directly undermines the interpretability claim that 'simulating the evolution of key physiological indicators' yields 'transparent prediction mechanisms that align with clinical judgment.'

minor comments (1)

[Abstract] Abstract: The AUC range (0.861-0.903) should explicitly map each endpoint to its corresponding prediction horizon (24 h vs. 4 h) for clarity.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We have addressed each major comment point by point below, providing clarifications from the full paper and indicating revisions made to strengthen the presentation of our results and claims.

Referee: [Abstract] Abstract: The claim of superior AUC scores (0.861-0.903) across 24-4-hour pre-onset tasks is presented without any reference to data splits, baseline implementations, statistical tests, ablation studies, or missing-value handling. These details are load-bearing for the central performance claim and must be supplied to allow verification.

Authors: We agree that the abstract, constrained by length, does not explicitly reference these supporting details, which are important for verifying the performance claims. The full manuscript provides them: data splits and preprocessing (including missing-value handling via forward-fill and interpolation) are detailed in Sections 3.2 and 4.1; baseline implementations and their hyperparameter settings in Section 4.2; statistical significance testing (paired t-tests with p-values) in Section 4.3; and ablation studies in Section 5.3. To improve accessibility, we have revised the abstract to include a concise reference to the 5-fold cross-validation protocol and evaluation on MIMIC-IV and eICU, while directing readers to the Methods and Experiments sections for full verification. This change ensures the central claim is better supported without exceeding abstract length limits.

revision: yes

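The response names forward-fill and interpolation for missing values but does not say how the two are combined. As a hedged illustration of what each operation does on a gappy vital-sign series, here is a minimal stdlib sketch; the function names and the toy `heart_rate` series are assumptions made for this example, not the authors' preprocessing code.

```python
from typing import List, Optional

Series = List[Optional[float]]  # None marks a missing measurement


def forward_fill(values: Series) -> Series:
    """Carry the last observed value forward; leading gaps stay None."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def linear_interpolate(values: Series) -> Series:
    """Fill interior gaps on a straight line between neighbouring observations."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            out[i] = values[left] + (values[right] - values[left]) * (i - left) / span
    return out


# Toy hourly heart-rate series with gaps, invented for illustration.
heart_rate = [None, 80.0, None, None, 92.0, None]
ffilled = forward_fill(heart_rate)
interpolated = linear_interpolate(heart_rate)
```

The two choices are not interchangeable for early-warning tasks: interpolation uses a later observation to fill an earlier gap, so applying it across the prediction cutoff would leak future information, whereas forward-fill only uses the past.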
Referee: [Abstract] Abstract: No quantitative fidelity validation is reported for the Medical Prompt-as-Prefix module or agent-based post-processing (e.g., distributional match of simulated vitals to MIMIC-IV/eICU data, clinician plausibility scores, or ablation isolating the simulation step's contribution). This directly undermines the interpretability claim that 'simulating the evolution of key physiological indicators' yields 'transparent prediction mechanisms that align with clinical judgment.'

Authors: This observation is correct and highlights a gap in the original submission. The manuscript includes qualitative trajectory visualizations (Figure 6) and describes the agent-based constraints in Section 3.3, but lacks quantitative fidelity metrics. We have revised the paper by adding an ablation study (new subsection 5.4) that isolates the simulation module's contribution, showing a statistically significant AUC drop (p<0.01) when removed. We have also added quantitative distributional comparisons in the supplementary material, including Kolmogorov-Smirnov tests and KL divergence between simulated and real vital sign distributions on both datasets. While clinician plausibility scoring was not performed in this study, we have expanded the discussion section to acknowledge this limitation and outline it as future work. These additions directly bolster the evidence for the simulation's role in interpretability.

revision: partial
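For readers unfamiliar with the distributional check mentioned in this response, the two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between the empirical CDFs of the two samples. The sketch below computes only the statistic, not a p-value, using the standard library; the sample values and the names `real`, `simulated_close`, and `simulated_far` are invented for illustration and say nothing about the paper's actual supplementary results.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Toy heart-rate samples, invented purely for illustration.
real = [70.0, 75.0, 80.0, 85.0, 90.0]
simulated_close = [72.0, 76.0, 81.0, 84.0, 91.0]
simulated_far = [100.0, 105.0, 110.0, 115.0, 120.0]

ks_close = ks_statistic(real, simulated_close)  # small gap: similar distributions
ks_far = ks_statistic(real, simulated_far)      # gap of 1.0: no overlap at all
```

A marginal check like this compares value distributions only. It would not detect a simulator that produces realistic values in an unrealistic temporal order, so it complements rather than replaces a trajectory-level comparison.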

Circularity Check

0 steps flagged

No circularity detected in derivation chain

The paper presents a modular framework (spatiotemporal extractor + Medical Prompt-as-Prefix + agent-based post-processing) whose central claim is that LLM-guided simulation of physiological trajectories yields both higher AUC and clinical interpretability on MIMIC-IV/eICU data. No equations, fitted parameters, or self-citations are shown that reduce the simulation output or final classifier to the input data by construction. The performance numbers and interpretability benefit are presented as empirical outcomes of the proposed components rather than tautological re-statements of the training data. The derivation chain therefore remains self-contained against external benchmarks and does not match any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

Only the abstract was available, which prevents a full audit. The framework introduces new modules whose correctness depends on unstated assumptions about the fidelity of LLM medical reasoning and the effectiveness of the physiological constraints.

invented entities (2)

Medical Prompt-as-Prefix module · no independent evidence
purpose: Embeds clinical reasoning cues into LLMs for trajectory simulation
New component introduced to guide LLM behavior; no independent evidence of correctness provided.

agent-based post-processing component · no independent evidence
purpose: Constrains predictions to physiologically plausible ranges
Invented mechanism to enforce realism; no external validation of its impact shown.

Reference graph

Works this paper leans on

57 extracted references · 56 canonical work pages · 3 internal anchors

[1]

The third international consensus definitions for sepsis and septic shock (Sepsis-3),

M. Singer, C. S. Deutschman, C. W. Seymour, and et al., “The third international consensus definitions for sepsis and septic shock (Sepsis-3),”JAMA, vol. 315, no. 8, pp. 801–810, Jun. 2016, doi: 10.1001/jama.2016.0287

work page doi:10.1001/jama.2016.0287 2016
[2]

Sepsis and septic shock,

M. Cecconi, L. Evans, M. Levy, and et al., “Sepsis and septic shock,” Lancet, vol. 392, no. 10141, pp. 75–87, Aug. 2018, doi: 10.1016/S0140- 6736(18)30696-2

work page doi:10.1016/s0140- 2018
[3]

The immunopathology of sepsis and potential therapeutic tar- gets,

T. van der Poll, F. L. Van De Veerdonk, B. P. Scicluna, and et al., “The immunopathology of sepsis and potential therapeutic tar- gets,”Nat. Rev. Immunol., vol. 17, no. 7, pp. 407–420, Jul. 2017, doi: 10.1038/nri.2017.36

work page doi:10.1038/nri.2017.36 2017
[4]

Assessment of clinical criteria for sepsis: 25 for the third international consensus definitions for sepsis and septic shock (Sepsis-3),

C. W. Seymour, V. X. Liu, T. J. Iwashyna, F. M. Brunkhorst, T. D. Rea, A. Scherag, and et al., “Assessment of clinical criteria for sepsis: 25 for the third international consensus definitions for sepsis and septic shock (Sepsis-3),”JAMA, vol. 315, no. 8, pp. 762–774, Jun. 2016, doi: 10.1001/jama.2016.0288

work page doi:10.1001/jama.2016.0288 2016
[5]

Multi- step ahead predictions for critical levels in physiological time series,

H. ElMoaqet, D. M. Tilbury, and S. K. Ramachandran, “Multi- step ahead predictions for critical levels in physiological time series,” IEEE Trans. Cybern., vol. 46, no. 7, pp. 1704–1717, Jul. 2016, doi: 10.1109/TCYB.2016.2561974

work page doi:10.1109/tcyb.2016.2561974 2016
[6]

ChatGPT predicts in-hospital all-cause mortality for sep- sis: In-context learning with the Korean Sepsis Alliance Database,

N. Oh, W. C. Cha, J. H. Seo, S. G. Choi, J. M. Kim, C. R. Chung, and et al., “ChatGPT predicts in-hospital all-cause mortality for sep- sis: In-context learning with the Korean Sepsis Alliance Database,” Healthcare Inform. Res., vol. 30, no. 3, pp. 266–276, Sep. 2024, doi: 10.4258/hir.2024.30.3.266

work page doi:10.4258/hir.2024.30.3.266 2024
[7]

SIRS, qSOFA and new sepsis def- inition,

P. E. Marik and A. M. Taeb, “SIRS, qSOFA and new sepsis def- inition,”J. Thorac. Dis., vol. 9, no. 4, pp. 943, Apr. 2017, doi: 10.21037/jtd.2017.03.125

work page doi:10.21037/jtd.2017.03.125 2017
[8]

Langlotz, and Akshay S

A. E. Johnson, L. Bulgarelli, L. Shen, A. Gayles, A. Shammout, S. Horng, andetal., “MIMIC-IV,afreelyaccessibleelectronichealthrecord dataset,”Sci. Data, vol. 10, no. 1, pp. 1, Jan. 2023, doi: 10.1038/s41597- 022-01899-x

work page doi:10.1038/s41597- 2023
[9]

Sepsis: pathophysiology and clinical management,

J. E. Gotts and M. A. Matthay, “Sepsis: pathophysiology and clinical management,”BMJ, vol. 353, no. 1, 2016, doi: 10.1136/bmj.i1585

work page doi:10.1136/bmj.i1585 2016
[10]

Septic shock prediction and knowl- edge discovery through temporal pattern mining,

J. K. Agor, R. Li, O. Y.¨Ozaltın, “Septic shock prediction and knowl- edge discovery through temporal pattern mining,”Artif. Intell. Med., vol. 132, pp. 102406, 2022, doi: 10.1016/j.artmed.2022.102406

work page doi:10.1016/j.artmed.2022.102406 2022
[11]

Artificial intelligence in sepsis early prediction and diagnosis using unstructured data in healthcare,

K.-H. Goh, L. Wang, A. Y. K. Yeow, H. Poh, K. Li, and et al., “Artificial intelligence in sepsis early prediction and diagnosis using unstructured data in healthcare,”Nat. Commun., vol. 12, no. 1, pp. 711, Jan. 2021, doi: 10.1038/s41467-021-20910-4

work page doi:10.1038/s41467-021-20910-4 2021
[12]

Temporal and spatial analysis in early sepsis prediction via causal disentanglements,

Q. Li, D. Li, W. Nie, H. Jiao, Z. Wu, and et al., “Temporal and spatial analysis in early sepsis prediction via causal disentanglements,”IEEE Trans. Knowl. Data Eng., to be published, 2025. 26

2025
[13]

Evaluation of performance, energy, and computation costs of quantum-attack resilient encryption algorithms for embedded de- vices,

M. Zhu, J. Xia, X. Jin, M. Yan, G. Cai, and et al., “Class weights random forest algorithm for processing class imbalanced medical data,” IEEE Access, vol. 6, pp. 4641–4652, Jan. 2018, doi: 10.1109/AC- CESS.2018.2789429

work page doi:10.1109/ac- 2018
[14]

Data process- ing and text mining technologies on electronic medical records: a re- view,

W. Sun, Z. Cai, Y. Li, F. Liu, S. Fang, and et al., “Data process- ing and text mining technologies on electronic medical records: a re- view,”J. Healthcare Eng., vol. 2018, no. 1, pp. 4302425, Dec. 2018, doi: 10.1155/2018/4302425

work page doi:10.1155/2018/4302425 2018
[15]

Multitask Gaus- sian processes for multivariate physiological time-series analysis,

R. Dürichen, M. A. F. Pimentel, L. Clifton, et al., “Multitask Gaus- sian processes for multivariate physiological time-series analysis,”IEEE Trans. Biomed. Eng., vol. 62, no. 1, pp. 314–322, Jan. 2015, doi: 10.1109/TBME.2014.2351376

work page doi:10.1109/tbme.2014.2351376 2015
[16]

Prediction of sepsis in the intensive care unit with minimal elec- tronic health record data: a machine learning approach,

T. Desautels, J. Calvert, J. Hoffman, M. Jay, Y. Kerem, L. Shieh, and et al., “Prediction of sepsis in the intensive care unit with minimal elec- tronic health record data: a machine learning approach,”JMIR Med. In- form., vol. 4, no. 3, pp. e5909, Sep. 2016, doi: 10.2196/medinform.5909

work page doi:10.2196/medinform.5909 2016
[17]

An attention based deep learning model of clin- ical events in the intensive care unit,

D. A. Kaji, J. R. Zech, J. S. Kim, S. K. Cho, N. S. Dangayach, A. B. Costa, and et al., “An attention based deep learning model of clin- ical events in the intensive care unit,”PLOS ONE, vol. 14, no. 2, pp. e0211057, Feb. 2019, doi: 10.1371/journal.pone.0211057

work page doi:10.1371/journal.pone.0211057 2019
[18]

A time- phased machine learning model for real-time prediction of sepsis in crit- ical care,

X. Li, X. Xu, F. Xie, X. Xu, Y. Sun, X. Liu, and et al., “A time- phased machine learning model for real-time prediction of sepsis in crit- ical care,”Crit. Care Med., vol. 48, no. 10, pp. e884–e888, Oct. 2020, doi: 10.1097/CCM.0000000000004494

work page doi:10.1097/ccm.0000000000004494 2020
[19]

MGP-AttTCN: an interpretable machine learning model for the prediction of sepsis,

M. Rosnati and V. Fortuin, “MGP-AttTCN: an interpretable machine learning model for the prediction of sepsis,”PLOS ONE, vol. 16, no. 5, pp. e0251248, May 2021, doi: 10.1371/journal.pone.0251248

work page doi:10.1371/journal.pone.0251248 2021
[20]

Forecasting monthly gas field production based on the CNN-LSTM model,

W. Zha, Y. Liu, Y. Wan, R. Luo, D. Li, S. Yang, and et al., “Forecasting monthly gas field production based on the CNN-LSTM model,”Energy, vol. 260, pp. 124889, Dec. 2022, doi: 10.1016/j.energy.2022.124889

work page doi:10.1016/j.energy.2022.124889 2022
[21]

BioGPT: generative pre-trained transformer for biomedical text generation and mining.Brief Bioinform.2022;23(6):bbac409

R. Luo, L. Sun, Y. Xia, T. Qin, S. Zhang, H. Poon, and et al., “BioGPT: generative pre-trained transformer for biomedical text generation and 27 mining,”Brief. Bioinform., vol. 23, no. 6, pp. bbac409, Dec. 2022, doi: 10.1093/bib/bbac409

work page doi:10.1093/bib/bbac409 2022
[22]

Degradation Prediction of Semiconductor Lasers Using Conditional Variational Autoencoder , volume=

M. H. Tahan, M. Ghasemzadeh, and S. Asadi, “A novel embedded discretization-based deep learning architecture for multivariate time se- ries classification,”IEEE Trans. Ind. Inform., vol. 19, no. 4, pp. 5976– 5985, Apr. 2023, doi: 10.1109/TII.2022.3188839

work page doi:10.1109/tii.2022.3188839 2023
[23]

Neonatal in- fectious diseases: evaluation of neonatal sepsis,

A. Camacho-Gonzalez, P. W. Spearman, and B. J. Stoll, “Neonatal in- fectious diseases: evaluation of neonatal sepsis,”Pediatr. Clin. North Am., vol. 60, no. 2, pp. 367, Apr. 2013, doi: 10.1016/j.pcl.2012.12.003

work page doi:10.1016/j.pcl.2012.12.003 2013
[24]

OnAI-Comp: an online AI experts competing framework for early sepsis detection,

A. Zhou, R. Beyah, and R. Kamaleswaran, “OnAI-Comp: an online AI experts competing framework for early sepsis detection,”IEEE/ACM Trans. Comput. Biol. Bioinform., vol. 19, no. 6, pp. 3595–3603, Nov./Dec. 2022, doi: 10.1109/TCBB.2021.3122405

work page doi:10.1109/tcbb.2021.3122405 2022
[25]

Sepsis prediction, early de- tection, and identification using clinical text for machine learning: a systematic review,

M. Y. Yan, L. T. Gustad, and Ø. Nytrø, “Sepsis prediction, early de- tection, and identification using clinical text for machine learning: a systematic review,”J. Am. Med. Inform. Assoc., vol. 29, no. 3, pp. 559–575, Mar. 2022, doi: 10.1093/jamia/ocab270

work page doi:10.1093/jamia/ocab270 2022
[26]

Evaluation of definitions for sepsis,

W. A. Knaus, X. Sun, P. O. Nystrom, and D. P. Wagner, “Evaluation of definitions for sepsis,”Chest, vol. 101, no. 6, pp. 1656–1662, Jun. 1992, doi: 10.1378/chest.101.6.1656

work page doi:10.1378/chest.101.6.1656 1992
[27]

The role of infection and comorbidity: factors that influence disparities in sepsis,

A. M. Esper, M. Moss, C. A. Lewis, R. Nisbet, D. M. Mannino, and G. S. Martin, “The role of infection and comorbidity: factors that influence disparities in sepsis,”Crit. Care Med., vol. 34, no. 10, pp. 2576–2582, Oct. 2006, doi: 10.1097/01.CCM.0000240646.29109.6A

work page doi:10.1097/01.ccm.0000240646.29109.6a 2006
[28]

Y. Zhu, A. Mueen, and E. Keogh, Admissible time series motif discovery with missing data,”IEEE Trans. Knowl. Data Eng., vol. 33, no. 11, pp. 3402–3415, Nov. 2021, doi: 10.1109/TKDE.2019.2948196

work page doi:10.1109/tkde.2019.2948196 2021
[29]

Procalcitonin, C-reactive protein, white blood cells and SOFA score in ICU: diagnosis and monitoring of sepsis,

G. P. Castelli, C. Pognani, M. Cita, A. Stuani, L. Sgarbi, and R. Pal- adini, “Procalcitonin, C-reactive protein, white blood cells and SOFA score in ICU: diagnosis and monitoring of sepsis,”Minerva Anestesiol., vol. 72, no. 1/2, pp. 69, Jan./Feb. 2006, doi: 10.1007/s12340-006-0012-6. 28

work page doi:10.1007/s12340-006-0012-6 2006
[30]

External validation com- plexities: a comparative study of late-onset sepsis prediction models across multiple clinical environments,

Z. Peng, J. S. Schouten, D. Silvertand, et al., “External validation com- plexities: a comparative study of late-onset sepsis prediction models across multiple clinical environments,”IEEE Trans. Biomed. Eng., to be published, 2025, doi: 10.1109/TBME.2025.3618080

work page doi:10.1109/tbme.2025.3618080 2025
[31]

Evaluation of lactate, white blood cell count, neutrophil count, procalcitonin and immature gran- ulocyte count as biomarkers for sepsis in emergency department pa- tients,

B. S. Karon, N. V. Tolan, A. M. Wockenfus, D. R. Block, N. A. Bau- mann, S. C. Bryant, and C. M. Clements, “Evaluation of lactate, white blood cell count, neutrophil count, procalcitonin and immature gran- ulocyte count as biomarkers for sepsis in emergency department pa- tients,”Clin. Biochem., vol. 50, no. 16-17, pp. 956–958, Oct. 2017, doi: 10.1016/j.c...

work page doi:10.1016/j.clinbiochem.2017.06.010 2017
[32]

Arterial blood pressure during early sepsis and outcome,

M. W. Dünser, J. Takala, H. Ulmer, V. D. Mayr, G. Luckner, S. Jochberger, and et al., “Arterial blood pressure during early sepsis and outcome,”Intensive Care Med., vol. 35, pp. 1225–1233, Jul. 2009, doi: 10.1007/s00134-009-1544-3

work page doi:10.1007/s00134-009-1544-3 2009
[33]

The reliability of the Glasgow Coma Scale: a systematic review,

F. C. Reith, R. Van den Brande, A. Synnot, R. Gruen, and A. I. Maas, “The reliability of the Glasgow Coma Scale: a systematic review,”In- tensive Care Med., vol. 42, pp. 3–15, Jan. 2016, doi: 10.1007/s00134- 015-4124-3

work page doi:10.1007/s00134- 2016
[34]

Reducedproductionofcreatininelimitsitsuseasmarkerofkidney injury in sepsis,

K. Doi, P. S. Yuen, C. Eisner, X. Hu, A. Leelahavanichkul, and R. A. Star, “Reducedproductionofcreatininelimitsitsuseasmarkerofkidney injury in sepsis,”J. Am. Soc. Nephrol., vol. 20, no. 6, pp. 1217–1221, Jun. 2009, doi: 10.1681/ASN.2008090955

work page doi:10.1681/asn.2008090955 2009
[35]

Prompt engineering as an important emerging skill for med- ical professionals: tutorial,

B. Meskó, “Prompt engineering as an important emerging skill for med- ical professionals: tutorial,”J. Med. Internet Res., vol. 25, pp. e50638, Mar. 2023, doi: 10.2196/50638

work page doi:10.2196/50638 2023
[36]

The eICU Col- laborative Research Database, a freely available multi-center database for critical care research.Scientific Data

T. J. Pollard, A. E. W. Johnson, J. D. Raffa, L. A. Celi, R. G. Mark, and O. Badawi, “The eICU Collaborative Research Database, a freely available multi-center database for critical care research,”Sci. Data, vol. 5, no. 1, pp. 1–13, Jan. 2018, doi: 10.1038/sdata.2018.178

work page doi:10.1038/sdata.2018.178 2018
[37]

Explaining and Harnessing Adversarial Examples

I. J. Goodfellow, J. Shlens, C. Szegedy, and et al., “Ex- plaining and harnessing adversarial examples,”arXiv Prepr., 2014, doi: 10.48550/arXiv.1412.6572, [Online]. Available: https://arxiv.org/abs/1412.6572. 29

work page internal anchor Pith review doi:10.48550/arxiv.1412.6572 2014
[38]

Machine learning for the predic- tion of sepsis: a systematic review and meta-analysis of diagnostic test accuracy,

L. M. Fleuren, T. L. Klausch, C. L. Zwager, L. J. Schoonmade, T. Guo, L. F. Roggeveen, and et al., “Machine learning for the predic- tion of sepsis: a systematic review and meta-analysis of diagnostic test accuracy,”Intensive Care Med., vol. 46, pp. 383–400, Feb. 2020, doi: 10.1007/s00134-019-05872-y

work page doi:10.1007/s00134-019-05872-y 2020
[39]

Machine learning based clinical prediction model for 1-year mortality in Sepsis patients with atrial fibrillation,

H. Meng, L. Guo, Y. Pan, B. Kong, W. Shuai, H. Huang, “Machine learning based clinical prediction model for 1-year mortality in Sepsis patients with atrial fibrillation,”Heliyon, vol. 10, pp. e38730, 2024, doi: 10.1016/j.heliyon.2024.e38730

work page doi:10.1016/j.heliyon.2024.e38730 2024
[40]

D. Dera, S. Ahmed, N. C. Bouaynaya, and G. Rasool, TRustworthy uncertainty propagation for sequential time-series analysis in RNNs,” IEEE Trans. Knowl. Data Eng., vol. 36, no. 2, pp. 882–896, Feb. 2024, doi: 10.1109/TKDE.2023.3288628

work page doi:10.1109/tkde.2023.3288628 2024
[41]

Sepsis definitions: time for change,

J.-L. Vincent, S. M. Opal, J. C. Marshall, and K. J. Tracey, “Sepsis definitions: time for change,”Lancet, vol. 381, no. 9868, pp. 774–775, Feb. 2013, doi: 10.1016/S0140-6736(12)61815-7

work page doi:10.1016/s0140-6736(12)61815-7 2013
[42]

Time- LLM: time series forecasting by reprogramming large language models,

M. Jin, S. Wang, L. Ma, Z. Chu, J. Y. Zhang, X. Shi, and et al., “Time- LLM: time series forecasting by reprogramming large language models,” arXiv Prepr., 2023, doi: 10.48550/arXiv.2310.02307, [Online]. Available: https://arxiv.org/abs/2310.02307

work page doi:10.48550/arxiv.2310.02307 2023
[43]

Inte- grating federated learning for improved counterfactual explanations in clinical decision support systems for sepsis therapy,

C. Düsing, P. Cimiano, S. Rehberg, C. Scherer, O. Kaup, C. Köster, S. Hellmich, D. Herrmann, K. L. Meier, S. Claßen, R. Borgstedt, “Inte- grating federated learning for improved counterfactual explanations in clinical decision support systems for sepsis therapy,”Artif. Intell. Med., vol. 157, pp. 102982, 2024, doi: 10.1016/j.artmed.2024.102982

work page doi:10.1016/j.artmed.2024.102982 2024
[44]

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, and et al., “RoBERTa: a robustly optimized BERT pretraining approach,” arXiv Prepr., 2019, doi: 10.48550/arXiv.1907.11692, [Online]. Available: https://arxiv.org/abs/1907.11692

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1907.11692 2019
[45]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

D. Guo, D. Yang, H. Zhang, J. Song, R. Zhang, R. Xu, and et al., “Deepseek-r1: incentivizing reasoning capability in LLMs via reinforce- ment learning,”arXiv Prepr., 2025, doi: 10.48550/arXiv.2501.12948, [Online]. Available: https://arxiv.org/abs/2501.12948. 30

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2501.12948 2025
[46]

A ranking-based cross- entropy loss for early classification of time series,

C. Sun, H. Li, M. Song, and S. Hong, “A ranking-based cross- entropy loss for early classification of time series,”IEEE Trans. Neu- ral Netw. Learn. Syst., vol. 35, no. 8, pp. 11194–11204, Aug. 2024, doi: 10.1109/TNNLS.2023.3250203

work page doi:10.1109/tnnls.2023.3250203 2024
[47]

Publicly Available Clinical BERT Embeddings

E. Alsentzer, J. R. Murphy, W. Boag, W. H. Weng, D. Jin, T. Naumann, et al., “Publicly available clinical BERT embeddings,” arXiv Prepr., 2019, doi: 10.48550/arXiv.1904.03323, [Online]. Available: https://arxiv.org/abs/1904.03323

work page doi:10.48550/arxiv.1904.03323 2019
[48]

Automatic text summarization of COVID-19 medical research articles using BERT and GPT-2

V. Kieuvongngam, B. Tan, and Y. Niu, “Automatic text summarization of COVID-19 medical research articles using BERT and GPT-2,” arXiv Prepr., 2020, doi: 10.48550/arXiv.2006.01997, [Online]. Available: https://arxiv.org/abs/2006.01997

work page doi:10.48550/arxiv.2006.01997 2020
[49]

Q. Li, D. Li, W. Nie, H. Jiao, Z. Wu, et al., “Temporal and spatial analysis in early sepsis prediction via causal disentanglements,” IEEE Trans. Knowl. Data Eng., to be published, 2025, doi: 10.1109/TKDE.2024.3401849

work page doi:10.1109/tkde.2024.3401849 2025
[50]

When scaling meets LLM finetuning: The effect of data, model and finetuning method

B. Zhang, Z. Liu, C. Cherry, and O. Firat, “When scaling meets LLM finetuning: the effect of data, model and finetuning method,” arXiv Prepr., 2024, doi: 10.48550/arXiv.2402.17193, [Online]. Available: https://arxiv.org/abs/2402.17193

work page doi:10.48550/arxiv.2402.17193 2024
[51]

Resnet in resnet: Generalizing residual architectures

S. Targ, D. Almeida, and K. Lyman, “Resnet in resnet: generalizing residual architectures,” arXiv Prepr., 2016, doi: 10.48550/arXiv.1603.08029, [Online]. Available: https://arxiv.org/abs/1603.08029

work page doi:10.48550/arxiv.1603.08029 2016
[52]

Mistral: Dynamically managing power, performance, and adaptation cost in cloud infrastructures

G. Jung, M. A. Hiltunen, K. R. Joshi, R. D. Schlichting, and C. Pu, “Mistral: Dynamically managing power, performance, and adaptation cost in cloud infrastructures,” in Proc. 2010 IEEE 30th Int. Conf. Distrib. Comput. Syst., Minneapolis, MN, USA, Jun. 2010, pp. 62–73, doi: 10.1109/ICDCS.2010.34

work page doi:10.1109/icdcs.2010.34 2010
[53]

A method for the time-varying nonlinear prediction of complex nonstationary biomedical signals

L. Faes, K. H. Chon, and G. Nollo, “A method for the time-varying nonlinear prediction of complex nonstationary biomedical signals,” IEEE Trans. Biomed. Eng., vol. 56, no. 2, pp. 206–209, Feb. 2009, doi: 10.1109/TBME.2008.2008726

work page doi:10.1109/tbme.2008.2008726 2009
[54]

A robust fusion model for estimating respiratory rate from photoplethysmography and electrocardiography

D. A. Birrenkott, M. A. F. Pimentel, P. J. Watkinson, and D. A. Clifton, “A robust fusion model for estimating respiratory rate from photoplethysmography and electrocardiography,” IEEE Trans. Biomed. Eng., vol. 65, no. 9, pp. 2033–2041, Sep. 2018, doi: 10.1109/TBME.2017.2778265

work page doi:10.1109/tbme.2017.2778265 2018
[55]

Stochastic complexity measures for physiological signal analysis

I. A. Rezek and S. J. Roberts, “Stochastic complexity measures for physiological signal analysis,” IEEE Trans. Biomed. Eng., vol. 45, no. 9, pp. 1186–1191, Sep. 1998, doi: 10.1109/10.718287

work page doi:10.1109/10.718287 1998
[56]

Y. Li, J. Li, Y. Li, and Q. Li, “Time series anomaly detection with adversarial reconstruction networks,” IEEE Trans. Knowl. Data Eng., vol. 35, no. 10, pp. 10245–10258, Oct. 2023, doi: 10.1109/TKDE.2021.3137861

work page doi:10.1109/tkde.2021.3137861 2023
[57]

Early detection of sepsis utilizing deep learning on electronic health record event sequences

S. M. Lauritsen, M. E. Kaløra, E. L. Kongsgaard, K. M. Lauritsen, M. J. Jørgensen, J. Lange, B. Thiesson, “Early detection of sepsis utilizing deep learning on electronic health record event sequences,” Artif. Intell. Med., vol. 104, pp. 101820, 2020, doi: 10.1016/j.artmed.2020.101820

work page doi:10.1016/j.artmed.2020.101820 2020