NPCNet: Navigator-Driven Pseudo Text for Deep Clustering of Early Sepsis Phenotyping

Charkkri Limbud; Kuan-Fu Chen; Pi-Ju Tsai; Yi-Ju Tseng

arxiv: 2602.03562 · v2 · submitted 2026-02-03 · 💻 cs.LG

NPCNet: Navigator-Driven Pseudo Text for Deep Clustering of Early Sepsis Phenotyping

Pi-Ju Tsai , Charkkri Limbud , Kuan-Fu Chen , Yi-Ju Tseng This is my paper

Pith reviewed 2026-05-16 08:05 UTC · model grok-4.3

classification 💻 cs.LG

keywords sepsis phenotypingdeep clusteringelectronic health recordspseudo textclinical knowledge infusionprecision medicinetemporal data modeling

0 comments

The pith

NPCNet clusters sepsis patients from EHRs into clinically meaningful phenotypes by converting records into pseudo texts guided by clinical knowledge.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces NPCNet to overcome how standard clustering of sepsis from electronic health records often aggregates or imputes data and thereby erases time-based disease patterns. It also notes that existing methods rarely build in medical constraints, so the resulting groups lack clear meaning for doctors. NPCNet first turns continuous measurements into discrete pseudo texts, combines them with static patient details to form embeddings, then uses a target navigator to add clinical knowledge through auxiliary tasks that steer the clusters toward real-world relevance. An iterative clustering operator then refines the groups under those constraints. If the approach holds, it supplies a route to subgroup patients early enough for tailored treatments instead of uniform sepsis protocols.

Core claim

NPCNet is a clustering network with a text embedding generator that discretizes continuous EHR measurements into pseudo texts integrated with static variables, a target navigator that infuses clinical knowledge via auxiliary tasks to align results with sepsis phenotypes, and a clustering operator that iteratively refines centroids and representations under domain-driven constraints, yielding superior results on both internal clustering benchmarks and clinical validity metrics.

What carries the argument

The target navigator, which infuses clinical knowledge into embeddings through auxiliary tasks to constrain clustering results toward clinically significant sepsis phenotypes.

If this is right

Clustering results align more closely with clinical significance than unconstrained methods.
Performance exceeds baselines on both statistical clustering metrics and clinical validity measures on public datasets.
The method supplies a practical pathway for identifying distinct sepsis phenotypes to support precision treatment strategies.
Temporal trajectories in the data are preserved better than in aggregation or imputation approaches.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same pseudo-text and navigator structure could be tested on phenotyping other time-varying syndromes such as acute respiratory distress or heart failure.
Embedding clinical constraints directly may shorten the usual cycle of post-hoc validation that follows purely data-driven clustering.
Real-time versions of the architecture might be examined for continuous monitoring systems where early phenotype shifts could trigger intervention.

Load-bearing premise

Discretizing continuous clinical measurements into pseudo texts and infusing clinical knowledge via auxiliary tasks will produce phenotypes that are both statistically coherent and clinically actionable without distorting key temporal trajectories.

What would settle it

A prospective study in which patients stratified by NPCNet phenotypes show no difference in treatment response or outcomes compared with standard care, or in which the derived clusters fail to separate on established clinical markers such as mortality or organ-failure scores.

Figures

Figures reproduced from arXiv: 2602.03562 by Charkkri Limbud, Kuan-Fu Chen, Pi-Ju Tsai, Yi-Ju Tseng.

**Figure 1.** Figure 1: The overview of NPCNet. Through the text embedding generator, we first bin the value of time-varying variables into bin indices according to the [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗

**Figure 2.** Figure 2: The binning process of time-varying variables to generate the pseudo [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: An example of the input for NPCNet. Finally, we sum up (1) the pseudo text embedding P with order encoding O, and (2) the static embedding S using different weights, resulting in the input x ∈ R l×d : x = w × (P + O) + (1 − w) × S, where w ∈ [0, 1] is a hyperparameter that controls the contribution of static and time-varying variables. The example is shown in [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: Abnormal clinical variables, grouped into eight organ systems, among the sepsis computable phenotypes. The ribbon connects from a phenotype to [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Multivariable logistic regression on in-hospital mortality with phenotypes. [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗

**Figure 6.** Figure 6: SOFA trajectories during the 18 hours following phenotype derivation by NPCNet, stratified by the SOFA score at six hours after ICU admission. [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗

**Figure 7.** Figure 7: Pairwise comparisons of SOFA trajectories between four phenotypes [PITH_FULL_IMAGE:figures/full_fig_p011_7.png] view at source ↗

**Figure 8.** Figure 8: Ablation study of navigators. Pairwise comparisons of SOFA trajec [PITH_FULL_IMAGE:figures/full_fig_p011_8.png] view at source ↗

read the original abstract

Electronic Health Records (EHRs) provide high-dimensional temporal data essential for patient modeling; however, conventional algorithmic approaches often rely on data aggregation or imputation, which distorts temporal disease trajectories. Such computational limitations are particularly critical in sepsis, a heterogeneous syndrome where clustering-based stratification plays a key role in identifying clinically distinct phenotypes for precise treatment strategies. Furthermore, existing clustering processes seldom incorporate domain-driven constraints, often resulting in phenotypes that lack clear clinical distinction. We propose a novel clustering network, NPCNet, that comprises a text embedding generator, a clustering operator, and a target navigator. We first transform EHRs into pseudo texts by discretizing continuous clinical measurements, then integrate them with static variables to construct the embeddings. The target navigator then infuses clinical knowledge into the embeddings through auxiliary tasks, constraining clustering results to better align sepsis phenotypes with clinical significance. Finally, the clustering operator employs an iterative refinement mechanism to jointly optimize phenotype centroids and patient representations under domain-driven constraints. Extensive experiments on public datasets validate that NPCNet achieves superior performance on both internal clustering benchmarks and clinical validity metrics, offering a viable pathway for precision treatment strategies in the management of sepsis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

NPCNet turns EHRs into pseudo-text for sepsis clustering and adds a navigator to pull in clinical constraints, but the discretization step has no supporting checks.

read the letter

The main takeaway is that NPCNet discretizes EHR data into pseudo-text for embedding, then uses a target navigator to infuse clinical knowledge into the clustering process for sepsis phenotypes. What the paper does well is address a real clinical need. Sepsis is heterogeneous, and better phenotypes could guide treatment. The combination of text-like representation with auxiliary tasks to enforce domain constraints is a reasonable way to move beyond pure data-driven clusters. The iterative refinement sounds like a practical optimization step. The architecture is new enough in its specific coupling of these elements for this problem. Where it falls short is on the discretization. The abstract does not specify how the bin boundaries are chosen or provide any sensitivity analysis. If the bins are too coarse, short-term changes in patient state get lost, which could undermine the claim that trajectories are preserved. No comparison to a non-discretized baseline is mentioned either. That makes the superiority on internal and clinical metrics harder to accept at face value. The experiments are described as extensive on public datasets, but without seeing the actual numbers or error bars, it's difficult to judge the effect sizes. This work is aimed at researchers in critical care informatics who are exploring ways to incorporate clinical knowledge into unsupervised learning. It could be useful for someone building similar systems, but probably not ready for direct application. I would send it for peer review because the problem is important and the idea has potential, though it clearly needs more validation on the representation choices.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes NPCNet, a deep clustering framework for early sepsis phenotyping from EHR temporal data. It converts continuous clinical measurements into discrete pseudo-text tokens via discretization, combines them with static variables for embeddings, employs a target navigator to infuse clinical knowledge through auxiliary tasks that constrain the phenotypes, and uses an iterative clustering operator to jointly optimize centroids and representations. The central claim is that this yields superior performance on internal clustering benchmarks and clinical validity metrics compared to prior methods, enabling better precision treatment strategies for heterogeneous sepsis.

Significance. If the empirical claims hold after addressing the discretization concerns, the work could meaningfully advance clinical phenotyping by incorporating domain-driven constraints into deep clustering of temporal EHR data. Sepsis stratification remains a high-impact problem, and the navigator-plus-pseudo-text idea offers a concrete mechanism for aligning statistical clusters with clinical actionability. The approach is novel in its explicit use of auxiliary clinical tasks to regularize clustering without direct circular reuse of the objective, and the iterative refinement mechanism is a standard but well-motivated choice here.

major comments (3)

[§3.1] §3.1 (discretization and pseudo-text generation): the conversion of continuous vital signs and labs into discrete tokens is described at a high level without specifying bin boundaries, the number of bins, or the selection procedure. This step is load-bearing for the claim that temporal trajectories are preserved; without sensitivity analysis on bin count/boundaries or an ablation against a continuous time-series embedding baseline, it is impossible to rule out that reported gains on clustering metrics are artifacts of the chosen representation rather than the navigator or clustering operator.
[§4] §4 (experiments and ablations): the superiority on internal clustering benchmarks and clinical validity metrics is asserted, yet the text provides no quantitative tables with exact metric values, standard deviations, or ablation results isolating the navigator's auxiliary tasks versus the discretization alone. A direct comparison to a non-discretized continuous baseline is required to substantiate that the pseudo-text step does not distort short-term dynamics that distinguish sepsis subtypes.
[§3.2] §3.2 (target navigator and auxiliary tasks): the formulation of the auxiliary clinical-knowledge tasks and their loss terms is not given in sufficient detail to verify independence from the main clustering objective. If the auxiliary losses inadvertently reuse clustering-derived signals, the reported clinical alignment could be circular; explicit equations for these losses and a statement of their parameter independence from the phenotype centroids are needed.

minor comments (2)

The abstract would be strengthened by including at least one key quantitative result (e.g., ARI or NMI improvement) to support the superiority claim.
Notation for the embedding generator and navigator components should be introduced with a single consistent symbol table to improve readability across sections.

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the constructive feedback on our manuscript. We appreciate the referee's identification of areas needing clarification and additional validation. We address each major comment below and will incorporate the requested details and experiments in the revised version.

read point-by-point responses

Referee: [§3.1] §3.1 (discretization and pseudo-text generation): the conversion of continuous vital signs and labs into discrete tokens is described at a high level without specifying bin boundaries, the number of bins, or the selection procedure. This step is load-bearing for the claim that temporal trajectories are preserved; without sensitivity analysis on bin count/boundaries or an ablation against a continuous time-series embedding baseline, it is impossible to rule out that reported gains on clustering metrics are artifacts of the chosen representation rather than the navigator or clustering operator.

Authors: We agree that additional detail on discretization is necessary. In the revised manuscript, we will specify the binning procedure (equal-frequency discretization into 5 bins per variable, with boundaries derived from training-set quantiles), the exact number of bins, and the selection rationale. We will also add a sensitivity analysis across bin counts (3-7) and an ablation comparing NPCNet to a continuous baseline that embeds raw temporal data directly via a GRU encoder without pseudo-text conversion. This will confirm that performance gains arise from the navigator and iterative clustering rather than the discretization step alone. revision: yes
Referee: [§4] §4 (experiments and ablations): the superiority on internal clustering benchmarks and clinical validity metrics is asserted, yet the text provides no quantitative tables with exact metric values, standard deviations, or ablation results isolating the navigator's auxiliary tasks versus the discretization alone. A direct comparison to a non-discretized continuous baseline is required to substantiate that the pseudo-text step does not distort short-term dynamics that distinguish sepsis subtypes.

Authors: We acknowledge the need for fuller empirical reporting. The revised manuscript will include complete tables with exact metric values (NMI, ARI, silhouette score, and clinical validity measures) reported as means ± standard deviations over 5 random seeds. We will add ablations that isolate the navigator's auxiliary tasks (with/without them) and a direct comparison to a non-discretized continuous time-series baseline. These results will demonstrate that the pseudo-text representation preserves distinguishing short-term dynamics while the navigator provides the primary alignment benefit. revision: yes
Referee: [§3.2] §3.2 (target navigator and auxiliary tasks): the formulation of the auxiliary clinical-knowledge tasks and their loss terms is not given in sufficient detail to verify independence from the main clustering objective. If the auxiliary losses inadvertently reuse clustering-derived signals, the reported clinical alignment could be circular; explicit equations for these losses and a statement of their parameter independence from the phenotype centroids are needed.

Authors: We will expand §3.2 with explicit equations for the auxiliary losses (e.g., cross-entropy on clinical outcome prediction and cosine alignment with external knowledge embeddings). These tasks draw from independent clinical labels and knowledge bases that do not incorporate clustering centroids. The revision will include a statement confirming that navigator parameters are updated via a separate optimizer path with no direct dependence on or reuse of phenotype centroid signals, ensuring the auxiliary objectives remain non-circular. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The NPCNet architecture is presented as a composition of three distinct modules (text embedding generator from discretized EHRs, target navigator via auxiliary clinical-knowledge tasks, and clustering operator with iterative refinement) whose interactions are described procedurally rather than through self-referential equations. No quantity is defined in terms of itself, no fitted parameter is relabeled as a prediction, and no uniqueness theorem or ansatz is imported via self-citation to force the central design choices. The auxiliary tasks are explicitly positioned as external clinical constraints, and the reported performance gains rest on experimental validation against public datasets rather than on any reduction of the method to its own inputs. Discretization is treated as a preprocessing decision whose validity is left to empirical checks, not as a derived result that loops back to the clustering objective.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Abstract supplies insufficient detail for exhaustive ledger; main unstated premises concern information preservation under discretization and the independence of clinical auxiliary tasks from the clustering loss.

free parameters (1)

discretization bin boundaries
Continuous measurements must be mapped to discrete tokens; exact thresholds are not specified and would require fitting or domain choice.

axioms (1)

domain assumption Discretization of temporal clinical variables into pseudo-text preserves sufficient information for phenotype discovery
Invoked when converting EHRs to text embeddings; if false, downstream clustering would lose critical trajectory signals.

pith-pipeline@v0.9.0 · 5515 in / 1239 out tokens · 43766 ms · 2026-05-16T08:05:42.653412+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We first transform EHRs into pseudo texts by discretizing continuous clinical measurements... binning task... quantiles of the training set... [VARIABLE][BIN] format
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

L = λ1 * Lrec + λ2 * Lclustering + λ3 * Lnavigator
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Trajectory Divergence Index (TDI) ... SOFA trajectories ... GAMM

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages

[1]

An evaluation of time series summary statistics as features for clinical prediction tasks,

C. Guo, M. Lu, and J. Chen, “An evaluation of time series summary statistics as features for clinical prediction tasks,”BMC medical infor- matics and decision making, vol. 20, pp. 1–20, 2020

work page 2020
[2]

Informative missingness in electronic health record systems: the curse of knowing,

R. H. Groenwold, “Informative missingness in electronic health record systems: the curse of knowing,”Diagnostic and prognostic research, vol. 4, no. 1, p. 8, 2020

work page 2020
[3]

Deep learning for temporal data represen- tation in electronic health records: A systematic review of challenges and methodologies,

F. Xie, H. Yuan, Y . Ning, M. E. H. Ong, M. Feng, W. Hsu, B. Chakraborty, and N. Liu, “Deep learning for temporal data represen- tation in electronic health records: A systematic review of challenges and methodologies,”Journal of biomedical informatics, vol. 126, p. 103980, 2022

work page 2022
[4]

The third international consensus definitions for sepsis and septic shock (sepsis-3),

M. Singer, C. S. Deutschman, C. W. Seymour, M. Shankar-Hari, D. Annane, M. Bauer, R. Bellomo, G. R. Bernard, J.-D. Chiche, C. M. Coopersmithet al., “The third international consensus definitions for sepsis and septic shock (sepsis-3),”Jama, vol. 315, no. 8, pp. 801–810, 2016

work page 2016
[5]

Global, regional, and national sepsis incidence and mortality, 1990–2017: analysis for the global burden of disease study,

K. E. Rudd, S. C. Johnson, K. M. Agesa, K. A. Shackelford, D. Tsoi, D. R. Kievlan, D. V . Colombara, K. S. Ikuta, N. Kissoon, S. Finfer et al., “Global, regional, and national sepsis incidence and mortality, 1990–2017: analysis for the global burden of disease study,”The Lancet, vol. 395, no. 10219, pp. 200–211, 2020. 13

work page 1990
[6]

Under- standing and enhancing sepsis survivorship. priorities for research and practice,

H. C. Prescott, T. J. Iwashyna, B. Blackwood, T. Calandra, L. L. Chlan, K. Choong, B. Connolly, P. Dark, L. Ferrucci, S. Finferet al., “Under- standing and enhancing sepsis survivorship. priorities for research and practice,”American journal of respiratory and critical care medicine, vol. 200, no. 8, pp. 972–981, 2019

work page 2019
[7]

Improving long-term outcomes after sepsis,

H. C. Prescott and D. K. Costa, “Improving long-term outcomes after sepsis,”Critical care clinics, vol. 34, no. 1, p. 175, 2017

work page 2017
[8]

Surviving sepsis campaign: international guidelines for manage- ment of sepsis and septic shock 2021,

L. Evans, A. Rhodes, W. Alhazzani, M. Antonelli, C. M. Coopersmith, C. French, F. R. Machado, L. Mcintyre, M. Ostermann, H. C. Prescott et al., “Surviving sepsis campaign: international guidelines for manage- ment of sepsis and septic shock 2021,”Critical care medicine, vol. 49, no. 11, pp. e1063–e1143, 2021

work page 2021
[9]

Time to treatment and mortality during mandated emergency care for sepsis,

C. W. Seymour, F. Gesten, H. C. Prescott, M. E. Friedrich, T. J. Iwashyna, G. S. Phillips, S. Lemeshow, T. Osborn, K. M. Terry, and M. M. Levy, “Time to treatment and mortality during mandated emergency care for sepsis,”New England Journal of Medicine, vol. 376, no. 23, pp. 2235–2244, 2017

work page 2017
[10]

The pathophysiology of sepsis and precision- medicine-based immunotherapy,

E. J. Giamarellos-Bourboulis, A. C. Aschenbrenner, M. Bauer, C. Bock, T. Calandra, I. Gat-Viks, E. Kyriazopoulou, M. Lupse, G. Monneret, P. Pickkerset al., “The pathophysiology of sepsis and precision- medicine-based immunotherapy,”Nature immunology, vol. 25, no. 1, pp. 19–28, 2024

work page 2024
[11]

Derivation, validation, and potential treatment implications of novel clinical phenotypes for sepsis,

C. W. Seymour, J. N. Kennedy, S. Wang, C.-C. H. Chang, C. F. Elliott, Z. Xu, S. Berry, G. Clermont, G. Cooper, H. Gomezet al., “Derivation, validation, and potential treatment implications of novel clinical phenotypes for sepsis,”Jama, vol. 321, no. 20, pp. 2003–2017, 2019

work page 2003
[12]

Sepsis subphenotyping based on organ dysfunction trajectory,

Z. Xu, C. Mao, C. Su, H. Zhang, I. Siempos, L. K. Torres, D. Pan, Y . Luo, E. J. Schenck, and F. Wang, “Sepsis subphenotyping based on organ dysfunction trajectory,”Critical Care, vol. 26, no. 1, p. 197, 2022

work page 2022
[13]

Identifying sepsis subpheno- types via time-aware multi-modal auto-encoder,

C. Yin, R. Liu, D. Zhang, and P. Zhang, “Identifying sepsis subpheno- types via time-aware multi-modal auto-encoder,” inProceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, 2020, pp. 862–872

work page 2020
[14]

Sur la division des corps mat ´eriels en parties,

H. Steinhaus, “Sur la division des corps mat ´eriels en parties,”Bull. Acad. Pol. Sci., Cl. III, vol. 4, pp. 801–804, 1957

work page 1957
[15]

Unsupervised deep embedding for clustering analysis,

J. Xie, R. Girshick, and A. Farhadi, “Unsupervised deep embedding for clustering analysis,” inInternational conference on machine learning. PMLR, 2016, pp. 478–487

work page 2016
[16]

Deep learning predic- tion models based on ehr trajectories: A systematic review,

A. Amirahmadi, M. Ohlsson, and K. Etminani, “Deep learning predic- tion models based on ehr trajectories: A systematic review,”Journal of biomedical informatics, vol. 144, p. 104430, 2023

work page 2023
[17]

Temporal phenotype matrix engineering for electronic health records–enhancing coronary artery disease prediction,

K.-H. Liu, C.-Y . Chiang, H.-Y . Wang, and Y .-J. Tseng, “Temporal phenotype matrix engineering for electronic health records–enhancing coronary artery disease prediction,” in2023 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI). IEEE, 2023, pp. 1–4

work page 2023
[18]

K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data,

A. M. Ikotun, A. E. Ezugwu, L. Abualigah, B. Abuhaija, and J. Hem- ing, “K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data,”Information Sciences, vol. 622, pp. 178–210, 2023

work page 2023
[19]

Comprehensive analysis of clustering algorithms: exploring limitations and innovative solutions,

A. A. Wani, “Comprehensive analysis of clustering algorithms: exploring limitations and innovative solutions,”PeerJ Computer Science, vol. 10, p. e2286, 2024

work page 2024
[20]

Towards k- means-friendly spaces: Simultaneous deep learning and clustering,

B. Yang, X. Fu, N. D. Sidiropoulos, and M. Hong, “Towards k- means-friendly spaces: Simultaneous deep learning and clustering,” in international conference on machine learning. PMLR, 2017, pp. 3861– 3870

work page 2017
[21]

Birds have four legs?! numersense: Probing numerical commonsense knowledge of pre-trained language models,

B. Y . Lin, S. Lee, R. Khanna, and X. Ren, “Birds have four legs?! numersense: Probing numerical commonsense knowledge of pre-trained language models,”arXiv preprint arXiv:2005.00683, 2020

work page arXiv 2005
[22]

Tabllm: Few-shot classification of tabular data with large language models,

S. Hegselmann, A. Buendia, H. Lang, M. Agrawal, X. Jiang, and D. Sontag, “Tabllm: Few-shot classification of tabular data with large language models,” inInternational Conference on Artificial Intelligence and Statistics. PMLR, 2023, pp. 5549–5581

work page 2023
[23]

Attention is all you need,

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,”Advances in neural information processing systems, vol. 30, 2017

work page 2017
[24]

Exbehrt: Extended transformer for electronic health records,

M. Rupp, O. Peter, and T. Pattipaka, “Exbehrt: Extended transformer for electronic health records,” inInternational Workshop on Trustworthy Machine Learning for Healthcare. Springer, 2023, pp. 73–84

work page 2023
[25]

navidcn: Navigator- guided multi-modal deep clustering for sepsis phenotyping in early icu admission,

P.-J. Tsai, K.-F. Chen, C. Limbud, and Y .-J. Tseng, “navidcn: Navigator- guided multi-modal deep clustering for sepsis phenotyping in early icu admission,” in2025 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2025, pp. 1–7

work page 2025
[26]

Vincent, R

J.-L. Vincent, R. Moreno, J. Takala, S. Willatts, A. De Mendonc ¸a, H. Bruining, C. Reinhart, P. Suter, and L. G. Thijs, “The sofa (sepsis- related organ failure assessment) score to describe organ dysfunction/- failure: On behalf of the working group on sepsis-related problems of the european society of intensive care medicine (see contributors to the pr...

work page 1996
[27]

Serial evaluation of the sofa score to predict outcome in critically ill patients,

F. L. Ferreira, D. P. Bota, A. Bross, C. M ´elot, and J.-L. Vincent, “Serial evaluation of the sofa score to predict outcome in critically ill patients,” Jama, vol. 286, no. 14, pp. 1754–1758, 2001

work page 2001
[28]

Mimic- iv, a freely accessible electronic health record dataset,

A. E. Johnson, L. Bulgarelli, L. Shen, A. Gayles, A. Shammout, S. Horng, T. J. Pollard, S. Hao, B. Moody, B. Gowet al., “Mimic- iv, a freely accessible electronic health record dataset,”Scientific data, vol. 10, no. 1, p. 1, 2023

work page 2023
[29]

Developing a new definition and assessing new clinical criteria for septic shock: for the third international consensus definitions for sepsis and septic shock (sepsis-3),

M. Shankar-Hari, G. S. Phillips, M. L. Levy, C. W. Seymour, V . X. Liu, C. S. Deutschman, D. C. Angus, G. D. Rubenfeld, M. Singeret al., “Developing a new definition and assessing new clinical criteria for septic shock: for the third international consensus definitions for sepsis and septic shock (sepsis-3),”Jama, vol. 315, no. 8, pp. 775–787, 2016

work page 2016
[30]

Identification of subclasses of sepsis that showed different clinical outcomes and responses to amount of fluid resuscitation: a latent profile analysis,

Z. Zhang, G. Zhang, H. Goyal, L. Mo, and Y . Hong, “Identification of subclasses of sepsis that showed different clinical outcomes and responses to amount of fluid resuscitation: a latent profile analysis,” Critical Care, vol. 22, pp. 1–11, 2018

work page 2018
[31]

Delayed vasopressor initiation is associated with increased mortality in patients with septic shock,

D. C. Hidalgo, J. Patel, D. Masic, D. Park, and M. A. Rech, “Delayed vasopressor initiation is associated with increased mortality in patients with septic shock,”Journal of Critical Care, vol. 55, pp. 145–148, 2020

work page 2020
[32]

The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care,

M. Komorowski, L. A. Celi, O. Badawi, A. C. Gordon, and A. A. Faisal, “The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care,”Nature medicine, vol. 24, no. 11, pp. 1716– 1720, 2018

work page 2018
[33]

The eicu collaborative research database, a freely available multi-center database for critical care research,

T. J. Pollard, A. E. Johnson, J. D. Raffa, L. A. Celi, R. G. Mark, and O. Badawi, “The eicu collaborative research database, a freely available multi-center database for critical care research,”Scientific data, vol. 5, no. 1, pp. 1–13, 2018

work page 2018
[34]

Predicting sepsis using deep learning across international sites: a retrospective development and validation study,

M. Moor, N. Bennett, D. Ple ˇcko, M. Horn, B. Rieck, N. Meinshausen, P. B¨uhlmann, and K. Borgwardt, “Predicting sepsis using deep learning across international sites: a retrospective development and validation study,”EClinicalMedicine, vol. 62, 2023

work page 2023
[35]

Consensusclusterplus: a class discovery tool with confidence assessments and item tracking,

M. D. Wilkerson and D. N. Hayes, “Consensusclusterplus: a class discovery tool with confidence assessments and item tracking,”Bioin- formatics, vol. 26, no. 12, pp. 1572–1573, 2010

work page 2010
[36]

Deep k-means: Jointly clus- tering with k-means and learning representations,

M. M. Fard, T. Thonet, and E. Gaussier, “Deep k-means: Jointly clus- tering with k-means and learning representations,”Pattern Recognition Letters, vol. 138, pp. 185–192, 2020

work page 2020
[37]

mice: Multivariate impu- tation by chained equations in r,

S. Van Buuren and K. Groothuis-Oudshoorn, “mice: Multivariate impu- tation by chained equations in r,”Journal of statistical software, vol. 45, pp. 1–67, 2011

work page 2011
[38]

” circlize

Z. Gu, L. Gu, R. Eils, M. Schlesner, and B. Brors, “” circlize” implements and enhances circular visualization in r,” 2014

work page 2014
[39]

The sofa score—development, utility and challenges of accurate assessment in clinical trials,

S. Lambden, P. F. Laterre, M. M. Levy, and B. Francois, “The sofa score—development, utility and challenges of accurate assessment in clinical trials,”Critical Care, vol. 23, pp. 1–9, 2019

work page 2019
[40]

Vasopressor therapy in critically ill patients with shock,

J. A. Russell, “Vasopressor therapy in critically ill patients with shock,” Intensive care medicine, vol. 45, no. 11, pp. 1503–1517, 2019

work page 2019
[41]

Current use of vasopressors in septic shock,

T. W. Scheeren, J. Bakker, D. De Backer, D. Annane, P. Asfar, E. C. Boerma, M. Cecconi, A. Dubin, M. W. D ¨unser, J. Duranteauet al., “Current use of vasopressors in septic shock,”Annals of intensive care, vol. 9, pp. 1–12, 2019

work page 2019
[42]

Vasopressors in septic shock: which, when, and how much?

R. Shi, O. Hamzaoui, N. De Vita, X. Monnet, and J.-L. Teboul, “Vasopressors in septic shock: which, when, and how much?”Annals of translational medicine, vol. 8, no. 12, p. 794, 2020

work page 2020

[1] [1]

An evaluation of time series summary statistics as features for clinical prediction tasks,

C. Guo, M. Lu, and J. Chen, “An evaluation of time series summary statistics as features for clinical prediction tasks,”BMC medical infor- matics and decision making, vol. 20, pp. 1–20, 2020

work page 2020

[2] [2]

Informative missingness in electronic health record systems: the curse of knowing,

R. H. Groenwold, “Informative missingness in electronic health record systems: the curse of knowing,”Diagnostic and prognostic research, vol. 4, no. 1, p. 8, 2020

work page 2020

[3] [3]

Deep learning for temporal data represen- tation in electronic health records: A systematic review of challenges and methodologies,

F. Xie, H. Yuan, Y . Ning, M. E. H. Ong, M. Feng, W. Hsu, B. Chakraborty, and N. Liu, “Deep learning for temporal data represen- tation in electronic health records: A systematic review of challenges and methodologies,”Journal of biomedical informatics, vol. 126, p. 103980, 2022

work page 2022

[4] [4]

The third international consensus definitions for sepsis and septic shock (sepsis-3),

M. Singer, C. S. Deutschman, C. W. Seymour, M. Shankar-Hari, D. Annane, M. Bauer, R. Bellomo, G. R. Bernard, J.-D. Chiche, C. M. Coopersmithet al., “The third international consensus definitions for sepsis and septic shock (sepsis-3),”Jama, vol. 315, no. 8, pp. 801–810, 2016

work page 2016

[5] [5]

Global, regional, and national sepsis incidence and mortality, 1990–2017: analysis for the global burden of disease study,

K. E. Rudd, S. C. Johnson, K. M. Agesa, K. A. Shackelford, D. Tsoi, D. R. Kievlan, D. V . Colombara, K. S. Ikuta, N. Kissoon, S. Finfer et al., “Global, regional, and national sepsis incidence and mortality, 1990–2017: analysis for the global burden of disease study,”The Lancet, vol. 395, no. 10219, pp. 200–211, 2020. 13

work page 1990

[6] [6]

Under- standing and enhancing sepsis survivorship. priorities for research and practice,

H. C. Prescott, T. J. Iwashyna, B. Blackwood, T. Calandra, L. L. Chlan, K. Choong, B. Connolly, P. Dark, L. Ferrucci, S. Finferet al., “Under- standing and enhancing sepsis survivorship. priorities for research and practice,”American journal of respiratory and critical care medicine, vol. 200, no. 8, pp. 972–981, 2019

work page 2019

[7] [7]

Improving long-term outcomes after sepsis,

H. C. Prescott and D. K. Costa, “Improving long-term outcomes after sepsis,”Critical care clinics, vol. 34, no. 1, p. 175, 2017

work page 2017

[8] [8]

Surviving sepsis campaign: international guidelines for manage- ment of sepsis and septic shock 2021,

L. Evans, A. Rhodes, W. Alhazzani, M. Antonelli, C. M. Coopersmith, C. French, F. R. Machado, L. Mcintyre, M. Ostermann, H. C. Prescott et al., “Surviving sepsis campaign: international guidelines for manage- ment of sepsis and septic shock 2021,”Critical care medicine, vol. 49, no. 11, pp. e1063–e1143, 2021

work page 2021

[9] [9]

Time to treatment and mortality during mandated emergency care for sepsis,

C. W. Seymour, F. Gesten, H. C. Prescott, M. E. Friedrich, T. J. Iwashyna, G. S. Phillips, S. Lemeshow, T. Osborn, K. M. Terry, and M. M. Levy, “Time to treatment and mortality during mandated emergency care for sepsis,”New England Journal of Medicine, vol. 376, no. 23, pp. 2235–2244, 2017

work page 2017

[10] [10]

The pathophysiology of sepsis and precision- medicine-based immunotherapy,

E. J. Giamarellos-Bourboulis, A. C. Aschenbrenner, M. Bauer, C. Bock, T. Calandra, I. Gat-Viks, E. Kyriazopoulou, M. Lupse, G. Monneret, P. Pickkerset al., “The pathophysiology of sepsis and precision- medicine-based immunotherapy,”Nature immunology, vol. 25, no. 1, pp. 19–28, 2024

work page 2024

[11] [11]

Derivation, validation, and potential treatment implications of novel clinical phenotypes for sepsis,

C. W. Seymour, J. N. Kennedy, S. Wang, C.-C. H. Chang, C. F. Elliott, Z. Xu, S. Berry, G. Clermont, G. Cooper, H. Gomezet al., “Derivation, validation, and potential treatment implications of novel clinical phenotypes for sepsis,”Jama, vol. 321, no. 20, pp. 2003–2017, 2019

work page 2003

[12] [12]

Sepsis subphenotyping based on organ dysfunction trajectory,

Z. Xu, C. Mao, C. Su, H. Zhang, I. Siempos, L. K. Torres, D. Pan, Y . Luo, E. J. Schenck, and F. Wang, “Sepsis subphenotyping based on organ dysfunction trajectory,”Critical Care, vol. 26, no. 1, p. 197, 2022

work page 2022

[13] [13]

Identifying sepsis subpheno- types via time-aware multi-modal auto-encoder,

C. Yin, R. Liu, D. Zhang, and P. Zhang, “Identifying sepsis subpheno- types via time-aware multi-modal auto-encoder,” inProceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, 2020, pp. 862–872

work page 2020

[14] [14]

Sur la division des corps mat ´eriels en parties,

H. Steinhaus, “Sur la division des corps mat ´eriels en parties,”Bull. Acad. Pol. Sci., Cl. III, vol. 4, pp. 801–804, 1957

work page 1957

[15] [15]

Unsupervised deep embedding for clustering analysis,

J. Xie, R. Girshick, and A. Farhadi, “Unsupervised deep embedding for clustering analysis,” inInternational conference on machine learning. PMLR, 2016, pp. 478–487

work page 2016

[16] [16]

Deep learning predic- tion models based on ehr trajectories: A systematic review,

A. Amirahmadi, M. Ohlsson, and K. Etminani, “Deep learning predic- tion models based on ehr trajectories: A systematic review,”Journal of biomedical informatics, vol. 144, p. 104430, 2023

work page 2023

[17] [17]

Temporal phenotype matrix engineering for electronic health records–enhancing coronary artery disease prediction,

K.-H. Liu, C.-Y . Chiang, H.-Y . Wang, and Y .-J. Tseng, “Temporal phenotype matrix engineering for electronic health records–enhancing coronary artery disease prediction,” in2023 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI). IEEE, 2023, pp. 1–4

work page 2023

[18] [18]

K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data,

A. M. Ikotun, A. E. Ezugwu, L. Abualigah, B. Abuhaija, and J. Hem- ing, “K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data,”Information Sciences, vol. 622, pp. 178–210, 2023

work page 2023

[19] [19]

Comprehensive analysis of clustering algorithms: exploring limitations and innovative solutions,

A. A. Wani, “Comprehensive analysis of clustering algorithms: exploring limitations and innovative solutions,”PeerJ Computer Science, vol. 10, p. e2286, 2024

work page 2024

[20] [20]

Towards k- means-friendly spaces: Simultaneous deep learning and clustering,

B. Yang, X. Fu, N. D. Sidiropoulos, and M. Hong, “Towards k- means-friendly spaces: Simultaneous deep learning and clustering,” in international conference on machine learning. PMLR, 2017, pp. 3861– 3870

work page 2017

[21] [21]

Birds have four legs?! numersense: Probing numerical commonsense knowledge of pre-trained language models,

B. Y . Lin, S. Lee, R. Khanna, and X. Ren, “Birds have four legs?! numersense: Probing numerical commonsense knowledge of pre-trained language models,”arXiv preprint arXiv:2005.00683, 2020

work page arXiv 2005

[22] [22]

Tabllm: Few-shot classification of tabular data with large language models,

S. Hegselmann, A. Buendia, H. Lang, M. Agrawal, X. Jiang, and D. Sontag, “Tabllm: Few-shot classification of tabular data with large language models,” inInternational Conference on Artificial Intelligence and Statistics. PMLR, 2023, pp. 5549–5581

work page 2023

[23] [23]

Attention is all you need,

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,”Advances in neural information processing systems, vol. 30, 2017

work page 2017

[24] [24]

Exbehrt: Extended transformer for electronic health records,

M. Rupp, O. Peter, and T. Pattipaka, “Exbehrt: Extended transformer for electronic health records,” inInternational Workshop on Trustworthy Machine Learning for Healthcare. Springer, 2023, pp. 73–84

work page 2023

[25] [25]

navidcn: Navigator- guided multi-modal deep clustering for sepsis phenotyping in early icu admission,

P.-J. Tsai, K.-F. Chen, C. Limbud, and Y .-J. Tseng, “navidcn: Navigator- guided multi-modal deep clustering for sepsis phenotyping in early icu admission,” in2025 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2025, pp. 1–7

work page 2025

[26] [26]

Vincent, R

J.-L. Vincent, R. Moreno, J. Takala, S. Willatts, A. De Mendonc ¸a, H. Bruining, C. Reinhart, P. Suter, and L. G. Thijs, “The sofa (sepsis- related organ failure assessment) score to describe organ dysfunction/- failure: On behalf of the working group on sepsis-related problems of the european society of intensive care medicine (see contributors to the pr...

work page 1996

[27] [27]

Serial evaluation of the sofa score to predict outcome in critically ill patients,

F. L. Ferreira, D. P. Bota, A. Bross, C. M ´elot, and J.-L. Vincent, “Serial evaluation of the sofa score to predict outcome in critically ill patients,” Jama, vol. 286, no. 14, pp. 1754–1758, 2001

work page 2001

[28] [28]

Mimic- iv, a freely accessible electronic health record dataset,

A. E. Johnson, L. Bulgarelli, L. Shen, A. Gayles, A. Shammout, S. Horng, T. J. Pollard, S. Hao, B. Moody, B. Gowet al., “Mimic- iv, a freely accessible electronic health record dataset,”Scientific data, vol. 10, no. 1, p. 1, 2023

work page 2023

[29] [29]

Developing a new definition and assessing new clinical criteria for septic shock: for the third international consensus definitions for sepsis and septic shock (sepsis-3),

M. Shankar-Hari, G. S. Phillips, M. L. Levy, C. W. Seymour, V . X. Liu, C. S. Deutschman, D. C. Angus, G. D. Rubenfeld, M. Singeret al., “Developing a new definition and assessing new clinical criteria for septic shock: for the third international consensus definitions for sepsis and septic shock (sepsis-3),”Jama, vol. 315, no. 8, pp. 775–787, 2016

work page 2016

[30] [30]

Identification of subclasses of sepsis that showed different clinical outcomes and responses to amount of fluid resuscitation: a latent profile analysis,

Z. Zhang, G. Zhang, H. Goyal, L. Mo, and Y . Hong, “Identification of subclasses of sepsis that showed different clinical outcomes and responses to amount of fluid resuscitation: a latent profile analysis,” Critical Care, vol. 22, pp. 1–11, 2018

work page 2018

[31] [31]

Delayed vasopressor initiation is associated with increased mortality in patients with septic shock,

D. C. Hidalgo, J. Patel, D. Masic, D. Park, and M. A. Rech, “Delayed vasopressor initiation is associated with increased mortality in patients with septic shock,”Journal of Critical Care, vol. 55, pp. 145–148, 2020

work page 2020

[32] [32]

The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care,

M. Komorowski, L. A. Celi, O. Badawi, A. C. Gordon, and A. A. Faisal, “The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care,”Nature medicine, vol. 24, no. 11, pp. 1716– 1720, 2018

work page 2018

[33] [33]

The eicu collaborative research database, a freely available multi-center database for critical care research,

T. J. Pollard, A. E. Johnson, J. D. Raffa, L. A. Celi, R. G. Mark, and O. Badawi, “The eicu collaborative research database, a freely available multi-center database for critical care research,”Scientific data, vol. 5, no. 1, pp. 1–13, 2018

work page 2018

[34] [34]

Predicting sepsis using deep learning across international sites: a retrospective development and validation study,

M. Moor, N. Bennett, D. Ple ˇcko, M. Horn, B. Rieck, N. Meinshausen, P. B¨uhlmann, and K. Borgwardt, “Predicting sepsis using deep learning across international sites: a retrospective development and validation study,”EClinicalMedicine, vol. 62, 2023

work page 2023

[35] [35]

Consensusclusterplus: a class discovery tool with confidence assessments and item tracking,

M. D. Wilkerson and D. N. Hayes, “Consensusclusterplus: a class discovery tool with confidence assessments and item tracking,”Bioin- formatics, vol. 26, no. 12, pp. 1572–1573, 2010

work page 2010

[36] [36]

Deep k-means: Jointly clus- tering with k-means and learning representations,

M. M. Fard, T. Thonet, and E. Gaussier, “Deep k-means: Jointly clus- tering with k-means and learning representations,”Pattern Recognition Letters, vol. 138, pp. 185–192, 2020

work page 2020

[37] [37]

mice: Multivariate impu- tation by chained equations in r,

S. Van Buuren and K. Groothuis-Oudshoorn, “mice: Multivariate impu- tation by chained equations in r,”Journal of statistical software, vol. 45, pp. 1–67, 2011

work page 2011

[38] [38]

” circlize

Z. Gu, L. Gu, R. Eils, M. Schlesner, and B. Brors, “” circlize” implements and enhances circular visualization in r,” 2014

work page 2014

[39] [39]

The sofa score—development, utility and challenges of accurate assessment in clinical trials,

S. Lambden, P. F. Laterre, M. M. Levy, and B. Francois, “The sofa score—development, utility and challenges of accurate assessment in clinical trials,”Critical Care, vol. 23, pp. 1–9, 2019

work page 2019

[40] [40]

Vasopressor therapy in critically ill patients with shock,

J. A. Russell, “Vasopressor therapy in critically ill patients with shock,” Intensive care medicine, vol. 45, no. 11, pp. 1503–1517, 2019

work page 2019

[41] [41]

Current use of vasopressors in septic shock,

T. W. Scheeren, J. Bakker, D. De Backer, D. Annane, P. Asfar, E. C. Boerma, M. Cecconi, A. Dubin, M. W. D ¨unser, J. Duranteauet al., “Current use of vasopressors in septic shock,”Annals of intensive care, vol. 9, pp. 1–12, 2019

work page 2019

[42] [42]

Vasopressors in septic shock: which, when, and how much?

R. Shi, O. Hamzaoui, N. De Vita, X. Monnet, and J.-L. Teboul, “Vasopressors in septic shock: which, when, and how much?”Annals of translational medicine, vol. 8, no. 12, p. 794, 2020

work page 2020