pith.

arXiv: 1907.11039 · v1 · pith:OWEHWVJR · submitted 2019-07-05 · 💻 cs.HC · cs.LG · stat.ML

Visualization of Emergency Department Clinical Data for Interpretable Patient Phenotyping

Pith reviewed 2026-05-25 01:44 UTC · model grok-4.3

classification: 💻 cs.HC · cs.LG · stat.ML
keywords: electronic health records · emergency department · patient phenotyping · dimensionality reduction · UMAP · cluster stability · clinical visualization

The pith

Non-linear embeddings of EHR data produce stable clusters that reflect patient phenotypes across common emergency department complaints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper applies uniform manifold approximation and projection to high-dimensional electronic health record data from five chief complaints in an emergency department. It then fits Gaussian mixture models to locate clusters in the resulting two-dimensional space and measures their stability across repeated embeddings using the adjusted Rand index. The authors report between two and six clusters per complaint with peak mean pairwise ARI values ranging from 0.35 to 0.74, and they note that the clusters display attributes that appear clinically relevant. If the clusters are reproducible and meaningful, the approach would supply a visual method for surfacing data-driven phenotypes that could support triage and disposition decisions. The work therefore centers on whether non-linear dimensionality reduction can make complex EHR patterns interpretable enough to reveal patient subgroups without relying on predefined clinical categories.

Core claim

Visual embeddings of EHR data using non-linear dimensionality reduction are a promising approach to revealing data-driven patient phenotypes. Across the five chief complaints, we find between 2 and 6 clusters, with the peak mean pairwise ARI between subsequent training iterations ranging from 0.35 to 0.74.
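The stability figures above are adjusted Rand index (ARI) values. For reference, the standard Hubert and Arabie (1985) form, which appears in the paper's reference list, is given below with a small worked example. The page does not reproduce the paper's own formula, so treat this as the textbook definition rather than a transcription. Here n_{ij} counts patients placed in cluster i by one run and cluster j by another, a_i and b_j are the row and column totals of that contingency table, and n is the number of patients. ARI equals 1 for identical partitions (up to relabelling) and has expected value 0 when labels are assigned at random with the cluster sizes held fixed.

```latex
\mathrm{ARI}=\frac{\sum_{i,j}\binom{n_{ij}}{2}-\Big[\sum_i\binom{a_i}{2}\sum_j\binom{b_j}{2}\Big]\Big/\binom{n}{2}}
{\tfrac{1}{2}\Big[\sum_i\binom{a_i}{2}+\sum_j\binom{b_j}{2}\Big]-\Big[\sum_i\binom{a_i}{2}\sum_j\binom{b_j}{2}\Big]\Big/\binom{n}{2}}

% Worked example: A = (0,0,1,1), B = (0,0,1,2), so n = 4 and \binom{4}{2} = 6.
% \sum_{i,j}\binom{n_{ij}}{2} = 1, \quad \sum_i\binom{a_i}{2} = 2, \quad \sum_j\binom{b_j}{2} = 1
\mathrm{ARI}=\frac{1-\frac{2\cdot 1}{6}}{\frac{2+1}{2}-\frac{2\cdot 1}{6}}
=\frac{2/3}{7/6}=\frac{4}{7}\approx 0.571
```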

What carries the argument

Uniform manifold approximation and projection (UMAP), used to produce two-dimensional embeddings intended to preserve both local and global structure, followed by Gaussian mixture models (GMMs) that identify clusters whose stability across repeated runs is quantified by the adjusted Rand index (ARI).
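The stability measure can be sketched in plain Python. This is a minimal illustration, assuming each training iteration yields one hard cluster label per patient for the same set of patients; because ARI is invariant to how clusters are numbered, no label alignment between runs is needed. The helper names are hypothetical, and in practice scikit-learn's `sklearn.metrics.adjusted_rand_score` computes the same pairwise quantity.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same items")
    n = len(labels_a)
    if n < 2:
        return 1.0
    # Number of co-clustered pairs within each contingency cell, row and column.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # both labelings trivial (one cluster, or all singletons)
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def mean_pairwise_ari(runs):
    """Average ARI over every pair of runs; each run labels the same patients."""
    scores = [adjusted_rand_index(a, b) for a, b in combinations(runs, 2)]
    return sum(scores) / len(scores)
```

For example, `adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])` gives 4/7, matching the worked example of the Hubert and Arabie formula, and two runs that differ only in cluster numbering score exactly 1.0.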

If this is right

  • Triage decisions could incorporate visual cluster membership as an additional input alongside traditional vital signs and history.
  • Phenotype-specific resource allocation becomes feasible once clusters are shown to predict disposition or treatment response.
  • Applying the same embedding pipeline to other chief complaints would test whether it generalizes beyond the five examined.
  • Stability measured by ARI supplies a quantitative criterion for selecting the number of clusters in future deployments.
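If ARI stability is used to choose the number of clusters, the rule can be as simple as taking the candidate with the highest mean pairwise ARI. The page reports "peak mean pairwise ARI" but does not spell out the selection procedure, so the sketch below is an assumption; the function name, the tie-break toward fewer components, and the toy scores are all illustrative.

```python
from statistics import mean


def select_k_by_stability(pairwise_aris):
    """Return (k, mean ARI) for the component count with the highest mean pairwise ARI.

    pairwise_aris maps each candidate number of GMM components to the ARI scores
    observed between pairs of repeated training iterations. Ties go to the
    smaller k (an illustrative choice, not taken from the paper).
    """
    means = {k: mean(scores) for k, scores in pairwise_aris.items() if scores}
    best_k = min(means, key=lambda k: (-means[k], k))
    return best_k, means[best_k]
```

With made-up scores `{2: [0.50, 0.60], 3: [0.70, 0.74], 4: [0.40, 0.50]}`, the function picks k = 3 with a mean of about 0.72.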

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If clusters prove stable across hospitals, the method could serve as a lightweight way to compare patient populations between sites without sharing raw records.
  • Combining the embeddings with time-stamped data might allow tracking of how patients move between phenotypes during an ED stay.
  • The modest ARI values suggest that ensemble methods or additional regularization could be tested to improve cluster consistency in follow-on work.

Load-bearing premise

The discovered clusters correspond to clinically meaningful phenotypes instead of being artifacts produced by the choice of embedding parameters or clustering algorithm.

What would settle it

A prospective study that links the identified clusters to independent clinical outcomes such as admission rates, length of stay, or mortality, or that shows clinicians can consistently assign meaningful labels to the clusters without seeing the embedding coordinates.
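A first step toward that kind of outcome linkage is tabulating an outcome rate per cluster. The sketch assumes hard cluster labels and a binary outcome flag per patient (for example, admitted versus discharged); the function name and inputs are hypothetical, and a real analysis would add confidence intervals and account for confounders.

```python
from collections import defaultdict


def outcome_rate_by_cluster(cluster_labels, outcomes):
    """Return {cluster: fraction of its patients with a positive outcome}."""
    if len(cluster_labels) != len(outcomes):
        raise ValueError("need exactly one outcome per patient")
    tallies = defaultdict(lambda: [0, 0])  # cluster -> [positives, total]
    for label, outcome in zip(cluster_labels, outcomes):
        tallies[label][0] += 1 if outcome else 0
        tallies[label][1] += 1
    return {label: pos / total for label, (pos, total) in sorted(tallies.items())}
```

For instance, labels `[0, 0, 1, 1, 1]` with admissions `[True, False, True, True, False]` give a rate of 0.5 for cluster 0 and 2/3 for cluster 1.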

Figures

Figures reproduced from arXiv: 1907.11039 by Adrian D. Haimovich, Bobak J. Mortazavi, Nathan C. Hurley, R. Andrew Taylor.

Figure 1: A diagram of the method presented here. The data is randomly split into training data (80%) and testing data (20%). The training data …

Figure 2: PCA Embeddings of synthetic data. Here the embedding and GMMs have been trained on one of the five training splits, and then applied …

Figure 3: UMAP Embeddings of synthetic data. Here the embedding and GMMs have been trained on one of the five training splits, and then …

Figure 4: Mean pairwise ARIs of clusterings on synthetic data. The solid line denotes the true number of clusters. ARIs are shown both pairwise …

Figure 5: Representative plot of hyperparameters. Here, four sets of hyperparameters and the resulting mean pairwise ARI are shown. All ARIs are …

Figure 6: UMAP embeddings of patients with Shortness of Breath. Figure 6a shows the training data, while Figure 6b shows the application of the …

Figure 7: UMAP embeddings of patients with Abdominal Pain. Figure 7a shows the training data, while Figure 7b shows the application of the …

Figure 8: UMAP embeddings of patients with Chest Pain. Figure 8a shows the training data, while Figure 8b shows the application of the model to …

Figure 9: UMAP embeddings of patients with Back Pain. Figure 9a shows the training data, while Figure 9b shows the application of the model to …

Figure 10: Three different folds of UMAP embeddings of patients who suffered Falls. Figures 10a, 10c, and 10e show the training data, while …
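Figure 1's caption describes a random 80%/20% train/test split, and later captions mention "one of the five training splits" and different "folds". That is consistent with, though not confirmed to be, five-fold cross-validation. Under that assumption, the splits could be generated as below; the function name and seeding scheme are illustrative, and scikit-learn's `KFold(n_splits=5, shuffle=True, random_state=...)` would serve the same purpose.

```python
import random


def k_fold_splits(n_patients, n_folds=5, seed=0):
    """Yield (train_indices, test_indices), holding out each fold exactly once.

    With n_folds=5, every split is roughly 80% train / 20% test.
    """
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)  # reproducible random assignment
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, test_idx
```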
Original abstract

Visual summarization of clinical data collected on patients contained within the electronic health record (EHR) may enable precise and rapid triage at the time of patient presentation to an emergency department (ED). The triage process is critical in the appropriate allocation of resources and in anticipating eventual patient disposition, typically admission to the hospital or discharge home. EHR data are high-dimensional and complex, but offer the opportunity to discover and characterize underlying data-driven patient phenotypes. These phenotypes will enable improved, personalized therapeutic decision making and prognostication. In this work, we focus on the challenge of two-dimensional patient projections. A low-dimensional embedding offers visual interpretability lost in higher dimensions. While linear dimensionality reduction techniques such as principal component analysis are often used towards this aim, they are insufficient to describe the variance of patient data. In this work, we employ the newly described non-linear embedding technique called uniform manifold approximation and projection (UMAP). UMAP seeks to capture both local and global structures in high-dimensional data. We then use Gaussian mixture models to identify clusters in the embedded data and use the adjusted Rand index (ARI) to establish stability in the discovery of these clusters. This technique is applied to five common clinical chief complaints from a real-world ED EHR dataset, describing the emergent properties of discovered clusters. We observe clinically-relevant cluster attributes, suggesting that visual embeddings of EHR data using non-linear dimensionality reduction are a promising approach to reveal data-driven patient phenotypes. In the five chief complaints, we find between 2 and 6 clusters, with the peak mean pairwise ARI between subsequent training iterations ranging from 0.35 to 0.74.
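The clustering step can be illustrated without external dependencies. This is a simplified sketch, not the paper's implementation: it fits diagonal (axis-aligned) covariances by expectation-maximization with a farthest-point initialisation, whereas a practical pipeline would more likely call scikit-learn's `GaussianMixture` (scikit-learn is in the paper's reference list), whose defaults are full covariances and k-means initialisation. Points are assumed to be (x, y) pairs from a 2D embedding; all function names are hypothetical.

```python
import math
import random


def _log_density(point, weight, mean, var):
    """log(weight * N(point | mean, diag(var))) for a 2D diagonal Gaussian."""
    total = math.log(weight)
    for d in range(2):
        total -= 0.5 * (math.log(2 * math.pi * var[d]) + (point[d] - mean[d]) ** 2 / var[d])
    return total


def fit_diag_gmm(points, k, n_iter=100, seed=0, reg=1e-6):
    """Fit a k-component diagonal-covariance GMM to 2D points with EM."""
    n = len(points)
    rng = random.Random(seed)
    # Farthest-point initialisation: a random first mean, then repeatedly the
    # point farthest from every mean chosen so far.
    means = [list(rng.choice(points))]
    while len(means) < k:
        far = max(points, key=lambda p: min((p[0] - m[0]) ** 2 + (p[1] - m[1]) ** 2 for m in means))
        means.append(list(far))
    # Every component starts at the overall per-axis variance.
    centre = [sum(p[d] for p in points) / n for d in range(2)]
    spread = [sum((p[d] - centre[d]) ** 2 for p in points) / n + reg for d in range(2)]
    variances = [spread[:] for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior responsibilities, normalised with log-sum-exp.
        resp = []
        for p in points:
            logs = [_log_density(p, weights[j], means[j], variances[j]) for j in range(k)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            norm = sum(exps)
            resp.append([e / norm for e in exps])
        # M-step: re-estimate mixing weights, means and per-axis variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            for d in range(2):
                mu = sum(r[j] * p[d] for r, p in zip(resp, points)) / nj
                means[j][d] = mu
                variances[j][d] = sum(r[j] * (p[d] - mu) ** 2 for r, p in zip(resp, points)) / nj + reg
    return weights, means, variances


def predict(points, weights, means, variances):
    """Hard-assign each point to its most probable component."""
    return [
        max(range(len(weights)), key=lambda j: _log_density(p, weights[j], means[j], variances[j]))
        for p in points
    ]
```

Fitting on training points and calling `predict` on held-out points loosely mirrors the train/test application shown in the figures, although the paper also applies the fitted embedding itself to the test data.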

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes a pipeline for visual phenotyping of ED patients using UMAP to create 2D embeddings of high-dimensional EHR data for five chief complaints, followed by GMM clustering to identify groups and ARI to measure stability across runs. It reports 2–6 clusters per complaint with peak mean pairwise ARI of 0.35–0.74 and asserts that the resulting clusters exhibit clinically relevant attributes, positioning non-linear embeddings as a promising approach for data-driven phenotypes that could aid triage and personalized decisions.

Significance. If externally validated, the approach could offer an interpretable visual tool for discovering patient subgroups in complex EHR data where linear methods fall short. The use of ARI for stability assessment is a positive step toward reproducibility, but the absence of outcome linkage or expert validation limits immediate clinical utility.

major comments (2)
  1. [Abstract] The central claim that the method reveals 'clinically-relevant cluster attributes' is unsupported: no quantitative linkage to clinical endpoints (admission, mortality, LOS), clinician annotation, comparison to known phenotypes, or null-model baseline is described. ARI stability (0.35–0.74) addresses only reproducibility, not external validity or clinical meaningfulness.
  2. [Abstract and implied Methods] No information is provided on feature engineering, missing-data handling, hyperparameter selection for UMAP (n_neighbors, min_dist) or GMM (number of components), or sensitivity analysis, which leaves the reported cluster counts and ARI values difficult to interpret or reproduce.
minor comments (1)
  1. [Abstract] The ARI range 0.35–0.74 includes values that indicate only moderate agreement; the paper should clarify what threshold is considered sufficient for claiming stable phenotypes.
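One way to supply the null-model baseline the referee asks for is a permutation test: shuffle one labeling many times and compare the observed ARI with the resulting distribution. This is an illustrative assumption, not an analysis the paper reports. Since ARI is already chance-corrected (its null mean is about 0), the test mainly yields a significance level; a small p-value shows agreement beyond chance but does not by itself set a threshold for "stable" phenotypes. ARI is re-implemented so the block stands alone; function names are hypothetical.

```python
import random
from collections import Counter
from math import comb


def ari(labels_a, labels_b):
    """Adjusted Rand index (Hubert-Arabie form) for two equal-length labelings."""
    n = len(labels_a)
    if n < 2:
        return 1.0
    cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = rows * cols / comb(n, 2)
    max_index = (rows + cols) / 2
    if max_index == expected:  # both labelings trivial
        return 1.0
    return (cells - expected) / (max_index - expected)


def permutation_null(labels_a, labels_b, n_perm=1000, seed=0):
    """Compare an observed ARI with ARIs obtained after shuffling labels_b.

    Returns (observed ARI, mean null ARI, one-sided permutation p-value).
    """
    rng = random.Random(seed)
    observed = ari(labels_a, labels_b)
    shuffled = list(labels_b)
    null_scores = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between the two labelings
        null_scores.append(ari(labels_a, shuffled))
    exceed = sum(score >= observed for score in null_scores)
    p_value = (exceed + 1) / (n_perm + 1)  # add-one correction avoids p = 0
    return observed, sum(null_scores) / n_perm, p_value
```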

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback. We address each major comment below, agreeing where revisions are needed to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The central claim that the method reveals 'clinically-relevant cluster attributes' is unsupported because no quantitative linkage to clinical endpoints (admission, mortality, LOS), clinician annotation, comparison to known phenotypes, or null-model baselines is described; ARI stability (0.35–0.74) addresses only reproducibility, not external validity or meaningfulness.

    Authors: We agree that the abstract's phrasing overstates the evidence. The manuscript reports qualitative observations of cluster attributes (e.g., differences in vital signs or lab values) in the results but provides no quantitative linkage to outcomes, expert validation, or null models. We will revise the abstract to state that clusters 'exhibit attributes suggestive of clinical relevance based on descriptive analysis' and add an explicit limitations paragraph noting the absence of external validation. This change accurately reflects the work's scope. Revision: yes.

  2. Referee: [Abstract and implied Methods] No information is provided on feature engineering, missing-data handling, hyperparameter selection for UMAP (n_neighbors, min_dist) or GMM (number of components), or any sensitivity analysis, leaving the reported cluster counts and ARI values difficult to interpret or reproduce.

    Authors: We acknowledge that the abstract and methods summary omit these specifics. The full manuscript describes the EHR feature set and basic preprocessing but lacks exact hyperparameter values, missing-data strategy, and sensitivity results. We will expand the methods section to report the precise UMAP and GMM settings used, the approach to missing values, and include a sensitivity analysis varying key parameters to confirm stability of the reported cluster counts and ARI ranges. Revision: yes.
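A sensitivity analysis of this kind can be organised as a grid sweep that records mean pairwise agreement across seeds for each setting. In the sketch, `run_pipeline` and `similarity` are hypothetical user-supplied callables: the first might build `umap.UMAP(n_neighbors=..., min_dist=..., random_state=seed)`, embed the data, and cluster it with a GMM, and the second might be an ARI function. The grid keys echo UMAP's parameter names, but any values are placeholders, not the paper's settings.

```python
from itertools import combinations, product
from statistics import mean


def sensitivity_grid(run_pipeline, grid, seeds, similarity):
    """Mean pairwise agreement across seeds for every hyperparameter combination.

    run_pipeline(params, seed) -> cluster labels   (hypothetical callable)
    similarity(labels_a, labels_b) -> float          (e.g. an ARI function)
    grid maps each parameter name to the values to try; needs at least two seeds.
    """
    names = sorted(grid)  # fixed order so result keys are reproducible
    results = {}
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        runs = [run_pipeline(params, seed) for seed in seeds]
        scores = [similarity(a, b) for a, b in combinations(runs, 2)]
        results[values] = mean(scores)
    return names, results
```

Result keys are tuples ordered by sorted parameter name, so with `n_neighbors` and `min_dist` in the grid each key reads `(min_dist, n_neighbors)`.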

Circularity Check

0 steps flagged

No circularity: unsupervised pipeline with external stability metric

Full rationale

The paper applies UMAP embedding followed by GMM clustering to EHR data for five chief complaints, then computes ARI across independent runs to quantify cluster stability (0.35-0.74). Cluster attributes are inspected post hoc. No equations or steps reduce the output clusters or stability scores to fitted parameters by construction; no target labels or outcomes are used in embedding or clustering; no self-citations, uniqueness theorems, or ansatzes are invoked to justify the pipeline. The derivation is therefore self-contained as standard unsupervised visualization and stability assessment.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The work depends on standard assumptions of manifold learning and mixture modeling plus the untested premise that visual clusters will prove clinically actionable; no new entities are postulated and only routine ML hyperparameters are involved.

free parameters (2)
  • UMAP n_neighbors and min_dist
    Control local versus global structure preservation; values not reported in abstract.
  • Number of GMM components
    Set per chief complaint between 2 and 6; chosen to fit the embedded data.
axioms (2)
  • domain assumption: UMAP preserves both local and global structure in high-dimensional clinical data
    Invoked to justify choice over PCA.
  • domain assumption: Gaussian mixture models recover stable, clinically relevant groupings in the 2D embedding
    Central modeling step after embedding.

pith-pipeline@v0.9.0 · 5845 in / 1370 out tokens · 64835 ms · 2026-05-25T01:44:07.854971+00:00 · methodology

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages · 1 internal anchor

  1. M. P. Lin, O. Baker, L. D. Richardson, J. D. Schuur, Trends in emergency department visits and admission rates among US acute care hospitals, JAMA Intern Med 178 (12) (2018) 1708–1710 (2018). doi:10.1001/jamainternmed.2018.4725. URL https://www.ncbi.nlm.nih.gov/pubmed/30326057

  2. N. Farrohknia, M. Castren, A. Ehrenberg, L. Lind, S. Oredsson, H. Jonsson, K. Asplund, K. E. Goransson, Emergency department triage scales and their components: a systematic review of the scientific evidence, Scand J Trauma Resusc Emerg Med 19 (1) (2011) 42 (2011). doi:10.1186/1757-7241-19-42. URL https://www.ncbi.nlm.nih.gov/pubmed/21718476

  3. P. Tanabe, R. Gimbel, P. R. Yarnold, D. N. Kyriacou, J. G. Adams, Reliability and validity of scores on the emergency severity index version 3, Acad Emerg Med 11 (1) (2004) 59–65 (2004). doi:10.1197/j.aem.2003.06.013. URL https://www.ncbi.nlm.nih.gov/pubmed/14709429

  4. W. S. Hong, A. D. Haimovich, R. A. Taylor, Predicting hospital admission at emergency department triage using machine learning, PLoS One 13 (7) (2018) e0201016 (2018). doi:10.1371/journal.pone.0201016. URL https://www.ncbi.nlm.nih.gov/pubmed/30028888

  5. J. M. Kwon, Y. Lee, Y. Lee, S. Lee, H. Park, J. Park, Validation of deep-learning-based triage and acuity score using a large national dataset, PLoS One 13 (10) (2018) e0205836 (2018). doi:10.1371/journal.pone.0205836. URL https://www.ncbi.nlm.nih.gov/pubmed/30321231

  6. S. Levin, M. Toerper, E. Hamrock, J. S. Hinson, S. Barnes, H. Gardner, A. Dugas, B. Linton, T. Kirsch, G. Kelen, Machine-learning-based electronic triage more accurately differentiates patients with respect to clinical outcomes compared with the emergency severity index, Ann Emerg Med 71 (5) (2018) 565–574 e2 (2018). doi:10.1016/j.annemergmed.2017.08.005....

  7. K. Shameer, K. W. Johnson, B. S. Glicksberg, J. T. Dudley, P. P. Sengupta, Machine learning in cardiovascular medicine: are we there yet?, Heart 104 (14) (2018) 1156–1164 (2018). doi:10.1136/heartjnl-2017-311198. URL https://www.ncbi.nlm.nih.gov/pubmed/29352006

  8. T. Ahmad, L. H. Lund, P. Rao, R. Ghosh, P. Warier, B. Vaccaro, U. Dahlstrom, C. M. O’Connor, G. M. Felker, N. R. Desai, Machine learning methods improve prognostication, identify clinically distinct phenotypes, and detect heterogeneity in response to therapy in a large cohort of heart failure patients, J Am Heart Assoc 7 (8) (2018) e008081 (2018). doi:10....

  9. T. A. Lasko, J. C. Denny, M. A. Levy, Computational phenotype discovery using unsupervised feature learning over noisy, sparse, and irregular clinical data, PLoS One 8 (6) (2013) e66341 (2013). doi:10.1371/journal.pone.0066341. URL https://www.ncbi.nlm.nih.gov/pubmed/23826094

  10. L. v. d. Maaten, G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research 9 (Nov) (2008) 2579–2605 (2008)

  11. E.-a. D. Amir, K. L. Davis, M. D. Tadmor, E. F. Simonds, J. H. Levine, S. C. Bendall, D. K. Shenfeld, S. Krishnaswamy, G. P. Nolan, D. Pe’er, viSNE enables visualization of high dimensional single-cell data and reveals phenotypic heterogeneity of leukemia, Nature Biotechnology 31 (2013) 545 (2013). doi:10.1038/nbt.2594. URL https://www.nature.com/articles/nbt.2...

  12. L. McInnes, J. Healy, J. Melville, UMAP: Uniform manifold approximation and projection for dimension reduction (2018). arXiv:1802.03426

  13. T. Ahmad, M. J. Pencina, P. J. Schulte, E. O’Brien, D. J. Whellan, I. L. Pina, D. W. Kitzman, K. L. Lee, C. M. O’Connor, G. M. Felker, Clinical implications of chronic heart failure phenotypes defined by cluster analysis, J Am Coll Cardiol 64 (17) (2014) 1765–74 (2014). doi:10.1016/j.jacc.2014.07.979. URL https://www.ncbi.nlm.nih.gov/pubmed/25443696

  14. C. Seymour, J. Kennedy, S. Wang, Z. Xu, C. Chang, Q. Mi, Y. Vodovotz, G. Clermont, S. Visweswaran, J. Weiss, Feasibility of sepsis phenotyping using electronic health record data during initial emergency department care, in: American Journal of Respiratory and Critical Care Medicine, Vol. 197, Amer Thoracic Soc 25 Broadway, 18 FL, New York, NY 10004 US...

  15. B. K. Beaulieu-Jones, C. S. Greene, A. L. S. C. T. C. Pooled Resource Open-Access, Semi-supervised learning of the electronic health record for phenotype stratification, J Biomed Inform 64 (2016) 168–178 (2016). doi:10.1016/j.jbi.2016.10.007. URL https://www.ncbi.nlm.nih.gov/pubmed/27744022

  16. J. C. Kirby, P. Speltz, L. V. Rasmussen, M. Basford, O. Gottesman, P. L. Peissig, J. A. Pacheco, G. Tromp, J. Pathak, D. S. Carrell, S. B. Ellis, T. Lingren, W. K. Thompson, G. Savova, J. Haines, D. M. Roden, P. A. Harris, J. C. Denny, PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability, J Am Med Inform Assoc 23...

  17. Y. Wang, L. Luo, M. T. Freedman, S. Y. Kung, Probabilistic principal component subspaces: a hierarchical finite mixture model for data visualization, IEEE Trans Neural Netw 11 (3) (2000) 625–36 (2000). doi:10.1109/72.846734. URL https://www.ncbi.nlm.nih.gov/pubmed/18249790

  18. K. Y. Yeung, W. L. Ruzzo, An empirical study on principal component analysis for clustering gene expression data, Department of Computer Science and Engineering, University of Washington (2000)

  19. K. A. Oetjen, K. E. Lindblad, M. Goswami, G. Gui, P. K. Dagur, C. Lai, L. W. Dillon, J. P. McCoy, C. S. Hourigan, Human bone marrow assessment by single-cell RNA sequencing, mass cytometry, and flow cytometry, JCI Insight 3 (23) (2018). doi:10.1172/jci.insight.124928. URL https://www.ncbi.nlm.nih.gov/pubmed/30518681

  20. E. Becht, L. McInnes, J. Healy, C. A. Dutertre, I. W. H. Kwok, L. G. Ng, F. Ginhoux, E. W. Newell, Dimensionality reduction for visualizing single-cell data using UMAP, Nature Biotechnology 37 (1) (2019) 38–44 (2019). doi:10.1038/nbt.4314.

  21. L. Hubert, P. Arabie, Comparing partitions, Journal of Classification 2 (1) (1985) 193–218 (1985)

  22. K. Y. Yeung, W. L. Ruzzo, Details of the adjusted Rand index and clustering algorithms, supplement to the paper An empirical study on principal component analysis for clustering gene expression data, Bioinformatics 17 (9) (2001) 763–774 (2001)

  23. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830 (2011)

  24. S. L. Cartwright, M. P. Knudson, Evaluation of acute abdominal pain in adults, American Family Physician 77 (7) (2008).