Mining Electronic Health Records to Investigate Effectiveness of Ensemble Deep Clustering
Pith reviewed 2026-05-10 18:37 UTC · model grok-4.3
The pith
An ensemble deep clustering method combined with traditional techniques achieves the highest performance in grouping heart failure patients from electronic health records.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that traditional clustering methods perform robustly on tabular EHR data while deep learning approaches underperform due to their design for image clustering. It introduces an ensemble-based deep clustering approach that aggregates cluster assignments from multiple embedding dimensions. When combined with traditional clustering in a novel ensemble framework, this method delivers the best overall performance ranking across 14 diverse clustering methods and multiple patient cohorts. The findings highlight advantages of combining approaches and the importance of biological sex-specific clustering of EHR data.
What carries the argument
Ensemble embedding for deep clustering that aggregates cluster assignments obtained from multiple embedding dimensions rather than a single fixed embedding space, integrated with traditional clustering methods.
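The aggregation idea can be illustrated with a small consensus-clustering sketch. This page does not state the paper's actual aggregation rule, so the co-association (evidence accumulation) scheme below is an assumption chosen for illustration, and the function names, the 0.5 threshold, and the toy label vectors are all hypothetical. Each label vector stands for the cluster assignment obtained from one embedding dimension.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of labelings in which each pair of samples shares a cluster."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    runs = len(labelings)
    return [[c / runs for c in row] for row in counts]


def consensus_clusters(labelings, threshold=0.5):
    """Link samples that co-cluster in more than `threshold` of the labelings,
    then return the connected components as consensus cluster labels.
    Assumption: this linking rule is illustrative, not the paper's method."""
    n = len(labelings[0])
    co = co_association(labelings)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Toy assignments for six samples from three hypothetical embedding sizes.
toy_labelings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],  # this run places sample 2 with samples 3 to 5
    [0, 0, 0, 1, 1, 1],
]
print(consensus_clusters(toy_labelings))
```

In this toy case the one dissenting run is outvoted, so sample 2 stays with samples 0 and 1. Label names differ across runs (cluster "0" in one run is cluster "1" in another), which is why the sketch compares pairs of samples rather than raw label values.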
Load-bearing premise
Deep learning methods designed for image data inherently underperform on tabular EHR data, and aggregating assignments from multiple embedding dimensions reliably improves clustering quality without overfitting or selection bias.
What would settle it
A direct comparison showing that a single deep embedding space achieves equal or better clustering quality than the ensemble aggregation on the same heart failure EHR cohorts would falsify the advantage of the proposed method.
Original abstract
In electronic health records (EHRs), clustering patients and distinguishing disease subtypes are key tasks to elucidate pathophysiology and aid clinical decision-making. However, clustering in healthcare informatics is still based on traditional methods, especially K-means, and has achieved limited success when applied to embedding representations learned by autoencoders as hybrid methods. This paper investigates the effectiveness of traditional, hybrid, and deep learning methods in heart failure patient cohorts using real EHR data from the All of Us Research Program. Traditional clustering methods perform robustly because deep learning approaches are specifically designed for image clustering, a task that differs substantially from the tabular EHR data setting. To address the shortcomings of deep clustering, we introduce an ensemble-based deep clustering approach that aggregates cluster assignments obtained from multiple embedding dimensions, rather than relying on a single fixed embedding space. When combined with traditional clustering in a novel ensemble framework, the proposed ensemble embedding for deep clustering delivers the best overall performance ranking across 14 diverse clustering methods and multiple patient cohorts. This paper underscores the importance of biological sex-specific clustering of EHR data and the advantages of combining traditional and deep clustering approaches over a single method.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that traditional clustering methods perform robustly on tabular EHR data for heart failure patient cohorts from the All of Us program, while deep learning methods designed for images underperform. It introduces an ensemble deep clustering approach that aggregates cluster assignments from multiple embedding dimensions rather than a single fixed space. When combined with traditional clustering in a novel ensemble framework, this method is asserted to deliver the best overall performance ranking across 14 diverse clustering methods and multiple patient cohorts, while also highlighting the importance of biological sex-specific clustering.
Significance. If the empirical ranking holds under rigorous validation, the work could advance healthcare informatics by demonstrating practical benefits of hybrid ensemble strategies for patient subtyping in tabular EHR data, where pure deep clustering has seen limited success. It provides a concrete example of adapting embedding-based methods to non-image domains and emphasizes sex-specific analysis, which may inform more accurate pathophysiology studies and clinical decision support.
major comments (2)
- Abstract: The assertion that the proposed ensemble embedding for deep clustering 'delivers the best overall performance ranking' is presented without any quantitative metrics (e.g., ARI, NMI, silhouette scores), statistical tests, error bars, cohort sizes, or implementation details, leaving the central empirical claim unsupported by verifiable evidence.
- Introduction and Methods: The foundational assumption that deep learning methods 'are specifically designed for image clustering' and thus inherently limited on tabular EHR data requires explicit ablation studies or direct comparisons to confirm that multi-dimension aggregation improves quality without introducing selection bias or overfitting, as this premise drives the need for the ensemble framework.
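For reference, one of the metrics named in the first major comment, the adjusted Rand index (ARI), can be computed from two label vectors with the standard pair-counting formula. The sketch below is a minimal stdlib-only version written for this page, not the paper's evaluation code; the function name is hypothetical. ARI compares a clustering against a second labeling, so on EHR cohorts without ground-truth subtypes it would typically measure agreement between two clusterings.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same samples."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: trivially identical

    # Pair counts from the contingency table and its row/column margins.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one big cluster
    return (sum_cells - expected) / (max_index - expected)


# Identical partitions with swapped label names score 1.0.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because the index is adjusted for chance, it is invariant to label names and can go negative when two clusterings agree less than random assignment would.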
minor comments (1)
- The 14 clustering methods should be explicitly enumerated in the methods section, and any tables reporting performance rankings should include full metric values and cohort descriptions for reproducibility.
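An "overall performance ranking" across methods and cohorts is often built by ranking methods within each cohort and averaging the ranks. This page does not say how the paper computes its ranking, so the mean-rank sketch below is an assumption, and the method names, cohort names, and scores are placeholder values for illustration only, not results from the paper.

```python
def mean_ranks(scores):
    """Average within-cohort rank per method.

    scores: {cohort: {method: score}}, where a higher score is better.
    Rank 1 is best. Ties are broken by dictionary order, not shared.
    """
    ranks = {}
    for cohort_scores in scores.values():
        ordered = sorted(cohort_scores, key=cohort_scores.get, reverse=True)
        for rank, method in enumerate(ordered, start=1):
            ranks.setdefault(method, []).append(rank)
    return {method: sum(r) / len(r) for method, r in ranks.items()}


# Placeholder scores (hypothetical, not taken from the paper).
toy_scores = {
    "cohort_1": {"method_a": 0.40, "method_b": 0.55, "method_c": 0.30},
    "cohort_2": {"method_a": 0.50, "method_b": 0.45, "method_c": 0.20},
}
print(mean_ranks(toy_scores))
```

A mean rank hides the size of the gaps between methods, which is one reason the minor comment asks for full metric values alongside any ranking table.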
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our manuscript. We address each major comment point by point below, indicating the revisions we will incorporate.
- Referee: Abstract: The assertion that the proposed ensemble embedding for deep clustering 'delivers the best overall performance ranking' is presented without any quantitative metrics (e.g., ARI, NMI, silhouette scores), statistical tests, error bars, cohort sizes, or implementation details, leaving the central empirical claim unsupported by verifiable evidence.
  Authors: We agree that the abstract would be strengthened by including supporting quantitative evidence. In the revised manuscript, we will update the abstract to report key metrics such as the overall performance ranking across the 14 methods, average ARI and NMI values, cohort sizes (number of heart failure patients per All of Us cohort), and references to statistical significance testing. Full details including error bars from repeated runs and implementation specifics remain in the Methods and Results sections.
  Revision: yes
- Referee: Introduction and Methods: The foundational assumption that deep learning methods 'are specifically designed for image clustering' and thus inherently limited on tabular EHR data requires explicit ablation studies or direct comparisons to confirm that multi-dimension aggregation improves quality without introducing selection bias or overfitting, as this premise drives the need for the ensemble framework.
  Authors: The manuscript already contains direct empirical comparisons demonstrating that standard deep clustering methods underperform relative to traditional methods on this tabular EHR data. We also report results from the multi-dimension aggregation approach versus single-embedding baselines. To further validate the aggregation step and address concerns about selection bias or overfitting, we will add explicit ablation experiments in the revised version, including performance sensitivity to the number of embedding dimensions and consistency checks across independent cohorts.
  Revision: partial
Circularity Check
No significant circularity
The paper's central claim is an empirical performance ranking of clustering methods (including a proposed ensemble deep clustering approach) on real EHR data from the All of Us program across multiple cohorts and 14 baselines. No derivation chain, theorem, or first-principles result is presented that reduces to its own inputs by construction, self-definition, or fitted-parameter renaming. The abstract and described framework treat the ensemble aggregation as a methodological proposal whose quality is assessed via external data experiments rather than any self-referential equation or self-citation load-bearing premise. This is the expected non-circular outcome for an applied empirical study.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Deep learning clustering methods optimized for images are unsuitable for tabular EHR data without modification
- ad hoc to paper: Aggregating cluster assignments from multiple embedding dimensions improves overall clustering quality
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: we introduce an ensemble-based deep clustering approach that aggregates cluster assignments obtained from multiple embedding dimensions... KGG ensemble... best overall performance ranking across 14 diverse clustering methods
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: Traditional clustering methods perform robustly because deep learning approaches are specifically designed for image clustering, a task that differs substantially from the tabular EHR data setting.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.