pith. machine review for the scientific record.

arxiv: 2502.17403 · v5 · submitted 2025-02-24 · 💻 cs.LG · cs.AI · cs.CL

Recognition: unknown

Large Language Models are Powerful Electronic Health Record Encoders

Authors on Pith: no claims yet
classification 💻 cs.LG · cs.AI · cs.CL
keywords: data, models, embeddings, llm-based, tasks, access, clinical, electronic
abstract

Electronic Health Records (EHRs) offer considerable potential for clinical prediction, but their complexity and heterogeneity challenge traditional machine learning. Domain-specific EHR foundation models trained on unlabeled EHR data have shown improved predictive accuracy and generalization. However, their development is constrained by limited data access and site-specific vocabularies. We convert EHR data into plain text by replacing medical codes with natural-language descriptions, enabling general-purpose Large Language Models (LLMs) to produce high-dimensional embeddings for downstream prediction tasks without access to private medical training data. LLM-based embeddings perform on par with a specialized EHR foundation model, CLMBR-T-Base, across 15 clinical tasks from the EHRSHOT benchmark. In an external validation using the UK Biobank, an LLM-based model shows statistically significant improvements for some tasks, which we attribute to higher vocabulary coverage and slightly better generalization. Overall, we reveal a trade-off between the computational efficiency of specialized EHR models and the portability and data independence of LLM-based embeddings.
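The serialization step the abstract describes — replacing medical codes with natural-language descriptions before embedding — can be sketched as follows. The vocabulary mapping, event records, and field names here are hypothetical illustrations, not the paper's actual pipeline; the embedding call itself is left as a placeholder for any general-purpose LLM embedding API.

```python
# Sketch: convert coded EHR events into plain text suitable for LLM embedding.
# CODE_DESCRIPTIONS and the patient timeline below are illustrative only.

CODE_DESCRIPTIONS = {
    "ICD10/E11.9": "Type 2 diabetes mellitus without complications",
    "LOINC/4548-4": "Hemoglobin A1c measurement",
    "RxNorm/860975": "Metformin 500 mg oral tablet",
}

def serialize_patient(events):
    """Replace each medical code with its natural-language description
    and join the events into one chronological text document."""
    lines = []
    for timestamp, code in events:
        description = CODE_DESCRIPTIONS.get(code, code)  # fall back to the raw code
        lines.append(f"{timestamp}: {description}")
    return "\n".join(lines)

patient = [
    ("2021-03-01", "ICD10/E11.9"),
    ("2021-03-15", "LOINC/4548-4"),
    ("2021-04-02", "RxNorm/860975"),
]

text = serialize_patient(patient)
# `text` can now be passed to any general-purpose LLM embedding endpoint
# to obtain a fixed-length patient vector for downstream prediction tasks.
print(text)
```

Because the serialized record is plain text, no site-specific vocabulary or private training data is needed on the embedding side — which is the portability trade-off the abstract highlights.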

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Hidden in the Multiplicative Interaction: Uncovering Fragility in Multimodal Contrastive Learning

    cs.LG 2026-04 unverdicted novelty 7.0

    Multimodal contrastive learning using multilinear products is fragile to single bad modalities, and a gated version improves top-1 retrieval accuracy on synthetic and real trimodal data.

  2. Representation Before Training: A Fixed-Budget Benchmark for Generative Medical Event Models

    cs.LG 2026-04 unverdicted novelty 5.0

    Fused code-value tokenization improves mortality AUROC from 0.891 to 0.915 and other clinical outcome predictions, while certain temporal encodings like event order match or exceed time tokens with shorter sequences.