A multimodal and temporal foundation model for virtual patient representations at healthcare system scale

Alexandre Misrahi; Andrew Zhang; Caiwei Tian; Dandan Mo; Faisal Mahmood; Joshua E. Lewis; Long Phi Le; Ming Y. Lu; Rowland Pettit; Sophia J. Wagner; Tong Ding

arXiv: 2604.18570 · v2 · submitted 2026-04-20 · 💻 cs.LG · cs.AI · cs.CL

Andrew Zhang, Tong Ding, Sophia J. Wagner, Caiwei Tian, Ming Y. Lu, Rowland Pettit, Joshua E. Lewis, Alexandre Misrahi, Dandan Mo, Long Phi Le, Faisal Mahmood

Pith reviewed 2026-05-10 04:57 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CL

keywords multimodal foundation models · electronic health records · clinical forecasting · patient representations · temporal data · medical AI · prognostic models · virtual patients

The pith

A multimodal foundation model unifies full patient records into embeddings for forecasting hundreds of clinical outcomes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a single model can integrate 28 medical modalities from 25 billion records into virtual patient representations. These representations are then shown to support 322 prognosis and retrieval tasks, including forecasting disease onset up to five years ahead. This matters because it suggests that the complete temporal and multimodal context of a patient's care can be made available for computational analysis without manual intervention. If correct, it lays groundwork for systems that reason over entire care journeys rather than fragmented pieces of data.

Core claim

Apollo is a multimodal temporal foundation model that learns a unified representation space spanning over 100 thousand unique medical events, together with images and clinical text, from the records of 7.2 million patients. The resulting virtual patient representations support clinical forecasting across 95 new disease onset tasks (up to five years ahead), 78 disease progression tasks, 59 treatment response tasks, 17 treatment-related adverse event tasks, and 12 hospital operations tasks. They also support 61 semantic retrieval tasks, and feature attribution shows that predictions align with clinically interpretable biomarkers.

What carries the argument

The Apollo model itself, which acts as a compressor turning sequences of structured events, unstructured text, and images into unified virtual patient embeddings that capture the full care journey.

If this is right

The embeddings allow prediction of new disease onset risk up to five years in advance across 95 tasks.
Disease progression forecasting is possible in 78 tasks.
Treatment response prediction covers 59 tasks and adverse event risks cover 17 tasks.
Hospital operations endpoints are addressed in 12 tasks and semantic search in 61 retrieval tasks.
Feature attribution confirms that predictions rely on clinically relevant multimodal signals.
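
As a quick consistency check on the task counts listed above, the following minimal sketch sums the per-category counts stated in the abstract and confirms that they add up to the 322 tasks it reports. The dictionary keys are illustrative labels, not names used by the paper.

```python
# Task counts as stated in the abstract; the key names are illustrative only.
forecasting_tasks = {
    "disease_onset": 95,
    "disease_progression": 78,
    "treatment_response": 59,
    "adverse_events": 17,
    "hospital_operations": 12,
}
retrieval_tasks = 61

forecasting_total = sum(forecasting_tasks.values())
total_tasks = forecasting_total + retrieval_tasks

print(forecasting_total)  # 261
print(total_tasks)        # 322
```

So 261 of the 322 tasks are forecasting tasks and the remaining 61 are retrieval tasks.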

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This could enable searching for similar patient trajectories using text or image queries to guide care in complex cases.
Performance on external datasets from other hospitals would test if the representations are truly general or system-specific.
Integration into existing record systems could provide automated risk alerts based on the full record history.
The model might support cross-modal queries that link images directly to future outcome probabilities.

Load-bearing premise

The test set patients and their data patterns are representative of future patients, and the model captures generalizable medical signals instead of hospital-specific documentation biases.

What would settle it

If the predictive performance on the 95 disease onset tasks drops significantly when applied to patient data from a different hospital system, this would falsify the claim of generalized representations.
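
One way to make "drops significantly" concrete is to compare AUROC on an internal cohort with AUROC on an external one and attach a bootstrap interval to the difference. The sketch below is an assumption about how such a check could be run, not a procedure from the paper; the function names `auroc` and `auroc_drop`, the choice of AUROC as the metric, and the percentile bootstrap are all illustrative.

```python
import numpy as np

def auroc(y_true, y_score):
    """AUROC as the probability that a positive outranks a negative (ties count half)."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    diff = pos[:, None] - neg[None, :]  # every positive/negative pair
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def auroc_drop(y_int, s_int, y_ext, s_ext, n_boot=1000, seed=0):
    """Return AUROC(internal) - AUROC(external) and a 95% percentile bootstrap interval."""
    y_int, s_int = np.asarray(y_int), np.asarray(s_int, dtype=float)
    y_ext, s_ext = np.asarray(y_ext), np.asarray(s_ext, dtype=float)
    rng = np.random.default_rng(seed)
    drops = []
    for _ in range(n_boot):
        a = rng.integers(0, len(y_int), size=len(y_int))
        b = rng.integers(0, len(y_ext), size=len(y_ext))
        # Skip resamples that lack either a positive or a negative case.
        if np.unique(y_int[a]).size < 2 or np.unique(y_ext[b]).size < 2:
            continue
        drops.append(auroc(y_int[a], s_int[a]) - auroc(y_ext[b], s_ext[b]))
    low, high = np.percentile(drops, [2.5, 97.5])
    point = auroc(y_int, s_int) - auroc(y_ext, s_ext)
    return point, (float(low), float(high))
```

Under this reading, an interval for the drop that lies clearly above zero on many of the 95 onset tasks would count against the generalization claim. What size of drop is clinically meaningful remains a judgment the sketch does not settle.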

Figures

Figures reproduced from arXiv: 2604.18570 by Alexandre Misrahi, Andrew Zhang, Caiwei Tian, Dandan Mo, Faisal Mahmood, Joshua E. Lewis, Long Phi Le, Ming Y. Lu, Rowland Pettit, Sophia J. Wagner, Tong Ding.

**Figure 1.** Overview of MGB-7M and APOLLO. (a) Overview of the pretraining dataset MGB-7M curated from 17 hospitals in one large-scale health care system consisting of 7.15 million patients. (b-e) Detailed distribution of MGB-7M including (b) LOINC code distribution of measurements, (c) distribution of diagnostic reports across medical domains, (d) medications grouped by ATC classification, and (e) ICD10 codes grouped…

**Figure 2.** APOLLO generates an atlas of medical concepts. (a) Uniform manifold approximation and projection (UMAP) of the 103,940 discrete tokens that occur more than 100 times shows that APOLLO learns the underlying semantics of the discrete concepts. The emerging atlas of medical concepts exhibits meaningful spatial relationships both within modalities, as seen for (b) diagnosis codes and (c) for medications, as we…

**Figure 3.** Evaluation of APOLLO’s patient embeddings. (a) Uniform Manifold Approximation and Projection (UMAP) visualization of 100 thousand randomly sampled patient embeddings from the data partition for downstream evaluation, labeled by age. Local neighborhoods reveal clustering of patients with similar clinical phenotypes. (b) Patient trajectories of 10 random patients before their diagnosis of Schizophrenia show…

**Figure 4.** APOLLO enhances patient retrieval. (a) 61 patient retrieval tasks curated from combinations of ICD10 diagnosis codes and medications, assessed with accuracy among the five closest (Acc@5) embedded patients compared to retrieval of the latest progress note embedding, (b) qualitative evaluation of the closest embedded patient to a patient for kidney transplant maintenance. (c) Text-based retrieval on the exa…
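
The Acc@5 metric named in the Figure 4 caption can be illustrated with a small sketch. The caption does not spell out the exact definition, so this version rests on assumptions: similarity is cosine similarity, the query patient is excluded from its own neighbours, and a query counts as a hit when at least one of its k nearest neighbours shares its label. The function name `acc_at_k` and the toy data are hypothetical.

```python
import numpy as np

def acc_at_k(embeddings, labels, k=5):
    """Fraction of query patients whose k nearest neighbours (cosine similarity,
    query excluded) include at least one patient with the same label.
    Assumes no embedding is the zero vector."""
    x = np.asarray(embeddings, dtype=float)
    y = np.asarray(labels)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # unit-normalise each row
    sim = x @ x.T                                     # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)                    # a patient must not retrieve itself
    top_k = np.argsort(-sim, axis=1)[:, :k]           # indices of the k most similar patients
    hits = (y[top_k] == y[:, None]).any(axis=1)
    return float(hits.mean())

# Toy example: two well-separated groups of two patients each.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(acc_at_k(toy_embeddings, [0, 0, 1, 1], k=1))  # 1.0
```

An alternative reading would average the fraction of matching neighbours rather than requiring a single hit; the two definitions can rank methods differently, so the paper's own definition should take precedence.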

**Figure 5.** APOLLO yields interpretable biomarkers at both the local and global level. (a–c) Local analysis. We plotted the model’s predicted 3-year risk for three example patients: (a) chronic kidney disease, (b) lung cancer, and (c) heart failure, as a function of age; markers indicate encounter times. At each prominent increase in risk, we performed a leave-one-token-out (LOTO) sensitivity analysis over events betw…
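
The leave-one-token-out (LOTO) analysis mentioned in the Figure 5 caption can be sketched generically: remove one event at a time, re-score the patient, and record how much the predicted risk falls. The code below is a minimal illustration under stated assumptions. `loto_sensitivity`, `toy_risk`, the token names, and the weights are all hypothetical, and the toy scoring function merely stands in for the real model, which is not available here.

```python
from typing import Callable, List, Sequence, Tuple

def loto_sensitivity(
    tokens: Sequence[str],
    risk_fn: Callable[[Sequence[str]], float],
) -> List[Tuple[str, float]]:
    """For each token, return the drop in predicted risk when that token is removed."""
    base = risk_fn(tokens)
    scores = []
    for i, tok in enumerate(tokens):
        reduced = list(tokens[:i]) + list(tokens[i + 1:])  # sequence without token i
        scores.append((tok, base - risk_fn(reduced)))
    return scores

# Hypothetical stand-in for a model: a capped additive score over known tokens.
TOY_WEIGHTS = {"egfr_low": 0.3, "proteinuria": 0.2}

def toy_risk(tokens: Sequence[str]) -> float:
    return min(1.0, 0.1 + sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens))

for token, delta in loto_sensitivity(["egfr_low", "statin", "proteinuria"], toy_risk):
    print(f"{token}: {delta:+.2f}")
```

A large positive value marks an event whose removal lowers the predicted risk the most. Each token requires one extra forward pass, so the cost grows linearly with the number of events analysed.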

Original abstract

Modern medicine generates vast multimodal data across siloed systems, yet no existing model integrates the full breadth and temporal depth of the clinical record into a unified patient representation. We introduce Apollo, a multimodal temporal foundation model trained and evaluated on over three decades of longitudinal hospital records from a major US hospital system, composed of 25 billion records from 7.2 million patients, representing 28 distinct medical modalities and 12 major medical specialties. Apollo learns a unified representation space integrating over 100 thousand unique medical events in our clinical vocabulary as well as images and clinical text. This "atlas of medical concepts" forms a computational substrate for modeling entire patient care journeys comprised of sequences of structured and unstructured events, which are compressed by Apollo into virtual patient representations. To assess the potential of these whole-patient representations, we created 322 prognosis and retrieval tasks from a held-out test set of 1.4 million patients. We demonstrate the generalized clinical forecasting potential of Apollo embeddings, including predicting new disease onset risk up to five years in advance (95 tasks), disease progression (78 tasks), treatment response (59 tasks), risk of treatment-related adverse events (17 tasks), and hospital operations endpoints (12 tasks). Using feature attribution techniques, we show that model predictions align with clinically-interpretable multimodal biomarkers. We evaluate semantic similarity search on 61 retrieval tasks, and moreover demonstrate the potential of Apollo as a multimodal medical search engine using text and image queries. Together, these modeling capabilities establish the foundation for computable medicine, where the full context of patient care becomes accessible to computational reasoning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Apollo pushes EHR foundation models to a larger scale with 28 modalities and 322 tasks, but single-center data and missing methods details limit what the results actually show.

The main thing to know is that this paper trains a multimodal temporal model on 25 billion records from 7.2 million patients across 28 modalities and 12 specialties, then evaluates the patient embeddings on 322 forecasting and retrieval tasks drawn from a held-out set of 1.4 million patients. The scale and task coverage exceed what earlier EHR foundation models have published, and the framing of compressing full patient journeys into searchable embeddings is a reasonable extension of prior embedding work. They also report using feature attribution to tie predictions to clinical biomarkers and test text-plus-image queries for retrieval, which adds some practical flavor. The temporal holdout helps avoid direct leakage, and the tasks cover distinct areas like five-year disease onset, progression, treatment response, adverse events, and hospital operations. That said, everything stays inside one US hospital system's records. No external validation cohort appears, so the embeddings could easily pick up local coding habits, note templates, or referral patterns rather than transportable signals. This directly undercuts the repeated claims of generalized forecasting and a multimodal medical search engine. The abstract supplies no architecture, training procedure, baselines, metrics, ablations, or statistical tests, which leaves the central performance claims impossible to assess. The work targets researchers building large clinical foundation models who care about scale. It deserves peer review to force out the missing methods and push for multi-site checks, even though heavy revision will be needed.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces Apollo, a multimodal temporal foundation model trained on 25 billion records from 7.2 million patients spanning 28 medical modalities and 12 specialties from a single US hospital system. It constructs unified representations integrating over 100k medical events, images, and clinical text into 'virtual patient representations' that compress entire care journeys. These representations are assessed via 322 prognosis and retrieval tasks on a temporally held-out cohort of 1.4 million patients, with claims of forecasting new disease onset up to five years ahead (95 tasks), disease progression (78 tasks), treatment response (59 tasks), adverse events (17 tasks), hospital operations (12 tasks), plus semantic similarity search on 61 tasks and multimodal query capabilities.

Significance. If the results hold, the work has substantial significance due to the unprecedented scale of the integrated dataset and the breadth of evaluated tasks, which together position the embeddings as a potential substrate for computable medicine. The temporal hold-out design and use of feature attribution for interpretability are positive elements. The large patient cohort and multimodal coverage represent a clear strength that could enable downstream applications if generalizability is established.

major comments (3)

[Abstract] Abstract and evaluation description: the central claim of 'generalized clinical forecasting potential' across 322 tasks is unsupported because no quantitative performance metrics (AUC, F1, calibration, or statistical tests), baseline comparisons, or ablation results are reported for any task, preventing assessment of whether the embeddings outperform trivial or existing methods.
[Data and Evaluation] Data section: all training (7.2M patients) and evaluation (1.4M held-out patients) occurs within a single hospital system's records. This single-center limitation means the 95 onset, 78 progression, and other tasks test only intra-site patterns (coding, documentation, demographics), directly threatening the headline claims of generalization 'at healthcare system scale' and transportable clinical signals without external validation.
[Methods] Methods: no architecture details, training objective, loss functions, optimization procedure, or hyperparameter choices are provided for the foundation model that produces the embeddings used in all 322 tasks, rendering the central modeling contribution impossible to reproduce or critique.

minor comments (2)

[Abstract] The phrase 'virtual patient representations' is used repeatedly without a formal definition or equation distinguishing it from standard sequence embeddings.
[Abstract] The abstract lists task counts (95, 78, 59, etc.) but does not indicate how tasks were constructed or balanced, which affects interpretation of the forecasting results.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for their constructive review and for highlighting areas where the manuscript can be strengthened. We address each major comment in turn below, with plans for targeted revisions.

Referee: [Abstract] Abstract and evaluation description: the central claim of 'generalized clinical forecasting potential' across 322 tasks is unsupported because no quantitative performance metrics (AUC, F1, calibration, or statistical tests), baseline comparisons, or ablation results are reported for any task, preventing assessment of whether the embeddings outperform trivial or existing methods.

Authors: We agree that the abstract, as a high-level summary, does not contain specific numerical results. The current manuscript defines the 322 tasks and describes the overall evaluation framework but does not report the requested quantitative metrics, baseline comparisons, or ablations. In the revised version we will add a concise Results subsection (or expanded evaluation paragraph) that reports representative AUC-ROC, F1, calibration, and statistical test values across task categories, includes comparisons to standard baselines (e.g., logistic regression on structured features), and presents modality and temporal ablations. We will also update the abstract to include one or two key quantitative highlights so readers can immediately gauge performance. revision: yes
Referee: [Data and Evaluation] Data section: all training (7.2M patients) and evaluation (1.4M held-out patients) occurs within a single hospital system's records. This single-center limitation means the 95 onset, 78 progression, and other tasks test only intra-site patterns (coding, documentation, demographics), directly threatening the headline claims of generalization 'at healthcare system scale' and transportable clinical signals without external validation.

Authors: We acknowledge the single-center constraint as a genuine limitation. Although the temporal hold-out design tests forecasting on future patients within the same system and the cohort size is large, the evaluation cannot speak to transportability across institutions with differing coding practices or populations. We cannot obtain external datasets for additional validation at this time. In revision we will insert an explicit Limitations section that states this restriction, discusses potential site-specific biases, and outlines the need for future multi-center studies. We will also moderate language in the abstract, introduction, and title to clarify that claims refer to scale within one large healthcare system rather than universal generalizability. revision: partial
Referee: [Methods] Methods: no architecture details, training objective, loss functions, optimization procedure, or hyperparameter choices are provided for the foundation model that produces the embeddings used in all 322 tasks, rendering the central modeling contribution impossible to reproduce or critique.

Authors: The referee correctly identifies that the current Methods section lacks sufficient technical detail for reproducibility. We will expand it substantially to include: (1) a precise description of the multimodal transformer architecture with temporal encodings, (2) the composite training objective (masked event modeling plus cross-modal contrastive loss), (3) the exact loss functions and weighting, (4) the optimizer, learning-rate schedule, and batching strategy, and (5) a table of all key hyperparameters. A high-level pseudocode block and an architecture diagram will also be added. These additions will allow readers to understand and, where data access permits, reproduce the embedding generation process. revision: yes

standing simulated objections not resolved

The single-center data constraint and consequent inability to supply external validation experiments with data from other healthcare systems.

Circularity Check

0 steps flagged

No significant circularity; evaluations independent of training objective

The paper trains Apollo on 7.2M patients' multimodal longitudinal records to learn unified embeddings, then evaluates those embeddings on a temporally held-out cohort of 1.4M patients using 322 separately defined downstream tasks (95 disease-onset, 78 progression, 59 treatment-response, etc.). These forecasting and retrieval tasks are not quantities defined by the training objective itself, nor do they reduce to fitted parameters or self-citations by construction. No equations, self-definitional steps, or load-bearing self-citations appear in the provided text; the derivation chain is self-contained against external held-out benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the quality and representativeness of the proprietary longitudinal dataset plus standard assumptions that deep learning embeddings trained on observational records will generalize to future patients and tasks.

axioms (2)

domain assumption: The 25 billion records accurately capture patient states, events, and outcomes without systematic documentation bias.
Invoked implicitly when claiming the embeddings form a reliable computational substrate for forecasting.
domain assumption: The held-out 1.4 million patients are statistically exchangeable with future patients at the same institution.
Required for the 322-task evaluation to support claims of generalized forecasting potential.

invented entities (1)

virtual patient representations (no independent evidence)
purpose: Compressed unified embedding of a patient's full multimodal temporal record.
The embeddings are the model's output; no external falsifiable signature (e.g., predicted biomarker) is provided beyond internal task performance.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Simulating clinical interventions with a generative multimodal model of human physiology
cs.AI · 2026-04 · unverdicted · novelty 7.0

HealthFormer is a generative multimodal transformer that forecasts individual physiological trajectories and simulates clinical interventions, outperforming clinical risk scores on disease prediction and matching tria...

DT-Transformer: A Foundation Model for Disease Trajectory Prediction on a Real-world Health System
cs.LG · 2026-05 · unverdicted · novelty 4.0

DT-Transformer predicts next disease events with median age- and sex-stratified AUC 0.871 across 896 categories on held-out and prospective data from a 1.7M-patient multi-hospital EHR dataset.

Reference graph

Works this paper leans on

70 extracted references · 70 canonical work pages · cited by 2 Pith papers · 1 internal anchor

[1]

Moor, M.et al.Foundation models for generalist medical artificial intelligence.Nature616, 259–265 (2023)

work page 2023
[2]

B., Jensen, L

Jensen, P. B., Jensen, L. J. & Brunak, S. Mining electronic health records: Towards better research applications and clinical care.Nature Reviews Genetics13, 395–405 (2012)

work page 2012
[3]

The healthcare data explosion (2023)

RBC Capital Markets. The healthcare data explosion (2023). URLhttps://www.rbccm.com/en/ gib/healthcare/episode/the_healthcare_data_explosion

work page 2023
[4]

Report: Only 57% of healthcare organizations’ data is used to make decisions

Arcadia. Report: Only 57% of healthcare organizations’ data is used to make decisions. Tech. Rep., Healthcare Information and Management Systems Society (HIMSS) (2023)

work page 2023
[5]

E.et al.Burden of serious harms from diagnostic error in the USA.BMJ Quality & Safety33, 109–120 (2024)

Newman-Toker, D. E.et al.Burden of serious harms from diagnostic error in the USA.BMJ Quality & Safety33, 109–120 (2024)

work page 2024
[6]

Cheng, Y ., Wang, F., Zhang, P. & Hu, J. Risk prediction with electronic health records: A deep learning approach. InProceedings of the 2016 SIAM international conference on data mining, 432–440 (SIAM, 2016)

work page 2016
[7]

& Sun, J

Xiao, C., Choi, E. & Sun, J. Opportunities and challenges in developing deep learning models using elec- tronic health records data: A systematic review.Journal of the American Medical Informatics Association 25, 1419–1428 (2018)

work page 2018
[8]

Brown, T.et al.Language models are few-shot learners.Advances in neural information processing systems33, 1877–1901 (2020)

work page 1901
[9]

& Toutanova, K

Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. InProceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), 4171–4186 (2019)

work page 2019
[10]

Oquab, M.et al.DINOv2: Learning robust visual features without supervision.Transactions on Machine Learning Research(2024)

work page 2024
[11]

& Hinton, G

Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. InInternational conference on machine learning, 1597–1607 (PMLR, 2020)

work page 2020
[12]

Science379, 1123–1130 (2023)

Lin, Z.et al.Evolutionary-scale prediction of atomic-level protein structure with a language model. Science379, 1123–1130 (2023)

work page 2023
[13]

Nguyen, E.et al.Sequence modeling and design from molecular to genome scale with Evo.Science386, eado9336 (2024)

work page 2024
[14]

Bommasani, R.et al.On the opportunities and risks of foundation models.ArXiv(2021)

work page 2021
[15]

& Topol, E

Rajpurkar, P., Chen, E., Banerjee, O. & Topol, E. J. AI in health and medicine.Nature medicine28, 31–38 (2022)

work page 2022
[16]

Tu, T.et al.Towards generalist biomedical AI.NEJM AI1, AIoa2300138 (2024)

work page 2024
[17]

InProceedings of the 2nd Clinical Natural Language Processing Workshop, 72–78 (Association for Computational Linguistics, Minneapolis, Minnesota, USA, 2019)

Alsentzer, E.et al.Publicly available clinical BERT embeddings. InProceedings of the 2nd Clinical Natural Language Processing Workshop, 72–78 (Association for Computational Linguistics, Minneapolis, Minnesota, USA, 2019). URLhttps://www.aclweb.org/anthology/W19-1909

work page 2019
[18]

Yang, X.et al.A large language model for electronic health records.npj Digital Medicine5, 194 (2022). 31

work page 2022
[19]

J.et al.Towards a general-purpose foundation model for computational pathology.Nature Medicine30, 850–862 (2024)

Chen, R. J.et al.Towards a general-purpose foundation model for computational pathology.Nature Medicine30, 850–862 (2024)

work page 2024
[20]

V orontsov, E.et al.A foundation model for clinical-grade computational pathology and rare cancers detection.Nature Medicine30, 2924–2935 (2024)

work page 2024
[21]

P ´erez-Garc´ıa, F.et al.Exploring scalable medical image encoders beyond text supervision.Nature Ma- chine Intelligence1–12 (2025)

work page 2025
[22]

Tiu, E.et al.Expert-level detection of pathologies from unannotated chest x-ray images via self-supervised learning.Nature Biomedical Engineering6, 1399–1406 (2022)

work page 2022
[23]

Liu, S., Wang, X., Hou, Y .et al.Multimodal data matters: Language model pre-training over structured and unstructured electronic health records.IEEE Journal of Biomedical and Health Informatics27, 504– 514 (2023)

work page 2023
[24]

Khader, F., Kather, J. N., M¨uller-Franzes, G.et al.Medical transformer for multimodal survival prediction in intensive care: Integration of imaging and non-imaging data.Scientific Reports13, 10666 (2023)

work page 2023
[25]

& Shah, N

Wornow, M., Thapa, R., Steinberg, E., Fries, J. & Shah, N. EHRshot: An EHR benchmark for few-shot evaluation of foundation models.Advances in Neural Information Processing Systems36, 67125–67137 (2023)

work page 2023
[26]

A., Xu, Y

Steinberg, E., Fries, J. A., Xu, Y . & Shah, N. MOTOR: A time-to-event foundation model for structured medical records. InICLR(2024)

work page 2024
[27]

Li, Y .et al.BEHRT: Transformer for electronic health records.Scientific Reports10, 7155 (2020)

work page 2020
[28]

& Zhi, D

Rasmy, L., Xiang, Y ., Xie, Z., Tao, C. & Zhi, D. Med-BERT: Pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction.npj Digital Medicine4, 86 (2021)

work page 2021
[29]

Miotto, R., Li, L., Kidd, B. A. & Dudley, J. T. Deep patient: An unsupervised representation to predict the future of patients from the electronic health records.Scientific Reports6, 26094 (2016)

work page 2016
[30]

Redekop, E.et al.Zero-shot medical event prediction using a generative pretrained transformer on elec- tronic health records.Journal of the American Medical Informatics Association32, 1833–1842 (2025)

work page 2025
[31]

Renc, P.et al.Zero-shot health trajectory prediction using transformer.npj Digital Medicine7, 256 (2024)

work page 2024
[32]

Zhang, Y . & Li, S. Chronoformer: Time-aware transformer architectures for structured clinical event modeling.arXiv preprint arXiv:2504.07373(2025)

work page arXiv 2025
[33]

Kraljevic, Z., Bean, D., Shek, A.et al.Foresight—a generative pretrained transformer for modelling of patient timelines using electronic health records: A retrospective modelling study.The Lancet Digital Health6, e281–e290 (2024)

work page 2024
[34]

W., Gaurav, K.et al.Learning the natural history of human disease with generative transformers.Nature(2025)

Shmatko, A., Jung, A. W., Gaurav, K.et al.Learning the natural history of human disease with generative transformers.Nature(2025)

work page 2025
[35]

Li, Y ., Mamouei, M., Salimi-Khorshidi, G.et al.Hi-BEHRT: Hierarchical transformer-based model for ac- curate prediction of clinical events using multimodal longitudinal electronic health records.IEEE Journal of Biomedical and Health Informatics27, 1106–1117 (2023)

work page 2023
[36]

Kauffman, J.et al.Embedding methods for electronic health record research.Annual Review of Biomedical Data Science8(2025)

work page 2025
[37]

R.et al.Integrated multimodal artificial intelligence framework for healthcare applications

Soenksen, L. R.et al.Integrated multimodal artificial intelligence framework for healthcare applications. npj Digital Medicine5, 149 (2022). 32

work page 2022
[38]

Huang, Z., Bianchi, F., Yuksekgonul, M., Montine, T. J. & Zou, J. A visual-language foundation model for pathology image analysis using medical Twitter.Nature Medicine29, 2307–2316 (2023)

work page 2023
[39]

K.et al.Predicting cellular responses to perturbation across diverse contexts with STATE

Adduri, A. K.et al.Predicting cellular responses to perturbation across diverse contexts with STATE. bioRxiv2025.06.26.661135 (2025)

work page 2025
[40]

Bunne, C.et al.How to build the virtual cell with artificial intelligence: Priorities and opportunities.Cell 187, 7045–7063 (2024)

work page 2024
[41]

Johnson, A. E. W.et al.MIMIC-IV, a freely accessible electronic health record dataset.Scientific Data 10, 1 (2023)

work page 2023
[42]

Organization, W. H. ICD-10: International statistical classification of diseases and related health problems: Tenth revision (2004)

work page 2004
[43]

Ding, T.et al.A multimodal whole-slide foundation model for pathology.Nature Medicine1–13 (2025)

work page 2025
[44]

Y .et al.A visual-language foundation model for computational pathology.Nature Medicine30, 863–874 (2024)

Lu, M. Y .et al.A visual-language foundation model for computational pathology.Nature Medicine30, 863–874 (2024). Publisher: Nature Publishing Group

work page 2024
[45]

From ehrs to patient pathways: Scalable modeling of longitudinal health trajectories with llms.arXiv preprint arXiv:2506.04831, 2025

Pellegrini, C., ¨Ozsoy, E., Bani-Harouni, D., Keicher, M. & Navab, N. From ehrs to patient pathways: Scalable modeling of longitudinal health trajectories with llms.arXiv preprint arXiv:2506.04831(2025)

work page arXiv 2025
[46]

Waxler, S.et al.Generative medical event models improve with scale.arXiv preprint arXiv:2508.12104 (2025)

work page arXiv 2025
[47]

A.et al.Non-anemic iron deficiency predicts COPD exacerbations and hospitalizations: Re- sults from a prospective cohort.Journal of Clinical Medicine14, 4154 (2025)

Amado, C. A.et al.Non-anemic iron deficiency predicts COPD exacerbations and hospitalizations: Re- sults from a prospective cohort.Journal of Clinical Medicine14, 4154 (2025)

work page 2025
[48]

Dong, Z.et al.Association between iron homeostasis and prognosis in patients with chronic obstructive pulmonary disease: A retrospective analysis from MIMIC-IV database.Frontiers in Medicine12, 1610681 (2025)

work page 2025
[49]

Ghonemy, S., Nasr, M. M. M., Soliman, M. & Hosiney, H. A. Clinical skin aging score and risk of degenerative cardiovascular diseases.The Journal of Clinical and Aesthetic Dermatology14, 34 (2021)

work page 2021
[50]

& Katira, R

Katira, A. & Katira, R. Dermatological manifestations of cardiac conditions.The British Journal of Cardiology29, 9 (2022)

work page 2022
[51]

O., Abiodun, O

Soyoye, D. O., Abiodun, O. O., Ikem, R. T., Kolawole, B. A. & Akintomide, A. O. Diabetes and peripheral artery disease: A review.World Journal of Diabetes12, 827 (2021)

work page 2021
[52]

W., Holm, P

Rasmussen, C., Larsen, J. W., Holm, P. C. & Nielsen, G. L. Gout: An overlooked disease in patients with diabetes? a danish prospective cohort study with 2 years of follow-up.Clinical Diabetes43, 282–290 (2025)

work page 2025
[53]

Rasmussen, C.et al.Identifying tophaceous gout in foot ulcers using ulcer debris microscopy in type 2 diabetes.Journal of Wound Management26, 175–181 (2025)

work page 2025
[54]

& Malik, M

Valiyaveettil, D., Joseph, D. & Malik, M. Cardiotoxicity in breast cancer treatment: Causes and mitigation. Cancer Treatment and Research Communications37, 100760 (2023)

work page 2023
[55]

Yalc ¸ıner, M.et al.Impact of comorbidity on survival in cancer patients receiving immune checkpoint inhibitors.Clinical and Translational Oncology1–8 (2025)

work page 2025
[56]

Carreira, H.et al.Use of anthracyclines and trastuzumab for breast cancer in women with and without a history of cardiovascular disease in sweden: A national cross-sectional study.Cardio-Oncology11, 56 (2025). 33

work page 2025
[57]

Poletto, S. et al. Predictive factors in metastatic melanoma treated with immune checkpoint inhibitors: From clinical practice to future perspective. Cancers 16, 101 (2023)

[58]

Du, Y., Wu, W., Chen, M., Dong, Z. & Wang, F. Cutaneous adverse events and cancer survival prognosis with immune checkpoint inhibitor treatment: A systematic review and meta-analysis. JAMA Dermatology 159, 1093–1101 (2023)

[59]

Cho, Y.-T., Lin, Y.-T., Yang, C.-W. & Chu, C.-Y. Cutaneous immune-related adverse events among Taiwanese cancer patients receiving immune checkpoint inhibitors link to a survival benefit. Scientific Reports 12, 7021 (2022)

[60]

Koch, V. et al. DinoBloom: A foundation model for generalizable cell embeddings in hematology. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 520–530 (Springer, 2024)

[61]

Vaswani, A. et al. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017)

[62]

Yang, A. et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388 (2025)

[63]

Zadeh, S. G. & Schmid, M. Bias in cross-entropy-based training of deep survival networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 43, 3126–3137 (2020)

[64]

Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. In Precup, D. & Teh, Y. W. (eds.) Proceedings of the 34th International Conference on Machine Learning, vol. 70 of Proceedings of Machine Learning Research, 3319–3328 (PMLR, 2017)

[65]

Chase, D. M., Mahajan, A., Scott, D. A., Hawkins, N. & Kalilani, L. The impact of varying levels of residual disease following cytoreductive surgery on survival outcomes in patients with ovarian cancer: A meta-analysis. BMC Women’s Health 24, 179 (2024)

[66]

Petrucelli, N., Daly, M. B. & Pal, T. BRCA1- and BRCA2-associated hereditary breast and ovarian cancer. In Adam, M. P., Bick, S., Mirzaa, G. M. et al. (eds.) GeneReviews® [Internet] (University of Washington, Seattle, Seattle (WA), 2025). Initial posting: 1998-09-04. Updated: 2025-03-20. PMID: 20301425. Bookshelf ID: NBK1247

[67]

Praestegaard, C. et al. Cigarette smoking is associated with adverse survival among women with ovarian cancer: Results from a pooled analysis of 19 studies. International Journal of Cancer 140, 2422–2435 (2017)

[68]

Neuendorff, N. R. et al. Anthracycline-related cardiotoxicity in older patients with acute myeloid leukemia: A young SIOG review paper. Blood Advances 4, 762–775 (2020)

[69]

Wang, J. et al. Impact of chronic kidney disease on the prognosis of transcatheter aortic valve replacement in patients with aortic stenosis: A meta-analysis of 133624 patients. Annals of Thoracic and Cardiovascular Surgery 28, 83–95 (2022)

[70]

Wallis, C. J. et al. Association between use of antithrombotic medication and hematuria-related complications. JAMA 318, 1260–1271 (2017)

Extended Data Figures

[Figure: disease-free probability over time; panels include Nephrotic Syndrome (n=324,298, I=0.0%), p < 0.0001, and Acute Lymphocytic Leukemia (n=320,401, I=0.0%)...]

[1]

Moor, M. et al. Foundation models for generalist medical artificial intelligence. Nature 616, 259–265 (2023)

[2]

Jensen, P. B., Jensen, L. J. & Brunak, S. Mining electronic health records: Towards better research applications and clinical care. Nature Reviews Genetics 13, 395–405 (2012)

[3]

RBC Capital Markets. The healthcare data explosion (2023). URL https://www.rbccm.com/en/gib/healthcare/episode/the_healthcare_data_explosion

[4]

Arcadia. Report: Only 57% of healthcare organizations’ data is used to make decisions. Tech. Rep., Healthcare Information and Management Systems Society (HIMSS) (2023)

[5]

Newman-Toker, D. E. et al. Burden of serious harms from diagnostic error in the USA. BMJ Quality & Safety 33, 109–120 (2024)

[6]

Cheng, Y., Wang, F., Zhang, P. & Hu, J. Risk prediction with electronic health records: A deep learning approach. In Proceedings of the 2016 SIAM International Conference on Data Mining, 432–440 (SIAM, 2016)

[7]

Xiao, C., Choi, E. & Sun, J. Opportunities and challenges in developing deep learning models using electronic health records data: A systematic review. Journal of the American Medical Informatics Association 25, 1419–1428 (2018)

[8]

Brown, T. et al. Language models are few-shot learners. Advances in Neural Information Processing Systems 33, 1877–1901 (2020)

[9]

Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171–4186 (2019)

[10]

Oquab, M. et al. DINOv2: Learning robust visual features without supervision. Transactions on Machine Learning Research (2024)

[11]

Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning, 1597–1607 (PMLR, 2020)

[12]

Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023)

[13]

Nguyen, E. et al. Sequence modeling and design from molecular to genome scale with Evo. Science 386, eado9336 (2024)

[14]

Bommasani, R. et al. On the opportunities and risks of foundation models. arXiv (2021)

[15]

Rajpurkar, P., Chen, E., Banerjee, O. & Topol, E. J. AI in health and medicine. Nature Medicine 28, 31–38 (2022)

[16]

Tu, T. et al. Towards generalist biomedical AI. NEJM AI 1, AIoa2300138 (2024)

[17]

Alsentzer, E. et al. Publicly available clinical BERT embeddings. In Proceedings of the 2nd Clinical Natural Language Processing Workshop, 72–78 (Association for Computational Linguistics, Minneapolis, Minnesota, USA, 2019). URL https://www.aclweb.org/anthology/W19-1909

[18]

Yang, X. et al. A large language model for electronic health records. npj Digital Medicine 5, 194 (2022)

[19]

Chen, R. J. et al. Towards a general-purpose foundation model for computational pathology. Nature Medicine 30, 850–862 (2024)

[20]

Vorontsov, E. et al. A foundation model for clinical-grade computational pathology and rare cancers detection. Nature Medicine 30, 2924–2935 (2024)

[21]

Pérez-García, F. et al. Exploring scalable medical image encoders beyond text supervision. Nature Machine Intelligence 1–12 (2025)

[22]

Tiu, E. et al. Expert-level detection of pathologies from unannotated chest x-ray images via self-supervised learning. Nature Biomedical Engineering 6, 1399–1406 (2022)

[23]

Liu, S., Wang, X., Hou, Y. et al. Multimodal data matters: Language model pre-training over structured and unstructured electronic health records. IEEE Journal of Biomedical and Health Informatics 27, 504–514 (2023)

[24]

Khader, F., Kather, J. N., Müller-Franzes, G. et al. Medical transformer for multimodal survival prediction in intensive care: Integration of imaging and non-imaging data. Scientific Reports 13, 10666 (2023)

[25]

Wornow, M., Thapa, R., Steinberg, E., Fries, J. & Shah, N. EHRshot: An EHR benchmark for few-shot evaluation of foundation models. Advances in Neural Information Processing Systems 36, 67125–67137 (2023)

[26]

Steinberg, E., Fries, J. A., Xu, Y. & Shah, N. MOTOR: A time-to-event foundation model for structured medical records. In ICLR (2024)

[27]

Li, Y. et al. BEHRT: Transformer for electronic health records. Scientific Reports 10, 7155 (2020)

[28]

Rasmy, L., Xiang, Y., Xie, Z., Tao, C. & Zhi, D. Med-BERT: Pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction. npj Digital Medicine 4, 86 (2021)

[29]

Miotto, R., Li, L., Kidd, B. A. & Dudley, J. T. Deep patient: An unsupervised representation to predict the future of patients from the electronic health records. Scientific Reports 6, 26094 (2016)

[30]

Redekop, E. et al. Zero-shot medical event prediction using a generative pretrained transformer on electronic health records. Journal of the American Medical Informatics Association 32, 1833–1842 (2025)

[31]

Renc, P. et al. Zero-shot health trajectory prediction using transformer. npj Digital Medicine 7, 256 (2024)

[32]

Zhang, Y. & Li, S. Chronoformer: Time-aware transformer architectures for structured clinical event modeling. arXiv preprint arXiv:2504.07373 (2025)

[33]

Kraljevic, Z., Bean, D., Shek, A. et al. Foresight—a generative pretrained transformer for modelling of patient timelines using electronic health records: A retrospective modelling study. The Lancet Digital Health 6, e281–e290 (2024)

[34]

Shmatko, A., Jung, A. W., Gaurav, K. et al. Learning the natural history of human disease with generative transformers. Nature (2025)

[35]

Li, Y., Mamouei, M., Salimi-Khorshidi, G. et al. Hi-BEHRT: Hierarchical transformer-based model for accurate prediction of clinical events using multimodal longitudinal electronic health records. IEEE Journal of Biomedical and Health Informatics 27, 1106–1117 (2023)

[36]

Kauffman, J. et al. Embedding methods for electronic health record research. Annual Review of Biomedical Data Science 8 (2025)

[37]

Soenksen, L. R. et al. Integrated multimodal artificial intelligence framework for healthcare applications. npj Digital Medicine 5, 149 (2022)

[38]

Huang, Z., Bianchi, F., Yuksekgonul, M., Montine, T. J. & Zou, J. A visual-language foundation model for pathology image analysis using medical Twitter. Nature Medicine 29, 2307–2316 (2023)

[39]

Adduri, A. K. et al. Predicting cellular responses to perturbation across diverse contexts with STATE. bioRxiv 2025.06.26.661135 (2025)

[40]

Bunne, C. et al. How to build the virtual cell with artificial intelligence: Priorities and opportunities. Cell 187, 7045–7063 (2024)

[41]

Johnson, A. E. W. et al. MIMIC-IV, a freely accessible electronic health record dataset. Scientific Data 10, 1 (2023)

[42]

World Health Organization. ICD-10: International statistical classification of diseases and related health problems: Tenth revision (2004)

[43]

Ding, T. et al. A multimodal whole-slide foundation model for pathology. Nature Medicine 1–13 (2025)

[44]

Lu, M. Y. et al. A visual-language foundation model for computational pathology. Nature Medicine 30, 863–874 (2024). Publisher: Nature Publishing Group

[45]

Pellegrini, C., Özsoy, E., Bani-Harouni, D., Keicher, M. & Navab, N. From EHRs to patient pathways: Scalable modeling of longitudinal health trajectories with LLMs. arXiv preprint arXiv:2506.04831 (2025)

[46]

Waxler, S. et al. Generative medical event models improve with scale. arXiv preprint arXiv:2508.12104 (2025)

[47]

Amado, C. A. et al. Non-anemic iron deficiency predicts COPD exacerbations and hospitalizations: Results from a prospective cohort. Journal of Clinical Medicine 14, 4154 (2025)
