A multimodal and temporal foundation model for virtual patient representations at healthcare system scale
Pith reviewed 2026-05-10 04:57 UTC · model grok-4.3
The pith
A multimodal foundation model unifies full patient records into embeddings for forecasting hundreds of clinical outcomes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Apollo is a multimodal temporal foundation model that learns a unified representation space integrating over 100 thousand unique medical events with images and clinical text, drawn from the records of 7.2 million patients. The resulting virtual patient representations support clinical forecasting across 95 new disease onset tasks (up to five years ahead), 78 disease progression tasks, 59 treatment response tasks, 17 adverse event risk tasks, and 12 hospital operations tasks. They also support 61 semantic retrieval tasks, for 322 tasks in total, and the model's predictions align with clinically interpretable biomarkers.
What carries the argument
The Apollo model itself, which acts as a compressor turning sequences of structured events, unstructured text, and images into unified virtual patient embeddings that capture the full care journey.
If this is right
- The embeddings allow prediction of new disease onset risk up to five years in advance across 95 tasks.
- Disease progression forecasting is possible in 78 tasks.
- Treatment response prediction covers 59 tasks and adverse event risks cover 17 tasks.
- Hospital operations endpoints are addressed in 12 tasks and semantic search in 61 retrieval tasks.
- Feature attribution confirms that predictions rely on clinically relevant multimodal signals.
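The semantic retrieval use mentioned in the list above can be illustrated with a minimal sketch of nearest-neighbour search over patient embeddings. The page does not say how Apollo performs retrieval, so cosine similarity, the function names, and the tiny three-dimensional vectors are all assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)

def top_k_similar(query, index, k=2):
    """Return the k (patient_id, score) pairs most similar to the query."""
    scored = [(pid, cosine_similarity(query, emb)) for pid, emb in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical embeddings; real ones would have many more dimensions.
index = {
    "patient_a": [1.0, 0.0, 0.0],
    "patient_b": [0.9, 0.1, 0.0],
    "patient_c": [0.0, 1.0, 0.0],
}
query = [1.0, 0.05, 0.0]
results = top_k_similar(query, index, k=2)
```

Here `results` ranks `patient_a` first and `patient_b` second. At the scale described on this page, an exhaustive scan like this would likely be replaced by an approximate nearest-neighbour index.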
Where Pith is reading between the lines
- This could enable searching for similar patient trajectories using text or image queries to guide care in complex cases.
- Performance on external datasets from other hospitals would test if the representations are truly general or system-specific.
- Integration into existing record systems could provide automated risk alerts based on the full record history.
- The model might support cross-modal queries that link images directly to future outcome probabilities.
Load-bearing premise
The test set patients and their data patterns are representative of future patients, and the model captures generalizable medical signals instead of hospital-specific documentation biases.
What would settle it
If the predictive performance on the 95 disease onset tasks drops significantly when applied to patient data from a different hospital system, this would falsify the claim of generalized representations.
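A minimal sketch of the comparison this test implies: compute AUC-ROC for the same task on an internal and an external cohort and look at the gap. The labels and scores below are made up, the function name is hypothetical, and the page does not define what counts as a "significant" drop; a real comparison would need confidence intervals across many tasks.

```python
def auc_roc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up labels and risk scores for one onset task at two sites.
internal_labels = [1, 1, 0, 0, 1, 0]
internal_scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.4]
external_labels = [1, 0, 1, 0, 0, 1]
external_scores = [0.6, 0.7, 0.4, 0.3, 0.5, 0.8]

internal_auc = auc_roc(internal_labels, internal_scores)
external_auc = auc_roc(external_labels, external_scores)
auc_drop = internal_auc - external_auc  # positive means worse externally
```

The pairwise loop is quadratic in cohort size and is used only for clarity; a sort-based rank formulation gives the same value for large cohorts.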
Original abstract
Modern medicine generates vast multimodal data across siloed systems, yet no existing model integrates the full breadth and temporal depth of the clinical record into a unified patient representation. We introduce Apollo, a multimodal temporal foundation model trained and evaluated on over three decades of longitudinal hospital records from a major US hospital system, composed of 25 billion records from 7.2 million patients, representing 28 distinct medical modalities and 12 major medical specialties. Apollo learns a unified representation space integrating over 100 thousand unique medical events in our clinical vocabulary as well as images and clinical text. This "atlas of medical concepts" forms a computational substrate for modeling entire patient care journeys comprised of sequences of structured and unstructured events, which are compressed by Apollo into virtual patient representations. To assess the potential of these whole-patient representations, we created 322 prognosis and retrieval tasks from a held-out test set of 1.4 million patients. We demonstrate the generalized clinical forecasting potential of Apollo embeddings, including predicting new disease onset risk up to five years in advance (95 tasks), disease progression (78 tasks), treatment response (59 tasks), risk of treatment-related adverse events (17 tasks), and hospital operations endpoints (12 tasks). Using feature attribution techniques, we show that model predictions align with clinically-interpretable multimodal biomarkers. We evaluate semantic similarity search on 61 retrieval tasks, and moreover demonstrate the potential of Apollo as a multimodal medical search engine using text and image queries. Together, these modeling capabilities establish the foundation for computable medicine, where the full context of patient care becomes accessible to computational reasoning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Apollo, a multimodal temporal foundation model trained on 25 billion records from 7.2 million patients spanning 28 medical modalities and 12 specialties from a single US hospital system. It constructs unified representations integrating over 100k medical events, images, and clinical text into 'virtual patient representations' that compress entire care journeys. These representations are assessed via 322 prognosis and retrieval tasks on a temporally held-out cohort of 1.4 million patients, with claims of forecasting new disease onset up to five years ahead (95 tasks), disease progression (78 tasks), treatment response (59 tasks), adverse events (17 tasks), hospital operations (12 tasks), plus semantic similarity search on 61 tasks and multimodal query capabilities.
Significance. If the results hold, the work has substantial significance due to the unprecedented scale of the integrated dataset and the breadth of evaluated tasks, which together position the embeddings as a potential substrate for computable medicine. The temporal hold-out design and use of feature attribution for interpretability are positive elements. The large patient cohort and multimodal coverage represent a clear strength that could enable downstream applications if generalizability is established.
major comments (3)
- [Abstract] Abstract and evaluation description: the central claim of 'generalized clinical forecasting potential' across 322 tasks is unsupported because no quantitative performance metrics (AUC, F1, calibration, or statistical tests), baseline comparisons, or ablation results are reported for any task, preventing assessment of whether the embeddings outperform trivial or existing methods.
- [Data and Evaluation] Data section: all training (7.2M patients) and evaluation (1.4M held-out patients) occur within a single hospital system's records. This single-center limitation means the 95 onset, 78 progression, and other tasks test only intra-site patterns (coding, documentation, demographics), directly threatening the headline claims of generalization 'at healthcare system scale' and transportable clinical signals without external validation.
- [Methods] Methods: no architecture details, training objective, loss functions, optimization procedure, or hyperparameter choices are provided for the foundation model that produces the embeddings used in all 322 tasks, rendering the central modeling contribution impossible to reproduce or critique.
minor comments (2)
- [Abstract] The phrase 'virtual patient representations' is used repeatedly without a formal definition or equation distinguishing it from standard sequence embeddings.
- [Abstract] The abstract lists task counts (95, 78, 59, etc.) but does not indicate how tasks were constructed or balanced, which affects interpretation of the forecasting results.
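Two of the metrics requested in the first major comment, F1 and a calibration-sensitive score, can be sketched alongside a trivial baseline. This is an illustration only: the Brier score is one common choice that is sensitive to calibration (the page names no specific measure), the prevalence-only baseline stands in for "trivial methods", and all numbers are invented.

```python
def f1_score(labels, predictions):
    """Harmonic mean of precision and recall for binary 0/1 predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def brier_score(labels, probabilities):
    """Mean squared error between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probabilities)) / len(labels)

# Made-up outcomes and predicted probabilities for a single task.
labels = [1, 0, 1, 0]
model_probs = [0.8, 0.2, 0.6, 0.4]

# Trivial baseline: predict the observed prevalence for every patient.
prevalence = sum(labels) / len(labels)
baseline_probs = [prevalence] * len(labels)

model_f1 = f1_score(labels, [1 if p >= 0.5 else 0 for p in model_probs])
model_brier = brier_score(labels, model_probs)
baseline_brier = brier_score(labels, baseline_probs)
```

A lower Brier score is better, so a model is only informative on this measure if `model_brier` falls below `baseline_brier`.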
Simulated Author's Rebuttal
We thank the referee for their constructive review and for highlighting areas where the manuscript can be strengthened. We address each major comment in turn below, with plans for targeted revisions.
Point-by-point responses
- Referee: [Abstract] Abstract and evaluation description: the central claim of 'generalized clinical forecasting potential' across 322 tasks is unsupported because no quantitative performance metrics (AUC, F1, calibration, or statistical tests), baseline comparisons, or ablation results are reported for any task, preventing assessment of whether the embeddings outperform trivial or existing methods.
  Authors: We agree that the abstract, as a high-level summary, does not contain specific numerical results. The current manuscript defines the 322 tasks and describes the overall evaluation framework but does not report the requested quantitative metrics, baseline comparisons, or ablations. In the revised version we will add a concise Results subsection (or expanded evaluation paragraph) that reports representative AUC-ROC, F1, calibration, and statistical test values across task categories, includes comparisons to standard baselines (e.g., logistic regression on structured features), and presents modality and temporal ablations. We will also update the abstract to include one or two key quantitative highlights so readers can immediately gauge performance.
  Revision: yes
- Referee: [Data and Evaluation] Data section: all training (7.2M patients) and evaluation (1.4M held-out patients) occur within a single hospital system's records. This single-center limitation means the 95 onset, 78 progression, and other tasks test only intra-site patterns (coding, documentation, demographics), directly threatening the headline claims of generalization 'at healthcare system scale' and transportable clinical signals without external validation.
  Authors: We acknowledge the single-center constraint as a genuine limitation. Although the temporal hold-out design tests forecasting on future patients within the same system and the cohort size is large, the evaluation cannot speak to transportability across institutions with differing coding practices or populations. We cannot obtain external datasets for additional validation at this time. In revision we will insert an explicit Limitations section that states this restriction, discusses potential site-specific biases, and outlines the need for future multi-center studies. We will also moderate language in the abstract, introduction, and title to clarify that claims refer to scale within one large healthcare system rather than universal generalizability.
  Revision: partial
- Referee: [Methods] Methods: no architecture details, training objective, loss functions, optimization procedure, or hyperparameter choices are provided for the foundation model that produces the embeddings used in all 322 tasks, rendering the central modeling contribution impossible to reproduce or critique.
  Authors: The referee correctly identifies that the current Methods section lacks sufficient technical detail for reproducibility. We will expand it substantially to include: (1) a precise description of the multimodal transformer architecture with temporal encodings, (2) the composite training objective (masked event modeling plus cross-modal contrastive loss), (3) the exact loss functions and weighting, (4) the optimizer, learning-rate schedule, and batching strategy, and (5) a table of all key hyperparameters. A high-level pseudocode block and an architecture diagram will also be added. These additions will allow readers to understand and, where data access permits, reproduce the embedding generation process.
  Revision: yes
- The single-center data constraint and consequent inability to supply external validation experiments with data from other healthcare systems.
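The simulated response to the Methods comment names a composite objective (masked event modeling plus a cross-modal contrastive loss) without giving equations. One common way such an objective is written is sketched below. Every symbol is an assumption introduced for illustration and none comes from the paper: $\lambda$ is a weighting term, $\mathcal{M}$ the set of masked positions, $x_t$ the event at position $t$, $z_i$ and $w_i$ paired embeddings of the same patient context from two modalities, $\mathrm{sim}$ a similarity such as cosine, $\tau$ a temperature, and $N$ the batch size. The contrastive term shown is the standard InfoNCE form, which may differ from whatever the authors use.

```latex
\mathcal{L} = \mathcal{L}_{\mathrm{mask}} + \lambda\,\mathcal{L}_{\mathrm{con}},
\qquad
\mathcal{L}_{\mathrm{mask}} = -\frac{1}{|\mathcal{M}|}\sum_{t \in \mathcal{M}}
  \log p_\theta\!\left(x_t \mid x_{\setminus \mathcal{M}}\right),
\qquad
\mathcal{L}_{\mathrm{con}} = -\frac{1}{N}\sum_{i=1}^{N}
  \log \frac{\exp\!\left(\mathrm{sim}(z_i, w_i)/\tau\right)}
            {\sum_{j=1}^{N} \exp\!\left(\mathrm{sim}(z_i, w_j)/\tau\right)}
```

Under this reading, the first term rewards reconstructing hidden events from their context and the second pulls matched cross-modal pairs together while pushing apart mismatched pairs within the batch.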
Circularity Check
No significant circularity; evaluations independent of training objective
Full rationale
The paper trains Apollo on 7.2M patients' multimodal longitudinal records to learn unified embeddings, then evaluates those embeddings on a temporally held-out cohort of 1.4M patients using 322 separately defined downstream tasks (95 disease-onset, 78 progression, 59 treatment-response, etc.). These forecasting and retrieval tasks are not quantities defined by the training objective itself, nor do they reduce to fitted parameters or self-citations by construction. No equations, self-definitional steps, or load-bearing self-citations appear in the provided text; the derivation chain is self-contained against external held-out benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The 25 billion records accurately capture patient states, events, and outcomes without systematic documentation bias.
- domain assumption: The held-out 1.4 million patients are statistically exchangeable with future patients at the same institution.
invented entities (1)
- virtual patient representations: no independent evidence
Forward citations
Cited by 2 Pith papers
- Simulating clinical interventions with a generative multimodal model of human physiology
  HealthFormer is a generative multimodal transformer that forecasts individual physiological trajectories and simulates clinical interventions, outperforming clinical risk scores on disease prediction and matching tria...
- DT-Transformer: A Foundation Model for Disease Trajectory Prediction on a Real-world Health System
  DT-Transformer predicts next disease events with median age- and sex-stratified AUC 0.871 across 896 categories on held-out and prospective data from a 1.7M-patient multi-hospital EHR dataset.