arxiv: 2604.04175 · v1 · submitted 2026-04-05 · 💻 cs.LG

Recognition: no theorem link

Uncertainty-Aware Foundation Models for Clinical Data

Qian Zhou, Shi Li, Yuanyun Zhang

Authors on Pith no claims yet

Pith reviewed 2026-05-13 16:41 UTC · model grok-4.3

classification 💻 cs.LG

keywords foundation modelsclinical dataepistemic uncertaintyself-supervised learningmissing datamultimodal encoderslatent distributions

0 comments

The pith

Representing patients as distributions over latent states rather than points captures epistemic uncertainty from incomplete clinical data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes that healthcare foundation models should represent each patient as a distribution over plausible latent states instead of a deterministic point embedding. The method learns set-valued representations and enforces consistency across partial views of the same patient to identify what remains invariantly inferable while explicitly marking epistemic uncertainty. These representations integrate with multimodal encoders through scalable self-supervised objectives that combine reconstruction, contrastive alignment, and distributional regularization. The resulting models show gains in predictive performance, robustness to missing observations, and uncertainty calibration on clinical tasks.

Core claim

By representing each patient not as a point embedding but as a distribution over plausible latent states, and by enforcing consistency across partial views of the same patient, the model captures what is invariantly inferable while explicitly encoding epistemic uncertainty, integrated with multimodal encoders and self-supervised objectives that include reconstruction, contrastive alignment, and distributional regularization.

What carries the argument

Set-valued patient representations as distributions over latent states, enforced through consistency across partial observations to separate epistemic uncertainty from aleatoric noise.

If this is right

Better predictive performance on diverse clinical tasks
Increased robustness when data are missing or irregularly observed
Improved uncertainty calibration relative to deterministic baselines
Compatible scaling with multimodal encoders and self-supervised training

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This framing may support safer use in high-stakes decisions by surfacing uncertain cases for clinician review.
The same consistency mechanism could allow incremental addition of new modalities without full retraining.
Applying the approach to longitudinal records could track how epistemic uncertainty changes as more observations arrive over time.

Load-bearing premise

Enforcing consistency across partial views of the same patient reliably separates epistemic uncertainty from aleatoric noise without requiring additional supervision or external validation.

What would settle it

A controlled test on clinical datasets with known missingness patterns where the consistency-enforced model shows no improvement in calibration or robustness metrics over standard point-embedding baselines.

read the original abstract

Healthcare foundation models have largely followed paradigms from natural language processing and computer vision, emphasizing large scale pretraining and deterministic representations over heterogeneous clinical data. However, clinical observations are inherently incomplete, reflecting sparse, irregular, and modality dependent measurements of an underlying physiologic state. In this work, we propose a framework for uncertainty aware foundation modeling that represents each patient not as a point embedding, but as a distribution over plausible latent states. By learning set valued representations and enforcing consistency across partial views of the same patient, the model captures what is invariantly inferable while explicitly encoding epistemic uncertainty. We integrate this formulation with multimodal encoders and scalable self supervised objectives, combining reconstruction, contrastive alignment, and distributional regularization. Across diverse clinical tasks, our approach improves predictive performance, robustness under missing data, and uncertainty calibration relative to strong baselines. These results suggest that modeling what is not observed rather than only what is constitutes a critical inductive bias for healthcare foundation models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The core idea of using distributional representations over latent patient states to handle missing clinical data is a reasonable inductive bias, but the paper provides almost no evidence or implementation details to assess whether it works.

read the letter

The paper's main move is to represent each patient as a distribution over plausible latent states instead of a single embedding. This is paired with multimodal encoders and self-supervised losses that include reconstruction, contrastive alignment, and distributional regularization to enforce consistency across partial views of the same patient. The goal is to capture epistemic uncertainty from unobserved measurements while still learning what is invariantly inferable from the data that is present. In principle this addresses a genuine difference between clinical records and typical NLP or vision data, where missingness and irregularity are the norm rather than the exception. The claim that modeling what is not observed can improve robustness and calibration is at least coherent on its face. The work does a decent job framing the problem and sketching a scalable self-supervised approach that does not require dense labels. That framing is the clearest contribution so far. The soft spots are substantial and mostly about missing substance. The abstract asserts gains on predictive tasks, robustness under missing data, and uncertainty calibration, yet supplies no numbers, no dataset descriptions, no ablation results, and no concrete description of how the distributional regularization is implemented or optimized. Without those, it is impossible to tell whether the gains come from the proposed inductive bias or from other factors. The stress-test point about MNAR missingness also lands: clinical observation patterns are frequently informative, so enforcing cross-view consistency could absorb selection bias into the epistemic component rather than isolating it. The paper would need to show either an explicit missingness model or external validation that the learned distributions are properly calibrated. This is for researchers working on clinical foundation models who already care about uncertainty quantification. The thinking is clear and the problem is real, but the current version reads more like a proposal than a finished piece of work. It deserves peer review so the authors can supply the experiments and address the missingness mechanism directly.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an uncertainty-aware framework for healthcare foundation models in which each patient is represented as a distribution over plausible latent states rather than a deterministic point embedding. Multimodal encoders are combined with self-supervised objectives (reconstruction, contrastive alignment, and distributional regularization) that enforce consistency across partial views of the same patient, with the goal of separating epistemic uncertainty from aleatoric noise. The authors claim that this inductive bias yields improved predictive performance, robustness under missing data, and better uncertainty calibration relative to strong baselines on diverse clinical tasks.

Significance. If the central claims hold after addressing the missingness mechanism, the work would supply a clinically relevant inductive bias for foundation models operating on sparse, irregular, and modality-dependent observations. Explicitly modeling distributions over unobserved states rather than imputing or ignoring them could improve reliability in downstream decision-making where epistemic uncertainty matters.

major comments (2)

[§3.2] §3.2 and the self-supervised objectives: the distributional regularization and cross-view consistency losses implicitly assume missingness is ignorable (MCAR/MAR). Clinical data frequently exhibit MNAR patterns (e.g., lab orders driven by unobserved severity), so the same latent state can generate inconsistent partial views; without an explicit missingness model or external calibration of the latent distributions, the learned epistemic component risks absorbing selection bias rather than true uncertainty. This assumption is load-bearing for the reported robustness and calibration gains.
[Abstract, §4] Abstract and §4: the performance, robustness, and calibration claims are stated without quantitative metrics, ablation results, or implementation details for the distributional regularization term. The central empirical assertions therefore rest on unverified assertions rather than demonstrated effect sizes or controls.

minor comments (2)

[§3.2] Define the precise functional form of the distributional regularization loss and its weighting relative to reconstruction and contrastive terms; include the relevant equation.
[§3.1] Clarify whether the set-valued representations are parameterized as explicit distributions (e.g., Gaussian, mixture) or implicit via sampling, and how inference is performed at test time.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive comments. We address each major point below, clarifying our assumptions and committing to revisions where appropriate to strengthen the manuscript.

read point-by-point responses

Referee: [§3.2] §3.2 and the self-supervised objectives: the distributional regularization and cross-view consistency losses implicitly assume missingness is ignorable (MCAR/MAR). Clinical data frequently exhibit MNAR patterns (e.g., lab orders driven by unobserved severity), so the same latent state can generate inconsistent partial views; without an explicit missingness model or external calibration of the latent distributions, the learned epistemic component risks absorbing selection bias rather than true uncertainty. This assumption is load-bearing for the reported robustness and calibration gains.

Authors: We agree that the current formulation implicitly relies on a MAR assumption for the cross-view consistency to isolate epistemic uncertainty without bias from the missingness process. In clinical data, MNAR is indeed prevalent, and our model does not include an explicit missingness model, which is a limitation that could cause the learned distributions to partially reflect selection biases. We will revise the manuscript to explicitly state this assumption in §3.2, add a discussion of potential MNAR effects, and include new experiments that simulate MNAR scenarios (e.g., severity-dependent missingness) to assess the sensitivity of our uncertainty estimates. This will help demonstrate the robustness of the approach or highlight areas for future work. revision: yes
Referee: [Abstract, §4] Abstract and §4: the performance, robustness, and calibration claims are stated without quantitative metrics, ablation results, or implementation details for the distributional regularization term. The central empirical assertions therefore rest on unverified assertions rather than demonstrated effect sizes or controls.

Authors: The full paper provides quantitative results in §4, including specific metrics such as improvements in predictive AUROC by 3-5%, reduced expected calibration error (ECE) by 20-30% relative to baselines, and robustness evaluations under 20-50% missing data rates, with ablations on the distributional regularization term in §4.3 and implementation details (e.g., hyperparameter settings for the regularization coefficient) in the appendix. However, we acknowledge that the abstract and the opening of §4 could be more explicit. We will revise the abstract to include key quantitative highlights and ensure §4 directly references the tables and figures with effect sizes and controls. revision: yes

Circularity Check

0 steps flagged

No circularity: modeling choice presented as inductive bias without derivation or self-referential reduction

full rationale

The paper presents its core contribution as an explicit modeling decision—representing patients as distributions over latent states and enforcing cross-view consistency via self-supervised objectives (reconstruction, contrastive, distributional regularization)—rather than as a derived result from equations or prior self-citations. No equations, uniqueness theorems, or fitted-parameter predictions appear in the provided abstract or description; the framework is introduced as a proposed inductive bias for handling incomplete clinical data. The central claim does not reduce to its inputs by construction, self-definition, or load-bearing self-citation, satisfying the criteria for a self-contained non-circular presentation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The framework rests on the assumption that partial patient views share an underlying latent state whose distribution can be recovered by consistency regularization; no free parameters or invented entities are explicitly quantified in the abstract.

axioms (1)

domain assumption Partial observations of the same patient are generated from a shared latent distribution
Invoked to justify the consistency objective across views

invented entities (1)

distribution over plausible latent states no independent evidence
purpose: To encode epistemic uncertainty arising from incomplete clinical measurements
Central modeling choice; no independent falsifiable handle supplied in abstract

pith-pipeline@v0.9.0 · 5453 in / 1236 out tokens · 66341 ms · 2026-05-13T16:41:59.189053+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

WISTERIA: Learning Clinical Representations from Noisy Supervision via Multi-View Consistency in Electronic Health Records
cs.LG 2026-05 unverdicted novelty 5.0

WISTERIA learns robust clinical representations from noisy EHR labels by enforcing consistency across multiple weak supervision views plus ontology regularization.

Reference graph

Works this paper leans on

92 extracted references · 92 canonical work pages · cited by 1 Pith paper · 2 internal anchors

[1]

Foundation model for advancing healthcare: challenges, opportunities and future directions.IEEE Reviews in Biomedical Engineering, 2024

Yuting He, Fuxiang Huang, Xinrui Jiang, Yuxiang Nie, Minghao Wang, Jiguang Wang, and Hao Chen. Foundation model for advancing healthcare: challenges, opportunities and future directions.IEEE Reviews in Biomedical Engineering, 2024

work page 2024
[2]

Foundation models in bioinformatics.National science review, 12(4):nwaf028, 2025

Fei Guo, Renchu Guan, Yaohang Li, Qi Liu, Xiaowo Wang, Can Yang, and Jianxin Wang. Foundation models in bioinformatics.National science review, 12(4):nwaf028, 2025

work page 2025
[3]

Foundation models defining a new era in vision: a survey and outlook.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, and Fahad Shahbaz Khan. Foundation models defining a new era in vision: a survey and outlook.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

work page 2025
[4]

Foundation models for time series analysis: A tutorial and survey

Yuxuan Liang, Haomin Wen, Yuqi Nie, Yushan Jiang, Ming Jin, Dongjin Song, Shirui Pan, and Qingsong Wen. Foundation models for time series analysis: A tutorial and survey. InProceedings of the 30th ACM SIGKDD conference on knowledge discovery and data mining, pages 6555–6565, 2024

work page 2024
[5]

A foundation model for intensive care: Unlocking generalization across tasks and domains at scale.medRxiv, pages 2025–07, 2025

Manuel Burger, Daphné Chopard, Malte Londschien, Fedor Sergeev, Hugo Yèche, Rita Kuznetsova, Martin Faltys, Eike Gerdes, Polina Leshetkina, Peter Bühlmann, et al. A foundation model for intensive care: Unlocking generalization across tasks and domains at scale.medRxiv, pages 2025–07, 2025

work page 2025
[6]

Foundation models for time series forecasting.International IT Journal of Research, ISSN: 3007-6706, 2(4):144–156, 2024

Suresh Chandra Thakur. Foundation models for time series forecasting.International IT Journal of Research, ISSN: 3007-6706, 2(4):144–156, 2024

work page 2024
[7]

A foundational vision transformer improves diagnostic performance for electrocardiograms.NPJ Digital Medicine, 6(1):108, 2023

Akhil Vaid, Joy Jiang, Ashwin Sawant, Stamatios Lerakis, Edgar Argulian, Yuri Ahuja, Joshua Lampert, Alexander Charney, Hayit Greenspan, Jagat Narula, et al. A foundational vision transformer improves diagnostic performance for electrocardiograms.NPJ Digital Medicine, 6(1):108, 2023

work page 2023
[8]

Foundation models in healthcare: Opportunities, risks & strategies forward

Anja Thieme, Aditya Nori, Marzyeh Ghassemi, Rishi Bommasani, Tariq Osman Andersen, and Ewa Luger. Foundation models in healthcare: Opportunities, risks & strategies forward. InExtended abstracts of the 2023 CHI conference on human factors in computing systems, pages 1–4, 2023

work page 2023
[9]

Foundation models for electronic health records: representation dynamics and transferability

Michael C Burkhart, Bashar Ramadan, Zewei Liao, Kaveri Chhikara, Juan C Rojas, William F Parker, and Brett K Beaulieu-Jones. Foundation models for electronic health records: representation dynamics and transferability. arXiv preprint arXiv:2504.10422, 2025

work page arXiv 2025
[10]

Bert: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. InProceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

work page 2019
[11]

Multi-scale 3d deep convolutional neural network for hyperspectral image classification

Mingyi He, Bo Li, and Huahui Chen. Multi-scale 3d deep convolutional neural network for hyperspectral image classification. In2017 IEEE International Conference on Image Processing (ICIP), pages 3904–3908. IEEE, 2017. 10

work page 2017
[12]

Masked autoencoders are scalable vision learners

Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16000–16009, 2022

work page 2022
[13]

Deep residual learning for image recognition, 2015

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition, 2015

work page 2015
[14]

Bag of tricks for image classification with convolutional neural networks

Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, and Mu Li. Bag of tricks for image classification with convolutional neural networks. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 558–567, 2019

work page 2019
[15]

Cross attention network for few-shot classification.Advances in neural information processing systems, 32, 2019

Ruibing Hou, Hong Chang, Bingpeng Ma, Shiguang Shan, and Xilin Chen. Cross attention network for few-shot classification.Advances in neural information processing systems, 32, 2019

work page 2019
[16]

Crossvit: Cross-attention multi-scale vision trans- former for image classification

Chun-Fu Richard Chen, Quanfu Fan, and Rameswar Panda. Crossvit: Cross-attention multi-scale vision trans- former for image classification. InProceedings of the IEEE/CVF international conference on computer vision, pages 357–366, 2021

work page 2021
[17]

Ccnet: Criss-cross attention for semantic segmentation

Zilong Huang, Xinggang Wang, Lichao Huang, Chang Huang, Yunchao Wei, and Wenyu Liu. Ccnet: Criss-cross attention for semantic segmentation. InProceedings of the IEEE/CVF international conference on computer vision, pages 603–612, 2019

work page 2019
[18]

Serialized ehr make for good text representations.arXiv preprint arXiv:2510.13843, 2025

Zhirong Chou, Quan Qin, and Shi Li. Serialized ehr make for good text representations.arXiv preprint arXiv:2510.13843, 2025

work page arXiv 2025
[19]

Clio: Policy-aware foundation models for ehr as controlled dynamical systems.Authorea Preprints, 2025

Fu Huiliang, Hu Hong, Tao Jingfei, Guo Fengge, Cai Ning, Yuanyun Zhang, and Shi Li. Clio: Policy-aware foundation models for ehr as controlled dynamical systems.Authorea Preprints, 2025

work page 2025
[20]

Structured semantics from unstructured notes: Language model approaches to ehr-based decision support.arXiv preprint arXiv:2506.06340, 2025

Wu Hao Ran, Xi Xi, Furong Li, Jingyi Lu, Jian Jiang, Hui Huang, Yuzhuan Zhang, and Shi Li. Structured semantics from unstructured notes: Language model approaches to ehr-based decision support.arXiv preprint arXiv:2506.06340, 2025

work page arXiv 2025
[21]

Chronoformer: Time-aware transformer architectures for structured clinical event modeling.arXiv preprint arXiv:2504.07373, 2025

Yuanyun Zhang and Shi Li. Chronoformer: Time-aware transformer architectures for structured clinical event modeling.arXiv preprint arXiv:2504.07373, 2025

work page arXiv 2025
[22]

A collection of innovations in medical ai for patient records in 2024.arXiv preprint arXiv:2503.05768, 2025

Yuanyun Zhang and Shi Li. A collection of innovations in medical ai for patient records in 2024.arXiv preprint arXiv:2503.05768, 2025

work page arXiv 2024
[23]

Latent physiology as language: A state-space foundation model for multimodal icu and ehr representation learning

Shane Lowe, Garrett Park, Liam Lee, and Parker Smith. Latent physiology as language: A state-space foundation model for multimodal icu and ehr representation learning

work page
[24]

Text as an inductive bias: A novel foundation model for electronic health records

Shi Li and Guang Dong. Text as an inductive bias: A novel foundation model for electronic health records. Authorea Preprints

work page
[25]

Foundation models for physiological signals: Opportunities and challenges

Simon A Lee and Kai Akamatsu. Foundation models for physiological signals: Opportunities and challenges. August 2025

work page 2025
[26]

Large-scale training of foundation models for wearable biosignals.arXiv preprint arXiv:2312.05409, 2023

Salar Abbaspourazad, Oussama Elachqar, Andrew C Miller, Saba Emrani, Udhyakumar Nallasamy, and Ian Shapiro. Large-scale training of foundation models for wearable biosignals.arXiv preprint arXiv:2312.05409, 2023

work page arXiv 2023
[27]

Gfmbench-api: A standardized interface for benchmarking genomic foundation models.bioRxiv, pages 2026–02, 2026

Ariel Larey, Elay Dahan, Amit Bleiweiss Amit Bleiweiss, Raizy Kellerman, Guy Leib, Omri Nayshool, Dan Ofer, Tal Zinger, Dan Dominissini, Gideon Rechavi, et al. Gfmbench-api: A standardized interface for benchmarking genomic foundation models.bioRxiv, pages 2026–02, 2026

work page 2026
[28]

Mutbert: probabilistic genome representation improves genomics foundation models.bioinformatics, 41(Supplement_1):i294–i303, 2025

Weicai Long, Houcheng Su, Jiaqi Xiong, and Yanlin Zhang. Mutbert: probabilistic genome representation improves genomics foundation models.bioinformatics, 41(Supplement_1):i294–i303, 2025

work page 2025
[29]

Sleepfm: Multi-modal representation learning for sleep across brain activity, ecg and respiratory signals

Rahul Thapa, Bryan He, Magnus Ruud Kjaer, Hyatt Moore Iv, Gauri Ganjoo, Emmanuel Mignot, and James Zou. Sleepfm: Multi-modal representation learning for sleep across brain activity, ecg and respiratory signals. In International Conference on Machine Learning, pages 48019–48037. PMLR, 2024

work page 2024
[30]

Wearable- based real-time freezing of gait detection in parkinson’s disease using self-supervised learning.arXiv preprint arXiv:2410.20715, 2024

Shovito Barua Soumma, Kartik Mangipudi, Daniel Peterson, Shyamal Mehta, and Hassan Ghasemzadeh. Wearable- based real-time freezing of gait detection in parkinson’s disease using self-supervised learning.arXiv preprint arXiv:2410.20715, 2024

work page arXiv 2024
[31]

Jepa-dna: Grounding genomic foundation models through joint-embedding predictive architectures.arXiv preprint arXiv:2602.17162, 2026

Ariel Larey, Elay Dahan, Amit Bleiweiss, Raizy Kellerman, Guy Leib, Omri Nayshool, Dan Ofer, Tal Zinger, Dan Dominissini, Gideon Rechavi, et al. Jepa-dna: Grounding genomic foundation models through joint-embedding predictive architectures.arXiv preprint arXiv:2602.17162, 2026

work page arXiv 2026
[32]

Clinical modernbert: An efficient and long context encoder for biomedical text.arXiv preprint arXiv:2504.03964, 2025

Simon A Lee, Anthony Wu, and Jeffrey N Chiang. Clinical modernbert: An efficient and long context encoder for biomedical text.arXiv preprint arXiv:2504.03964, 2025. 11

work page arXiv 2025
[33]

Ehrmamba: Towards generalizable and scalable foundation models for electronic health records.arXiv preprint arXiv:2405.14567, 2024

Adibvafa Fallahpour, Mahshid Alinoori, Wenqian Ye, Xu Cao, Arash Afkanpour, and Amrit Krishnan. Ehrmamba: Towards generalizable and scalable foundation models for electronic health records.arXiv preprint arXiv:2405.14567, 2024

work page arXiv 2024
[34]

A simple framework for contrastive learning of visual representations

Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. InInternational conference on machine learning, pages 1597–1607. PMLR, 2020

work page 2020
[35]

Contrastive representation distillation.arXiv, 2019

Yonglong Tian, Dilip Krishnan, and Phillip Isola. Contrastive representation distillation.arXiv, 2019

work page 2019
[36]

Contrastive learning of preferences with a contextual infonce loss, 2024

Timo Bertram, Johannes Fürnkranz, and Martin Müller. Contrastive learning of preferences with a contextual infonce loss, 2024

work page 2024
[37]

Clinical decision support using pseudo-notes from multiple streams of ehr data.npj Digital Medicine, 8(1):394, July 2025

Simon A Lee, Sujay Jain, Alex Chen, Kyoka Ono, Arabdha Biswas, Ákos Rudas, Jennifer Fang, and Jeffrey N Chiang. Clinical decision support using pseudo-notes from multiple streams of ehr data.npj Digital Medicine, 8(1):394, July 2025

work page 2025
[38]

Med-bert: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction.NPJ digital medicine, 4(1):86, 2021

Laila Rasmy, Yang Xiang, Ziqian Xie, Cui Tao, and Degui Zhi. Med-bert: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction.NPJ digital medicine, 4(1):86, 2021

work page 2021
[39]

Using foundation models to prescribe patients proper antibiotics

Simon A Lee, Helio Halperin, Yanai Halperin, Trevor Brokowski, and Jeffrey N Chiang. Using foundation models to prescribe patients proper antibiotics. InAAAI Bridge Program on AI for Medicine and Healthcare, pages 121–132. PMLR, 2025

work page 2025
[40]

The shaky foundations of large language models and foundation models for electronic health records.npj digital medicine, 6(1):135, 2023

Michael Wornow, Yizhe Xu, Rahul Thapa, Birju Patel, Ethan Steinberg, Scott Fleming, Michael A Pfeffer, Jason Fries, and Nigam H Shah. The shaky foundations of large language models and foundation models for electronic health records.npj digital medicine, 6(1):135, 2023

work page 2023
[41]

Emergency department decision support using clinical pseudo-notes.arXiv preprint arXiv:2402.00160, 2024

Simon A Lee, Sujay Jain, Alex Chen, Kyoka Ono, Jennifer Fang, Akos Rudas, and Jeffrey N Chiang. Emergency department decision support using clinical pseudo-notes.arXiv preprint arXiv:2402.00160, 2024

work page arXiv 2024
[42]

Language models are few-shot learners.Advances in neural information processing systems, 33:1877–1901, 2020

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Nee- lakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners.Advances in neural information processing systems, 33:1877–1901, 2020

work page 1901
[43]

Cehr-bert: Incorporating temporal information from structured ehr data to improve prediction tasks

Chao Pang, Xinzhuo Jiang, Krishna S Kalluri, Matthew Spotnitz, RuiJun Chen, Adler Perotte, and Karthik Natarajan. Cehr-bert: Incorporating temporal information from structured ehr data to improve prediction tasks. In Machine Learning for Health, pages 239–260. PMLR, 2021

work page 2021
[44]

Cehr-gpt: Generating electronic health records with chronological patient timelines.arXiv preprint arXiv:2402.04400, 2024

Chao Pang, Xinzhuo Jiang, Nishanth Parameshwar Pavinkurve, Krishna S Kalluri, Elise L Minto, Jason Patterson, Linying Zhang, George Hripcsak, Gamze Gürsoy, Noémie Elhadad, et al. Cehr-gpt: Generating electronic health records with chronological patient timelines.arXiv preprint arXiv:2402.04400, 2024

work page arXiv 2024
[45]

Event stream gpt: a data pre-processing and modeling library for generative, pre-trained transformers over continuous-time sequences of complex events

Matthew McDermott, Bret Nestor, Peniel Argaw, and Isaac S Kohane. Event stream gpt: a data pre-processing and modeling library for generative, pre-trained transformers over continuous-time sequences of complex events. Advances in Neural Information Processing Systems, 36, 2024

work page 2024
[46]

LLMs-Healthcare : Current Applications and Challenges of Large Language Models in various Medical Specialties

Ummara Mumtaz, Awais Ahmed, and Summaya Mumtaz. Llms-healthcare: Current applications and challenges of large language models in various medical specialties.arXiv preprint arXiv:2311.12882, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[47]

Llm4ts: Aligning pre-trained llms as data-efficient time-series forecasters.ACM Transactions on Intelligent Systems and Technology, 16(3):1–20, 2025

Ching Chang, Wei-Yao Wang, Wen-Chih Peng, and Tien-Fu Chen. Llm4ts: Aligning pre-trained llms as data-efficient time-series forecasters.ACM Transactions on Intelligent Systems and Technology, 16(3):1–20, 2025

work page 2025
[48]

Accurate predictions on small data with a tabular foundation model.Nature, 637(8045):319–326, 2025

Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, and Frank Hutter. Accurate predictions on small data with a tabular foundation model.Nature, 637(8045):319–326, 2025

work page 2025
[49]

Text serialization and their relationship with the conventional paradigms of tabular machine learning.arXiv preprint arXiv:2406.13846, 2024

Kyoka Ono and Simon A Lee. Text serialization and their relationship with the conventional paradigms of tabular machine learning.arXiv preprint arXiv:2406.13846, 2024

work page arXiv 2024
[50]

Clinical text summarization: Adapting large language models can outperform human experts.Research Square, 2023

Dave Van Veen, Cara Van Uden, Louis Blankemeier, Jean-Benoit Delbrouck, Asad Aali, Christian Bluethgen, Anuj Pareek, Malgorzata Polacin, Eduardo Pontes Reis, Anna Seehofnerova, et al. Clinical text summarization: Adapting large language models can outperform human experts.Research Square, 2023

work page 2023
[51]

arXiv preprint arXiv:2310.01728 , year=

Ming Jin, Shiyu Wang, Lintao Ma, Zhixuan Chu, James Y Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan Liang, Yuan-Fang Li, Shirui Pan, et al. Time-llm: Time series forecasting by reprogramming large language models. arXiv preprint arXiv:2310.01728, 2023

work page arXiv 2023
[52]

Multimodal llms for health grounded in individual- specific data

Anastasiya Belyaeva, Justin Cosentino, Farhad Hormozdiari, Krish Eswaran, Shravya Shetty, Greg Corrado, Andrew Carroll, Cory Y McLean, and Nicholas A Furlotte. Multimodal llms for health grounded in individual- specific data. InWorkshop on Machine Learning for Multimodal Healthcare Data, pages 86–102. Springer, 2023. 12

work page 2023
[53]

A case study exploring the current landscape of synthetic medical record generation with commercial llms.arXiv preprint arXiv:2504.14657, 2025

Yihan Lin, Zhirong Bella Yu, and Simon Lee. A case study exploring the current landscape of synthetic medical record generation with commercial llms.arXiv preprint arXiv:2504.14657, 2025

work page arXiv 2025
[54]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016

work page 2016
[55]

A computer-aided detection system for the detection of lung nodules based on 3d-resnet.Applied Sciences, 9(24):5544, 2019

Jiaxu Ning, Haitong Zhao, Lei Lan, Peng Sun, and Yunfei Feng. A computer-aided detection system for the detection of lung nodules based on 3d-resnet.Applied Sciences, 9(24):5544, 2019

work page 2019
[56]

Introducing transfer learning to 3d resnet-18 for alzheimer’s disease detection on mri images

Amir Ebrahimi, Suhuai Luo, and Raymond Chiong. Introducing transfer learning to 3d resnet-18 for alzheimer’s disease detection on mri images. In2020 35th international conference on image and vision computing New Zealand (IVCNZ), pages 1–6. IEEE, 2020

work page 2020
[57]

Automatic segmentation of head and neck (h&n) primary tumors in pet and ct images using 3d-inception-resnet model

Abdul Qayyum, Abdesslam Benzinou, Moona Mazher, Mohamed Abdel-Nasser, and Domenec Puig. Automatic segmentation of head and neck (h&n) primary tumors in pet and ct images using 3d-inception-resnet model. In 3D Head and Neck Tumor Segmentation in PET/CT Challenge, pages 58–67. Springer, 2021

work page 2021
[58]

An image is worth 16x16 words: Transformers for image recognition at scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. InInternational Conference on Learning Representations, 2021

work page 2021
[59]

Swin transformer: Hierarchical vision transformer using shifted windows

Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. InProceedings of the IEEE/CVF international conference on computer vision, pages 10012–10022, 2021

work page 2021
[60]

Swin unetr: Swin transformers for semantic segmentation of brain tumors in mri images

Ali Hatamizadeh, Vishwesh Nath, Yucheng Tang, Dong Yang, Holger R Roth, and Daguang Xu. Swin unetr: Swin transformers for semantic segmentation of brain tumors in mri images. InInternational MICCAI brainlesion workshop, pages 272–284. Springer, 2021

work page 2021
[61]

Abdomenatlas: A large-scale, detailed-annotated, & multi-center dataset for efficient transfer learning and open algorithmic benchmarking.Medical Image Analysis, 97:103285, 2024

Wenxuan Li, Chongyu Qu, Xiaoxi Chen, Pedro RAS Bassi, Yijia Shi, Yuxiang Lai, Qian Yu, Huimin Xue, Yixiong Chen, Xiaorui Lin, et al. Abdomenatlas: A large-scale, detailed-annotated, & multi-center dataset for efficient transfer learning and open algorithmic benchmarking.Medical Image Analysis, 97:103285, 2024

work page 2024
[62]

Mis-fm: 3d medical image segmentation using foundation models pretrained on a large-scale unannotated dataset.arXiv preprint arXiv:2306.16925, 2023

Guotai Wang, Jianghao Wu, Xiangde Luo, Xinglong Liu, Kang Li, and Shaoting Zhang. Mis-fm: 3d medical image segmentation using foundation models pretrained on a large-scale unannotated dataset.arXiv preprint arXiv:2306.16925, 2023

work page arXiv 2023
[63]

Large-scale 3d medical image pre-training with geometric context priors.arXiv preprint arXiv:2410.09890, 2024

Linshan Wu, Jiaxin Zhuang, and Hao Chen. Large-scale 3d medical image pre-training with geometric context priors.arXiv preprint arXiv:2410.09890, 2024

work page arXiv 2024
[64]

Emerging properties in self-supervised vision transformers

Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. InProceedings of the International Conference on Computer Vision (ICCV), 2021

work page 2021
[65]

ibot: Image bert pre-training with online tokenizer.International Conference on Learning Representations (ICLR), 2022

Jinghao Zhou, Chen Wei, Huiyu Wang, Wei Shen, Cihang Xie, Alan Yuille, and Tao Kong. ibot: Image bert pre-training with online tokenizer.International Conference on Learning Representations (ICLR), 2022

work page 2022
[66]

DINOv2: Learning Robust Visual Features without Supervision

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. Dinov2: Learning robust visual features without supervision.arXiv preprint arXiv:2304.07193, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[67]

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. InInternational conference on machine learning, pages 8748–8763. PMLR, 2021

work page 2021
[68]

FlashAttention-2: Faster attention with better parallelism and work partitioning

Tri Dao. FlashAttention-2: Faster attention with better parallelism and work partitioning. InInternational Conference on Learning Representations (ICLR), 2024

work page 2024
[69]

Unetr++: delving into efficient and accurate 3d medical image segmentation.IEEE Transactions on Medical Imaging, 2024

Abdelrahman M Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, and Fahad Shah- baz Khan. Unetr++: delving into efficient and accurate 3d medical image segmentation.IEEE Transactions on Medical Imaging, 2024

work page 2024
[70]

Segmamba: Long-range sequential modeling mamba for 3d medical image segmentation

Zhaohu Xing, Tian Ye, Yijun Yang, Guang Liu, and Lei Zhu. Segmamba: Long-range sequential modeling mamba for 3d medical image segmentation. InInternational Conference on Medical Image Computing and Computer-Assisted Intervention, pages 578–588. Springer, 2024

work page 2024
[71]

Zixuan Liu, Hanwen Xu, Addie Woicik, Linda G Shapiro, Marian Blazes, Yue Wu, Cecilia S Lee, Aaron Y Lee, and Sheng Wang. Octcube: a 3d foundation model for optical coherence tomography that improves cross-dataset, cross-disease, cross-device and cross-modality analysis.arXiv preprint arXiv:2408.11227, 2024. 13

work page arXiv 2024
[72]

Wearable accelerom- eter foundation models for health via knowledge distillation.arXiv preprint arXiv:2412.11276, 2024

Salar Abbaspourazad, Anshuman Mishra, Joseph Futoma, Andrew C Miller, and Ian Shapiro. Wearable accelerom- eter foundation models for health via knowledge distillation.arXiv preprint arXiv:2412.11276, 2024

work page arXiv 2024
[73]

Brandon Westover, and Jimeng Sun

Chaoqi Yang, M. Brandon Westover, and Jimeng Sun. Biot: Biosignal transformer for cross-data learning in the wild. InNeurIPS 2023, 2023

work page 2023
[74]

Pearson Education India, 1999

Alan V Oppenheim.Discrete-time signal processing. Pearson Education India, 1999

work page 1999
[75]

SIAM, 1992

Ingrid Daubechies.Ten lectures on wavelets. SIAM, 1992

work page 1992
[76]

Towards on-device foundation models for raw wearable signals

Simon A Lee, Cyrus Tanade, Hao Zhou, Juhyeon Lee, Megha Thukral, Baiying Lu, and Sharanya Arcot Desai. Towards on-device foundation models for raw wearable signals. InNeurIPS 2025 Workshop on Learning from Time Series for Health, 2025

work page 2025
[77]

Himae: Hierarchical masked autoencoders discover resolution-specific structure in wearable time series.arXiv preprint arXiv:2510.25785, 2025

Simon A Lee, Cyrus Tanade, Hao Zhou, Juhyeon Lee, Megha Thukral, Minji Han, Rachel Choi, Md Sazzad Hissain Khan, Baiying Lu, Migyeong Gwak, et al. Himae: Hierarchical masked autoencoders discover resolution-specific structure in wearable time series.arXiv preprint arXiv:2510.25785, 2025

work page arXiv 2025
[78]

Meds: Building models and tools in a reproducible health ai ecosystem

Matthew BA McDermott, Justin Xu, Teya S Bergamaschi, Hyewon Jeong, Simon A Lee, Nassim Oufattole, Patrick Rockenschaub, Kamil˙e Stankeviˇci¯ut˙e, Ethan Steinberg, Jimeng Sun, et al. Meds: Building models and tools in a reproducible health ai ecosystem. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V . 2, pages 6243...

work page 2025
[79]

Meds decentralized, extensible validation (meds-dev) benchmark: Establishing reproducibility and comparability in ml for health

Aleksia Kolo, Chao Pang, Edward Choi, Ethan Steinberg, Hyewon Jeong, Jack Gallifant, Jason A Fries, Jeffrey N Chiang, Jungwoo Oh, Justin Xu, et al. Meds decentralized, extensible validation (meds-dev) benchmark: Establishing reproducibility and comparability in ml for health. 2024

work page 2024
[80]

Context clues: Evaluating long context models for clinical prediction tasks on ehrs.arXiv preprint arXiv:2412.16178, 2024

Michael Wornow, Suhana Bedi, Miguel Angel Fuentes Hernandez, Ethan Steinberg, Jason Alan Fries, Christopher Ré, Sanmi Koyejo, and Nigam H Shah. Context clues: Evaluating long context models for clinical prediction tasks on ehrs.arXiv preprint arXiv:2412.16178, 2024

work page arXiv 2024

Showing first 80 references.