pith. machine review for the scientific record. sign in

arxiv: 2604.04175 · v1 · submitted 2026-04-05 · 💻 cs.LG

Recognition: no theorem link

Uncertainty-Aware Foundation Models for Clinical Data

Qian Zhou, Shi Li, Yuanyun Zhang

Authors on Pith no claims yet

Pith reviewed 2026-05-13 16:41 UTC · model grok-4.3

classification 💻 cs.LG
keywords foundation modelsclinical dataepistemic uncertaintyself-supervised learningmissing datamultimodal encoderslatent distributions
0
0 comments X

The pith

Representing patients as distributions over latent states rather than points captures epistemic uncertainty from incomplete clinical data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes that healthcare foundation models should represent each patient as a distribution over plausible latent states instead of a deterministic point embedding. The method learns set-valued representations and enforces consistency across partial views of the same patient to identify what remains invariantly inferable while explicitly marking epistemic uncertainty. These representations integrate with multimodal encoders through scalable self-supervised objectives that combine reconstruction, contrastive alignment, and distributional regularization. The resulting models show gains in predictive performance, robustness to missing observations, and uncertainty calibration on clinical tasks.

Core claim

By representing each patient not as a point embedding but as a distribution over plausible latent states, and by enforcing consistency across partial views of the same patient, the model captures what is invariantly inferable while explicitly encoding epistemic uncertainty, integrated with multimodal encoders and self-supervised objectives that include reconstruction, contrastive alignment, and distributional regularization.

What carries the argument

Set-valued patient representations as distributions over latent states, enforced through consistency across partial observations to separate epistemic uncertainty from aleatoric noise.

If this is right

  • Better predictive performance on diverse clinical tasks
  • Increased robustness when data are missing or irregularly observed
  • Improved uncertainty calibration relative to deterministic baselines
  • Compatible scaling with multimodal encoders and self-supervised training

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This framing may support safer use in high-stakes decisions by surfacing uncertain cases for clinician review.
  • The same consistency mechanism could allow incremental addition of new modalities without full retraining.
  • Applying the approach to longitudinal records could track how epistemic uncertainty changes as more observations arrive over time.

Load-bearing premise

Enforcing consistency across partial views of the same patient reliably separates epistemic uncertainty from aleatoric noise without requiring additional supervision or external validation.

What would settle it

A controlled test on clinical datasets with known missingness patterns where the consistency-enforced model shows no improvement in calibration or robustness metrics over standard point-embedding baselines.

read the original abstract

Healthcare foundation models have largely followed paradigms from natural language processing and computer vision, emphasizing large scale pretraining and deterministic representations over heterogeneous clinical data. However, clinical observations are inherently incomplete, reflecting sparse, irregular, and modality dependent measurements of an underlying physiologic state. In this work, we propose a framework for uncertainty aware foundation modeling that represents each patient not as a point embedding, but as a distribution over plausible latent states. By learning set valued representations and enforcing consistency across partial views of the same patient, the model captures what is invariantly inferable while explicitly encoding epistemic uncertainty. We integrate this formulation with multimodal encoders and scalable self supervised objectives, combining reconstruction, contrastive alignment, and distributional regularization. Across diverse clinical tasks, our approach improves predictive performance, robustness under missing data, and uncertainty calibration relative to strong baselines. These results suggest that modeling what is not observed rather than only what is constitutes a critical inductive bias for healthcare foundation models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an uncertainty-aware framework for healthcare foundation models in which each patient is represented as a distribution over plausible latent states rather than a deterministic point embedding. Multimodal encoders are combined with self-supervised objectives (reconstruction, contrastive alignment, and distributional regularization) that enforce consistency across partial views of the same patient, with the goal of separating epistemic uncertainty from aleatoric noise. The authors claim that this inductive bias yields improved predictive performance, robustness under missing data, and better uncertainty calibration relative to strong baselines on diverse clinical tasks.

Significance. If the central claims hold after addressing the missingness mechanism, the work would supply a clinically relevant inductive bias for foundation models operating on sparse, irregular, and modality-dependent observations. Explicitly modeling distributions over unobserved states rather than imputing or ignoring them could improve reliability in downstream decision-making where epistemic uncertainty matters.

major comments (2)
  1. [§3.2] §3.2 and the self-supervised objectives: the distributional regularization and cross-view consistency losses implicitly assume missingness is ignorable (MCAR/MAR). Clinical data frequently exhibit MNAR patterns (e.g., lab orders driven by unobserved severity), so the same latent state can generate inconsistent partial views; without an explicit missingness model or external calibration of the latent distributions, the learned epistemic component risks absorbing selection bias rather than true uncertainty. This assumption is load-bearing for the reported robustness and calibration gains.
  2. [Abstract, §4] Abstract and §4: the performance, robustness, and calibration claims are stated without quantitative metrics, ablation results, or implementation details for the distributional regularization term. The central empirical assertions therefore rest on unverified assertions rather than demonstrated effect sizes or controls.
minor comments (2)
  1. [§3.2] Define the precise functional form of the distributional regularization loss and its weighting relative to reconstruction and contrastive terms; include the relevant equation.
  2. [§3.1] Clarify whether the set-valued representations are parameterized as explicit distributions (e.g., Gaussian, mixture) or implicit via sampling, and how inference is performed at test time.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive comments. We address each major point below, clarifying our assumptions and committing to revisions where appropriate to strengthen the manuscript.

read point-by-point responses
  1. Referee: [§3.2] §3.2 and the self-supervised objectives: the distributional regularization and cross-view consistency losses implicitly assume missingness is ignorable (MCAR/MAR). Clinical data frequently exhibit MNAR patterns (e.g., lab orders driven by unobserved severity), so the same latent state can generate inconsistent partial views; without an explicit missingness model or external calibration of the latent distributions, the learned epistemic component risks absorbing selection bias rather than true uncertainty. This assumption is load-bearing for the reported robustness and calibration gains.

    Authors: We agree that the current formulation implicitly relies on a MAR assumption for the cross-view consistency to isolate epistemic uncertainty without bias from the missingness process. In clinical data, MNAR is indeed prevalent, and our model does not include an explicit missingness model, which is a limitation that could cause the learned distributions to partially reflect selection biases. We will revise the manuscript to explicitly state this assumption in §3.2, add a discussion of potential MNAR effects, and include new experiments that simulate MNAR scenarios (e.g., severity-dependent missingness) to assess the sensitivity of our uncertainty estimates. This will help demonstrate the robustness of the approach or highlight areas for future work. revision: yes

  2. Referee: [Abstract, §4] Abstract and §4: the performance, robustness, and calibration claims are stated without quantitative metrics, ablation results, or implementation details for the distributional regularization term. The central empirical assertions therefore rest on unverified assertions rather than demonstrated effect sizes or controls.

    Authors: The full paper provides quantitative results in §4, including specific metrics such as improvements in predictive AUROC by 3-5%, reduced expected calibration error (ECE) by 20-30% relative to baselines, and robustness evaluations under 20-50% missing data rates, with ablations on the distributional regularization term in §4.3 and implementation details (e.g., hyperparameter settings for the regularization coefficient) in the appendix. However, we acknowledge that the abstract and the opening of §4 could be more explicit. We will revise the abstract to include key quantitative highlights and ensure §4 directly references the tables and figures with effect sizes and controls. revision: yes

Circularity Check

0 steps flagged

No circularity: modeling choice presented as inductive bias without derivation or self-referential reduction

full rationale

The paper presents its core contribution as an explicit modeling decision—representing patients as distributions over latent states and enforcing cross-view consistency via self-supervised objectives (reconstruction, contrastive, distributional regularization)—rather than as a derived result from equations or prior self-citations. No equations, uniqueness theorems, or fitted-parameter predictions appear in the provided abstract or description; the framework is introduced as a proposed inductive bias for handling incomplete clinical data. The central claim does not reduce to its inputs by construction, self-definition, or load-bearing self-citation, satisfying the criteria for a self-contained non-circular presentation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The framework rests on the assumption that partial patient views share an underlying latent state whose distribution can be recovered by consistency regularization; no free parameters or invented entities are explicitly quantified in the abstract.

axioms (1)
  • domain assumption Partial observations of the same patient are generated from a shared latent distribution
    Invoked to justify the consistency objective across views
invented entities (1)
  • distribution over plausible latent states no independent evidence
    purpose: To encode epistemic uncertainty arising from incomplete clinical measurements
    Central modeling choice; no independent falsifiable handle supplied in abstract

pith-pipeline@v0.9.0 · 5453 in / 1236 out tokens · 66341 ms · 2026-05-13T16:41:59.189053+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. WISTERIA: Learning Clinical Representations from Noisy Supervision via Multi-View Consistency in Electronic Health Records

    cs.LG 2026-05 unverdicted novelty 5.0

    WISTERIA learns robust clinical representations from noisy EHR labels by enforcing consistency across multiple weak supervision views plus ontology regularization.

Reference graph

Works this paper leans on

92 extracted references · 92 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  1. [1]

    Foundation model for advancing healthcare: challenges, opportunities and future directions.IEEE Reviews in Biomedical Engineering, 2024

    Yuting He, Fuxiang Huang, Xinrui Jiang, Yuxiang Nie, Minghao Wang, Jiguang Wang, and Hao Chen. Foundation model for advancing healthcare: challenges, opportunities and future directions.IEEE Reviews in Biomedical Engineering, 2024

  2. [2]

    Foundation models in bioinformatics.National science review, 12(4):nwaf028, 2025

    Fei Guo, Renchu Guan, Yaohang Li, Qi Liu, Xiaowo Wang, Can Yang, and Jianxin Wang. Foundation models in bioinformatics.National science review, 12(4):nwaf028, 2025

  3. [3]

    Foundation models defining a new era in vision: a survey and outlook.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

    Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, and Fahad Shahbaz Khan. Foundation models defining a new era in vision: a survey and outlook.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

  4. [4]

    Foundation models for time series analysis: A tutorial and survey

    Yuxuan Liang, Haomin Wen, Yuqi Nie, Yushan Jiang, Ming Jin, Dongjin Song, Shirui Pan, and Qingsong Wen. Foundation models for time series analysis: A tutorial and survey. InProceedings of the 30th ACM SIGKDD conference on knowledge discovery and data mining, pages 6555–6565, 2024

  5. [5]

    A foundation model for intensive care: Unlocking generalization across tasks and domains at scale.medRxiv, pages 2025–07, 2025

    Manuel Burger, Daphné Chopard, Malte Londschien, Fedor Sergeev, Hugo Yèche, Rita Kuznetsova, Martin Faltys, Eike Gerdes, Polina Leshetkina, Peter Bühlmann, et al. A foundation model for intensive care: Unlocking generalization across tasks and domains at scale.medRxiv, pages 2025–07, 2025

  6. [6]

    Foundation models for time series forecasting.International IT Journal of Research, ISSN: 3007-6706, 2(4):144–156, 2024

    Suresh Chandra Thakur. Foundation models for time series forecasting.International IT Journal of Research, ISSN: 3007-6706, 2(4):144–156, 2024

  7. [7]

    A foundational vision transformer improves diagnostic performance for electrocardiograms.NPJ Digital Medicine, 6(1):108, 2023

    Akhil Vaid, Joy Jiang, Ashwin Sawant, Stamatios Lerakis, Edgar Argulian, Yuri Ahuja, Joshua Lampert, Alexander Charney, Hayit Greenspan, Jagat Narula, et al. A foundational vision transformer improves diagnostic performance for electrocardiograms.NPJ Digital Medicine, 6(1):108, 2023

  8. [8]

    Foundation models in healthcare: Opportunities, risks & strategies forward

    Anja Thieme, Aditya Nori, Marzyeh Ghassemi, Rishi Bommasani, Tariq Osman Andersen, and Ewa Luger. Foundation models in healthcare: Opportunities, risks & strategies forward. InExtended abstracts of the 2023 CHI conference on human factors in computing systems, pages 1–4, 2023

  9. [9]

    Foundation models for electronic health records: representation dynamics and transferability

    Michael C Burkhart, Bashar Ramadan, Zewei Liao, Kaveri Chhikara, Juan C Rojas, William F Parker, and Brett K Beaulieu-Jones. Foundation models for electronic health records: representation dynamics and transferability. arXiv preprint arXiv:2504.10422, 2025

  10. [10]

    Bert: Pre-training of deep bidirectional transformers for language understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. InProceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186, 2019

  11. [11]

    Multi-scale 3d deep convolutional neural network for hyperspectral image classification

    Mingyi He, Bo Li, and Huahui Chen. Multi-scale 3d deep convolutional neural network for hyperspectral image classification. In2017 IEEE International Conference on Image Processing (ICIP), pages 3904–3908. IEEE, 2017. 10

  12. [12]

    Masked autoencoders are scalable vision learners

    Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16000–16009, 2022

  13. [13]

    Deep residual learning for image recognition, 2015

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition, 2015

  14. [14]

    Bag of tricks for image classification with convolutional neural networks

    Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, and Mu Li. Bag of tricks for image classification with convolutional neural networks. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 558–567, 2019

  15. [15]

    Cross attention network for few-shot classification.Advances in neural information processing systems, 32, 2019

    Ruibing Hou, Hong Chang, Bingpeng Ma, Shiguang Shan, and Xilin Chen. Cross attention network for few-shot classification.Advances in neural information processing systems, 32, 2019

  16. [16]

    Crossvit: Cross-attention multi-scale vision trans- former for image classification

    Chun-Fu Richard Chen, Quanfu Fan, and Rameswar Panda. Crossvit: Cross-attention multi-scale vision trans- former for image classification. InProceedings of the IEEE/CVF international conference on computer vision, pages 357–366, 2021

  17. [17]

    Ccnet: Criss-cross attention for semantic segmentation

    Zilong Huang, Xinggang Wang, Lichao Huang, Chang Huang, Yunchao Wei, and Wenyu Liu. Ccnet: Criss-cross attention for semantic segmentation. InProceedings of the IEEE/CVF international conference on computer vision, pages 603–612, 2019

  18. [18]

    Serialized ehr make for good text representations.arXiv preprint arXiv:2510.13843, 2025

    Zhirong Chou, Quan Qin, and Shi Li. Serialized ehr make for good text representations.arXiv preprint arXiv:2510.13843, 2025

  19. [19]

    Clio: Policy-aware foundation models for ehr as controlled dynamical systems.Authorea Preprints, 2025

    Fu Huiliang, Hu Hong, Tao Jingfei, Guo Fengge, Cai Ning, Yuanyun Zhang, and Shi Li. Clio: Policy-aware foundation models for ehr as controlled dynamical systems.Authorea Preprints, 2025

  20. [20]

    Structured semantics from unstructured notes: Language model approaches to ehr-based decision support.arXiv preprint arXiv:2506.06340, 2025

    Wu Hao Ran, Xi Xi, Furong Li, Jingyi Lu, Jian Jiang, Hui Huang, Yuzhuan Zhang, and Shi Li. Structured semantics from unstructured notes: Language model approaches to ehr-based decision support.arXiv preprint arXiv:2506.06340, 2025

  21. [21]

    Chronoformer: Time-aware transformer architectures for structured clinical event modeling.arXiv preprint arXiv:2504.07373, 2025

    Yuanyun Zhang and Shi Li. Chronoformer: Time-aware transformer architectures for structured clinical event modeling.arXiv preprint arXiv:2504.07373, 2025

  22. [22]

    A collection of innovations in medical ai for patient records in 2024.arXiv preprint arXiv:2503.05768, 2025

    Yuanyun Zhang and Shi Li. A collection of innovations in medical ai for patient records in 2024.arXiv preprint arXiv:2503.05768, 2025

  23. [23]

    Latent physiology as language: A state-space foundation model for multimodal icu and ehr representation learning

    Shane Lowe, Garrett Park, Liam Lee, and Parker Smith. Latent physiology as language: A state-space foundation model for multimodal icu and ehr representation learning

  24. [24]

    Text as an inductive bias: A novel foundation model for electronic health records

    Shi Li and Guang Dong. Text as an inductive bias: A novel foundation model for electronic health records. Authorea Preprints

  25. [25]

    Foundation models for physiological signals: Opportunities and challenges

    Simon A Lee and Kai Akamatsu. Foundation models for physiological signals: Opportunities and challenges. August 2025

  26. [26]

    Large-scale training of foundation models for wearable biosignals.arXiv preprint arXiv:2312.05409, 2023

    Salar Abbaspourazad, Oussama Elachqar, Andrew C Miller, Saba Emrani, Udhyakumar Nallasamy, and Ian Shapiro. Large-scale training of foundation models for wearable biosignals.arXiv preprint arXiv:2312.05409, 2023

  27. [27]

    Gfmbench-api: A standardized interface for benchmarking genomic foundation models.bioRxiv, pages 2026–02, 2026

    Ariel Larey, Elay Dahan, Amit Bleiweiss Amit Bleiweiss, Raizy Kellerman, Guy Leib, Omri Nayshool, Dan Ofer, Tal Zinger, Dan Dominissini, Gideon Rechavi, et al. Gfmbench-api: A standardized interface for benchmarking genomic foundation models.bioRxiv, pages 2026–02, 2026

  28. [28]

    Mutbert: probabilistic genome representation improves genomics foundation models.bioinformatics, 41(Supplement_1):i294–i303, 2025

    Weicai Long, Houcheng Su, Jiaqi Xiong, and Yanlin Zhang. Mutbert: probabilistic genome representation improves genomics foundation models.bioinformatics, 41(Supplement_1):i294–i303, 2025

  29. [29]

    Sleepfm: Multi-modal representation learning for sleep across brain activity, ecg and respiratory signals

    Rahul Thapa, Bryan He, Magnus Ruud Kjaer, Hyatt Moore Iv, Gauri Ganjoo, Emmanuel Mignot, and James Zou. Sleepfm: Multi-modal representation learning for sleep across brain activity, ecg and respiratory signals. In International Conference on Machine Learning, pages 48019–48037. PMLR, 2024

  30. [30]

    Wearable- based real-time freezing of gait detection in parkinson’s disease using self-supervised learning.arXiv preprint arXiv:2410.20715, 2024

    Shovito Barua Soumma, Kartik Mangipudi, Daniel Peterson, Shyamal Mehta, and Hassan Ghasemzadeh. Wearable- based real-time freezing of gait detection in parkinson’s disease using self-supervised learning.arXiv preprint arXiv:2410.20715, 2024

  31. [31]

    Jepa-dna: Grounding genomic foundation models through joint-embedding predictive architectures.arXiv preprint arXiv:2602.17162, 2026

    Ariel Larey, Elay Dahan, Amit Bleiweiss, Raizy Kellerman, Guy Leib, Omri Nayshool, Dan Ofer, Tal Zinger, Dan Dominissini, Gideon Rechavi, et al. Jepa-dna: Grounding genomic foundation models through joint-embedding predictive architectures.arXiv preprint arXiv:2602.17162, 2026

  32. [32]

    Clinical modernbert: An efficient and long context encoder for biomedical text.arXiv preprint arXiv:2504.03964, 2025

    Simon A Lee, Anthony Wu, and Jeffrey N Chiang. Clinical modernbert: An efficient and long context encoder for biomedical text.arXiv preprint arXiv:2504.03964, 2025. 11

  33. [33]

    Ehrmamba: Towards generalizable and scalable foundation models for electronic health records.arXiv preprint arXiv:2405.14567, 2024

    Adibvafa Fallahpour, Mahshid Alinoori, Wenqian Ye, Xu Cao, Arash Afkanpour, and Amrit Krishnan. Ehrmamba: Towards generalizable and scalable foundation models for electronic health records.arXiv preprint arXiv:2405.14567, 2024

  34. [34]

    A simple framework for contrastive learning of visual representations

    Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. InInternational conference on machine learning, pages 1597–1607. PMLR, 2020

  35. [35]

    Contrastive representation distillation.arXiv, 2019

    Yonglong Tian, Dilip Krishnan, and Phillip Isola. Contrastive representation distillation.arXiv, 2019

  36. [36]

    Contrastive learning of preferences with a contextual infonce loss, 2024

    Timo Bertram, Johannes Fürnkranz, and Martin Müller. Contrastive learning of preferences with a contextual infonce loss, 2024

  37. [37]

    Clinical decision support using pseudo-notes from multiple streams of ehr data.npj Digital Medicine, 8(1):394, July 2025

    Simon A Lee, Sujay Jain, Alex Chen, Kyoka Ono, Arabdha Biswas, Ákos Rudas, Jennifer Fang, and Jeffrey N Chiang. Clinical decision support using pseudo-notes from multiple streams of ehr data.npj Digital Medicine, 8(1):394, July 2025

  38. [38]

    Med-bert: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction.NPJ digital medicine, 4(1):86, 2021

    Laila Rasmy, Yang Xiang, Ziqian Xie, Cui Tao, and Degui Zhi. Med-bert: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction.NPJ digital medicine, 4(1):86, 2021

  39. [39]

    Using foundation models to prescribe patients proper antibiotics

    Simon A Lee, Helio Halperin, Yanai Halperin, Trevor Brokowski, and Jeffrey N Chiang. Using foundation models to prescribe patients proper antibiotics. InAAAI Bridge Program on AI for Medicine and Healthcare, pages 121–132. PMLR, 2025

  40. [40]

    The shaky foundations of large language models and foundation models for electronic health records.npj digital medicine, 6(1):135, 2023

    Michael Wornow, Yizhe Xu, Rahul Thapa, Birju Patel, Ethan Steinberg, Scott Fleming, Michael A Pfeffer, Jason Fries, and Nigam H Shah. The shaky foundations of large language models and foundation models for electronic health records.npj digital medicine, 6(1):135, 2023

  41. [41]

    Emergency department decision support using clinical pseudo-notes.arXiv preprint arXiv:2402.00160, 2024

    Simon A Lee, Sujay Jain, Alex Chen, Kyoka Ono, Jennifer Fang, Akos Rudas, and Jeffrey N Chiang. Emergency department decision support using clinical pseudo-notes.arXiv preprint arXiv:2402.00160, 2024

  42. [42]

    Language models are few-shot learners.Advances in neural information processing systems, 33:1877–1901, 2020

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Nee- lakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners.Advances in neural information processing systems, 33:1877–1901, 2020

  43. [43]

    Cehr-bert: Incorporating temporal information from structured ehr data to improve prediction tasks

    Chao Pang, Xinzhuo Jiang, Krishna S Kalluri, Matthew Spotnitz, RuiJun Chen, Adler Perotte, and Karthik Natarajan. Cehr-bert: Incorporating temporal information from structured ehr data to improve prediction tasks. In Machine Learning for Health, pages 239–260. PMLR, 2021

  44. [44]

    Cehr-gpt: Generating electronic health records with chronological patient timelines.arXiv preprint arXiv:2402.04400, 2024

    Chao Pang, Xinzhuo Jiang, Nishanth Parameshwar Pavinkurve, Krishna S Kalluri, Elise L Minto, Jason Patterson, Linying Zhang, George Hripcsak, Gamze Gürsoy, Noémie Elhadad, et al. Cehr-gpt: Generating electronic health records with chronological patient timelines.arXiv preprint arXiv:2402.04400, 2024

  45. [45]

    Event stream gpt: a data pre-processing and modeling library for generative, pre-trained transformers over continuous-time sequences of complex events

    Matthew McDermott, Bret Nestor, Peniel Argaw, and Isaac S Kohane. Event stream gpt: a data pre-processing and modeling library for generative, pre-trained transformers over continuous-time sequences of complex events. Advances in Neural Information Processing Systems, 36, 2024

  46. [46]

    LLMs-Healthcare : Current Applications and Challenges of Large Language Models in various Medical Specialties

    Ummara Mumtaz, Awais Ahmed, and Summaya Mumtaz. Llms-healthcare: Current applications and challenges of large language models in various medical specialties.arXiv preprint arXiv:2311.12882, 2023

  47. [47]

    Llm4ts: Aligning pre-trained llms as data-efficient time-series forecasters.ACM Transactions on Intelligent Systems and Technology, 16(3):1–20, 2025

    Ching Chang, Wei-Yao Wang, Wen-Chih Peng, and Tien-Fu Chen. Llm4ts: Aligning pre-trained llms as data-efficient time-series forecasters.ACM Transactions on Intelligent Systems and Technology, 16(3):1–20, 2025

  48. [48]

    Accurate predictions on small data with a tabular foundation model.Nature, 637(8045):319–326, 2025

    Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, and Frank Hutter. Accurate predictions on small data with a tabular foundation model.Nature, 637(8045):319–326, 2025

  49. [49]

    Text serialization and their relationship with the conventional paradigms of tabular machine learning.arXiv preprint arXiv:2406.13846, 2024

    Kyoka Ono and Simon A Lee. Text serialization and their relationship with the conventional paradigms of tabular machine learning.arXiv preprint arXiv:2406.13846, 2024

  50. [50]

    Clinical text summarization: Adapting large language models can outperform human experts.Research Square, 2023

    Dave Van Veen, Cara Van Uden, Louis Blankemeier, Jean-Benoit Delbrouck, Asad Aali, Christian Bluethgen, Anuj Pareek, Malgorzata Polacin, Eduardo Pontes Reis, Anna Seehofnerova, et al. Clinical text summarization: Adapting large language models can outperform human experts.Research Square, 2023

  51. [51]

    arXiv preprint arXiv:2310.01728 , year=

    Ming Jin, Shiyu Wang, Lintao Ma, Zhixuan Chu, James Y Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan Liang, Yuan-Fang Li, Shirui Pan, et al. Time-llm: Time series forecasting by reprogramming large language models. arXiv preprint arXiv:2310.01728, 2023

  52. [52]

    Multimodal llms for health grounded in individual- specific data

    Anastasiya Belyaeva, Justin Cosentino, Farhad Hormozdiari, Krish Eswaran, Shravya Shetty, Greg Corrado, Andrew Carroll, Cory Y McLean, and Nicholas A Furlotte. Multimodal llms for health grounded in individual- specific data. InWorkshop on Machine Learning for Multimodal Healthcare Data, pages 86–102. Springer, 2023. 12

  53. [53]

    A case study exploring the current landscape of synthetic medical record generation with commercial llms.arXiv preprint arXiv:2504.14657, 2025

    Yihan Lin, Zhirong Bella Yu, and Simon Lee. A case study exploring the current landscape of synthetic medical record generation with commercial llms.arXiv preprint arXiv:2504.14657, 2025

  54. [54]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016

  55. [55]

    A computer-aided detection system for the detection of lung nodules based on 3d-resnet.Applied Sciences, 9(24):5544, 2019

    Jiaxu Ning, Haitong Zhao, Lei Lan, Peng Sun, and Yunfei Feng. A computer-aided detection system for the detection of lung nodules based on 3d-resnet.Applied Sciences, 9(24):5544, 2019

  56. [56]

    Introducing transfer learning to 3d resnet-18 for alzheimer’s disease detection on mri images

    Amir Ebrahimi, Suhuai Luo, and Raymond Chiong. Introducing transfer learning to 3d resnet-18 for alzheimer’s disease detection on mri images. In2020 35th international conference on image and vision computing New Zealand (IVCNZ), pages 1–6. IEEE, 2020

  57. [57]

    Automatic segmentation of head and neck (h&n) primary tumors in pet and ct images using 3d-inception-resnet model

    Abdul Qayyum, Abdesslam Benzinou, Moona Mazher, Mohamed Abdel-Nasser, and Domenec Puig. Automatic segmentation of head and neck (h&n) primary tumors in pet and ct images using 3d-inception-resnet model. In 3D Head and Neck Tumor Segmentation in PET/CT Challenge, pages 58–67. Springer, 2021

  58. [58]

    An image is worth 16x16 words: Transformers for image recognition at scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. InInternational Conference on Learning Representations, 2021

  59. [59]

    Swin transformer: Hierarchical vision transformer using shifted windows

    Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. InProceedings of the IEEE/CVF international conference on computer vision, pages 10012–10022, 2021

  60. [60]

    Swin unetr: Swin transformers for semantic segmentation of brain tumors in mri images

    Ali Hatamizadeh, Vishwesh Nath, Yucheng Tang, Dong Yang, Holger R Roth, and Daguang Xu. Swin unetr: Swin transformers for semantic segmentation of brain tumors in mri images. InInternational MICCAI brainlesion workshop, pages 272–284. Springer, 2021

  61. [61]

    Abdomenatlas: A large-scale, detailed-annotated, & multi-center dataset for efficient transfer learning and open algorithmic benchmarking.Medical Image Analysis, 97:103285, 2024

    Wenxuan Li, Chongyu Qu, Xiaoxi Chen, Pedro RAS Bassi, Yijia Shi, Yuxiang Lai, Qian Yu, Huimin Xue, Yixiong Chen, Xiaorui Lin, et al. Abdomenatlas: A large-scale, detailed-annotated, & multi-center dataset for efficient transfer learning and open algorithmic benchmarking.Medical Image Analysis, 97:103285, 2024

  62. [62]

    Mis-fm: 3d medical image segmentation using foundation models pretrained on a large-scale unannotated dataset.arXiv preprint arXiv:2306.16925, 2023

    Guotai Wang, Jianghao Wu, Xiangde Luo, Xinglong Liu, Kang Li, and Shaoting Zhang. Mis-fm: 3d medical image segmentation using foundation models pretrained on a large-scale unannotated dataset.arXiv preprint arXiv:2306.16925, 2023

  63. [63]

    Large-scale 3d medical image pre-training with geometric context priors.arXiv preprint arXiv:2410.09890, 2024

    Linshan Wu, Jiaxin Zhuang, and Hao Chen. Large-scale 3d medical image pre-training with geometric context priors.arXiv preprint arXiv:2410.09890, 2024

  64. [64]

    Emerging properties in self-supervised vision transformers

    Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. InProceedings of the International Conference on Computer Vision (ICCV), 2021

  65. [65]

    ibot: Image bert pre-training with online tokenizer.International Conference on Learning Representations (ICLR), 2022

    Jinghao Zhou, Chen Wei, Huiyu Wang, Wei Shen, Cihang Xie, Alan Yuille, and Tao Kong. ibot: Image bert pre-training with online tokenizer.International Conference on Learning Representations (ICLR), 2022

  66. [66]

    DINOv2: Learning Robust Visual Features without Supervision

    Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. Dinov2: Learning robust visual features without supervision.arXiv preprint arXiv:2304.07193, 2023

  67. [67]

    Learning transferable visual models from natural language supervision

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. InInternational conference on machine learning, pages 8748–8763. PMLR, 2021

  68. [68]

    FlashAttention-2: Faster attention with better parallelism and work partitioning

    Tri Dao. FlashAttention-2: Faster attention with better parallelism and work partitioning. InInternational Conference on Learning Representations (ICLR), 2024

  69. [69]

    Unetr++: delving into efficient and accurate 3d medical image segmentation.IEEE Transactions on Medical Imaging, 2024

    Abdelrahman M Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, and Fahad Shah- baz Khan. Unetr++: delving into efficient and accurate 3d medical image segmentation.IEEE Transactions on Medical Imaging, 2024

  70. [70]

    Segmamba: Long-range sequential modeling mamba for 3d medical image segmentation

    Zhaohu Xing, Tian Ye, Yijun Yang, Guang Liu, and Lei Zhu. Segmamba: Long-range sequential modeling mamba for 3d medical image segmentation. InInternational Conference on Medical Image Computing and Computer-Assisted Intervention, pages 578–588. Springer, 2024

  71. [71]

    Zixuan Liu, Hanwen Xu, Addie Woicik, Linda G Shapiro, Marian Blazes, Yue Wu, Cecilia S Lee, Aaron Y Lee, and Sheng Wang. Octcube: a 3d foundation model for optical coherence tomography that improves cross-dataset, cross-disease, cross-device and cross-modality analysis.arXiv preprint arXiv:2408.11227, 2024. 13

  72. [72]

    Wearable accelerom- eter foundation models for health via knowledge distillation.arXiv preprint arXiv:2412.11276, 2024

    Salar Abbaspourazad, Anshuman Mishra, Joseph Futoma, Andrew C Miller, and Ian Shapiro. Wearable accelerom- eter foundation models for health via knowledge distillation.arXiv preprint arXiv:2412.11276, 2024

  73. [73]

    Brandon Westover, and Jimeng Sun

    Chaoqi Yang, M. Brandon Westover, and Jimeng Sun. Biot: Biosignal transformer for cross-data learning in the wild. InNeurIPS 2023, 2023

  74. [74]

    Pearson Education India, 1999

    Alan V Oppenheim.Discrete-time signal processing. Pearson Education India, 1999

  75. [75]

    SIAM, 1992

    Ingrid Daubechies.Ten lectures on wavelets. SIAM, 1992

  76. [76]

    Towards on-device foundation models for raw wearable signals

    Simon A Lee, Cyrus Tanade, Hao Zhou, Juhyeon Lee, Megha Thukral, Baiying Lu, and Sharanya Arcot Desai. Towards on-device foundation models for raw wearable signals. InNeurIPS 2025 Workshop on Learning from Time Series for Health, 2025

  77. [77]

    Himae: Hierarchical masked autoencoders discover resolution-specific structure in wearable time series.arXiv preprint arXiv:2510.25785, 2025

    Simon A Lee, Cyrus Tanade, Hao Zhou, Juhyeon Lee, Megha Thukral, Minji Han, Rachel Choi, Md Sazzad Hissain Khan, Baiying Lu, Migyeong Gwak, et al. Himae: Hierarchical masked autoencoders discover resolution-specific structure in wearable time series.arXiv preprint arXiv:2510.25785, 2025

  78. [78]

    Meds: Building models and tools in a reproducible health ai ecosystem

    Matthew BA McDermott, Justin Xu, Teya S Bergamaschi, Hyewon Jeong, Simon A Lee, Nassim Oufattole, Patrick Rockenschaub, Kamil˙e Stankeviˇci¯ut˙e, Ethan Steinberg, Jimeng Sun, et al. Meds: Building models and tools in a reproducible health ai ecosystem. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V . 2, pages 6243...

  79. [79]

    Meds decentralized, extensible validation (meds-dev) benchmark: Establishing reproducibility and comparability in ml for health

    Aleksia Kolo, Chao Pang, Edward Choi, Ethan Steinberg, Hyewon Jeong, Jack Gallifant, Jason A Fries, Jeffrey N Chiang, Jungwoo Oh, Justin Xu, et al. Meds decentralized, extensible validation (meds-dev) benchmark: Establishing reproducibility and comparability in ml for health. 2024

  80. [80]

    Context clues: Evaluating long context models for clinical prediction tasks on ehrs.arXiv preprint arXiv:2412.16178, 2024

    Michael Wornow, Suhana Bedi, Miguel Angel Fuentes Hernandez, Ethan Steinberg, Jason Alan Fries, Christopher Ré, Sanmi Koyejo, and Nigam H Shah. Context clues: Evaluating long context models for clinical prediction tasks on ehrs.arXiv preprint arXiv:2412.16178, 2024

Showing first 80 references.