Representation Before Training: A Fixed-Budget Benchmark for Generative Medical Event Models
Pith reviewed 2026-05-10 07:19 UTC · model grok-4.3
The pith
Fused code-value tokenization raises mortality AUROC from 0.891 to 0.915 in fixed-budget medical event models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Training 28 matched transformers for one epoch under a shared budget on MIMIC-IV shows that fused code-value tokenization improves mortality AUROC from 0.891 to 0.915, hospital length-of-stay AUROC from 0.763 to 0.788, and mean Spearman rho across 13 regression outcomes from 0.414 to 0.494. Event-order and admission-relative RoPE temporal encodings match or exceed time-token insertion on average while shortening sequences by 11 percent. CLIF remapping preserves downstream performance in the single-site setting while yielding a smaller, clinically interpretable token set. Finer quantization, reference-range anchoring, and soft discretization provide selective benefits, whereas code-normalized xVal remains well below the discrete and soft families.
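The sequence-length claim can be made concrete with a toy encoder. All token names and time-gap buckets below are invented for illustration; they are not taken from the paper.

```python
# Toy illustration: inserting explicit time-gap tokens lengthens sequences,
# while event-order-only (or RoPE-based) encodings leave them untouched.
# Token names (TIME_*, LAB_*, VITAL_*) are hypothetical.

events = [
    ("LAB_CREATININE|Q7", 0),    # (fused code-value token, minutes since admission)
    ("VITAL_HR|Q4", 15),
    ("LAB_LACTATE|Q9", 240),
    ("VITAL_HR|Q5", 255),
]

def bucket_gap(minutes):
    """Map a time gap to a coarse gap token, e.g. TIME_1H-6H."""
    if minutes < 60:
        return "TIME_<1H"
    if minutes < 360:
        return "TIME_1H-6H"
    return "TIME_>6H"

def encode_event_order(events):
    # Order alone carries time; timestamps feed positional encoding, not tokens.
    return [tok for tok, _ in events]

def encode_time_tokens(events):
    seq, prev = [], None
    for tok, t in events:
        if prev is not None and t > prev:
            seq.append(bucket_gap(t - prev))  # explicit gap token
        seq.append(tok)
        prev = t
    return seq

short = encode_event_order(events)
long = encode_time_tokens(events)
print(len(short), len(long))  # 4 7 -- gap tokens inflate the sequence
```

Under a fixed token budget, every gap token displaces a clinical event token, which is one plausible mechanism behind the 11 percent saving.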
What carries the argument
Fixed-budget benchmark that trains matched one-epoch transformers to isolate representation choices from optimization and architectural confounds.
If this is right
- Fused code-value tokenization improves mortality and length-of-stay AUROC and regression Spearman rho compared with separate encoding.
- Event-order only and admission-relative RoPE temporal encodings achieve comparable or higher average performance than time tokens while cutting sequence length by 11%.
- CLIF remapping preserves task performance while producing a smaller, clinically interpretable token vocabulary suitable for multi-site use.
- Finer-than-decile quantization, reference-range anchoring, and soft discretization improve selected outcomes, but code-normalized xVal lags the discrete and soft families.
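As a sketch of what the fused-versus-unfused comparison means mechanically: a lab value is mapped to a decile bin, then either joined with its code into one token or emitted as two tokens the model must re-associate. The decile edges and token formats below are hypothetical, not the paper's.

```python
# Sketch of fused vs. unfused code-value tokenization (details assumed).
import bisect

# Hypothetical per-code decile edges (9 cut points -> 10 bins),
# loosely modeled on creatinine in mg/dL.
DECILE_EDGES = {
    "LAB_CREATININE": [0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.3, 1.6, 2.2],
}

def decile_bin(code, value):
    return bisect.bisect_right(DECILE_EDGES[code], value)  # 0..9

def tokenize_fused(code, value):
    # One token jointly encodes the code and the value's magnitude bin.
    return [f"{code}|Q{decile_bin(code, value)}"]

def tokenize_unfused(code, value):
    # Code and bin are separate tokens; attention must re-bind them.
    return [code, f"Q{decile_bin(code, value)}"]

print(tokenize_fused("LAB_CREATININE", 1.8))    # ['LAB_CREATININE|Q8']
print(tokenize_unfused("LAB_CREATININE", 1.8))  # ['LAB_CREATININE', 'Q8']
```

Fusion grows the vocabulary (one token per code-bin pair) but halves the tokens per measurement, which is exactly the kind of coupled change the referee report flags below.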
Where Pith is reading between the lines
- The benchmark could be reused on other longitudinal datasets to test whether fused tokenization remains advantageous outside single-site ICU data.
- Representation design may offer a lower-compute path to performance gains than increasing model size or training epochs in medical settings.
- Standardized formats like CLIF could enable pooling across hospitals without large performance loss, supporting broader model training.
Load-bearing premise
That one-epoch training of 28 matched transformers under a shared budget sufficiently separates representation effects from optimization dynamics or data leakage in the MIMIC-IV splits.
What would settle it
Retraining the same representation variants for multiple epochs or across different random seeds and data splits, then checking whether the AUROC gaps between fused and unfused tokenization shrink or vanish.
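A minimal version of this seed-sensitivity check, with synthetic scores standing in for real model outputs: train each variant under several seeds, compute per-seed AUROC, and inspect the distribution of the gap rather than a single point estimate.

```python
# Hedged sketch: does the fused-vs-unfused AUROC gap survive seed noise?
# The "models" here are random-score generators, not real transformers.
import random

def auroc(scores_pos, scores_neg):
    """Mann-Whitney estimate of AUROC: P(score_pos > score_neg)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

def simulate_run(seed, lift):
    """Stand-in for one training run; `lift` proxies representation quality."""
    rng = random.Random(seed)
    pos = [rng.gauss(lift, 1.0) for _ in range(200)]  # positive-class scores
    neg = [rng.gauss(0.0, 1.0) for _ in range(800)]   # negative-class scores
    return auroc(pos, neg)

# Gap between a "better" and a "worse" representation, per seed.
gaps = [simulate_run(s, 1.2) - simulate_run(s + 1000, 0.9) for s in range(5)]
mean_gap = sum(gaps) / len(gaps)
print(f"mean AUROC gap across seeds: {mean_gap:+.3f}")
```

If the real gaps (0.891 vs 0.915, etc.) sit well outside the seed-level spread, the representation attribution strengthens; if not, the headline claims weaken.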
Original abstract
Every prediction from a generative medical event model is bounded by how clinical events are tokenized, yet input representation is rarely isolated from other system and architectural choices. We evaluate how representation decisions affect downstream prediction after a shared one-epoch pretraining budget. We train 28 matched transformers on MIMIC-IV and evaluate them on 30 clinical outcomes in three experiments: (1) quantization granularity, reference-range anchoring, and code-value fusion; (2) value encoding (hard bins, soft discretization, code-normalized xVal) crossed with temporal encoding (event order, time tokens, admission-relative RoPE); and (3) native MIMIC laboratory/vital codes versus the Common Longitudinal ICU Format (CLIF)-remapped laboratory/vital codes with compression-preserving perturbation arms. In Experiment 1, fused code-value tokenization improves mortality AUROC from 0.891 to 0.915 (BH-adjusted p < 0.001), hospital length-of-stay AUROC from 0.763 to 0.788 (BH-adjusted p < 0.001), and, for the decile fused-vs-unfused comparison, mean regression Spearman rho across the 13 regression outcomes from 0.414 to 0.494. Across the three temporal encodings, event order only and admission-relative RoPE match or exceed inserting time tokens on average while shortening sequences by 11%. CLIF remapping preserves downstream performance in our single-site setting while yielding a smaller, clinically interpretable token set compatible with multi-site use. Finer-than-decile quantization, reference-range anchoring, and soft discretization help in selective outcomes, while code-normalized xVal remains well below the discrete and soft families, consistent with near-median suppression that persists after the affine variant.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that input representation choices in generative medical event models can be isolated and benchmarked under a fixed one-epoch pretraining budget. By training 28 matched transformers on MIMIC-IV and evaluating on 30 clinical outcomes, it reports that fused code-value tokenization yields AUROC gains for mortality (0.891 to 0.915) and hospital length-of-stay (0.763 to 0.788) with BH-adjusted p < 0.001, plus improved mean Spearman rho (0.414 to 0.494) across 13 regression tasks; additional experiments examine quantization granularity, value encodings, temporal encodings (showing event order and admission-relative RoPE competitive with time tokens while shortening sequences), and CLIF remapping of codes.
Significance. If the central attribution to representation holds, the work supplies a practical, budget-controlled benchmark for tokenization decisions in clinical sequence models. Strengths include the matched-model design across multiple outcomes, use of BH-adjusted tests, and concrete recommendations (e.g., fusion and CLIF compatibility) that could guide practitioners without increasing compute. The empirical focus on held-out performance rather than theoretical claims makes the results directly actionable if confounding factors are ruled out.
major comments (2)
- [Experiment 1 / Methods] The one-epoch shared-budget protocol does not isolate representation effects from optimization dynamics. Different tokenizations change vocabulary size, sequence length (fused tokens shorten sequences), and input statistics, altering per-step gradient magnitudes and convergence speed. Without loss curves, multi-epoch ablations, or seed-wise variance reported for the 28 models, the AUROC gains (mortality 0.891→0.915, LOS 0.763→0.788) and rho improvement cannot be confidently attributed to representation quality rather than faster convergence under the fixed budget. This is load-bearing for the headline claims in Experiment 1.
- [Data and Splits] MIMIC-IV patient splits are not described as strictly temporal. This raises the possibility that representation-specific leakage (e.g., via code-value fusion altering which events appear in training vs. test) interacts with the single-epoch regime, potentially inflating the reported gains. A temporal split or explicit leakage audit would be needed to support the cross-representation comparisons.
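The step-count confound in the first major comment is simple arithmetic: with invented corpus and batch sizes, a representation that halves tokens per event halves the optimizer steps available in one epoch.

```python
# Illustrative arithmetic (all numbers invented): under a fixed one-epoch
# budget, sequence-shortening representations receive fewer optimizer steps,
# entangling representation quality with optimization dynamics.
total_events = 100_000_000                      # events in the corpus
tokens_per_event = {"fused": 1, "unfused": 2}   # code+value as 1 token vs 2
batch_tokens = 262_144                          # tokens per optimizer step

for name, tpe in tokens_per_event.items():
    corpus_tokens = total_events * tpe
    steps = corpus_tokens // batch_tokens
    print(f"{name:8s}: {corpus_tokens:>13,} tokens -> {steps:,} steps/epoch")
```

Here the unfused variant gets twice the gradient updates per epoch, so any fused-versus-unfused gap mixes token semantics with training-schedule effects unless loss curves or multi-epoch runs are shown.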
minor comments (2)
- [Abstract / Results] The abstract and results should explicitly state how the mean Spearman rho is aggregated across the 13 regression outcomes and whether the decile fused-vs-unfused comparison uses the same patient cohort as the AUROC tasks.
- [Statistical Analysis] Provide the exact number of comparisons underlying the BH adjustment and confirm that all 30 outcomes were included in the correction; this would strengthen interpretation of the p < 0.001 statements.
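For reference, the standard Benjamini-Hochberg step-up procedure that the correction presumably follows: each sorted p-value is compared against (rank/m) * alpha. This is a textbook implementation, not the paper's code; the p-values are illustrative.

```python
# Minimal Benjamini-Hochberg FDR procedure (step-up, standard algorithm).
def benjamini_hochberg(pvals, alpha=0.05):
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k with p_(k) <= (k/m) * alpha ...
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k = rank
    # ... and reject the k hypotheses with the smallest p-values.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k:
            rejected[i] = True
    return rejected

pvals = [0.0005, 0.009, 0.02, 0.04, 0.3]  # m = 5 comparisons (illustrative)
print(benjamini_hochberg(pvals))  # [True, True, True, True, False]
```

Because the thresholds scale with m, the minor comment matters: whether m is 30 outcomes or some larger set of pairwise comparisons changes which BH-adjusted p < 0.001 statements survive.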
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below, agreeing that further details on optimization and splits will improve clarity and attribution. Revisions will be made accordingly.
Point-by-point responses
-
Referee: [Experiment 1 / Methods] The one-epoch shared-budget protocol does not isolate representation effects from optimization dynamics. Different tokenizations change vocabulary size, sequence length (fused tokens shorten sequences), and input statistics, altering per-step gradient magnitudes and convergence speed. Without loss curves, multi-epoch ablations, or seed-wise variance reported for the 28 models, the AUROC gains (mortality 0.891→0.915, LOS 0.763→0.788) and rho improvement cannot be confidently attributed to representation quality rather than faster convergence under the fixed budget. This is load-bearing for the headline claims in Experiment 1.
Authors: We agree that sequence length and vocabulary differences under a one-epoch budget can affect per-step gradients and convergence rates, potentially confounding pure representation effects. The fixed-budget design was selected to mirror practical clinical modeling constraints where multi-epoch training is often limited by compute and data availability. To strengthen the claims, we will add training loss curves for the fused versus unfused comparisons and report standard deviations across multiple seeds for the primary AUROC and Spearman rho results. Full multi-epoch ablations across all 28 models are not feasible given our compute resources, but we will discuss this limitation explicitly. revision: partial
-
Referee: [Data and Splits] MIMIC-IV patient splits are not described as strictly temporal. This raises the possibility that representation-specific leakage (e.g., via code-value fusion altering which events appear in training vs. test) interacts with the single-epoch regime, potentially inflating the reported gains. A temporal split or explicit leakage audit would be needed to support the cross-representation comparisons.
Authors: We will revise the methods section to explicitly state that splits are random patient-level partitions with no patient overlap across train, validation, and test sets. Tokenization, including code-value fusion, is applied after splitting and therefore does not differentially alter event membership between sets. We will include a leakage audit confirming absence of patient ID or event-type overlap. While a temporal split could address potential distribution shifts, the single-center MIMIC-IV setting with random splits follows common practice; we will add this as a noted limitation and consider supplementary temporal-split results if space allows. revision: yes
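The promised leakage audit reduces to set intersections over patient identifiers. A minimal sketch, with hypothetical split contents and names:

```python
# Patient-level split audit: no patient ID may appear in more than one split.
def audit_patient_splits(train_ids, val_ids, test_ids):
    splits = {"train": set(train_ids), "val": set(val_ids), "test": set(test_ids)}
    overlaps = {}
    names = list(splits)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            overlaps[(a, b)] = splits[a] & splits[b]  # empty set = no leakage
    return overlaps

# Toy example: patient 3 leaks from train into test.
overlaps = audit_patient_splits([1, 2, 3], [4, 5], [6, 7, 3])
for pair, shared in overlaps.items():
    status = "OK" if not shared else f"LEAK {sorted(shared)}"
    print(pair, status)
```

A real audit would also confirm, per the authors' claim, that tokenization is applied after splitting, so no representation-specific preprocessing sees test-set statistics.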
Circularity Check
No significant circularity: purely empirical benchmark of tokenization variants under fixed training budget
full rationale
The paper reports results from training 28 matched transformers on MIMIC-IV under a shared one-epoch budget and measuring held-out AUROC/Spearman performance across 30 clinical outcomes for different tokenization schemes (fused code-value, quantization, temporal encodings, CLIF remapping). No equations, derivations, or first-principles claims appear; performance differences are presented as direct experimental measurements rather than predictions derived from fitted parameters or self-referential definitions. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling are invoked to justify core results. The evaluation is self-contained against external benchmarks (held-out MIMIC-IV splits) with no reduction of outputs to inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: MIMIC-IV single-site data is representative enough that representation effects measured on it generalize to other settings.
Reference graph
Works this paper leans on
- [1] Rafi Al Attrach, Rajna Fani, David Restrepo, Yugang Jia, and Peter Schüffler. Rethinking tokenization for clinical time series: When less is more, 2025. arXiv:2512.05217.
- [2] Jeremy Bernstein and Laker Newhouse. Old optimizer, new norm: An anthology, 2024. arXiv:2409.20325. Theoretical basis for the Muon optimizer (https://github.com/KellerJordan/Muon).
- [3] Michael C. Burkhart, Bashar Ramadan, Zewei Liao, Kaveri Chhikara, Juan C. Rojas, William F. Parker, and Brett K. Beaulieu-Jones. Foundation models for electronic health records: representation dynamics and transferability, 2025. arXiv:2504.10422.
- [4] Michael C. Burkhart, Bashar Ramadan, Luke Solo, William F. Parker, and Brett K. Beaulieu-Jones. Quantifying surprise in clinical care: Detecting highly informative events in electronic health records with foundation models. In Pacific Symposium on Biocomputing, volume 31, pages 173-188, 2026. doi:10.1142/9789819824755_0013.
- [5] Tiffany J. Callahan, Adrianne L. Stefanski, Jordan M. Wyrwa, Chenjie Zeng, Anna Ostropolets, Juan M. Banda, William A. Baumgartner, Richard D. Boyce, Elena Casiraghi, Ben D. Coleman, Janine H. Collins, Sara J. Deakyne Davies, James A. Feinstein, Asiyah Y. Lin, Blake Martin, Nicolas A. Matentzoglu, Daniella Meeker, Justin Reese, Jessica Sinclair, Sanya B. ...
- [6] Edward Choi, Mohammad Taha Bahadori, Andy Schuetz, Walter F. Stewart, and Jimeng Sun. Doctor AI: Predicting clinical events via recurrent neural networks, 2016. arXiv:1511.05942.
- [7] Edward Choi, Mohammad Taha Bahadori, Joshua A. Kulas, Andy Schuetz, Walter F. Stewart, and Jimeng Sun. RETAIN: An interpretable predictive model for healthcare using reverse time attention mechanism, 2017. arXiv:1608.05745.
- [8] Wei Dai, Peilin Chen, Malinda Lu, Daniel Li, Haowen Wei, Hejie Cui, and Paul Pu Liang. CLIMB: Data foundations for large scale multimodal clinical foundation models, 2025. arXiv:2503.07667.
- [9] Tri Dao. FlashAttention-2: Faster attention with better parallelism and work partitioning, 2023. arXiv:2307.08691. Published at ICLR 2024.
- [10] Vijay Prakash Dwivedi, Viktor Schlegel, Andy T. Liu, Thanh-Tung Nguyen, Abhinav Ramesh Kashyap, Jeng Wei, Wei-Hsian Yin, Stefan Winkler, and Robby T. Tan. Representation learning of structured data for medical foundation models, 2024. arXiv:2410.13351.
- [11] Adibvafa Fallahpour, Mahshid Alinoori, Wenqian Ye, Xu Cao, Arash Afkanpour, and Amrit Krishnan. EHRMamba: Towards generalizable and scalable foundation models for electronic health records, 2024. arXiv:2405.14567.
- [12] Jason Fries, Nigam Shah, Ethan Steinberg, Rahul Thapa, and Michael Wornow. EHRSHOT: An EHR benchmark for few-shot evaluation of foundation models. In Advances in Neural Information Processing Systems, volume 36, pages 67125-67137, 2023. doi:10.52202/075280-2933.
- [13] Zewei Liao, Shan Guleria, Kevin Smith, Rachel Baccile, Kaveri Chhikara, Dema Therese, Vaishvik Chaudhari, Michael Craig Burkhart, Brett Beaulieu-Jones, Snigdha Jain, Kathryn Connell, Kevin Buell, Juan Rojas, Patrick Lyons, Siva Bhavani, Catherine A. Gao, Chad Hochberg, Nick Ingraham, William Parker, and CLIF Consortium. MIMIC-IV-Ext-CLIF: MIMIC-IV in the...
- [14] Siavash Golkar, Mariel Pettee, Michael Eickenberg, Alberto Bietti, Miles Cranmer, Geraud Krawezik, Francois Lanusse, Michael McCabe, Ruben Ohana, Liam Parker, Bruno Régaldo-Saint Blancard, Tiberiu Tesileanu, Kyunghyun Cho, and Shirley Ho. xVal: A continuous numerical tokenization for scientific language models, 2024. arXiv:2310.02989.
- [15] Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, et al. The Llama 3 herd of models, 2024. arXiv:2407.21783.
- [16] Lin Lawrence Guo, Santiago Eduardo Arciniegas, Joseph Jihyung Lee, Adam Paul Yan, George Tomlinson, Jason Fries, and Lillian Sung. Tokenization tradeoffs in structured EHR foundation models, 2026. arXiv:2603.15644.
- [17] Stefan Hegselmann, Georg von Arnim, Tillmann Rheude, Noel Kronenberg, David Sontag, Gerhard Hindricks, Roland Eils, and Benjamin Wild. Large language models are powerful electronic health record encoders, 2025. arXiv:2502.17403.
- [18] Kyunghoon Hur, Jiyoung Lee, Jungwoo Oh, Wesley Price, Young-Hak Kim, and Edward Choi. Unifying heterogeneous electronic health records systems via text-based code embedding, 2022. arXiv:2108.03625.
- [19] Alistair Johnson, Lucas Bulgarelli, Tom Pollard, Brian Gow, Benjamin Moody, Steven Horng, Leo Anthony Celi, and Roger Mark. MIMIC-IV. PhysioNet, October 2024. doi:10.13026/KPB9-MT58. URL https://physionet.org/content/mimiciv/3.1/.
- [20] Alistair E. W. Johnson, Lucas Bulgarelli, Lu Shen, Alvin Gayles, Ayad Shammout, Steven Horng, Tom J. Pollard, Sicheng Hao, Benjamin Moody, Brian Gow, Li-wei H. Lehman, Leo A. Celi, and Roger G. Mark. MIMIC-IV, a freely accessible electronic health record dataset. Scientific Data, 10(1), January 2023. doi:10.1038/s41597-022-01899-x.
- [21] Seyed Mehran Kazemi, Rishab Goel, Sepehr Eghbali, Janahan Ramanan, Jaspreet Sahota, Sanjay Thakur, Stella Wu, Cathal Smyth, Pascal Poupart, and Marcus Brubaker. Time2Vec: Learning a vector representation of time, 2019. arXiv:1907.05321.
- [22]
- [23] Yikuan Li, Shishir Rao, José Roberto Ayala Solares, Abdelaali Hassaine, Rema Ramakrishnan, Dexter Canoy, Yajie Zhu, Kazem Rahimi, and Gholamreza Salimi-Khorshidi. BEHRT: Transformer for electronic health records. Scientific Reports, 10(1), April 2020. doi:10.1038/s41598-020-62922-y.
- [24] Fenglong Ma, Radha Chitta, Jing Zhou, Quanzeng You, Tong Sun, and Jing Gao. Dipole: Diagnosis prediction in healthcare via attention-based bidirectional recurrent neural networks, 2017. arXiv:1706.05764.
- [25] Yingbo Ma, Suraj Kolla, Dhruv Kaliraman, Victoria Nolan, Zhenhong Hu, Ziyuan Guan, Yuanfang Ren, Brooke Armfield, Tezcan Ozrazgat-Baslanti, Tyler J. Loftus, Parisa Rashidi, Azra Bihorac, and Benjamin Shickel. Temporal cross-attention for dynamic embedding and tokenization of multimodal electronic health records, 2024. arXiv:2403.04012.
- [26] Medical Event Data Standard Community. Medical event data standard (MEDS), 2024. URL https://medical-event-data-standard.github.io/docs/intro_pages/what_is_MEDS.
- [27] Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Greg S. Corrado, and Jeffrey Dean. Zero-shot learning by convex combination of semantic embeddings, 2014. arXiv:1312.5650.
- [28] Chao Pang, Xinzhuo Jiang, Krishna S. Kalluri, Matthew Spotnitz, RuiJun Chen, Adler Perotte, and Karthik Natarajan. CEHR-BERT: Incorporating temporal information from structured EHR data to improve prediction tasks. In Proceedings of Machine Learning for Health, volume 158 of Proceedings of Machine Learning Research, pages 239-260, 2021.
- [29] Bashar Ramadan, Ming-Chieh Liu, Michael C. Burkhart, William F. Parker, and Brett K. Beaulieu-Jones. Diagnostic codes in AI prediction models and label leakage of same-admission clinical outcomes. JAMA Network Open, 8(12):e2550454, December 2025. doi:10.1001/jamanetworkopen.2025.50454.
- [30] Laila Rasmy, Yang Xiang, Ziqian Xie, Cui Tao, and Degui Zhi. Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction. npj Digital Medicine, 4(1), May 2021. doi:10.1038/s41746-021-00455-y.
- [31] Pawel Renc, Michal K. Grzeszczyk, Nassim Oufattole, Deirdre Goode, Yugang Jia, Szymon Bieganski, Matthew B. A. McDermott, Jaroslaw Was, Anthony E. Samir, Jonathan W. Cunningham, David W. Bates, and Arkadiusz Sitek. Foundation model of electronic medical records for adaptive risk estimation. GigaScience, 14, 2025. doi:10.1093/gigascience/giaf107.
- [32] Juan C. Rojas, Patrick G. Lyons, Kaveri Chhikara, Vaishvik Chaudhari, Sivasubramanium V. Bhavani, Muna Nour, Kevin G. Buell, Kevin D. Smith, Catherine A. Gao, Saki Amagai, Chengsheng Mao, Yuan Luo, Anna K. Barker, Mark Nuppnau, Michael Hermsen, Jay L. Koyner, Haley Beck, Rachel Baccile, Zewei Liao, Kyle A. Carey, Brenna Park-Egan, Xuan Han, Alexander C. O...
- [33] Chenze Shao, Darren Li, Fandong Meng, and Jie Zhou. Continuous autoregressive language models, 2025. arXiv:2510.27688.
- [34] Ethan Steinberg, Ken Jung, Jason A. Fries, Conor K. Corbin, Stephen R. Pfohl, and Nigam H. Shah. Language models are an effective representation learning technique for electronic health record data. Journal of Biomedical Informatics, 113:103637, January 2021. doi:10.1016/j.jbi.2020.103637.
- [35] Ethan Steinberg, Jason Fries, Yizhe Xu, and Nigam Shah. MOTOR: A time-to-event foundation model for structured medical records, 2023. arXiv:2301.03150.
- [36] Jianlin Su, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. RoFormer: Enhanced transformer with rotary position embedding. Neurocomputing, 568:127063, February 2024. doi:10.1016/j.neucom.2023.127063.
- [37] Xiaorui Su, Shvat Messica, Yepeng Huang, Ruth Johnson, Lukas Fesser, Shanghua Gao, Faryad Sahneh, and Marinka Zitnik. Multimodal medical code tokenizer, 2025. arXiv:2502.04397.
- [38] Shane Waxler, Paul Blazek, Davis White, Daniel Sneider, Kevin Chung, Mani Nagarathnam, Patrick Williams, Hank Voeller, Karen Wong, Matthew Swanhorst, Sheng Zhang, Naoto Usuyama, Cliff Wong, Tristan Naumann, Hoifung Poon, Andrew Loza, Daniella Meeker, Seth Hain, and Rahul Shah. Generative medical event models improve with scale, 2025. arXiv:2508.12104.
- [39] Avijit Thawani, Jay Pujara, Filip Ilievski, and Pedro Szekely. Representing numbers in NLP: a survey and a vision. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 644-656, 2021. doi:10.18653/v1/2021.naacl-mai...
- [40] Xiao Yang, Xuejiao Zhao, and Zhiqi Shen. EHRStruct: A comprehensive benchmark framework for evaluating large language models on structured electronic health record tasks, 2025. arXiv:2511.08206.
- [41] Hugo Yèche, Rita Kuznetsova, Marc Zimmermann, Matthias Hüser, Xinrui Lyu, Martin Faltys, and Gunnar Rätsch. HiRID-ICU-Benchmark -- a comprehensive machine learning benchmark on high-resolution ICU data, 2022. arXiv:2111.08536.
- [42] Kunyu Yu, Rui Yang, Jingchi Liao, Siqi Li, Huitao Li, Irene Li, Yifan Peng, Rishikesan Kamaleswaran, and Nan Liu. Benchmarking foundation models with multimodal public electronic health records. IEEE Journal of Biomedical and Health Informatics, pages 1-12, 2025. doi:10.1109/jbhi.2025.3645076.