
arxiv: 2605.18701 · v1 · pith:WTZHVCIInew · submitted 2026-05-18 · 💻 cs.LG · q-bio.QM

Learning Normal Representations for Blood Biomarkers

Pith reviewed 2026-05-20 12:49 UTC · model grok-4.3

classification 💻 cs.LG q-bio.QM
keywords blood biomarkers · reference intervals · personalized medicine · transformer model · longitudinal laboratory data · clinical outcomes · machine learning · population priors

The pith

A conditional transformer model improves blood biomarker reference intervals by blending individual patient history with population-level normal variation, leading to better prediction of clinical outcomes than purely personalized or fixed-

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Blood tests use reference intervals to flag abnormal results, but standard population ranges ignore personal baselines while recent personalization efforts using only past tests can overfit and incorrectly label many results as abnormal. Analysis of nearly two billion measurements shows these personalized intervals classify up to 68 percent as abnormal yet lack ties to real outcomes such as mortality or acute kidney injury. The paper presents NORMA, a framework that conditions interval generation on both the patient's own history and broader population patterns of normal values. This hybrid method delivers reference intervals with higher precision for forecasting adverse events. The results indicate that effective interpretation requires anchoring personal data to stable population information rather than relying on either extreme.
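The overfitting mechanism described above can be illustrated with a small simulation. This is a sketch on synthetic Gaussian data, not the paper's analysis: every value is drawn from the same healthy distribution, and the interval rule (mean ± 2 SD of the patient's own prior tests) is an assumed stand-in for a purely personalized interval. The function name `flag_rate` and all parameters are hypothetical. It does not attempt to reproduce the 68 percent figure.

```python
import random
import statistics

def flag_rate(history_len, trials=20000, seed=0):
    """Fraction of new, in-distribution values flagged by a mean +/- 2 SD interval."""
    rng = random.Random(seed)
    flagged = 0
    for _ in range(trials):
        # History and new value share one healthy N(0, 1) distribution,
        # so every flag raised here is a false positive by construction.
        history = [rng.gauss(0.0, 1.0) for _ in range(history_len)]
        center = statistics.mean(history)
        spread = statistics.stdev(history)  # sample SD, needs at least 2 points
        new_value = rng.gauss(0.0, 1.0)
        if abs(new_value - center) > 2.0 * spread:
            flagged += 1
    return flagged / trials

sparse_rate = flag_rate(history_len=3)
dense_rate = flag_rate(history_len=50)
print(f"3 prior tests:  {sparse_rate:.1%} of healthy values flagged")
print(f"50 prior tests: {dense_rate:.1%} of healthy values flagged")
```

With only three prior tests, both the mean and the SD are poorly estimated, so the interval is often too narrow or off-center and the false-positive rate sits well above the nominal level of about 5 percent. With fifty prior tests it falls back toward that level.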

Core claim

Laboratory values exhibit substantial individual variation, yet purely personalized reference intervals routinely overfit to sparse data, classifying up to 68% of measurements as abnormal without corresponding associations with adverse clinical outcomes. NORMA addresses this by using a conditional transformer to generate reference intervals conditioned on both a patient's testing history and population-level data about normal variation, resulting in intervals that achieve higher precision for predicting outcomes including mortality, acute kidney injury, and chronic disease. These findings suggest that population-level priors enhance individual trajectory analysis and outperform either pure-

What carries the argument

NORMA, a conditional transformer-based framework that produces reference intervals by conditioning on both patient history and population-level normal variation.
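To make the idea of anchoring a personal history to a population prior concrete, here is a minimal sketch using classical normal-normal shrinkage. It is explicitly not NORMA: the paper uses a conditional transformer, whereas this is a closed-form statistical analogue chosen only because it shows the same qualitative behaviour. The function `blended_interval`, its parameters, and the haemoglobin-like numbers are all assumptions made for illustration.

```python
import math
import statistics

def blended_interval(history, pop_mean, pop_sd, within_sd, z=1.96):
    """Shrink a personal set point toward a population prior (normal-normal model).

    pop_mean, pop_sd: assumed distribution of set points across people.
    within_sd: assumed within-person SD around a person's set point.
    """
    n = len(history)
    prior_precision = 1.0 / pop_sd**2
    data_precision = n / within_sd**2
    post_var = 1.0 / (prior_precision + data_precision)
    personal_mean = statistics.mean(history) if n else pop_mean
    post_mean = post_var * (prior_precision * pop_mean + data_precision * personal_mean)
    # Spread of the next measurement: set-point uncertainty plus within-person noise.
    pred_sd = math.sqrt(post_var + within_sd**2)
    return post_mean - z * pred_sd, post_mean + z * pred_sd

# Illustrative values only (g/dL-like scale), not taken from the paper.
POP_MEAN, POP_SD, WITHIN_SD = 14.0, 1.2, 0.4
no_history = blended_interval([], POP_MEAN, POP_SD, WITHIN_SD)
short_history = blended_interval([15.5, 15.7], POP_MEAN, POP_SD, WITHIN_SD)
long_history = blended_interval([15.6] * 20, POP_MEAN, POP_SD, WITHIN_SD)
for label, (low, high) in [("no history", no_history),
                           ("2 tests", short_history),
                           ("20 tests", long_history)]:
    print(f"{label:>10}: {low:.2f} to {high:.2f}")
```

With no history the interval is the population one. As tests accumulate, the centre moves toward the patient's own mean and the interval narrows, but it never collapses to the sample SD of a handful of points, which is the failure mode of pure personalization.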

If this is right

  • Personalized intervals without population conditioning lead to inflated abnormal classifications lacking clinical relevance.
  • Hybrid conditioning improves precision in outcome prediction for mortality, acute kidney injury, and chronic disease.
  • Laboratory medicine should moderate the use of purely individual reference intervals.
  • Anchoring individual data to population priors provides superior performance compared to standalone methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Applying similar hybrid conditioning could benefit interpretation of other longitudinal clinical measurements such as imaging or vital signs.
  • Expanding the approach to include additional patient context like age or comorbidities might further refine interval accuracy.
  • Deployment in clinical systems could help decrease unnecessary follow-up testing triggered by over-flagged results.
  • The multi-regional scope of the data suggests potential for more generalizable normality definitions across populations.

Load-bearing premise

The multi-regional dataset of laboratory measurements largely captures stable normal biological variation rather than including substantial unrecognized or subclinical disease that contaminates the population priors.

What would settle it

A prospective validation study that applies NORMA intervals, personalized intervals, and population intervals to new patients and tracks which method most accurately associates flagged abnormalities with subsequent clinical events while minimizing unnecessary alerts.
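The comparison such a study would report can be sketched as follows: for each method, count the alerts raised and the share of those alerts that were followed by an event (precision). The records below are invented solely to exercise the computation and say nothing about how the methods perform in practice. The names `is_flagged`, `precision`, and the creatinine-like values are hypothetical.

```python
def is_flagged(value, interval):
    """True when a result falls outside its reference interval."""
    low, high = interval
    return value < low or value > high

def precision(flags, events):
    """Share of flagged results that were followed by a clinical event."""
    followed = [event for flag, event in zip(flags, events) if flag]
    return sum(followed) / len(followed) if followed else float("nan")

# One fixed population interval; personal and hybrid intervals vary per record.
POPULATION = (0.60, 1.30)

# Invented toy records: (value, personal interval, hybrid interval, event afterwards)
records = [
    (1.40, (0.80, 1.00), (0.70, 1.20), True),
    (1.10, (0.80, 1.00), (0.70, 1.20), False),
    (1.25, (0.70, 0.90), (0.65, 1.10), True),
    (0.95, (0.97, 1.05), (0.75, 1.20), False),
    (0.90, (0.80, 1.00), (0.70, 1.20), False),
]

events = [record[3] for record in records]
summary = {}
for method in ("population", "personal", "hybrid"):
    flags = []
    for value, personal, hybrid, _ in records:
        interval = {"population": POPULATION, "personal": personal, "hybrid": hybrid}[method]
        flags.append(is_flagged(value, interval))
    summary[method] = {"alerts": sum(flags), "precision": precision(flags, events)}

for method, stats in summary.items():
    print(f"{method:>10}: {stats['alerts']} alerts, precision {stats['precision']:.2f}")
```

Reporting the alert count next to precision matters because a method can reach high precision simply by flagging very little; a real evaluation would also need recall and confidence intervals.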

read the original abstract

Blood-based biomarkers underpin clinical diagnosis and management, yet their interpretation relies largely on fixed population reference intervals that ignore stable, intra-patient variability. As such, population-based interpretation can mask meaningful deviation from an individual's baseline, risking delayed disease detection. To remedy this, there have been increasing efforts to personalize blood biomarker interpretation using individual testing histories. However, these methods may overfit to sparse data, inflating false-positive rates and unnecessary follow-up, and can also unwittingly include unrecognized or subclinical disease. Here, we leverage nearly 2 billion longitudinal laboratory measurements from over 1.6 million individuals across North America, the Middle East, and East Asia, to show that while laboratory values are highly individual, purely personalized intervals routinely overfit, classifying up to 68% of measurements as abnormal, without corresponding associations with adverse clinical outcomes. We then introduce NORMA, a conditional transformer-based framework that generates reference intervals by conditioning on both a patient's history and population-level data about "normal" variation. NORMA-derived intervals achieve higher precision for predicting outcomes, including mortality, acute kidney injury, and chronic disease. These findings caution against over-personalization in laboratory medicine and demonstrate that anchoring individual trajectories to population-level priors outperforms either approach alone. To promote transparency, we publicly release the model, code, and an interactive user interface for accessible, individualized laboratory interpretation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces NORMA, a conditional transformer framework that generates blood biomarker reference intervals by conditioning on both individual patient testing histories and population-level priors learned from nearly 2 billion longitudinal measurements across 1.6 million individuals in multiple regions. It shows that purely personalized intervals overfit (classifying up to 68% of values as abnormal without corresponding adverse outcomes) and claims that NORMA-derived intervals yield higher precision for predicting mortality, acute kidney injury, and chronic disease, outperforming either pure personalization or fixed population intervals alone. The work publicly releases the model, code, and an interactive UI.

Significance. If the central results hold after addressing validation gaps, the paper would have substantial clinical significance by providing evidence-based guidance against over-personalization in laboratory medicine and demonstrating the value of hybrid conditioning on stable population priors. The scale of the dataset and the public release of code and UI are clear strengths that support reproducibility and potential adoption.

major comments (2)
  1. [Methods (population prior construction)] The manuscript provides no explicit validation or sensitivity analysis showing that the learned population-level priors are uncontaminated by subclinical or unrecognized disease; this assumption is load-bearing for the claim that NORMA's hybrid conditioning outperforms personalization by avoiding the overfitting and contamination issues acknowledged for individual histories.
  2. [Results (outcome prediction experiments)] Outcome prediction results lack reported details on the exact statistical tests, baseline comparisons (e.g., standard reference intervals or simple history-based models), cross-validation strategy, and effect sizes for the claimed precision gains on mortality, AKI, and chronic disease endpoints.
minor comments (2)
  1. [Abstract] The abstract states headline precision improvements without defining the precise metrics (e.g., AUC, precision-recall, or calibration) or providing quantitative tables comparing NORMA to the two baselines.
  2. [Model description] Notation for the conditional transformer inputs (history embedding vs. population prior embedding) is introduced without a clear diagram or equation showing the fusion mechanism.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review of our manuscript. The comments highlight important areas for improving methodological transparency and rigor, and we address each point below with plans for revision.

read point-by-point responses
  1. Referee: [Methods (population prior construction)] The manuscript provides no explicit validation or sensitivity analysis showing that the learned population-level priors are uncontaminated by subclinical or unrecognized disease; this assumption is load-bearing for the claim that NORMA's hybrid conditioning outperforms personalization by avoiding the overfitting and contamination issues acknowledged for individual histories.

    Authors: We agree this is a substantive concern and that the population prior's robustness to subclinical disease is central to interpreting the hybrid conditioning advantage. While the dataset's scale and geographic diversity provide some inherent protection against systematic contamination, we acknowledge the need for explicit checks. In the revised manuscript we will add a sensitivity analysis subsection in Methods that retrains the population prior after (a) excluding patients with any recorded ICD codes for relevant conditions and (b) restricting to the first two measurements per individual. We will report the resulting changes in downstream precision metrics and discuss residual limitations. revision: yes

  2. Referee: [Results (outcome prediction experiments)] Outcome prediction results lack reported details on the exact statistical tests, baseline comparisons (e.g., standard reference intervals or simple history-based models), cross-validation strategy, and effect sizes for the claimed precision gains on mortality, AKI, and chronic disease endpoints.

    Authors: We thank the referee for noting these omissions, which reduce reproducibility. In the revised Results section we will explicitly state: the statistical tests (log-rank for time-to-event, logistic regression with Wald tests and 95% CIs), all baseline comparators (fixed population intervals, per-patient mean±2SD, and a simple autoregressive history model), the cross-validation procedure (patient-stratified 5-fold CV with temporal hold-out), and effect sizes (precision-recall AUC deltas and hazard ratios with confidence intervals). These additions will be accompanied by updated tables and supplementary figures. revision: yes
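The cohort restrictions and the patient-level split promised in the two responses above can be sketched in a few lines. This is an assumed illustration, not the authors' code: the row layout (`patient_id`, `time`, `value`), the helper names, and the hash-based fold assignment are all hypothetical, and the temporal hold-out mentioned in the rebuttal is not implemented here.

```python
import hashlib
from collections import defaultdict

def exclude_coded_patients(rows, coded_patient_ids):
    """Variant (a): drop every row from patients with a relevant recorded diagnosis."""
    return [row for row in rows if row["patient_id"] not in coded_patient_ids]

def first_k_per_patient(rows, k=2):
    """Variant (b): keep only each patient's k earliest measurements."""
    by_patient = defaultdict(list)
    for row in rows:
        by_patient[row["patient_id"]].append(row)
    kept = []
    for patient_rows in by_patient.values():
        kept.extend(sorted(patient_rows, key=lambda r: r["time"])[:k])
    return kept

def patient_fold(patient_id, n_folds=5):
    """Deterministic fold per patient, so no patient spans train and test."""
    digest = hashlib.sha256(str(patient_id).encode()).hexdigest()
    return int(digest, 16) % n_folds

# Invented toy rows for illustration.
rows = [
    {"patient_id": "p1", "time": 3, "value": 1.0},
    {"patient_id": "p1", "time": 1, "value": 0.9},
    {"patient_id": "p1", "time": 2, "value": 1.1},
    {"patient_id": "p2", "time": 1, "value": 0.8},
    {"patient_id": "p3", "time": 1, "value": 1.6},
]
variant_a = exclude_coded_patients(rows, coded_patient_ids={"p3"})
variant_b = first_k_per_patient(rows, k=2)
folds = {row["patient_id"]: patient_fold(row["patient_id"]) for row in rows}
print(len(variant_a), len(variant_b), folds)
```

Assigning folds by patient rather than by measurement is the important design choice: splitting individual measurements at random would let a patient's later tests leak into training for their earlier ones.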

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained and empirical

full rationale

The paper introduces NORMA as a conditional transformer framework that generates reference intervals by conditioning on both individual history and population-level priors derived from nearly 2 billion measurements. Claims of superior precision for outcome prediction (mortality, AKI, chronic disease) rest on empirical comparisons showing that purely personalized intervals overfit (classifying up to 68% as abnormal without outcome associations) while the hybrid approach outperforms. No equations, self-citations, or steps reduce outputs by construction to fitted inputs or prior definitions; the model is presented as data-driven with public release of code and interface for external verification. The derivation chain is independent of the target results and does not invoke uniqueness theorems or ansatzes from self-citations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the dataset representing clean normal variation and on the transformer successfully learning useful population priors; no free parameters or invented entities are described in the abstract.

axioms (1)
  • domain assumption: The multi-regional dataset of nearly 2 billion measurements primarily reflects stable normal biological variation without substantial unrecognized or subclinical disease contamination.
    This premise is required to justify using population-level data as reliable priors for conditioning the reference intervals.

pith-pipeline@v0.9.0 · 5819 in / 1370 out tokens · 59462 ms · 2026-05-20T12:49:27.082515+00:00 · methodology


