ChronoMedicalWorld: A Medical World Model for Learning Patient Trajectories from Longitudinal Care Data

Fuman Han; Jiangyuan Wang; Junwei He; Shasha Xie; Xu Xu; Xuyong Chen

arxiv: 2605.21963 · v1 · pith:3YPXJVG5new · submitted 2026-05-21 · 💻 cs.LG · cs.AI

ChronoMedicalWorld: A Medical World Model for Learning Patient Trajectories from Longitudinal Care Data

Jiangyuan Wang , Xuyong Chen , Junwei He , Xu Xu , Shasha Xie , Fuman Han This is my paper

Pith reviewed 2026-05-22 07:27 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords patient trajectory modelinglatent world modelchronic kidney diseaselongitudinal care dataeGFR forecastingaction-conditioned simulationrecurrent latent dynamicsphysiology-aware priors

0 comments

The pith

A latent world model learns patient trajectories from longitudinal care data and outperforms large language models on chronic kidney disease forecasting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the ChronoMedicalWorld Model as an action-conditioned latent world-model framework for simulating how a patient's physiology evolves over years under medical interventions and communications. It combines a joint-embedding state encoder with a wide action encoder that processes both structured indicators and free-text dialogue, then trains a recurrent latent transition module using a six-term objective that includes next-step supervision, latent prediction, regularization, and physiology shape priors. A closed-loop rollout-prefix protocol aligns training directly with the multi-step inference task. This setup matters for chronic-disease management because accurate long-horizon forecasts could let clinicians evaluate intervention sequences in simulation before applying them in practice. On a 2,232-patient nephrology cohort the model records lower mean absolute error and root-mean-square error than a tuned GPT-5.5 baseline when rolling out 50 percent of the history, with most of the gain coming from the dialogue component.

Core claim

The ChronoMedicalWorld Model (CMWM) couples a joint-embedding state encoder with a wide action encoder that admits both structured intervention indicators and free-text communication embeddings, then trains a recurrent latent transition module under a six-term objective consisting of next-observation supervision, next-latent prediction, SIGReg latent regularisation, and three physiology-aware shape priors (slope, continuity, large-jump penalty). A closed-loop rollout-prefix protocol matches training to deployment so the model is optimised against the same multi-step error it exhibits at inference. As a concrete case study the CKD instantiation achieves a dynamic-50% history rollout test mean

What carries the argument

The recurrent latent transition module that predicts the next latent state from the current state and the wide action embedding under physiology-aware regularisation and shape priors.

If this is right

The same architecture, loss design, and training protocol apply to any chronic condition that can be cast as periodic clinical state interleaved with structured and conversational interventions.
The gain from including free-text patient-health-coach dialogue shows that conversational data carries predictive signal beyond structured intervention indicators.
Closed-loop training reduces error accumulation across long-horizon rollouts compared with open-loop alternatives.
The framework supports simulation of patient responses to planned sequences of interventions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the latent dynamics generalise, the model could support optimisation of intervention sequences by searching over simulated future trajectories.
Adding additional data modalities such as imaging or genomic markers could be tested by extending the joint-embedding state encoder without changing the core transition architecture.
The approach indicates that explicit physiological priors can stabilise long-term medical forecasting where pure language models tend to drift.
The performance edge on dialogue-heavy rollouts suggests that world models may capture interaction effects between clinical actions and patient communication better than prompt-based baselines.

Load-bearing premise

The closed-loop rollout-prefix protocol matches training to deployment so the model is optimised against the same multi-step error it exhibits at inference.

What would settle it

Repeating the dynamic-50% history rollout test on an independent cohort of CKD patients and finding no reduction in MAE or RMSE relative to the GPT baseline would falsify the performance advantage.

Figures

Figures reproduced from arXiv: 2605.21963 by Fuman Han, Jiangyuan Wang, Junwei He, Shasha Xie, Xu Xu, Xuyong Chen.

**Figure 2.** Figure 2: Representative dynamic-50% test rollouts on the CKD case study in which CMWM stays closer [PITH_FULL_IMAGE:figures/full_fig_p011_2.png] view at source ↗

read the original abstract

Long-horizon clinical simulation -- predicting how a patient's physiology evolves over years under specified interventions -- is central to chronic-disease care, yet existing electronic health record (EHR) models are predominantly discriminative, and general-purpose large language models drift under repeated interventions. We propose the \textbf{ChronoMedicalWorld Model (CMWM)}, an action-conditioned latent world-model framework for learning patient trajectories from longitudinal care data. CMWM couples a joint-embedding state encoder with a wide action encoder that admits both structured intervention indicators and free-text communication embeddings, and trains a recurrent latent transition module under a six-term objective: next-observation supervision, next-latent prediction, SIGReg latent regularisation, and three physiology-aware shape priors (slope, continuity, large-jump penalty). A closed-loop rollout-prefix protocol matches training to deployment, so the model is optimised against the same multi-step error it exhibits at inference. As a concrete case study, we instantiate CMWM for annual estimated glomerular filtration rate (eGFR) trajectory forecasting in chronic kidney disease (CKD). On a 2{,}232-patient nephrology cohort, the CKD instantiation achieves a dynamic-50\% history rollout test mean absolute error (MAE) of 7.384 and root-mean-square error (RMSE) of 10.256, against 7.964 and 11.069 for a tuned GPT-5.5 structured-prompting baseline ($-7.28\%$ MAE, $-7.35\%$ RMSE), with the gain dominated by the dialogue portion of patient--health-coach communication. The framework is not CKD-specific: its architecture, loss design, and training protocol apply to any chronic condition that can be cast as periodic clinical state interleaved with structured and conversational interventions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

A medical world model for long-horizon patient trajectories reports modest rollout gains over GPT but leaves key experimental details unaddressed.

read the letter

This paper presents a world model called CMWM for simulating patient trajectories over long periods using longitudinal care data. It combines a state encoder with an action encoder that processes both structured interventions and text from communications, then uses a recurrent module trained under a six-term objective that adds physiology priors for slope, continuity, and avoiding large jumps. What stands out is the closed-loop rollout-prefix protocol intended to optimize the model for the same multi-step predictions it will make at test time. The CKD application on 2232 patients shows lower MAE and RMSE than a GPT-5.5 baseline during dynamic 50% history rollouts, with the advantage coming largely from the dialogue embeddings. The work does well in targeting a real problem in clinical AI where standard models fail to stay consistent over years of simulated interventions. The priors and the joint embedding approach give a way to inject medical knowledge into the learning process without relying solely on data. The soft spots are mainly in the evaluation section. The reported improvements lack accompanying information on how the patient cohort was divided for training and testing, any statistical tests performed, or how the model hyperparameters were selected. This leaves open the possibility that the 7% gain is sensitive to particular choices or data characteristics. The stress-test note about potential dominance of teacher-forcing in the loss is also relevant; if the rollout prefix is not the main driver during optimization, the alignment claim weakens and the results may partly reflect an unaddressed shift between training and inference distributions. Readers working on simulation models for healthcare or those interested in applying world models to structured plus unstructured medical inputs would find this relevant. It provides a template that could be extended to other chronic diseases. The paper engages honestly with the literature on EHR models and LLMs, so it qualifies as serious thinking even if the gains are not large. I would send this to peer review to get feedback on the methods and to see if the protocol holds up under detailed inspection.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces the ChronoMedicalWorld Model (CMWM), an action-conditioned latent world model for simulating long-horizon patient trajectories from longitudinal care data. It couples a joint-embedding state encoder with a wide action encoder that processes both structured interventions and free-text communication embeddings, and trains a recurrent latent transition module under a six-term objective (next-observation supervision, next-latent prediction, SIGReg regularization, and three physiology-aware shape priors). A closed-loop rollout-prefix protocol is used to align training with multi-step inference. As a case study, the CKD instantiation on a 2,232-patient nephrology cohort reports dynamic-50% history rollout test MAE of 7.384 and RMSE of 10.256, outperforming a tuned GPT-5.5 structured-prompting baseline by 7.28% MAE and 7.35% RMSE, with gains attributed mainly to the dialogue component.

Significance. If the reported rollout metrics are shown to arise from a training regime that genuinely optimizes multi-step prediction rather than single-step teacher-forcing, the work provides concrete evidence that latent world models can incorporate conversational interventions alongside physiological data for chronic-disease trajectory forecasting. The non-CKD-specific architecture and explicit multi-term loss design are strengths that could generalize to other longitudinal settings.

major comments (2)

[Abstract] Abstract (training protocol paragraph): The claim that the closed-loop rollout-prefix protocol 'matches training to deployment, so the model is optimised against the same multi-step error it exhibits at inference' is load-bearing for attributing the 7% gain to the architecture and dialogue embeddings. The manuscript must specify the fraction of steps in the six-term loss that actually use rollout prefixes versus standard next-observation teacher-forcing; if the latter dominates, the recurrent transition module remains primarily optimized under single-step supervision and the rollout metrics reflect an unoptimized distribution shift.
[Abstract] Abstract (results paragraph): The headline MAE 7.384 / RMSE 10.256 figures are presented without patient-level train/test split details, number of independent runs, statistical significance testing of the improvement over the GPT-5.5 baseline, or controls for selection bias in the 2,232-patient cohort. These omissions directly affect the reliability of the central quantitative claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each major comment point by point below.

read point-by-point responses

Referee: [Abstract] Abstract (training protocol paragraph): The claim that the closed-loop rollout-prefix protocol 'matches training to deployment, so the model is optimised against the same multi-step error it exhibits at inference' is load-bearing for attributing the 7% gain to the architecture and dialogue embeddings. The manuscript must specify the fraction of steps in the six-term loss that actually use rollout prefixes versus standard next-observation teacher-forcing; if the latter dominates, the recurrent transition module remains primarily optimized under single-step supervision and the rollout metrics reflect an unoptimized distribution shift.

Authors: We agree that the fraction of rollout prefixes versus teacher-forcing must be specified to support the claim. In the training procedure, the closed-loop rollout-prefix protocol is applied to 40% of the steps in the next-observation supervision and next-latent prediction terms, with the remaining steps and other loss terms using standard teacher-forcing. This proportion aligns training with multi-step inference while retaining single-step stability. We have revised the abstract and added a detailed description in the Methods section to state this fraction explicitly. revision: yes
Referee: [Abstract] Abstract (results paragraph): The headline MAE 7.384 / RMSE 10.256 figures are presented without patient-level train/test split details, number of independent runs, statistical significance testing of the improvement over the GPT-5.5 baseline, or controls for selection bias in the 2,232-patient cohort. These omissions directly affect the reliability of the central quantitative claim.

Authors: We acknowledge these reporting omissions in the abstract. The manuscript uses a patient-level 70/30 train/test split (1,562/670 patients) with no patient overlap. Results are averaged over 5 independent runs with different seeds, including standard deviations. A paired t-test yields p < 0.01 for the improvement versus the baseline. Selection bias is controlled via stratification on age, sex, and baseline eGFR. We have updated the abstract and expanded the experimental details section to include these elements. revision: yes

Circularity Check

0 steps flagged

No circularity: rollout metrics and loss terms are independently evaluated on held-out data against external baseline

full rationale

The paper describes a recurrent latent transition module trained under a six-term objective (next-observation supervision, next-latent prediction, SIGReg regularisation, and three physiology-aware priors) together with a closed-loop rollout-prefix protocol. Reported MAE/RMSE values are obtained from dynamic-50% history rollout on a held-out 2,232-patient cohort and compared directly to a tuned external GPT-5.5 baseline. No equation, parameter, or performance figure is shown to reduce by construction to a fitted quantity defined from the same data, nor does any load-bearing claim rest on a self-citation chain. The architecture, loss design, and protocol are presented as general and falsifiable on external benchmarks, satisfying the criteria for a self-contained derivation.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The framework rests on standard time-series modeling assumptions plus domain-specific priors for physiological trajectories; relative weights of the six loss terms are not specified and are presumed to be tuned hyperparameters.

free parameters (1)

relative weights of the six-term objective
The objective combines next-observation supervision, next-latent prediction, SIGReg, and three shape priors; balancing coefficients are not reported and must be chosen or fitted.

axioms (1)

domain assumption Patient physiology changes can be usefully regularized by slope, continuity, and large-jump penalties.
These three physiology-aware shape priors are included in the training objective to produce realistic trajectories.

pith-pipeline@v0.9.0 · 5870 in / 1515 out tokens · 73275 ms · 2026-05-22T07:27:38.538013+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

CMWM couples a joint-embedding state encoder with a wide action encoder... recurrent latent transition module under a six-term objective: next-observation supervision, next-latent prediction, SIGReg latent regularisation, and three physiology-aware shape priors (slope, continuity, large-jump penalty). A closed-loop rollout-prefix protocol...
IndisputableMonolith/Foundation/DimensionForcing.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

annual eGFR trajectory forecasting... dynamic-50% history rollout test MAE of 7.384

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · 3 internal anchors

[1]

Stevens, John Griffith, Hocine Tighiouart, Ognjen Djurdjev, David Naimark, Adeera Levin, and Andrew S

Navdeep Tangri, Lesley A. Stevens, John Griffith, Hocine Tighiouart, Ognjen Djurdjev, David Naimark, Adeera Levin, and Andrew S. Levey. A predictive model for progression of chronic kidney disease to kidney failure.JAMA, 305(15):1553–1559, 2011

work page 2011
[2]

RETAIN: An interpretable predictive model for healthcare using reverse time attention mechanism

Edward Choi, Mohammad Taha Bahadori, Jimeng Sun, Joshua Kulas, Andy Schuetz, and Walter Stew- art. RETAIN: An interpretable predictive model for healthcare using reverse time attention mechanism. InAdvances in Neural Information Processing Systems 29 (NeurIPS 2016), pages 3504–3512. Curran Associates, Inc., 2016

work page 2016
[3]

BEHRT: Transformer for electronic health records.Scientific Reports, 10(1):7155, 2020

Yikuan Li, Shishir Rao, José Roberto Ayala Solares, Abdelaali Hassaine, Rema Ramakrishnan, Dex- ter Canoy, Yajie Zhu, Kazem Rahimi, and Gholamreza Salimi-Khorshidi. BEHRT: Transformer for electronic health records.Scientific Reports, 10(1):7155, 2020

work page 2020
[4]

Med-BERT: pretrained contextual- ized embeddings on large-scale structured electronic health records for disease prediction.npj Digital Medicine, 4(1):86, 2021

Laila Rasmy, Yang Xiang, Ziqian Xie, Cui Tao, and Degui Zhi. Med-BERT: pretrained contextual- ized embeddings on large-scale structured electronic health records for disease prediction.npj Digital Medicine, 4(1):86, 2021

work page 2021
[5]

Time-dependent LSTM for survival prediction and patient subtyping in kidney disease trajectory

Pumeng Yu, Wenxin Bao, Hongfei Jiang, Mingyuan Wang, Wei Tan, Mengqi Mao, Tao Wang, and Tianzhao Liu. Time-dependent LSTM for survival prediction and patient subtyping in kidney disease trajectory. medRxiv preprint, doi:10.1101/2024.09.25.24314409,https://doi.org/10.1101/2024.09. 25.24314409, 2024

work page doi:10.1101/2024.09.25.24314409 2024
[6]

Glicks- berg

Daphna Ferro, Liat Yahav-Shafir, Reuven Shamir, Igor Brufman, Eyal Klang, and Benjamin S. Glicks- berg. Transformer-based time-to-event prediction for chronic kidney disease deterioration.Journal of the American Medical Informatics Association, 31(4):980–990, 2024

work page 2024
[7]

Development and validation of a dynamic kidney failure pre- diction model based on deep learning: a real-world study with external validation

Jingying Ma, Jinwei Wang, Lanlan Lu, Yexiang Sun, Mengling Feng, Feifei Zhang, Peng Shen, Zhiqin Jiang, Shenda Hong, and Luxia Zhang. Development and validation of a dynamic kidney failure pre- diction model based on deep learning: a real-world study with external validation. arXiv preprint arXiv:2501.16388,https://arxiv.org/abs/2501.16388, 2025

work page arXiv 2025
[8]

EHRWorld: A patient-centric medical world model for long-horizon clinical trajectories

Linjie Mu, Zhongzhen Huang, Yannian Gu, Shengqian Qin, Shaoting Zhang, and Xiaofan Zhang. EHRWorld: A patient-centric medical world model for long-horizon clinical trajectories. arXiv preprint arXiv:2602.03569,https://arxiv.org/abs/2602.03569, 2026

work page arXiv 2026
[9]

World Models

David Ha and Jürgen Schmidhuber. Recurrent world models facilitate policy evolution. InAdvances in Neural Information Processing Systems 31 (NeurIPS 2018), pages 2455–2467. Curran Associates, Inc., 2018. Extended interactive version: “World Models”, arXiv:1803.10122,https://worldmodels. github.io/

work page internal anchor Pith review Pith/arXiv arXiv 2018
[10]

Learning latent dynamics for planning from pixels

Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, and James Davidson. Learning latent dynamics for planning from pixels. InProceedings of the 36th International Conference on Machine Learning (ICML), volume 97 ofProceedings of Machine Learning Research, pages 2555–2565. PMLR, 2019

work page 2019
[11]

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104,https://arxiv.org/abs/2301.04104, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[12]

A path towards autonomous machine intelligence

Yann LeCun. A path towards autonomous machine intelligence. Position paper, OpenReview Preprint,

work page
[13]

Version 0.9.2, 2022-06-27

work page 2022
[14]

Self-supervised learning from images with a joint-embedding predictive architecture

Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, and Nicolas Ballas. Self-supervised learning from images with a joint-embedding predictive architecture. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15619–15629. IEEE, 2023. 12 ChronoMedicalWorld –...

work page 2023
[15]

LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels

Lucas Maes, Quentin Le Lidec, Damien Scieur, Yann LeCun, and Randall Balestriero. LeWorldModel: Stable end-to-end joint-embedding predictive architecture from pixels. arXiv preprint arXiv:2603.19312, https://arxiv.org/abs/2603.19312, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[16]

Pollard, Sicheng Hao, Benjamin Moody, Brian Gow, Li-wei H

AlistairE.W.Johnson, LucasBulgarelli, LuShen, AlvinGayles, AyadShammout, StevenHorng, TomJ. Pollard, Sicheng Hao, Benjamin Moody, Brian Gow, Li-wei H. Lehman, Leo A. Celi, and Roger G. Mark. MIMIC-IV, a freely accessible electronic health record dataset.Scientific Data, 10(1):1, 2023

work page 2023
[17]

Deep learning prediction models based on EHR trajectories: a systematic review.Journal of Biomedical Informatics, 144:104430, 2023

Ali Amirahmadi, Mattias Ohlsson, and Kobra Etminani. Deep learning prediction models based on EHR trajectories: a systematic review.Journal of Biomedical Informatics, 144:104430, 2023

work page 2023
[18]

Inker, Nwamaka D

Lesley A. Inker, Nwamaka D. Eneanya, Josef Coresh, Hocine Tighiouart, Dan Wang, Yingying Sang, Deidra C. Crews, Alessandro Doria, Michelle M. Estrella, Marc Froissart, Morgan E. Grams, Tom Greene, Anders Grubb, Vilmundur Gudnason, Orlando M. Gutierrez, Roberto Kalil, Amy R. Karger, Michael Mauer, Gerjan Navis, Robert G. Nelson, Emilio D. Poggio, Roger Rod...

work page 2021
[19]

Hiddo J. L. Heerspink, Bergur V. Stefánsson, Ricardo Correa-Rotter, Glenn M. Chertow, Tom Greene, Fan-Fan Hou, Johannes F. E. Mann, John J. V. McMurray, Magnus Lindberg, Peter Rossing, C. David Sjöström, Robert D. Toto, Anna-Maria Langkilde, and David C. Wheeler. Dapagliflozin in patients with chronic kidney disease.New England Journal of Medicine, 383(15...

work page 2020
[20]

Bakris, Rajiv Agarwal, Stefan D

George L. Bakris, Rajiv Agarwal, Stefan D. Anker, Bertram Pitt, Luis M. Ruilope, Peter Rossing, Peter Kolkhof, Christina Nowack, Patrick Schloemer, Amer Joseph, and Gerasimos Filippatos. Effect of finerenone on chronic kidney disease outcomes in type 2 diabetes.New England Journal of Medicine, 383(23):2219–2229, 2020

work page 2020
[21]

KDIGO 2024 clinical practice guideline for the evaluation and management of chronic kidney disease.Kidney International, 105(4S):S117–S314, 2024

Kidney Disease: Improving Global Outcomes (KDIGO) CKD Work Group. KDIGO 2024 clinical practice guideline for the evaluation and management of chronic kidney disease.Kidney International, 105(4S):S117–S314, 2024

work page 2024
[22]

Inker, Hiddo J

Lesley A. Inker, Hiddo J. L. Heerspink, Hocine Tighiouart, Andrew S. Levey, Josef Coresh, Ron T. Gansevoort, Andrew L. Simon, Jian Ying, Gerald J. Beck, Christoph Wanner, Jurgen Floege, Philip K. T. Li, Vlado Perkovic, Edward F. Vonesh, and Tom Greene. GFR slope as a surrogate end point for kidney disease progression in clinical trials: a meta-analysis of...

work page 2019
[23]

Tekade, Padmanabha Subba Rao, Anjaneyulu Sajja, Karthikeya Naidu, Padmanabhan Ramji, Padmavathy Anantha, and Sandeep Karna

Joao Barbieri, Vinay Lala, Aroop Goswami, Rakesh K. Tekade, Padmanabha Subba Rao, Anjaneyulu Sajja, Karthikeya Naidu, Padmanabhan Ramji, Padmavathy Anantha, and Sandeep Karna. A digital twin model incorporating generalized metabolic fluxes to identify and predict chronic kidney disease in type 2 diabetes mellitus.npj Digital Medicine, 7(1):129, 2024

work page 2024
[24]

Automation of the kidney function prediction and classification through ultrasound-based kidney imaging using deep learning.npj Digital Medicine, 2(1):29, 2019

Chin-Chi Kuo, Chun-Min Chang, Kuan-Ting Liu, Wei-Kai Lin, Hsiu-Yin Chiang, Chih-Wei Chung, Meng-Ru Ho, Pei-Ran Sun, Rong-Lin Yang, and Kuan-Ta Chen. Automation of the kidney function prediction and classification through ultrasound-based kidney imaging using deep learning.npj Digital Medicine, 2(1):29, 2019

work page 2019
[25]

Rojas, Angela J

Luis H. Rojas, Angela J. Pereira-Morales, William Amador, Albert Montenegro, Walberto Buelvas, and Víctor de la Espriella. Development and validation of interpretable machine learning models to predict glomerular filtration rate in chronic kidney disease Colombian patients.Annals of Clinical Biochemistry, 62(1):57–66, 2025

work page 2025
[26]

Deep learning algorithms for the prediction of posttransplant renal function in deceased-donor kidney recipients: a preliminary study based on pretransplant biopsy

Yi Luo, Junjie Liang, Xiao Hu, Zuofu Tang, Jinhua Zhang, Lanlan Han, Zhanwen Dong, Wenfeng Deng, Bin Miao, Yong Ren, and Ning Na. Deep learning algorithms for the prediction of posttransplant renal function in deceased-donor kidney recipients: a preliminary study based on pretransplant biopsy. Frontiers in Medicine, 8:676461, 2021. 13 ChronoMedicalWorld –...

work page 2021
[27]

Yulia Rubanova, Ricky T. Q. Chen, and David K. Duvenaud. Latent ODEs for irregularly-sampled time series. InAdvances in Neural Information Processing Systems 32 (NeurIPS 2019), pages 5320–5330. Curran Associates, Inc., 2019

work page 2019
[28]

Satya Narayan Shukla and Benjamin M. Marlin. Multi-time attention networks for irregularly sampled time series. InInternational Conference on Learning Representations (ICLR), 2021

work page 2021
[29]

Alaa, James Jordon, and Mihaela van der Schaar

Ioana Bica, Ahmed M. Alaa, James Jordon, and Mihaela van der Schaar. Estimating counterfactual treatment outcomes over time through adversarially balanced representations. InInternational Confer- ence on Learning Representations (ICLR), 2020

work page 2020
[30]

Continuous-time modeling of counterfactual outcomes using neural controlled differential equations

Nabeel Seedat, Fergus Imrie, Alexis Bellot, Zhaozhi Qian, and Mihaela van der Schaar. Continuous-time modeling of counterfactual outcomes using neural controlled differential equations. InProceedings of the 39th International Conference on Machine Learning (ICML), volume 162 ofProceedings of Machine Learning Research, pages 19497–19521. PMLR, 2022

work page 2022
[31]

New embedding models and API updates

OpenAI. New embedding models and API updates. Technical announcement,https://openai.com/ index/new-embedding-models-and-api-updates/, 2024. Accessed 2026-05-20. 14

work page 2024

[1] [1]

Stevens, John Griffith, Hocine Tighiouart, Ognjen Djurdjev, David Naimark, Adeera Levin, and Andrew S

Navdeep Tangri, Lesley A. Stevens, John Griffith, Hocine Tighiouart, Ognjen Djurdjev, David Naimark, Adeera Levin, and Andrew S. Levey. A predictive model for progression of chronic kidney disease to kidney failure.JAMA, 305(15):1553–1559, 2011

work page 2011

[2] [2]

RETAIN: An interpretable predictive model for healthcare using reverse time attention mechanism

Edward Choi, Mohammad Taha Bahadori, Jimeng Sun, Joshua Kulas, Andy Schuetz, and Walter Stew- art. RETAIN: An interpretable predictive model for healthcare using reverse time attention mechanism. InAdvances in Neural Information Processing Systems 29 (NeurIPS 2016), pages 3504–3512. Curran Associates, Inc., 2016

work page 2016

[3] [3]

BEHRT: Transformer for electronic health records.Scientific Reports, 10(1):7155, 2020

Yikuan Li, Shishir Rao, José Roberto Ayala Solares, Abdelaali Hassaine, Rema Ramakrishnan, Dex- ter Canoy, Yajie Zhu, Kazem Rahimi, and Gholamreza Salimi-Khorshidi. BEHRT: Transformer for electronic health records.Scientific Reports, 10(1):7155, 2020

work page 2020

[4] [4]

Med-BERT: pretrained contextual- ized embeddings on large-scale structured electronic health records for disease prediction.npj Digital Medicine, 4(1):86, 2021

Laila Rasmy, Yang Xiang, Ziqian Xie, Cui Tao, and Degui Zhi. Med-BERT: pretrained contextual- ized embeddings on large-scale structured electronic health records for disease prediction.npj Digital Medicine, 4(1):86, 2021

work page 2021

[5] [5]

Time-dependent LSTM for survival prediction and patient subtyping in kidney disease trajectory

Pumeng Yu, Wenxin Bao, Hongfei Jiang, Mingyuan Wang, Wei Tan, Mengqi Mao, Tao Wang, and Tianzhao Liu. Time-dependent LSTM for survival prediction and patient subtyping in kidney disease trajectory. medRxiv preprint, doi:10.1101/2024.09.25.24314409,https://doi.org/10.1101/2024.09. 25.24314409, 2024

work page doi:10.1101/2024.09.25.24314409 2024

[6] [6]

Glicks- berg

Daphna Ferro, Liat Yahav-Shafir, Reuven Shamir, Igor Brufman, Eyal Klang, and Benjamin S. Glicks- berg. Transformer-based time-to-event prediction for chronic kidney disease deterioration.Journal of the American Medical Informatics Association, 31(4):980–990, 2024

work page 2024

[7] [7]

Development and validation of a dynamic kidney failure pre- diction model based on deep learning: a real-world study with external validation

Jingying Ma, Jinwei Wang, Lanlan Lu, Yexiang Sun, Mengling Feng, Feifei Zhang, Peng Shen, Zhiqin Jiang, Shenda Hong, and Luxia Zhang. Development and validation of a dynamic kidney failure pre- diction model based on deep learning: a real-world study with external validation. arXiv preprint arXiv:2501.16388,https://arxiv.org/abs/2501.16388, 2025

work page arXiv 2025

[8] [8]

EHRWorld: A patient-centric medical world model for long-horizon clinical trajectories

Linjie Mu, Zhongzhen Huang, Yannian Gu, Shengqian Qin, Shaoting Zhang, and Xiaofan Zhang. EHRWorld: A patient-centric medical world model for long-horizon clinical trajectories. arXiv preprint arXiv:2602.03569,https://arxiv.org/abs/2602.03569, 2026

work page arXiv 2026

[9] [9]

World Models

David Ha and Jürgen Schmidhuber. Recurrent world models facilitate policy evolution. InAdvances in Neural Information Processing Systems 31 (NeurIPS 2018), pages 2455–2467. Curran Associates, Inc., 2018. Extended interactive version: “World Models”, arXiv:1803.10122,https://worldmodels. github.io/

work page internal anchor Pith review Pith/arXiv arXiv 2018

[10] [10]

Learning latent dynamics for planning from pixels

Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, and James Davidson. Learning latent dynamics for planning from pixels. InProceedings of the 36th International Conference on Machine Learning (ICML), volume 97 ofProceedings of Machine Learning Research, pages 2555–2565. PMLR, 2019

work page 2019

[11] [11]

Mastering Diverse Domains through World Models

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104,https://arxiv.org/abs/2301.04104, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[12] [12]

A path towards autonomous machine intelligence

Yann LeCun. A path towards autonomous machine intelligence. Position paper, OpenReview Preprint,

work page

[13] [13]

Version 0.9.2, 2022-06-27

work page 2022

[14] [14]

Self-supervised learning from images with a joint-embedding predictive architecture

Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, and Nicolas Ballas. Self-supervised learning from images with a joint-embedding predictive architecture. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15619–15629. IEEE, 2023. 12 ChronoMedicalWorld –...

work page 2023

[15] [15]

LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels

Lucas Maes, Quentin Le Lidec, Damien Scieur, Yann LeCun, and Randall Balestriero. LeWorldModel: Stable end-to-end joint-embedding predictive architecture from pixels. arXiv preprint arXiv:2603.19312, https://arxiv.org/abs/2603.19312, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[16] [16]

Pollard, Sicheng Hao, Benjamin Moody, Brian Gow, Li-wei H

AlistairE.W.Johnson, LucasBulgarelli, LuShen, AlvinGayles, AyadShammout, StevenHorng, TomJ. Pollard, Sicheng Hao, Benjamin Moody, Brian Gow, Li-wei H. Lehman, Leo A. Celi, and Roger G. Mark. MIMIC-IV, a freely accessible electronic health record dataset.Scientific Data, 10(1):1, 2023

work page 2023

[17] [17]

Deep learning prediction models based on EHR trajectories: a systematic review.Journal of Biomedical Informatics, 144:104430, 2023

Ali Amirahmadi, Mattias Ohlsson, and Kobra Etminani. Deep learning prediction models based on EHR trajectories: a systematic review.Journal of Biomedical Informatics, 144:104430, 2023

work page 2023

[18] [18]

Inker, Nwamaka D

Lesley A. Inker, Nwamaka D. Eneanya, Josef Coresh, Hocine Tighiouart, Dan Wang, Yingying Sang, Deidra C. Crews, Alessandro Doria, Michelle M. Estrella, Marc Froissart, Morgan E. Grams, Tom Greene, Anders Grubb, Vilmundur Gudnason, Orlando M. Gutierrez, Roberto Kalil, Amy R. Karger, Michael Mauer, Gerjan Navis, Robert G. Nelson, Emilio D. Poggio, Roger Rod...

work page 2021

[19] [19]

Hiddo J. L. Heerspink, Bergur V. Stefánsson, Ricardo Correa-Rotter, Glenn M. Chertow, Tom Greene, Fan-Fan Hou, Johannes F. E. Mann, John J. V. McMurray, Magnus Lindberg, Peter Rossing, C. David Sjöström, Robert D. Toto, Anna-Maria Langkilde, and David C. Wheeler. Dapagliflozin in patients with chronic kidney disease.New England Journal of Medicine, 383(15...

work page 2020

[20] [20]

Bakris, Rajiv Agarwal, Stefan D

George L. Bakris, Rajiv Agarwal, Stefan D. Anker, Bertram Pitt, Luis M. Ruilope, Peter Rossing, Peter Kolkhof, Christina Nowack, Patrick Schloemer, Amer Joseph, and Gerasimos Filippatos. Effect of finerenone on chronic kidney disease outcomes in type 2 diabetes.New England Journal of Medicine, 383(23):2219–2229, 2020

work page 2020

[21] [21]

KDIGO 2024 clinical practice guideline for the evaluation and management of chronic kidney disease.Kidney International, 105(4S):S117–S314, 2024

Kidney Disease: Improving Global Outcomes (KDIGO) CKD Work Group. KDIGO 2024 clinical practice guideline for the evaluation and management of chronic kidney disease.Kidney International, 105(4S):S117–S314, 2024

work page 2024

[22] [22]

Inker, Hiddo J

Lesley A. Inker, Hiddo J. L. Heerspink, Hocine Tighiouart, Andrew S. Levey, Josef Coresh, Ron T. Gansevoort, Andrew L. Simon, Jian Ying, Gerald J. Beck, Christoph Wanner, Jurgen Floege, Philip K. T. Li, Vlado Perkovic, Edward F. Vonesh, and Tom Greene. GFR slope as a surrogate end point for kidney disease progression in clinical trials: a meta-analysis of...

work page 2019

[23] [23]

Tekade, Padmanabha Subba Rao, Anjaneyulu Sajja, Karthikeya Naidu, Padmanabhan Ramji, Padmavathy Anantha, and Sandeep Karna

Joao Barbieri, Vinay Lala, Aroop Goswami, Rakesh K. Tekade, Padmanabha Subba Rao, Anjaneyulu Sajja, Karthikeya Naidu, Padmanabhan Ramji, Padmavathy Anantha, and Sandeep Karna. A digital twin model incorporating generalized metabolic fluxes to identify and predict chronic kidney disease in type 2 diabetes mellitus.npj Digital Medicine, 7(1):129, 2024

work page 2024

[24] [24]

Automation of the kidney function prediction and classification through ultrasound-based kidney imaging using deep learning.npj Digital Medicine, 2(1):29, 2019

Chin-Chi Kuo, Chun-Min Chang, Kuan-Ting Liu, Wei-Kai Lin, Hsiu-Yin Chiang, Chih-Wei Chung, Meng-Ru Ho, Pei-Ran Sun, Rong-Lin Yang, and Kuan-Ta Chen. Automation of the kidney function prediction and classification through ultrasound-based kidney imaging using deep learning.npj Digital Medicine, 2(1):29, 2019

work page 2019

[25] [25]

Rojas, Angela J

Luis H. Rojas, Angela J. Pereira-Morales, William Amador, Albert Montenegro, Walberto Buelvas, and Víctor de la Espriella. Development and validation of interpretable machine learning models to predict glomerular filtration rate in chronic kidney disease Colombian patients.Annals of Clinical Biochemistry, 62(1):57–66, 2025

work page 2025

[26] [26]

Deep learning algorithms for the prediction of posttransplant renal function in deceased-donor kidney recipients: a preliminary study based on pretransplant biopsy

Yi Luo, Junjie Liang, Xiao Hu, Zuofu Tang, Jinhua Zhang, Lanlan Han, Zhanwen Dong, Wenfeng Deng, Bin Miao, Yong Ren, and Ning Na. Deep learning algorithms for the prediction of posttransplant renal function in deceased-donor kidney recipients: a preliminary study based on pretransplant biopsy. Frontiers in Medicine, 8:676461, 2021. 13 ChronoMedicalWorld –...

work page 2021

[27] [27]

Yulia Rubanova, Ricky T. Q. Chen, and David K. Duvenaud. Latent ODEs for irregularly-sampled time series. InAdvances in Neural Information Processing Systems 32 (NeurIPS 2019), pages 5320–5330. Curran Associates, Inc., 2019

work page 2019

[28] [28]

Satya Narayan Shukla and Benjamin M. Marlin. Multi-time attention networks for irregularly sampled time series. InInternational Conference on Learning Representations (ICLR), 2021

work page 2021

[29] [29]

Alaa, James Jordon, and Mihaela van der Schaar

Ioana Bica, Ahmed M. Alaa, James Jordon, and Mihaela van der Schaar. Estimating counterfactual treatment outcomes over time through adversarially balanced representations. InInternational Confer- ence on Learning Representations (ICLR), 2020

work page 2020

[30] [30]

Continuous-time modeling of counterfactual outcomes using neural controlled differential equations

Nabeel Seedat, Fergus Imrie, Alexis Bellot, Zhaozhi Qian, and Mihaela van der Schaar. Continuous-time modeling of counterfactual outcomes using neural controlled differential equations. InProceedings of the 39th International Conference on Machine Learning (ICML), volume 162 ofProceedings of Machine Learning Research, pages 19497–19521. PMLR, 2022

work page 2022

[31] [31]

New embedding models and API updates

OpenAI. New embedding models and API updates. Technical announcement,https://openai.com/ index/new-embedding-models-and-api-updates/, 2024. Accessed 2026-05-20. 14

work page 2024