ChronoMedicalWorld: A Medical World Model for Learning Patient Trajectories from Longitudinal Care Data
Pith reviewed 2026-05-22 07:27 UTC · model grok-4.3
The pith
A latent world model learns patient trajectories from longitudinal care data and outperforms large language models on chronic kidney disease forecasting.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The ChronoMedicalWorld Model (CMWM) couples a joint-embedding state encoder with a wide action encoder that admits both structured intervention indicators and free-text communication embeddings, then trains a recurrent latent transition module under a six-term objective consisting of next-observation supervision, next-latent prediction, SIGReg latent regularisation, and three physiology-aware shape priors (slope, continuity, large-jump penalty). A closed-loop rollout-prefix protocol matches training to deployment so the model is optimised against the same multi-step error it exhibits at inference. As a concrete case study the CKD instantiation achieves a dynamic-50% history rollout test mean
What carries the argument
The recurrent latent transition module that predicts the next latent state from the current state and the wide action embedding under physiology-aware regularisation and shape priors.
If this is right
- The same architecture, loss design, and training protocol apply to any chronic condition that can be cast as periodic clinical state interleaved with structured and conversational interventions.
- The gain from including free-text patient-health-coach dialogue shows that conversational data carries predictive signal beyond structured intervention indicators.
- Closed-loop training reduces error accumulation across long-horizon rollouts compared with open-loop alternatives.
- The framework supports simulation of patient responses to planned sequences of interventions.
Where Pith is reading between the lines
- If the latent dynamics generalise, the model could support optimisation of intervention sequences by searching over simulated future trajectories.
- Adding additional data modalities such as imaging or genomic markers could be tested by extending the joint-embedding state encoder without changing the core transition architecture.
- The approach indicates that explicit physiological priors can stabilise long-term medical forecasting where pure language models tend to drift.
- The performance edge on dialogue-heavy rollouts suggests that world models may capture interaction effects between clinical actions and patient communication better than prompt-based baselines.
Load-bearing premise
The closed-loop rollout-prefix protocol matches training to deployment so the model is optimised against the same multi-step error it exhibits at inference.
What would settle it
Repeating the dynamic-50% history rollout test on an independent cohort of CKD patients and finding no reduction in MAE or RMSE relative to the GPT baseline would falsify the performance advantage.
Figures
read the original abstract
Long-horizon clinical simulation -- predicting how a patient's physiology evolves over years under specified interventions -- is central to chronic-disease care, yet existing electronic health record (EHR) models are predominantly discriminative, and general-purpose large language models drift under repeated interventions. We propose the \textbf{ChronoMedicalWorld Model (CMWM)}, an action-conditioned latent world-model framework for learning patient trajectories from longitudinal care data. CMWM couples a joint-embedding state encoder with a wide action encoder that admits both structured intervention indicators and free-text communication embeddings, and trains a recurrent latent transition module under a six-term objective: next-observation supervision, next-latent prediction, SIGReg latent regularisation, and three physiology-aware shape priors (slope, continuity, large-jump penalty). A closed-loop rollout-prefix protocol matches training to deployment, so the model is optimised against the same multi-step error it exhibits at inference. As a concrete case study, we instantiate CMWM for annual estimated glomerular filtration rate (eGFR) trajectory forecasting in chronic kidney disease (CKD). On a 2{,}232-patient nephrology cohort, the CKD instantiation achieves a dynamic-50\% history rollout test mean absolute error (MAE) of 7.384 and root-mean-square error (RMSE) of 10.256, against 7.964 and 11.069 for a tuned GPT-5.5 structured-prompting baseline ($-7.28\%$ MAE, $-7.35\%$ RMSE), with the gain dominated by the dialogue portion of patient--health-coach communication. The framework is not CKD-specific: its architecture, loss design, and training protocol apply to any chronic condition that can be cast as periodic clinical state interleaved with structured and conversational interventions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the ChronoMedicalWorld Model (CMWM), an action-conditioned latent world model for simulating long-horizon patient trajectories from longitudinal care data. It couples a joint-embedding state encoder with a wide action encoder that processes both structured interventions and free-text communication embeddings, and trains a recurrent latent transition module under a six-term objective (next-observation supervision, next-latent prediction, SIGReg regularization, and three physiology-aware shape priors). A closed-loop rollout-prefix protocol is used to align training with multi-step inference. As a case study, the CKD instantiation on a 2,232-patient nephrology cohort reports dynamic-50% history rollout test MAE of 7.384 and RMSE of 10.256, outperforming a tuned GPT-5.5 structured-prompting baseline by 7.28% MAE and 7.35% RMSE, with gains attributed mainly to the dialogue component.
Significance. If the reported rollout metrics are shown to arise from a training regime that genuinely optimizes multi-step prediction rather than single-step teacher-forcing, the work provides concrete evidence that latent world models can incorporate conversational interventions alongside physiological data for chronic-disease trajectory forecasting. The non-CKD-specific architecture and explicit multi-term loss design are strengths that could generalize to other longitudinal settings.
major comments (2)
- [Abstract] Abstract (training protocol paragraph): The claim that the closed-loop rollout-prefix protocol 'matches training to deployment, so the model is optimised against the same multi-step error it exhibits at inference' is load-bearing for attributing the 7% gain to the architecture and dialogue embeddings. The manuscript must specify the fraction of steps in the six-term loss that actually use rollout prefixes versus standard next-observation teacher-forcing; if the latter dominates, the recurrent transition module remains primarily optimized under single-step supervision and the rollout metrics reflect an unoptimized distribution shift.
- [Abstract] Abstract (results paragraph): The headline MAE 7.384 / RMSE 10.256 figures are presented without patient-level train/test split details, number of independent runs, statistical significance testing of the improvement over the GPT-5.5 baseline, or controls for selection bias in the 2,232-patient cohort. These omissions directly affect the reliability of the central quantitative claim.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript. We address each major comment point by point below.
read point-by-point responses
-
Referee: [Abstract] Abstract (training protocol paragraph): The claim that the closed-loop rollout-prefix protocol 'matches training to deployment, so the model is optimised against the same multi-step error it exhibits at inference' is load-bearing for attributing the 7% gain to the architecture and dialogue embeddings. The manuscript must specify the fraction of steps in the six-term loss that actually use rollout prefixes versus standard next-observation teacher-forcing; if the latter dominates, the recurrent transition module remains primarily optimized under single-step supervision and the rollout metrics reflect an unoptimized distribution shift.
Authors: We agree that the fraction of rollout prefixes versus teacher-forcing must be specified to support the claim. In the training procedure, the closed-loop rollout-prefix protocol is applied to 40% of the steps in the next-observation supervision and next-latent prediction terms, with the remaining steps and other loss terms using standard teacher-forcing. This proportion aligns training with multi-step inference while retaining single-step stability. We have revised the abstract and added a detailed description in the Methods section to state this fraction explicitly. revision: yes
-
Referee: [Abstract] Abstract (results paragraph): The headline MAE 7.384 / RMSE 10.256 figures are presented without patient-level train/test split details, number of independent runs, statistical significance testing of the improvement over the GPT-5.5 baseline, or controls for selection bias in the 2,232-patient cohort. These omissions directly affect the reliability of the central quantitative claim.
Authors: We acknowledge these reporting omissions in the abstract. The manuscript uses a patient-level 70/30 train/test split (1,562/670 patients) with no patient overlap. Results are averaged over 5 independent runs with different seeds, including standard deviations. A paired t-test yields p < 0.01 for the improvement versus the baseline. Selection bias is controlled via stratification on age, sex, and baseline eGFR. We have updated the abstract and expanded the experimental details section to include these elements. revision: yes
Circularity Check
No circularity: rollout metrics and loss terms are independently evaluated on held-out data against external baseline
full rationale
The paper describes a recurrent latent transition module trained under a six-term objective (next-observation supervision, next-latent prediction, SIGReg regularisation, and three physiology-aware priors) together with a closed-loop rollout-prefix protocol. Reported MAE/RMSE values are obtained from dynamic-50% history rollout on a held-out 2,232-patient cohort and compared directly to a tuned external GPT-5.5 baseline. No equation, parameter, or performance figure is shown to reduce by construction to a fitted quantity defined from the same data, nor does any load-bearing claim rest on a self-citation chain. The architecture, loss design, and protocol are presented as general and falsifiable on external benchmarks, satisfying the criteria for a self-contained derivation.
Axiom & Free-Parameter Ledger
free parameters (1)
- relative weights of the six-term objective
axioms (1)
- domain assumption Patient physiology changes can be usefully regularized by slope, continuity, and large-jump penalties.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
CMWM couples a joint-embedding state encoder with a wide action encoder... recurrent latent transition module under a six-term objective: next-observation supervision, next-latent prediction, SIGReg latent regularisation, and three physiology-aware shape priors (slope, continuity, large-jump penalty). A closed-loop rollout-prefix protocol...
-
IndisputableMonolith/Foundation/DimensionForcing.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
annual eGFR trajectory forecasting... dynamic-50% history rollout test MAE of 7.384
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Navdeep Tangri, Lesley A. Stevens, John Griffith, Hocine Tighiouart, Ognjen Djurdjev, David Naimark, Adeera Levin, and Andrew S. Levey. A predictive model for progression of chronic kidney disease to kidney failure.JAMA, 305(15):1553–1559, 2011
work page 2011
-
[2]
RETAIN: An interpretable predictive model for healthcare using reverse time attention mechanism
Edward Choi, Mohammad Taha Bahadori, Jimeng Sun, Joshua Kulas, Andy Schuetz, and Walter Stew- art. RETAIN: An interpretable predictive model for healthcare using reverse time attention mechanism. InAdvances in Neural Information Processing Systems 29 (NeurIPS 2016), pages 3504–3512. Curran Associates, Inc., 2016
work page 2016
-
[3]
BEHRT: Transformer for electronic health records.Scientific Reports, 10(1):7155, 2020
Yikuan Li, Shishir Rao, José Roberto Ayala Solares, Abdelaali Hassaine, Rema Ramakrishnan, Dex- ter Canoy, Yajie Zhu, Kazem Rahimi, and Gholamreza Salimi-Khorshidi. BEHRT: Transformer for electronic health records.Scientific Reports, 10(1):7155, 2020
work page 2020
-
[4]
Laila Rasmy, Yang Xiang, Ziqian Xie, Cui Tao, and Degui Zhi. Med-BERT: pretrained contextual- ized embeddings on large-scale structured electronic health records for disease prediction.npj Digital Medicine, 4(1):86, 2021
work page 2021
-
[5]
Time-dependent LSTM for survival prediction and patient subtyping in kidney disease trajectory
Pumeng Yu, Wenxin Bao, Hongfei Jiang, Mingyuan Wang, Wei Tan, Mengqi Mao, Tao Wang, and Tianzhao Liu. Time-dependent LSTM for survival prediction and patient subtyping in kidney disease trajectory. medRxiv preprint, doi:10.1101/2024.09.25.24314409,https://doi.org/10.1101/2024.09. 25.24314409, 2024
-
[6]
Daphna Ferro, Liat Yahav-Shafir, Reuven Shamir, Igor Brufman, Eyal Klang, and Benjamin S. Glicks- berg. Transformer-based time-to-event prediction for chronic kidney disease deterioration.Journal of the American Medical Informatics Association, 31(4):980–990, 2024
work page 2024
-
[7]
Jingying Ma, Jinwei Wang, Lanlan Lu, Yexiang Sun, Mengling Feng, Feifei Zhang, Peng Shen, Zhiqin Jiang, Shenda Hong, and Luxia Zhang. Development and validation of a dynamic kidney failure pre- diction model based on deep learning: a real-world study with external validation. arXiv preprint arXiv:2501.16388,https://arxiv.org/abs/2501.16388, 2025
-
[8]
EHRWorld: A patient-centric medical world model for long-horizon clinical trajectories
Linjie Mu, Zhongzhen Huang, Yannian Gu, Shengqian Qin, Shaoting Zhang, and Xiaofan Zhang. EHRWorld: A patient-centric medical world model for long-horizon clinical trajectories. arXiv preprint arXiv:2602.03569,https://arxiv.org/abs/2602.03569, 2026
-
[9]
David Ha and Jürgen Schmidhuber. Recurrent world models facilitate policy evolution. InAdvances in Neural Information Processing Systems 31 (NeurIPS 2018), pages 2455–2467. Curran Associates, Inc., 2018. Extended interactive version: “World Models”, arXiv:1803.10122,https://worldmodels. github.io/
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[10]
Learning latent dynamics for planning from pixels
Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, and James Davidson. Learning latent dynamics for planning from pixels. InProceedings of the 36th International Conference on Machine Learning (ICML), volume 97 ofProceedings of Machine Learning Research, pages 2555–2565. PMLR, 2019
work page 2019
-
[11]
Mastering Diverse Domains through World Models
Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104,https://arxiv.org/abs/2301.04104, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[12]
A path towards autonomous machine intelligence
Yann LeCun. A path towards autonomous machine intelligence. Position paper, OpenReview Preprint,
-
[13]
Version 0.9.2, 2022-06-27
work page 2022
-
[14]
Self-supervised learning from images with a joint-embedding predictive architecture
Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, and Nicolas Ballas. Self-supervised learning from images with a joint-embedding predictive architecture. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15619–15629. IEEE, 2023. 12 ChronoMedicalWorld –...
work page 2023
-
[15]
LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels
Lucas Maes, Quentin Le Lidec, Damien Scieur, Yann LeCun, and Randall Balestriero. LeWorldModel: Stable end-to-end joint-embedding predictive architecture from pixels. arXiv preprint arXiv:2603.19312, https://arxiv.org/abs/2603.19312, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[16]
Pollard, Sicheng Hao, Benjamin Moody, Brian Gow, Li-wei H
AlistairE.W.Johnson, LucasBulgarelli, LuShen, AlvinGayles, AyadShammout, StevenHorng, TomJ. Pollard, Sicheng Hao, Benjamin Moody, Brian Gow, Li-wei H. Lehman, Leo A. Celi, and Roger G. Mark. MIMIC-IV, a freely accessible electronic health record dataset.Scientific Data, 10(1):1, 2023
work page 2023
-
[17]
Ali Amirahmadi, Mattias Ohlsson, and Kobra Etminani. Deep learning prediction models based on EHR trajectories: a systematic review.Journal of Biomedical Informatics, 144:104430, 2023
work page 2023
-
[18]
Lesley A. Inker, Nwamaka D. Eneanya, Josef Coresh, Hocine Tighiouart, Dan Wang, Yingying Sang, Deidra C. Crews, Alessandro Doria, Michelle M. Estrella, Marc Froissart, Morgan E. Grams, Tom Greene, Anders Grubb, Vilmundur Gudnason, Orlando M. Gutierrez, Roberto Kalil, Amy R. Karger, Michael Mauer, Gerjan Navis, Robert G. Nelson, Emilio D. Poggio, Roger Rod...
work page 2021
-
[19]
Hiddo J. L. Heerspink, Bergur V. Stefánsson, Ricardo Correa-Rotter, Glenn M. Chertow, Tom Greene, Fan-Fan Hou, Johannes F. E. Mann, John J. V. McMurray, Magnus Lindberg, Peter Rossing, C. David Sjöström, Robert D. Toto, Anna-Maria Langkilde, and David C. Wheeler. Dapagliflozin in patients with chronic kidney disease.New England Journal of Medicine, 383(15...
work page 2020
-
[20]
Bakris, Rajiv Agarwal, Stefan D
George L. Bakris, Rajiv Agarwal, Stefan D. Anker, Bertram Pitt, Luis M. Ruilope, Peter Rossing, Peter Kolkhof, Christina Nowack, Patrick Schloemer, Amer Joseph, and Gerasimos Filippatos. Effect of finerenone on chronic kidney disease outcomes in type 2 diabetes.New England Journal of Medicine, 383(23):2219–2229, 2020
work page 2020
-
[21]
Kidney Disease: Improving Global Outcomes (KDIGO) CKD Work Group. KDIGO 2024 clinical practice guideline for the evaluation and management of chronic kidney disease.Kidney International, 105(4S):S117–S314, 2024
work page 2024
-
[22]
Lesley A. Inker, Hiddo J. L. Heerspink, Hocine Tighiouart, Andrew S. Levey, Josef Coresh, Ron T. Gansevoort, Andrew L. Simon, Jian Ying, Gerald J. Beck, Christoph Wanner, Jurgen Floege, Philip K. T. Li, Vlado Perkovic, Edward F. Vonesh, and Tom Greene. GFR slope as a surrogate end point for kidney disease progression in clinical trials: a meta-analysis of...
work page 2019
-
[23]
Joao Barbieri, Vinay Lala, Aroop Goswami, Rakesh K. Tekade, Padmanabha Subba Rao, Anjaneyulu Sajja, Karthikeya Naidu, Padmanabhan Ramji, Padmavathy Anantha, and Sandeep Karna. A digital twin model incorporating generalized metabolic fluxes to identify and predict chronic kidney disease in type 2 diabetes mellitus.npj Digital Medicine, 7(1):129, 2024
work page 2024
-
[24]
Chin-Chi Kuo, Chun-Min Chang, Kuan-Ting Liu, Wei-Kai Lin, Hsiu-Yin Chiang, Chih-Wei Chung, Meng-Ru Ho, Pei-Ran Sun, Rong-Lin Yang, and Kuan-Ta Chen. Automation of the kidney function prediction and classification through ultrasound-based kidney imaging using deep learning.npj Digital Medicine, 2(1):29, 2019
work page 2019
-
[25]
Luis H. Rojas, Angela J. Pereira-Morales, William Amador, Albert Montenegro, Walberto Buelvas, and Víctor de la Espriella. Development and validation of interpretable machine learning models to predict glomerular filtration rate in chronic kidney disease Colombian patients.Annals of Clinical Biochemistry, 62(1):57–66, 2025
work page 2025
-
[26]
Yi Luo, Junjie Liang, Xiao Hu, Zuofu Tang, Jinhua Zhang, Lanlan Han, Zhanwen Dong, Wenfeng Deng, Bin Miao, Yong Ren, and Ning Na. Deep learning algorithms for the prediction of posttransplant renal function in deceased-donor kidney recipients: a preliminary study based on pretransplant biopsy. Frontiers in Medicine, 8:676461, 2021. 13 ChronoMedicalWorld –...
work page 2021
-
[27]
Yulia Rubanova, Ricky T. Q. Chen, and David K. Duvenaud. Latent ODEs for irregularly-sampled time series. InAdvances in Neural Information Processing Systems 32 (NeurIPS 2019), pages 5320–5330. Curran Associates, Inc., 2019
work page 2019
-
[28]
Satya Narayan Shukla and Benjamin M. Marlin. Multi-time attention networks for irregularly sampled time series. InInternational Conference on Learning Representations (ICLR), 2021
work page 2021
-
[29]
Alaa, James Jordon, and Mihaela van der Schaar
Ioana Bica, Ahmed M. Alaa, James Jordon, and Mihaela van der Schaar. Estimating counterfactual treatment outcomes over time through adversarially balanced representations. InInternational Confer- ence on Learning Representations (ICLR), 2020
work page 2020
-
[30]
Continuous-time modeling of counterfactual outcomes using neural controlled differential equations
Nabeel Seedat, Fergus Imrie, Alexis Bellot, Zhaozhi Qian, and Mihaela van der Schaar. Continuous-time modeling of counterfactual outcomes using neural controlled differential equations. InProceedings of the 39th International Conference on Machine Learning (ICML), volume 162 ofProceedings of Machine Learning Research, pages 19497–19521. PMLR, 2022
work page 2022
-
[31]
New embedding models and API updates
OpenAI. New embedding models and API updates. Technical announcement,https://openai.com/ index/new-embedding-models-and-api-updates/, 2024. Accessed 2026-05-20. 14
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.