Uncertainty-Aware Longitudinal Forecasting of Alzheimer's Disease Progression Using Deep Learning
Pith reviewed 2026-06-25 23:42 UTC · model grok-4.3
The pith
A probabilistic deep learning model generates five-year Alzheimer's trajectories with calibrated uncertainty and outperforms baselines on diagnosis prediction.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Conditioning an autoregressive Mixture Density Network on patient-context vectors from a Temporal Fusion Transformer encoder equipped with a CORAL ordinal layer produces five-year probabilistic trajectories for diagnosis state, CDR Sum of Boxes, MMSE orientation, and hippocampal volume that achieve near-nominal 90 percent credible-interval coverage, widen appropriately across the horizon, remain consistent with known Alzheimer's biomarker dynamics, and yield higher next-visit accuracy than linear, recurrent, and transformer baselines, especially on MCI-versus-dementia discrimination; aleatoric and epistemic uncertainty are separated via analytic mixture variance and a five-member bootstrap e
What carries the argument
Conditioning of an autoregressive Mixture Density Network on patient-context representations learned by a Temporal Fusion Transformer encoder with CORAL ordinal output layer.
If this is right
- Next-visit diagnosis accuracy improves most on the MCI-to-dementia transition relative to linear, recurrent, and transformer baselines.
- Generated trajectories maintain near-nominal 90 percent credible-interval coverage that widens across the five-year horizon.
- Biomarker trajectories generated by the model remain consistent with expected Alzheimer's progression patterns.
- Epistemic uncertainty rises for rare progression archetypes, MCI and dementia patients, and on external data such as OASIS-3.
- Aleatoric uncertainty is obtained directly from mixture variance while epistemic uncertainty is obtained from bootstrap ensemble diversity.
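The last point can be made concrete with the law of total variance. The sketch below is an illustration under assumptions, not the paper's code: it assumes each ensemble member emits a one-dimensional Gaussian mixture for a single target, and the names `mixture_moments` and `decompose_uncertainty` are hypothetical. Aleatoric uncertainty is taken as the average analytic mixture variance across members, and epistemic uncertainty as the variance of the members' mixture means.

```python
import numpy as np

def mixture_moments(weights, means, stds):
    """Mean and variance of a 1-D Gaussian mixture (last axis = components)."""
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    mean = np.sum(weights * means, axis=-1)
    # E[y^2] of a mixture is the weighted sum of (sigma_k^2 + mu_k^2).
    second_moment = np.sum(weights * (stds**2 + means**2), axis=-1)
    return mean, second_moment - mean**2

def decompose_uncertainty(weights, means, stds):
    """Inputs have shape (members, components); returns (aleatoric, epistemic)."""
    member_mean, member_var = mixture_moments(weights, means, stds)
    aleatoric = float(member_var.mean())  # average within-member variance
    epistemic = float(member_mean.var())  # disagreement between member means
    return aleatoric, epistemic
```

Under this decomposition, members that agree on the mean but predict wide mixtures yield high aleatoric and low epistemic uncertainty; members that disagree on the mean yield high epistemic uncertainty regardless of how narrow each mixture is.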
Where Pith is reading between the lines
- Forecasts could support individualized planning by showing families the range of possible five-year outcomes rather than a single most-likely path.
- The same encoder-plus-MDN structure could be tested on other slowly progressing conditions with ordered stages and repeated biomarker measurements.
- High-epistemic-uncertainty cases identified by the bootstrap ensemble could be flagged for closer clinical follow-up or additional data collection.
- Trajectory distributions could be used as inputs to simulation studies that test how candidate interventions would shift the entire forecast envelope.
Load-bearing premise
The patient representations learned by the encoder capture the dynamics needed to produce long-term trajectories that stay consistent with Alzheimer's biomarker changes.
What would settle it
On a new longitudinal cohort the generated 90 percent credible intervals cover the observed diagnosis and biomarker values at a rate below 75 percent, or the trajectories show hippocampal-volume or MMSE changes opposite in direction to established Alzheimer's progression patterns.
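The coverage half of this test is simple to state in code. The following is a minimal sketch, assuming the forecast for each held-out observation is available as a set of sampled values and that equal-tailed intervals are used; the function name `interval_coverage` and the array layout are hypothetical and not taken from the paper.

```python
import numpy as np

def interval_coverage(samples, observed, level=0.90):
    """Fraction of observations inside the equal-tailed credible interval.

    samples:  array of shape (n_draws, n_points), forecast draws per point
    observed: array of shape (n_points,), values actually recorded
    """
    samples = np.asarray(samples, dtype=float)
    observed = np.asarray(observed, dtype=float)
    tail = (1.0 - level) / 2.0
    lower = np.quantile(samples, tail, axis=0)
    upper = np.quantile(samples, 1.0 - tail, axis=0)
    inside = (observed >= lower) & (observed <= upper)
    return float(inside.mean())

def coverage_fails(samples, observed, threshold=0.75):
    """True when empirical 90% coverage falls below the stated threshold."""
    return interval_coverage(samples, observed, level=0.90) < threshold
```

Computing the same quantity separately for each forecast year would show whether coverage degrades with horizon, not just on average.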
Original abstract
Longitudinal modelling of Alzheimer's disease progression is clinically useful only if it can describe not just the most likely next diagnosis, but how a patient may evolve over time and how reliable that forecast is. Most deep learning approaches reduce this problem to single-step classification, treating cognitively normal, mild cognitive impairment, and dementia as flat categories while providing limited insight into how uncertainty accumulates across future visits. We propose a probabilistic framework that combines ordinal diagnosis prediction, multi-horizon trajectory generation, and decomposed uncertainty estimation. A Temporal Fusion Transformer encoder is adapted with a CORAL ordinal output layer, asymmetric loss weighting, and converter oversampling to respect disease-stage ordering and improve sensitivity to MCI-to-dementia transitions. Conditioned on the learned patient-context representation, an autoregressive Mixture Density Network generates five-year probabilistic trajectories for diagnosis state, CDR Sum of Boxes, MMSE orientation, and hippocampal volume. On ADNI, the model outperforms linear, recurrent, and transformer baselines for next-visit diagnosis prediction, with the strongest gains on MCI-versus-dementia discrimination. Generated trajectories achieve near-nominal 90% credible interval coverage, widening uncertainty across the forecast horizon, and biomarker dynamics consistent with expected Alzheimer's disease progression. We further separate aleatoric from epistemic uncertainty using analytic mixture variance and a five-member bootstrap ensemble, which provides the strongest encoder diversity and output-level epistemic signal. Epistemic uncertainty is higher for rare progression archetypes, MCI and dementia patients, and under external evaluation on OASIS-3, where it increases alongside prediction error.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a probabilistic framework for longitudinal Alzheimer's disease forecasting that integrates a Temporal Fusion Transformer encoder (with CORAL ordinal output, asymmetric loss weighting, and converter oversampling) and an autoregressive Mixture Density Network to generate five-year trajectories for diagnosis state, CDR Sum of Boxes, MMSE, and hippocampal volume while decomposing aleatoric and epistemic uncertainty. On the ADNI dataset it claims superior next-visit diagnosis prediction (especially MCI-to-dementia discrimination) over linear, recurrent, and transformer baselines, near-nominal 90% credible-interval coverage that widens with horizon, biomarker dynamics consistent with expected progression, and higher epistemic uncertainty for rare archetypes and under external OASIS-3 evaluation.
Significance. If the performance and coverage claims are substantiated with quantitative results and rigorous evaluation protocols, the work would offer a concrete advance in multi-horizon, uncertainty-aware longitudinal modeling that respects ordinal disease stages and separates uncertainty sources, addressing a recognized gap between single-step classification and clinically useful trajectory forecasting.
major comments (2)
- [Abstract] Abstract: the central claims of outperformance on next-visit diagnosis prediction and near-nominal 90% credible-interval coverage for five-year trajectories are stated without any numerical metrics (AUC, accuracy, coverage percentages), statistical tests, baseline hyper-parameter details, or ablation results, rendering the soundness of the performance assertions unverifiable from the provided text.
- [Abstract / Method] Method and Evaluation (implied in abstract description of autoregressive MDN): the claim that the TFT-derived patient context produces accurate five-year probabilistic trajectories with biomarker-consistent dynamics and nominal coverage does not specify whether scheduled sampling, teacher forcing, or direct multi-horizon training was employed, nor whether coverage was evaluated on actual held-out future visits versus simulated rollouts; this detail is load-bearing for the weakest assumption that autoregressive generation avoids compounding errors over five-year horizons.
minor comments (1)
- [Abstract] The abstract mentions converter oversampling and asymmetric loss weighting, both of which introduce free parameters, but does not indicate their chosen values or any sensitivity analysis.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the two major comments below and have revised the manuscript to improve verifiability of the claims.
- Referee: [Abstract] Abstract: the central claims of outperformance on next-visit diagnosis prediction and near-nominal 90% credible-interval coverage for five-year trajectories are stated without any numerical metrics (AUC, accuracy, coverage percentages), statistical tests, baseline hyper-parameter details, or ablation results, rendering the soundness of the performance assertions unverifiable from the provided text.
  Authors: We agree that the abstract should contain key numerical results to allow readers to assess the claims directly. The revised abstract now reports the AUC for next-visit diagnosis prediction (with emphasis on MCI-to-dementia discrimination), the observed 90% credible-interval coverage at different horizons, and a brief note on the statistical comparisons against baselines.
  Revision: yes
- Referee: [Abstract / Method] Method and Evaluation (implied in abstract description of autoregressive MDN): the claim that the TFT-derived patient context produces accurate five-year probabilistic trajectories with biomarker-consistent dynamics and nominal coverage does not specify whether scheduled sampling, teacher forcing, or direct multi-horizon training was employed, nor whether coverage was evaluated on actual held-out future visits versus simulated rollouts; this detail is load-bearing for the weakest assumption that autoregressive generation avoids compounding errors over five-year horizons.
  Authors: We acknowledge that the original text did not explicitly describe the autoregressive training and rollout protocol. The revised Methods section now states that teacher forcing was used during training with scheduled sampling introduced for longer horizons, and that coverage statistics were computed on actual held-out future visits from the ADNI longitudinal folds (with full five-year trajectories generated via autoregressive rollout only where future observations were unavailable).
  Revision: yes
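The compounding-error concern behind this exchange is easier to see with a toy free-running rollout, in which each sampled value is fed back as the input to the next step. The sketch below is illustrative only: `toy_mdn_step` is a hypothetical stand-in for a trained mixture head with made-up parameters, and nothing here reproduces the authors' model or training protocol.

```python
import numpy as np

def toy_mdn_step(prev_value):
    """Hypothetical stand-in for a trained MDN head (made-up parameters)."""
    weights = np.array([0.7, 0.3])                         # mixture weights
    means = np.array([prev_value + 0.2, prev_value + 1.0])  # slow vs fast change
    stds = np.array([0.3, 0.6])
    return weights, means, stds

def rollout(start_value, horizon, n_paths, rng):
    """Free-running sampling: each draw conditions the next step."""
    paths = np.empty((n_paths, horizon))
    for p in range(n_paths):
        value = start_value
        for t in range(horizon):
            weights, means, stds = toy_mdn_step(value)
            k = rng.choice(len(weights), p=weights)  # pick a component
            value = rng.normal(means[k], stds[k])    # sample from it
            paths[p, t] = value
    return paths

rng = np.random.default_rng(0)
paths = rollout(start_value=1.0, horizon=5, n_paths=2000, rng=rng)
spread_per_year = paths.std(axis=0)  # grows with horizon in this toy setup
```

In this toy, the spread across sampled paths grows with each step because sampling noise accumulates. Whether a trained model's widening intervals reflect calibrated uncertainty or accumulated error is what evaluation against real held-out future visits is meant to distinguish.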
Circularity Check
No circularity: empirical results on held-out data with no self-referential derivations
The paper describes a TFT encoder with CORAL ordinal layer feeding an autoregressive MDN for multi-horizon trajectories, evaluated on ADNI held-out splits for next-visit prediction and coverage metrics. No equations, parameter fits, or self-citations are presented that reduce the reported performance, coverage, or biomarker consistency claims to quantities defined by the model's own fitted inputs or prior author work. The derivation chain consists of standard architectural choices and external data evaluation, remaining self-contained.
Axiom & Free-Parameter Ledger
free parameters (2)
- asymmetric loss weighting
- converter oversampling ratio
axioms (1)
- domain assumption: Cognitive diagnosis stages possess a natural ordinal structure that should be explicitly respected by the output layer.
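One way an output layer can respect this ordering is the CORAL construction: a single shared score per patient is compared against K-1 ordered thresholds, and the predicted stage is the number of thresholds exceeded. The sketch below is a minimal NumPy illustration of that decoding step, assuming three stages and already-trained parameters; the function names and example numbers are hypothetical and not taken from the paper.

```python
import numpy as np

STAGES = ("CN", "MCI", "Dementia")  # ordered diagnosis stages

def coral_probs(features, weight, biases):
    """P(stage > k) for each of the K-1 thresholds, shape (n_patients, K-1)."""
    features = np.asarray(features, dtype=float)
    weight = np.asarray(weight, dtype=float)
    biases = np.asarray(biases, dtype=float)
    score = features @ weight  # one shared score per patient
    return 1.0 / (1.0 + np.exp(-(score[:, None] + biases[None, :])))

def coral_predict(features, weight, biases):
    """Predicted stage index = number of thresholds exceeded."""
    exceeded = coral_probs(features, weight, biases) > 0.5
    return exceeded.sum(axis=1)

# Toy example: one feature, two thresholds with non-increasing biases.
features = [[-2.0], [0.0], [2.0]]
ranks = coral_predict(features, weight=[1.0], biases=[1.0, -1.0])
labels = [STAGES[r] for r in ranks]
```

Because every threshold shares the same score and the biases are non-increasing, the probabilities P(stage > k) cannot increase with k, so the layer never reports that dementia is more likely than "MCI or worse".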