Forecasting Medium-Horizon Alzheimer's Disease Progression: Residual Gap-Aware Transformers for 24-Month CDR-SB Change from ADNI Clinical and Biomarker Histories
Pith reviewed 2026-05-20 23:24 UTC · model grok-4.3
The pith
A residual gap-aware transformer outperforms baselines in predicting 24-month Alzheimer's progression from irregular clinical and biomarker histories.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By anchoring at mild cognitive impairment visits and defining the response as the change in CDR-SB to the closest future visit in an 18-30 month window, the residual gap-aware transformer reduces mean squared error by 13.1 percent and increases prediction-observation correlation by 26.4 percent relative to a Bayesian-information-criterion-selected linear mixed-effects baseline across five participant-level random seeds.
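The two headline numbers are relative changes in test MSE and in prediction-observation correlation against the baseline. A minimal sketch of that arithmetic follows. The toy arrays are hypothetical, not values from the paper, and this page does not state whether the paper averages per-seed relative changes or compares seed-averaged metrics, so the helper below takes two metric values and leaves the aggregation to the caller.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted CDR-SB change."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def relative_change(new, base):
    """Signed relative change of `new` with respect to `base`."""
    return (new - base) / base

# Hypothetical toy data for illustration only.
y_true = [0.0, 1.0, 2.0, 3.0]
baseline_pred = [1.0, 1.0, 2.0, 2.0]
model_pred = [0.5, 1.0, 2.0, 2.5]

mse_change = relative_change(mse(y_true, model_pred), mse(y_true, baseline_pred))
corr_change = relative_change(pearson_r(y_true, model_pred),
                              pearson_r(y_true, baseline_pred))
print(f"MSE change: {mse_change:+.1%}, correlation change: {corr_change:+.1%}")
```

A negative MSE change and a positive correlation change both indicate improvement over the baseline.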
What carries the argument
Residual gap-aware transformer that merges a mixed-effects statistical reference with transformer residual learning, using triplet tokenization for irregular histories and a learned nonnegative time-gap penalty in self-attention.
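The time-gap penalty can be illustrated with the score form quoted from the paper elsewhere on this page: a scaled dot product minus a nonnegative coefficient times the absolute time gap, with the coefficient obtained as a softplus of a free parameter. The sketch below is a single-head, pure-Python illustration under that reading. The function names, the toy inputs, and the use of months as the time unit are assumptions, and this is not the authors' implementation.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x)); always nonnegative."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def gap_aware_scores(q, k, tau, eta):
    """Scores s[a][b] = <q_a, k_b> / sqrt(d) - softplus(eta) * |tau_a - tau_b|."""
    d = len(q[0])
    lam = softplus(eta)  # learned penalty, constrained to be >= 0
    n = len(q)
    return [
        [
            sum(x * y for x, y in zip(q[a], k[b])) / math.sqrt(d)
            - lam * abs(tau[a] - tau[b])
            for b in range(n)
        ]
        for a in range(n)
    ]

def softmax(row):
    """Softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [v / total for v in exps]

# Hypothetical toy tokens: identical queries and keys, so only the gap matters.
q = [[1.0, 0.0] for _ in range(3)]
k = [[1.0, 0.0] for _ in range(3)]
tau = [0.0, 6.0, 24.0]  # observation times, assumed to be in months
weights = [softmax(row) for row in gap_aware_scores(q, k, tau, eta=0.0)]
print(weights[0])
```

With identical content, attention from the first token decays as the time gap grows, which is the intended effect of the penalty.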
Load-bearing premise
The construction of the analytic cohort by anchoring at mild cognitive impairment visits and selecting the closest future visit within an 18-30 month window assumes this provides an unbiased sample for medium-horizon prediction without major selection effects from the irregular observation patterns.
What would settle it
Evaluate on a held-out dataset from a different study with regularly scheduled visits. If the performance gain over the mixed-effects baseline disappears there, the utility of the gap-aware residual component would be falsified.
Original abstract
Medium-horizon Alzheimer's disease progression prediction is difficult because future clinical scores can remain tied to baseline severity, while biomarker histories are irregular and incompletely observed. We develop an anchor-based analysis of 24-month Clinical Dementia Rating Sum of Boxes (CDR-SB) change using harmonized Alzheimer's Disease Neuroimaging Initiative (ADNI) tables. Each labeled sample is anchored at a mild cognitive impairment visit, uses only clinical and biomarker history observed at or before that anchor, and defines the response as CDR-SB at the future visit closest to 24 months within an 18-30 month window minus anchor CDR-SB. The analytic cohort contains 2,600 labeled anchors from 858 participants and 7,276 longitudinal rows. We propose a residual gap-aware transformer that combines a mixed-effects statistical reference with transformer-based residual learning from pre-anchor clinical and biomarker histories. The model uses participant-level random intercepts in the mixed-effects reference, observation-level triplet tokenization for irregular histories, and a learned nonnegative time-gap penalty inside self-attention. We compare the proposed model with a Bayesian-information-criterion-selected linear mixed-effects baseline, GRU-D, and STraTS under repeated participant-level train-test splits. Across five participant-level random seeds, the proposed model achieves the best mean test performance across all reported metrics, reducing MSE by 13.1% and increasing prediction-observation correlation by 26.4% relative to the mixed-effects baseline. It also improves over both GRU-D and STraTS in mean error and correlation. These results show that statistical anchoring and gap-aware residual learning provide a useful structure for medium-horizon Alzheimer's disease progression prediction.
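The labeling rule in the abstract can be sketched in a few lines. The sketch assumes visit times are expressed in months, treats the 18-30 month window as inclusive at both ends, and breaks ties by keeping the first candidate; none of these details is stated on this page. The function names and toy visits are hypothetical.

```python
def history_up_to_anchor(visits, anchor_month):
    """Keep only (month, cdrsb) rows observed at or before the anchor."""
    return [(m, s) for m, s in visits if m <= anchor_month]

def build_target(anchor, visits, target=24.0, lo=18.0, hi=30.0):
    """Return (gap_months, cdrsb_change) for one anchor, or None if unlabeled.

    anchor: (month, cdrsb) of the anchor visit.
    visits: all (month, cdrsb) rows for the same participant.
    """
    anchor_month, anchor_score = anchor
    # Candidate future visits whose gap from the anchor falls inside the window
    # (inclusive bounds are an assumption).
    candidates = [
        (m - anchor_month, s)
        for m, s in visits
        if lo <= m - anchor_month <= hi
    ]
    if not candidates:
        return None  # no in-window visit, so the anchor yields no labeled sample
    gap, score = min(candidates, key=lambda c: abs(c[0] - target))
    return gap, score - anchor_score

# Hypothetical participant with irregular visits: (month, CDR-SB).
visits = [(0.0, 1.5), (6.0, 1.5), (12.0, 2.0), (18.0, 2.5), (26.0, 3.0), (36.0, 4.5)]
print(build_target(visits[0], visits))   # anchor at month 0
print(build_target(visits[4], visits))   # anchor at month 26
```

Anchors that return `None` drop out of the labeled cohort, which is the filtering step the referee's major comment is concerned with.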
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops an anchor-based residual gap-aware transformer for predicting 24-month CDR-SB change from pre-anchor ADNI clinical and biomarker histories. Each sample is anchored at an MCI visit; the target is the change in CDR-SB from the anchor to the future visit closest to 24 months inside an 18-30 month window. The model combines a mixed-effects reference (with participant-level random intercepts) with transformer residual learning that uses triplet tokenization and a learned nonnegative time-gap penalty in self-attention. On 2,600 anchors from 858 participants under five participant-level random splits, the paper reports a 13.1% MSE reduction and a 26.4% correlation increase relative to a BIC-selected linear mixed-effects baseline, with additional gains over GRU-D and STraTS.
Significance. If the reported gains survive corrections for visit-selection effects, the work supplies a concrete, reproducible template for medium-horizon clinical prediction that fuses a statistically interpretable baseline with gap-aware sequence modeling on irregular longitudinal data. The participant-level splitting protocol and multi-seed reporting are positive features.
major comments (1)
- [Abstract / cohort definition] Abstract and cohort-construction paragraph: the analytic cohort is formed by anchoring at MCI visits and retaining only the single closest future visit inside the 18-30 month window. This implicitly conditions on the existence and timing of that visit. If visit density in ADNI correlates with progression rate, the selected pairs over-represent certain trajectories; the 13.1% MSE and 26.4% correlation improvements are then measured on a filtered distribution rather than on a fixed-horizon counterfactual. No sensitivity analysis (fixed 24-month imputation, all visits in window, or inverse-probability weighting) is described, so it is unclear whether the residual-transformer gains are robust to this sampling mechanism.
minor comments (1)
- [Abstract] The abstract states that the model 'uses participant-level random intercepts in the mixed-effects reference' but does not specify whether these intercepts are re-estimated on each train split or carried over from the full cohort; this detail affects the fairness of the baseline comparison.
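One way to make the fairness point concrete is a participant-level split in which every fitted quantity, random intercepts included, is estimated from training participants only. The sketch below shows only the splitting step. The test fraction, function name, and toy identifiers are assumptions, since this page does not report the split proportions.

```python
import random

def participant_split(participant_ids, test_fraction=0.2, seed=0):
    """Split row indices so that no participant appears in both sets.

    participant_ids: one identifier per labeled anchor (repeats expected).
    test_fraction: share of participants held out (an assumed value).
    """
    unique = sorted(set(participant_ids))
    rng = random.Random(seed)       # one seed per repeated split
    rng.shuffle(unique)
    n_test = max(1, round(len(unique) * test_fraction))
    test_ids = set(unique[:n_test])
    train_idx = [i for i, p in enumerate(participant_ids) if p not in test_ids]
    test_idx = [i for i, p in enumerate(participant_ids) if p in test_ids]
    return train_idx, test_idx

# Hypothetical anchors: several anchors can share one participant.
ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5", "p5"]
for seed in range(5):
    train_idx, test_idx = participant_split(ids, seed=seed)
    train_people = {ids[i] for i in train_idx}
    test_people = {ids[i] for i in test_idx}
    print(seed, sorted(test_people), train_people.isdisjoint(test_people))
```

Any reference model would then be fit on `train_idx` rows alone and applied unchanged to `test_idx` rows.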
Simulated Author's Rebuttal
We thank the referee for the thoughtful and detailed review. The concern regarding potential visit-selection effects in the anchored cohort is well taken, and we address it directly below. We agree that additional sensitivity checks will strengthen the manuscript.
Point-by-point responses
Referee: [Abstract / cohort definition] Abstract and cohort-construction paragraph: the analytic cohort is formed by anchoring at MCI visits and retaining only the single closest future visit inside the 18-30 month window. This implicitly conditions on the existence and timing of that visit. If visit density in ADNI correlates with progression rate, the selected pairs over-represent certain trajectories; the 13.1% MSE and 26.4% correlation improvements are then measured on a filtered distribution rather than on a fixed-horizon counterfactual. No sensitivity analysis (fixed 24-month imputation, all visits in window, or inverse-probability weighting) is described, so it is unclear whether the residual-transformer gains are robust to this sampling mechanism.
Authors: We acknowledge that selecting the closest visit within the 18-30 month window conditions on the existence of such a visit and could therefore over-represent trajectories associated with higher visit density. This is a valid methodological point for medium-horizon prediction on observational data. While the anchored design follows common practice for defining clinically relevant 24-month outcomes in ADNI, we agree that robustness to this sampling choice should be demonstrated. In the revised manuscript we will add three sensitivity analyses: (1) inverse-probability weighting based on a model of visit attendance probability, (2) training and evaluation on all visits falling inside the window (with appropriate aggregation), and (3) fixed 24-month targets using last-observation-carried-forward or linear interpolation where data permit. Performance deltas relative to the mixed-effects baseline will be reported under each alternative to confirm that the residual gap-aware transformer gains are not artifacts of the original cohort filter.
Revision: yes
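The first proposed sensitivity analysis, inverse-probability weighting, can be sketched as a weighted test metric. The sketch assumes that a separate attendance model has already produced, for each labeled anchor, an estimated probability of having an in-window visit; that model, the normalized weighting, and the toy numbers are all hypothetical choices rather than the authors' stated procedure.

```python
def ipw_mse(y_true, y_pred, p_observed):
    """Inverse-probability-weighted MSE with normalized weights.

    p_observed: estimated probability that each anchor has an in-window
    follow-up visit (assumed to come from a separate attendance model).
    """
    if any(not (0.0 < p <= 1.0) for p in p_observed):
        raise ValueError("probabilities must lie in (0, 1]")
    weights = [1.0 / p for p in p_observed]
    total = sum(weights)
    return sum(
        w * (t - yhat) ** 2 for w, t, yhat in zip(weights, y_true, y_pred)
    ) / total

# Hypothetical example: the anchor less likely to be observed has the larger error.
y_true = [0.0, 2.0]
y_pred = [0.0, 1.0]
print(ipw_mse(y_true, y_pred, [0.5, 0.5]))  # equal weights: plain MSE
print(ipw_mse(y_true, y_pred, [0.9, 0.3]))  # up-weights the rarely observed anchor
```

Comparing model and baseline under the same weights would show whether the reported gain persists once under-observed trajectories are up-weighted.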
Circularity Check
No circularity: empirical held-out evaluation on explicitly defined target
The paper's central claims rest on supervised training and participant-level held-out evaluation of a residual transformer that predicts a pre-defined target (CDR-SB difference to the closest visit inside the 18-30 month window) from pre-anchor histories. This target definition is a fixed data-construction rule, not a quantity that the model is forced to recover by construction. The mixed-effects reference is a separate fitted baseline whose parameters are not reused as the model's output; performance gains are measured on unseen participants. No self-citation, uniqueness theorem, or ansatz is invoked to justify the architecture or results. The derivation chain is therefore a standard empirical pipeline and remains self-contained against external test data.
Axiom & Free-Parameter Ledger
free parameters (1)
- time-gap penalty coefficient
axioms (1)
- domain assumption: The ADNI dataset provides representative longitudinal observations for MCI patients.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Paper passage: gap-aware self-attention ... $s^{(\ell,h)}_{iab} = \langle q, k \rangle / \sqrt{d_h} - \lambda_{\ell,h} \, |\tau_{ia} - \tau_{ib}|$, where $\lambda_{\ell,h} = \mathrm{softplus}(\eta_{\ell,h}) \ge 0$
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean, J_uniquely_calibrated_via_higher_derivative (tag: unclear)
  Paper passage: residual gap-aware transformer ... mixed-effects statistical reference + transformer residual
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.