arxiv: 2605.16319 · v1 · pith:OS2Z2H4Fnew · submitted 2026-05-04 · 💻 cs.LG · stat.AP · stat.ML

Forecasting Medium-Horizon Alzheimer's Disease Progression: Residual Gap-Aware Transformers for 24-Month CDR-SB Change from ADNI Clinical and Biomarker Histories

Pith reviewed 2026-05-20 23:24 UTC · model grok-4.3

classification 💻 cs.LG · stat.AP · stat.ML
keywords Alzheimer disease · disease progression · transformer · mixed effects model · biomarker · CDR-SB · prediction model · longitudinal data

The pith

A residual gap-aware transformer outperforms baselines in predicting 24-month Alzheimer's progression from irregular clinical and biomarker histories.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes an anchor-based method for medium-horizon prediction of Alzheimer's progression by fixing the start at mild cognitive impairment visits and measuring CDR-SB change at the nearest visit around 24 months later. It introduces a residual gap-aware transformer that uses a mixed-effects model as a base reference and then applies transformer layers to learn corrections from the irregular pre-anchor clinical and biomarker data. The transformer incorporates special tokenization for each observation and a penalty for time gaps to handle missing or uneven visits. This setup leads to lower error and higher correlation with actual changes than a standard mixed-effects model or other deep learning approaches like GRU-D and STraTS. If correct, it suggests that combining statistical references with attention mechanisms tuned for irregular data can improve forecasts in settings with incomplete longitudinal records.
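The residual idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and it assumes a two-stage setup in which the reference predictions are computed first and the correction model is trained on what remains. The source does not say whether the two parts are fitted jointly or in sequence.

```python
def residual_targets(observed_change, reference_pred):
    """Targets for the correction model: observed change minus the reference."""
    return [y - r for y, r in zip(observed_change, reference_pred)]


def final_prediction(reference_pred, correction):
    """Add the learned correction back onto the statistical reference."""
    return [r + c for r, c in zip(reference_pred, correction)]


def mse(y_true, y_pred):
    """Mean squared error over paired observations and predictions."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative placeholder values, not taken from the paper.
observed = [1.0, 0.5, 2.0]
reference = [0.5, 0.5, 1.0]
residuals = residual_targets(observed, reference)

# A correction of zero reproduces the reference prediction exactly.
baseline_only = final_prediction(reference, [0.0] * len(reference))
```

Under this framing, a correction model that outputs zero everywhere falls back to the statistical reference, so any gain has to come from structure in the residuals.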

Core claim

By anchoring at mild cognitive impairment visits and defining the response as the change in CDR-SB to the closest future visit in an 18-30 month window, the residual gap-aware transformer reduces mean squared error by 13.1 percent and increases prediction-observation correlation by 26.4 percent relative to a Bayesian-information-criterion-selected linear mixed-effects baseline across five participant-level random seeds.
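The labeling rule in this claim can be sketched as follows. The function name `label_anchor`, the `(month, cdr_sb)` tuple layout, the inclusive window bounds, and the tie-breaking rule (the earlier visit wins) are assumptions made for illustration; the source does not specify them.

```python
def label_anchor(anchor_month, anchor_cdrsb, visits, target=24.0, lo=18.0, hi=30.0):
    """Return the CDR-SB change at the visit closest to `target` months after the anchor.

    visits: iterable of (month, cdr_sb) pairs for one participant.
    Returns None when no visit falls inside the [lo, hi] window.
    """
    candidates = [
        (abs((month - anchor_month) - target), month, score)
        for month, score in visits
        if lo <= (month - anchor_month) <= hi
    ]
    if not candidates:
        return None  # the anchor stays unlabeled and drops out of the cohort
    # Smallest distance to the target wins; ties go to the earlier visit (assumption).
    _, _, score = min(candidates)
    return score - anchor_cdrsb
```

Anchors for which the function returns `None` never enter the labeled cohort, which is the filtering step the referee report questions.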

What carries the argument

Residual gap-aware transformer that merges a mixed-effects statistical reference with transformer residual learning, using triplet tokenization for irregular histories and a learned nonnegative time-gap penalty in self-attention.
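One plausible form of a time-gap penalty in self-attention is sketched below. The source only says the penalty is learned and nonnegative; the linear penalty in the absolute time gap, the softplus reparameterization, and all names here are assumptions, not the paper's stated formulation.

```python
import math


def softplus(x):
    """Map any real number to a positive value, keeping the penalty nonnegative."""
    return math.log1p(math.exp(x))


def gap_aware_attention(queries, keys, times, raw_penalty):
    """Row-normalized attention weights with a time-gap penalty on the logits.

    queries, keys: lists of equal-length float vectors, one per token.
    times: observation time of each token (for example, months).
    raw_penalty: unconstrained scalar; softplus turns it into the penalty rate.
    """
    dim = len(queries[0])
    rate = softplus(raw_penalty)
    weights = []
    for q, t_q in zip(queries, times):
        logits = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) - rate * abs(t_q - t_k)
            for k, t_k in zip(keys, times)
        ]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

With identical queries and keys, tokens farther apart in time receive less weight, and driving `raw_penalty` toward large negative values recovers ordinary scaled dot-product attention.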

Load-bearing premise

Constructing the analytic cohort by anchoring at mild cognitive impairment visits and selecting the closest future visit within an 18-30 month window is assumed to yield an unbiased sample for medium-horizon prediction, with no major selection effects arising from the irregular observation patterns.

What would settle it

If the performance gain over the mixed-effects baseline disappeared on a held-out dataset from a different study with regularly scheduled visits, that would falsify the utility of the gap-aware residual component.

Figures

Figures reproduced from arXiv: 2605.16319 by Lanruo Wang, Ran Tong, Tong Wang, Yin Ni.

Figure 1: Overview of the proposed residual gap-aware transformer for predicting 24-month CDR…
Figure 2: Distribution of 24-month CDR-SB change in the primary analysis cohort. The distribution…
Figure 3: Main performance comparison for the 24-month CDR-SB-change task. Bars show repeated…
Figure 4: Repeated-seed stability across participant-level splits. Each panel shows test performance…
Original abstract

Medium-horizon Alzheimer's disease progression prediction is difficult because future clinical scores can remain tied to baseline severity, while biomarker histories are irregular and incompletely observed. We develop an anchor-based analysis of 24-month Clinical Dementia Rating Sum of Boxes (CDR-SB) change using harmonized Alzheimer's Disease Neuroimaging Initiative (ADNI) tables. Each labeled sample is anchored at a mild cognitive impairment visit, uses only clinical and biomarker history observed at or before that anchor, and defines the response as CDR-SB at the future visit closest to 24 months within an 18-30 month window minus anchor CDR-SB. The analytic cohort contains 2,600 labeled anchors from 858 participants and 7,276 longitudinal rows. We propose a residual gap-aware transformer that combines a mixed-effects statistical reference with transformer-based residual learning from pre-anchor clinical and biomarker histories. The model uses participant-level random intercepts in the mixed-effects reference, observation-level triplet tokenization for irregular histories, and a learned nonnegative time-gap penalty inside self-attention. We compare the proposed model with a Bayesian-information-criterion-selected linear mixed-effects baseline, GRU-D, and STraTS under repeated participant-level train-test splits. Across five participant-level random seeds, the proposed model achieves the best mean test performance across all reported metrics, reducing MSE by 13.1% and increasing prediction-observation correlation by 26.4% relative to the mixed-effects baseline. It also improves over both GRU-D and STraTS in mean error and correlation. These results show that statistical anchoring and gap-aware residual learning provide a useful structure for medium-horizon Alzheimer's disease progression prediction.
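The abstract's "observation-level triplet tokenization" restricted to pre-anchor history could look roughly like the sketch below. It assumes each token is a (time, variable, value) triplet, in the style of STraTS; the dictionary layout, the column names `CDRSB` and `ABETA`, and the function name are hypothetical.

```python
def to_triplets(visits, anchor_month):
    """Flatten pre-anchor visits into (months_before_anchor, variable, value) triplets.

    visits: list of dicts such as {"month": 0, "CDRSB": 1.0, "ABETA": None}.
    Missing values (None) produce no token, so nothing has to be imputed.
    """
    triplets = []
    for visit in visits:
        month = visit["month"]
        if month > anchor_month:
            continue  # only history at or before the anchor may be used
        for name, value in visit.items():
            if name == "month" or value is None:
                continue
            triplets.append((anchor_month - month, name, float(value)))
    return triplets


# Illustrative placeholder history, not ADNI data.
history = [
    {"month": 0, "CDRSB": 1.0, "ABETA": None},
    {"month": 12, "CDRSB": 1.5, "ABETA": 800.0},
    {"month": 18, "CDRSB": 2.0},
]
tokens = to_triplets(history, anchor_month=12)
```

Because each observed measurement becomes its own token, visits with different sets of measured biomarkers yield sequences of different lengths rather than a dense table with gaps.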

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper develops an anchor-based residual gap-aware transformer for predicting 24-month CDR-SB change from pre-anchor ADNI clinical and biomarker histories. Each sample is anchored at an MCI visit; the target is CDR-SB at the closest visit inside an 18-30 month window. The model combines a mixed-effects reference (with participant-level random intercepts) and transformer residual learning that uses triplet tokenization and a learned nonnegative time-gap penalty. On 2,600 anchors from 858 participants under five participant-level random splits, the model reports a 13.1% MSE reduction and 26.4% correlation increase relative to a BIC-selected linear mixed-effects baseline, with additional gains over GRU-D and STraTS.

Significance. If the reported gains survive corrections for visit-selection effects, the work supplies a concrete, reproducible template for medium-horizon clinical prediction that fuses a statistically interpretable baseline with gap-aware sequence modeling on irregular longitudinal data. The participant-level splitting protocol and multi-seed reporting are positive features.

major comments (1)
  1. [Abstract / cohort definition] Abstract and cohort-construction paragraph: the analytic cohort is formed by anchoring at MCI visits and retaining only the single closest future visit inside the 18-30 month window. This implicitly conditions on the existence and timing of that visit. If visit density in ADNI correlates with progression rate, the selected pairs over-represent certain trajectories; the 13.1% MSE and 26.4% correlation improvements are then measured on a filtered distribution rather than on a fixed-horizon counterfactual. No sensitivity analysis (fixed 24-month imputation, all visits in window, or inverse-probability weighting) is described, so it is unclear whether the residual-transformer gains are robust to this sampling mechanism.
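Of the sensitivity analyses named in this comment, inverse-probability weighting can be sketched as a weighted error metric. The sketch assumes that a separate model has already produced, for each labeled anchor, the probability that a visit inside the window was observed; `p_observed`, the clipping `floor`, and the function name are hypothetical.

```python
def ipw_mse(y_true, y_pred, p_observed, floor=0.05):
    """Weighted MSE in which each labeled anchor counts 1 / P(window visit observed).

    Anchors that were unlikely to be observed get more weight, so they stand in
    for similar anchors that dropped out of the cohort.
    """
    weighted_error = 0.0
    total_weight = 0.0
    for y, y_hat, p in zip(y_true, y_pred, p_observed):
        weight = 1.0 / max(p, floor)  # the floor limits the influence of tiny probabilities
        weighted_error += weight * (y - y_hat) ** 2
        total_weight += weight
    return weighted_error / total_weight
```

When all probabilities are equal, the result matches the ordinary MSE, so a gap between the weighted and unweighted values indicates how sensitive the reported error is to the visit-selection mechanism.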
minor comments (1)
  1. [Abstract] The abstract states that the model 'uses participant-level random intercepts in the mixed-effects reference' but does not specify whether these intercepts are re-estimated on each train split or carried over from the full cohort; this detail affects the fairness of the baseline comparison.
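A participant-level split of the kind described in the paper can be sketched as follows. The 20% test fraction and the function name are assumptions, since the split ratio is not given here. Under this scheme, any participant-level quantity such as a random intercept would be estimated from the training indices only.

```python
import random


def participant_split(anchor_participants, test_fraction=0.2, seed=0):
    """Split anchor indices so each participant lands entirely in train or test.

    anchor_participants: participant ID for each labeled anchor, in row order.
    Returns (train_indices, test_indices).
    """
    ids = sorted(set(anchor_participants))
    rng = random.Random(seed)  # a separate seed gives a separate split
    rng.shuffle(ids)
    n_test = max(1, round(test_fraction * len(ids)))
    test_ids = set(ids[:n_test])
    train_idx = [i for i, p in enumerate(anchor_participants) if p not in test_ids]
    test_idx = [i for i, p in enumerate(anchor_participants) if p in test_ids]
    return train_idx, test_idx
```

Splitting by participant instead of by anchor matters because one participant can contribute several anchors, and letting them straddle the split would leak participant-specific information into the test set.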

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the thoughtful and detailed review. The concern regarding potential visit-selection effects in the anchored cohort is well-taken, and we address it directly below. We agree that additional sensitivity checks will strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract / cohort definition] Abstract and cohort-construction paragraph: the analytic cohort is formed by anchoring at MCI visits and retaining only the single closest future visit inside the 18-30 month window. This implicitly conditions on the existence and timing of that visit. If visit density in ADNI correlates with progression rate, the selected pairs over-represent certain trajectories; the 13.1% MSE and 26.4% correlation improvements are then measured on a filtered distribution rather than on a fixed-horizon counterfactual. No sensitivity analysis (fixed 24-month imputation, all visits in window, or inverse-probability weighting) is described, so it is unclear whether the residual-transformer gains are robust to this sampling mechanism.

    Authors: We acknowledge that selecting the closest visit within the 18-30 month window conditions on the existence of such a visit and could therefore over-represent trajectories associated with higher visit density. This is a valid methodological point for medium-horizon prediction on observational data. While the anchored design follows common practice for defining clinically relevant 24-month outcomes in ADNI, we agree that robustness to this sampling choice should be demonstrated. In the revised manuscript we will add three sensitivity analyses: (1) inverse-probability weighting based on a model of visit attendance probability, (2) training and evaluation on all visits falling inside the window (with appropriate aggregation), and (3) fixed 24-month targets using last-observation-carried-forward or linear interpolation where data permit. Performance deltas relative to the mixed-effects baseline will be reported under each alternative to confirm that the residual gap-aware transformer gains are not artifacts of the original cohort filter. revision: yes
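The third proposed analysis, a fixed 24-month target obtained by linear interpolation, can be sketched as below. The function name and the decision to return `None` when no pair of visits brackets the target are assumptions; the rebuttal only says interpolation would be used "where data permit". The caller would subtract the anchor CDR-SB to obtain the change.

```python
def interpolate_cdrsb(visits, anchor_month, target=24.0):
    """Linearly interpolate CDR-SB at `target` months after the anchor.

    visits: (month, cdr_sb) pairs for one participant.
    Returns None unless two consecutive visits bracket the target time.
    """
    t = anchor_month + target
    ordered = sorted(visits)
    for (m0, s0), (m1, s1) in zip(ordered, ordered[1:]):
        if m0 <= t <= m1:
            if m1 == m0:
                return s0  # duplicate visit month: nothing to interpolate
            frac = (t - m0) / (m1 - m0)
            return s0 + frac * (s1 - s0)
    return None
```

Unlike the closest-visit rule, this target does not depend on where inside the window a visit happens to fall, although it still requires at least one visit on each side of the 24-month mark.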

Circularity Check

0 steps flagged

No circularity: empirical held-out evaluation on explicitly defined target

Full rationale

The paper's central claims rest on supervised training and participant-level held-out evaluation of a residual transformer that predicts a pre-defined target (CDR-SB difference to the closest visit inside the 18-30 month window) from pre-anchor histories. This target definition is a fixed data-construction rule, not a quantity that the model is forced to recover by construction. The mixed-effects reference is a separate fitted baseline whose parameters are not reused as the model's output; performance gains are measured on unseen participants. No self-citation, uniqueness theorem, or ansatz is invoked to justify the architecture or results. The derivation chain is therefore a standard empirical pipeline and remains self-contained against external test data.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The model relies on standard assumptions about data representativeness and the validity of the anchor-based labeling for future prediction.

free parameters (1)
  • time-gap penalty coefficient
    Learned nonnegative parameter inside self-attention whose specific fitted value is not reported.
axioms (1)
  • domain assumption: The ADNI dataset provides representative longitudinal observations for MCI patients.
    Used to define the analytic cohort of 2,600 anchors from 858 participants.

pith-pipeline@v0.9.0 · 5854 in / 1366 out tokens · 42064 ms · 2026-05-20T23:24:11.864626+00:00 · methodology
