
arxiv: 2604.28055 · v1 · submitted 2026-04-30 · 💻 cs.LG · cs.AI · eess.IV

PROMISE-AD: Progression-aware Multi-horizon Survival Estimation for Alzheimer's Disease Progression and Dynamic Tracking

Pith reviewed 2026-05-07 06:33 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · eess.IV
keywords Alzheimer's disease progression · survival analysis · temporal Transformer · multi-horizon risk · diagnostic leakage · tokenization · ADNI · calibrated prediction

The pith

PROMISE-AD tokenizes irregular patient visits with slopes and missingness masks, then fuses them in a temporal Transformer to output calibrated multi-horizon risks for Alzheimer's conversion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PROMISE-AD to predict conversion from cognitively normal to mild cognitive impairment and from MCI to Alzheimer's dementia. It turns sequences of past visits into tokens that carry measurements, missing-data indicators, rates of change, and visit timings while deliberately excluding future diagnostic labels. These tokens feed a temporal Transformer that blends global, attention-pooled, and most-recent views to produce both an overall progression score and discrete-time mixture hazards. The model is trained with survival likelihood plus ranking, focal, smoothness, and balance losses, then isotonic-calibrated on a validation set for 1-, 2-, 3-, and 5-year horizons. A sympathetic reader would care because the resulting individualized risk curves could support earlier clinical decisions and better-powered trials without the leakage problems common in standard time-series predictors.

Core claim

PROMISE-AD converts pre-index visits into tokens that contain standardized measurements, missingness masks, longitudinal changes, time-normalized slopes, visit timing, and non-diagnostic categorical attributes. A temporal Transformer fuses global, attention-pooled, and latest-visit representations to estimate a progression score and latent discrete-time mixture hazards. Training combines survival likelihood, horizon-specific focal risk loss, progression ranking, hazard smoothness, and mixture-balance regularization, followed by validation-set isotonic calibration. On held-out ADNI/TADPOLE data the model records an integrated Brier score of 0.085 for CN-to-MCI and a C-index of 0.894 for MCI-to-AD conversion.
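The tokenization described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `tokenize_visits`, the strict `times < index_time` cutoff, zero-filling of missing values after standardization, and measuring change from the first observed value of each feature are all assumptions made for the example. Categorical attributes are left out.

```python
import numpy as np

def tokenize_visits(times, values, index_time, mean, std):
    """Build one token per pre-index visit (hypothetical layout).

    times:  (V,) visit times in years
    values: (V, F) measurements, NaN where missing
    mean, std: (F,) standardization statistics (assumed to come from training data)
    Token columns: [z (F), mask (F), delta (F), slope (F), gap, time_to_index]
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    n_feat = values.shape[1]

    keep = times < index_time                 # assumption: strictly pre-index visits only
    order = np.argsort(times[keep])
    t = times[keep][order]
    x = values[keep][order]
    if t.size == 0:
        return np.zeros((0, 4 * n_feat + 2))

    mask = ~np.isnan(x)                       # True where a measurement was observed
    z = np.where(mask, (x - mean) / std, 0.0) # standardized, zero-filled when missing

    first = mask.argmax(axis=0)               # first observed visit per feature
    cols = np.arange(n_feat)
    base, base_t = z[first, cols], t[first]

    delta = np.where(mask, z - base, 0.0)     # change since first observation
    elapsed = t[:, None] - base_t[None, :]
    valid = mask & (elapsed > 0)
    slope = np.where(valid, delta / np.where(valid, elapsed, 1.0), 0.0)

    gap = np.diff(t, prepend=t[0])            # time since previous visit
    to_index = index_time - t                 # time remaining until the index date
    return np.concatenate(
        [z, mask.astype(float), delta, slope, gap[:, None], to_index[:, None]], axis=1
    )

tokens = tokenize_visits(
    times=[0, 1, 2, 3],
    values=[[1.0, np.nan], [2.0, 4.0], [np.nan, 6.0], [9.0, 9.0]],
    index_time=2.5, mean=[0.0, 0.0], std=[1.0, 1.0],
)
```

The visit at year 3 is dropped because it falls after the index time, so `tokens` has one row for each of the three earlier visits.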

What carries the argument

Tokenization of pre-index visits with missingness masks, slopes, and non-diagnostic attributes, fused by a temporal Transformer into global-pooled-latest representations that produce mixture hazard estimates for multiple future horizons.
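A rough sketch of the three-view fusion, assuming the encoder has already produced one vector per visit. Mean pooling as the "global" view, a single learned query vector `w` for attention pooling, and plain concatenation are illustrative assumptions; the paper may use a different pooling or a dedicated summary token.

```python
import numpy as np

def fuse_views(H, w):
    """Fuse encoded visits into one patient vector.

    H: (V, d) encoder outputs in chronological order
    w: (d,) attention query vector (hypothetical parameter)
    Returns the concatenated [global, attention-pooled, latest] vector and the weights.
    """
    H = np.asarray(H, dtype=float)
    scores = H @ np.asarray(w, dtype=float)
    attn = np.exp(scores - scores.max())      # numerically stable softmax
    attn /= attn.sum()
    global_view = H.mean(axis=0)              # assumption: mean pooling over visits
    pooled_view = attn @ H                    # attention-weighted average
    latest_view = H[-1]                       # most recent pre-index visit
    return np.concatenate([global_view, pooled_view, latest_view]), attn

fused, attn = fuse_views([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], w=[0.0, 0.0])
```

With a zero query every visit receives equal weight, so the attention-pooled view coincides with the mean; a trained query would shift weight toward informative visits.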

If this is right

  • Multi-horizon 1- to 5-year risks can be produced with competitive calibration (lowest IBS for CN-to-MCI) and discrimination (highest C-index for MCI-to-AD).
  • Longitudinal change features and recent visits contribute measurable gains over static or single-visit baselines.
  • The mixture-hazard formulation plus per-horizon calibration lets reliability be assessed and corrected separately at each future time point.
  • Ablation results indicate that cognitive, functional, and APOE4 information remain the strongest contributors after tokenization.
  • The same pipeline can be re-used for dynamic re-prediction as new visits arrive without retraining the core weights.
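To make the link between mixture hazards and multi-horizon risks concrete, here is a small sketch. It assumes the mixture is taken at the hazard level (a gate-weighted average of component hazards), yearly bins, and the hypothetical parameter names `W_gate` and `W_haz`; none of these details are confirmed by the summary above.

```python
import numpy as np

def mixture_hazards(z, W_gate, W_haz):
    """Gate-weighted discrete-time hazards.

    z: (d,) fused patient representation
    W_gate: (M, d) gate weights, W_haz: (M, K, d) per-component hazard weights
    Returns h: (K,), h[k] = P(conversion in bin k | no conversion before bin k).
    """
    logits = W_gate @ z
    g = np.exp(logits - logits.max())
    g /= g.sum()                                   # softmax gate over M components
    h_m = 1.0 / (1.0 + np.exp(-(W_haz @ z)))       # (M, K) sigmoid hazards
    return g @ h_m

def horizon_risks(hazards, bin_ends, horizons):
    """Cumulative conversion risk at each horizon from per-bin hazards."""
    survival = np.cumprod(1.0 - np.asarray(hazards, dtype=float))
    idx = np.searchsorted(bin_ends, horizons, side="right") - 1  # last bin ending by the horizon
    return np.where(idx >= 0, 1.0 - survival[np.clip(idx, 0, None)], 0.0)

h = np.array([0.1, 0.2, 0.0, 0.5, 0.0])
risks = horizon_risks(h, bin_ends=np.array([1, 2, 3, 4, 5]), horizons=np.array([1, 2, 3, 5]))
```

Dynamic re-prediction then amounts to re-tokenizing with the new visit included and calling the same functions again with unchanged weights.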

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same tokenization-plus-Transformer recipe could be tested on other irregularly observed progressive diseases such as chronic kidney disease or heart failure.
  • Standard electronic-health-record pipelines that feed raw visit sequences into recurrent models may be inadvertently leaking future labels; the explicit masking steps here expose that risk.
  • Performance on the relatively homogeneous ADNI/TADPOLE cohorts may not survive larger socioeconomic or ethnic variation, so targeted external validation on community cohorts is a direct next experiment.
  • The latent mixture component could be inspected post-training to discover data-driven progression subtypes that differ in speed or dominant symptoms.

Load-bearing premise

That encoding visits with missingness masks, slopes, and non-diagnostic attributes fully blocks any future diagnostic information from entering the model and that performance on ADNI/TADPOLE data will hold for real-world clinical populations without retraining or recalibration.
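One way to probe the first half of this premise is a perturbation test: scramble everything at or after the index time and confirm the model inputs do not move. The sketch below uses a toy featurizer and hypothetical names (`pre_index_features`, `leaks_post_index`) and assumes visits are given in chronological order. It only detects direct use of post-index values; it cannot rule out indirect signals such as visit scheduling.

```python
import numpy as np

def pre_index_features(times, values, index_time):
    """Toy featurizer: latest value and overall slope from pre-index visits only."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    keep = times < index_time
    t, x = times[keep], values[keep]
    if t.size == 0:
        return np.zeros(2)
    span = t[-1] - t[0]
    slope = (x[-1] - x[0]) / span if span > 0 else 0.0
    return np.array([x[-1], slope])

def leaks_post_index(featurize, times, values, index_time, n_trials=20, seed=0):
    """Return True if features change when post-index values are randomized."""
    rng = np.random.default_rng(seed)
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    reference = featurize(times, values, index_time)
    post = times >= index_time
    for _ in range(n_trials):
        perturbed = values.copy()
        perturbed[post] = rng.normal(size=int(post.sum()))
        if not np.allclose(featurize(times, perturbed, index_time), reference):
            return True
    return False

times = [0, 1, 2, 3, 4]
scores = [30, 29, 27, 22, 18]
safe = leaks_post_index(pre_index_features, times, scores, index_time=2.5)
```

A featurizer that reads the last recorded value regardless of the index date would fail this check, which is the kind of slip the premise is meant to exclude.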

What would settle it

A large drop in integrated Brier score or C-index when the trained model is applied unchanged to an independent clinical dataset that has different visit frequencies, demographics, and missingness patterns would falsify the generalization claim.

Figures

Figures reproduced from arXiv: 2604.28055 by Chenyu You, Christopher T Whitlow, Jeremy Hudson, Mohammad Kawas, Qing Lyu, Yuming Jiang.

Figure 1
Figure 1: Overview of the PROMISE-AD framework, from leakage-safe pre-index visit construction and progression-aware tokenization to temporal encoding, latent mixture survival estimation, calibration, and multi-horizon evaluation.
Surrounding text: …components for irregular AD or EHR data [27], [28]. These studies show the value of temporal neural networks, but their outputs are usually future states or biomarkers rather than survival…
Figure 2
Figure 2: Cohort follow-up and conversion structure. Rows show CN-to-MCI and MCI-to-AD; columns show follow-up, event-time, at-risk/censoring, and Kaplan–Meier summaries.
Surrounding text: …$F_i(h)$, horizon label $y_{ih}$, evaluability mask $m_{ih}$, and class weight $w_{ih}$, $\ell_{\mathrm{focal}}(y, R) = -y(1-R)^{\gamma}\log R - (1-y)R^{\gamma}\log(1-R)$, $\mathcal{L}_{\mathrm{horizon}} = \frac{\sum_i \sum_{h\in\mathcal{H}} m_{ih} w_{ih}\,\ell_{\mathrm{focal}}(y_{ih}, R_{ih})}{\sum_i \sum_{h\in\mathcal{H}} m_{ih} w_{ih} + \epsilon}$. (8) The pairwise ranking term encourages higher prog…
Figure 3
Figure 3: PROMISE-AD test performance. Rows show CN-to-MCI and MCI-to-AD; columns show risk-group survival (A, E), horizon AUROC/AUPRC (B, F), Brier score (C, G), and calibration (D, H).
Figure 4
Figure 4: Top dynamic feature attributions for CN-to-MCI and MCI-to-AD.
Surrounding text: …validation-loss checkpoint from each seed. Unless otherwise stated, all values below are reported as mean ± standard deviation across the three seeds. We summarized survival ranking with Harrell’s concordance index [31], time-dependent discrimination with cumulative/dynamic AUC [32], probabilistic error with Brier score and integrated Brier sco…
Figure 5
Figure 5: Visit-level attention context by recency, dynamic change magnitude, and conversion proximity.
Surrounding text: Table II. MCI-to-AD ablation study on the held-out test set. IBS is lower-is-better; all other metrics are higher-is-better.

Variant             | C-index       | IBS ↓         | Mean TD AUC   | 5y AUROC      | 5y AUPRC
Full PROMISE-AD     | 0.894 ± 0.011 | 0.095 ± 0.007 | 0.922 ± 0.012 | 0.997 ± 0.003 | 0.999 ± 0.001
No dynamic features | 0.870 ± 0.017 | 0.103 ± 0.005 | 0.92…
Original abstract

Individualized Alzheimer's disease (AD) progression prediction requires models that use irregular visits, account for censoring, avoid diagnostic leakage, and provide calibrated horizon risks. We propose PROgression-aware MultI-horizon Survival Estimation for Alzheimer's Disease (PROMISE-AD), a leakage-safe survival framework for predicting conversion from cognitively normal (CN) to mild cognitive impairment (MCI) and from MCI to AD dementia using ADNI/TADPOLE tabular histories. PROMISE-AD converts pre-index visits into tokens with standardized measurements, missingness masks, longitudinal changes, time-normalized slopes, visit timing, and non-diagnostic categorical attributes. A temporal Transformer fuses global, attention-pooled, and latest-visit representations to estimate a progression score and latent discrete-time mixture hazards. Training combines survival likelihood, horizon-specific focal risk loss, progression ranking, hazard smoothness, and mixture-balance regularization, followed by validation-set isotonic calibration for 1-, 2-, 3-, and 5-year risks. In held-out testing across three seeds, PROMISE-AD achieved an integrated Brier score (IBS) of 0.085 $\pm$ 0.012, C-index of 0.808 $\pm$ 0.015, and mean time-dependent AUC of 0.840 $\pm$ 0.081 for CN-to-MCI conversion, yielding the lowest IBS among compared methods. For MCI-to-AD conversion, PROMISE-AD achieved the highest C-index (0.894 $\pm$ 0.018) and near-ceiling 5-year discrimination (AUROC 0.997 $\pm$ 0.003; AUPRC 0.999 $\pm$ 0.001), although some baselines had lower IBS. Ablations and interpretability supported longitudinal change features, fused temporal representations, mixture hazards, cognitive and functional measures, APOE4 status, and recent conversion-proximal visits. These findings suggest that progression-aware survival modeling can provide interpretable multi-horizon AD conversion risk estimates.
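The horizon-specific focal risk loss named in the abstract appears in the Figure 2 excerpt as a masked, class-weighted average of a focal term over patients and horizons. A minimal NumPy version of that formula follows; the default `gamma=2.0` and the clipping constant are assumptions, since the excerpt does not state the values used.

```python
import numpy as np

def horizon_focal_loss(y, R, m, w, gamma=2.0, eps=1e-8):
    """Masked, weighted focal loss over (patient, horizon) pairs.

    y: (N, H) 1 if converted by the horizon, else 0
    R: (N, H) predicted cumulative risk at each horizon
    m: (N, H) evaluability mask (0 when the label is unknown due to censoring)
    w: (N, H) class weights
    gamma: focusing exponent (assumed value, not taken from the paper)
    """
    R = np.clip(R, eps, 1.0 - eps)                 # avoid log(0)
    focal = -y * (1.0 - R) ** gamma * np.log(R) - (1.0 - y) * R ** gamma * np.log(1.0 - R)
    return np.sum(m * w * focal) / (np.sum(m * w) + eps)

y = np.array([[1.0, 0.0]])
R = np.array([[0.5, 0.5]])
ones = np.ones((1, 2))
loss = horizon_focal_loss(y, R, ones, ones)
```

The mask is what keeps censoring honest here: a patient censored before a horizon contributes nothing at that horizon instead of being counted as a non-converter.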

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The manuscript introduces PROMISE-AD, a temporal Transformer framework for multi-horizon survival estimation of Alzheimer's disease progression. It tokenizes strictly pre-index visits from ADNI/TADPOLE data using standardized measurements, missingness masks, time-normalized slopes, longitudinal changes, visit timing, and non-diagnostic attributes to avoid diagnostic leakage. The model fuses global, attention-pooled, and latest-visit representations to predict a progression score and latent discrete-time mixture hazards. Training uses a composite loss (survival likelihood, horizon-specific focal risk, progression ranking, hazard smoothness, mixture-balance regularization) followed by validation-set isotonic calibration. On held-out test data across three seeds, it reports IBS 0.085±0.012, C-index 0.808±0.015, mean time-dependent AUC 0.840±0.081 for CN-to-MCI (lowest IBS among baselines) and C-index 0.894±0.018 with 5-year AUROC 0.997±0.003 / AUPRC 0.999±0.001 for MCI-to-AD (highest C-index), supported by ablations on key components.

Significance. If the leakage-safe tokenization and calibration claims are substantiated, PROMISE-AD could advance dynamic, individualized AD risk prediction by handling irregular visits, censoring, and providing calibrated multi-horizon probabilities with interpretability. The reported discrimination (especially MCI-to-AD C-index) and ablations crediting longitudinal features, fused representations, and mixture hazards indicate technical merit over standard survival baselines. However, the near-ceiling 5-year metrics on this specific cohort raise generalizability questions; if addressed, the work could influence clinical ML applications in neurodegenerative disease tracking.

major comments (3)
  1. [Methods (tokenization procedure)] The claim that tokenization of pre-index visits (including time-normalized slopes and visit timing) fully eliminates diagnostic leakage by construction is not supported by explicit verification, sensitivity analysis, or formal exclusion of post-index influence. The reported 5-year AUROC of 0.997±0.003 and AUPRC of 0.999±0.001 for MCI-to-AD are atypically high and could arise if slopes or timing patterns indirectly encode decline rates correlated with ADNI's structured follow-up schedule, directly undermining the central leakage-safe claim.
  2. [Results (performance evaluation and calibration)] While metrics are reported with standard deviations across three seeds and ablations are mentioned, the manuscript lacks full baseline comparison tables, exact cohort exclusion criteria, and explicit verification that post-hoc isotonic calibration (applied on validation set for 1/2/3/5-year risks) does not inflate discrimination metrics such as C-index and time-dependent AUC. This is load-bearing because the abstract highlights these as superior results.
  3. [Discussion (generalizability)] The assertion that the approach generalizes to real-world clinical data without retraining or recalibration is not tested via external cohorts or domain-shift experiments. This weakens the practical implications, as ADNI/TADPOLE visit patterns and missingness may not transfer, especially given the reliance on longitudinal change features.
minor comments (3)
  1. [Abstract] Abstract: The statement that 'some baselines had lower IBS' for MCI-to-AD is vague; a concise quantitative comparison or reference to the relevant table/figure would improve readability.
  2. [Methods (model and loss)] Notation and equations: The definitions of the progression score, latent discrete-time mixture hazards, and loss-component weights could be more explicitly formalized with numbered equations in the main text (rather than relying on supplementary material) to aid reproducibility.
  3. [Results (interpretability)] Figures: Interpretability plots (e.g., attention weights or feature importance for APOE4 and recent visits) would benefit from clearer axis labels and confidence intervals to match the quantitative rigor of the metrics.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. We have carefully reviewed each major comment and provide point-by-point responses below. We indicate where revisions will be made to address the concerns while maintaining the integrity of our claims.

  1. Referee: Methods (tokenization procedure): The claim that tokenization of pre-index visits (including time-normalized slopes and visit timing) fully eliminates diagnostic leakage by construction is not supported by explicit verification, sensitivity analysis, or formal exclusion of post-index influence. The reported 5-year AUROC of 0.997±0.003 and AUPRC of 0.999±0.001 for MCI-to-AD are atypically high and could arise if slopes or timing patterns indirectly encode decline rates correlated with ADNI's structured follow-up schedule, directly undermining the central leakage-safe claim.

    Authors: We appreciate the referee raising this critical issue regarding potential leakage. Our tokenization is designed to use only visits strictly preceding the index date, with slopes computed as time-normalized changes across pre-index visits only, visit timing encoded relative to the index, and all other attributes drawn exclusively from pre-index records. No post-index data enters the tokens by construction. To substantiate this and address the concern about indirect encoding, we will add a sensitivity analysis in the revised Methods and Results sections: we will ablate the slope and visit-timing features, retrain, and report the resulting IBS, C-index, and time-dependent AUC for both tasks. This will quantify their contribution and test robustness. Regarding the high 5-year MCI-to-AD metrics, we note that this conversion task benefits from strong longitudinal signals in cognitive and functional measures (as supported by our ablations), and the near-ceiling values are consistent with the informativeness of pre-index trajectories in ADNI; however, the integrated Brier score remains competitive rather than superior, suggesting the discrimination is not artifactual. We will also add an explicit formal statement on post-index exclusion criteria. revision: partial

  2. Referee: Results (performance evaluation and calibration): While metrics are reported with standard deviations across three seeds and ablations are mentioned, the manuscript lacks full baseline comparison tables, exact cohort exclusion criteria, and explicit verification that post-hoc isotonic calibration (applied on validation set for 1/2/3/5-year risks) does not inflate discrimination metrics such as C-index and time-dependent AUC. This is load-bearing because the abstract highlights these as superior results.

    Authors: We agree that greater transparency in the results is warranted. In the revision, we will expand the Results section to include complete baseline comparison tables (with all metrics for all methods) either in the main text or as a dedicated supplementary table. We will also provide a detailed description of cohort exclusion criteria, including a supplementary flowchart or enumerated list of inclusion/exclusion steps with exact numbers. For the isotonic calibration: because it is a monotone (non-decreasing) recalibration fitted on the validation set and applied to test predictions, it preserves the ordering of risk scores up to ties created by its flat segments. Consequently, the C-index (a rank-based metric) can change only through such ties. Time-dependent AUC is likewise rank-based at each horizon, so calibration should leave it essentially unchanged rather than inflate it. We will add an explicit comparison table of C-index and time-dependent AUC computed before versus after calibration to verify this point directly. These additions will be incorporated in the revised manuscript. revision: yes

  3. Referee: Discussion (generalizability): The assertion that the approach generalizes to real-world clinical data without retraining or recalibration is not tested via external cohorts or domain-shift experiments. This weakens the practical implications, as ADNI/TADPOLE visit patterns and missingness may not transfer, especially given the reliance on longitudinal change features.

    Authors: We thank the referee for this observation on generalizability. The manuscript does not make an unqualified claim of generalization without retraining or recalibration; the discussion emphasizes the model's design for irregular visits and censoring within the ADNI/TADPOLE setting and suggests broader applicability as a direction for future work. Nevertheless, we acknowledge that empirical testing on external cohorts is absent and that ADNI visit schedules and missingness patterns may differ from routine clinical care. In the revised Discussion, we will expand this section to explicitly state this as a limitation, discuss potential domain shifts (particularly in longitudinal change features and visit timing), and outline how the missingness masks and time-normalized representations are intended to improve robustness. We will also add a forward-looking statement on planned external validation. These changes will be made without overstating current evidence. revision: partial
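The second response turns on how rank-based metrics behave under recalibration, which is easy to check numerically. The sketch below implements a simple pairwise version of Harrell's C-index (ties in predicted risk count as half) and compares a strictly increasing transform with a step function. The step function stands in for an isotonic fit, which is non-decreasing and piecewise constant; the data and threshold are made up for illustration.

```python
import numpy as np

def c_index(time, event, risk):
    """Pairwise concordance: higher risk should mean earlier conversion."""
    num = den = 0.0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue                      # pairs are anchored on observed events
        for j in range(n):
            if time[i] < time[j]:         # j was still event-free when i converted
                den += 1
                if risk[i] > risk[j]:
                    num += 1
                elif risk[i] == risk[j]:
                    num += 0.5            # tied predictions count as half
    return num / den

time = np.array([1, 2, 3, 4])
event = np.array([1, 1, 1, 0])
risk = np.array([0.9, 0.4, 0.6, 0.1])

c_raw = c_index(time, event, risk)
c_strict = c_index(time, event, np.exp(risk))                    # strictly increasing map
c_step = c_index(time, event, np.where(risk >= 0.5, 0.8, 0.2))   # flat segments create ties
```

Here `c_raw` and `c_strict` agree, while `c_step` differs because the flat segments merge distinct scores. This is why a before-versus-after table, as the authors propose, is a more convincing check than the invariance argument alone.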

Circularity Check

0 steps flagged

No circularity: model trained and evaluated on independent held-out data with standard metrics


The paper defines a tokenization procedure for pre-index visits, a temporal Transformer architecture, and a composite training objective (survival likelihood + focal risk + ranking + smoothness + regularization). It then reports performance on held-out test subjects using externally defined metrics (IBS, C-index, time-dependent AUC) after separate validation-set isotonic calibration. No equation or claim reduces the reported quantities to definitions of the fitted parameters themselves, nor does any load-bearing step rest on a self-citation that is itself unverified. The derivation chain is therefore self-contained against external benchmarks.
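For orientation, the composite objective listed in the rationale can be written as a weighted sum. The additive form and the weight symbols $\lambda$ are assumptions made for illustration; the summary does not give the weights or the exact forms of the smoothness and balance terms, so those are left abstract.

```latex
\mathcal{L}_{\mathrm{total}}
  = \mathcal{L}_{\mathrm{surv}}
  + \lambda_{\mathrm{hor}}\,\mathcal{L}_{\mathrm{horizon}}
  + \lambda_{\mathrm{rank}}\,\mathcal{L}_{\mathrm{rank}}
  + \lambda_{\mathrm{sm}}\,\mathcal{L}_{\mathrm{smooth}}
  + \lambda_{\mathrm{bal}}\,\mathcal{L}_{\mathrm{balance}}
```

None of the five terms is defined through the evaluation metrics (IBS, C-index, time-dependent AUC), which is consistent with the no-circularity finding.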

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The framework rests on standard survival-analysis assumptions plus several design choices introduced for leakage safety and multi-horizon calibration; no new physical entities are postulated.

free parameters (2)
  • isotonic calibration mapping
    Fitted on the validation set after training to adjust 1-, 2-, 3-, and 5-year risk outputs.
  • loss-component weights
    Relative weights among survival likelihood, horizon-specific focal loss, progression ranking, hazard smoothness, and mixture-balance terms.
axioms (2)
  • domain assumption: Censoring is non-informative and the ADNI/TADPOLE visit schedule is representative of target clinical populations.
    Required for unbiased survival likelihood and generalization of the reported metrics.
  • ad hoc to paper: The chosen tokenization (measurements, masks, slopes, timing, non-diagnostic attributes) eliminates diagnostic leakage by construction.
    Central premise of the leakage-safe claim; no external verification is described.
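The first free parameter in this ledger, the isotonic calibration mapping, is a non-decreasing function fitted on validation data. A compact pool-adjacent-violators sketch shows what is being fitted. The function names are hypothetical, inputs are assumed to have distinct scores, and scikit-learn's `IsotonicRegression` would be a reasonable off-the-shelf alternative.

```python
import numpy as np

def fit_isotonic(scores, outcomes):
    """Pool-adjacent-violators fit of a non-decreasing map from score to outcome rate."""
    scores = np.asarray(scores, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    order = np.argsort(scores)
    xs, ys = scores[order], outcomes[order]
    means, counts = [], []
    for v in ys:
        means.append(float(v))
        counts.append(1)
        # merge neighbouring blocks while they violate monotonicity
        while len(means) > 1 and means[-2] > means[-1]:
            m2, c2 = means.pop(), counts.pop()
            m1, c1 = means.pop(), counts.pop()
            means.append((m1 * c1 + m2 * c2) / (c1 + c2))
            counts.append(c1 + c2)
    return xs, np.repeat(means, counts)

def apply_isotonic(xs, fitted, new_scores):
    """Interpolate the fitted map; scores outside the fitted range are clipped."""
    return np.interp(new_scores, xs, fitted)

val_scores = [0.1, 0.2, 0.3, 0.4, 0.5]
val_outcomes = [0, 1, 0, 1, 1]
xs, fitted = fit_isotonic(val_scores, val_outcomes)
calibrated = apply_isotonic(xs, fitted, [0.05, 0.25, 0.35])
```

In a multi-horizon setting one such map would be fitted per horizon, which makes the number of effective free parameters depend on how many distinct steps each fit ends up with.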



Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages

  1. [1]

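    Framework-figure text: dynamic features, mixture hazards, and losses

    Capturing dynamic features: compute the feature change $\Delta$ from the first visit and the slope from the first visit, $f^{\mathrm{dynamic}} = [\Delta, \mathrm{slope}]$. Mixture hazards: $h_k = \sum_m g_m h_k^{(m)}$, with $g_m = \mathrm{softmax}(W_{g,m} z)$ and $h_k^{(m)} = \sigma(W_m z)$; each hazard means $h_k = P(\text{conversion in bin } k \mid \text{survived before bin } k)$. Losses: $\mathcal{L}_{\mathrm{surv},i} = -\log h_{i,k_i} - \sum_{j<k_i} \log(1 - h_{i,j})$ for an event and $-\sum_{j \le k_i} \log(1 - h_{i,j})$ when censored; $\mathcal{L}_{\mathrm{horizon}} = \frac{\sum_i \sum_{h\in\mathcal{H}} m_{ih} w_{ih}\,\ell_{\mathrm{focal}}(y_{ih}, R_{ih})}{\sum_i \sum_{h\in\mathcal{H}} m_{ih} w_{ih} + \epsilon}$; $\mathcal{L}_{\mathrm{rank}} = \frac{1}{|\mathcal{P}|} \sum_{(i,j)\in\mathcal{P}} \log\{1 + \exp[-(q_i - q_j)]\}$...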

  2. [2]

    Mild cognitive impairment: Clinical characterization and outcome,

    R. C. Petersen, G. E. Smith, S. C. Waring, R. J. Ivnik, E. G. Tangalos, and E. Kokmen, “Mild cognitive impairment: Clinical characterization and outcome,” Arch. Neurol., vol. 56, no. 3, pp. 303–308, 1999

  3. [3]

    Hypothetical model of dynamic biomarkers of the Alzheimer’s pathological cascade,

    C. R. Jack et al., “Hypothetical model of dynamic biomarkers of the Alzheimer’s pathological cascade,” Lancet Neurol., vol. 9, no. 1, pp. 119–128, 2010

  4. [4]

    Alzheimer’s Disease Neuroimaging Initiative (ADNI): Clinical characterization,

    R. C. Petersen et al., “Alzheimer’s Disease Neuroimaging Initiative (ADNI): Clinical characterization,” Neurology, vol. 74, no. 3, pp. 201–209, 2010

  5. [5]

    TADPOLE challenge: Accurate Alzheimer’s disease prediction through crowdsourced forecasting of future data,

    R. V. Marinescu et al., “TADPOLE challenge: Accurate Alzheimer’s disease prediction through crowdsourced forecasting of future data,” in Proc. Predictive Intell. Med., ser. Lecture Notes in Computer Science, vol. 11843. Springer, 2019, pp. 1–10

  6. [6]

    Regression models and life-tables,

    D. R. Cox, “Regression models and life-tables,” J. R. Stat. Soc. Ser. B, vol. 34, no. 2, pp. 187–220, 1972

  7. [7]

    Random survival forests,

    H. Ishwaran, U. B. Kogalur, E. H. Blackstone, and M. S. Lauer, “Random survival forests,” Ann. Appl. Stat., vol. 2, no. 3, pp. 841–860, 2008

  8. [8]

    XGBoost: A scalable tree boosting system,

    T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., 2016, pp. 785–794

  9. [9]

    DeepSurv: Personalized treatment recommender system using a Cox proportional hazards deep neural network,

    J. L. Katzman, U. Shaham, A. Cloninger, J. Bates, T. Jiang, and Y. Kluger, “DeepSurv: Personalized treatment recommender system using a Cox proportional hazards deep neural network,” BMC Med. Res. Methodol., vol. 18, no. 1, p. 24, 2018

  10. [10]

    DeepHit: A deep learning approach to survival analysis with competing risks,

    C. Lee, W. R. Zame, J. Yoon, and M. van der Schaar, “DeepHit: A deep learning approach to survival analysis with competing risks,” in Proc. AAAI Conf. Artif. Intell., vol. 32, no. 1, 2018

  11. [11]

    A scalable discrete-time survival model for neural networks,

    M. F. Gensheimer and B. Narasimhan, “A scalable discrete-time survival model for neural networks,” PeerJ, vol. 7, p. e6257, 2019

  12. [12]

    Attention is all you need,

    A. Vaswani et al., “Attention is all you need,” in Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017, pp. 5998–6008. [Online]. Available: https://papers.nips.cc/paper/7181-attention-is-all-you-need

  13. [13]

    Automatic classification of patients with Alzheimer’s disease from structural MRI: A comparison of ten methods using the ADNI database,

    R. Cuingnet, E. Gérardin, J. Tessieras, G. Auzias, S. Lehéricy, M.-O. Habert, M. Chupin, H. Benali, O. Colliot, and Alzheimer’s Disease Neuroimaging Initiative, “Automatic classification of patients with Alzheimer’s disease from structural MRI: A comparison of ten methods using the ADNI database,” NeuroImage, vol. 56, no. 2, pp. 766–781, 2011

  14. [14]

    Prediction of MCI to AD conversion, via MRI, CSF biomarkers, and pattern classification,

    C. Davatzikos, P. Bhatt, L. M. Shaw, K. N. Batmanghelich, and J. Q. Trojanowski, “Prediction of MCI to AD conversion, via MRI, CSF biomarkers, and pattern classification,” Neurobiol. Aging, vol. 32, no. 12, pp. 2322.e19–2322.e27, 2011

  15. [15]

    Predictive markers for AD in a multi-modality framework: An analysis of MCI progression in the ADNI population,

    C. Hinrichs, V. Singh, G. Xu, S. C. Johnson, and Alzheimer’s Disease Neuroimaging Initiative, “Predictive markers for AD in a multi-modality framework: An analysis of MCI progression in the ADNI population,” NeuroImage, vol. 55, no. 2, pp. 574–589, 2011

  16. [16]

    Machine learning framework for early MRI-based Alzheimer’s conversion prediction in MCI subjects,

    E. Moradi, A. Pepe, C. Gaser, H. Huttunen, and J. Tohka, “Machine learning framework for early MRI-based Alzheimer’s conversion prediction in MCI subjects,” NeuroImage, vol. 104, pp. 398–412, 2015

  17. [17]

    Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis,

    H.-I. Suk, S.-W. Lee, D. Shen, and Alzheimer’s Disease Neuroimaging Initiative, “Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis,” NeuroImage, vol. 101, pp. 569–582, 2014

  18. [18]

    The Alzheimer’s disease prediction of longitudinal evolution (TADPOLE) challenge: Results after 1 year follow-up,

    R. V. Marinescu et al., “The Alzheimer’s disease prediction of longitudinal evolution (TADPOLE) challenge: Results after 1 year follow-up,” Mach. Learn. Biomed. Imaging, vol. 1, pp. 1–60, 2021

  19. [19]

    Predicting Alzheimer’s disease progression using deep recurrent neural networks,

    M. Nguyen, T. He, L. An, D. C. Alexander, J. Feng, and B. T. T. Yeo, “Predicting Alzheimer’s disease progression using deep recurrent neural networks,” NeuroImage, vol. 222, p. 117203, 2020

  20. [20]

    Prediction of conversion to Alzheimer’s disease with longitudinal measures and time-to-event data,

    K. Li, W. Chan, R. S. Doody, J. Quinn, and S. Luo, “Prediction of conversion to Alzheimer’s disease with longitudinal measures and time-to-event data,” J. Alzheimer’s Dis., vol. 58, no. 2, pp. 361–371, 2017

  21. [21]

    Joint Models for Longitudinal and Time-to-Event Data: With Applications in R

    D. Rizopoulos, Joint Models for Longitudinal and Time-to-Event Data: With Applications in R. Boca Raton, FL, USA: Chapman and Hall/CRC, 2012

  22. [22]

    Time-to-event prediction with neural networks and Cox regression,

    H. Kvamme, Ø. Borgan, and I. Scheel, “Time-to-event prediction with neural networks and Cox regression,” J. Mach. Learn. Res., vol. 20, no. 129, pp. 1–30, 2019. [Online]. Available: https://jmlr.org/papers/v20/18-424.html

  23. [23]

    Dynamic-DeepHit: A deep learning approach for dynamic survival analysis with competing risks based on longitudinal data,

    C. Lee, J. Yoon, and M. van der Schaar, “Dynamic-DeepHit: A deep learning approach for dynamic survival analysis with competing risks based on longitudinal data,” IEEE Trans. Biomed. Eng., vol. 67, no. 1, pp. 122–133, 2020

  24. [24]

    TabPFN: A transformer that solves small tabular classification problems in a second,

    N. Hollmann, S. Müller, K. Eggensperger, and F. Hutter, “TabPFN: A transformer that solves small tabular classification problems in a second,” in Proc. Int. Conf. Learn. Represent., 2023. [Online]. Available: https://openreview.net/forum?id=cp5PvcI6w8

  25. [25]

    Long short-term memory,

    S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997

  26. [26]

    Predictive modeling of the progression of Alzheimer’s disease with recurrent neural networks,

    T. Wang, R. G. Qiu, and M. Yu, “Predictive modeling of the progression of Alzheimer’s disease with recurrent neural networks,” Sci. Rep., vol. 8, p. 9161, 2018

  27. [27]

    Deep recurrent model for individualized prediction of Alzheimer’s disease progression,

    W. Jung, E. Jun, and H.-I. Suk, “Deep recurrent model for individualized prediction of Alzheimer’s disease progression,” NeuroImage, vol. 237, p. 118143, 2021

  28. [28]

    PPAD: A deep learning architecture to predict progression of Alzheimer’s disease,

    M. Al Olaimat, J. Martinez, F. Saeed, S. Bozdag, and Alzheimer’s Disease Neuroimaging Initiative, “PPAD: A deep learning architecture to predict progression of Alzheimer’s disease,” Bioinformatics, vol. 39, no. Suppl. 1, pp. i149–i157, 2023

  29. [29]

    TA-RNN: An attention-based time-aware recurrent neural network architecture for electronic health records,

    M. Al Olaimat, S. Bozdag, and Alzheimer’s Disease Neuroimaging Initiative, “TA-RNN: An attention-based time-aware recurrent neural network architecture for electronic health records,” Bioinformatics, vol. 40, no. Suppl. 1, pp. i169–i179, 2024

  30. [30]

    Decoupled weight decay regularization,

    I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” in Proc. Int. Conf. Learn. Represent., 2019. [Online]. Available: https://openreview.net/forum?id=Bkg6RiCqY7

  31. [31]

    Transforming classifier scores into accurate multiclass probability estimates,

    B. Zadrozny and C. Elkan, “Transforming classifier scores into accurate multiclass probability estimates,” in Proc. 8th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., 2002, pp. 694–699

  32. [32]

    Evaluating the yield of medical tests,

    F. E. Harrell, R. M. Califf, D. B. Pryor, K. L. Lee, and R. A. Rosati, “Evaluating the yield of medical tests,” JAMA, vol. 247, no. 18, pp. 2543–2546, 1982

  33. [33]

    Time-dependent ROC curves for censored survival data and a diagnostic marker,

    P. J. Heagerty, T. Lumley, and M. S. Pepe, “Time-dependent ROC curves for censored survival data and a diagnostic marker,” Biometrics, vol. 56, no. 2, pp. 337–344, 2000

  34. [34]

    Verification of forecasts expressed in terms of probability,

    G. W. Brier, “Verification of forecasts expressed in terms of probability,” Mon. Weather Rev., vol. 78, no. 1, pp. 1–3, 1950

  35. [35]

    Assessment and comparison of prognostic classification schemes for survival data,

    E. Graf, C. Schmoor, W. Sauerbrei, and M. Schumacher, “Assessment and comparison of prognostic classification schemes for survival data,” Stat. Med., vol. 18, no. 17–18, pp. 2529–2545, 1999

  36. [36]

    Nonparametric estimation from incomplete observations,

    E. L. Kaplan and P. Meier, “Nonparametric estimation from incomplete observations,” J. Amer. Stat. Assoc., vol. 53, no. 282, pp. 457–481, 1958

  37. [37]

    E. W. Steyerberg, Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating, 2nd ed. Cham: Springer, 2019

  38. [38]

    Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD statement,

    G. S. Collins, J. B. Reitsma, D. G. Altman, and K. G. M. Moons, “Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD statement,” Ann. Intern. Med., vol. 162, no. 1, pp. 55–63, 2015