Evaluating TabPFN for Mild Cognitive Impairment to Alzheimer's Disease Conversion in Data Limited Settings
Pith reviewed 2026-05-21 09:17 UTC · model grok-4.3
The pith
TabPFN, a pre-trained foundation model for tabular data, predicts MCI to Alzheimer's conversion with AUC 0.892 and stays effective when training data drops to 50 samples.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TabPFN achieves an AUC of 0.892 for three-year MCI to AD conversion on the TADPOLE dataset and outperforms LightGBM at 0.860. The model sustains strong results at N=50 training samples while traditional approaches decline, using features from demographics, APOE4, MRI, CSF, and PET across training sizes from 50 to 1000. These results indicate that pre-trained tabular foundation models can address data limitations common in Alzheimer's research.
What carries the argument
TabPFN, a tabular pre-trained foundation network that uses prior exposure to large synthetic tabular datasets to learn effectively from small real tabular inputs with minimal tuning.
If this is right
- Clinicians could apply TabPFN-style models for early risk assessment in memory clinics that collect only modest numbers of patient records.
- Alzheimer's studies would require fewer longitudinal cases to build usable predictors.
- The same pre-trained approach may transfer to forecasting other neurodegenerative conditions with limited data.
- Performance gains hold as training size scales from 50 up to 1000 samples.
Where Pith is reading between the lines
- External validation on diverse populations outside ADNI could reveal whether the low-data benefit persists across ethnic and geographic groups.
- Combining TabPFN outputs with emerging blood-based biomarkers might further raise accuracy without adding imaging costs.
- Real-world deployment would still need explicit strategies for handling incomplete scans or lab results that the current evaluation does not detail.
Load-bearing premise
The multimodal biomarker features drawn from MRI, CSF, PET, and demographics are assumed to be consistently complete and high-quality across the TADPOLE samples without substantial missing values or preprocessing complications.
What would settle it
TabPFN would lose its claimed advantage if an independent test set showed its AUC falling below LightGBM's when both are trained on only 50 samples from a new cohort with different missing-data patterns.
read the original abstract
Accurate prediction of conversion from Mild Cognitive Impairment (MCI) to Alzheimers Diseases (AD) is essential for early intervention, however, developing reliable conversion predictive models is difficult to develop due to limited longitudinal data availability We evaluate TabPFN (Tabular Pre-Trained Foundation Network) against traditional machine learning methods for predicting 3 year MCI to AD conversion using the TADPOLE dataset derived from ADNI. Using multimodal biomarker features extracted from demographics, APOE4, MRI volumes, CSF markers, and PET imaging, we conducted an experimental comparison across varying training set sizes (N=50 to 1000) and models including XGBoost, Random Forest, LightGBM, and Logistic Regression. TabPFN achieved one the highest performance (AUC=0.892), outperforming LightGBM (AUC=0.860) and demonstrating advantages in low data settings. At N=50 training samples, TabPFN maintained strong AUC while the traditional machine learning models struggles at small training samples. These findings demonstrate that foundation models are promising for disease prediction in data limited scenarios, such as Alzheimers diseases.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript evaluates TabPFN against XGBoost, Random Forest, LightGBM, and Logistic Regression for predicting 3-year MCI-to-AD conversion on the TADPOLE dataset derived from ADNI. Using multimodal features (demographics, APOE4, MRI volumes, CSF, PET), it reports performance across training sizes N=50 to 1000, with TabPFN reaching AUC 0.892 (vs. LightGBM 0.860) and retaining strong performance at N=50 where baselines degrade.
Significance. If the reported low-data advantage holds after methodological clarification, the work provides useful empirical evidence that tabular foundation models can be effective for Alzheimer's prediction tasks where sample sizes are small, a common constraint in longitudinal biomarker studies.
major comments (2)
- [Dataset and Feature Extraction] The manuscript provides no description of missing-data handling for CSF and PET biomarkers. Given that ADNI/TADPOLE data typically exhibit high missingness rates in these modalities, the absence of details on imputation, complete-case selection, or feature-wise missingness patterns renders the N=50 performance claims vulnerable to preprocessing artifacts that may interact differently with TabPFN's prior than with tree-based baselines.
- [Experimental Evaluation] The experimental section supplies no information on the cross-validation procedure, exact train-test split strategy, hyperparameter search, or statistical testing used to support the AUC comparisons. Without these, the headline result (TabPFN AUC 0.892 vs. LightGBM 0.860) cannot be independently verified or attributed unambiguously to model properties.
minor comments (2)
- [Abstract] Abstract contains grammatical issues: 'one the highest' should read 'one of the highest'; 'struggles' should be 'struggle'; 'Alzheimers diseases' should be 'Alzheimer's disease'.
- [Dataset and Feature Extraction] The paper would benefit from an explicit statement of the total number of subjects and the distribution of missing values per modality before any performance tables are presented.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. The comments identify important gaps in methodological transparency that we will address in the revision. Below we respond point-by-point to the major comments.
read point-by-point responses
-
Referee: [Dataset and Feature Extraction] The manuscript provides no description of missing-data handling for CSF and PET biomarkers. Given that ADNI/TADPOLE data typically exhibit high missingness rates in these modalities, the absence of details on imputation, complete-case selection, or feature-wise missingness patterns renders the N=50 performance claims vulnerable to preprocessing artifacts that may interact differently with TabPFN's prior than with tree-based baselines.
Authors: We agree that the current version of the manuscript omits explicit details on missing-data handling, which is a valid concern given the known missingness patterns in ADNI-derived datasets. We will add a dedicated preprocessing subsection in the Methods that reports feature-wise missingness rates, the imputation approach employed (consistent across all models), and whether complete-case analysis was used for any modality. This addition will allow readers to assess whether preprocessing choices could differentially affect TabPFN versus the baselines. revision: yes
-
Referee: [Experimental Evaluation] The experimental section supplies no information on the cross-validation procedure, exact train-test split strategy, hyperparameter search, or statistical testing used to support the AUC comparisons. Without these, the headline result (TabPFN AUC 0.892 vs. LightGBM 0.860) cannot be independently verified or attributed unambiguously to model properties.
Authors: We acknowledge that the experimental protocol is under-specified in the present manuscript, limiting independent verification. In the revised version we will expand the Experimental Evaluation section to describe the cross-validation scheme (stratified k-fold), the train-test partitioning procedure (including subject-level constraints to avoid leakage), the hyperparameter search strategy for each baseline, and the statistical tests or confidence-interval methods used for the reported AUC differences. These clarifications will strengthen the attribution of performance gains to model characteristics rather than experimental choices. revision: yes
Circularity Check
No circularity: direct empirical model comparison on external dataset
full rationale
The manuscript reports standard supervised learning experiments that train models on subsets of the TADPOLE/ADNI cohort and evaluate AUC on held-out test subjects. All reported numbers (AUC=0.892 for TabPFN, AUC=0.860 for LightGBM, performance at N=50) are computed directly from the data splits and model outputs; none are obtained by fitting a parameter to the target metric and then relabeling it as a prediction. No equations, uniqueness theorems, or ansatzes are introduced, and the central claim rests on external baselines rather than self-citation chains. The evaluation is therefore self-contained against the public dataset and does not reduce to any of the enumerated circular patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Standard machine-learning assumptions of independent and identically distributed train-test splits and appropriate use of AUC as a performance metric hold for this medical prediction task.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
TabPFN achieved one the highest performance (AUC=0.892), outperforming LightGBM (AUC=0.860) ... At N=50 training samples, TabPFN maintained strong AUC
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Missing values were imputed using median imputation based on training set statistics
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
INTRODUCTION The number of Americans living with Alzheimer’s Disease (AD) is projected to reach 13.8 million by 2060[1], under- scoring the urgent need for improved early detection and in- tervention strategies. Machine learning models show promise for predicting disease progression, yet their development faces a fundamental challenge: limited high-qualit...
work page 2060
-
[2]
MATERIALS AND EXPERIMENTS 2.1. Dataset and Preprocessing We utilized the TADPOLE dataset, derived from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), a comprehen- sive longitudinal study containing clinical, imaging, and biomarker data from 1,737 participants[4]. The dataset in- cludes multiple visit observations spanning up to 10 years, with mea...
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[3]
XGBoost achieved the highest AUC score of 0.901, followed closely by TabPFN at 0.892
RESULTS AND DISCUSSIONS Figure 1 presents the overall performance of all models on the holdout validation set. XGBoost achieved the highest AUC score of 0.901, followed closely by TabPFN at 0.892. Random Forest achieved 0.888, while LightGBM and Lo- gistic Regression performed comparably at 0.860 and 0.859 respectively. These results indicate that both tu...
-
[4]
CONCLUSIONS This study provides a systematic evaluation of TabPFN, a foundation model for tabular data, for predicting MCI-to-AD conversion using biomarker features from the TADPOLE dataset. Our results demonstrate that foundation models of- fer meaningful advantages in data-limited clinical scenarios while also revealing important practical consideration...
-
[5]
Metrics for multiclass classification: An overview,
“Metrics for multiclass classification: An overview,” 2020
work page 2020
-
[6]
A. Moore and M. Bell, “Xgboost, a novel explainable ai technique, in the prediction of myocardial infarction: A uk biobank cohort study,”Clinical Medicine Insights: Cardiology, vol. 16, pp. 117954682211336, Jan 2022
work page 2022
-
[7]
S. Woerner and C. F. Baumgartner, “Navigating data scarcity using foundation models: A benchmark of few- shot and zero-shot learning approaches in medical imag- ing,”arXiv preprint arXiv:2408.08058, 2024
-
[8]
TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second
N. Hollmann, S. M ¨uller, K. Eggensperger, and F. Hut- ter, “Tabpfn: A transformer that solves small tabu- lar classification problems in a second,”arXiv preprint arXiv:2207.01848, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[9]
Min Ai, Yu Liu, Dan Liu, Chengxi Yan, Xia Wang, and Xun Chen, “Research progress in predicting the conver- sion from mild cognitive impairment to alzheimer’s dis- ease via multimodal mri and artificial intelligence,”Fron- tiers in Neurology, vol. 16, pp. 1596632, 2025
work page 2025
-
[10]
A systematic review of the barriers to the implementation of artificial intelligence in healthcare,
M. I. Ahmed, B. Spooner, J. Isherwood, M. A. Lane, E. Orrock, and A. Dennison, “A systematic review of the barriers to the implementation of artificial intelligence in healthcare,”Cureus, vol. 15, no. 10, 2023
work page 2023
-
[11]
R. V . Marinescu et al., “Tadpole challenge: Accu- rate alzheimer’s disease prediction through crowdsourced forecasting of future data,” inLecture Notes in Computer Science. Springer, 2019, vol. 11843, pp. 1–10
work page 2019
-
[12]
2024 alzheimer’s disease facts and figures,
Alzheimer’s Association, “2024 alzheimer’s disease facts and figures,”Alzheimer’s & Dementia, vol. 20, no. 5, pp. 3708–3821, Apr 2024
work page 2024
-
[13]
F. Aracri, M. G. Bianco, A. Quattrone, and A. Sarica, “Bridging the gap: Missing data imputation methods and their effect on dementia classification performance,” Brain Sciences, vol. 15, no. 6, pp. 639, Jun 2025
work page 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.