Prediction of Alzheimer's Disease Risk Factors from Retinal Images via Deep Learning: Development and Validation of Biologically Relevant Morphological Associations in the UK Biobank
Pith reviewed 2026-05-09 19:48 UTC · model grok-4.3
The pith
Deep learning on retinal images predicts multiple Alzheimer's risk factors and flags eye structures tied to future disease.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Deep learning models trained on 62,876 colored fundus photographs from 44,501 UK Biobank participants predict twelve Alzheimer's-related risk factors with AUROCs from 0.57 to 0.95 for categorical variables and R-squared values from −0.03 to 0.76 for continuous ones, outperforming most morphometry-based machine learning models. Saliency analysis via class activation mapping highlights the optic nerve head and retinal vasculature as the most informative regions; the saliency-derived scores align with measured morphological differences, and several of them differ significantly between incident AD cases (imaged on average 8.55 years before onset) and matched controls.
What carries the argument
Deep learning models with class activation mapping (CAM) applied to colored fundus photographs to generate saliency maps and CAM-Scores that link retinal morphology to AD risk factors.
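As a rough illustration of the mechanism, the original CAM formulation builds a saliency map as a weighted sum of the last convolutional layer's feature maps, using the classifier weights of the target class. The sketch below uses plain Python lists and toy numbers. The `region_score` helper (share of positive saliency that falls inside a region mask) is a hypothetical stand-in: this summary does not state which CAM variant the paper uses or how its CAM-Score is defined.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each H x W) into one H x W saliency map."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * fmap[i][j]
    return cam


def region_score(cam, mask):
    """Hypothetical score: fraction of positive saliency inside a binary mask."""
    total = 0.0
    inside = 0.0
    for cam_row, mask_row in zip(cam, mask):
        for value, in_region in zip(cam_row, mask_row):
            positive = max(value, 0.0)
            total += positive
            if in_region:
                inside += positive
    return inside / total if total > 0 else 0.0


# Toy example: two 2x2 feature maps and a mask marking the left column
# (imagine the mask outlines the optic nerve head).
feature_maps = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 2.0], [0.0, 0.0]],
]
class_weights = [3.0, 0.5]
cam = class_activation_map(feature_maps, class_weights)
mask = [[1, 0], [1, 0]]
score = region_score(cam, mask)
```

A score near 1 would mean almost all positive saliency sits inside the masked structure; whether the paper's CAM-Score behaves this way is an assumption of this sketch.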
If this is right
- DL models outperform most retinal morphometry-based models at predicting the twelve risk factors.
- Saliency consistently identifies the optic nerve head and retinal vessels as the driving regions.
- Several CAM-Scores differ significantly between future AD cases and matched controls, indicating overlap with preclinical retinal changes.
- CFP may capture structural signatures that mirror pathways leading to AD vulnerability.
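The case-control comparison in the list above can be pictured with a paired test. The sketch below runs an exact sign-flip permutation test on matched-pair differences. It assumes 1:1 matching and uses made-up scores; the summary does not say which statistical test the paper applied, so this is one reasonable choice rather than the authors' method.

```python
from itertools import product


def paired_sign_flip_test(case_scores, control_scores):
    """Exact two-sided sign-flip permutation test on matched-pair differences.

    Returns (mean difference, p-value). Enumerates all 2**n sign patterns,
    so it is only practical for a small number of pairs.
    """
    diffs = [case - control for case, control in zip(case_scores, control_scores)]
    n_pairs = len(diffs)
    mean_diff = sum(diffs) / n_pairs
    observed = abs(mean_diff)
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=n_pairs):
        permuted = abs(sum(s * d for s, d in zip(signs, diffs)) / n_pairs)
        total += 1
        # Small tolerance guards against floating-point ties.
        if permuted >= observed - 1e-12:
            extreme += 1
    return mean_diff, extreme / total


# Made-up saliency-derived scores for six matched pairs (illustration only).
case_scores = [0.62, 0.58, 0.71, 0.66, 0.60, 0.69]
control_scores = [0.55, 0.57, 0.60, 0.61, 0.58, 0.63]
mean_diff, p_value = paired_sign_flip_test(case_scores, control_scores)
```

With real cohort sizes, a Monte Carlo version that samples sign patterns would replace the exhaustive enumeration.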
Where Pith is reading between the lines
- If the retinal signatures prove stable across populations, eye photography could serve as an added low-cost input for population-level AD risk screening.
- Longitudinal studies could test whether changes in these CAM-Scores track with lifestyle or metabolic interventions known to modify AD risk.
- Combining retinal predictions with blood biomarkers might improve early identification of individuals who would benefit from closer monitoring.
Load-bearing premise
The saliency maps and CAM-Scores reflect genuine biological links to AD risk rather than dataset-specific artifacts or spurious correlations.
What would settle it
An external validation set where the same models show chance-level prediction accuracy or produce saliency maps that no longer emphasize the optic nerve head and vasculature while still matching the original performance numbers.
Systemic, metabolic, and lifestyle factors have established associations with Alzheimer's Disease (AD) through epidemiologic and AD-specific biomarker studies. Whether colored fundus photography (CFP) contains retinal structural signatures corresponding to these AD-related risk domains remains unclear. To determine whether deep learning (DL) models can predict 12 AD-related risk factors from CFP and to characterize the retinal structures underlying these predictions, thereby assessing whether CFP reflects pathways to AD vulnerability. Using UK Biobank CFPs, DL models were trained using 62,876 images from 44,501 unique participants to predict 12 factors linked to AD incidence: 6 categorical (sex, smoking, sleeplessness, economic status, alcohol use, depression) and 6 continuous (age, age at completing education, BMI, systolic and diastolic blood pressure, HbA1c). Model performance, model saliency, and saliency-derived scores (CAM-Score) were evaluated and compared to retinal morphometry. The scores were also compared between incident-AD cases (average 8.55 years before onset) and matched controls. Performance of DL ranged from AUROC = 0.5654 to 0.9480 for categorical and R² = −0.0291 to 0.7620 for continuous factors, outperforming most of the morphometry-machine learning models. Saliency-based scores consistently highlighted biologically meaningful regions, particularly the optic nerve head and retinal vasculature. They also aligned with present morphometric variations. Several saliency-based scores differed significantly between incident AD and matched controls, suggesting potential overlap between retinal correlates of risk factors and preclinical AD-associated changes. CFP encodes retinal signatures linked to AD risk factors. Although not diagnostic, DL-derived retinal representations may uncover biologically meaningful risk-related structural changes mirroring the potential AD vulnerability.
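For readers less familiar with the two metrics quoted in the abstract, the sketch below computes both from scratch on toy data. It shows why an AUROC of 0.5 means chance-level ranking and why R² falls below zero when predictions are worse than always guessing the mean. All numbers are invented for illustration and have no connection to the paper's data.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Categorical target: three of the four positive/negative pairs are ranked correctly.
auc_informative = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
# A constant score carries no ranking information.
auc_constant = auroc([0, 0, 1, 1], [0.5, 0.5, 0.5, 0.5])

# Continuous target: predicting the mean gives R² of zero,
# and predictions worse than the mean give a negative R².
y_true = [1.0, 2.0, 3.0, 4.0]
r2_mean_baseline = r_squared(y_true, [2.5, 2.5, 2.5, 2.5])
r2_poor_model = r_squared(y_true, [3.0, 2.0, 3.0, 2.0])
```

A slightly negative value such as the reported −0.0291 therefore indicates a model that is marginally worse than the mean predictor on the evaluation set.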
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript trains deep learning models on 62,876 UK Biobank colored fundus photographs from 44,501 participants to predict 12 AD-linked risk factors (6 categorical: sex, smoking, sleeplessness, economic status, alcohol use, depression; 6 continuous: age, education completion age, BMI, systolic/diastolic BP, HbA1c). It reports AUROC (0.5654–0.9480) and R² (−0.0291–0.7620) performance, applies post-hoc class activation mapping (CAM) saliency to highlight retinal structures (optic nerve head, vasculature), compares CAM-derived scores to traditional morphometry, and tests score differences between incident AD cases (mean 8.55 years pre-onset) and matched controls. The central claim is that CFP encodes biologically relevant retinal morphological signatures associated with AD risk factors and preclinical vulnerability.
Significance. If the saliency-derived associations prove robust to artifacts and confounders, the work would meaningfully advance non-invasive retinal imaging as a tool for identifying systemic AD risk pathways, bridging DL-based prediction with morphometric and epidemiological evidence. Strengths include the large single-cohort scale, direct comparison of DL to morphometry baselines, and extension to incident cases; these elements could support future multimodal risk models if performance heterogeneity and validation gaps are addressed.
major comments (3)
- [Abstract / Results] Abstract and Results: Reported R² values include negative entries (down to −0.0291) for continuous factors, meaning the DL model underperforms a simple mean baseline for at least one target. This heterogeneity directly weakens the claim that CFP encodes predictive retinal signatures across the 12 factors, as the central assertion of biologically relevant morphological associations presupposes consistent outperformance over trivial predictors.
- [Saliency / CAM-Score evaluation] Saliency analysis (CAM-Score derivation): The highlighted regions (optic disc, retinal vasculature) rest on post-hoc CAM without reported robustness tests such as occlusion/perturbation analysis, randomized input controls, or external cohort validation. Given the single UK Biobank source, this leaves open that saliency reflects acquisition artifacts, demographic imbalances, or unmeasured confounders rather than causal morphological links to AD risk, undermining the interpretation of alignment with morphometry and incident-case differences.
- [Methods] Methods: The manuscript provides limited detail on cross-validation procedure, hyperparameter selection, confounder adjustment (e.g., explicit modeling of age/sex/imaging quality), and handling of class imbalance or missing labels. These omissions are load-bearing because UK Biobank CFPs are known to contain systematic biases that could drive both the heterogeneous performance and the saliency maps.
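One of the robustness checks named in the second major comment, occlusion sensitivity, can be sketched in a few lines: mask one patch of the image at a time and record how much the model output drops. The `toy_predict` function below is a hypothetical stand-in for a trained network, and the patch size and fill value are arbitrary choices for this illustration.

```python
def occlusion_sensitivity(image, predict, patch=2, fill=0.0):
    """Drop in model output when each non-overlapping patch is replaced by `fill`."""
    height, width = len(image), len(image[0])
    baseline = predict(image)
    heatmap = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            occluded = [row[:] for row in image]  # copy so the input is untouched
            for i in rows:
                for j in cols:
                    occluded[i][j] = fill
            drop = baseline - predict(occluded)
            for i in rows:
                for j in cols:
                    heatmap[i][j] = drop
    return heatmap


def toy_predict(image):
    """Hypothetical model that only 'looks at' the top-left 2x2 block."""
    return sum(image[i][j] for i in range(2) for j in range(2)) / 4.0


image = [[1.0] * 4 for _ in range(4)]
heatmap = occlusion_sensitivity(image, toy_predict)
```

If a CAM map and an occlusion map of this kind disagree about which regions matter, that disagreement is evidence that the CAM highlight may not reflect what the model actually relies on.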
minor comments (2)
- [Figures] Figure captions for CAM visualizations should explicitly state the number of images averaged, the exact CAM variant used, and any thresholding applied to produce the reported CAM-Scores.
- [Abstract / Results] The abstract states 'outperforming most of the morphometry-machine learning models' without quantifying the margin or listing the exact morphometric features and ML baselines used; this comparison should be expanded in the main text with effect sizes.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback, which has helped us identify areas for clarification and improvement. We address each major comment point by point below, providing the strongest honest defense of the manuscript while committing to revisions where they strengthen the work without misrepresentation.
- Referee: [Abstract / Results] Abstract and Results: Reported R² values include negative entries (down to −0.0291) for continuous factors, meaning the DL model underperforms a simple mean baseline for at least one target. This heterogeneity directly weakens the claim that CFP encodes predictive retinal signatures across the 12 factors, as the central assertion of biologically relevant morphological associations presupposes consistent outperformance over trivial predictors.
  Authors: We acknowledge the negative R² values and their implication for a subset of continuous targets. These values are reported transparently to reflect the variable strength of retinal associations with different AD risk factors, which is biologically expected given the diverse pathways involved (e.g., age is strongly encoded in retinal images while certain blood pressure components may be less so). Our central claim is not uniform predictability across all 12 factors but that CFP encodes biologically relevant morphological signatures for those factors where predictive signal exists, as supported by consistent outperformance over morphometric baselines in the majority of cases and alignment with incident AD differences. We will add explicit discussion of this performance heterogeneity and its implications in the revised Results and Discussion sections to avoid overgeneralization.
  revision: partial
- Referee: [Saliency / CAM-Score evaluation] Saliency analysis (CAM-Score derivation): The highlighted regions (optic disc, retinal vasculature) rest on post-hoc CAM without reported robustness tests such as occlusion/perturbation analysis, randomized input controls, or external cohort validation. Given the single UK Biobank source, this leaves open that saliency reflects acquisition artifacts, demographic imbalances, or unmeasured confounders rather than causal morphological links to AD risk, undermining the interpretation of alignment with morphometry and incident-case differences.
  Authors: We agree that post-hoc CAM benefits from additional robustness checks. The current analysis is grounded by quantitative alignment between CAM-derived scores and independent retinal morphometry measures, plus statistically significant differences in these scores between incident AD cases (mean 8.55 years pre-onset) and matched controls. To further address potential artifacts or confounders, we will incorporate occlusion sensitivity and input perturbation analyses in the revision. While external cohort validation is not feasible within the current single-cohort design, the large sample size, internal consistency checks, and preclinical AD comparison provide convergent evidence for biological relevance rather than pure artifact.
  revision: yes
- Referee: [Methods] Methods: The manuscript provides limited detail on cross-validation procedure, hyperparameter selection, confounder adjustment (e.g., explicit modeling of age/sex/imaging quality), and handling of class imbalance or missing labels. These omissions are load-bearing because UK Biobank CFPs are known to contain systematic biases that could drive both the heterogeneous performance and the saliency maps.
  Authors: We accept that expanded methodological transparency is required for reproducibility and to mitigate concerns about UK Biobank-specific biases. In the revised manuscript, we will detail the participant-level cross-validation strategy (to prevent leakage), hyperparameter selection process, explicit confounder modeling (including age, sex, and image quality metrics as covariates or stratification factors), class imbalance handling via weighted losses or oversampling, and missing label strategies. These elements were implemented in the original analysis but will now be fully documented with pseudocode or supplementary tables.
  revision: yes
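The participant-level cross-validation mentioned in the last response matters because 62,876 images come from 44,501 participants, so some people contribute more than one image. A minimal sketch of a grouped fold assignment follows. The function name, fold count, and seed are assumptions of this illustration, not details taken from the paper.

```python
import random
from collections import defaultdict


def participant_level_folds(participant_ids, n_folds=5, seed=0):
    """Assign each image to a fold so that all images of one participant share it."""
    unique_ids = sorted(set(participant_ids))
    random.Random(seed).shuffle(unique_ids)
    # Round-robin over shuffled participants keeps fold sizes roughly balanced.
    fold_of = {pid: index % n_folds for index, pid in enumerate(unique_ids)}
    return [fold_of[pid] for pid in participant_ids]


# Toy data: six images from four hypothetical participants
# (some contribute two eyes or two visits).
image_participants = ["p1", "p1", "p2", "p3", "p3", "p4"]
folds = participant_level_folds(image_participants, n_folds=2)

# Leakage check: every participant must appear in exactly one fold.
folds_per_participant = defaultdict(set)
for pid, fold in zip(image_participants, folds):
    folds_per_participant[pid].add(fold)
no_leakage = all(len(fold_set) == 1 for fold_set in folds_per_participant.values())
```

Splitting at the image level instead would let the two eyes of one person land on both sides of the split, which tends to inflate held-out performance.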
Circularity Check
No significant circularity; derivation is self-contained
The paper trains supervised DL models on UK Biobank CFP images to predict 12 independent AD risk-factor labels, evaluates them on held-out participants, then applies post-hoc saliency (CAM) and compares derived scores to separate morphometric features and to incident-AD cases. None of these steps reduces a claimed result to its own inputs by construction: the risk-factor predictions are not redefined from the saliency maps, the CAM-Scores are interpretive outputs rather than fitted parameters renamed as predictions, and the incident-case comparison uses an external temporal split. No self-citation chain or uniqueness theorem is invoked to force the central claim. The analysis therefore remains non-circular.
Axiom & Free-Parameter Ledger
free parameters (1)
- Neural network weights and hyperparameters
axioms (1)
- domain assumption: Saliency maps from DL models on retinal images reflect biologically meaningful structures rather than artifacts.