arxiv: 2605.00665 · v1 · submitted 2026-05-01 · 💻 cs.CV

Recognition: unknown

Prediction of Alzheimer's Disease Risk Factors from Retinal Images via Deep Learning: Development and Validation of Biologically Relevant Morphological Associations in the UK Biobank

Seowung Leem, Yunchao Yang, Adam J. Woods, Ruogu Fang


Pith reviewed 2026-05-09 19:48 UTC · model grok-4.3

classification 💻 cs.CV

keywords Alzheimer's disease · retinal imaging · deep learning · risk factors · fundus photography · UK Biobank · saliency maps · preclinical changes


The pith

Deep learning on retinal images predicts multiple Alzheimer's risk factors and flags eye structures tied to future disease.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether colored fundus photographs (CFP) contain visible signatures of Alzheimer's disease (AD) risk factors by training deep learning models on UK Biobank images. The models predict twelve factors spanning demographics, lifestyle, and metabolic measures with widely varying accuracy, and for most factors they outperform models built on standard retinal morphometry. Attention maps consistently point to the optic nerve head and blood vessels, and several of the derived scores differ between people who later develop AD and matched controls. The work checks whether routine eye photos can surface structural changes that mirror known pathways to brain vulnerability years before diagnosis. If the associations hold, retinal imaging could add a non-invasive layer to risk assessment without replacing existing tests.

Core claim

Deep learning models trained on 62,876 colored fundus photographs from 44,501 UK Biobank participants predict twelve Alzheimer's-related risk factors. AUROCs range from 0.57 to 0.95 for categorical variables, and R² values range from −0.03 to 0.76 for continuous ones, outperforming most morphometry-based models. Saliency analysis via class activation mapping (CAM) highlights the optic nerve head and retinal vasculature as the most informative regions. These saliency-derived scores (CAM-Scores) align with measured morphological differences, and several show statistically significant separation between incident AD cases (on average 8.55 years before onset) and matched controls.
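For reference, AUROC (the metric behind the categorical results) equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. An AUROC of 0.5 is chance and 1.0 is perfect ranking. The minimal Python sketch below computes it directly from that definition on invented labels and scores. It is not the paper's evaluation code. Its O(n²) pair loop is for illustration only; a rank-based or library implementation would be used at UK Biobank scale.

```python
def auroc(labels, scores):
    """AUROC as P(score of a random positive > score of a random negative).

    labels: 0/1 class labels; scores: model outputs (higher = more positive).
    A tie between a positive and a negative counts as half a win, which is
    the Mann-Whitney U formulation of the area under the ROC curve.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented labels and scores for illustration, not data from the paper.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
print(auroc(labels, scores))  # 8 of 9 positive-negative pairs ranked correctly
```

Read this way, the paper's lowest categorical AUROC (0.5654) means the model ranks a random positive above a random negative only about 57% of the time.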

What carries the argument

Deep learning models with class activation mapping (CAM) applied to colored fundus photographs to generate saliency maps and CAM-Scores that link retinal morphology to AD risk factors.
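The page does not say which CAM variant the paper uses. As an illustration, the sketch below assumes the original CAM formulation. In that formulation, a network ending in global average pooling followed by a linear layer yields a map M(i, j) = Σ_k w_k · f_k(i, j) for one output. The feature maps and weights are toy values. In practice the map is upsampled to fundus-image resolution before being overlaid.

```python
def class_activation_map(feature_maps, class_weights):
    """Original CAM: weighted sum of the last conv layer's feature maps.

    feature_maps: list of K maps f_k, each an H x W list of lists.
    class_weights: list of K weights w_k linking the globally average-pooled
    features to one output (a class logit or a regression head).
    Returns an H x W map M with M[i][j] = sum_k w_k * f_k[i][j].
    """
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, wk in zip(feature_maps, class_weights):
        for i in range(h):
            for j in range(w):
                cam[i][j] += wk * fmap[i][j]
    return cam


def normalize(cam):
    """Rescale to [0, 1] for display; a constant map becomes all zeros."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [[0.0] * len(row) for row in cam]
    return [[(v - lo) / span for v in row] for row in cam]


# Two toy 2 x 2 feature maps and invented weights (not from the paper).
f1 = [[1.0, 0.0], [0.0, 0.0]]
f2 = [[0.0, 0.0], [0.0, 2.0]]
cam = class_activation_map([f1, f2], [0.5, 1.0])
print(cam)             # [[0.5, 0.0], [0.0, 2.0]]
print(normalize(cam))  # [[0.25, 0.0], [0.0, 1.0]]
```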

If this is right

DL models outperform retinal morphometry for most of the twelve risk factors.
Saliency consistently identifies the optic nerve head and retinal vessels as the driving regions.
Several CAM-Scores differ between future AD cases and controls, indicating possible overlap with preclinical retinal changes.
CFP may capture structural signatures that mirror pathways leading to AD vulnerability.
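How a CAM-Score is computed is not specified here. One hypothetical way to turn a saliency map into a per-structure score is to measure how much of its positive mass falls inside a segmentation mask, for example one for the optic nerve head. The function name `region_saliency_fraction` and the mask input are assumptions for illustration, not the paper's definition.

```python
def region_saliency_fraction(cam, mask):
    """Hypothetical region score: share of positive CAM mass inside a mask.

    cam:  H x W saliency map (e.g. a normalized CAM).
    mask: H x W map of 0/1 flags marking a structure such as the optic
          nerve head or vessels (assumed to come from a separate
          segmentation step).
    Negative saliency is clipped to zero before summing.
    """
    total = 0.0
    inside = 0.0
    for cam_row, mask_row in zip(cam, mask):
        for v, m in zip(cam_row, mask_row):
            v = max(v, 0.0)
            total += v
            if m:
                inside += v
    return inside / total if total > 0 else 0.0


# Toy saliency map and a made-up "disc" mask covering 2 of 9 pixels.
cam = [[0.5, 0.125, 0.0],
       [0.25, 0.125, 0.0],
       [0.0, 0.0, 0.0]]
disc_mask = [[1, 0, 0],
             [1, 0, 0],
             [0, 0, 0]]
print(region_saliency_fraction(cam, disc_mask))  # 0.75
```

A spatially uniform map would score the mask's area fraction (here 2/9, about 0.22). Comparing against that area share is therefore a simple way to tell whether saliency is genuinely concentrated on a structure.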

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the retinal signatures prove stable across populations, eye photography could serve as an added low-cost input for population-level AD risk screening.
Longitudinal studies could test whether changes in these CAM-Scores track with lifestyle or metabolic interventions known to modify AD risk.
Combining retinal predictions with blood biomarkers might improve early identification of individuals who would benefit from closer monitoring.

Load-bearing premise

The saliency maps and CAM-Scores reflect genuine biological links to AD risk rather than dataset-specific artifacts or spurious correlations.
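Occlusion sensitivity is one check on this premise that does not depend on the network's internal weights. It masks one patch of the image at a time and records how much the model's output drops. If the optic nerve head truly drives a prediction, occluding it should cause the largest drop. The sketch below uses a toy single-channel image and a stand-in `corner_model`, both hypothetical. The patch size and the zero fill value are assumptions; masking with the mean fundus intensity or a blurred patch are common alternatives.

```python
def occlusion_sensitivity(image, model, patch=2, fill=0.0):
    """Occlusion map: output drop when each patch x patch block is masked.

    image: H x W list of lists of floats (one channel, for simplicity).
    model: any callable mapping an image to a scalar score.
    Returns an H x W map where every pixel in a block holds the score drop
    (baseline score minus occluded score) for that block. Edge blocks may
    be smaller than patch x patch.
    """
    h, w = len(image), len(image[0])
    base = model(image)
    heat = [[0.0] * w for _ in range(h)]
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            occluded = [row[:] for row in image]  # copy so the input is untouched
            for i in range(top, min(top + patch, h)):
                for j in range(left, min(left + patch, w)):
                    occluded[i][j] = fill
            drop = base - model(occluded)
            for i in range(top, min(top + patch, h)):
                for j in range(left, min(left + patch, w)):
                    heat[i][j] = drop
    return heat


# Hypothetical toy "model" that only looks at the top-left 2 x 2 corner.
def corner_model(img):
    return img[0][0] + img[0][1] + img[1][0] + img[1][1]


image = [[1.0] * 4 for _ in range(4)]
heat = occlusion_sensitivity(image, corner_model)
print(heat)  # only the top-left block shows a drop (4.0)
```

Agreement between occlusion maps and CAM maps on the same images would support the CAM reading. Disagreement would suggest the CAM highlights are not what the model actually relies on.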

What would settle it

An external validation cohort in which the same models fall to chance-level prediction. Alternatively, a cohort in which they match the original performance but their saliency maps no longer emphasize the optic nerve head and vasculature.

Figures

Figures reproduced from arXiv: 2605.00665 by Adam J. Woods, Ruogu Fang, Seowung Leem, Yunchao Yang.

Original abstract

Systemic, metabolic, and lifestyle factors have established associations with Alzheimer's Disease (AD) through epidemiologic and AD-specific biomarker studies. Whether colored fundus photography (CFP) contains retinal structural signatures corresponding to these AD-related risk domains remains unclear. This study aimed to determine whether deep learning (DL) models can predict 12 AD-related risk factors from CFP and to characterize the retinal structures underlying these predictions, thereby assessing whether CFP reflects pathways to AD vulnerability. Using UK Biobank CFPs, DL models were trained on 62,876 images from 44,501 unique participants to predict 12 factors linked to AD incidence: 6 categorical (sex, smoking, sleeplessness, economic status, alcohol use, depression) and 6 continuous (age, age at completing education, BMI, systolic and diastolic blood pressure, HbA1c). Model performance, model saliency, and saliency-derived scores (CAM-Score) were evaluated and compared to retinal morphometry. The scores were also compared between incident-AD cases (average 8.55 years before onset) and matched controls. DL performance ranged from AUROC = 0.5654 to 0.9480 for categorical factors and from R² = −0.0291 to 0.7620 for continuous factors, outperforming most of the morphometry-machine learning models. Saliency-based scores consistently highlighted biologically meaningful regions, particularly the optic nerve head and retinal vasculature, and aligned with the morphometric variations present. Several saliency-based scores differed significantly between incident AD cases and matched controls, suggesting potential overlap between retinal correlates of risk factors and preclinical AD-associated changes. CFP encodes retinal signatures linked to AD risk factors. Although not diagnostic, DL-derived retinal representations may uncover biologically meaningful risk-related structural changes mirroring potential AD vulnerability.
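The abstract does not say which statistical test was used for the case-control comparison. For matched designs, one possibility is a sign-flip permutation test. Under the null, each case-minus-control difference in a score is treated as equally likely to be positive or negative. The sketch below implements that test with the standard library on invented scores. `paired_permutation_pvalue` is a hypothetical helper, and a paired t-test or Wilcoxon signed-rank test would be common alternatives.

```python
import random


def paired_permutation_pvalue(case_scores, control_scores, n_perm=10000, seed=0):
    """Two-sided sign-flip permutation test on matched case-control pairs.

    Under the null of no case-control difference, each within-pair
    difference is equally likely to have either sign. The p-value is the
    share of sign-flipped mean differences at least as extreme as the
    observed one, with a +1 correction so it is never exactly zero.
    """
    diffs = [c - k for c, k in zip(case_scores, control_scores)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(flipped) >= observed - 1e-12:  # small tolerance for float ties
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Invented CAM-Score-like values for 8 matched pairs (not paper data).
cases = [0.62, 0.58, 0.71, 0.66, 0.60, 0.69, 0.64, 0.57]
controls = [0.55, 0.57, 0.63, 0.60, 0.61, 0.62, 0.58, 0.52]
print(paired_permutation_pvalue(cases, controls))
```

When many scores are tested this way, a multiple-comparison correction would normally be applied as well.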

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

DL models predict some AD risk factors from retinal images with uneven success, but saliency maps do not convincingly establish biological links over artifacts or confounders.


The paper's main finding has three parts. Deep learning models trained on UK Biobank retinal images can predict several Alzheimer's risk factors with moderate accuracy. Saliency maps point to the optic disc and vessels as key areas. Some saliency-derived scores differ in future AD cases.

What stands out as new is the breadth: predicting 12 factors, including lifestyle and metabolic ones, and then deriving CAM-Scores to check biological plausibility against morphometry. The large cohort and the temporal check against incident AD add value over prior single-factor studies. The paper does well on scale and in trying to make the predictions interpretable rather than leaving them as a black box. Outperforming morphometry baselines for most tasks is a positive sign.

The soft spots are clear in the numbers. Performance is inconsistent, with AUROCs as low as 0.57 and a negative R² at the low end of the continuous factors. This undercuts the idea that retinal images broadly encode these risks. The saliency approach is post-hoc and could capture imaging artifacts or unadjusted confounders instead of true morphological links to AD. Without external validation or more rigorous confounder analysis, the biological relevance claim remains tentative.

This work is for researchers exploring non-invasive biomarkers for AD risk. A reader focused on practical screening tools might find the approach useful to build on, but anyone expecting strong causal evidence will be disappointed. I would recommend sending it for peer review. The dataset is substantial and the question is timely, so referees can help tighten the validation and interpretation.
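Why a negative R² matters: R² = 1 − SS_res / SS_tot, where SS_tot is the squared error of always predicting the sample mean. A value below zero therefore means the model's predictions are further from the truth than that constant guess. The short Python sketch below shows this on invented numbers. It is illustrative only and does not reproduce any value from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    SS_tot is the squared error of always predicting the mean of y_true,
    so R^2 < 0 means the model does worse than that constant baseline.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical continuous target (e.g. a blood-pressure-like value).
y = [120.0, 130.0, 140.0, 150.0]
mean_baseline = [135.0] * 4                # constant prediction at the mean
weak_model = [138.0, 125.0, 145.0, 132.0]  # noisier than the mean guess
print(r_squared(y, mean_baseline))  # 0.0
print(r_squared(y, weak_model))     # negative: worse than the mean baseline
```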

Referee Report

3 major / 2 minor

Summary. The manuscript trains deep learning models on 62,876 UK Biobank colored fundus photographs from 44,501 participants to predict 12 AD-linked risk factors (6 categorical: sex, smoking, sleeplessness, economic status, alcohol use, depression; 6 continuous: age, education completion age, BMI, systolic/diastolic BP, HbA1c). It reports AUROC (0.5654–0.9480) and R² (−0.0291–0.7620) performance, applies post-hoc class activation mapping (CAM) saliency to highlight retinal structures (optic nerve head, vasculature), compares CAM-derived scores to traditional morphometry, and tests score differences between incident AD cases (mean 8.55 years pre-onset) and matched controls. The central claim is that CFP encodes biologically relevant retinal morphological signatures associated with AD risk factors and preclinical vulnerability.

Significance. If the saliency-derived associations prove robust to artifacts and confounders, the work would meaningfully advance non-invasive retinal imaging as a tool for identifying systemic AD risk pathways, bridging DL-based prediction with morphometric and epidemiological evidence. Strengths include the large single-cohort scale, direct comparison of DL to morphometry baselines, and extension to incident cases; these elements could support future multimodal risk models if performance heterogeneity and validation gaps are addressed.

major comments (3)

[Abstract / Results] Abstract and Results: Reported R² values include a negative entry (−0.0291 at the low end) for the continuous factors, meaning the DL model underperforms a simple mean baseline for at least one target. This heterogeneity directly weakens the claim that CFP encodes predictive retinal signatures across the 12 factors, as the central assertion of biologically relevant morphological associations presupposes consistent outperformance over trivial predictors.
[Saliency / CAM-Score evaluation] Saliency analysis (CAM-Score derivation): The highlighted regions (optic disc, retinal vasculature) rest on post-hoc CAM without reported robustness tests such as occlusion/perturbation analysis, randomized input controls, or external cohort validation. Given the single UK Biobank source, this leaves open that saliency reflects acquisition artifacts, demographic imbalances, or unmeasured confounders rather than causal morphological links to AD risk, undermining the interpretation of alignment with morphometry and incident-case differences.
[Methods] Methods: The manuscript provides limited detail on cross-validation procedure, hyperparameter selection, confounder adjustment (e.g., explicit modeling of age/sex/imaging quality), and handling of class imbalance or missing labels. These omissions are load-bearing because UK Biobank CFPs are known to contain systematic biases that could drive both the heterogeneous performance and the saliency maps.
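The 62,876 images come from 44,501 participants, so some people contribute more than one image (for example, left and right eyes). An image-level split could therefore place the same person in both training and test folds. Below is a minimal sketch of a participant-level split. It assumes parallel lists of image and participant IDs, and the helper name `participant_folds` is hypothetical. scikit-learn's `GroupKFold` and `StratifiedGroupKFold` provide maintained versions of the same idea.

```python
import random


def participant_folds(image_ids, participant_ids, k=5, seed=0):
    """Assign images to k folds so that no participant spans two folds.

    image_ids and participant_ids are parallel lists; several images may
    share one participant. Returns a dict fold_index -> list of image ids.
    """
    unique = sorted(set(participant_ids))  # sort first so shuffling is reproducible
    rng = random.Random(seed)
    rng.shuffle(unique)
    fold_of = {pid: i % k for i, pid in enumerate(unique)}
    folds = {i: [] for i in range(k)}
    for img, pid in zip(image_ids, participant_ids):
        folds[fold_of[pid]].append(img)
    return folds


# Invented IDs: p1 and p3 each contribute two images.
images = ["p1_L", "p1_R", "p2_L", "p3_L", "p3_R", "p4_L"]
people = ["p1", "p1", "p2", "p3", "p3", "p4"]
folds = participant_folds(images, people, k=2)
print(folds)
```

This version balances participants, not images or labels, across folds. Stratifying by the target as well is a natural refinement for imbalanced factors.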

minor comments (2)

[Figures] Figure captions for CAM visualizations should explicitly state the number of images averaged, the exact CAM variant used, and any thresholding applied to produce the reported CAM-Scores.
[Abstract / Results] The abstract states 'outperforming most of the morphometry-machine learning models' without quantifying the margin or listing the exact morphometric features and ML baselines used; this comparison should be expanded in the main text with effect sizes.
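A paired bootstrap on the shared test set is one way to attach an effect size and uncertainty to a "DL versus morphometry" comparison. It resamples examples with replacement, recomputes both AUROCs on the same resample, and reports percentiles of the difference. The sketch below is an assumed approach, not one the paper reports. `bootstrap_auroc_difference` is a hypothetical helper, and the percentile interval is the simplest of several bootstrap interval types.

```python
import random


def auroc(labels, scores):
    """Mann-Whitney AUROC; ties between classes count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_difference(labels, scores_a, scores_b, n_boot=2000, seed=0):
    """Paired bootstrap of AUROC(a) - AUROC(b) on one shared test set.

    Resamples examples (the same indices for both models) with replacement
    and returns (point estimate, 2.5th percentile, 97.5th percentile).
    Resamples that lose one of the two classes are skipped.
    """
    n = len(labels)
    rng = random.Random(seed)
    point = auroc(labels, scores_a) - auroc(labels, scores_b)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 not in ys or 1 not in ys:
            continue
        a = auroc(ys, [scores_a[i] for i in idx])
        b = auroc(ys, [scores_b[i] for i in idx])
        diffs.append(a - b)
    diffs.sort()
    lo = diffs[int(0.025 * n_boot)]
    hi = diffs[int(0.975 * n_boot) - 1]
    return point, lo, hi


# Invented scores for a DL model and a morphometry baseline.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
dl_scores = [0.2, 0.4, 0.3, 0.6, 0.7, 0.5, 0.9, 0.8]
morph_scores = [0.3, 0.5, 0.6, 0.4, 0.55, 0.45, 0.7, 0.65]
print(bootstrap_auroc_difference(labels, dl_scores, morph_scores))
```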

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback, which has helped us identify areas for clarification and improvement. We address each major comment point by point below, providing the strongest honest defense of the manuscript while committing to revisions where they strengthen the work without misrepresentation.


Referee: [Abstract / Results] Abstract and Results: Reported R² values include a negative entry (−0.0291 at the low end) for the continuous factors, meaning the DL model underperforms a simple mean baseline for at least one target. This heterogeneity directly weakens the claim that CFP encodes predictive retinal signatures across the 12 factors, as the central assertion of biologically relevant morphological associations presupposes consistent outperformance over trivial predictors.

Authors: We acknowledge the negative R² values and their implication for a subset of continuous targets. These values are reported transparently to reflect the variable strength of retinal associations with different AD risk factors, which is biologically expected given the diverse pathways involved (e.g., age is strongly encoded in retinal images while certain blood pressure components may be less so). Our central claim is not uniform predictability across all 12 factors but that CFP encodes biologically relevant morphological signatures for those factors where predictive signal exists, as supported by consistent outperformance over morphometric baselines in the majority of cases and alignment with incident AD differences. We will add explicit discussion of this performance heterogeneity and its implications in the revised Results and Discussion sections to avoid overgeneralization. revision: partial
Referee: [Saliency / CAM-Score evaluation] Saliency analysis (CAM-Score derivation): The highlighted regions (optic disc, retinal vasculature) rest on post-hoc CAM without reported robustness tests such as occlusion/perturbation analysis, randomized input controls, or external cohort validation. Given the single UK Biobank source, this leaves open that saliency reflects acquisition artifacts, demographic imbalances, or unmeasured confounders rather than causal morphological links to AD risk, undermining the interpretation of alignment with morphometry and incident-case differences.

Authors: We agree that post-hoc CAM benefits from additional robustness checks. The current analysis is grounded by quantitative alignment between CAM-derived scores and independent retinal morphometry measures, plus statistically significant differences in these scores between incident AD cases (mean 8.55 years pre-onset) and matched controls. To further address potential artifacts or confounders, we will incorporate occlusion sensitivity and input perturbation analyses in the revision. While external cohort validation is not feasible within the current single-cohort design, the large sample size, internal consistency checks, and preclinical AD comparison provide convergent evidence for biological relevance rather than pure artifact. revision: yes
Referee: [Methods] Methods: The manuscript provides limited detail on cross-validation procedure, hyperparameter selection, confounder adjustment (e.g., explicit modeling of age/sex/imaging quality), and handling of class imbalance or missing labels. These omissions are load-bearing because UK Biobank CFPs are known to contain systematic biases that could drive both the heterogeneous performance and the saliency maps.

Authors: We accept that expanded methodological transparency is required for reproducibility and to mitigate concerns about UK Biobank-specific biases. In the revised manuscript, we will detail the participant-level cross-validation strategy (to prevent leakage), hyperparameter selection process, explicit confounder modeling (including age, sex, and image quality metrics as covariates or stratification factors), class imbalance handling via weighted losses or oversampling, and missing label strategies. These elements were implemented in the original analysis but will now be fully documented with pseudocode or supplementary tables. revision: yes
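The rebuttal mentions weighted losses as one way to handle class imbalance. The snippet below sketches that option. It computes inverse-frequency class weights with the formula n / (n_classes · count_c), which is the formula scikit-learn documents for `class_weight="balanced"`, and applies them in a binary cross-entropy. The label counts are invented. The page does not state whether the paper used this scheme, oversampling, or something else.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weights n / (n_classes * count_c): rarer classes get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_bce(labels, probs, weights, eps=1e-12):
    """Mean binary cross-entropy, each example scaled by its class weight."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * loss
    return total / len(labels)


# Invented example: 8 negatives, 2 positives (e.g. a rare self-reported factor).
labels = [0] * 8 + [1] * 2
w = balanced_class_weights(labels)
print(w)  # {0: 0.625, 1: 2.5}
```

With these weights, each class contributes the same total weight (5.0 here), so the rare class is not drowned out during training.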

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper trains supervised DL models on UK Biobank CFP images to predict 12 separate AD risk-factor labels, evaluates them on held-out participants, then applies post-hoc saliency (CAM) and compares derived scores to separate morphometric features and to incident-AD cases. None of these steps reduces a claimed result to its own inputs by construction. The risk-factor predictions are not redefined from the saliency maps. The CAM-Scores are interpretive outputs rather than fitted parameters renamed as predictions. The incident-case comparison uses AD diagnoses made after imaging, an outcome separate from the training labels. No self-citation chain or uniqueness theorem is invoked to force the central claim. The analysis therefore remains non-circular.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim depends on the assumption that DL models extract biologically meaningful retinal features from CFP and that saliency methods validly interpret those features; many implicit assumptions about data quality and absence of confounding in UK Biobank are not detailed in the abstract.

free parameters (1)

Neural network weights and hyperparameters
Fitted during model training to map images to the 12 risk factor targets.

axioms (1)

domain assumption: Saliency maps from DL models on retinal images reflect biologically meaningful structures rather than artifacts.
Invoked to link model predictions to optic nerve head and vasculature as relevant to AD risk.

pith-pipeline@v0.9.0 · 5645 in / 1373 out tokens · 51831 ms · 2026-05-09T19:48:43.221381+00:00 · methodology

