arxiv: 2606.20037 · v1 · pith:DLQWOI6Nnew · submitted 2026-06-18 · 💻 cs.LG

Alzheimer's Disease Diagnosis using a Multimodal Approach with 3D MRI and PET

Pith reviewed 2026-06-26 17:42 UTC · model grok-4.3

classification 💻 cs.LG
keywords: Alzheimer's disease, multimodal fusion, MRI, PET, mixture of experts, gated multimodal unit, neuroimaging, deep learning

The pith

A multimodal model fuses 3D MRI and PET features with gated units or self-attention plus input-adaptive mixture-of-experts routing to classify normal cognition, mild impairment, and Alzheimer's disease.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors combine 3D convolutional networks to extract features from MRI and PET scans, then test three fusion methods and route each case through a sparsely gated mixture-of-experts classifier that activates only the most relevant experts. GMU fusion reaches 80.46 percent accuracy on normal versus mild cognitive impairment and 95.47 percent on normal versus Alzheimer's, while gated self-attention reaches 82.08 percent on mild impairment versus Alzheimer's; removing the mixture-of-experts layer lowers results on every task. Grad-CAM maps are used to show which brain regions drive the decisions. This setup is intended to handle differences across patients and imaging sites better than static concatenation of the two modalities.
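The difference between static concatenation and GMU fusion can be sketched in a few lines. This is a minimal illustration of the gated multimodal unit as defined by Arevalo et al. [10], not the paper's implementation: the function names, the toy dimensions (4 inputs, 3 outputs), and the random weights are all assumptions made for the example, and bias terms are omitted for brevity.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gmu_fuse(x_mri, x_pet, W_mri, W_pet, W_gate):
    """GMU: h = z * tanh(W_mri x_mri) + (1 - z) * tanh(W_pet x_pet),
    where the gate z = sigmoid(W_gate [x_mri; x_pet]) depends on both inputs."""
    h_mri = [math.tanh(a) for a in matvec(W_mri, x_mri)]
    h_pet = [math.tanh(a) for a in matvec(W_pet, x_pet)]
    z = [sigmoid(a) for a in matvec(W_gate, x_mri + x_pet)]
    return [zi * hm + (1.0 - zi) * hp for zi, hm, hp in zip(z, h_mri, h_pet)]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Hypothetical toy sizes; the weights would be learned in a real model.
rng = random.Random(0)
d_in, d_out = 4, 3
x_mri = [rng.uniform(-1, 1) for _ in range(d_in)]
x_pet = [rng.uniform(-1, 1) for _ in range(d_in)]
W_mri = random_matrix(d_out, d_in, rng)
W_pet = random_matrix(d_out, d_in, rng)
W_gate = random_matrix(d_out, 2 * d_in, rng)

concat = x_mri + x_pet                                # static baseline: stack both vectors
fused = gmu_fuse(x_mri, x_pet, W_mri, W_pet, W_gate)  # per-dimension, input-dependent mix
```

Concatenation gives every subject the same fixed layout of MRI and PET features, whereas the GMU gate `z` is recomputed for each input and decides, dimension by dimension, how much each modality contributes.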

Core claim

The paper claims that combining 3D convolutional feature extractors, one of three fusion methods (concatenation, GMU, or gated self-attention), and a sparsely gated MoE classifier with input-adaptive routing yields the reported accuracies on the three binary tasks, and that the MoE component consistently improves performance by selecting informative experts per subject. The authors describe this combination as the first of its kind.

What carries the argument

Sparsely gated Mixture-of-Experts classifier that performs input-adaptive routing, activating only the most informative experts per case.
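Input-adaptive routing of this kind can be sketched as top-k gating in the style of Shazeer et al. [12]. The sketch below is an assumption-laden illustration, not the paper's classifier: the number of experts, the value of k, the gate weights, and the toy experts are hypothetical, and the noise term and load-balancing losses from the original MoE layer are left out.

```python
import math

def top_k_gates(logits, k):
    """Softmax over the k largest gate logits; all other experts get no weight."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = logits[top[0]]  # subtract the maximum for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def moe_forward(x, gate_W, experts, k):
    """Route x to its top-k experts and mix their outputs by gate weight."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_W]
    gates = top_k_gates(logits, k)
    out = None
    for i, g in gates.items():  # experts outside the top-k are never evaluated
        y = experts[i](x)
        out = [g * yi for yi in y] if out is None else [o + g * yi for o, yi in zip(out, y)]
    return out, gates

def make_expert(scale):
    """Hypothetical expert: maps a feature vector to two class scores."""
    return lambda x: [scale * sum(x), -scale * sum(x)]

experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
gate_W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]  # one row per expert
x = [2.0, 1.0]  # stands in for a fused MRI/PET feature vector

scores, gates = moe_forward(x, gate_W, experts, k=2)
print(sorted(gates))  # prints [0, 1]
```

Because the gate logits depend on `x`, a different input can activate a different pair of experts, which is the sense in which the computation is adaptive per case.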

If this is right

  • GMU fusion reaches 80.46 percent accuracy separating normal cognition from mild cognitive impairment and 95.47 percent separating normal cognition from Alzheimer's.
  • Gated self-attention reaches 82.08 percent accuracy separating mild cognitive impairment from Alzheimer's.
  • Removing the mixture-of-experts component lowers accuracy on all three tasks.
  • Grad-CAM produces visualizations of disease-related brain regions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Internal expert routing may supply partial robustness to scanner or site differences that would otherwise require separate domain-adaptation steps.
  • The same architecture could be tested on longitudinal scans to predict conversion from mild impairment to Alzheimer's rather than cross-sectional classification alone.

Load-bearing premise

Performance on the three binary tasks from the study's dataset distribution is enough to claim robustness without external validation or explicit site-effect modeling.

What would settle it

A new multi-site test set in which the full model no longer exceeds a simple concatenation baseline or in which any task accuracy falls below 70 percent.

Figures

Figures reproduced from arXiv: 2606.20037 by Anthi-Maria Vozinaki, Christos Ntanos, Dimitris Askounis, Loukas Ilias.

Figure 2. The CNN architecture used in this study.
Excerpt of the adjacent paper text: B. Fusion
  • Concatenation: It directly combines MRI and PET features into a single feature vector. This straightforward yet effective approach preserves all the information from both modalities, serving as a baseline for comparison with more advanced methods. The resulting concatenated vector has a dimensionality of 256, as each modality contributes 128 features.
  • G…
Figure 1. Methodology pipeline.
Excerpt of the adjacent paper text: A. Feature Extraction - CNN Architecture. As shown in …
Figure 3. MoE architecture of this study.
Excerpt of the adjacent paper text: V. EXPERIMENTS, A. Baselines. We compare our introduced method with the following research studies (see Section II for more details of the following studies):
  • Unimodal Approaches:
      – Random Forest [14]
      – 3D CNNs [15]
  • Multimodal Approaches:
      – Stacked Autoencoders [21]
      – Multiscale DNN [22]
      – 3D Multiscale CNN [24]
      – 2D & 3D CNNs [26]
      – camAD [9]
      – PT DCN [8]
      – 3D CNN + 3D CLST…
Figure 4. Grad-CAM visualizations applied to MRI and PET scans of an AD patient.
Excerpt of the adjacent paper text: Figure 4 illustrates Grad-CAM visualizations applied to MRI and PET scans of an AD patient. The first column presents the original MRI (top) and PET (bottom) scans, while the subsequent columns display Grad-CAM heatmaps overlaid on different axial slices (slice = 20, 30, 40). This visualization aids in interpreting the model’s decision-making process by identifying key regions contributing to AD classification. The…
Original abstract

Alzheimer's disease (AD) is an irreversible neurodegenerative disorder and a leading cause of death worldwide. Early diagnosis plays an important part especially at the Mild Cognitive Impairment stage, where timely intervention can help slow its progression before it advances to AD. Neuroimaging data, like Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) scans, can help detect brain changes early by providing structural and functional brain changes related to the disease. Yet, many multimodal models still fuse MRI and PET with static concatenation and apply identical computation to all subjects, which limits robustness to patient/site heterogeneity and can waste computation. To address these limitations, we present the first study of combining 3D convolutional feature extractors with three fusion strategies - concatenation, Gated Multimodal Unit (GMU), and gated self-attention - and a sparsely gated Mixture-of-Experts (MoE) classifier that performs input-adaptive routing, activating only the most informative experts per case. Finally, we utilize Grad-CAM to visualize disease-related regions, ensuring model interpretability. Experiments are performed across three binary classification tasks (NC vs. MCI, MCI vs. AD, and NC vs. AD). Results show that GMU achieves accuracies of 80.46 % (NC vs. MCI) and 95.47 % (NC vs. AD), while gated self-attention attains 82.08 % on MCI vs. AD. Ablations show that removing the MoE consistently degrades accuracy across all tasks. These findings underscore the value of input-adaptive, multimodal modeling for AD diagnosis by leveraging the complementary nature of MRI and PET.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes combining 3D CNN feature extractors from MRI and PET with three fusion strategies (concatenation, Gated Multimodal Unit (GMU), gated self-attention) and a sparsely gated Mixture-of-Experts (MoE) classifier for input-adaptive routing. It evaluates the approach on three binary classification tasks (NC vs. MCI, MCI vs. AD, NC vs. AD), reporting accuracies of 80.46% (GMU on NC vs. MCI), 95.47% (GMU on NC vs. AD), and 82.08% (gated self-attention on MCI vs. AD), with ablations indicating consistent degradation when MoE is removed. Grad-CAM visualizations are used for interpretability, and the work positions itself as addressing limitations of static fusion in handling patient/site heterogeneity.

Significance. If the empirical results can be reproduced with standard experimental controls, the adaptive multimodal fusion via MoE could offer a useful direction for improving robustness in AD neuroimaging models. The explicit comparison of multiple fusion mechanisms plus the ablation on MoE routing provides a concrete test of input-adaptive computation, which is a strength relative to many static multimodal baselines in the literature.

major comments (2)
  1. [Abstract] Abstract, results paragraph: accuracies (80.46% GMU NC vs. MCI; 95.47% GMU NC vs. AD; 82.08% gated self-attention MCI vs. AD) and the MoE ablation gains are reported without dataset name, subject counts, train/test split protocol, cross-validation method, baseline comparisons, confidence intervals, or statistical tests. These omissions are load-bearing for the central claim that the fusion strategies plus MoE demonstrate value for heterogeneity.
  2. [Abstract] Abstract and experimental description: the claim that internal MoE routing addresses robustness to patient/site heterogeneity is not supported by any external validation cohort, multi-site analysis, or explicit site-effect modeling. The three binary tasks on an unspecified data distribution therefore cannot be taken as evidence that performance differences arise from multimodal complementarity rather than in-distribution artifacts or leakage.
minor comments (1)
  1. [Abstract] The abstract states this is the 'first study' combining the listed components; a more detailed positioning against prior multimodal AD work (e.g., other GMU or attention-based fusions) would strengthen the introduction.
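Major comment 1 asks for confidence intervals on the reported accuracies. One common way to attach an interval to a single test-set accuracy is the Wilson score interval, sketched below. The counts used here (161 correct out of 200) are purely hypothetical placeholders, since the test-set sizes are not given in the abstract; the function name is also an assumption for this example.

```python
import math

def wilson_interval(correct, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = correct / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half

# Hypothetical counts, NOT taken from the paper.
low, high = wilson_interval(correct=161, n=200)
print(f"accuracy 0.805, 95% interval [{low:.3f}, {high:.3f}]")
```

The width of the interval shrinks as the number of test subjects grows, which is why the subject counts and split protocol matter for judging whether the differences between fusion methods are meaningful.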

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which identifies key areas where the abstract and positioning of results can be strengthened for clarity and rigor. We respond to each major comment below.

Point-by-point responses
  1. Referee: [Abstract] Abstract, results paragraph: accuracies (80.46% GMU NC vs. MCI; 95.47% GMU NC vs. AD; 82.08% gated self-attention MCI vs. AD) and the MoE ablation gains are reported without dataset name, subject counts, train/test split protocol, cross-validation method, baseline comparisons, confidence intervals, or statistical tests. These omissions are load-bearing for the central claim that the fusion strategies plus MoE demonstrate value for heterogeneity.

    Authors: We agree that the abstract must supply sufficient context for the reported numbers. The full manuscript details the dataset, subject counts, train/test splits, cross-validation protocol, baselines, confidence intervals, and statistical tests in the Methods and Experiments sections. To address this directly, we will revise the abstract to include concise mentions of the dataset, evaluation protocol, and statistical significance of the MoE gains.

    Revision: yes

  2. Referee: [Abstract] Abstract and experimental description: the claim that internal MoE routing addresses robustness to patient/site heterogeneity is not supported by any external validation cohort, multi-site analysis, or explicit site-effect modeling. The three binary tasks on an unspecified data distribution therefore cannot be taken as evidence that performance differences arise from multimodal complementarity rather than in-distribution artifacts or leakage.

    Authors: We acknowledge that the current experiments use a single dataset without external validation cohorts, multi-site analysis, or explicit site-effect modeling, and thus do not provide direct evidence against leakage or in-distribution artifacts. The MoE ablation results demonstrate the benefit of input-adaptive routing, but we agree this falls short of proving robustness to patient/site heterogeneity. We will revise the abstract, introduction, and discussion to tone down the claim, framing the MoE as a promising mechanism for handling heterogeneity and explicitly noting the need for future multi-site validation.

    Revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical results from held-out evaluation

full rationale

The paper reports experimental accuracies from training and testing multimodal 3D CNN models (with GMU, gated self-attention, and MoE) on three binary AD classification tasks. No derivation chain, equations, or first-principles claims exist that could reduce to inputs by construction. Ablation results and Grad-CAM visualizations are likewise direct empirical measurements. The work contains no self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations that would trigger any of the enumerated circularity patterns. This is a standard empirical ML paper whose central claims rest on external data splits rather than internal redefinitions.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

Based on abstract only; model relies on standard supervised learning assumptions and the domain claim that MRI and PET are complementary, with all parameters learned from data.

free parameters (2)
  • MoE routing and expert weights
    Learned gating parameters that decide which experts activate per input; fitted during training.
  • Fusion gate parameters in GMU and self-attention
    Learned scalars or matrices controlling modality weighting.
axioms (1)
  • Domain assumption: MRI and PET scans supply complementary structural and functional signals for AD staging
    Invoked by the decision to pursue multimodal fusion rather than single-modality models.

Reference graph

Works this paper leans on

30 extracted references · 3 canonical work pages

  [1] L. Mosconi, “Brain glucose metabolism in the early and specific diagnosis of Alzheimer’s disease: FDG-PET studies in MCI and AD,” European Journal of Nuclear Medicine and Molecular Imaging, vol. 32, no. 4, pp. 486–510, 2005.

  [2] R. Khan and R. C. Petersen, Mild Cognitive Impairment. Treasure Island, FL: StatPearls Publishing, 2024. Available: https://www.ncbi.nlm.nih.gov/books/NBK554221/

  [3] M. Chapleau, L. Iaccarino, D. Soleimani-Meigooni, and G. D. Rabinovici, “The role of amyloid PET in imaging neurodegenerative disorders: a review,” Journal of Nuclear Medicine, vol. 63, no. Supplement 1, pp. 13S–19S, 2022.

  [4] L. K. McEvoy and J. B. Brewer, “Quantitative structural MRI for early detection of Alzheimer’s disease,” Expert Review of Neurotherapeutics, vol. 10, no. 11, pp. 1675–1688, 2010.

  [5] T. M. Shepherd and S. Dogra, “Clinical translation of integrated PET-MRI for neurodegenerative disease,” Journal of Magnetic Resonance Imaging, 2025.

  [6] J. Venugopalan, L. Tong, H. R. Hassanzadeh, and M. D. Wang, “Multimodal deep learning models for early detection of Alzheimer’s disease stage,” Scientific Reports, vol. 11, no. 1, p. 3254, 2021.

  [7] S. K. Kim, Q. A. Duong, and J. K. Gahm, “Multimodal 3D deep learning for early diagnosis of Alzheimer’s disease,” IEEE Access, vol. 12, pp. 46278–46289, 2024.

  [8] X. Gao, F. Shi, D. Shen, and M. Liu, “Task-induced pyramid and attention GAN for multimodal brain image imputation and classification in Alzheimer’s disease,” IEEE Journal of Biomedical and Health Informatics, vol. 26, no. 1, pp. 36–43, 2022.

  [9] J. Qu, Y. Qu, X. Bai, L. Xu, H. Wang, and G. Liu, “Multimodal medical image feature representation and fusion for AD early diagnosis,” in 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2024, pp. 1645–1648.

      Paper text captured with this entry: TABLE V: Ablation Study. Performance without the MoE framework. Results are averaged across 10 runs. Columns: Fusion Method, Task, ...

  [10] J. Arevalo, T. Solorio, M. Montes-y-Gómez, and F. A. González, “Gated multimodal networks,” Neural Computing and Applications, vol. 32, no. 14, pp. 10209–10228, Jul. 2020. Available: https://doi.org/10.1007/s00521-019-04559-1

  [11] Z. Yu, Y. Cui, J. Yu, D. Tao, and Q. Tian, “Multimodal unified attention networks for vision-and-language interactions,” arXiv preprint arXiv:1908.04107, 2019.

  [12] N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. Le, G. Hinton, and J. Dean, “Outrageously large neural networks: The sparsely-gated mixture-of-experts layer,” in International Conference on Learning Representations, 2017. Available: https://openreview.net/forum?id=B1ckMDqlg

  [13] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual explanations from deep networks via gradient-based localization,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), Oct. 2017.

  [14] A. V. Lebedev, E. Westman, G. J. P. Van Westen, M. G. Kramberger, A. Lundervold, D. Aarsland, et al., “Random forest ensembles for detection and prediction of Alzheimer’s disease with good between-cohort robustness,” NeuroImage: Clinical, vol. 6, pp. 115–125, 2014.

  [15] A. Payan and G. Montana, “Predicting Alzheimer’s disease: a neuroimaging study with 3D convolutional neural networks,” arXiv preprint arXiv:1502.02506, 2015.

  [16] S. Sarraf and G. Tofighi, “Classification of Alzheimer’s disease using fMRI data and deep learning convolutional neural networks,” arXiv preprint arXiv:1603.08631, 2016.

  [17] J. Zhou, Y. Wei, X. Li, W. Zhou, R. Tao, Y. Hua, and H. Liu, “A deep learning model for early diagnosis of Alzheimer’s disease combined with 3D CNN and video Swin transformer,” Scientific Reports, vol. 15, no. 1, p. 23311, 2025.

  [18] S. Fathi, A. Ahmadi, A. Dehnad, M. Almasi-Dooghaee, M. Sadegh, and Alzheimer’s Disease Neuroimaging Initiative, “A deep learning-based ensemble method for early diagnosis of Alzheimer’s disease using MRI images,” Neuroinformatics, vol. 22, no. 1, pp. 89–105, 2024.

  [19] P. Matlani, “BiLSTM-ANN: early diagnosis of Alzheimer’s disease using hybrid deep learning algorithms,” Multimedia Tools and Applications, vol. 83, no. 21, pp. 60761–60788, 2024.

  [20] Z. Xia, G. Yue, Y. Xu, C. Feng, M. Yang, T. Wang, and B. Lei, “A novel end-to-end hybrid network for Alzheimer’s disease detection using 3D CNN and 3D CLSTM,” in 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), 2020, pp. 1–4.

  [21] S. Liu, S. Liu, W. Cai, S. Pujol, R. Kikinis, and D. Feng, “Early diagnosis of Alzheimer’s disease with deep learning,” in 2014 IEEE 11th International Symposium on Biomedical Imaging (ISBI). Beijing, China: IEEE, 2014, pp. 1015–1018.

  [22] D. Lu, K. Popuri, G. W. Ding, R. Balachandar, and M. F. Beg, “Multimodal and multiscale deep neural networks for the early diagnosis of Alzheimer’s disease using structural MR and FDG-PET images,” Scientific Reports, vol. 8, no. 1, p. 5697, 2018. Available: https://doi.org/10.1038/s41598-018-22871-z

  [23] Y. Huang, J. Xu, Y. Zhou, T. Tong, X. Zhuang, and Alzheimer’s Disease Neuroimaging Initiative, “Diagnosis of Alzheimer’s disease via multi-modality 3D convolutional neural network,” arXiv preprint arXiv:1902.09904, 2019.

  [24] J. Song, J. Zheng, P. Li, X. Lu, G. Zhu, and P. Shen, “An effective multimodal image fusion method using MRI and PET for Alzheimer’s disease diagnosis,” Frontiers in Digital Health, vol. 3, p. 637386, 2021.

  [25] M. Golovanevsky, C. Eickhoff, and R. Singh, “Multimodal attention-based deep learning for Alzheimer’s disease diagnosis,” arXiv preprint arXiv:2206.08826v2, 2022. Available: https://doi.org/10.48550/arXiv.2206.08826

  [26] G. Castellano, A. Esposito, E. Lella, G. Montanaro, and G. Vessio, “Automated detection of Alzheimer’s disease: A multi-modal approach with 3D MRI and amyloid PET,” Scientific Reports, vol. 14, no. 1, p. 5210, 2024.

  [27] C. R. Jack Jr, M. A. Bernstein, N. C. Fox, P. Thompson, G. Alexander, D. Harvey et al., “The Alzheimer’s Disease Neuroimaging Initiative (ADNI): MRI methods,” Journal of Magnetic Resonance Imaging, vol. 27, no. 4, pp. 685–691, 2008.

  [28] L. Ilias and D. Askounis, “Multimodal deep learning models for detecting dementia from speech and transcripts,” Frontiers in Aging Neuroscience, vol. 14, 2022. Available: https://www.frontiersin.org/journals/aging-neuroscience/articles/10.3389/fnagi.2022.830943

  [29] L. Biewald, “Experiment tracking with Weights and Biases,” 2020, software available from wandb.com. Available: https://www.wandb.com/

  [30] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, “PyTorch: An imperative style, high-performance deep learning library,” in Advances in Neural Information Processing S...