Latent space projections and atlases: A cautionary tale in deep neuroimaging using autoencoders
Pith reviewed 2026-05-25 07:53 UTC · model grok-4.3
The pith
Even minimal autoencoders on ADNI brain MRI capture Alzheimer's progression patterns when paired with latent-regional correlation profiling.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A simple convolutional autoencoder with hierarchical encoder and compact latent space, trained on ADNI gray matter images, learns representations that reflect clinical variability across cognitive status. The LRCP framework identifies brain regions encoding clinically relevant latent information by combining statistical association and supervised discriminability. Post-hoc SHAP analysis of reconstruction error from atlas-based regional intensities reveals anatomically meaningful regions involved in class-specific reconstruction, with results further validated by statistical agnostic methods.
What carries the argument
The Latent-Regional Correlation Profiling (LRCP) framework, which integrates statistical association between latent dimensions and atlas regions with supervised discriminability scores to isolate brain areas that carry clinically relevant information.
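As a rough illustration of the two ingredients named above, the following sketch computes, for one atlas region, a Pearson correlation with a latent dimension (statistical association) and a rank-based AUC against a binary clinical label (supervised discriminability). This summary does not say how LRCP combines the two quantities, so they are simply reported side by side. The function names and the toy values are assumptions, not the authors' implementation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def profile_region(latent_dim, region_values, labels):
    """Report association and discriminability for one region (hypothetical helper)."""
    r = pearson_r(latent_dim, region_values)
    a = auc(region_values, labels)
    # Direction is ignored: a region that is lower in patients is as informative
    # as one that is higher, so the AUC is folded around 0.5.
    return {"association": abs(r), "discriminability": max(a, 1 - a)}


# Toy values (invented): one latent dimension, one regional gray-matter
# intensity, and a binary label (0 = control, 1 = patient).
latent = [0.1, 0.4, 0.35, 0.8, 0.9, 1.2]
region = [0.62, 0.60, 0.58, 0.50, 0.47, 0.41]
labels = [0, 0, 0, 1, 1, 1]

profile = profile_region(latent, region, labels)
print(profile)
```

Keeping the two scores separate makes it visible when a region correlates strongly with a latent dimension yet carries little label information, which is the situation the cautionary framing warns about.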
If this is right
- Even minimal autoencoder architectures capture meaningful patterns associated with progression to Alzheimer's disease.
- LRCP can locate brain regions that encode clinically relevant latent information from the model.
- SHAP regression on reconstruction error can highlight anatomically meaningful regions for different clinical classes.
- Autoencoders can function as exploratory tools for biomarker discovery and hypothesis generation in clinical neuroscience.
- Multiple statistical validation methods are required to ensure interpretations are not driven by methodological artifacts.
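The SHAP step can be made concrete in the simplest case. For a linear model whose features are treated as independent, the SHAP value of feature i for one sample is the coefficient times that feature's deviation from its mean over a background set. The sketch below applies this closed form to a hypothetical linear post-hoc model of reconstruction error. The region names, coefficients, and intensities are invented for illustration, and the paper's actual post-hoc model may not be linear.

```python
def linear_shap(coefs, background, sample):
    """Closed-form SHAP values for a linear model with independent features:
    phi_i = coef_i * (x_i - mean_i), with means taken over the background set."""
    n = len(background)
    means = [sum(row[i] for row in background) / n for i in range(len(coefs))]
    return [c * (x - m) for c, x, m in zip(coefs, sample, means)]


def predict(coefs, intercept, x):
    """Prediction of the (assumed) linear post-hoc model."""
    return intercept + sum(c * v for c, v in zip(coefs, x))


# Hypothetical post-hoc model: reconstruction error as a linear function of
# three regional gray-matter intensities. All numbers are invented.
regions = ["hippocampus", "amygdala", "precuneus"]
coefs = [-0.8, -0.3, 0.1]
intercept = 1.0
background = [
    [0.60, 0.55, 0.70],
    [0.50, 0.50, 0.65],
    [0.40, 0.45, 0.75],
]
sample = background[2]

phi = linear_shap(coefs, background, sample)
base_value = sum(predict(coefs, intercept, row) for row in background) / len(background)

for name, value in zip(regions, phi):
    print(f"{name}: {value:+.3f}")
# Additivity: the attributions sum to the prediction minus the mean prediction.
print(sum(phi), predict(coefs, intercept, sample) - base_value)
```

The additivity check at the end is a useful sanity test for any SHAP pipeline: if the attributions do not sum to the gap between the sample's prediction and the base value, the explainer and the model are out of sync.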
Where Pith is reading between the lines
- If LRCP generalizes across datasets, it could be applied to other neurological conditions to surface candidate biomarkers from latent spaces.
- The cautionary framing implies that raw latent projections onto atlases can mislead without the added discriminability step.
- Testing LRCP on autoencoders with altered training objectives would reveal whether the identified regions depend on the specific reconstruction loss.
- The approach could be extended to compare latent spaces across different imaging modalities to check consistency of regional encoding.
Load-bearing premise
Observed correlations between latent dimensions and atlas regions reflect genuine neuroanatomical encoding of clinical status rather than artifacts from the autoencoder architecture, training procedure, or post-hoc dimensionality reduction.
What would settle it
Demonstrating that the regional correlations identified by LRCP disappear or reverse when the same data are processed with a linear dimensionality reduction method or when reconstruction errors are randomly permuted would falsify the claim.
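A permutation null of the kind described here can be sketched as follows. One variable is shuffled repeatedly, the correlation is recomputed each time, and the observed correlation is compared against the resulting distribution. The data, the seed, and the function names are illustrative assumptions; this is a generic permutation test rather than the paper's procedure.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any real pairing between x and y
        if abs(pearson_r(x, y_perm)) >= observed:
            exceed += 1
    # Add-one correction so the estimate is never exactly zero.
    return (exceed + 1) / (n_perm + 1)


# Toy values (invented): a latent dimension and a strongly related regional measure.
latent = [float(i) for i in range(12)]
region = [1.0, 2.9, 5.2, 7.1, 8.8, 11.3, 13.0, 14.8, 17.1, 19.2, 20.9, 23.0]

p = permutation_p_value(latent, region)
print(f"permutation p-value: {p:.4f}")
```

If an association survives this kind of shuffling, meaning the shuffled correlations are as large as the observed one, the association is better explained by the pipeline than by the data.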
Original abstract
This study introduces a deep learning framework for the inferential exploration of latent representations in 3D brain MRI, leveraging a simple convolutional autoencoder with a hierarchical encoder and a compact latent space. Trained on segmented gray matter images from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, the model learns latent representations that preserve neuroanatomical structure and reflect clinical variability across cognitive status. Dimensionality reduction techniques (PCA, t-SNE, PLS, UMAP) were applied to visualize and interpret the latent space, correlating it with anatomical regions defined by the AAL atlas. As a novel contribution, the Latent-Regional Correlation Profiling (LRCP) framework, which combines statistical association and supervised discriminability to identify brain regions that encode clinically relevant latent information, is proposed. Our results show that even minimal architectures capture meaningful patterns associated with progression to Alzheimer's disease. Interpretability is assessed by applying SHAP-based regression to a post-hoc model that predicts reconstruction error from atlas-based regional gray matter intensities, thereby identifying anatomically meaningful regions involved in class-specific reconstruction strategies. These findings are further validated using statistical agnostic methods, highlighting the importance of rigorous evaluation in neuroimaging. This work demonstrates the potential of autoencoders as exploratory tools for biomarker discovery and hypothesis generation in clinical neuroscience.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a convolutional autoencoder with hierarchical encoder and compact latent space, trained on segmented gray-matter volumes from the ADNI dataset. It applies PCA, t-SNE, PLS and UMAP to the latent representations, correlates them with AAL atlas regions, and proposes the LRCP framework that combines statistical association with supervised discriminability to identify brain regions encoding clinically relevant latent information. SHAP-based regression on a post-hoc model predicting reconstruction error from regional intensities is used for interpretability, with the central claim that even minimal architectures capture meaningful patterns associated with Alzheimer's progression.
Significance. If the LRCP-identified regions and SHAP attributions can be shown to reflect clinical signal rather than reconstruction biases, the work would supply a concrete, reproducible pipeline for using autoencoders as hypothesis-generation tools in clinical neuroimaging and would underscore the value of post-hoc validation methods.
major comments (3)
- [Abstract and §3] Abstract and §3 (LRCP framework): the claim that LRCP identifies regions that 'encode clinically relevant latent information' rests on correlations between latent dimensions and AAL parcels, yet the description supplies no label-permutation tests, null-model controls, or held-out clinical validation that would isolate the Alzheimer's signal from the convolutional inductive biases and atlas parcellation itself.
- [Abstract and §4] Abstract and §4 (SHAP validation): the post-hoc regression predicts reconstruction error from atlas-based regional intensities; without an explicit control that removes clinical-status information (e.g., label permutation or matched reconstruction-error nulls), the resulting SHAP attributions cannot be guaranteed to reflect class-specific clinical encoding rather than architecture-driven reconstruction strategies.
- [Results] Results section: the abstract asserts that 'even minimal architectures capture meaningful patterns' but reports no quantitative metrics (R², AUC, p-values, or cross-validation statistics) for either the LRCP correlations or the SHAP attributions, leaving the central empirical claim without numerical support.
minor comments (1)
- [Abstract] The phrase 'statistical agnostic methods' in the abstract is unclear; a more precise term such as 'non-parametric statistical tests' or 'distribution-free validation' would improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and indicate where revisions have been made to incorporate additional controls and metrics.
- Referee: [Abstract and §3] Abstract and §3 (LRCP framework): the claim that LRCP identifies regions that 'encode clinically relevant latent information' rests on correlations between latent dimensions and AAL parcels, yet the description supplies no label-permutation tests, null-model controls, or held-out clinical validation that would isolate the Alzheimer's signal from the convolutional inductive biases and atlas parcellation itself.
  Authors: We agree that explicit null-model controls strengthen the interpretation. The LRCP framework reports Pearson correlations with associated p-values and incorporates supervised discriminability via PLS regression and classification performance on clinical labels. In the revised manuscript we have added label-permutation tests (1000 permutations) that compare observed correlations against those obtained after randomly shuffling clinical status labels, thereby quantifying the extent to which the identified associations exceed what would be expected from architectural or parcellation biases alone.
  Revision: yes
- Referee: [Abstract and §4] Abstract and §4 (SHAP validation): the post-hoc regression predicts reconstruction error from atlas-based regional intensities; without an explicit control that removes clinical-status information (e.g., label permutation or matched reconstruction-error nulls), the resulting SHAP attributions cannot be guaranteed to reflect class-specific clinical encoding rather than architecture-driven reconstruction strategies.
  Authors: The SHAP analysis is performed on a regression model whose target is reconstruction error, and the manuscript already notes that attributions are interpreted in the context of class-specific reconstruction strategies. To directly address the concern, the revised version includes a label-permutation control: clinical labels are shuffled, the post-hoc regression is retrained, and SHAP values are recomputed; the original attributions are then compared against this null distribution to demonstrate that they are significantly altered when clinical information is removed.
  Revision: yes
- Referee: [Results] Results section: the abstract asserts that 'even minimal architectures capture meaningful patterns' but reports no quantitative metrics (R², AUC, p-values, or cross-validation statistics) for either the LRCP correlations or the SHAP attributions, leaving the central empirical claim without numerical support.
  Authors: The original manuscript reports p-values for the LRCP correlations and classification accuracies for latent-space discriminability. We acknowledge, however, that R² for the SHAP regression and explicit cross-validation statistics were not presented. The revised results section now includes R² values for the post-hoc regression, AUC scores for the supervised discriminability components, and details of the cross-validation scheme used throughout the pipeline.
  Revision: yes
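The cross-validation statistics requested by the referee could be reported along these lines. The sketch fits a one-feature least-squares line on each training fold and scores R² on the held-out fold. The fold count, the interleaved fold assignment, the single-feature model, and the toy values are all assumptions chosen for brevity; they do not describe the scheme used in the paper.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def r_squared(y_true, y_pred):
    """Coefficient of determination on a held-out set."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def kfold_r2(x, y, k=3):
    """Held-out R² per fold; sample i is assigned to fold i % k."""
    n = len(x)
    scores = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        slope, intercept = fit_line([x[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [slope * x[i] + intercept for i in test_idx]
        scores.append(r_squared([y[i] for i in test_idx], preds))
    return scores


# Toy values (invented): a regional intensity and the reconstruction error it predicts.
intensity = [0.40, 0.43, 0.46, 0.49, 0.52, 0.55, 0.58, 0.61, 0.64]
recon_error = [0.90, 0.86, 0.79, 0.76, 0.70, 0.66, 0.59, 0.55, 0.50]

scores = kfold_r2(intensity, recon_error)
print([round(s, 3) for s in scores], "mean:", round(sum(scores) / len(scores), 3))
```

Reporting the per-fold values alongside the mean shows the spread across folds, which a single pooled R² would hide.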
Circularity Check
No circularity in derivation chain
The paper describes an empirical workflow: training a convolutional autoencoder on gray-matter volumes to minimize reconstruction error, followed by post-hoc application of dimensionality reduction (PCA/t-SNE/PLS/UMAP), LRCP correlation with AAL parcels, and SHAP analysis on a separate regression model. No equations, uniqueness theorems, or self-citations are invoked that reduce any claimed result to a fitted parameter or input by construction. All reported associations are presented as data-driven observations rather than algebraic identities or renamed fits. The central claims therefore remain independent of the inputs.