BrainAnytime: Anatomy-Aware Cross-Modal Pretraining for Brain Image Analysis with Arbitrary Modality Availability
Pith reviewed 2026-05-14 20:24 UTC · model grok-4.3
The pith
A single pretrained model analyzes brain images using whatever MRI or PET scans are available at the time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BrainAnytime is a single 3D masked autoencoder, pretrained on 34,899 scans from five datasets, that uses cross-modal distillation to transfer information between MRI and PET and atlas-guided curriculum masking to emphasize regions prone to neurodegeneration. The same weights can then be applied to any combination of available sequences, from a lone T1 scan up to a full multimodal workup.
What carries the argument
A shared 3D masked autoencoder (Multi-MAE3D) pretrained with anatomy-aware cross-modal distillation between MRI and PET (RCMD) and with atlas-guided curriculum masking (PACM), which prioritizes disease-vulnerable structures during pretraining.
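The arbitrary-availability idea can be illustrated with a minimal sketch. It assumes four modalities (T1, T2, FLAIR, PET) and MultiMAE-style tokenization, in which each present modality contributes its own patch tokens and a random fraction of all tokens is masked before encoding. The function name `assemble_tokens`, the modality list, and the 0.75 mask ratio are illustrative assumptions, not details taken from the paper or its code.

```python
import random
from typing import Dict, List, Tuple

# Assumed modality pool (from the abstract); the paper's exact list may differ.
MODALITIES = ("T1", "T2", "FLAIR", "PET")

Token = Tuple[str, int]  # (modality name, patch index within that modality)


def assemble_tokens(
    available: Dict[str, List[List[float]]],
    mask_ratio: float = 0.75,
    seed: int = 0,
) -> Tuple[List[Token], List[Token]]:
    """Split patch tokens of whichever modalities are present into visible/masked sets.

    `available` maps a modality name to its list of patch vectors; a missing
    modality is simply absent from the dict. Only visible tokens would be fed
    to the shared encoder; masked ones become reconstruction targets.
    """
    unknown = set(available) - set(MODALITIES)
    if unknown:
        raise ValueError(f"unknown modalities: {sorted(unknown)}")
    tokens = [
        (name, i)
        for name in MODALITIES
        if name in available
        for i in range(len(available[name]))
    ]
    if not tokens:
        raise ValueError("at least one modality must be available")
    rng = random.Random(seed)
    rng.shuffle(tokens)
    # Keep at least one visible token so the encoder always has input.
    n_keep = max(1, round(len(tokens) * (1 - mask_ratio)))
    return sorted(tokens[:n_keep]), sorted(tokens[n_keep:])
```

Because absent modalities contribute no tokens, a lone T1 scan and a full multimodal workup go through the same code path. In a real encoder each token would also carry a learned modality embedding so the shared transformer knows which sequence it came from.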
If this is right
- A hospital can deploy one network across scanners that routinely omit different sequences without building separate models for each protocol.
- Average accuracy on CN-versus-AD and CN-versus-MCI classification improves by 6.2% and 7.0%, respectively, relative to the strongest missing-modality baselines, across modality settings that include partial imaging.
- The same pretrained weights support both routine structural diagnosis and molecular confirmation tasks without modality-specific fine-tuning.
- Curriculum masking focused on atlas-defined vulnerable anatomy transfers disease-relevant features more effectively than uniform random masking.
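The 6.2% and 7.0% figures are relative, not absolute, improvements. Under the standard definition (an assumption about how the paper computes them), with $\bar{A}$ denoting accuracy averaged over the evaluated modality settings:

```latex
\Delta_{\text{rel}} = \frac{\bar{A}_{\text{BrainAnytime}} - \bar{A}_{\text{baseline}}}{\bar{A}_{\text{baseline}}} \times 100\%
```

For a hypothetical baseline averaging 0.80, a 6.2% relative gain corresponds to 0.80 × 1.062 = 0.8496, or roughly 5 percentage points in absolute terms.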
Where Pith is reading between the lines
- The same pretraining recipe could be applied to other organ systems where imaging protocols vary by site and where certain sequences are routinely skipped.
- Longitudinal scans of the same patient could be fed through the model at different time points to track progression without requiring identical modality sets each visit.
- If the learned correspondences generalize, the framework could serve as a starting point for adding emerging modalities such as tau-PET without full retraining.
Load-bearing premise
The structural-molecular relationships captured from the five chosen datasets will continue to hold for previously unseen modality combinations and for patient groups outside the original training populations.
What would settle it
Evaluation on an external cohort that supplies only T2-FLAIR without T1 or PET, checking whether classification accuracy on CN versus AD falls below that of a T2-FLAIR-only model trained from scratch.
Original abstract
Clinical diagnostic workups typically follow a modality escalation pathway: after initial clinical evaluation, clinicians begin with routine structural imaging (e.g., MRI), selectively add sequences such as FLAIR or T2 to refine the differential, and reserve molecular imaging (e.g., amyloid-PET) for cases that remain uncertain after standard evaluation. Consequently, patients are observed with heterogeneous and often incomplete modality subsets. However, most current AI models assume fixed data modalities as the model inputs. In this paper, we present BrainAnytime, a unified pretraining framework, trained on 34,899 3D brain scans from five datasets, that supports brain image analysis under arbitrary modality availability spanning multi-sequence MRI and amyloid-PET. A single model accepts whatever imaging is available, from a lone T1 scan to a full multimodal workup. Pretraining learns structural-molecular correspondences between MRI and PET via cross-modal distillation (RCMD) and prioritizes disease-vulnerable anatomy via atlas-guided curriculum masking (PACM), all within a shared 3D masked autoencoder (Multi-MAE3D). Across four downstream tasks and five clinically motivated modality settings, BrainAnytime largely outperforms modality-specific models, missing-modality baselines, and large-scale brain MRI pretrained foundation models in most modality settings. Notably, it surpasses the strongest missing-modality baselines with relative improvements of 6.2% and 7.0% in average accuracy on CN vs. AD and CN vs. MCI classification, respectively. Code is available at https://github.com/SDH-Lab/BrainAnytime.
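Atlas-guided curriculum masking can be sketched as biased sampling of which patches to mask. The sketch below makes three assumptions: each patch carries a hypothetical atlas vulnerability weight in [0, 1], the bias toward vulnerable patches grows linearly with curriculum progress (plain uniform masking at the start), and masked positions are drawn without replacement using the Gumbel-top-k trick (Kool et al., 2019). The schedule's direction and shape, `max_bias`, and `curriculum_mask` itself are assumptions; the paper's PACM may differ.

```python
import math
import random
from typing import List


def curriculum_mask(
    vulnerability: List[float],
    mask_ratio: float,
    progress: float,
    max_bias: float = 3.0,
    seed: int = 0,
) -> List[int]:
    """Return sorted indices of patches to mask.

    vulnerability: hypothetical per-patch atlas weight in [0, 1].
    progress: curriculum position in [0, 1]; 0 gives uniform random masking,
              1 applies the full bias toward vulnerable patches.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")
    rng = random.Random(seed)
    k = round(len(vulnerability) * mask_ratio)
    keys = []
    for i, w in enumerate(vulnerability):
        logit = progress * max_bias * w  # bias strength grows with progress
        # Clamp u away from 0 and 1 so both logs stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        keys.append((logit + gumbel, i))
    # Gumbel-top-k: the k largest perturbed logits are a sample without
    # replacement with probabilities proportional to exp(logit).
    top = sorted(keys, reverse=True)[:k]
    return sorted(i for _, i in top)
```

At `progress=0` every patch has the same logit, so the result is ordinary uniform MAE masking. At `progress=1` a patch with weight 1 has a sampling weight e^3 (about 20) times that of a patch with weight 0.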
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents BrainAnytime, a unified pretraining framework for brain image analysis under arbitrary modality availability. It pretrains a shared 3D masked autoencoder (Multi-MAE3D) on 34,899 scans from five datasets, using RCMD cross-modal distillation to learn MRI-PET correspondences and PACM atlas-guided curriculum masking to prioritize disease-vulnerable anatomy. A single model is claimed to accept inputs ranging from a lone T1 scan to full multimodal sets. Across four downstream tasks and five clinically motivated modality settings, it largely outperforms modality-specific models, missing-modality baselines, and large-scale MRI foundation models in most settings, with relative gains in average accuracy of 6.2% on CN vs. AD and 7.0% on CN vs. MCI over the strongest missing-modality baselines.
Significance. If the generalization claims hold, the work would be significant for clinical applications where imaging modalities are heterogeneous and incomplete, reducing the need for modality-specific models. The scale of pretraining data and public code release strengthen the contribution. However, the practical impact is limited by the narrow range of tested modality combinations.
major comments (2)
- Experiments section: The central claim of support for arbitrary modality availability (from lone T1 to full multimodal) is not adequately supported, as all quantitative results are restricted to five clinically motivated modality settings. No evaluations are reported for unseen combinations such as isolated FLAIR, T2+PET without T1, or PET-only, leaving generalization to arbitrary unseen subsets untested.
- Results section: The reported outperformance and relative improvements (6.2% and 7.0%) lack statistical significance tests, confidence intervals, or ablation controls on the pretraining components (RCMD and PACM), making it unclear whether gains are robust or dataset-specific.
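To make the size of the gap concrete: assuming the modality pool is the four inputs named in the abstract (T1, T2, FLAIR, amyloid-PET), "arbitrary availability" spans every non-empty subset, which the snippet below enumerates. Five evaluated settings cover at most a third of them.

```python
from itertools import combinations

# Assumed modality pool; taken from the sequences named in the abstract.
MODALITIES = ("T1", "T2", "FLAIR", "PET")


def all_modality_subsets(modalities=MODALITIES):
    """Every non-empty subset of the modality pool, smallest subsets first."""
    return [
        combo
        for r in range(1, len(modalities) + 1)
        for combo in combinations(modalities, r)
    ]


subsets = all_modality_subsets()
print(len(subsets))  # 15
```

The referee's examples, `("FLAIR",)`, `("T2", "PET")`, and `("PET",)`, all appear in this list.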
minor comments (2)
- Abstract and methods: The exact five modality settings used in evaluation should be enumerated explicitly for reproducibility.
- Notation: The acronyms RCMD and PACM are not fully expanded in the main text, and no pseudocode is given; adding both would help readers unfamiliar with the distillation and masking strategies.
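In the spirit of the pseudocode the referee asks for, region-level cross-modal distillation might look like the following. This is a hypothetical reading, not the paper's definition. It assumes RCMD pools patch features within atlas regions and pulls MRI-branch (student) region embeddings toward PET-branch (teacher) embeddings with a cosine loss. The helper names and the choice of cosine distance are illustrative.

```python
import math
from typing import Dict, List

Vector = List[float]


def roi_pool(features: List[Vector], labels: List[int]) -> Dict[int, Vector]:
    """Average patch features within each atlas region label."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for feat, lab in zip(features, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(feat)
            counts[lab] = 0
        sums[lab] = [s + x for s, x in zip(sums[lab], feat)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def region_distill_loss(
    mri_feats: List[Vector], pet_feats: List[Vector], labels: List[int]
) -> float:
    """Mean (1 - cosine) between MRI (student) and PET (teacher) region embeddings.

    Both feature lists are assumed patch-aligned (same subject, same grid),
    so one atlas label list serves both. In an autodiff framework the teacher
    side would be detached (stop-gradient); plain lists carry no gradients.
    """
    student = roi_pool(mri_feats, labels)
    teacher = roi_pool(pet_feats, labels)
    if not student:
        raise ValueError("no patches to distill")
    return sum(1.0 - cosine(student[r], teacher[r]) for r in student) / len(student)
```

Pooling by atlas region before comparing modalities is one way to make the distillation "anatomy-aware": the loss is computed per anatomical structure rather than per raw voxel, which tolerates small MRI-PET misregistration.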
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and recognition of the potential clinical significance of our work. We address each major comment below and describe the revisions we will implement.
Point-by-point responses
- Referee: Experiments section: The central claim of support for arbitrary modality availability (from lone T1 to full multimodal) is not adequately supported, as all quantitative results are restricted to five clinically motivated modality settings. No evaluations are reported for unseen combinations such as isolated FLAIR, T2+PET without T1, or PET-only, leaving generalization to arbitrary unseen subsets untested.
Authors: We agree that the current quantitative results focus on five clinically motivated settings and that testing additional unseen combinations would more strongly support the arbitrary-availability claim. In the revised manuscript, we will add evaluations for isolated FLAIR, T2+PET without T1, and PET-only inputs to the Experiments section, using the same downstream tasks to test generalization.
Revision: yes
- Referee: Results section: The reported outperformance and relative improvements (6.2% and 7.0%) lack statistical significance tests, confidence intervals, or ablation controls on the pretraining components (RCMD and PACM), making it unclear whether gains are robust or dataset-specific.
Authors: We acknowledge that statistical tests, confidence intervals, and component ablations are needed to establish robustness. We will add paired statistical significance tests and 95% confidence intervals for the reported accuracy improvements. We will also include ablation experiments that isolate the contributions of RCMD and PACM, reporting results per dataset to test whether the gains are dataset-specific.
Revision: yes
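The paired analysis promised in the rebuttal could take the form of a subject-level paired bootstrap. The sketch below assumes per-subject 0/1 correctness vectors for two models evaluated on the same subjects. Resampling subjects, rather than predictions, preserves the pairing. McNemar's test on the discordant pairs is a common alternative for comparing paired classifiers. The function name and defaults are illustrative.

```python
import random
from typing import List, Tuple


def paired_bootstrap_ci(
    correct_a: List[int],
    correct_b: List[int],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Observed accuracy difference (A - B) and a percentile bootstrap CI.

    correct_a / correct_b: per-subject 0/1 correctness of two models on the
    same test subjects, in the same order.
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(correct_a)
    rng = random.Random(seed)
    observed = (sum(correct_a) - sum(correct_b)) / n
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample subjects jointly
        diffs.append(sum(correct_a[i] - correct_b[i] for i in idx) / n)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return observed, lo, hi
```

If the interval for BrainAnytime minus the strongest baseline excludes zero in a given modality setting, the gain there is unlikely to be resampling noise; repeating the analysis per dataset addresses the dataset-specificity concern.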
Circularity Check
No circularity: empirical downstream evaluations are independent of pretraining definitions
full rationale
The paper's central claims consist of measured accuracy improvements on four downstream tasks across five modality settings using held-out data from the pretraining corpus. These results are obtained by standard fine-tuning and evaluation protocols rather than by any equation that reduces a reported prediction to a fitted parameter or self-defined quantity inside the same model. No self-citation chains, uniqueness theorems, or ansatzes are invoked to justify the architecture or loss terms in a load-bearing way; the RCMD and PACM components are presented as design choices whose value is assessed by external task performance. The arbitrary-modality claim is an extrapolation from the tested settings, but this is a generalization question rather than a circular reduction of the reported numbers to their own inputs.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Cross-modal correspondences learned via distillation on the training distribution transfer to arbitrary missing-modality test cases.
- Domain assumption: Atlas-guided masking prioritizes disease-vulnerable regions without introducing selection bias on downstream tasks.