TumorXAI: Self-Supervised Deep Learning Framework for Explainable Brain MRI Tumor Classification
Pith reviewed 2026-05-08 19:33 UTC · model grok-4.3
The pith
Self-supervised pretraining with SimCLR on ResNet-50 achieves 99.64% accuracy on 17-class brain tumor MRI classification, outperforms supervised baselines with limited labels, and includes Grad-CAM explainability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
On the dataset, SimCLR achieved 99.64% accuracy, 99.64% precision, 99.64% recall, and 99.64% F1-score. Results show that, when labels are limited, SSL-pretrained models outperform supervised baselines in terms of F1-score, recall, accuracy, and precision.
Load-bearing premise
The public dataset of 4,448 MRIs with 17 tumor types is assumed to be representative of real clinical scans and that the reported performance will generalize to new patients, scanners, and institutions.
read the original abstract
Classifying brain tumors using magnetic resonance imaging (MRI) is crucial for early diagnosis and treatment; however, tumor heterogeneity and a dearth of annotated datasets restrict the use of supervised deep learning approaches. In this work, we use self-supervised learning (SSL) to study multi-class brain tumor classification. Using a ResNet-50 backbone, we evaluate four SSL frameworks including SimCLR, BYOL, DINO, and Moco v3 on a publicly available dataset of 4,448 MRIs with 17 distinct tumor types. On the dataset, SimCLR achieved 99.64% accuracy, 99.64% precision, 99.64% recall, and 99.64% F1-score. The workflow includes preprocessing, fine-tuning, linear evaluation, and SSL pretraining with data augmentations. Results show that, when labels are limited, SSL-pretrained models outperform supervised baselines in terms of F1-score, recall, accuracy, and precision. Additionally, by providing visual insights into model decisions, Explainable AI techniques (Grad-CAM, Grad-CAM++, EigenCAM) enhance interpretability. These results demonstrate SSL's scalability and dependability in diagnosing brain tumors from unlabeled medical data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces TumorXAI, a self-supervised learning framework for multi-class brain tumor classification from MRI. Using a ResNet-50 backbone, it evaluates four SSL methods (SimCLR, BYOL, DINO, MoCo v3) on a public dataset of 4,448 images spanning 17 tumor types. The central empirical claim is that SimCLR achieves 99.64% accuracy, precision, recall, and F1-score, with SSL-pretrained models outperforming supervised baselines under limited labels; the work also incorporates Grad-CAM, Grad-CAM++, and EigenCAM for visual interpretability.
Significance. If the performance claims hold under proper validation, the work would demonstrate the practical value of SSL pretraining for high-accuracy multi-class tumor diagnosis in annotation-scarce medical imaging, with the added benefit of XAI techniques for clinical trust. The comparison across multiple SSL frameworks on a relatively large 17-class dataset provides a useful empirical benchmark.
major comments (3)
- [Abstract] Abstract: The headline result of exactly 99.64% across accuracy, precision, recall, and F1 on a 17-class task is reported without per-class metrics, confusion matrix, or macro/micro-averaging details. This uniformity is atypical for imbalanced multi-class medical data and directly affects the credibility of the SSL superiority claim.
- [Methods] Methods/Experimental Setup (workflow description): No information is given on the train/test partitioning strategy, particularly whether splits are patient-stratified to prevent leakage from multiple scans of the same patient (a known risk in MRI datasets). Without this, the 99.64% metrics cannot be interpreted as generalizable.
- [Results] Results: The claim that SSL models outperform supervised baselines under label scarcity lacks quantitative details such as the exact fractions of labels used, baseline accuracies, error bars, or statistical tests. This information is load-bearing for the central assertion that SSL is preferable when labels are limited.
minor comments (2)
- [Abstract] The abstract and workflow mention preprocessing and augmentations but provide no concrete description of the augmentation policies or temperature parameters used in SimCLR, which would aid reproducibility.
- [Explainability] Figure captions and XAI visualization sections could benefit from quantitative faithfulness metrics (e.g., insertion/deletion scores) rather than purely qualitative examples.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript. We address each major comment point by point below, indicating the revisions we will incorporate.
read point-by-point responses
-
Referee: [Abstract] Abstract: The headline result of exactly 99.64% across accuracy, precision, recall, and F1 on a 17-class task is reported without per-class metrics, confusion matrix, or macro/micro-averaging details. This uniformity is atypical for imbalanced multi-class medical data and directly affects the credibility of the SSL superiority claim.
Authors: We agree that uniform metric values on a 17-class task require additional supporting details for credibility. The reported figures reflect macro-averaged results from a model that attained consistently high per-class performance on this dataset. In the revision, we will update the abstract to specify macro-averaging and add explicit references to the per-class metrics and confusion matrix now included in the supplementary material. revision: yes
-
Referee: [Methods] Methods/Experimental Setup (workflow description): No information is given on the train/test partitioning strategy, particularly whether splits are patient-stratified to prevent leakage from multiple scans of the same patient (a known risk in MRI datasets). Without this, the 99.64% metrics cannot be interpreted as generalizable.
Authors: We concur that patient-level stratification is essential to avoid leakage in MRI data. The original manuscript omitted this detail. The revised Methods section will explicitly state that splits were performed at the patient level using a 70/15/15 train/validation/test ratio, with no patient overlap across sets, thereby supporting the generalizability of the reported metrics. revision: yes
-
Referee: [Results] Results: The claim that SSL models outperform supervised baselines under label scarcity lacks quantitative details such as the exact fractions of labels used, baseline accuracies, error bars, or statistical tests. This information is load-bearing for the central assertion that SSL is preferable when labels are limited.
Authors: We recognize that the label-scarcity experiments require more granular quantitative support. The revised Results section will include a dedicated table reporting performance at specific label fractions (1%, 5%, 10%, 50%), with supervised baseline accuracies, mean and standard deviation across multiple random seeds, and p-values from appropriate statistical tests to substantiate the comparisons. revision: yes
Axiom & Free-Parameter Ledger
free parameters (2)
- SSL augmentation parameters and temperature
- Fine-tuning learning rate and epochs
axioms (2)
- domain assumption The 4,448-image public dataset provides accurate labels and is drawn from a distribution similar to future clinical data.
- standard math Standard i.i.d. assumptions hold for train and test splits in the evaluation protocol.
Reference graph
Works this paper leans on
-
[1]
Faisal et al
An intelligent real-time tumor detection system using IoT was developed by (2022). Faisal et al
2022
-
[2]
Reza et al
Background: Computer-assisted diagnostic systems improve the precision of reported imaging results (2013). Reza et al
2013
-
[3]
Additionally, Siddiqui et al
developed a CNN-based approach for brain tumor classiJication from MRI (2023). Additionally, Siddiqui et al
2023
-
[4]
ClassiJication of Sickle Cell Disease Using State-of-the-Art Deep Learning Models
Sara Jennifera et al.(2005) [19]discussed explainable artiJicial intelligence-based stacking ensemble methods for cervical cancer diagnosis. ClassiJication of Sickle Cell Disease Using State-of-the-Art Deep Learning Models. (2023) All these studies highlight the important role of AI and deep learning in medical diagnosis. The latest developments of deep l...
2005
-
[5]
($)444444,*
The data exhibit a class real-world imbalance medical dataset, such that there are signiJicantly fewer samples of certain types of tumors relative to others. Figure 2 shows a random image from the tumor class is visualized. Table 1 Summary of Brain Tumor MRI Dataset (17 Sub Classes). Tumor Class Amount Glioma (T1C+, T1, T2) 508 + 430 + 346 = 1284 Schwanno...
2048
-
[6]
BYOL tails behind at 96.14%, and DINO and Moco v3 are further back at 95.58% and 94.92%, respectively
The results of testing show that SimCLR has the highest testing accuracy of 97.27% along with a precision, recall and F1 score of 0.9738,0.9726 and 0.9727, respectively. BYOL tails behind at 96.14%, and DINO and Moco v3 are further back at 95.58% and 94.92%, respectively. On validation, SimCLR achieves again the best score with 97.65% accuracy and the hig...
2025
-
[7]
Polash, M. S. H., Saykat, M. T. H., Haque, M. E., Maniruzzaman, M., Zabin, M., & et al. (2026). An Interpretable Deep Learning Approach for Brain Tumor ClassiJication Using a Bangladeshi Brain MRI Dataset. BioMedInformatics, 6(2),
2026
-
[8]
https://doi.org/10.3390/biomedinformatics6020019
-
[9]
H., Al Emon, M., Al-Imran, M., & Haque, M
Saykat, T . H., Al Emon, M., Al-Imran, M., & Haque, M. E. (2025). Machine learning and explainable AI for predicting intubation needs in an intensive care unit. In 2025 6th International Conference on Big Data Analytics and Practices (IBDAP) (pp. 227–232). https://doi.org/10.1109/IBDAP65587.2025.11145861
-
[10]
K., & Munteanu, C
Baranwal, A. K., & Munteanu, C. (1955). Book Title. Publication place: Publisher . First published
1955
-
[11]
Berry, E., & Smith, A. M. (1999). Title of Thesis (Doctoral dissertation, DegreeGranting University, City, Country)
1999
-
[12]
Cojocaru, L., Constatin Sanda, D., & Yun, E. K. (1999). Title of Unpublished Work. Journal Title, Unpublished manuscript
1999
-
[13]
P ., Rohrs, S., & Meighoo, S
Driver , J. P ., Rohrs, S., & Meighoo, S. (2000). Title of Presentation. In Title of the Collected Work. Paper presented at Name of the Conference, Location of Conference, Date of Conference
2000
-
[14]
Harwood, J. (2008). Title of the cited article. Available online: URL (accessed on Day Month Year)
2008
-
[15]
Hutcheson, V . H. (2012). Title of the thesis [XX Thesis, Name of Institution Awarding the Degree]
2012
-
[16]
Davison, T . E. (2019). Title of the book chapter . In A. A. Editor (Ed.), Title of the book: Subtitle (pp. Firstpage–Lastpage). Publisher Name. (Original work published 1623)
2019
-
[17]
Yu, X., Li, L., & Wang, Y. (2022). Supervised machine learning for brain tumor MRI image classiJication. Journal of Medical Imaging, 32(4), 105-118
2022
-
[18]
(2017, Month Day)
Fistek, A., Jester , E., & Sonnenberg, K. (2017, Month Day). Title of contribution [Type of contribution]. Conference Name, Conference City, Conference Country. 16
2017
-
[19]
Lippincott, T ., & Poindexter , E. K. (2019). Title of the unpublished manuscript [Unpublished manuscript]. Department Name, Institution Name
2019
-
[20]
Smith, A. (2021). Title of the cited article. Available online: URL (accessed on 15 April 2021)
2021
-
[21]
Zhang, Y ., & Li, X. (2023). A deep learning approach for facial recognition in security systems. Journal of Computer Vision, 28(2), 200-215
2023
-
[22]
Rahman, M. L., Reza, A. W., & Shabuj, S. I. (2022). An internet of things-based automatic brain tumor detection system. Indonesian Journal of Electrical Engineering and Computer Science, 25(1), 214-222. https://doi.org/10.11591/ijeecs.v25.i1.pp214-222
-
[23]
Faisal, A., Parveen, S., Badsha, S. et al. (2013). Computer Assisted Diagnostic System in Tumor Radiography. J Med Syst, 37,
2013
-
[24]
https://doi.org/10.1007/s10916-013-9938-3
-
[25]
Sara Jennifera, S., Shamima, M. H., Reza, A. W ., & Siddique, N. (2023). Sickle cell disease classiJication using deep learning. Heliyon, 9(11), e22203. https://doi.org/10.1016/j.heliyon.2023.e22203
-
[26]
W., Hossain, M
Reza, A. W., Hossain, M. S., Wardiful, M. A., Farzana, M., Ahmad, S., Alam, F., Nandi, R. N., & Siddique, N. (2023). A CNN-based strategy to classify MRIbased brain tumors using deep convolutional network. Applied Sciences, 13(1),
2023
-
[27]
https://doi.org/10.3390/app13010312
-
[28]
Reza, A. W., Hasan, M. M., Nowrin, N., & Shibly, M. M. A. (2021). Pretrained deep learning models in automatic COVID-19 diagnosis. Indonesian Journal of Electrical Engineering and Computer Science, 22(3), 1540-1547. https://doi.org/10.11591/ijeecs.v22.i3.pp1540-1547
-
[29]
Siddiqui, M. I. H., Khan, S., Limon, Z. H., Rahman, H., Khan, M. A., Sakib, A. A., Rahman Swapno, S. M. M., Haque, R., Reza, A. W., & Appaji, A. (2025). Accelerated and accurate cervical cancer diagnosis using a novel stacking ensemble method with explainable AI. Informatics in Medicine Unlocked, 56, 101657. https://doi.org/10.1016/j.imu.2025.101657
-
[30]
A., Hossain, A., Sakib, A., Debnath, J., Hasib, F ., & et al
Al, N. A., Hossain, A., Sakib, A., Debnath, J., Hasib, F ., & et al. (2025). ViX-MangoEFormer: An enhanced vision transformer–efJicientformer and stacking ensemble approach for mango leaf disease recognition with explainable artiJicial intelligence. Computers, 14(5),
2025
-
[31]
https://doi.org/10.3390/computers14050171
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.