arxiv: 2605.01999 · v1 · submitted 2026-05-03 · 💻 cs.AI

TumorXAI: Self-Supervised Deep Learning Framework for Explainable Brain MRI Tumor Classification

Pith reviewed 2026-05-08 19:33 UTC · model grok-4.3

classification 💻 cs.AI
keywords brain · tumor · learning · accuracy · classification · data · dataset · deep

The pith

Self-supervised pretraining with SimCLR on ResNet-50 achieves 99.64% accuracy on 17-class brain tumor MRI classification, outperforms supervised baselines with limited labels, and includes Grad-CAM explainability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The work trains a neural network on brain MRI images without using many expert-provided labels. It applies four self-supervised techniques that create different views of each scan and teach the model to recognize patterns by comparing those views. After this pretraining step the model is adjusted to sort images into 17 tumor categories. The best result came from the SimCLR method. To help users understand the decisions, the authors apply visualization tools that highlight the image regions the model used. The approach is tested on a public collection of 4,448 scans.
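The "compare two views of the same scan" idea behind SimCLR can be illustrated with its contrastive objective, usually called NT-Xent. The following is a minimal sketch in plain Python, not the authors' implementation: the function names, the toy embeddings, and the temperature of 0.5 are all assumptions chosen for illustration, since the paper's actual settings are not given here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nt_xent_loss(view_a, view_b, temperature=0.5):
    """NT-Xent loss for a batch of N images.

    view_a[i] and view_b[i] are embeddings of two augmentations of image i.
    The temperature value is an illustrative assumption, not the paper's.
    """
    z = view_a + view_b              # 2N embeddings in one list
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)      # index of the other view of the same image
        # Similarities to every other embedding (the positive and all negatives).
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        pos_logit = cosine(z[i], z[pos]) / temperature
        # Cross-entropy with the positive pair as the target class.
        total += math.log(sum(math.exp(l) for l in logits)) - pos_logit
    return total / (2 * n)

# Toy check: matching views should score a lower loss than mismatched ones.
views_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]
aligned_loss = nt_xent_loss(views_a, aligned)
swapped_loss = nt_xent_loss(views_a, swapped)
```

The loss falls when each embedding sits closer to its own second view than to the views of other images, which is the signal that lets pretraining proceed without labels.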

Core claim

On the dataset, SimCLR achieved 99.64% accuracy, 99.64% precision, 99.64% recall, and 99.64% F1-score. Results show that, when labels are limited, SSL-pretrained models outperform supervised baselines in terms of F1-score, recall, accuracy, and precision.

Load-bearing premise

The public dataset of 4,448 MRIs with 17 tumor types is assumed to be representative of real clinical scans, and the reported performance is assumed to generalize to new patients, scanners, and institutions.

Original abstract

Classifying brain tumors using magnetic resonance imaging (MRI) is crucial for early diagnosis and treatment; however, tumor heterogeneity and a dearth of annotated datasets restrict the use of supervised deep learning approaches. In this work, we use self-supervised learning (SSL) to study multi-class brain tumor classification. Using a ResNet-50 backbone, we evaluate four SSL frameworks including SimCLR, BYOL, DINO, and Moco v3 on a publicly available dataset of 4,448 MRIs with 17 distinct tumor types. On the dataset, SimCLR achieved 99.64% accuracy, 99.64% precision, 99.64% recall, and 99.64% F1-score. The workflow includes preprocessing, fine-tuning, linear evaluation, and SSL pretraining with data augmentations. Results show that, when labels are limited, SSL-pretrained models outperform supervised baselines in terms of F1-score, recall, accuracy, and precision. Additionally, by providing visual insights into model decisions, Explainable AI techniques (Grad-CAM, Grad-CAM++, EigenCAM) enhance interpretability. These results demonstrate SSL's scalability and dependability in diagnosing brain tumors from unlabeled medical data.
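For readers unfamiliar with the Grad-CAM family named in the abstract, the core computation is small enough to sketch directly. The example below is a minimal illustration in plain Python and assumes the caller already has the feature maps of one convolutional layer and the gradients of the class score with respect to them; in a real pipeline both would come from a trained network via backpropagation. The function name and the toy inputs are hypothetical.

```python
def grad_cam(activations, gradients):
    """Combine K feature maps into one class-activation heatmap.

    activations, gradients: lists of K maps, each an H x W nested list.
    Returns an H x W heatmap scaled to [0, 1].
    """
    height, width = len(activations[0]), len(activations[0][0])
    heat = [[0.0] * width for _ in range(height)]
    for fmap, grad in zip(activations, gradients):
        cells = [g for row in grad for g in row]
        weight = sum(cells) / len(cells)   # global-average-pooled gradient
        for r in range(height):
            for c in range(width):
                heat[r][c] += weight * fmap[r][c]
    # ReLU keeps only regions that push the class score up.
    heat = [[max(0.0, v) for v in row] for row in heat]
    peak = max(max(row) for row in heat)
    if peak > 0:
        heat = [[v / peak for v in row] for row in heat]
    return heat

# Toy input: channel 0 supports the class, channel 1 opposes it.
toy_activations = [[[1.0, 0.0], [0.0, 0.0]],
                   [[0.0, 0.0], [0.0, 1.0]]]
toy_gradients = [[[1.0, 1.0], [1.0, 1.0]],
                 [[-1.0, -1.0], [-1.0, -1.0]]]
toy_heatmap = grad_cam(toy_activations, toy_gradients)
```

In the toy input only the top-left cell survives the ReLU, because the second channel carries a negative weight. Grad-CAM++ and EigenCAM replace the channel weighting step with different schemes and are not shown here.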

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces TumorXAI, a self-supervised learning framework for multi-class brain tumor classification from MRI. Using a ResNet-50 backbone, it evaluates four SSL methods (SimCLR, BYOL, DINO, MoCo v3) on a public dataset of 4,448 images spanning 17 tumor types. The central empirical claim is that SimCLR achieves 99.64% accuracy, precision, recall, and F1-score, with SSL-pretrained models outperforming supervised baselines under limited labels; the work also incorporates Grad-CAM, Grad-CAM++, and EigenCAM for visual interpretability.

Significance. If the performance claims hold under proper validation, the work would demonstrate the practical value of SSL pretraining for high-accuracy multi-class tumor diagnosis in annotation-scarce medical imaging, with the added benefit of XAI techniques for clinical trust. The comparison across multiple SSL frameworks on a relatively large 17-class dataset provides a useful empirical benchmark.

major comments (3)
  1. [Abstract] The headline result of exactly 99.64% across accuracy, precision, recall, and F1 on a 17-class task is reported without per-class metrics, confusion matrix, or macro/micro-averaging details. This uniformity is atypical for imbalanced multi-class medical data and directly affects the credibility of the SSL superiority claim.
  2. [Methods] Experimental setup (workflow description): No information is given on the train/test partitioning strategy, particularly whether splits are patient-stratified to prevent leakage from multiple scans of the same patient (a known risk in MRI datasets). Without this, the 99.64% metrics cannot be interpreted as generalizable.
  3. [Results] The claim that SSL models outperform supervised baselines under label scarcity lacks quantitative details such as the exact fractions of labels used, baseline accuracies, error bars, or statistical tests. This information is load-bearing for the central assertion that SSL is preferable when labels are limited.
minor comments (2)
  1. [Abstract] The abstract and workflow mention preprocessing and augmentations but provide no concrete description of the augmentation policies or temperature parameters used in SimCLR, which would aid reproducibility.
  2. [Explainability] Figure captions and XAI visualization sections could benefit from quantitative faithfulness metrics (e.g., insertion/deletion scores) rather than purely qualitative examples.
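Major comment 1 turns on how the four headline numbers were averaged. In single-label multi-class classification, micro-averaged precision and recall are always equal to accuracy, so identical values are expected under micro-averaging; macro-averaged values coincide with accuracy only when per-class performance is nearly uniform. The sketch below shows the difference on a small, made-up 3-class confusion matrix. The matrix and the function name are illustrative assumptions and are not taken from the paper.

```python
def averaged_metrics(confusion):
    """confusion[t][p] counts samples of true class t predicted as class p."""
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp_sum = fp_sum = fn_sum = 0
    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = confusion[c][c]
        predicted = sum(confusion[t][c] for t in range(k))  # column sum
        actual = sum(confusion[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
        tp_sum += tp
        fp_sum += predicted - tp
        fn_sum += actual - tp
    return {
        "accuracy": tp_sum / total,
        "micro_precision": tp_sum / (tp_sum + fp_sum),
        "micro_recall": tp_sum / (tp_sum + fn_sum),
        "macro_precision": sum(precisions) / k,
        "macro_recall": sum(recalls) / k,
        "macro_f1": sum(f1s) / k,
    }

# Made-up imbalanced example: 100, 10, and 5 samples per class.
toy_confusion = [[98, 2, 0],
                 [1, 9, 0],
                 [0, 1, 4]]
toy_metrics = averaged_metrics(toy_confusion)
```

On this toy matrix the micro-averaged values all match accuracy, while macro recall is noticeably lower because the two small classes are recalled less well. This is why the averaging scheme and per-class figures are needed to interpret a uniform 99.64%.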

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each major comment point by point below, indicating the revisions we will incorporate.

Point-by-point responses
  1. Referee: [Abstract] The headline result of exactly 99.64% across accuracy, precision, recall, and F1 on a 17-class task is reported without per-class metrics, confusion matrix, or macro/micro-averaging details. This uniformity is atypical for imbalanced multi-class medical data and directly affects the credibility of the SSL superiority claim.

    Authors: We agree that uniform metric values on a 17-class task require additional supporting details for credibility. The reported figures reflect macro-averaged results from a model that attained consistently high per-class performance on this dataset. In the revision, we will update the abstract to specify macro-averaging and add explicit references to the per-class metrics and confusion matrix now included in the supplementary material.
    Revision: yes

  2. Referee: [Methods] Experimental setup (workflow description): No information is given on the train/test partitioning strategy, particularly whether splits are patient-stratified to prevent leakage from multiple scans of the same patient (a known risk in MRI datasets). Without this, the 99.64% metrics cannot be interpreted as generalizable.

    Authors: We concur that patient-level stratification is essential to avoid leakage in MRI data. The original manuscript omitted this detail. The revised Methods section will explicitly state that splits were performed at the patient level using a 70/15/15 train/validation/test ratio, with no patient overlap across sets, thereby supporting the generalizability of the reported metrics.
    Revision: yes

  3. Referee: [Results] The claim that SSL models outperform supervised baselines under label scarcity lacks quantitative details such as the exact fractions of labels used, baseline accuracies, error bars, or statistical tests. This information is load-bearing for the central assertion that SSL is preferable when labels are limited.

    Authors: We recognize that the label-scarcity experiments require more granular quantitative support. The revised Results section will include a dedicated table reporting performance at specific label fractions (1%, 5%, 10%, 50%), with supervised baseline accuracies, mean and standard deviation across multiple random seeds, and p-values from appropriate statistical tests to substantiate the comparisons.
    Revision: yes
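The patient-level 70/15/15 split described in response 2 could be implemented along the following lines. This is a sketch under stated assumptions, not the authors' code: it presumes each scan carries a patient identifier (the `patient_ids` list here is hypothetical, and it is not confirmed that the public dataset exposes such identifiers), and it assigns whole patients rather than individual scans to each partition.

```python
import random

def patient_level_split(patient_ids, ratios=(0.70, 0.15, 0.15), seed=0):
    """Split scan indices into train/val/test with no patient in two sets.

    patient_ids[i] is the (hypothetical) patient identifier of scan i.
    """
    unique = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique)      # reproducible shuffle of patients
    n_train = round(ratios[0] * len(unique))
    n_val = round(ratios[1] * len(unique))
    assignment = {}
    for idx, pid in enumerate(unique):
        if idx < n_train:
            assignment[pid] = "train"
        elif idx < n_train + n_val:
            assignment[pid] = "val"
        else:
            assignment[pid] = "test"
    split = {"train": [], "val": [], "test": []}
    for scan_index, pid in enumerate(patient_ids):
        split[assignment[pid]].append(scan_index)
    return split

# Toy data: 20 patients with 3 scans each (60 scans in total).
toy_patient_ids = [f"p{i // 3}" for i in range(60)]
toy_split = patient_level_split(toy_patient_ids)
```

Because the ratios apply to patients, the scan-level proportions will drift from 70/15/15 when patients contribute unequal numbers of scans. A class-stratified variant would also be needed to keep rare tumor types present in every partition.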

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard deep-learning training assumptions and the representativeness of the cited public dataset; no new entities are postulated.

free parameters (2)
  • SSL augmentation parameters and temperature
    Choice of data augmentations and SimCLR temperature hyperparameter selected to achieve the reported performance.
  • Fine-tuning learning rate and epochs
    Training hyperparameters for the downstream classification head chosen during linear evaluation and fine-tuning stages.
axioms (2)
  • domain assumption: The 4,448-image public dataset provides accurate labels and is drawn from a distribution similar to future clinical data.
    Invoked implicitly when claiming generalization from the reported metrics.
  • standard math: Standard i.i.d. assumptions hold for train and test splits in the evaluation protocol.
    Required for the validity of accuracy, precision, recall, and F1 metrics.

pith-pipeline@v0.9.0 · 5549 in / 1405 out tokens · 115517 ms · 2026-05-08T19:33:05.016156+00:00 · methodology

