DSVTLA: Deep Swin Vision Transformer-Based Transfer Learning Architecture for Multi-Type Cancer Histopathological Image Classification
Pith reviewed 2026-05-10 16:34 UTC · model grok-4.3
The pith
A hybrid Swin Transformer and ResNet50 model classifies multiple cancer histopathology images with up to 100% accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The proposed DSVTLA framework integrates a hierarchical Swin Transformer with ResNet50-based convolutional feature extraction, enabling the model to capture both long-range contextual dependencies and fine-grained local morphological patterns within histopathological images. Benchmarked alongside state-of-the-art CNN and transfer learning models (DenseNet121, DenseNet201, InceptionV3, ResNet50, EfficientNetB3, multiple ViT variants, and Swin Transformer models) under a unified pipeline on multi-cancer datasets (Breast Cancer, Oral Cancer, Lung and Colon Cancer, Kidney Cancer, and Acute Lymphocytic Leukemia), with both original and segmented images, it achieves 100% test accuracy on the lung-colon and segmented leukemia datasets and up to 99.23% on breast cancer.
What carries the argument
The hybrid architecture that pairs a hierarchical Swin Vision Transformer for global dependencies with ResNet50 convolutional features for local morphological details in a transfer learning setup for multi-cancer image classification.
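As a sketch of how such a pairing is typically wired (an illustration, not the paper's reported configuration: the feature dimensions, pooled-vector concatenation, and layer sizes below are assumptions), a minimal PyTorch fusion head might join the pooled global and local feature vectors before a shared classifier:

```python
import torch
import torch.nn as nn

class HybridFusionHead(nn.Module):
    """Hypothetical fusion of pooled Swin-style (global) and
    ResNet50-style (local) feature vectors via concatenation."""
    def __init__(self, global_dim=768, local_dim=2048, num_classes=5):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(global_dim + local_dim, 512),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(512, num_classes),
        )

    def forward(self, global_feat, local_feat):
        # concatenate along the feature dimension, then classify
        fused = torch.cat([global_feat, local_feat], dim=1)
        return self.classifier(fused)

head = HybridFusionHead()
global_feat = torch.randn(4, 768)   # e.g. pooled Swin-T output
local_feat = torch.randn(4, 2048)   # e.g. pooled ResNet50 output
logits = head(global_feat, local_feat)
print(tuple(logits.shape))  # (4, 5)
```

In a transfer-learning setup the two pretrained backbones would be partially frozen and only a head like this trained from scratch; the paper does not disclose its exact fusion mechanism, so concatenation here stands in for whatever it uses.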
Load-bearing premise
High accuracies measured on the specific multi-cancer datasets will generalize to new clinical images from different hospitals, scanners, and patient populations without overfitting or data-specific biases.
What would settle it
A clear drop in accuracy when the model is applied to a fresh collection of histopathological images gathered from independent clinical sites that use different scanners or staining methods.
read the original abstract
In this study, we proposed a deep Swin-Vision Transformer-based transfer learning architecture for robust multi-cancer histopathological image classification. The proposed framework integrates a hierarchical Swin Transformer with ResNet50-based convolution features extraction, enabling the model to capture both long-range contextual dependencies and fine-grained local morphological patterns within histopathological images. To validate the efficiency of the proposed architecture, an extensive experiment was executed on a comprehensive multi-cancer dataset including Breast Cancer, Oral Cancer, Lung and Colon Cancer, Kidney Cancer, and Acute Lymphocytic Leukemia (ALL), including both original and segmented images were analyzed to assess model robustness across heterogeneous clinical imaging conditions. Our approach is benchmarked alongside several state-of-the-art CNN and transfer models, including DenseNet121, DenseNet201, InceptionV3, ResNet50, EfficientNetB3, multiple ViT variants, and Swin Transformer models. However, all models were trained and validated using a unified pipeline, incorporating balanced data preprocessing, transfer learning, and fine-tuning strategies. The experimental results demonstrated that our proposed architecture consistently gained superior performance, reaching 100% test accuracy for lung-colon cancer, segmented leukemia datasets, and up to 99.23% accuracy for breast cancer classification. The model also achieved near-perfect precision, f1 score, and recall, indicating highly stable scores across divers cancer types. Overall, the proposed model establishes a highly accurate, interpretable, and also robust multi-cancer classification system, demonstrating strong benchmark for future research and provides a unified comparative assessment useful for designing reliable AI-assisted histopathological diagnosis and clinical decision-making.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes DSVTLA, a hybrid architecture integrating a hierarchical Swin Vision Transformer with ResNet50-based convolutional feature extraction for transfer learning-based classification of multi-type cancer histopathological images (breast, oral, lung-colon, kidney, and acute lymphocytic leukemia, including both original and segmented variants). It benchmarks the model against multiple CNNs (DenseNet121/201, InceptionV3, ResNet50, EfficientNetB3) and ViT/Swin variants under a single unified pipeline of preprocessing, transfer learning, and fine-tuning, claiming consistent superiority with 100% test accuracy on lung-colon and segmented leukemia datasets and 99.23% on breast cancer, plus near-perfect precision/recall/F1 scores.
Significance. If the performance claims are supported by rigorous validation, the work would offer a useful empirical benchmark for hybrid transformer-CNN transfer learning in multi-cancer histopathology, highlighting the value of combining long-range context modeling with local morphological features. The unified comparative evaluation across heterogeneous datasets and model families is a constructive element for the field.
major comments (3)
- [§4, §3] §4 (Experimental Results) and §3 (Methodology): The reported 100% test accuracy on lung-colon cancer and segmented leukemia datasets (and 99.23% on breast) is presented without any description of the train/test split protocol. In histopathology, random per-image splits routinely induce leakage because patches from the same slide or patient share staining, scanner, and cellular patterns; the absence of patient-level or slide-level partitioning details directly undermines the central claim that the model achieves robust, generalizable superiority rather than an artifact of data correlation.
- [§4] §4 (Results tables/figures): No ablation studies isolate the contribution of the Swin Transformer hierarchy versus the ResNet50 backbone, nor are repeated random splits, cross-validation folds, or statistical significance tests (e.g., paired t-tests or McNemar’s test against baselines) reported. Without these, the assertion of consistent outperformance over DenseNet, EfficientNet, and ViT variants cannot be evaluated as load-bearing evidence.
- [§3] §3 (Dataset description): Dataset sizes, class balances, and the exact number of images per cancer type (original vs. segmented) are not quantified, nor is any multi-center or external hold-out validation mentioned. This omission makes it impossible to assess whether the near-perfect metrics reflect genuine robustness across the claimed “heterogeneous clinical imaging conditions.”
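The leakage risk raised in the first major comment can be ruled out mechanically by splitting on patient (or slide) identifiers rather than on individual images. A minimal sketch using scikit-learn's GroupShuffleSplit, with invented placeholder data (the patient counts and features are synthetic, for illustration only):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# hypothetical cohort: 10 patients, 8 image patches each
rng = np.random.default_rng(0)
patient_ids = np.repeat(np.arange(10), 8)   # 80 patches total
X = rng.normal(size=(80, 16))               # stand-in features
y = rng.integers(0, 2, size=80)             # stand-in labels

# split whole patients, never individual patches
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X, y, groups=patient_ids))

# verify: no patient contributes patches to both partitions
overlap = set(patient_ids[train_idx]) & set(patient_ids[test_idx])
print(len(overlap))  # 0
```

A random per-image split of the same data would almost certainly place patches from one patient on both sides of the partition, which is exactly the correlation the referee suspects behind the near-perfect scores.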
minor comments (3)
- [Abstract] Abstract: “gained superior performance” should read “achieved superior performance”; “divers cancer types” should be “diverse cancer types”; the phrase “also robust” is redundant.
- [Figures/Tables] Figure and table captions throughout: Ensure all axes, color legends, and metric definitions (e.g., whether accuracy is macro- or micro-averaged) are explicitly labeled for reproducibility.
- [§2] §2 (Related Work): A few recent transformer-based histopathology papers (e.g., post-2022 Swin or hybrid ViT works) appear to be missing; adding them would strengthen the positioning of DSVTLA.
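On the averaging question in the second minor comment: with imbalanced classes, micro- and macro-averaged scores can diverge noticeably, so the choice must be stated explicitly. A small scikit-learn sketch with invented labels:

```python
from sklearn.metrics import f1_score

# synthetic, imbalanced labels: 8 of class 0, 2 of class 1
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [0, 1]   # one minority-class example missed

micro = f1_score(y_true, y_pred, average="micro")  # equals accuracy here
macro = f1_score(y_true, y_pred, average="macro")  # unweighted mean over classes
print(round(micro, 3), round(macro, 3))  # 0.9 0.804
```

The single missed minority-class example barely moves the micro score but pulls the macro score down sharply, which is why a reported "accuracy" is ambiguous until the averaging scheme is named.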
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback, which highlights important aspects of rigorous validation in histopathological image classification. We have addressed each major comment by clarifying our experimental setup where possible and committing to revisions that strengthen the manuscript's transparency and evidence.
read point-by-point responses
-
Referee: [§4, §3] §4 (Experimental Results) and §3 (Methodology): The reported 100% test accuracy on lung-colon cancer and segmented leukemia datasets (and 99.23% on breast) is presented without any description of the train/test split protocol. In histopathology, random per-image splits routinely induce leakage because patches from the same slide or patient share staining, scanner, and cellular patterns; the absence of patient-level or slide-level partitioning details directly undermines the central claim that the model achieves robust, generalizable superiority rather than an artifact of data correlation.
Authors: We agree that explicit details on the train/test split are essential and that patient- or slide-level partitioning is the gold standard to mitigate leakage in histopathology. Our experiments followed a standard 80/20 random per-image split for each public dataset, consistent with many prior transfer-learning studies on these collections. However, we recognize this does not fully eliminate the risk of correlation. In the revised manuscript we will (i) explicitly state the split ratios and random seed, (ii) discuss the leakage concern as a limitation, and (iii) where patient/slide metadata exists in the source datasets, re-run and report results under patient-level partitioning. These additions will appear in §3 and §4. revision: yes
-
Referee: [§4] §4 (Results tables/figures): No ablation studies isolate the contribution of the Swin Transformer hierarchy versus the ResNet50 backbone, nor are repeated random splits, cross-validation folds, or statistical significance tests (e.g., paired t-tests or McNemar’s test against baselines) reported. Without these, the assertion of consistent outperformance over DenseNet, EfficientNet, and ViT variants cannot be evaluated as load-bearing evidence.
Authors: We concur that ablation studies and statistical rigor are necessary to substantiate the hybrid architecture’s superiority. We will add (i) ablation experiments that systematically remove or replace the Swin Transformer hierarchy and the ResNet50 backbone, (ii) results from five independent random splits with mean ± standard deviation, and (iii) McNemar’s tests (and paired t-tests where appropriate) comparing DSVTLA against each baseline. These will be presented in revised §4 tables and text. revision: yes
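For reference, the continuity-corrected McNemar's test proposed above needs only the two models' per-example correctness on a shared test set; the predictions below are invented placeholders:

```python
import numpy as np
from scipy.stats import chi2

# hypothetical predictions from two models on the same ten test examples
y_true = np.array([0, 1, 0, 1, 1, 0, 1, 0, 1, 1])
pred_a = np.array([0, 1, 0, 1, 1, 0, 1, 0, 1, 0])  # model A
pred_b = np.array([0, 1, 1, 0, 1, 0, 1, 0, 1, 1])  # model B

correct_a = pred_a == y_true
correct_b = pred_b == y_true
b = int(np.sum(correct_a & ~correct_b))  # A right, B wrong
c = int(np.sum(~correct_a & correct_b))  # A wrong, B right

# continuity-corrected McNemar statistic, chi-square with 1 dof
stat = (abs(b - c) - 1) ** 2 / (b + c)
p_value = chi2.sf(stat, df=1)
print(b, c)  # 2 1
```

Only the discordant counts b and c enter the statistic, which is what makes McNemar's test appropriate for paired comparisons of two classifiers on one test set.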
-
Referee: [§3] §3 (Dataset description): Dataset sizes, class balances, and the exact number of images per cancer type (original vs. segmented) are not quantified, nor is any multi-center or external hold-out validation mentioned. This omission makes it impossible to assess whether the near-perfect metrics reflect genuine robustness across the claimed “heterogeneous clinical imaging conditions.”
Authors: We will expand §3 to report the precise number of images, class distributions, and original-versus-segmented counts for every cancer type. The datasets are drawn from well-known public repositories; we will cite their sources and note any documented multi-center provenance. We acknowledge the absence of an external hold-out set as a limitation and will add a dedicated paragraph discussing this point together with suggestions for future multi-center validation. revision: yes
Circularity Check
No circularity: empirical ML benchmark with independent test metrics
full rationale
The paper is an empirical study proposing a hybrid Swin Transformer + ResNet50 transfer-learning architecture and reporting its classification accuracies on multi-cancer histopathology datasets (original and segmented versions of breast, oral, lung-colon, kidney, and leukemia images). No mathematical derivation chain, predictive equations, or first-principles results are claimed; performance numbers (100% on lung-colon and segmented leukemia, 99.23% on breast) are direct outputs of a single unified training pipeline evaluated on held-out test splits. No self-definitional loops, fitted-input predictions, or load-bearing self-citations appear in the provided text. The central claims rest on observable benchmark comparisons against DenseNet, ResNet, ViT, and Swin baselines under identical preprocessing, which are falsifiable against external data and do not reduce to the model's own inputs by construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- Hyperparameters for transfer learning and fine-tuning
axioms (1)
- domain assumption: the hierarchical Swin Transformer captures long-range dependencies while ResNet50 extracts fine-grained local patterns
Reference graph
Works this paper leans on
-
[1]
H. Sung et al., “Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries,” vol. 71, no. 3, pp. 209–249, 2021, doi: 10.3322/caac.21660
-
[2]
R. L. Siegel and K. D. Miller, “Cancer Statistics, 2020,” vol. 0, no. 0, pp. 1–24, 2020, doi: 10.3322/caac.21590
-
[3]
Artificial intelligence in digital pathology — new tools for diagnosis and precision oncology,
K. A. Schalper and D. L. Rimm, “Artificial intelligence in digital pathology — new tools for diagnosis and precision oncology,” Nat. Rev. Clin. Oncol., doi: 10.1038/s41571-019-0252-y
-
[4]
Multi-omics data integration and drug screening of AML cancer using Generative Adversarial Network,
S. Afroz, N. Isalam, M. A. Hahib, M. S. Reza, M. A. Alam, “Multi-omics data integration and drug screening of AML cancer using Generative Adversarial Network,” Methods, 226:138–50, 2024
-
[5]
Influence Function of Multiple Kernel Canonical Analysis to Identify Outliers in Imaging Genetics Data
M. A. Alam, V. Calhoun, and Y. Wang, “Influence Function of Multiple Kernel Canonical Analysis to Identify Outliers in Imaging Genetics Data”
-
[6]
Higher-order Regularized Kernel CCA,
M. A. Alam, “Higher-order Regularized Kernel CCA,” no. 2, pp. 1–4, 2013, doi: 10.1109/ICMLA.2013.76
-
[7]
Deep residual learning for image recognition
K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” 2016, doi: 10.1109/CVPR.2016.90
-
[8]
Densely Connected Convolutional Networks
G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely Connected Convolutional Networks,” 2017, doi: 10.1109/CVPR.2017.243
-
[9]
Kernel Choice for Unsupervised Kernel Methods,
M. A. Alam, “Kernel Choice for Unsupervised Kernel Methods,” The Graduate University for Advanced Studies, Hayama, Kanagawa, Japan, 2014
-
[10]
J. Zhu et al., “Associations between genetically predicted plasma protein levels and Alzheimer’s disease risk: a study using genetic prediction models,” Alzheimers. Res. Ther., pp. 1–16, 2024, doi: 10.1186/s13195-023-01378-4
-
[11]
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows,
Z. Liu et al., “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows,” pp. 1–11
-
[12]
A Dataset for Breast Cancer Histopathological Image Classification,
F. A. Spanhol, L. S. Oliveira, C. Petitjean, and L. Heutte, “A Dataset for Breast Cancer Histopathological Image Classification,” vol. 9294, no. c, pp. 1–8, 2015, doi: 10.1109/TBME.2015.2496264
-
[13]
M. A. Alam et al., “Robust kernel canonical correlation analysis to detect gene-gene co-associations: A case study in genetics,” J. Bioinform. Comput. Biol., vol. 17, no. 4, p. 1950028, 2019
-
[14]
On Breast Cancer Detection: An Application of Machine Learning Algorithms on the Wisconsin Diagnostic Dataset,
A. F. M. Agarap, “On Breast Cancer Detection: An Application of Machine Learning Algorithms on the Wisconsin Diagnostic Dataset,” no. 1, pp. 7–11, 2018
-
[15]
N. Islam, M. Hasan, K. Hossain, G. R. Alam, Z. Uddin, and A. Soylu, “Vision transformer and explainable transfer learning models for auto detection of kidney cyst, stone and tumor from CT-radiography,” Sci. Rep., pp. 1–14, 2022, doi: 10.1038/s41598-022-15634-4
-
[16]
Lung and Colon Cancer Histopathological Image,
D. Lc et al., “Lung and Colon Cancer Histopathological Image,” pp. 1–2
-
[17]
Histopathological imaging database for oral cancer analysis,
T. Yesmin, L. B. Mahanta, A. K. Das, and J. D. Sarma, “Histopathological imaging database for oral cancer analysis,” Data Br., vol. 29, p. 105114, 2020, doi: 10.1016/j.dib.2020.105114
-
[18]
Lung Cancer Detection and Classification from Chest CT Scans Using Machine Learning Techniques,
M. Kashif and I. Abunadi, “Lung Cancer Detection and Classification from Chest CT Scans Using Machine Learning Techniques,” pp. 6–9, 2021
-
[19]
On Breast Cancer Detection: An Application of Machine Learning Algorithms on the Wisconsin Diagnostic Dataset,
A. F. M. Agarap, “On Breast Cancer Detection: An Application of Machine Learning Algorithms on the Wisconsin Diagnostic Dataset,” no. 1, pp. 5–9, 2018
-
[20]
Detection of Cervical Cancer with Texture Analysis using Machine Learning Models,
T. J. Nagalakshmi, N. Nalini, P. Jagadeesh, P. S. Bharathi, V. Amudha, and G. Ramkumar, “Detection of Cervical Cancer with Texture Analysis using Machine Learning Models,” 2022 Int. Conf. Adv. Comput. Commun. Appl. Informatics, pp. 1–6, doi: 10.1109/ACCAI53970.2022.9752550
-
[21]
Detection of Breast Cancer Using Machine Learning and Deep Learning Methods,
D. Kale, “Detection of Breast Cancer Using Machine Learning and Deep Learning Methods,” 2022 3rd Int. Conf. Intell. Eng. Manag., pp. 1–6, 2022, doi: 10.1109/ICIEM54221.2022.9853080
-
[22]
Study of Machine Learning Models for the Prediction and Detection of Lungs Cancer,
A. Singh, P. R. Kumar, and R. Rastogi, “Study of Machine Learning Models for the Prediction and Detection of Lungs Cancer,” 2022 11th Int. Conf. Syst. Model. Adv. Res. Trends , pp. 1243 –1248, 2022, doi: 10.1109/SMART55829.2022.10047610
-
[23]
Kidney Cancer Detection using Deep Learning Models,
K. Rajkumar, “Kidney Cancer Detection using Deep Learning Models,” 2023 7th Int. Conf. Trends Electron. Informatics, no. Icoei, pp. 1197–1203, 2023, doi: 10.1109/ICOEI56765.2023.10125589
-
[24]
S. Rajeswari, C. S. Vasanth, C. Bhavana, and K. S. S. Chowdary, “Detection and Classification of Various Types of Leukemia Using Image Processing, Transfer Learning and Ensemble Averaging Techniques,” 2022 2nd Asian Conf. Innov. Technol., pp. 1–6, 2022, doi: 10.1109/ASIANCON55314.2022.9909377
-
[25]
S. Rezayi, N. Mohammadzadeh, H. Bouraghi, S. Saeedi, and A. Mohammadpour, “Timely Diagnosis of Acute Lymphoblastic Leukemia Using Artificial Intelligence-Oriented Deep Learning Methods,” vol. 2021, 2021, doi: 10.1155/2021/5478157
-
[26]
A Semi-supervised Deep Learning Method for Cervical Cell Classification,
S. Zhao, Y. He, J. Qin, and Z. Wang, “A Semi-supervised Deep Learning Method for Cervical Cell Classification,” vol. 2022, 2022, doi: 10.1155/2022/4376178
-
[27]
A Machine Learning Approach to Diagnosing Lung and Colon Cancer Using a Deep Learning-Based Classification Framework,
M. Masud, N. Sikder, A. Nahid, and A. K. Bairagi, “A Machine Learning Approach to Diagnosing Lung and Colon Cancer Using a Deep Learning-Based Classification Framework,” pp. 1–20, 2021
-
[28]
Intelligent Model for Brain Tumor Identification Using Deep Learning,
A. H. Khan et al., “Intelligent Model for Brain Tumor Identification Using Deep Learning,” vol. 2022, 2022, doi: 10.1155/2022/8104054
-
[29]
Boosting Breast Cancer Detection Using Convolutional Neural Network,
S. A. Alanazi et al., “Boosting Breast Cancer Detection Using Convolutional Neural Network,” vol. 2021, 2021, doi: 10.1155/2021/5528622
-
[30]
A. Akilandeswari et al., “Automatic Detection and Segmentation of Colorectal Cancer with Deep Residual Convolutional Neural Network,” vol. 2022, 2022, doi: 10.1155/2022/3415603
-
[31]
A. Bin Tufail et al., “Deep Learning in Cancer Diagnosis and Prognosis Prediction: A Minireview on Challenges, Recent Trends, and Future Directions,” vol. 2021, 2021, doi: 10.1155/2021/9025470
-
[32]
Explainable lung cancer classification with ensemble transfer learning of VGG16, Resnet50 and InceptionV3 using grad-cam,
Y. K. S, J. J. Jeya, T. R. Mahesh, S. B. Khan, S. Alzahrani, and M. Alojail, “Explainable lung cancer classification with ensemble transfer learning of VGG16, Resnet50 and InceptionV3 using grad-cam,” pp. 1–19, 2024
-
[33]
A. Deshpande, V. V Estrela, and P. Patavardhan, “Neuroscience Informatics The DCT -CNN-ResNet50 architecture to classify brain tumors with super-resolution , convolutional neural network , and the ResNet50,” Neurosci. Informatics, vol. 1, no. 4, p. 100013, 2021, doi: 10.1016/j.neuri.2021.100013
-
[34]
Classification of Breast Cancer Histopathological Images Using DenseNet and Transfer Learning,
M. A. Wakili et al., “Classification of Breast Cancer Histopathological Images Using DenseNet and Transfer Learning,” vol. 2022, 2022, doi: 10.1155/2022/8904768
-
[35]
D. Putra and F. Axel, “Improving Warehouse Layout Effectiveness and Process Picking Efficiency with the Discrete Event System Simulation Approach,” Seventh Information Systems International Conference (ISICO...
-
[36]
“Cancer image classification based on DenseNet model,” J. Phys.: Conf. Ser., 2020, doi: 10.1088/1742-6596/1651/1/012143
-
[37]
Rethinking the Inception Architecture for Computer Vision,
C. Szegedy, V. Vanhoucke, and J. Shlens, “Rethinking the Inception Architecture for Computer Vision,” 2014
-
[38]
Integration of feature enhancement technique in Google inception network for breast cancer detection and classification,
W. S. Admass, Y. Y. Munaye, and A. O. Salau, “Integration of feature enhancement technique in Google inception network for breast cancer detection and classification,” pp. 1–30, 2024
-
[39]
Revolutionizing breast ultrasound diagnostics with EfficientNet-B7 and Explainable,
M. Latha, P. S. Kumar, R. R. Chandrika, T. R. Mahesh, V. V. Kumar, and S. Guluwadi, “Revolutionizing breast ultrasound diagnostics with EfficientNet-B7 and Explainable,” BMC Med. Imaging, 2024, doi: 10.1186/s12880-024-01404-3
-
[40]
Swin Transformer V2: Scaling Up Capacity and Resolution,
Z. Liu et al., “Swin Transformer V2: Scaling Up Capacity and Resolution,” pp. 12009–12019
-
[41]
SwinCup: Cascaded swin transformer for histopathological structures segmentation in colorectal cancer,
U. Zidan, M. Medhat, and M. M. Abdelsamea, “SwinCup: Cascaded swin transformer for histopathological structures segmentation in colorectal cancer,” vol. 216, 2023
-
[42]
Learning Schizophrenia Imaging Genetics Data Via Multiple Kernel Canonical Correlation Analysis,
O. Richfield, M. A. Alam, V. D. Calhoun, and Y. P. Wang, “Learning Schizophrenia Imaging Genetics Data Via Multiple Kernel Canonical Correlation Analysis,” in Proc. IEEE Int. Conf. Bioinformatics and Biomedicine (BIBM), Shenzhen, China, 2016
-
[43]
S. R. Gunasekara, H. N. T. K. Kaldera, and M. B. Dissanayake, “A Systematic Approach for MRI Brain Tumor Localization and Segmentation Using Deep Learning and Active Contouring,” vol. 2021, 2021, doi: 10.1155/2021/6695108