CBAM-Enhanced DenseNet121 for Multi-Class Chest X-Ray Classification with Grad-CAM Explainability
Pith reviewed 2026-05-10 15:53 UTC · model grok-4.3
The pith
Adding a convolutional attention module to DenseNet121 produces a model that classifies chest X-rays as normal, bacterial pneumonia, or viral pneumonia at roughly 84 percent accuracy while generating attention maps that highlight plausible lung regions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By integrating the Convolutional Block Attention Module into DenseNet121 and training on labeled chest X-rays, the resulting model attains 84.29 percent mean test accuracy with standard deviation 1.14 percent across three independent runs, together with per-class AUC values of 0.9565 for bacterial pneumonia, 0.9610 for normal, and 0.9187 for viral pneumonia; Grad-CAM heat maps produced from the same network align with expected pulmonary anatomy for each label.
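The mean ± standard deviation convention used for these figures is straightforward to reproduce with the standard library. The per-seed accuracies below are hypothetical (the paper reports only the summary statistics), chosen so the mean lands on the reported 84.29 percent:

```python
import statistics

def summarize_runs(values):
    """Mean and sample standard deviation across independent seeded runs,
    the reporting convention behind the paper's 84.29% +/- 1.14% figure."""
    return statistics.mean(values), statistics.stdev(values)

# Hypothetical per-seed test accuracies for seeds 42, 7, 123:
mean_acc, sd_acc = summarize_runs([0.8550, 0.8429, 0.8308])
```

With only three runs the sample standard deviation is a coarse repeatability estimate, not a confidence interval.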
What carries the argument
The Convolutional Block Attention Module (CBAM) inserted into DenseNet121, which applies sequential channel-wise and spatial attention to feature maps so that the network emphasizes the most informative regions within each X-ray.
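The sequential channel-then-spatial mechanism can be sketched in NumPy. This is an illustrative reduction of CBAM (Woo et al., 2018), not the paper's implementation: `w1`, `w2`, and the 7x7 `kernel` stand in for learned parameters, and the channel-MLP reduction ratio is fixed by the shapes passed in.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    # x: (C, H, W). A shared two-layer MLP (w1: C x C//r, w2: C//r x C)
    # is applied to both average- and max-pooled channel descriptors.
    avg = x.mean(axis=(1, 2))
    mx = x.max(axis=(1, 2))
    mlp = lambda v: np.maximum(v @ w1, 0.0) @ w2   # ReLU between the layers
    scale = sigmoid(mlp(avg) + mlp(mx))            # one weight per channel
    return x * scale[:, None, None]

def spatial_attention(x, kernel):
    # Pool across channels, then run a 'same'-padded conv over the 2-map stack.
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])   # (2, H, W)
    k = kernel.shape[-1]
    p = k // 2
    xp = np.pad(pooled, ((0, 0), (p, p), (p, p)))
    h, w = x.shape[1:]
    scale = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            scale[i, j] = np.sum(xp[:, i:i + k, j:j + k] * kernel)
    return x * sigmoid(scale)[None, :, :]            # one weight per pixel

def cbam(x, w1, w2, kernel):
    # CBAM applies channel attention first, then spatial attention.
    return spatial_attention(channel_attention(x, w1, w2), kernel)
```

In the paper's setting the module would sit inside DenseNet121's feature hierarchy rather than operate on raw arrays as here.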
If this is right
- The reported accuracy and AUC figures, obtained with three random seeds, supply a statistically repeatable baseline for three-class pneumonia classification.
- Grad-CAM outputs supply visual explanations that could be reviewed by clinicians before acting on the model's label.
- The binary-task comparison establishes that EfficientNetB3 does not automatically outperform simpler architectures on this imaging task.
- The framework is positioned for use in resource-constrained clinics where automated triage plus attention maps could reduce reliance on scarce radiologists.
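The Grad-CAM outputs mentioned above follow a standard recipe (Selvaraju et al., 2017): weight each feature map by its spatially averaged gradient, combine, and keep only positive evidence. A minimal NumPy sketch, where `feature_maps` and `gradients` stand in for the last convolutional block's activations and the gradient of the class score with respect to them:

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heat map for one class.
    feature_maps, gradients: arrays of shape (C, H, W)."""
    weights = gradients.mean(axis=(1, 2))              # alpha_k, one per channel
    cam = np.tensordot(weights, feature_maps, axes=1)  # weighted sum -> (H, W)
    return np.maximum(cam, 0)                          # ReLU: positive influence only
```

The resulting (H, W) map is then upsampled to the input resolution and overlaid on the X-ray for clinician review.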
Where Pith is reading between the lines
- The same attention-augmented backbone could be tested on other radiographic tasks that require distinguishing disease subtypes rather than simple presence or absence.
- Domain-shift experiments that retrain only the final layers on images from new X-ray machines would reveal how much retraining is needed for deployment across sites.
- Pairing the model's output with simple clinical variables such as patient age or fever duration could be checked to see whether combined accuracy rises without losing interpretability.
Load-bearing premise
Performance measured on the paper's chosen test collection of chest X-rays will remain stable when the same model encounters images acquired on different equipment, from different patient groups, or in different clinical environments.
What would settle it
Running the trained CBAM-DenseNet121 on a new collection of chest X-rays gathered from another hospital or region and observing whether accuracy falls below 75 percent or any per-class AUC drops below 0.85.
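That pass/fail criterion reduces to a simple decision rule. A sketch, with the function name and thresholds taken from the criterion stated above:

```python
def external_validation_holds(accuracy, per_class_auc,
                              acc_floor=0.75, auc_floor=0.85):
    """The claim survives an external test set only if accuracy stays at or
    above 75% and every per-class AUC stays at or above 0.85."""
    return accuracy >= acc_floor and all(a >= auc_floor for a in per_class_auc)

# The reported in-distribution figures would pass this check;
# a hypothetical degraded external run would not.
in_dist_ok = external_validation_holds(0.8429, [0.9565, 0.9610, 0.9187])
degraded_ok = external_validation_holds(0.72, [0.90, 0.88, 0.83])
```
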
Original abstract
Pneumonia remains a leading cause of childhood mortality worldwide, with a heavy burden in low-resource settings such as Bangladesh where radiologist availability is limited. Most existing deep learning approaches treat pneumonia detection as a binary problem, overlooking the clinically critical distinction between bacterial and viral aetiology. This paper proposes CBAM-DenseNet121, a transfer-learning framework that integrates the Convolutional Block Attention Module (CBAM) into DenseNet121 for three-class chest X-ray classification: Normal, Bacterial Pneumonia, and Viral Pneumonia. We also conduct a systematic binary-task baseline study revealing that EfficientNetB3 (73.88%) underperforms even the custom CNN baseline (78.53%) -- a practically important negative finding for medical imaging model selection. To ensure statistical reliability, all experiments were repeated three times with independent random seeds (42, 7, 123), and results are reported as mean +/- standard deviation. CBAM-DenseNet121 achieves 84.29% +/- 1.14% test accuracy with per-class AUC scores of 0.9565 +/- 0.0010, 0.9610 +/- 0.0014, and 0.9187 +/- 0.0037 for bacterial pneumonia, normal, and viral pneumonia respectively. Grad-CAM visualizations confirm that the model attends to anatomically plausible pulmonary regions for each class, supporting interpretable deployment in resource-constrained clinical environments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes CBAM-DenseNet121, a transfer-learned DenseNet121 augmented with the Convolutional Block Attention Module, for three-class chest X-ray classification into Normal, Bacterial Pneumonia, and Viral Pneumonia. It reports mean test accuracy of 84.29% ± 1.14% and per-class AUCs of 0.9565 ± 0.0010 (bacterial), 0.9610 ± 0.0014 (normal), and 0.9187 ± 0.0037 (viral) across three independent runs with seeds 42, 7, and 123. Grad-CAM visualizations are provided to show attention on plausible pulmonary regions, and a binary-task baseline comparison is included in which a custom CNN (78.53%) outperforms EfficientNetB3 (73.88%).
Significance. If the performance claims hold on a properly documented, leakage-free test set, the work would provide a useful contribution to clinically relevant multi-class pneumonia classification in low-resource settings, with the added value of attention-based interpretability. The reported negative result for EfficientNetB3 versus a custom CNN in binary tasks could also inform practical model selection in medical imaging.
major comments (1)
- [Methods / Experimental Setup] The manuscript provides no description of the dataset (source, total images per class, class balance, acquisition details, or patient demographics), the train/validation/test split strategy (including whether splits are patient-level to prevent leakage), or preprocessing/augmentation steps. These omissions make it impossible to verify the central claims of 84.29% ± 1.14% accuracy and the listed AUC values, as the metrics depend directly on the data partition and distribution.
minor comments (1)
- [Abstract] The abstract states that results are reported as mean ± standard deviation but does not explicitly note that the three listed seeds correspond to the three independent runs; this detail appears later in the text and could be stated once in the abstract for immediate clarity.
Simulated Author's Rebuttal
We thank the referee for the careful reading and for highlighting the need for complete methodological transparency. We agree that the current manuscript version omitted key details on the dataset and experimental protocol, which are essential for reproducibility and verification of the reported metrics. We will incorporate a dedicated subsection addressing all points raised.
Point-by-point responses
- Referee: [Methods / Experimental Setup] The manuscript provides no description of the dataset (source, total images per class, class balance, acquisition details, or patient demographics), the train/validation/test split strategy (including whether splits are patient-level to prevent leakage), or preprocessing/augmentation steps. These omissions make it impossible to verify the central claims of 84.29% ± 1.14% accuracy and the listed AUC values, as the metrics depend directly on the data partition and distribution.
- Authors: We fully agree that these details were missing and that their absence prevents independent verification. In the revised manuscript we will add a new 'Dataset and Experimental Setup' subsection that explicitly states: (i) the source of the chest X-ray images (including whether they were collected from Bangladeshi clinical sites or drawn from a public repository), total images per class, and class balance; (ii) patient demographics when available; (iii) the exact train/validation/test partitioning procedure, with confirmation that splits are performed at the patient level to eliminate leakage; and (iv) the complete preprocessing pipeline together with the augmentation strategies applied during training. These additions will directly support the reported mean accuracy and per-class AUC figures. We have already drafted the required text and will include it in the next version.
- Revision: yes
Circularity Check
No circularity: purely empirical ML results with no derivations or self-referential predictions
full rationale
The paper is an empirical machine-learning study reporting classification accuracies and AUCs from training CBAM-DenseNet121 on chest X-ray images. It contains no mathematical derivations, equations, or 'predictions' that reduce to fitted parameters by construction. All claims rest on repeated experimental runs with reported means and standard deviations. No self-citation load-bearing steps, ansatz smuggling, or renaming of known results appear in the derivation chain, which is absent. The central performance numbers are direct measurements, not forced by any internal logic.
Axiom & Free-Parameter Ledger
free parameters (1)
- learning rate, batch size, number of epochs, and other training hyperparameters
axioms (2)
- domain assumption: ImageNet-pretrained weights transfer usefully to chest X-ray classification
- domain assumption: dataset labels accurately reflect true bacterial versus viral etiology
Reference graph
Works this paper leans on
[1] X. Wang et al., “ChestX-Ray8: Hospital-Scale Chest X-Ray Database and Benchmarks,” in Proc. IEEE CVPR, Jul. 2017, pp. 2097–2106.
[2] P. Rajpurkar et al., “CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning,” arXiv:1711.05225, 2017.
[3] J. Irvin et al., “CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison,” in Proc. AAAI, 2019, pp. 590–597.
[4] J. Schlemper et al., “Attention Gated Networks: Learning to Leverage Salient Regions in Medical Images,” Medical Image Analysis, vol. 53, pp. 197–207, 2019.
[5] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “CBAM: Convolutional Block Attention Module,” in Proc. ECCV, 2018, pp. 3–19.
[6] O. Quellec et al., “Deep Image Mining for Diabetic Retinopathy Screening,” Medical Image Analysis, vol. 39, pp. 178–193, 2017.
[7] Y. Li et al., “Attention-Guided CNN for Skin Lesion Classification with Visual Interpretability,” IEEE Access, vol. 8, pp. 150686–150697, 2020.
[8] R. R. Selvaraju et al., “Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization,” in Proc. IEEE ICCV, 2017, pp. 618–626.
[9] M. Gour and S. Jain, “Uncertainty-Aware Convolutional Neural Network for COVID-19 X-Ray Images Classification,” Computers in Biology and Medicine, vol. 140, p. 105047, 2022.
[10] R. Singh, M. Kalra, and C. Nitiwarangkul, “Deep Neural Networks in Medical Imaging for COVID-19 and Pneumonia Detection: A Review,” Computers in Biology and Medicine, vol. 163, p. 107191, 2023.
[11] L. Yao et al., “Learning to Diagnose from Scratch by Exploiting Dependencies among Labels,” arXiv:1710.10501, 2017.
[12] G. Liang and L. Zheng, “A Transfer Learning Method with Deep Residual Network for Pediatric Pneumonia Diagnosis,” Computer Methods and Programs in Biomedicine, vol. 187, p. 104964, 2020.
[13] A. Paul et al., “DenseMobileNet: An Efficient Deep Neural Network for Detecting COVID-19 and Pneumonia from Chest X-Ray Images,” in Proc. IEEE SSCI, 2021.
[14] D. S. Kermany et al., “Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning,” Cell, vol. 172, no. 5, pp. 1122–1131, 2018.
[15] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely Connected Convolutional Networks,” in Proc. IEEE CVPR, 2017, pp. 4700–4708.
[16] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in Proc. IEEE CVPR, 2016, pp. 770–778.
[17] M. Tan and Q. V. Le, “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks,” in Proc. ICML, 2019, pp. 6105–6114.