pith. machine review for the scientific record.

arxiv: 2604.12305 · v1 · submitted 2026-04-14 · 📡 eess.IV · cs.CV

Recognition: unknown

CBAM-Enhanced DenseNet121 for Multi-Class Chest X-Ray Classification with Grad-CAM Explainability

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 15:53 UTC · model grok-4.3

classification 📡 eess.IV cs.CV
keywords: chest X-ray classification · pneumonia aetiology · CBAM attention module · DenseNet121 · Grad-CAM visualization · multi-class medical imaging · transfer learning · interpretable deep learning

The pith

Adding a convolutional attention module to DenseNet121 produces a model that classifies chest X-rays as normal, bacterial pneumonia, or viral pneumonia at 84 percent accuracy while generating attention maps of lung regions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to demonstrate that embedding the Convolutional Block Attention Module inside DenseNet121 creates a three-class classifier capable of separating normal chest X-rays from those showing bacterial or viral pneumonia. This distinction matters in low-resource settings because it can guide antibiotic use versus supportive care when radiologists are scarce. The authors run the model on chest X-ray data, record its accuracy and per-class area-under-curve values across repeated trials, and apply Grad-CAM to verify that the network attends to anatomically sensible areas. They also include a binary-task comparison showing that a popular EfficientNet variant underperforms even a basic custom network. If these results hold, the approach supplies both a decision and an interpretable visual cue that clinicians could inspect.

Core claim

By integrating the Convolutional Block Attention Module into DenseNet121 and training on labeled chest X-rays, the resulting model attains 84.29 percent mean test accuracy with standard deviation 1.14 percent across three independent runs, together with per-class AUC values of 0.9565 for bacterial pneumonia, 0.9610 for normal, and 0.9187 for viral pneumonia; Grad-CAM heat maps produced from the same network align with expected pulmonary anatomy for each label.
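The reporting convention behind these numbers, three independently seeded runs summarized as mean ± standard deviation, can be sketched as follows. The per-run accuracies here are illustrative only, since the paper reports just the aggregate; `summarize_runs` is a hypothetical helper, not the authors' code.

```python
import statistics

def summarize_runs(accuracies):
    """Return (mean, sample standard deviation) for a list of run accuracies."""
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies)  # n-1 denominator, the usual convention
    return mean, std

# Illustrative values chosen to land near the reported 84.29 +/- 1.14;
# the actual per-run accuracies for seeds 42, 7, 123 are not listed.
runs = [85.3, 83.1, 84.5]
mean, std = summarize_runs(runs)
print(f"{mean:.2f} +/- {std:.2f}")
```

Whether the paper uses the sample or population standard deviation is not stated; with only three runs the two differ noticeably, which is worth pinning down in a revision.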

What carries the argument

The Convolutional Block Attention Module (CBAM) inserted into DenseNet121, which applies sequential channel-wise and spatial attention to feature maps so that the network emphasizes the most informative regions within each X-ray.
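A minimal numpy sketch of that sequential channel-then-spatial attention, with random weights standing in for learned MLP parameters and simple pooling standing in for CBAM's learned 7x7 spatial convolution; this illustrates the data flow, not the authors' implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, r=8):
    """Channel attention on a (C, H, W) feature map: a shared MLP over
    global average- and max-pooled descriptors, summed, then sigmoid.
    Random MLP weights stand in for learned parameters."""
    c = x.shape[0]
    avg = x.mean(axis=(1, 2))            # (C,)
    mx = x.max(axis=(1, 2))              # (C,)
    rng = np.random.default_rng(0)
    w1 = rng.normal(size=(c, c // r))    # squeeze
    w2 = rng.normal(size=(c // r, c))    # excite
    mlp = lambda v: np.maximum(v @ w1, 0) @ w2
    weights = sigmoid(mlp(avg) + mlp(mx))          # (C,)
    return x * weights[:, None, None]

def spatial_attention(x):
    """Spatial attention: channel-wise mean and max maps combined and
    squashed; averaging stands in for the learned 7x7 convolution."""
    avg = x.mean(axis=0)                 # (H, W)
    mx = x.max(axis=0)                   # (H, W)
    attn = sigmoid(0.5 * (avg + mx))
    return x * attn[None, :, :]

def cbam(x):
    """Sequential channel-then-spatial attention, as in Woo et al. (2018)."""
    return spatial_attention(channel_attention(x))

feat = np.random.default_rng(1).normal(size=(64, 7, 7))
out = cbam(feat)
print(out.shape)  # attention only reweights features; shapes are preserved
```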

If this is right

  • The reported accuracy and AUC figures, averaged over three random seeds, supply a repeatable baseline for three-class pneumonia classification.
  • Grad-CAM outputs supply visual explanations that could be reviewed by clinicians before acting on the model's label.
  • The binary-task comparison establishes that EfficientNetB3 does not automatically outperform simpler architectures on this imaging task.
  • The framework is positioned for use in resource-constrained clinics where automated triage plus attention maps could reduce reliance on scarce radiologists.
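The one-vs-rest AUC figures cited above can be computed directly from class scores via the Mann-Whitney formulation, without any plotting machinery. A self-contained sketch with hypothetical toy scores (the paper's actual score vectors are not published):

```python
import numpy as np

def auc_ovr(scores, labels, positive):
    """One-vs-rest ROC AUC via the Mann-Whitney U statistic: the
    probability that a random positive outscores a random negative."""
    pos = scores[labels == positive]
    neg = scores[labels != positive]
    greater = (pos[:, None] > neg[None, :]).sum()   # pairwise wins
    ties = (pos[:, None] == neg[None, :]).sum()     # ties count half
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# toy per-image scores for the "viral" class (illustrative values)
labels = np.array(["viral", "viral", "normal", "bacterial", "viral", "normal"])
scores = np.array([0.9, 0.7, 0.4, 0.3, 0.6, 0.8])
print(auc_ovr(scores, labels, "viral"))  # → 0.777... (7 of 9 pairs won)
```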

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same attention-augmented backbone could be tested on other radiographic tasks that require distinguishing disease subtypes rather than simple presence or absence.
  • Domain-shift experiments that retrain only the final layers on images from new X-ray machines would reveal how much retraining is needed for deployment across sites.
  • Pairing the model's output with simple clinical variables such as patient age or fever duration could be checked to see whether combined accuracy rises without losing interpretability.

Load-bearing premise

Performance measured on the paper's chosen test collection of chest X-rays will remain stable when the same model encounters images acquired on different equipment, from different patient groups, or in different clinical environments.

What would settle it

Running the trained CBAM-DenseNet121 on a new collection of chest X-rays gathered from another hospital or region and observing whether accuracy falls below 75 percent or any per-class AUC drops below 0.85.
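That falsification test reduces to a simple threshold check on external-site metrics. A sketch, with the function name and interface hypothetical:

```python
def deployment_gate(accuracy, per_class_auc,
                    min_accuracy=0.75, min_auc=0.85):
    """Flag a failure if external-site accuracy drops below 75% or any
    per-class AUC drops below 0.85, per the settlement criterion above."""
    failures = []
    if accuracy < min_accuracy:
        failures.append(f"accuracy {accuracy:.3f} < {min_accuracy}")
    for cls, auc in per_class_auc.items():
        if auc < min_auc:
            failures.append(f"{cls} AUC {auc:.3f} < {min_auc}")
    return failures  # empty list means the claim survives this test

# the paper's in-distribution numbers clear the gate
print(deployment_gate(0.8429, {"bacterial": 0.9565,
                               "normal": 0.9610,
                               "viral": 0.9187}))  # → []
```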

Figures

Figures reproduced from arXiv: 2604.12305 by Utsho Kumar Dey.

Figure 1. Representative chest X-ray samples from the Kermany dataset. Top row: normal cases showing clear lung fields. Bottom row: pneumonia cases exhibiting opacity and consolidation patterns.

Figure 2. Training data class distribution (binary labels). Pneumonia cases outnumber normal cases by approximately 2.9:1 […]

Figure 3. Phase 1 training history (frozen DenseNet121 base). Validation accuracy converges to 76.61% over 11 epochs.

Figure 4. Phase 2 training history (fine-tuning the last 30 DenseNet layers). Early stopping triggered at epoch 6; best checkpoint retained.

Figure 5. Confusion matrix for CBAM-DenseNet121 on the three-class test set. Of 242 bacterial pneumonia cases, 226 are correctly classified (93.4%). Of 234 normal cases, 184 are correctly classified (78.6%). Of 148 viral pneumonia cases, 117 are correctly classified (79.1%). The primary error mode is confusion between viral pneumonia and normal cases (32 false negatives), which is radiologically plausible […]

Figure 6. ROC curves for CBAM-DenseNet121 (one-vs-rest evaluation). All three classes exceed AUC = 0.91, indicating clinically reliable discrimination.

Figure 7. Grad-CAM visualizations for CBAM-DenseNet121. Rows: bacterial pneumonia (top), normal (middle), viral pneumonia (bottom). Columns: original X-ray, class activation heatmap, jet-colormap overlay. Red regions indicate highest model attention.
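The per-class percentages in Figure 5 follow directly from the reported counts, and the diagonal also implies a single-run overall accuracy. A quick cross-check:

```python
# Diagonal counts and class totals as given in the Figure 5 caption.
correct = {"bacterial": 226, "normal": 184, "viral": 117}
total = {"bacterial": 242, "normal": 234, "viral": 148}

for cls in correct:
    recall = correct[cls] / total[cls]
    print(f"{cls}: {100 * recall:.1f}%")   # matches the caption's 93.4 / 78.6 / 79.1

overall = sum(correct.values()) / sum(total.values())
print(f"overall: {100 * overall:.2f}%")    # one run; the 84.29% headline is a 3-run mean
```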
Original abstract

Pneumonia remains a leading cause of childhood mortality worldwide, with a heavy burden in low-resource settings such as Bangladesh where radiologist availability is limited. Most existing deep learning approaches treat pneumonia detection as a binary problem, overlooking the clinically critical distinction between bacterial and viral aetiology. This paper proposes CBAM-DenseNet121, a transfer-learning framework that integrates the Convolutional Block Attention Module (CBAM) into DenseNet121 for three-class chest X-ray classification: Normal, Bacterial Pneumonia, and Viral Pneumonia. We also conduct a systematic binary-task baseline study revealing that EfficientNetB3 (73.88%) underperforms even the custom CNN baseline (78.53%) -- a practically important negative finding for medical imaging model selection. To ensure statistical reliability, all experiments were repeated three times with independent random seeds (42, 7, 123), and results are reported as mean +/- standard deviation. CBAM-DenseNet121 achieves 84.29% +/- 1.14% test accuracy with per-class AUC scores of 0.9565 +/- 0.0010, 0.9610 +/- 0.0014, and 0.9187 +/- 0.0037 for bacterial pneumonia, normal, and viral pneumonia respectively. Grad-CAM visualizations confirm that the model attends to anatomically plausible pulmonary regions for each class, supporting interpretable deployment in resource-constrained clinical environments.
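The Grad-CAM procedure the abstract leans on reduces to a gradient-weighted sum of convolutional feature maps. A minimal numpy sketch with random stand-in activations and gradients; in the real pipeline these come from a backward pass through the trained network:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM (Selvaraju et al., 2017): global-average-pool the
    gradients to one weight per channel, take the weighted sum of
    activation maps, keep positive evidence with ReLU, and normalize.
    activations, gradients: (C, H, W) arrays from the last conv block."""
    weights = gradients.mean(axis=(1, 2))              # (C,)
    cam = np.tensordot(weights, activations, axes=1)   # (H, W)
    cam = np.maximum(cam, 0)                           # ReLU
    if cam.max() > 0:
        cam = cam / cam.max()                          # scale to [0, 1] for overlay
    return cam

rng = np.random.default_rng(0)
acts = rng.normal(size=(64, 7, 7))    # stand-in feature maps
grads = rng.normal(size=(64, 7, 7))   # stand-in class-score gradients
heatmap = grad_cam(acts, grads)
print(heatmap.shape)
```

The coarse (H, W) map is then upsampled to the input resolution and overlaid with a jet colormap, as in Figure 7.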

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper proposes CBAM-DenseNet121, a transfer-learned DenseNet121 augmented with the Convolutional Block Attention Module, for three-class chest X-ray classification into Normal, Bacterial Pneumonia, and Viral Pneumonia. It reports mean test accuracy of 84.29% ± 1.14% and per-class AUCs of 0.9565 ± 0.0010 (bacterial), 0.9610 ± 0.0014 (normal), and 0.9187 ± 0.0037 (viral) across three independent runs with seeds 42, 7, and 123. Grad-CAM visualizations are provided to show attention on plausible pulmonary regions, and a binary-task baseline comparison is included in which a custom CNN (78.53%) outperforms EfficientNetB3 (73.88%).

Significance. If the performance claims hold on a properly documented, leakage-free test set, the work would provide a useful contribution to clinically relevant multi-class pneumonia classification in low-resource settings, with the added value of attention-based interpretability. The reported negative result for EfficientNetB3 versus a custom CNN in binary tasks could also inform practical model selection in medical imaging.

major comments (1)
  1. [Methods / Experimental Setup] The manuscript provides no description of the dataset (source, total images per class, class balance, acquisition details, or patient demographics), the train/validation/test split strategy (including whether splits are patient-level to prevent leakage), or preprocessing/augmentation steps. These omissions make it impossible to verify the central claims of 84.29% ± 1.14% accuracy and the listed AUC values, as the metrics depend directly on the data partition and distribution.
minor comments (1)
  1. [Abstract] The abstract states that results are reported as mean ± standard deviation but does not explicitly note that the three listed seeds correspond to the three independent runs; this detail appears later in the text and could be stated once in the abstract for immediate clarity.
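The patient-level split the referee asks for can be enforced mechanically. A sketch assuming a hypothetical image-to-patient manifest; nothing here reflects the authors' actual pipeline:

```python
import random

def patient_level_split(image_to_patient, test_frac=0.2, seed=42):
    """Split images so no patient appears in both train and test,
    preventing the leakage the referee flags.
    image_to_patient: dict mapping image id -> patient id."""
    patients = sorted(set(image_to_patient.values()))
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, int(len(patients) * test_frac))
    test_patients = set(patients[:n_test])
    train = [i for i, p in image_to_patient.items() if p not in test_patients]
    test = [i for i, p in image_to_patient.items() if p in test_patients]
    return train, test

# hypothetical manifest: several patients contribute multiple images
manifest = {"img1": "p1", "img2": "p1", "img3": "p2",
            "img4": "p3", "img5": "p3", "img6": "p4"}
train, test = patient_level_split(manifest, test_frac=0.25)
# no patient straddles the split
assert not {manifest[i] for i in train} & {manifest[i] for i in test}
print(len(train), len(test))
```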

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and for highlighting the need for complete methodological transparency. We agree that the current manuscript version omitted key details on the dataset and experimental protocol, which are essential for reproducibility and verification of the reported metrics. We will incorporate a dedicated subsection addressing all points raised.

Point-by-point responses
  1. Referee: [Methods / Experimental Setup] The manuscript provides no description of the dataset (source, total images per class, class balance, acquisition details, or patient demographics), the train/validation/test split strategy (including whether splits are patient-level to prevent leakage), or preprocessing/augmentation steps. These omissions make it impossible to verify the central claims of 84.29% ± 1.14% accuracy and the listed AUC values, as the metrics depend directly on the data partition and distribution.

    Authors: We fully agree that these details were missing and that their absence prevents independent verification. In the revised manuscript we will add a new 'Dataset and Experimental Setup' subsection that explicitly states: (i) the source of the chest X-ray images (including whether they were collected from Bangladeshi clinical sites or drawn from a public repository), total images per class, and class balance; (ii) patient demographics when available; (iii) the exact train/validation/test partitioning procedure, with confirmation that splits are performed at the patient level to eliminate leakage; and (iv) the complete preprocessing pipeline together with the augmentation strategies applied during training. These additions will directly support the reported mean accuracy and per-class AUC figures. We have already drafted the required text and will include it in the next version. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical ML results with no derivations or self-referential predictions

full rationale

The paper is an empirical machine-learning study reporting classification accuracies and AUCs from training CBAM-DenseNet121 on chest X-ray images. It contains no mathematical derivations, equations, or 'predictions' that reduce to fitted parameters by construction. All claims rest on repeated experimental runs with reported means and standard deviations. There is no derivation chain in which load-bearing self-citations, ansatz smuggling, or renaming of known results could hide. The central performance numbers are direct measurements, not forced by any internal logic.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on standard assumptions in supervised deep learning for medical images plus several unspecified hyperparameters. No new physical entities or ad-hoc postulates are introduced.

free parameters (1)
  • learning rate, batch size, number of epochs, and other training hyperparameters
    Neural network training requires these choices; specific values and tuning procedure are not provided in the abstract.
axioms (2)
  • domain assumption: ImageNet-pretrained weights transfer usefully to chest X-ray classification
    The transfer-learning framework depends on this common but unproven-in-abstract assumption.
  • domain assumption: dataset labels accurately reflect true bacterial versus viral aetiology
    Supervised training assumes ground-truth labels are correct; no verification details are given.

pith-pipeline@v0.9.0 · 5559 in / 1722 out tokens · 43482 ms · 2026-05-10T15:53:14.022837+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

17 extracted references · 3 canonical work pages

  [1] X. Wang et al., “ChestX-Ray8: Hospital-Scale Chest X-Ray Database and Benchmarks,” in Proc. IEEE CVPR, Jul. 2017, pp. 2097–2106.
  [2] P. Rajpurkar et al., “CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning,” arXiv:1711.05225, 2017.
  [3] J. Irvin et al., “CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison,” in Proc. AAAI, 2019, pp. 590–597.
  [4] J. Schlemper et al., “Attention Gated Networks: Learning to Leverage Salient Regions in Medical Images,” Medical Image Analysis, vol. 53, pp. 197–207, 2019.
  [5] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “CBAM: Convolutional Block Attention Module,” in Proc. ECCV, 2018, pp. 3–19.
  [6] O. Quellec et al., “Deep Image Mining for Diabetic Retinopathy Screening,” Medical Image Analysis, vol. 39, pp. 178–193, 2017.
  [7] Y. Li et al., “Attention-Guided CNN for Skin Lesion Classification with Visual Interpretability,” IEEE Access, vol. 8, pp. 150686–150697, 2020.
  [8] R. R. Selvaraju et al., “Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization,” in Proc. IEEE ICCV, 2017, pp. 618–626.
  [9] M. Gour and S. Jain, “Uncertainty-Aware Convolutional Neural Network for COVID-19 X-Ray Images Classification,” Computers in Biology and Medicine, vol. 140, p. 105047, 2022.
  [10] R. Singh, M. Kalra, and C. Nitiwarangkul, “Deep Neural Networks in Medical Imaging for COVID-19 and Pneumonia Detection: A Review,” Computers in Biology and Medicine, vol. 163, p. 107191, 2023.
  [11] L. Yao et al., “Learning to Diagnose from Scratch by Exploiting Dependencies among Labels,” arXiv:1710.10501, 2017.
  [12] G. Liang and L. Zheng, “A Transfer Learning Method with Deep Residual Network for Pediatric Pneumonia Diagnosis,” Computer Methods and Programs in Biomedicine, vol. 187, p. 104964, 2020.
  [13] A. Paul et al., “DenseMobileNet: An Efficient Deep Neural Network for Detecting COVID-19 and Pneumonia from Chest X-Ray Images,” in Proc. IEEE SSCI, 2021.
  [14] D. S. Kermany et al., “Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning,” Cell, vol. 172, no. 5, pp. 1122–1131, 2018.
  [15] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely Connected Convolutional Networks,” in Proc. IEEE CVPR, 2017, pp. 4700–4708.
  [16] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in Proc. IEEE CVPR, 2016, pp. 770–778.
  [17] M. Tan and Q. V. Le, “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks,” in Proc. ICML, 2019, pp. 6105–6114.