pith. machine review for the scientific record.

arxiv: 2605.02328 · v1 · submitted 2026-05-04 · 💻 cs.CV

Recognition: 1 theorem link

Improving Imbalanced Multi-Label Chest X-Ray Diagnosis via CBAM-Enhanced CNN Backbones

Authors on Pith no claims yet

Pith reviewed 2026-05-08 19:30 UTC · model grok-4.3

classification 💻 cs.CV
keywords chest x-ray · multi-label classification · CBAM · imbalanced data · CNN · attention module · medical imaging · disease diagnosis

The pith

Embedding CBAM attention blocks into CNN backbones raises mean AUC to 0.8695 on multi-label chest X-ray diagnosis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether adding Convolutional Block Attention Modules to standard CNN architectures can improve automated detection of multiple co-occurring chest diseases in X-ray images. It targets two persistent problems: severe class imbalance across disease labels and the need to localize several pathologies within a single scan. The proposed integration refines feature maps by emphasizing informative channels and spatial locations before classification. On the ChestXray14 benchmark the method records a mean AUC of 0.8695 and exceeds several published baselines. The result matters because the modification is simple to insert into existing networks and could support faster, more consistent first-pass screening of a common imaging exam.

Core claim

Inspired by the feature-refinement capability of CBAM and the extraction power of CNN blocks, the authors integrate CBAM modules directly into traditional CNN backbones. This produces an enhanced model that achieves a mean AUC of 0.8695 on the ChestXray14 dataset and outperforms several state-of-the-art baselines in the multi-label setting.

What carries the argument

CBAM-enhanced CNN backbone: standard convolutional blocks with inserted Convolutional Block Attention Modules that refine features by weighting channels and spatial regions for multi-label output.
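The channel-then-spatial refinement described above can be sketched in a few lines. This is a minimal NumPy rendering of the standard CBAM module from Woo et al. (shared-MLP channel attention, 7×7 spatial convolution over stacked average and max maps); all shapes and weights here are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """x: (C, H, W). Shared MLP applied to avg- and max-pooled descriptors."""
    avg = x.mean(axis=(1, 2))                       # (C,)
    mx = x.max(axis=(1, 2))                         # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)    # C -> C/r -> C bottleneck
    return sigmoid(mlp(avg) + mlp(mx))              # (C,) channel weights

def spatial_attention(x, k):
    """x: (C, H, W); k: (2, 7, 7) kernel over [avg; max] channel maps."""
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    pad = np.pad(desc, ((0, 0), (3, 3), (3, 3)))      # same-size 7x7 conv
    h, w = x.shape[1:]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(pad[:, i:i + 7, j:j + 7] * k)
    return sigmoid(out)                               # (H, W) spatial weights

def cbam(x, w1, w2, k):
    """Refine a feature map: scale channels first, then spatial locations."""
    x = x * channel_attention(x, w1, w2)[:, None, None]
    return x * spatial_attention(x, k)[None, :, :]

C, H, W, r = 8, 16, 16, 4
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1   # reduction ratio r = 4
w2 = rng.standard_normal((C, C // r)) * 0.1
k = rng.standard_normal((2, 7, 7)) * 0.1
y = cbam(x, w1, w2, k)
print(y.shape)  # same shape as the input feature map: (8, 16, 16)
```

Because the output has the same shape as the input, the module can be dropped between existing backbone stages without changing any downstream layer.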

If this is right

  • The model handles class imbalance and simultaneous pathologies more effectively than baseline CNNs.
  • Mean AUC on the ChestXray14 dataset reaches 0.8695, exceeding multiple prior methods.
  • The approach offers a reusable way to retrofit attention into existing CNN backbones for medical imaging.
  • Automated feature extraction becomes more reliable for thoracic disease diagnosis tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same insertion pattern could be tested on other imbalanced multi-label medical imaging tasks such as CT or ultrasound.
  • Combining the CBAM blocks with additional imbalance techniques like focal loss might produce further gains.
  • Wider adoption could shorten the time radiologists spend on routine chest X-ray review.
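Focal loss, the imbalance technique named above, down-weights well-classified examples so that rare positive labels carry more of the gradient. A minimal per-label sketch, using the common default α and γ from Lin et al. (not values tuned to this paper):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss per label; p: sigmoid probabilities, y: 0/1 targets."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    pt = np.where(y == 1, p, 1 - p)         # probability of the true class
    a = np.where(y == 1, alpha, 1 - alpha)  # class-balance weight
    return float(-(a * (1 - pt) ** gamma * np.log(pt)).mean())

# An easy, confidently correct label contributes far less than under plain
# binary cross-entropy, so abundant easy negatives stop dominating training.
easy = focal_loss(np.array([0.95]), np.array([1]))
bce = float(-np.log(0.95))
print(easy, bce)  # focal loss is orders of magnitude smaller than BCE here
```

Combining this loss with the CBAM-refined features would be a one-line swap in most multi-label training loops.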

Load-bearing premise

The measured lift from adding CBAM blocks will remain when training protocols, data splits, and imbalance-handling choices are altered.

What would settle it

Re-running the ChestXray14 experiments with alternate random splits or a different imbalance strategy, and finding no AUC gain over a matched baseline or a score below the reported 0.8695, would falsify the claimed benefit of the CBAM integration.

Figures

Figures reproduced from arXiv: 2605.02328 by Duy Hoang Khuong, Duy Nguyen Huu, Ngu Huynh Cong Viet.

Figure 1. The workflow of the learning technique for ChestXray14 multi-label classification.
Figure 2. The overview of CBAM.
Figure 3. Overview of CBAM-Enhanced CNN Backbones.
Figure 4. CBAM-Enhanced DenseNet121-based Feature Extractor.
Figure 5. CBAM-Enhanced VGG16-based Feature Extractor.
Figure 6. ROC AUC of each pathology in our methods.
read the original abstract

Chest radiography is a widely used imaging modality for thoracic disease diagnosis, yet its conventional interpretation remains time-consuming and heavily dependent on expert knowledge. While deep learning has improved diagnostic efficiency through automated feature extraction, challenges such as class imbalance and the localization of multiple co-existing pathologies remain unsolved. In this paper, inspired by the strength of Convolutional Block Attention Module (CBAM) in feature refinement and the capability of CNN blocks in feature extraction, we propose a strategy to integrate CBAM into traditional CNN blocks to enhance performance in multi-label classification tasks. Our method achieves a mean AUC of 0.8695 on ChestXray14 dataset, outperforming several state-of-the-art baselines. Our source code is available at: https://github.com/NNNguyenDuyyy/FETC_CBAM_Enhanced_CNN.git

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes integrating Convolutional Block Attention Modules (CBAM) into standard CNN backbones to refine features for multi-label chest X-ray classification under class imbalance. It evaluates the approach on the public ChestXray14 benchmark and reports a mean AUC of 0.8695, claiming outperformance over several state-of-the-art baselines, while releasing the source code.

Significance. If the performance gains can be attributed specifically to the CBAM insertion and shown to be robust, the method supplies a lightweight, modular enhancement to existing CNN pipelines for thoracic disease diagnosis. The public code release is a clear strength that aids reproducibility and follow-up work.

major comments (2)
  1. [Experiments] Experimental section: the headline claim of a 0.8695 mean AUC and outperformance rests on a single reported protocol without an ablation that applies the identical training schedule, loss weighting, data splits, and optimizer to the unmodified backbone; without this control the attribution of any lift to CBAM insertion cannot be secured.
  2. [Results] Results and discussion: no statistical significance tests (e.g., DeLong or bootstrap confidence intervals) or variance across multiple runs are reported for the AUC figures, and label-correlation handling is not quantified, leaving the central empirical claim vulnerable to protocol-specific artifacts.
minor comments (2)
  1. [Abstract] Abstract: the phrase 'outperforming several state-of-the-art baselines' should be accompanied by the names of those baselines and their reported AUC values for immediate context.
  2. [Method] Notation: the description of how CBAM blocks are inserted into the CNN stages would benefit from an explicit diagram or pseudocode showing the exact placement relative to convolutional layers.
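Minor comment 2 asks for the insertion point to be made explicit. One common pattern, sketched here as an editorial hypothetical (both `conv_stage` and the attention body are placeholders, not the authors' verified wiring), refines each stage's output before it enters the next stage:

```python
import numpy as np

def conv_stage(x):
    # Stand-in for one backbone stage (e.g. a DenseNet121 dense block
    # plus transition, or a VGG16 conv group); placeholder computation.
    return np.maximum(x, 0)

def cbam(x):
    # Stand-in attention: per-channel sigmoid gate on the pooled response.
    att = 1.0 / (1.0 + np.exp(-x.mean(axis=(1, 2), keepdims=True)))
    return x * att

def backbone(x, n_stages=4):
    # Hypothetical placement: CBAM inserted after every convolutional stage,
    # so refined features feed the next stage and, finally, the classifier.
    for _ in range(n_stages):
        x = conv_stage(x)
        x = cbam(x)
    return x.mean(axis=(1, 2))  # global average pool -> classifier features

feat = backbone(np.random.default_rng(0).standard_normal((8, 16, 16)))
print(feat.shape)  # (8,)
```

A diagram or pseudocode of this shape in the manuscript would resolve the ambiguity the comment flags.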

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important aspects for strengthening the experimental rigor and statistical robustness of our claims. We address each major comment point by point below and will revise the manuscript to incorporate the suggested improvements.

read point-by-point responses
  1. Referee: [Experiments] Experimental section: the headline claim of a 0.8695 mean AUC and outperformance rests on a single reported protocol without an ablation that applies the identical training schedule, loss weighting, data splits, and optimizer to the unmodified backbone; without this control the attribution of any lift to CBAM insertion cannot be secured.

    Authors: We agree that a controlled ablation comparing the CBAM-enhanced backbone directly to the unmodified version under identical conditions is necessary to firmly attribute performance gains to the CBAM modules. In the revised manuscript, we will include this ablation study, training both variants with the exact same training schedule, loss weighting, data splits, and optimizer. This addition will isolate the contribution of CBAM and support the headline claims more rigorously. revision: yes

  2. Referee: [Results] Results and discussion: no statistical significance tests (e.g., DeLong or bootstrap confidence intervals) or variance across multiple runs are reported for the AUC figures, and label-correlation handling is not quantified, leaving the central empirical claim vulnerable to protocol-specific artifacts.

    Authors: We acknowledge the absence of statistical significance testing, variance reporting across runs, and explicit quantification of label correlations in the current results. We will perform multiple training runs with different random seeds to report mean AUC values along with standard deviations. Additionally, we will incorporate statistical tests such as DeLong's test or bootstrap confidence intervals to evaluate the significance of improvements over baselines. While our primary contribution is feature refinement via CBAM rather than explicit label-correlation modeling, we will add an analysis in the results section that quantifies performance on correlated labels through per-class AUC breakdowns and observations on co-occurring pathologies. revision: yes
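The bootstrap confidence intervals the authors commit to need no extra tooling. A sketch using the rank (Mann-Whitney) form of AUC, run here on synthetic scores; this is an editorial illustration assuming continuous, untied predictions, not the authors' evaluation code:

```python
import numpy as np

def auc(y, s):
    """AUC via the rank (Mann-Whitney) statistic; assumes untied scores."""
    order = np.argsort(s)
    ranks = np.empty(len(s))
    ranks[order] = np.arange(1, len(s) + 1)
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_ci(y, s, n_boot=2000, seed=0):
    """Percentile-bootstrap 95% CI for AUC over case resamples."""
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))
        if y[idx].min() == y[idx].max():  # resample needs both classes
            continue
        stats.append(auc(y[idx], s[idx]))
    return np.percentile(stats, [2.5, 97.5])

rng = np.random.default_rng(0)
y = np.concatenate([np.ones(60), np.zeros(140)]).astype(int)
s = np.where(y == 1, rng.normal(1.0, 1.0, 200), rng.normal(0.0, 1.0, 200))
lo, hi = bootstrap_ci(y, s)
print(round(auc(y, s), 3), (round(lo, 3), round(hi, 3)))
```

DeLong's test would additionally require the covariance structure of paired AUCs; the bootstrap above is the simpler drop-in for per-run uncertainty on each pathology's score.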

Circularity Check

0 steps flagged

No circularity: empirical AUC on external benchmark with no internal derivations

full rationale

The paper describes an architectural modification (CBAM integration into CNN blocks) and reports a measured mean AUC of 0.8695 on the public ChestXray14 dataset, along with comparisons to baselines. No equations, first-principles derivations, or predictions are presented that reduce the reported performance to a fitted parameter, self-defined quantity, or self-citation chain. The result is obtained via standard training and evaluation on an external benchmark, rendering the central claim self-contained with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on the domain assumption that CBAM modules improve feature selection in CNNs for medical images and on standard deep-learning training practices; no new physical entities or ad-hoc constants are introduced.

axioms (1)
  • domain assumption CBAM modules improve feature refinement when inserted into CNN blocks for image classification
    Invoked to justify the architectural choice; treated as established from prior CBAM literature.

pith-pipeline@v0.9.0 · 5446 in / 1236 out tokens · 23812 ms · 2026-05-08T19:30:51.476558+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

17 extracted references · 5 canonical work pages

  1. [1]

    Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases

    Wang, Xiaosong, et al. "Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017

  2. [2]

    MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports

Johnson, Alistair EW, et al. "MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports." Scientific Data 6.1 (2019): 317

  3. [3]

    Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison

Irvin, Jeremy, et al. "Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison." Proceedings of the AAAI conference on artificial intelligence. Vol. 33, No. 1. 2019

  4. [4]

    A survey on deep learning in medical image analysis

    Litjens, Geert, et al. "A survey on deep learning in medical image analysis." Medical image analysis 42 (2017): 60-88

  5. [5]

    CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning

    Rajpurkar, Pranav, et al. "Chexnet: Radiologist-level pneumonia detection on chest x-rays with deep learning." arXiv preprint arXiv:1711.05225 (2017)

  6. [6]

    Learning to Diagnose from Scratch by Exploiting Dependencies among Labels,

    Yao, Li, et al. "Learning to diagnose from scratch by exploiting dependencies among labels." arXiv preprint arXiv:1710.10501 (2017)

  7. [7]

    SynthEnsemble: A fusion of CNN, vision transformer, and Hybrid models for multi-label chest X-ray classification

    Ashraf, SM Nabil, et al. "SynthEnsemble: A fusion of CNN, vision transformer, and Hybrid models for multi-label chest X-ray classification." 2023 26th International Conference on Computer and Information Technology (ICCIT). IEEE, 2023

  8. [8]

SwinCheX: Multi-label classification on chest X-ray images with transformers

    Taslimi, Sina, et al. "Swinchex: Multi-label classification on chest x-ray images with transformers." arXiv preprint arXiv:2206.04246 (2022)

  9. [9]

    Swin transformer: Hierarchical vision transformer using shifted windows

    Liu, Ze, et al. "Swin transformer: Hierarchical vision transformer using shifted windows." Proceedings of the IEEE/CVF international conference on computer vision. 2021

  10. [10]

    MedViT: a robust vision transformer for generalized medical image classification

    Manzari, Omid Nejati, et al. "MedViT: a robust vision transformer for generalized medical image classification." Computers in biology and medicine 157 (2023): 106791

  11. [11]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020)

  12. [12]

    Weakly supervised deep learning for thoracic disease classification and localization on chest x-rays

    Yan, Chaochao, et al. "Weakly supervised deep learning for thoracic disease classification and localization on chest x-rays." Proceedings of the 2018 ACM international conference on bioinformatics, computational biology, and health informatics. 2018

  13. [13]

    Densely connected convolutional networks

    Huang, Gao, et al. "Densely connected convolutional networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017

  14. [14]

    Melanoma recognition via visual attention

    Yan, Yiqi, Jeremy Kawahara, and Ghassan Hamarneh. "Melanoma recognition via visual attention." International Conference on Information Processing in Medical Imaging. Cham: Springer International Publishing, 2019

  15. [15]

    Very Deep Convolutional Networks for Large-Scale Image Recognition

    Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014)

  16. [16]

    Cbam: Convolutional block attention module

    Woo, Sanghyun, et al. "Cbam: Convolutional block attention module." Proceedings of the European conference on computer vision (ECCV). 2018

  17. [17]

    Focal loss for dense object detection

Lin, Tsung-Yi, et al. "Focal loss for dense object detection." Proceedings of the IEEE international conference on computer vision. 2017