Hybrid Swin Attention Networks for Simultaneously Low-Dose PET and CT Denoising
Pith reviewed 2026-05-18 18:30 UTC · model grok-4.3
The pith
A hybrid Swin attention network with efficient global attention modules denoises low-dose PET and CT images more effectively than prior methods while staying lightweight.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By embedding Efficient Global Attention modules and a hybrid upsampling module inside a Swin Transformer backbone, HSANet simultaneously removes noise from low-dose PET and CT images, delivering higher-quality results than current methods while keeping a lightweight model size that fits GPUs with standard memory configurations.
What carries the argument
Efficient Global Attention (EGA) modules that jointly model spatial and channel-wise dependencies, paired with a hybrid upsampling module inside the Hybrid Swin Attention Network (HSANet) to control noise overfitting during reconstruction.
If this is right
- Clinicians could adopt lower radiation protocols more readily because restored image quality remains diagnostically usable.
- The compact model size permits deployment on standard hospital GPUs without specialized hardware.
- Joint PET-CT denoising removes the need for separate networks and reduces overall processing time in combined scanners.
- The architecture could scale to other paired imaging modalities that also seek dose reduction.
Where Pith is reading between the lines
- Similar attention-plus-upsampling designs might transfer to denoising tasks in other modalities such as MRI or digital radiography where noise and dose trade-offs also appear.
- Systematic testing across multiple independent low-dose datasets would show whether the gains hold beyond the single public set used here.
- Refining the hybrid upsampling rule could further improve recovery of small anatomical structures without raising model size.
Load-bearing premise
The performance gains come from the EGA modules and hybrid upsampling rather than from tuning that fits only the chosen dataset or from comparisons that omit stronger baselines.
What would settle it
An ablation experiment on the same public LDCT/PET dataset in which the EGA modules are removed and denoising quality falls below the full model would confirm their contribution; no measurable drop would undermine the claim.
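A minimal sketch of how such an ablation comparison could be scored, assuming PSNR and RMSE (two of the metrics named in the referee report) as the quality measures. The function names and the toy 2x2 images are hypothetical and are not taken from the paper; real use would run over the full test set and both modalities.

```python
import math


def rmse(reference, estimate):
    """Root-mean-square error between two equally sized 2D images (lists of rows)."""
    squared = [
        (r - e) ** 2
        for ref_row, est_row in zip(reference, estimate)
        for r, e in zip(ref_row, est_row)
    ]
    return math.sqrt(sum(squared) / len(squared))


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB; data_range is the maximum possible intensity."""
    error = rmse(reference, estimate)
    if error == 0:
        return math.inf
    return 20.0 * math.log10(data_range / error)


def ablation_drop(reference, full_output, ablated_output, data_range=1.0):
    """PSNR of the full model minus PSNR of the ablated model, in dB.

    A clearly positive value means the removed component was helping.
    """
    return psnr(reference, full_output, data_range) - psnr(
        reference, ablated_output, data_range
    )


# Toy images, purely illustrative (not results from the paper).
reference = [[0.0, 0.5], [0.5, 1.0]]
full_output = [[0.0, 0.5], [0.5, 0.9]]     # one pixel off by 0.1
ablated_output = [[0.0, 0.5], [0.5, 0.8]]  # one pixel off by 0.2
drop = ablation_drop(reference, full_output, ablated_output)
print(f"PSNR drop after ablation: {drop:.2f} dB")
```

A drop near zero across the test set would be the outcome that undermines the claim; a consistent positive drop would support it.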
Original abstract
Low-dose computed tomography (LDCT) and positron emission tomography (PET) have emerged as safer alternatives to conventional imaging modalities by significantly reducing radiation exposure. However, current approaches often face a trade-off between training stability and computational efficiency. In this study, we propose a novel Hybrid Swin Attention Network (HSANet), which incorporates Efficient Global Attention (EGA) modules and a hybrid upsampling module to address these limitations. The EGA modules enhance both spatial and channel-wise interaction, improving the network's capacity to capture relevant features, while the hybrid upsampling module mitigates the risk of overfitting to noise. We validate the proposed approach using a publicly available LDCT/PET dataset. Experimental results demonstrate that HSANet achieves superior denoising performance compared to state-of-the-art methods, while maintaining a lightweight model size suitable for deployment on GPUs with standard memory configurations. Thus, our approach demonstrates significant potential for practical, real-world clinical applications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a Hybrid Swin Attention Network (HSANet) that integrates Efficient Global Attention (EGA) modules and a hybrid upsampling module for simultaneous denoising of low-dose CT (LDCT) and PET images. It claims that this architecture improves feature capture and reduces overfitting to noise, achieving superior denoising performance over state-of-the-art methods on a public LDCT/PET dataset while remaining lightweight enough for standard GPU deployment.
Significance. If the experimental claims hold with rigorous quantitative validation, the work could contribute to practical clinical denoising tools by balancing performance and efficiency in dual-modality low-dose imaging. However, the current manuscript provides no concrete metrics, ablations, or baseline comparisons, so the significance cannot be assessed beyond the architectural description.
Major comments (2)
- [Experimental Results] Experimental Results section: The central claim of superior denoising performance (abstract and §1) is unsupported because no quantitative metrics (e.g., PSNR, SSIM, RMSE), error bars, statistical tests, named baseline comparisons, or ablation tables isolating the EGA modules and hybrid upsampling contributions are supplied. This directly prevents evaluation of whether improvements are genuine or due to dataset-specific tuning.
- [§3] §3 (Method): The description of the hybrid upsampling module mitigating overfitting is presented without any supporting derivation, loss-function analysis, or empirical isolation showing its specific contribution versus standard upsampling.
Minor comments (2)
- [Abstract] Abstract: The phrase 'trade-off between training stability and computational efficiency' is introduced without prior context or definition in the manuscript.
- [§2] Notation: The paper uses 'EGA' and 'HSANet' without an initial definition or acronym expansion on first use in the main text.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for strengthening the experimental validation and methodological justification, which we will address in the revision to better support our claims of improved denoising performance.
- Referee: [Experimental Results] Experimental Results section: The central claim of superior denoising performance (abstract and §1) is unsupported because no quantitative metrics (e.g., PSNR, SSIM, RMSE), error bars, statistical tests, named baseline comparisons, or ablation tables isolating the EGA modules and hybrid upsampling contributions are supplied. This directly prevents evaluation of whether improvements are genuine or due to dataset-specific tuning.
  Authors: We agree that the submitted manuscript does not include the detailed quantitative metrics, error bars, statistical tests, named baseline comparisons, or ablation tables in the Experimental Results section. The abstract and introduction provide high-level summaries of the outcomes on the public LDCT/PET dataset, but the full supporting data and analyses were omitted from the initial version. In the revised manuscript, we will expand this section with comprehensive tables reporting PSNR, SSIM, and RMSE values (including error bars from multiple runs), statistical significance tests, explicit comparisons to named state-of-the-art baselines, and ablation studies that isolate the contributions of the EGA modules and hybrid upsampling. This will enable rigorous assessment of the performance gains.
  Revision: yes
- Referee: [§3] §3 (Method): The description of the hybrid upsampling module mitigating overfitting is presented without any supporting derivation, loss-function analysis, or empirical isolation showing its specific contribution versus standard upsampling.
  Authors: We acknowledge that the current description in §3 lacks supporting derivation, loss-function analysis, or empirical isolation for the hybrid upsampling module's role in mitigating overfitting. In the revised manuscript, we will augment §3 with a detailed rationale and derivation explaining the mechanism by which the hybrid upsampling reduces overfitting to noise relative to standard approaches. We will also include loss-function analysis and present ablation results that empirically isolate its contribution, thereby providing stronger justification for its inclusion in the architecture.
  Revision: yes
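The error bars and significance tests promised in the first response could be computed along the following lines. This is a sketch under stated assumptions: the helper names are hypothetical, the scores are made-up placeholders rather than results from the paper, and a paired t statistic is only one of several tests the authors might choose.

```python
import math
import statistics


def mean_and_std(scores):
    """Mean and sample standard deviation, e.g. one score per training run."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_t_statistic(scores_a, scores_b):
    """t statistic of the per-case differences a - b, plus degrees of freedom (n - 1)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("a paired test needs the same cases in both lists")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    spread = statistics.stdev(diffs)
    if spread == 0:
        raise ValueError("all differences are identical; t statistic is undefined")
    return statistics.mean(diffs) / (spread / math.sqrt(n)), n - 1


# Placeholder per-case scores (made up for illustration, not from the paper).
scores_full = [30.0, 31.0, 32.0, 33.0]
scores_ablated = [29.0, 29.5, 31.5, 32.0]

mean_full, std_full = mean_and_std(scores_full)
t_value, dof = paired_t_statistic(scores_full, scores_ablated)
print(f"full model: {mean_full:.2f} +/- {std_full:.2f}")
print(f"paired t = {t_value:.3f} with {dof} degrees of freedom")
```

Turning the t statistic into a p-value needs the Student t distribution, which the Python standard library does not provide; a statistics package would be used for that step.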
Circularity Check
No circularity: empirical architecture proposal validated on external public dataset
The paper introduces HSANet with EGA modules and hybrid upsampling as an architectural design for simultaneous LDCT/PET denoising. It reports validation on a publicly available external dataset and claims superiority over SOTA methods. No mathematical derivation chain, equations, or first-principles predictions exist that could reduce to inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems, and no fitted parameters are relabeled as independent predictions. The central claim rests on empirical comparison to external benchmarks rather than self-referential fitting or definitional equivalence, making the work self-contained.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: We propose a novel Hybrid Swin Attention Network (HSANet), which incorporates Efficient Global Attention (EGA) modules and a hybrid upsampling module... Experimental results demonstrate that HSANet achieves superior denoising performance
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: HIC patch expanding module... nearest neighbor interpolation... zero-padding interpolation
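The two interpolation schemes named in the second passage can be illustrated with a small sketch. It assumes that "zero-padding interpolation" means keeping each input pixel at the top-left of a factor x factor block and filling the remaining positions with zeros; that reading, the function names, and the toy input are assumptions, and the sketch does not show how the paper's HIC patch expanding module combines the two.

```python
def upsample_nearest(image, factor=2):
    """Nearest neighbor upsampling: repeat every pixel factor times along both axes."""
    out = []
    for row in image:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def upsample_zero_insert(image, factor=2):
    """Assumed zero-padding scheme: each pixel lands at the top-left of its block,
    and every other position in the block is zero."""
    height, width = len(image), len(image[0])
    out = [[0] * (width * factor) for _ in range(height * factor)]
    for i, row in enumerate(image):
        for j, value in enumerate(row):
            out[i * factor][j * factor] = value
    return out


tiny = [[1, 2], [3, 4]]
nearest = upsample_nearest(tiny)
zero_filled = upsample_zero_insert(tiny)
for row in nearest:
    print(row)
for row in zero_filled:
    print(row)
```

Nearest neighbor copies every value, noise included, into the enlarged grid, while zero insertion leaves the new positions empty for later layers to fill in.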
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Figure 1: The structure of HSANet, with panels (a) HSANet and (b) Encoder-Decoder block. It consists of two residual convolutional blocks used in the encoder and two in the decoder,...