pith. machine review for the scientific record.

arxiv: 2604.06622 · v1 · submitted 2026-04-08 · 💻 cs.CV

Recognition: no theorem link

Balancing Efficiency and Restoration: Lightweight Mamba-Based Model for CT Metal Artifact Reduction

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:43 UTC · model grok-4.3

classification 💻 cs.CV
keywords metal artifact reduction · CT imaging · Mamba model · lightweight UNet · image restoration · multi-scale processing · artifact suppression · medical imaging

The pith

A lightweight Mamba-based UNet reduces metal artifacts in CT images while preserving tissue structures and using few resources.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MARMamba to fix metal artifacts in CT scans that obscure diagnoses for patients with implants. Existing approaches often damage healthy tissues further, require unavailable sinogram data, or use too much computing power. MARMamba employs a compact UNet built around a multi-scale Mamba module that processes images from flipped orientations and blends average and peak features to target only the artifacts. Success would yield clearer diagnostic images on standard hardware without extra inputs or structural loss. Tests show it outperforms alternatives in both quality and efficiency.

Core claim

MARMamba is a streamlined UNet architecture that incorporates multi-scale Mamba as its core module. Within MS-Mamba, a flip mamba block captures comprehensive contextual information by analyzing images from multiple orientations. The average maximum feed-forward network then integrates critical features with average features to suppress the artifacts. This combination eliminates metal artifacts of different sizes from standard CT images alone, without sinogram data, while keeping original anatomical structures intact.
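The flip-scan idea behind this claim can be sketched outside any deep-learning framework. The snippet below is a hypothetical illustration, not the paper's implementation: `ema_scan` is a toy stand-in for the actual Mamba state-space scan, and averaging the four orientations equally is an assumption made here for clarity.

```python
import numpy as np

def flip_scan(image, scan_fn):
    """Hypothetical sketch of the flip-scan idea: run a 1-D sequence
    operator over the image in several flipped orientations, undo each
    flip, and average the results. `scan_fn` stands in for the actual
    Mamba state-space scan, which is not specified here."""
    views = [
        (lambda x: x,             lambda x: x),              # original
        (lambda x: x[:, ::-1],    lambda x: x[:, ::-1]),     # horizontal flip
        (lambda x: x[::-1, :],    lambda x: x[::-1, :]),     # vertical flip
        (lambda x: x[::-1, ::-1], lambda x: x[::-1, ::-1]),  # both axes
    ]
    outputs = []
    for flip, unflip in views:
        flat = flip(image).reshape(-1)                 # raster-scan the flipped view
        out = scan_fn(flat).reshape(image.shape)
        outputs.append(unflip(out))                    # map back to original orientation
    return np.mean(outputs, axis=0)

def ema_scan(seq, alpha=0.5):
    """Toy causal scan: exponential moving average along the sequence."""
    out = np.empty_like(seq, dtype=float)
    acc = 0.0
    for i, v in enumerate(seq):
        acc = alpha * acc + (1 - alpha) * v
        out[i] = acc
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
fused = flip_scan(img, ema_scan)
print(fused.shape)  # (4, 4)
```

Because each orientation sees a different causal ordering of pixels, the averaged output mixes context from all four scan directions, which is the intuition the flip mamba block relies on.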

What carries the argument

The multi-scale Mamba (MS-Mamba) module, which uses a flip mamba block to gather multi-orientation context and an average maximum feed-forward network to combine features for artifact suppression.

Load-bearing premise

The flip mamba block combined with the average maximum feed-forward network inside MS-Mamba gathers enough contextual information to suppress artifacts without damaging organ and tissue structures.
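The average-maximum premise can be illustrated with plain pooling. This is a hedged sketch only: the real AMFN is a learned feed-forward network, and the equal-weight blend below is an assumed placeholder, not the paper's weights.

```python
import numpy as np

def avg_max_fuse(features):
    """Hypothetical sketch of the average-maximum idea: pool a stack of
    feature maps with both mean (smooth context) and max (salient peaks),
    then blend the two. The real AMFN is a learned network; the 0.5/0.5
    blend here is a placeholder."""
    avg = features.mean(axis=0)  # average features: stable background/tissue context
    mx = features.max(axis=0)    # maximum features: strong responses, e.g. streaks
    return 0.5 * avg + 0.5 * mx

feats = np.stack([np.eye(3), 2 * np.eye(3), np.ones((3, 3))])
fused = avg_max_fuse(feats)
print(fused.shape)  # (3, 3)
```

The intuition tested by the premise is that the max path highlights high-intensity artifact streaks while the average path anchors the output to the surrounding anatomy, so their combination can suppress the former without erasing the latter.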

What would settle it

Running the model on a new CT dataset with varied metal implants and finding blurred or distorted anatomical structures in the output compared to ground truth would disprove the central claim.
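One way such a falsification test could be scored is sketched below. The `psnr_floor` threshold, the synthetic mask, and the restriction to non-metal pixels are all assumptions for illustration, not the paper's evaluation protocol.

```python
import numpy as np

def psnr(ref, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB (standard definition)."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

def structural_check(gt, restored, metal_mask, psnr_floor=30.0):
    """Hypothetical falsification check: compare the restored output to
    ground truth only OUTSIDE the metal/artifact mask. If artifact-free
    anatomy is degraded (PSNR below `psnr_floor`, an assumed threshold),
    the structure-preservation claim fails."""
    clean = ~metal_mask
    score = psnr(gt[clean], restored[clean])
    return score, score >= psnr_floor

gt = np.random.default_rng(0).random((64, 64))
mask = np.zeros((64, 64), dtype=bool)
mask[28:36, 28:36] = True                 # pretend implant region
restored = gt + 0.01 * np.random.default_rng(1).standard_normal((64, 64))
score, ok = structural_check(gt, restored, mask)
print(ok)  # True for this near-perfect restoration
```

A model that blurred or distorted anatomy outside the implant region would drive the masked PSNR down and fail this check, which is the disconfirming outcome described above.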

Figures

Figures reproduced from arXiv: 2604.06622 by Ahmed Elazab, An Yan, Changmiao Wang, Cheng Pan, Dong Zeng, Shanzhou Niu, Sijun Liang, Weikai Qu, Xianfeng Li, Xiang Wan.

Figure 1. MARMamba backbone architecture. It consists of upsampling and downsampling components, each comprising three stages, with MS-Mamba serving […]

Figure 2. Internal structure of MS-Mamba, which consists of two components: FMB and AMFN. The right side displays the internal details of those two […]

Figure 3. Large-scale visual comparison of metallic implants. The input image is selected from the SynDeepLesion […]

Figure 4. Medium metallic implants. The input image is selected from the SynDeepLesion […]

Figure 5. Small metallic implants. The input image is selected from the SynDeepLesion […]

Figure 6. Tiny metallic implants. The input image is selected from the SynDeepLesion […]

Figure 7. Visual comparison in real-world scenarios. Excluding the input image, the visual images of the other models display the inner regions marked by the […]

Figure 8. Comparison of computational complexity and memory overhead.

Figure 9. Image size vs. resource consumption. Gradients are disabled during testing.

Figure 10. Feature visualization of the FMB. From left to right, the panels illustrate the input image, the Mamba output corresponding to the original (unflipped) […]

Figure 11. Directional energy log-ratio of MS-Mamba branches. Positive values […]

Figure 12. Error maps between restored images and ground truths. Samples […]
Original abstract

In computed tomography imaging, metal implants frequently generate severe artifacts that compromise image quality and hinder diagnostic accuracy. There are three main challenges in the existing methods: the deterioration of organ and tissue structures, dependence on sinogram data, and an imbalance between resource use and restoration efficiency. Addressing these issues, we introduce MARMamba, which effectively eliminates artifacts caused by metals of different sizes while maintaining the integrity of the original anatomical structures of the image. Furthermore, this model only focuses on CT images affected by metal artifacts, thus negating the requirement for additional input data. The model is a streamlined UNet architecture, which incorporates multi-scale Mamba (MS-Mamba) as its core module. Within MS-Mamba, a flip mamba block captures comprehensive contextual information by analyzing images from multiple orientations. Subsequently, the average maximum feed-forward network integrates critical features with average features to suppress the artifacts. This combination allows MARMamba to reduce artifacts efficiently. The experimental results demonstrate that our model excels in reducing metal artifacts, offering distinct advantages over other models. It also strikes an optimal balance between computational demands, memory usage, and the number of parameters, highlighting its practical utility in the real world. The code of the presented model is available at: https://github.com/RICKand-MORTY/MARMamba.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces MARMamba, a lightweight UNet architecture incorporating multi-scale Mamba (MS-Mamba) modules. Within MS-Mamba, a flip mamba block captures multi-orientation contextual information and an average maximum feed-forward network integrates features to suppress metal artifacts in CT images. The model is claimed to eliminate artifacts from metals of varying sizes while preserving anatomical structures, to operate solely on artifact-affected images without sinogram data, and to strike an optimal balance between restoration quality and computational cost (memory, parameters, compute). Experimental results are asserted to demonstrate superiority over other models.

Significance. If the claims are substantiated with proper controls, this work could advance practical metal artifact reduction in clinical CT by providing an image-only, lightweight Mamba-based alternative that avoids common structural degradation and high resource costs of prior CNN or sinogram-dependent methods. Public code release aids reproducibility.

major comments (2)
  1. [Experimental Results] Experimental Results section: only aggregate PSNR/SSIM metrics on artifacted images are referenced, with no isolated tests on clean CT volumes or non-artifact regions to measure introduced HU changes, edge preservation, or structural fidelity. This validation is required to support the load-bearing claim that the flip mamba block plus average-max FFN in MS-Mamba selectively suppresses artifacts without deteriorating organ/tissue structures.
  2. [Abstract] Abstract and Methods: the assertion of 'distinct advantages' and 'optimal balance' between computational demands, memory, and parameters lacks any reported quantitative values, baseline comparisons, or ablation studies on the MS-Mamba components, preventing verification of the efficiency and performance claims.
minor comments (1)
  1. [Abstract] The abstract would benefit from including at least one key numerical result (e.g., PSNR improvement or parameter count) to make the performance claims concrete.
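An edge-preservation metric of the kind major comment 1 requests could be scored as follows. This gradient-correlation score is an illustrative choice by this review, not a metric reported in the paper.

```python
import numpy as np

def edge_preservation(ref, test):
    """Illustrative edge-fidelity score (not from the paper): correlate
    gradient magnitudes of the reference and restored images. 1.0 means
    edge structure is perfectly preserved; values near 0 mean it was lost."""
    def grad_mag(x):
        gy, gx = np.gradient(x.astype(float))  # per-axis finite differences
        return np.hypot(gx, gy)
    a = grad_mag(ref).ravel()
    b = grad_mag(test).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    # Two perfectly flat images trivially agree; guard the zero-gradient case.
    return float((a * b).sum() / denom) if denom > 0 else 1.0

ref = np.zeros((32, 32))
ref[:, 16:] = 1.0                              # one sharp vertical edge
print(round(edge_preservation(ref, ref), 6))   # 1.0 for an identical image
```

Reporting such a score on non-artifact regions, alongside HU deviation statistics, would directly address the missing structural-fidelity validation.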

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. These suggestions highlight areas where additional validation and quantification can strengthen the presentation of our results. We address each major comment below and indicate the revisions we will make.

Point-by-point responses
  1. Referee: [Experimental Results] Experimental Results section: only aggregate PSNR/SSIM metrics on artifacted images are referenced, with no isolated tests on clean CT volumes or non-artifact regions to measure introduced HU changes, edge preservation, or structural fidelity. This validation is required to support the load-bearing claim that the flip mamba block plus average-max FFN in MS-Mamba selectively suppresses artifacts without deteriorating organ/tissue structures.

    Authors: We agree that explicit validation on clean CT volumes is valuable to directly quantify any potential HU deviations or structural changes in non-artifact regions. Our current evaluation follows standard metal artifact reduction benchmarks that focus on artifacted images, where PSNR/SSIM improvements and visual comparisons already indicate preservation of anatomy. To address this point rigorously, we will add experiments applying the model to clean volumes and reporting HU statistics, edge preservation metrics, and structural similarity in non-artifact areas. These results will be incorporated into the revised Experimental Results section. revision: yes

  2. Referee: [Abstract] Abstract and Methods: the assertion of 'distinct advantages' and 'optimal balance' between computational demands, memory, and parameters lacks any reported quantitative values, baseline comparisons, or ablation studies on the MS-Mamba components, preventing verification of the efficiency and performance claims.

    Authors: We acknowledge that the Abstract would benefit from explicit quantitative support for the efficiency claims. The full manuscript contains comparative results against baselines in the Experimental Results section, but we agree that highlighting specific numbers (parameters, memory footprint, inference time) and component ablations directly in the Abstract and Methods would improve clarity. We will revise the Abstract to include key quantitative values and add a concise ablation study on the flip mamba block and average-max FFN within the Methods or Experiments section. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical DL model with experimental validation

Full rationale

The paper introduces MARMamba, a UNet-style architecture with MS-Mamba blocks (flip Mamba + average-max FFN) for CT metal artifact reduction. All central claims rest on training the model on artifacted CT images and reporting aggregate metrics (PSNR/SSIM) plus efficiency numbers against baselines. There is no derivation chain, no fitted parameters renamed as predictions, no load-bearing self-cited uniqueness theorems, and no ansatz smuggling; the architecture choices are presented as design decisions validated by results rather than by self-referential equations. This is the standard non-circular pattern for empirical computer-vision papers.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 3 invented entities

The model introduces three new architectural components whose effectiveness is asserted via experiments; no external benchmarks or formal proofs are supplied in the abstract.

free parameters (1)
  • network weights and training hyperparameters
    Standard deep-learning model whose parameters are fitted to training data; exact count and values not stated in abstract.
invented entities (3)
  • MS-Mamba module no independent evidence
    purpose: Core multi-scale block for contextual feature capture in the UNet
    New module proposed in the paper; no independent evidence outside the work.
  • flip mamba block no independent evidence
    purpose: Captures contextual information by processing images from multiple orientations
    Specific block introduced for this model; no independent evidence outside the work.
  • average maximum feed-forward network no independent evidence
    purpose: Integrates critical and average features to suppress artifacts
    New feed-forward variant introduced; no independent evidence outside the work.

pith-pipeline@v0.9.0 · 5561 in / 1263 out tokens · 38595 ms · 2026-05-10T18:43:22.105993+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

43 extracted references · 2 canonical work pages

  1. H. Wang, Y. Li, H. Zhang, J. Chen, K. Ma, D. Meng, and Y. Zheng, "InDuDoNet: An interpretable dual domain network for CT metal artifact reduction," in Proc. Int. Conf. Med. Image Comput. Comput.-Assisted Intervention (MICCAI), Sep. 2021, pp. 107–118.
  2. M. Selles, J. A. van Osch, M. Maas, M. F. Boomsma, and R. H. Wellenberg, "Advances in metal artifact reduction in CT images: A review of traditional and novel metal artifact reduction techniques," Eur. J. Radiol., vol. 170, p. 111276, Jan. 2024.
  3. C. E. Kleber, R. Karius, L. E. Naessens, C. O. Van Toledo, J. A. C. van Osch, M. F. Boomsma, J. W. Heemskerk, and A. J. van der Molen, "Advancements in supervised deep learning for metal artifact reduction in computed tomography: A systematic review," Eur. J. Radiol., vol. 181, p. 111732, Dec. 2024.
  4. W. A. Kalender, R. Hebel, and J. Ebersberger, "Reduction of CT artifacts caused by metallic implants," Radiol., vol. 164, no. 2, pp. 576–577, Aug. 1987.
  5. E. Meyer, R. Raupach, M. Lell, B. Schmidt, and M. Kachelrieß, "Normalized metal artifact reduction (NMAR) in computed tomography," Med. Phys., vol. 37, no. 10, pp. 5482–5493, Sep. 2010.
  6. H. Wang, Y. Li, H. Zhang, D. Meng, and Y. Zheng, "InDuDoNet+: A deep unfolding dual domain network for metal artifact reduction in CT images," Med. Image Anal., vol. 85, p. 102729, Apr. 2023.
  7. H. Wang, M. Zhou, D. Wei, Y. Li, and Y. Zheng, "MEPNet: A model-driven equivariant proximal network for joint sparse-view reconstruction and metal artifact reduction in CT images," in Proc. Int. Conf. Med. Image Comput. Comput.-Assisted Intervention (MICCAI), Oct. 2023, pp. 109–120.
  8. T. Wang, Z. Lu, Z. Yang, W. Xia, M. Hou, H. Sun, Y. Liu, H. Chen, J. Zhou, and Y. Zhang, "Idol-net: An interactive dual-domain parallel network for CT metal artifact reduction," IEEE Trans. Radiat. Plasma Med. Sci., vol. 6, no. 8, pp. 874–885, 2022.
  9. H. Wang, S. Yang, X. Bai, Z. Wang, J. Wu, Y. Lv, and G. Cao, "Irdnet: Iterative relation-based dual-domain network via metal artifact feature guidance for CT metal artifact reduction," IEEE Trans. Radiat. Plasma Med. Sci., vol. 8, no. 8, pp. 959–972, 2024.
  10. "Deep learning based projection domain metal segmentation for metal artifact reduction in cone beam computed tomography," IEEE Access, vol. 11, pp. 100371–100382, Sep. 2023.
  11. H. Wang, Y. Li, D. Meng, and Y. Zheng, "Adaptive convolutional dictionary network for CT metal artifact reduction," in Proc. Int. Joint Conf. Artif. Intell. (IJCAI), Jul. 2022, pp. 1401–1407.
  12. H. Wang, Q. Xie, Y. Li, Y. Huang, D. Meng, and Y. Zheng, "Orientation-shared convolution representation for CT metal artifact learning," in Proc. Int. Conf. Med. Image Comput. Comput.-Assisted Intervention (MICCAI), Sep. 2022, pp. 665–675.
  13. H. Wang, Q. Xie, D. Zeng, J. Ma, D. Meng, and Y. Zheng, "OSCNet: Orientation-shared convolutional network for CT metal artifact learning," IEEE Trans. Med. Imaging, vol. 43, no. 1, pp. 489–502, Jan. 2024.
  14. J. Liu, T. Jin, Z. Ye, F. Wu, K. Wang, Z. Wu, Y. Zhang, D. Hu, and Y. Chen, "Lwcdnet: An interpretable learning weighted convolutional dictionary network for metal artifact reduction in CT images," IEEE Trans. Instrum. Meas., vol. 74, pp. 1–15, Mar. 2025.
  15. J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, "Deep unsupervised learning using nonequilibrium thermodynamics," in Proc. Int. Conf. Mach. Learn. (ICML), vol. 37, Jul. 2015, pp. 2256–2265.
  16. J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 33, 2020, pp. 6840–6851.
  17. G. M. Karageorgos, J. Zhang, N. Peters, W. Xia, C. Niu, H. Paganetti, G. Wang, and B. De Man, "A denoising diffusion probabilistic model for metal artifact reduction in CT," IEEE Trans. Med. Imaging, vol. 43, no. 10, pp. 3521–3532, Oct. 2024.
  18. X. Liu, Y. Xie, S. Diao, S. Tan, and X. Liang, "Unsupervised CT metal artifact reduction by plugging diffusion priors in dual domains," IEEE Trans. Med. Imaging, vol. 43, no. 10, pp. 3533–3545, Oct. 2024.
  19. W. Xia, C. Niu, G. M. Karageorgos, J. Zhang, N. Peters, H. Paganetti, B. D. Man, and G. Wang, "Dual-domain denoising diffusion probabilistic model for metal artifact reduction," IEEE Trans. Radiat. Plasma Med. Sci., pp. 1–1, Jun. 2025.
  20. J. Song, C. Meng, and S. Ermon, "Denoising diffusion implicit models," in Proc. Int. Conf. Learn. Represent. (ICLR), 2021.
  21. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 30, 2017.
  22. L. Zhu, Y. Han, X. Xi, L. Li, M. Liu, H. Fu, S. Tan, and B. Yan, "Unsupervised metal artifacts reduction network for CT images based on efficient transformer," Biomed. Signal Process. Control, vol. 89, p. 105753, Mar. 2024.
  23. W. Xie and M. B. Blaschko, "Dense transformer based enhanced coding network for unsupervised metal artifact reduction," in Proc. Int. Conf. Med. Image Comput. Comput.-Assisted Intervention (MICCAI), Oct. 2023, pp. 77–86.
  24. X. Yao, J. Tan, Z. Deng, D. Xiong, Q. Zhao, and M. Wu, "Mupo-net: A multilevel dual-domain progressive enhancement network with embedded attention for CT metal artifact reduction," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), 2025, pp. 1–5.
  25. A. Gu and T. Dao, "Mamba: Linear-time sequence modeling with selective state spaces," in Proc. Conf. Lang. Model. (COLM), Jul. 2024.
  26. T. Dao and A. Gu, "Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality," in Proc. Int. Conf. Mach. Learn. (ICML), Jul. 2024.
  27. A. Gu, K. Goel, and C. Ré, "Efficiently modeling long sequences with structured state spaces," in Proc. Int. Conf. Learn. Represent. (ICLR), 2022.
  28. L. Zhu, B. Liao, Q. Zhang, X. Wang, W. Liu, and X. Wang, "Vision Mamba: Efficient visual representation learning with bidirectional state space model," in Proc. Int. Conf. Mach. Learn. (ICML), Jul. 2024.
  29. O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," in Proc. Int. Conf. Med. Image Comput. Comput.-Assisted Intervention (MICCAI), Nov. 2015, pp. 234–241.
  30. W. Yu and X. Wang, "MambaOut: Do we really need Mamba for vision?" in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2025, pp. 4484–4496.
  31. Y. Song and P. Dhariwal, "Improved techniques for training consistency models," in Proc. Int. Conf. Learn. Represent. (ICLR), Jan. 2024.
  32. R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, "The unreasonable effectiveness of deep features as a perceptual metric," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 586–595.
  33. K. Yan, X. Wang, L. Lu, and R. M. Summers, "DeepLesion: Automated deep mining, categorization and detection of significant radiology image findings using large-scale clinical lesion annotations," 2017, arXiv:1710.01766.
  34. P. Liu, H. Han, Y. Du, H. Zhu, Y. Li, F. Gu, H. Xiao, J. Li, C. Zhao, L. Xiao et al., "Deep learning to segment pelvic bones: large-scale CT datasets and baseline models," Int. J. Comput. Assisted Radiol. Surg., vol. 16, pp. 749–756, Apr. 2021.
  35. D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proc. Int. Conf. Learn. Represent. (ICLR), 2015.
  36. I. Loshchilov and F. Hutter, "SGDR: Stochastic gradient descent with warm restarts," in Proc. Int. Conf. Learn. Represent. (ICLR), 2017.
  37. S. W. Zamir, A. Arora, S. Khan, M. Hayat, F. S. Khan, and M. Yang, "Restormer: Efficient transformer for high-resolution image restoration," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2022, pp. 5718–5729.
  38. Y. Shi, J. Xu, and D. Shen, "Marformer: An efficient metal artifact reduction transformer for dental CBCT images," 2024, arXiv:2311.09590.
  39. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, "An image is worth 16x16 words: Transformers for image recognition at scale," in Proc. Int. Conf. Learn. Represent. (ICLR), 2021.
  40. W. Wang, E. Xie, X. Li, D.-P. Fan, K. Song, D. Liang, T. Lu, P. Luo, and L. Shao, "Pvt v2: Improved baselines with pyramid vision transformer," Comput. Visual Media, vol. 8, no. 3, pp. 415–424, Mar. 2022.
  41. H. Wang, Y. Li, N. He, K. Ma, D. Meng, and Y. Zheng, "DICDNet: Deep interpretable convolutional dictionary network for metal artifact reduction in CT images," IEEE Trans. Med. Imaging, vol. 41, no. 4, pp. 869–880, Apr. 2022.
  42. Q. Huynh-Thu and M. Ghanbari, "Scope of validity of PSNR in image/video quality assessment," Electron. Lett., vol. 44, no. 13, pp. 800–801, Jun. 2008.
  43. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.