
arxiv: 2511.03232 · v2 · submitted 2025-11-05 · 💻 cs.CV

Transformer-Progressive Mamba Network for Lightweight Image Super-Resolution

Pith reviewed 2026-05-18 01:38 UTC · model grok-4.3

classification 💻 cs.CV
keywords: image super-resolution · lightweight network · Mamba · Transformer · progressive modeling · high-frequency refinement · receptive field

The pith

Integrating window self-attention with Progressive Mamba enables cross-scale interactions that improve lightweight image super-resolution without added computational cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes T-PMambaSR as a lightweight framework for image super-resolution. Existing Mamba methods miss fine-grained transitions across modeling scales, which limits how effectively they build feature representations. The new approach merges window-based self-attention with Progressive Mamba to create interactions among receptive fields of varying sizes. These interactions occur progressively and incur no extra computational cost. An Adaptive High-Frequency Refinement Module then restores details lost in the main processing steps. Experiments indicate the resulting model outperforms recent Transformer- and Mamba-based alternatives while using fewer resources.

Core claim

By integrating window-based self-attention with Progressive Mamba, the method establishes a fine-grained modeling paradigm that progressively enhances feature representation through interactions among receptive fields of different scales without introducing additional computational cost. The Adaptive High-Frequency Refinement Module recovers high-frequency details lost during Transformer and Mamba processing. This yields better performance than recent Transformer- or Mamba-based methods while incurring lower computational cost.
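The page does not describe AHFRM's internals (Figure 4 only names its parts, including a Multi-Scale Gating Module). As a rough intuition for "recovering high-frequency detail", the sketch below splits a single-channel feature map into a low-pass part and a high-pass residual, then re-injects the residual through a data-dependent gate. The box blur, the sigmoid gate, and the names `box_blur` and `refine_high_frequency` are illustrative assumptions, not the paper's module.

```python
import numpy as np

def box_blur(img: np.ndarray, k: int = 3) -> np.ndarray:
    """k x k mean filter with edge padding; a stand-in low-pass filter (assumed)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def refine_high_frequency(feat: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Hypothetical refinement: add back the high-pass residual, scaled by a
    per-pixel sigmoid gate that grows with the residual's magnitude."""
    feat = np.asarray(feat, dtype=np.float64)
    high = feat - box_blur(feat)                         # high-frequency residual
    gate = 1.0 / (1.0 + np.exp(-alpha * np.abs(high)))   # adaptive weight in [0.5, 1)
    return feat + gate * high                            # sharpened features

if __name__ == "__main__":
    step = np.zeros((5, 5))
    step[:, 3:] = 1.0                # vertical edge between columns 2 and 3
    print(refine_high_frequency(step)[0])
```

On a flat region the residual is zero and the map passes through unchanged; across an edge the dark side is pushed darker and the bright side brighter, which is the kind of contrast a high-frequency refinement stage aims to restore.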

What carries the argument

Progressive Mamba, which creates progressive interactions among receptive fields at different scales to enhance feature representation without added cost.
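Figure 1 names three stages (window MHSA, a Window Scan Mamba, and a Global Scan Mamba), and Figure 3 mentions two flatten mechanisms. The exact orderings are not given on this page, so the sketch below assumes a simple row-major global scan and a window-by-window scan. The names `global_scan_order`, `window_scan_order`, and `spatial_extent` are hypothetical. The point it illustrates: the first few tokens of a window scan cover a compact 2-D block, while the same number of tokens in a global scan covers a thin row, so chaining window-level and image-level scans exposes the model to progressively larger receptive fields.

```python
import numpy as np

def global_scan_order(h: int, w: int) -> np.ndarray:
    """Row-major flattening of the whole h x w feature map into one sequence."""
    return np.arange(h * w)

def window_scan_order(h: int, w: int, win: int) -> np.ndarray:
    """Flatten window by window: windows in row-major order, pixels row-major
    inside each window. Assumes h and w are multiples of win."""
    idx = np.arange(h * w).reshape(h, w)
    # axes: (window row, row in window, window col, col in window)
    blocks = idx.reshape(h // win, win, w // win, win).transpose(0, 2, 1, 3)
    return blocks.reshape(-1)

def spatial_extent(order: np.ndarray, w: int, k: int) -> tuple:
    """Height and width of the bounding box covered by the first k tokens."""
    rows, cols = np.divmod(order[:k], w)
    return int(rows.max() - rows.min() + 1), int(cols.max() - cols.min() + 1)

if __name__ == "__main__":
    print(window_scan_order(4, 4, 2))                       # window-major order
    print(spatial_extent(window_scan_order(4, 4, 2), 4, 4))  # compact block
    print(spatial_extent(global_scan_order(4, 4), 4, 4))     # thin row
```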

If this is right

  • Receptive fields expand progressively across network layers through scale interactions.
  • Feature expressiveness grows without raising overall computational cost.
  • High-frequency image details are restored more effectively after main processing.
  • The network achieves higher super-resolution quality than recent Transformer or Mamba baselines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The scale-interaction pattern could transfer to other efficient vision tasks such as denoising or deblurring.
  • Real-time super-resolution on mobile hardware might become practical with this efficiency gain.
  • Further tests on video sequences could show whether temporal consistency benefits from the same progressive mechanism.

Load-bearing premise

That existing Mamba-based methods lack fine-grained scale transitions, and that window self-attention combined with Progressive Mamba plus high-frequency refinement can deliver better results at no extra computational cost.

What would settle it

If standard SR test benchmarks such as Set5 or Urban100 show that T-PMambaSR, at equal or lower FLOPs, fails to exceed the PSNR or SSIM of recent Mamba-based SR networks, the central performance claim would be disproven.
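The test above can be stated as a small check. The sketch below computes PSNR from its standard definition and applies the refutation rule for one benchmark. The function names are hypothetical, and SR papers commonly compute PSNR on the luminance (Y) channel after cropping a border, which this sketch does not do; an SSIM version of the rule would have the same shape.

```python
import numpy as np

def psnr(ref, test, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    ref = np.asarray(ref, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - test) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))

def claim_refuted(ours_psnr: float, ours_flops: float,
                  base_psnr: float, base_flops: float) -> bool:
    """True when the proposed model, at equal or lower FLOPs, does not
    exceed the baseline's PSNR on the same benchmark."""
    return ours_flops <= base_flops and ours_psnr <= base_psnr
```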

Figures

Figures reproduced from arXiv: 2511.03232 by Chia-Wen Lin, Guangwei Gao, Jian Yang, Sichen Guo, Wenjie Li, Yuanyang Liu.

Figure 1: (Top): Our design rationale is based on progressively exploiting internal interactions within Window multi-head self-attention (MHSA), combined with a Window Scan Mamba (WSM) and a Global Scan Mamba (GSM). This hierarchical structure facilitates the gradual expansion of receptive fields, ensuring comprehensive information exchange both within and across windows. (Bottom): Leveraging our design, our method st…

Figure 2: The network architecture of our T-PMambaSR, as well as the framework of the (a) Window Scan Mamba Layer

Figure 3: The illustration of our (a) Window Interaction State Space Module (WISSM) with its two flatten mechanisms, Window

Figure 4: The architecture of our (a) Adaptive High-Frequency Refinement Module (AHFRM), (b) Multi-Scale Gating Module

Figure 5: Qualitative comparisons with existing methods in different scenes. Our method can restore clearer edges and structures.

Figure 6: Qualitative comparisons of our T-PMambaSR with existing methods on the real-world test set RealSRv3 [

Figure 7: Visual comparisons of different SR methods on multi-scale targets. The target in

Figure 8: Visualization of features from three stages. We show the outputs from our TL and WSML (from the final T-WSM

Figure 9: LAM [48] comparison between our model and existing methods [11], [16], [17] on the ×4 scale. Our method leads to the maximum pixel contribution areas, confirming its superior capacity for capturing broad contextual information.

Figure 10: Visualization of our AHFRM's high-frequency

Original abstract

Recently, Mamba-based super-resolution (SR) methods have demonstrated the ability to capture global receptive fields with linear complexity, addressing the quadratic computational cost of Transformer-based SR approaches. However, existing Mamba-based methods lack fine-grained transitions across different modeling scales, which limits the efficiency of feature representation. In this paper, we propose T-PMambaSR, a lightweight SR framework that integrates window-based self-attention with Progressive Mamba. By enabling interactions among receptive fields of different scales, our method establishes a fine-grained modeling paradigm that progressively enhances feature representation without introducing additional computational cost. Furthermore, we introduce an Adaptive High-Frequency Refinement Module (AHFRM) to recover high-frequency details lost during Transformer and Mamba processing. Extensive experiments demonstrate that T-PMambaSR progressively enhances the model's receptive field and expressiveness, yielding better performance than recent Transformer- or Mamba-based methods while incurring lower computational cost.
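The linear-complexity claim rests on the state-space recurrence: each token updates a fixed-size hidden state once, so one pass over a length-L sequence costs O(L·N) for state size N, instead of the O(L²) pairwise scores of full self-attention. The sketch below is a deliberately simplified single-channel, diagonal version with per-step (input-dependent) parameters supplied directly. It omits Mamba's discretization step, gating branch, convolution, and hardware-aware parallel scan, and the name `selective_scan` is used here only as a label.

```python
import numpy as np

def selective_scan(x: np.ndarray, a: np.ndarray, b: np.ndarray,
                   c: np.ndarray) -> np.ndarray:
    """Simplified diagonal state-space recurrence:
        h_t = a_t * h_{t-1} + b_t * x_t,    y_t = <c_t, h_t>
    x: (L,) inputs; a, b, c: (L, N) per-step parameters.
    A single loop over the sequence, so the cost is O(L * N)."""
    length, n_state = a.shape
    h = np.zeros(n_state)
    y = np.empty(length)
    for t in range(length):
        h = a[t] * h + b[t] * x[t]   # decay old state, write new input
        y[t] = c[t] @ h              # read out a scalar for this step
    return y

if __name__ == "__main__":
    x = np.array([1.0, 0.0, 0.0])
    half = np.full((3, 1), 0.5)
    ones = np.ones((3, 1))
    print(selective_scan(x, half, ones, ones))  # state decays by half each step
```

With a decay of 0.5, an impulse at the first step fades geometrically over later steps; making `a`, `b`, and `c` depend on the input is what lets a selective SSM decide how long to remember each token.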

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. The paper proposes T-PMambaSR, a lightweight image super-resolution framework integrating window-based self-attention with Progressive Mamba to enable fine-grained interactions among receptive fields of different scales. This is claimed to progressively enhance feature representation without additional computational cost. An Adaptive High-Frequency Refinement Module (AHFRM) is introduced to recover high-frequency details lost during processing. Extensive experiments are reported to demonstrate superior performance over recent Transformer- and Mamba-based SR methods while incurring lower computational cost.

Significance. If the efficiency and performance claims hold under rigorous verification, the work offers a meaningful advance in lightweight SR by establishing a cross-scale modeling paradigm that combines Transformer locality with Mamba's linear complexity. Credit is given for the explicit focus on zero-overhead scale interactions and the empirical comparisons on standard benchmarks, which provide falsifiable performance predictions.

major comments (1)
  1. [Abstract and §3.3] The central claim that Progressive Mamba integration with window self-attention enables scale interactions 'without introducing additional computational cost' is load-bearing for the lightweight positioning. The complexity analysis must explicitly derive or measure the overhead of progressive state updates and AHFRM adaptation steps; if these introduce hidden FLOPs or memory traffic not reflected in the reported tables, the attribution of gains to the fine-grained paradigm is weakened.
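A first-order version of the requested derivation can be sketched with the multiply-add estimates popularized by Swin Transformer: global MHSA over N = h·w tokens of width C costs about 4NC² + 2N²C, and attention restricted to M×M windows costs about 4NC² + 2M²NC. The scan term below assumes a fixed number of multiply-adds per channel, per state element, per token; that constant, the parameter values in the usage lines, and all function names are assumptions for illustration, and the estimate ignores projections, convolutions, and memory traffic inside a real Mamba block.

```python
def msa_flops(h: int, w: int, c: int) -> int:
    """Global self-attention: 4NC^2 for projections + 2N^2C for attention."""
    n = h * w
    return 4 * n * c * c + 2 * n * n * c

def window_msa_flops(h: int, w: int, c: int, m: int) -> int:
    """Self-attention restricted to non-overlapping m x m windows."""
    n = h * w
    return 4 * n * c * c + 2 * m * m * n * c

def scan_flops(h: int, w: int, c: int, d_state: int, per_elem: int = 3) -> int:
    """Recurrence cost of a diagonal SSM scan (per_elem is an assumed constant)."""
    return per_elem * h * w * c * d_state

if __name__ == "__main__":
    c, m, d_state = 64, 8, 16   # hypothetical widths, not the paper's settings
    for side in (32, 64, 128):
        print(side, msa_flops(side, side, c),
              window_msa_flops(side, side, c, m),
              scan_flops(side, side, c, d_state))
```

Doubling the number of pixels doubles the window-attention and scan terms but more than doubles the global-attention term, which is the asymmetry the lightweight argument relies on; whether the progressive state updates and AHFRM stay within the reported budget still has to be measured.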
minor comments (3)
  1. [§4.1] Ensure all dataset splits and training protocols (e.g., patch sizes, augmentation) are stated in sufficient detail for reproducibility, including any differences from prior Mamba-SR baselines.
  2. [Figure 2] The network diagram would benefit from explicit annotation of the data flow between the window-attention and Progressive Mamba blocks to clarify the claimed scale interactions.
  3. [Table 1] If single-run results are reported, add standard deviations or multiple-run statistics to the PSNR/SSIM entries to strengthen the claims of superior performance.
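Minor comment 3 asks for multiple-run statistics. A minimal sketch of how such an entry could be produced from per-seed scores, using only the standard library; the function names are hypothetical, and the sample (n − 1) standard deviation is one reasonable choice among several.

```python
import statistics

def summarize_runs(values):
    """Mean and sample standard deviation of a metric over repeated runs."""
    if not values:
        raise ValueError("need at least one run")
    mean = statistics.mean(values)
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, std

def format_runs(values, digits: int = 2) -> str:
    """Render a table cell such as '32.10 ± 0.03' from per-seed scores."""
    mean, std = summarize_runs(values)
    return f"{mean:.{digits}f} ± {std:.{digits}f}"
```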

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment point by point below and will revise the paper to strengthen the complexity analysis as requested.

  1. Referee: [Abstract and §3.3] The central claim that Progressive Mamba integration with window self-attention enables scale interactions 'without introducing additional computational cost' is load-bearing for the lightweight positioning. The complexity analysis must explicitly derive or measure the overhead of progressive state updates and AHFRM adaptation steps; if these introduce hidden FLOPs or memory traffic not reflected in the reported tables, the attribution of gains to the fine-grained paradigm is weakened.

    Authors: We agree that an explicit derivation and measurement of overhead is necessary to rigorously support the lightweight claims. In the revised manuscript, we will expand the complexity analysis in §3.3 with detailed equations deriving the FLOPs for Progressive Mamba state updates (showing that cross-scale interactions reuse the same linear-complexity SSM transitions without extra matrix operations) and for AHFRM (which employs parameter-efficient adaptive filtering with O(1) overhead relative to the backbone). We will also add empirical measurements of runtime and peak memory on standard benchmarks to rule out hidden costs from memory traffic. This revision will clarify that the reported gains are attributable to the fine-grained paradigm while preserving the overall complexity profile. revision: yes

Circularity Check

0 steps flagged

No significant circularity; new architecture integration with empirical validation

full rationale

The paper proposes T-PMambaSR as an integration of window-based self-attention and Progressive Mamba plus a new AHFRM module to enable cross-scale receptive field interactions and high-frequency recovery. No equations or sections reduce the claimed fine-grained paradigm or zero-cost property to a self-definition, fitted parameter, or self-citation chain; the central claims rest on the architectural design choices and reported experimental comparisons rather than any input being renamed or forced as output. The derivation chain is therefore self-contained as a standard model-construction-plus-validation process.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 2 invented entities

The central claim rests on background assumptions about Transformer and Mamba computational properties drawn from prior literature, plus newly introduced architectural components whose benefits are asserted via experiments.

free parameters (1)
  • network design hyperparameters
    Choices such as number of blocks, channel dimensions, and window sizes are required to implement the hybrid architecture and balance the claimed efficiency gains.
axioms (1)
  • domain assumption: Mamba-based methods can capture global receptive fields with linear complexity, while Transformers incur quadratic cost
    Invoked in the abstract as the motivation for using Mamba to address limitations of Transformer-based SR approaches.
invented entities (2)
  • Progressive Mamba (no independent evidence)
    purpose: To enable fine-grained transitions and interactions across different modeling scales
    New component introduced to address the stated limitation of existing Mamba methods.
  • Adaptive High-Frequency Refinement Module (AHFRM) (no independent evidence)
    purpose: To recover high-frequency details lost during Transformer and Mamba processing
    New module proposed as an addition to the framework.

pith-pipeline@v0.9.0 · 5698 in / 1647 out tokens · 60916 ms · 2026-05-18T01:38:01.176116+00:00 · methodology


Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · 2 internal anchors

  1. [1]

    A systematic survey of deep learning-based single-image super-resolution,

    J. Li, Z. Pei, W. Li, G. Gao, L. Wang, Y. Wang, and T. Zeng, “A systematic survey of deep learning-based single-image super-resolution,” ACM Computing Surveys, vol. 56, no. 10, pp. 1–40, 2024. 1

  2. [2]

    Towards realistic data generation for real-world super-resolution,

    L. Peng, W. Li, R. Pei, J. Ren, J. Xu, Y. Wang, Y. Cao, and Z.-J. Zha, “Towards realistic data generation for real-world super-resolution,” in ICLR, 2024. 1

  3. [3]

    PMQ-VE: Progressive multi-frame quantization for video enhancement,

    Z. Feng, L. Peng, X. Di, Y. Guo, W. Li, Y. Zhang, R. Pei, Y. Wang, Y. Cao, and Z.-J. Zha, “PMQ-VE: Progressive multi-frame quantization for video enhancement,” arXiv preprint arXiv:2505.12266, 2025. 1

  4. [4]

    Survey on deep face restoration: From non-blind to blind and beyond,

    W. Li, M. Wang, K. Zhang, J. Li, X. Li, Y. Zhang, G. Gao, W. Deng, and C.-W. Lin, “Survey on deep face restoration: From non-blind to blind and beyond,” arXiv preprint arXiv:2309.15490, 2023. 1

  5. [5]

    Self-supervised selective-guided diffusion model for old-photo face restoration,

    W. Li, X. Wang, H. Guo, G. Gao, and Z. Ma, “Self-supervised selective-guided diffusion model for old-photo face restoration,” in NeurIPS, 2025. 1

  6. [6]

    Lightweight image super-resolution with information multi-distillation network,

    Z. Hui, X. Gao, Y. Yang, and X. Wang, “Lightweight image super-resolution with information multi-distillation network,” in ACM MM, 2019, pp. 2024–2032. 1, 2

  7. [7]

    Feature distillation interaction weighting network for lightweight image super-resolution,

    G. Gao, W. Li, J. Li, F. Wu, H. Lu, and Y. Yu, “Feature distillation interaction weighting network for lightweight image super-resolution,” in AAAI, vol. 36, no. 1, 2022, pp. 661–669. 1, 2

  8. [8]

    An image is worth 16x16 words: Transformers for image recognition at scale,

    A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” in ICLR, 2021. 1

  9. [9]

    SwinIR: Image restoration using Swin Transformer,

    J. Liang, J. Cao, G. Sun, K. Zhang, L. Van Gool, and R. Timofte, “SwinIR: Image restoration using Swin Transformer,” in ICCVW, 2021, pp. 1833–1844. 1, 2, 6, 7, 8

  10. [10]

    Efficient long-range attention network for image super-resolution,

    X. Zhang, H. Zeng, S. Guo, and L. Zhang, “Efficient long-range attention network for image super-resolution,” in ECCV. Springer, 2022, pp. 649–667. 1, 6, 7

  11. [11]

    SRFormer: Permuted self-attention for single image super-resolution,

    Y. Zhou, Z. Li, C.-L. Guo, S. Bai, M.-M. Cheng, and Q. Hou, “SRFormer: Permuted self-attention for single image super-resolution,” in ICCV, 2023, pp. 12780–12791. 1, 2, 6, 7, 8, 9, 10

  12. [12]

    HiT-SR: Hierarchical transformer for efficient image super-resolution,

    X. Zhang, Y. Zhang, and F. Yu, “HiT-SR: Hierarchical transformer for efficient image super-resolution,” in ECCV. Springer, 2024, pp. 483–500. 1, 2, 6, 7, 8, 9, 10

  13. [13]

    Dual-domain modulation network for lightweight image super-resolution,

    W. Li, H. Guo, Y. Hou, G. Gao, and Z. Ma, “Dual-domain modulation network for lightweight image super-resolution,” IEEE Trans. Multimedia,

  14. [14]

    On single image scale-up using sparse-representations,

    R. Zeyde, M. Elad, and M. Protter, “On single image scale-up using sparse-representations,” in ICCS, 2010, pp. 711–730. 1, 6

  15. [15]

    Single image super-resolution from transformed self-exemplars,

    J.-B. Huang, A. Singh, and N. Ahuja, “Single image super-resolution from transformed self-exemplars,” in CVPR, 2015, pp. 5197–5206. 1, 6, 7

  16. [16]

    MambaIR: A simple baseline for image restoration with state-space model,

    H. Guo, J. Li, T. Dai, Z. Ouyang, X. Ren, and S.-T. Xia, “MambaIR: A simple baseline for image restoration with state-space model,” in ECCV. Springer, 2024, pp. 222–241. 1, 2, 4, 6, 7, 8, 9, 10

  17. [17]

    MambaIRv2: Attentive state space restoration,

    H. Guo, Y. Guo, Y. Zha, Y. Zhang, W. Li, T. Dai, S.-T. Xia, and Y. Li, “MambaIRv2: Attentive state space restoration,” in CVPR, 2025, pp. 28124–28133. 1, 2, 6, 7, 8, 9, 10

  18. [18]

    Mamba: Linear-Time Sequence Modeling with Selective State Spaces

    A. Gu and T. Dao, “Mamba: Linear-time sequence modeling with selective state spaces,” arXiv preprint arXiv:2312.00752, 2024. 1, 3

  19. [19]

    Efficient image super-resolution with feature interaction weighted hybrid network,

    W. Li, J. Li, G. Gao, W. Deng, J. Yang, G.-J. Qi, and C.-W. Lin, “Efficient image super-resolution with feature interaction weighted hybrid network,” IEEE Trans. Multimedia, vol. 27, pp. 2256–2267, 2025. 1, 2

  20. [20]

    Feature modulation transformer: Cross-refinement of global representation via high-frequency prior for image super-resolution,

    A. Li, L. Zhang, Y. Liu, and C. Zhu, “Feature modulation transformer: Cross-refinement of global representation via high-frequency prior for image super-resolution,” in ICCV, 2023, pp. 12514–12524. 2

  21. [21]

    FADPNet: Frequency-Aware Dual-Path Network for Face Super-Resolution

    S. Xu, W. Li, G. Gao, J. Yang, G.-J. Qi, and C.-W. Lin, “FADPNet: Frequency-aware dual-path network for face super-resolution,” arXiv preprint arXiv:2506.14121, 2025. 2

  22. [22]

    Soft-edge assisted network for single image super-resolution,

    F. Fang, J. Li, and T. Zeng, “Soft-edge assisted network for single image super-resolution,” IEEE Trans. Image Process., vol. 29, pp. 4656–4668, 2020. 2

  23. [23]

    Transforming image super-resolution: a ConvFormer-based efficient approach,

    G. Wu, J. Jiang, J. Jiang, and X. Liu, “Transforming image super-resolution: a ConvFormer-based efficient approach,” IEEE Trans. Image Process., vol. 33, pp. 6071–6082, 2024. 2

  24. [24]

    Accelerating the super-resolution convolutional neural network,

    C. Dong, C. C. Loy, and X. Tang, “Accelerating the super-resolution convolutional neural network,” in ECCV. Springer, 2016, pp. 391–407. 2

  25. [25]

    Cross-receptive focused inference network for lightweight image super-resolution,

    W. Li, J. Li, G. Gao, W. Deng, J. Zhou, J. Yang, and G.-J. Qi, “Cross-receptive focused inference network for lightweight image super-resolution,” IEEE Trans. Multimedia, vol. 26, pp. 864–877, 2023. 2

  26. [26]

    Lightweight bimodal network for single-image super-resolution via symmetric CNN and recursive transformer,

    G. Gao, Z. Wang, J. Li, W. Li, Y. Yu, and T. Zeng, “Lightweight bimodal network for single-image super-resolution via symmetric CNN and recursive transformer,” in IJCAI, 2022, pp. 913–919. 2

  27. [27]

    Frequency-assisted Mamba for remote sensing image super-resolution,

    Y. Xiao, Q. Yuan, K. Jiang, Y. Chen, Q. Zhang, and C.-W. Lin, “Frequency-assisted Mamba for remote sensing image super-resolution,” IEEE Trans. Multimedia, vol. 27, pp. 1783–1796, 2025. 2

  28. [28]

    Transformer for single image super-resolution,

    Z. Lu, J. Li, H. Liu, C. Huang, L. Zhang, and T. Zeng, “Transformer for single image super-resolution,” in CVPRW, 2022, pp. 457–466. 2, 5

  29. [29]

    Efficient face super-resolution via wavelet-based feature enhancement network,

    W. Li, H. Guo, X. Liu, K. Liang, J. Hu, Z. Ma, and J. Guo, “Efficient face super-resolution via wavelet-based feature enhancement network,” in ACM MM, 2024, pp. 4515–4523. 2

  30. [30]

    Adaptive frequency filters as efficient global token mixers,

    Z. Huang, Z. Zhang, C. Lan, Z.-J. Zha, Y. Lu, and B. Guo, “Adaptive frequency filters as efficient global token mixers,” in ICCV, 2023, pp. 6049–6059. 2

  31. [31]

    FourierSR: A Fourier token-based plugin for efficient image super-resolution,

    W. Li, H. Guo, Y. Hou, and Z. Ma, “FourierSR: A Fourier token-based plugin for efficient image super-resolution,” arXiv preprint arXiv:2503.10043,

  32. [32]

    FDSR: An interpretable frequency division stepwise process based single-image super-resolution network,

    P. Xu, Q. Liu, H. Bao, R. Zhang, L. Gu, and G. Wang, “FDSR: An interpretable frequency division stepwise process based single-image super-resolution network,” IEEE Trans. Image Process., vol. 33, pp. 1710–1725, 2024. 2

  33. [33]

    Exploring the potential of pooling techniques for universal image restoration,

    Y. Cui, W. Ren, and A. Knoll, “Exploring the potential of pooling techniques for universal image restoration,” IEEE Trans. Image Process., vol. 34, pp. 3403–3416, 2025. 2

  34. [34]

    CAN: Cascade augmentations against noise for image restoration,

    Y. Yan, S. Yao, W. Ren, R. Zhang, Q. Guo, and X. Cao, “CAN: Cascade augmentations against noise for image restoration,” IEEE Trans. Image Process., vol. 34, pp. 5131–5146, 2025. 2

  35. [35]

    MambaLLIE: Implicit retinex-aware low light enhancement with global-then-local state space,

    J. Weng, Z. Yan, Y. Tai, J. Qian, J. Yang, and J. Li, “MambaLLIE: Implicit retinex-aware low light enhancement with global-then-local state space,” in NeurIPS, 2024, pp. 27440–27462. 4, 8

  36. [36]

    Wave-Mamba: Wavelet state space model for ultra-high-definition low-light image enhancement,

    W. Zou, H. Gao, W. Yang, and T. Liu, “Wave-Mamba: Wavelet state space model for ultra-high-definition low-light image enhancement,” in ACM MM, 2024, pp. 1534–1543. 4

  37. [37]

    Low-complexity single-image super-resolution based on nonnegative neighbor embedding,

    M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel, “Low-complexity single-image super-resolution based on nonnegative neighbor embedding,” in BMVC, 2012, pp. 135.1–135.10. 6

  38. [38]

    A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,

    D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in ICCV, 2001, pp. 416–

  39. [39]

    Sketch-based manga retrieval using Manga109 dataset,

    Y. Matsui, K. Ito, Y. Aramaki, A. Fujimoto, T. Ogawa, T. Yamasaki, and K. Aizawa, “Sketch-based manga retrieval using Manga109 dataset,” Multimed. Tools Appl., vol. 76, pp. 21811–21838, 2017. 6

  40. [40]

    Omni aggregation networks for lightweight image super-resolution,

    H. Wang, X. Chen, B. Ni, Y. Liu, and J. Liu, “Omni aggregation networks for lightweight image super-resolution,” in CVPR, 2023, pp. 22378–22387. 6, 7

  41. [41]

    Emulating self-attention with convolution for efficient image super-resolution,

    D. Lee, S. Yun, and Y. Ro, “Emulating self-attention with convolution for efficient image super-resolution,” in ICCV, 2025, pp. 24467–24477. 6, 7, 9, 10

  42. [42]

    A collaborative network of Mamba and CNN for lightweight image super-resolution,

    X. Wang, J. Li, J. Li, S. Wang, L. Yan, and Y. Xu, “A collaborative network of Mamba and CNN for lightweight image super-resolution,” IEEE Trans. Consum. Electron., vol. 71, no. 2, pp. 3591–3604, 2025. 6, 7

  43. [43]

    MaIR: A locality- and continuity-preserving Mamba for image restoration,

    B. Li, H. Zhao, W. Wang, P. Hu, Y. Gou, and X. Peng, “MaIR: A locality- and continuity-preserving Mamba for image restoration,” in CVPR, 2025, pp. 7491–7501. 6, 7, 8

  44. [44]

    Toward real-world single image super-resolution: A new benchmark and a new model,

    J. Cai, H. Zeng, H. Yong, Z. Cao, and L. Zhang, “Toward real-world single image super-resolution: A new benchmark and a new model,” in CVPR, 2019, pp. 3086–3095. 6, 7, 8

  45. [45]

    NTIRE 2017 challenge on single image super-resolution: Methods and results,

    R. Timofte, E. Agustsson, L. Van Gool, M.-H. Yang, and L. Zhang, “NTIRE 2017 challenge on single image super-resolution: Methods and results,” in CVPRW, 2017, pp. 126–135. 6

  46. [46]

    Image quality assessment: from error visibility to structural similarity,

    Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004. 6

  47. [47]

    VMamba: Visual state space model,

    Y. Liu, Y. Tian, Y. Zhao, H. Yu, L. Xie, Y. Wang, Q. Ye, J. Jiao, and Y. Liu, “VMamba: Visual state space model,” in NeurIPS, vol. 37, 2024, pp. 103031–103063. 8

  48. [48]

    Interpreting super-resolution networks with local attribution maps,

    J. Gu and C. Dong, “Interpreting super-resolution networks with local attribution maps,” in CVPR, 2021, pp. 9199–9208. 9, 10