Transformer-Progressive Mamba Network for Lightweight Image Super-Resolution

Chia-Wen Lin; Guangwei Gao; Jian Yang; Sichen Guo; Wenjie Li; Yuanyang Liu

arxiv: 2511.03232 · v2 · submitted 2025-11-05 · 💻 cs.CV

Sichen Guo, Wenjie Li, Yuanyang Liu, Guangwei Gao, Jian Yang, Chia-Wen Lin

Pith reviewed 2026-05-18 01:38 UTC · model grok-4.3

classification 💻 cs.CV

keywords image super-resolution · lightweight network · Mamba · Transformer · progressive modeling · high-frequency refinement · receptive field

The pith

Integrating window self-attention with progressive Mamba enables scale interactions that improve lightweight image super-resolution without added cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes T-PMambaSR as a lightweight framework for image super-resolution. Existing Mamba methods miss fine-grained transitions across modeling scales, which limits how effectively they build feature representations. The new approach merges window-based self-attention with Progressive Mamba to create interactions among receptive fields of varying sizes. These interactions occur progressively and incur no extra computational cost. An Adaptive High-Frequency Refinement Module then restores details lost in the main processing steps. Experiments indicate the resulting model outperforms recent Transformer- and Mamba-based alternatives while using fewer resources.

Core claim

By integrating window-based self-attention with Progressive Mamba, the method establishes a fine-grained modeling paradigm that progressively enhances feature representation through interactions among receptive fields of different scales without introducing additional computational cost. The Adaptive High-Frequency Refinement Module recovers high-frequency details lost during Transformer and Mamba processing. This yields better performance than recent Transformer- or Mamba-based methods while incurring lower computational cost.

What carries the argument

Progressive Mamba, which creates progressive interactions among receptive fields at different scales to enhance feature representation without added cost.
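
One way to picture the step from window-local to global modeling is the order in which a 2D feature map is flattened before a sequential scan. The sketch below is illustrative only and is not the authors' implementation: the function names `raster_order`, `window_order`, and `sequence_gap`, the 4×4 grid, and the 2×2 window are all assumptions chosen for the example.

```python
def raster_order(height, width):
    """Row-major flattening: one sequence over the whole feature map."""
    return [(r, c) for r in range(height) for c in range(width)]


def window_order(height, width, win):
    """Window-wise flattening: visit every pixel of one win x win window
    before moving on to the next window (hypothetical layout)."""
    if height % win or width % win:
        raise ValueError("height and width must be multiples of win")
    order = []
    for top in range(0, height, win):
        for left in range(0, width, win):
            for r in range(top, top + win):
                for c in range(left, left + win):
                    order.append((r, c))
    return order


def sequence_gap(order, pixel_a, pixel_b):
    """Distance between two pixels measured along the flattened sequence."""
    return abs(order.index(pixel_a) - order.index(pixel_b))


if __name__ == "__main__":
    h = w = 4
    print(window_order(h, w, 2)[:4])
    print(sequence_gap(raster_order(h, w), (0, 0), (1, 0)))
    print(sequence_gap(window_order(h, w, 2), (0, 0), (1, 0)))
```

In this toy setting the vertical neighbors (0, 0) and (1, 0) sit 4 positions apart in raster order but only 2 positions apart in window order, which is one intuition for why a window-level scan may preserve local structure before a global scan widens the context.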

If this is right

Receptive fields expand progressively across network layers through scale interactions.
Feature expressiveness grows without raising overall computational cost.
High-frequency image details are restored more effectively after main processing.
The network achieves higher super-resolution quality than recent Transformer or Mamba baselines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The scale-interaction pattern could transfer to other efficient vision tasks such as denoising or deblurring.
Real-time super-resolution on mobile hardware might become practical with this efficiency gain.
Further tests on video sequences could show whether temporal consistency benefits from the same progressive mechanism.

Load-bearing premise

That existing Mamba-based methods lack fine-grained scale transitions, and that window self-attention combined with Progressive Mamba and high-frequency refinement can deliver better results at no extra computational cost.

What would settle it

If T-PMambaSR, evaluated at equal or lower FLOPs, fails to exceed the PSNR or SSIM of recent Mamba-based SR networks on standard benchmarks such as Set5 or DIV2K, the central performance claim would be disproven.
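
For readers who want to run such a check, PSNR can be computed with a few lines of standard-library Python. This is a minimal sketch under stated assumptions: images are plain 2D lists of 8-bit intensities, and the function name `psnr` and the `max_value` default are illustrative. SR evaluations often add steps such as luminance-only comparison and border cropping, which are omitted here, so numbers from this sketch are not directly comparable to published tables.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D lists."""
    if len(reference) != len(estimate):
        raise ValueError("images must have the same number of rows")
    squared_error = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        if len(ref_row) != len(est_row):
            raise ValueError("rows must have the same length")
        for ref_px, est_px in zip(ref_row, est_row):
            squared_error += (ref_px - est_px) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ground_truth = [[10, 20], [30, 40]]
    restored = [[11, 21], [31, 41]]  # every pixel off by one
    print(round(psnr(ground_truth, restored), 2))
```

A higher value means the restored image is closer to the ground truth; identical images give an infinite PSNR.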

Figures

Figures reproduced from arXiv: 2511.03232 by Chia-Wen Lin, Guangwei Gao, Jian Yang, Sichen Guo, Wenjie Li, Yuanyang Liu.

**Figure 1.** (Top) Our design rationale is based on progressively exploiting internal interactions within window multi-head self-attention (MHSA), combined with a Window Scan Mamba (WSM) and a Global Scan Mamba (GSM). This hierarchical structure facilitates the gradual expansion of receptive fields, ensuring comprehensive information exchange both within and across windows. (Bottom) Leveraging our design, our method st…

**Figure 2.** The network architecture of our T-PMambaSR, as well as the framework of the (a) Window Scan Mamba Layer …

**Figure 3.** The illustration of our (a) Window Interaction State Space Module (WISSM) with its two flatten mechanisms, Window …

**Figure 4.** The architecture of our (a) Adaptive High-Frequency Refinement Module (AHFRM), (b) Multi-Scale Gating Module …

**Figure 5.** Qualitative comparisons with existing methods in different scenes. Our method can restore clearer edges and structures.

**Figure 6.** Qualitative comparisons of our T-PMambaSR with existing methods on the real-world test set RealSRv3 [ …

**Figure 7.** Visual comparisons of different SR methods on multi-scale targets. The target in …

**Figure 8.** Visualization of features from three stages. We show the outputs from our TL and WSML (from the final T-WSM …

**Figure 9.** LAM [48] comparison between our model and existing methods [11], [16], [17] on the ×4 scale. Our method leads to the maximum pixel contribution areas, confirming its superior capacity for capturing broad contextual information.

**Figure 10.** Visualization of our AHFRM’s high-frequency …

Original abstract

Recently, Mamba-based super-resolution (SR) methods have demonstrated the ability to capture global receptive fields with linear complexity, addressing the quadratic computational cost of Transformer-based SR approaches. However, existing Mamba-based methods lack fine-grained transitions across different modeling scales, which limits the efficiency of feature representation. In this paper, we propose T-PMambaSR, a lightweight SR framework that integrates window-based self-attention with Progressive Mamba. By enabling interactions among receptive fields of different scales, our method establishes a fine-grained modeling paradigm that progressively enhances feature representation without introducing additional computational cost. Furthermore, we introduce an Adaptive High-Frequency Refinement Module (AHFRM) to recover high-frequency details lost during Transformer and Mamba processing. Extensive experiments demonstrate that T-PMambaSR progressively enhances the model's receptive field and expressiveness, yielding better performance than recent Transformer- or Mamba-based methods while incurring lower computational cost.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The hybrid Progressive Mamba plus window attention gives a practical efficiency edge in lightweight SR experiments, but the zero-added-cost claim for scale interactions is the part that needs the closest check in the tables.

The paper's main move is a hybrid network that layers window-based self-attention with a Progressive Mamba block to create cross-scale receptive field interactions, then adds an Adaptive High-Frequency Refinement Module to pull back details that get smoothed out. This is positioned as fixing the lack of fine-grained scale transitions in earlier Mamba SR work while keeping the linear complexity benefit of state-space models.

Referee Report

1 major / 3 minor

Summary. The paper proposes T-PMambaSR, a lightweight image super-resolution framework integrating window-based self-attention with Progressive Mamba to enable fine-grained interactions among receptive fields of different scales. This is claimed to progressively enhance feature representation without additional computational cost. An Adaptive High-Frequency Refinement Module (AHFRM) is introduced to recover high-frequency details lost during processing. Extensive experiments are reported to demonstrate superior performance over recent Transformer- and Mamba-based SR methods while incurring lower computational cost.

Significance. If the efficiency and performance claims hold under rigorous verification, the work offers a meaningful advance in lightweight SR by establishing a cross-scale modeling paradigm that combines Transformer locality with Mamba's linear complexity. Credit is given for the explicit focus on zero-overhead scale interactions and the empirical comparisons on standard benchmarks, which provide falsifiable performance predictions.

major comments (1)

[Abstract and §3.3] The central claim that Progressive Mamba integration with window self-attention enables scale interactions 'without introducing additional computational cost' is load-bearing for the lightweight positioning. The complexity analysis must explicitly derive or measure the overhead of progressive state updates and AHFRM adaptation steps; if these introduce hidden FLOPs or memory traffic not reflected in the reported tables, the attribution of gains to the fine-grained paradigm is weakened.

minor comments (3)

[§4.1] Ensure all dataset splits and training protocols (e.g., patch sizes, augmentation) are stated with sufficient detail for reproducibility, including any differences from prior Mamba-SR baselines.
[Figure 2] The network diagram would benefit from explicit annotation of the data flow between window-attention and Progressive Mamba blocks to clarify the claimed scale interactions.
[Table 1] Add standard deviation or multiple-run statistics to PSNR/SSIM entries if single-run results are reported, to strengthen the performance superiority claims.
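
The multiple-run statistics requested for Table 1 could be reported along the lines of the sketch below. It is a hedged illustration only: the helper names `summarize_runs` and `format_entry` are assumptions, and the PSNR values in the usage section are made-up placeholders, not results from the paper.

```python
import statistics


def summarize_runs(scores):
    """Return (mean, sample standard deviation) over independent training runs."""
    if len(scores) < 2:
        raise ValueError("at least two runs are needed for a standard deviation")
    return statistics.mean(scores), statistics.stdev(scores)


def format_entry(scores, digits=2):
    """Format a table cell as 'mean ± std' with a fixed number of decimals."""
    mean, std = summarize_runs(scores)
    return f"{mean:.{digits}f} ± {std:.{digits}f}"


if __name__ == "__main__":
    # Hypothetical PSNR values (dB) from three seeds; not taken from the paper.
    example_psnr = [32.0, 32.2, 32.4]
    print(format_entry(example_psnr))
```

Reporting the spread alongside the mean lets a reader judge whether a gap between two methods is larger than the run-to-run variation.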

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment point by point below and will revise the paper to strengthen the complexity analysis as requested.

Referee: [Abstract and §3.3] The central claim that Progressive Mamba integration with window self-attention enables scale interactions 'without introducing additional computational cost' is load-bearing for the lightweight positioning. The complexity analysis must explicitly derive or measure the overhead of progressive state updates and AHFRM adaptation steps; if these introduce hidden FLOPs or memory traffic not reflected in the reported tables, the attribution of gains to the fine-grained paradigm is weakened.

Authors: We agree that an explicit derivation and measurement of overhead is necessary to rigorously support the lightweight claims. In the revised manuscript, we will expand the complexity analysis in §3.3 with detailed equations deriving the FLOPs for Progressive Mamba state updates (showing that cross-scale interactions reuse the same linear-complexity SSM transitions without extra matrix operations) and for AHFRM (which employs parameter-efficient adaptive filtering with O(1) overhead relative to the backbone). We will also add empirical measurements of runtime and peak memory on standard benchmarks to rule out hidden costs from memory traffic. This revision will clarify that the reported gains are attributable to the fine-grained paradigm while preserving the overall complexity profile.

revision: yes

Circularity Check

0 steps flagged

No significant circularity; new architecture integration with empirical validation

The paper proposes T-PMambaSR as an integration of window-based self-attention and Progressive Mamba plus a new AHFRM module to enable cross-scale receptive field interactions and high-frequency recovery. No equations or sections reduce the claimed fine-grained paradigm or zero-cost property to a self-definition, fitted parameter, or self-citation chain; the central claims rest on the architectural design choices and reported experimental comparisons rather than any input being renamed or forced as output. The derivation chain is therefore self-contained as a standard model-construction-plus-validation process.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

The central claim rests on background assumptions about Transformer and Mamba computational properties drawn from prior literature, plus newly introduced architectural components whose benefits are asserted via experiments.

free parameters (1)

network design hyperparameters
Choices such as number of blocks, channel dimensions, and window sizes are required to implement the hybrid architecture and balance the claimed efficiency gains.

axioms (1)

domain assumption · Mamba-based methods can capture global receptive fields with linear complexity, while Transformers incur quadratic cost
Invoked in the abstract as the motivation for using Mamba to address limitations of Transformer-based SR approaches.
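
The scaling behavior behind this assumption can be made concrete with a rough operation count. The sketch below is a simplified illustration, not the paper's complexity analysis: it counts only the token-mixing multiply-accumulates, ignores projections, softmax, and constant factors, and the function names and the `state_dim` parameter are assumptions introduced for the example.

```python
def global_attention_macs(num_tokens, channels):
    """QK^T plus attention-times-V over all tokens: grows with num_tokens squared."""
    return 2 * num_tokens * num_tokens * channels


def window_attention_macs(num_tokens, channels, window_tokens):
    """Same attention restricted to non-overlapping windows of window_tokens tokens."""
    if num_tokens % window_tokens:
        raise ValueError("num_tokens must be a multiple of window_tokens")
    num_windows = num_tokens // window_tokens
    return num_windows * 2 * window_tokens * window_tokens * channels


def linear_scan_macs(num_tokens, channels, state_dim):
    """State-space style scan: a fixed amount of work per token (assumed model)."""
    return num_tokens * channels * state_dim


if __name__ == "__main__":
    channels, window_tokens, state_dim = 64, 64, 16
    for num_tokens in (1024, 2048, 4096):
        print(
            num_tokens,
            global_attention_macs(num_tokens, channels),
            window_attention_macs(num_tokens, channels, window_tokens),
            linear_scan_macs(num_tokens, channels, state_dim),
        )
```

Under these simplifications, doubling the number of tokens quadruples the global-attention count while the window-attention and scan counts only double, which is the contrast the axiom relies on.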

invented entities (2)

Progressive Mamba · no independent evidence
purpose: To enable fine-grained transitions and interactions across different modeling scales
New component introduced to address the stated limitation of existing Mamba methods.
Adaptive High-Frequency Refinement Module (AHFRM) · no independent evidence
purpose: To recover high-frequency details lost during Transformer and Mamba processing
New module proposed as an addition to the framework.

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · 2 internal anchors

[1] J. Li, Z. Pei, W. Li, G. Gao, L. Wang, Y. Wang, and T. Zeng, “A systematic survey of deep learning-based single-image super-resolution,” ACM Computing Surveys, vol. 56, no. 10, pp. 1–40, 2024.
[2] L. Peng, W. Li, R. Pei, J. Ren, J. Xu, Y. Wang, Y. Cao, and Z.-J. Zha, “Towards realistic data generation for real-world super-resolution,” in ICLR, 2024.
[3] Z. Feng, L. Peng, X. Di, Y. Guo, W. Li, Y. Zhang, R. Pei, Y. Wang, Y. Cao, and Z.-J. Zha, “Pmq-ve: Progressive multi-frame quantization for video enhancement,” arXiv preprint arXiv:2505.12266, 2025.
[4] W. Li, M. Wang, K. Zhang, J. Li, X. Li, Y. Zhang, G. Gao, W. Deng, and C.-W. Lin, “Survey on deep face restoration: From non-blind to blind and beyond,” arXiv preprint arXiv:2309.15490, 2023.
[5] W. Li, X. Wang, H. Guo, G. Gao, and Z. Ma, “Self-supervised selective-guided diffusion model for old-photo face restoration,” in NeurIPS, 2025.
[6] Z. Hui, X. Gao, Y. Yang, and X. Wang, “Lightweight image super-resolution with information multi-distillation network,” in ACM MM, 2019, pp. 2024–2032.
[7] G. Gao, W. Li, J. Li, F. Wu, H. Lu, and Y. Yu, “Feature distillation interaction weighting network for lightweight image super-resolution,” in AAAI, vol. 36, no. 1, 2022, pp. 661–669.
[8] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” in ICLR, 2021.
[9] J. Liang, J. Cao, G. Sun, K. Zhang, L. Van Gool, and R. Timofte, “Swinir: Image restoration using swin transformer,” in ICCVW, 2021, pp. 1833–1844.
[10] X. Zhang, H. Zeng, S. Guo, and L. Zhang, “Efficient long-range attention network for image super-resolution,” in ECCV. Springer, 2022, pp. 649–667.
[11] Y. Zhou, Z. Li, C.-L. Guo, S. Bai, M.-M. Cheng, and Q. Hou, “Srformer: Permuted self-attention for single image super-resolution,” in ICCV, 2023, pp. 12780–12791.
[12] X. Zhang, Y. Zhang, and F. Yu, “Hit-sr: Hierarchical transformer for efficient image super-resolution,” in ECCV. Springer, 2024, pp. 483–500.
[13] W. Li, H. Guo, Y. Hou, G. Gao, and Z. Ma, “Dual-domain modulation network for lightweight image super-resolution,” IEEE Trans. Multimedia, …
[14] R. Zeyde, M. Elad, and M. Protter, “On single image scale-up using sparse-representations,” in ICCS, 2010, pp. 711–730.
[15] J.-B. Huang, A. Singh, and N. Ahuja, “Single image super-resolution from transformed self-exemplars,” in CVPR, 2015, pp. 5197–5206.
[16] H. Guo, J. Li, T. Dai, Z. Ouyang, X. Ren, and S.-T. Xia, “Mambair: A simple baseline for image restoration with state-space model,” in ECCV. Springer, 2024, pp. 222–241.
[17] H. Guo, Y. Guo, Y. Zha, Y. Zhang, W. Li, T. Dai, S.-T. Xia, and Y. Li, “Mambairv2: Attentive state space restoration,” in CVPR, 2025, pp. 28124–28133.
[18] A. Gu and T. Dao, “Mamba: Linear-time sequence modeling with selective state spaces,” arXiv preprint arXiv:2312.00752, 2024.
[19] W. Li, J. Li, G. Gao, W. Deng, J. Yang, G.-J. Qi, and C.-W. Lin, “Efficient image super-resolution with feature interaction weighted hybrid network,” IEEE Trans. Multimedia, vol. 27, pp. 2256–2267, 2025.
[20] A. Li, L. Zhang, Y. Liu, and C. Zhu, “Feature modulation transformer: Cross-refinement of global representation via high-frequency prior for image super-resolution,” in ICCV, 2023, pp. 12514–12524.
[21] S. Xu, W. Li, G. Gao, J. Yang, G.-J. Qi, and C.-W. Lin, “Fadpnet: Frequency-aware dual-path network for face super-resolution,” arXiv preprint arXiv:2506.14121, 2025.
[22] F. Fang, J. Li, and T. Zeng, “Soft-edge assisted network for single image super-resolution,” IEEE Trans. Image Process., vol. 29, pp. 4656–4668, 2020.
[24] G. Wu, J. Jiang, J. Jiang, and X. Liu, “Transforming image super-resolution: a convformer-based efficient approach,” IEEE Trans. Image Process., vol. 33, pp. 6071–6082, 2024.
[25] C. Dong, C. C. Loy, and X. Tang, “Accelerating the super-resolution convolutional neural network,” in ECCV. Springer, 2016, pp. 391–407.
[26] W. Li, J. Li, G. Gao, W. Deng, J. Zhou, J. Yang, and G.-J. Qi, “Cross-receptive focused inference network for lightweight image super-resolution,” IEEE Trans. Multimedia, vol. 26, pp. 864–877, 2023.
[27] G. Gao, Z. Wang, J. Li, W. Li, Y. Yu, and T. Zeng, “Lightweight bimodal network for single-image super-resolution via symmetric cnn and recursive transformer,” in IJCAI, 2022, pp. 913–919.
[28] Y. Xiao, Q. Yuan, K. Jiang, Y. Chen, Q. Zhang, and C.-W. Lin, “Frequency-assisted mamba for remote sensing image super-resolution,” IEEE Trans. Multimedia, vol. 27, pp. 1783–1796, 2025.
[29] Z. Lu, J. Li, H. Liu, C. Huang, L. Zhang, and T. Zeng, “Transformer for single image super-resolution,” in CVPRW, 2022, pp. 457–466.
[30] W. Li, H. Guo, X. Liu, K. Liang, J. Hu, Z. Ma, and J. Guo, “Efficient face super-resolution via wavelet-based feature enhancement network,” in ACM MM, 2024, pp. 4515–4523.
[31] Z. Huang, Z. Zhang, C. Lan, Z.-J. Zha, Y. Lu, and B. Guo, “Adaptive frequency filters as efficient global token mixers,” in ICCV, 2023, pp. 6049–6059.
[32] W. Li, H. Guo, Y. Hou, and Z. Ma, “Fouriersr: A fourier token-based plugin for efficient image super-resolution,” arXiv preprint arXiv:2503.10043, …
[33] P. Xu, Q. Liu, H. Bao, R. Zhang, L. Gu, and G. Wang, “Fdsr: An interpretable frequency division stepwise process based single-image super-resolution network,” IEEE Trans. Image Process., vol. 33, pp. 1710–1725, 2024.
[34] Y. Cui, W. Ren, and A. Knoll, “Exploring the potential of pooling techniques for universal image restoration,” IEEE Trans. Image Process., vol. 34, pp. 3403–3416, 2025.
[35] Y. Yan, S. Yao, W. Ren, R. Zhang, Q. Guo, and X. Cao, “Can: Cascade augmentations against noise for image restoration,” IEEE Trans. Image Process., vol. 34, pp. 5131–5146, 2025.
[36] J. Weng, Z. Yan, Y. Tai, J. Qian, J. Yang, and J. Li, “Mamballie: Implicit retinex-aware low light enhancement with global-then-local state space,” NeurIPS, pp. 27440–27462, 2024.
[37] W. Zou, H. Gao, W. Yang, and T. Liu, “Wave-mamba: Wavelet state space model for ultra-high-definition low-light image enhancement,” in ACM MM, 2024, pp. 1534–1543.
[38] M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel, “Low-complexity single-image super-resolution based on nonnegative neighbor embedding,” in BMVC, 2012, pp. 135.1–135.10.
[39] D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in ICCV, 2001, pp. 416– …
[40] Y. Matsui, K. Ito, Y. Aramaki, A. Fujimoto, T. Ogawa, T. Yamasaki, and K. Aizawa, “Sketch-based manga retrieval using manga109 dataset,” Multimed. Tools Appl., vol. 76, pp. 21811–21838, 2017.
[41] H. Wang, X. Chen, B. Ni, Y. Liu, and J. Liu, “Omni aggregation networks for lightweight image super-resolution,” in CVPR, 2023, pp. 22378–22387.
[42] D. Lee, S. Yun, and Y. Ro, “Emulating self-attention with convolution for efficient image super-resolution,” in ICCV, 2025, pp. 24467–24477.
[43] X. Wang, J. Li, J. Li, S. Wang, L. Yan, and Y. Xu, “A collaborative network of mamba and cnn for lightweight image super-resolution,” IEEE Trans. Consum. Electron., vol. 71, no. 2, pp. 3591–3604, 2025.
[44] B. Li, H. Zhao, W. Wang, P. Hu, Y. Gou, and X. Peng, “Mair: A locality- and continuity-preserving mamba for image restoration,” in CVPR, 2025, pp. 7491–7501.
[45] J. Cai, H. Zeng, H. Yong, Z. Cao, and L. Zhang, “Toward real-world single image super-resolution: A new benchmark and a new model,” in CVPR, 2019, pp. 3086–3095.
[46] R. Timofte, E. Agustsson, L. Van Gool, M.-H. Yang, and L. Zhang, “Ntire 2017 challenge on single image super-resolution: Methods and results,” in CVPRW, 2017, pp. 126–135.
[47] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004.
[48] Y. Liu, Y. Tian, Y. Zhao, H. Yu, L. Xie, Y. Wang, Q. Ye, J. Jiao, and Y. Liu, “Vmamba: Visual state space model,” in NeurIPS, vol. 37, 2024, pp. 103031–103063.
[49] J. Gu and C. Dong, “Interpreting super-resolution networks with local attribution maps,” in CVPR, 2021, pp. 9199–9208.

[1] [1]

A systematic survey of deep learning-based single-image super-resolution,

J. Li, Z. Pei, W. Li, G. Gao, L. Wang, Y . Wang, and T. Zeng, “A systematic survey of deep learning-based single-image super-resolution,” ACM Computing Surveys, vol. 56, no. 10, pp. 1–40, 2024. 1

work page 2024

[2] [2]

Towards realistic data generation for real-world super-resolution,

L. Peng, W. Li, R. Pei, J. Ren, J. Xu, Y . Wang, Y . Cao, and Z.-J. Zha, “Towards realistic data generation for real-world super-resolution,” in ICLR, 2024. 1

work page 2024

[3] [3]

Pmq-ve: Progressive multi-frame quantization for video enhancement,

Z. Feng, L. Peng, X. Di, Y . Guo, W. Li, Y . Zhang, R. Pei, Y . Wang, Y . Cao, and Z.-J. Zha, “Pmq-ve: Progressive multi-frame quantization for video enhancement,”arXiv preprint arXiv:2505.12266, 2025. 1

work page arXiv 2025

[4] [4]

Survey on deep face restoration: From non-blind to blind and beyond,

W. Li, M. Wang, K. Zhang, J. Li, X. Li, Y . Zhang, G. Gao, W. Deng, and C.-W. Lin, “Survey on deep face restoration: From non-blind to blind and beyond,”arXiv preprint arXiv:2309.15490, 2023. 1

work page arXiv 2023

[5] [5]

Self-supervised selective- guided diffusion model for old-photo face restoration,

W. Li, X. Wang, H. Guo, G. Gao, and Z. Ma, “Self-supervised selective- guided diffusion model for old-photo face restoration,” inNeurIPS, 2025. 1

work page 2025

[6] [6]

Lightweight image super- resolution with information multi-distillation network,

Z. Hui, X. Gao, Y . Yang, and X. Wang, “Lightweight image super- resolution with information multi-distillation network,” inACM MM, 2019, pp. 2024–2032. 1, 2

work page 2019

[7] [7]

Feature distillation interaction weighting network for lightweight image super-resolution,

G. Gao, W. Li, J. Li, F. Wu, H. Lu, and Y . Yu, “Feature distillation interaction weighting network for lightweight image super-resolution,” inAAAI, vol. 36, no. 1, 2022, pp. 661–669. 1, 2

work page 2022

[8] [8]

An image is worth 16x16 words: Transformers for image recognition at scale,

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” inICLR, 2021. 1

work page 2021

[9] [9]

Swinir: Image restoration using swin transformer,

J. Liang, J. Cao, G. Sun, K. Zhang, L. Van Gool, and R. Timofte, “Swinir: Image restoration using swin transformer,” inICCVW, 2021, pp. 1833–1844. 1, 2, 6, 7, 8

work page 2021

[10] [10]

Efficient long-range attention network for image super-resolution,

X. Zhang, H. Zeng, S. Guo, and L. Zhang, “Efficient long-range attention network for image super-resolution,” inECCV. Springer, 2022, pp. 649–667. 1, 6, 7

work page 2022

[11] [11]

Srformer: Permuted self-attention for single image super-resolution,

Y . Zhou, Z. Li, C.-L. Guo, S. Bai, M.-M. Cheng, and Q. Hou, “Srformer: Permuted self-attention for single image super-resolution,” inICCV, 2023, pp. 12 780–12 791. 1, 2, 6, 7, 8, 9, 10

work page 2023

[12] [12]

Hit-sr: Hierarchical transformer for efficient image super-resolution,

X. Zhang, Y . Zhang, and F. Yu, “Hit-sr: Hierarchical transformer for efficient image super-resolution,” inECCV. Springer, 2024, pp. 483–500. 1, 2, 6, 7, 8, 9, 10

work page 2024

[13] [13]

Dual-domain modulation network for lightweight image super-resolution,

W. Li, H. Guo, Y . Hou, G. Gao, and Z. Ma, “Dual-domain modulation network for lightweight image super-resolution,”IEEE Trans. Multimedia,

work page

[14] [14]

On single image scale-up using sparse-representations,

R. Zeyde, M. Elad, and M. Protter, “On single image scale-up using sparse-representations,” inICCS, 2010, pp. 711–730. 1, 6

work page 2010

[15] [15]

Single image super-resolution from transformed self-exemplars,

J.-B. Huang, A. Singh, and N. Ahuja, “Single image super-resolution from transformed self-exemplars,” inCVPR, 2015, pp. 5197–5206. 1, 6, 7

work page 2015

[16] [16]

Mambair: A simple baseline for image restoration with state-space model,

H. Guo, J. Li, T. Dai, Z. Ouyang, X. Ren, and S.-T. Xia, “Mambair: A simple baseline for image restoration with state-space model,” inECCV. Springer, 2024, pp. 222–241. 1, 2, 4, 6, 7, 8, 9, 10

work page 2024

[17] H. Guo, Y. Guo, Y. Zha, Y. Zhang, W. Li, T. Dai, S.-T. Xia, and Y. Li, “Mambairv2: Attentive state space restoration,” in CVPR, 2025, pp. 28124–28133.

[18] A. Gu and T. Dao, “Mamba: Linear-time sequence modeling with selective state spaces,” arXiv preprint arXiv:2312.00752, 2024.

[19] W. Li, J. Li, G. Gao, W. Deng, J. Yang, G.-J. Qi, and C.-W. Lin, “Efficient image super-resolution with feature interaction weighted hybrid network,” IEEE Trans. Multimedia, vol. 27, pp. 2256–2267, 2025.

[20] A. Li, L. Zhang, Y. Liu, and C. Zhu, “Feature modulation transformer: Cross-refinement of global representation via high-frequency prior for image super-resolution,” in ICCV, 2023, pp. 12514–12524.

[21] S. Xu, W. Li, G. Gao, J. Yang, G.-J. Qi, and C.-W. Lin, “Fadpnet: Frequency-aware dual-path network for face super-resolution,” arXiv preprint arXiv:2506.14121, 2025.

[22] F. Fang, J. Li, and T. Zeng, “Soft-edge assisted network for single image super-resolution,” IEEE Trans. Image Process., vol. 29, pp. 4656–4668,


[24] G. Wu, J. Jiang, J. Jiang, and X. Liu, “Transforming image super-resolution: a convformer-based efficient approach,” IEEE Trans. Image Process., vol. 33, pp. 6071–6082, 2024.

[25] C. Dong, C. C. Loy, and X. Tang, “Accelerating the super-resolution convolutional neural network,” in ECCV. Springer, 2016, pp. 391–407.

[26] W. Li, J. Li, G. Gao, W. Deng, J. Zhou, J. Yang, and G.-J. Qi, “Cross-receptive focused inference network for lightweight image super-resolution,” IEEE Trans. Multimedia, vol. 26, pp. 864–877, 2023.

[27] G. Gao, Z. Wang, J. Li, W. Li, Y. Yu, and T. Zeng, “Lightweight bimodal network for single-image super-resolution via symmetric cnn and recursive transformer,” in IJCAI, 2022, pp. 913–919.

[28] Y. Xiao, Q. Yuan, K. Jiang, Y. Chen, Q. Zhang, and C.-W. Lin, “Frequency-assisted mamba for remote sensing image super-resolution,” IEEE Trans. Multimedia, vol. 27, pp. 1783–1796, 2025.

[29] Z. Lu, J. Li, H. Liu, C. Huang, L. Zhang, and T. Zeng, “Transformer for single image super-resolution,” in CVPRW, 2022, pp. 457–466.

[30] W. Li, H. Guo, X. Liu, K. Liang, J. Hu, Z. Ma, and J. Guo, “Efficient face super-resolution via wavelet-based feature enhancement network,” in ACM MM, 2024, pp. 4515–4523.

[31] Z. Huang, Z. Zhang, C. Lan, Z.-J. Zha, Y. Lu, and B. Guo, “Adaptive frequency filters as efficient global token mixers,” in ICCV, 2023, pp. 6049–6059.

[32] W. Li, H. Guo, Y. Hou, and Z. Ma, “Fouriersr: A fourier token-based plugin for efficient image super-resolution,” arXiv preprint arXiv:2503.10043,

[33] P. Xu, Q. Liu, H. Bao, R. Zhang, L. Gu, and G. Wang, “Fdsr: An interpretable frequency division stepwise process based single-image super-resolution network,” IEEE Trans. Image Process., vol. 33, pp. 1710–1725, 2024.

[34] Y. Cui, W. Ren, and A. Knoll, “Exploring the potential of pooling techniques for universal image restoration,” IEEE Trans. Image Process., vol. 34, pp. 3403–3416, 2025.

[35] Y. Yan, S. Yao, W. Ren, R. Zhang, Q. Guo, and X. Cao, “Can: Cascade augmentations against noise for image restoration,” IEEE Trans. Image Process., vol. 34, pp. 5131–5146, 2025.

[36] J. Weng, Z. Yan, Y. Tai, J. Qian, J. Yang, and J. Li, “Mamballie: Implicit retinex-aware low light enhancement with global-then-local state space,” NeurIPS, pp. 27440–27462, 2024.

[37] W. Zou, H. Gao, W. Yang, and T. Liu, “Wave-mamba: Wavelet state space model for ultra-high-definition low-light image enhancement,” in ACM MM, 2024, pp. 1534–1543.

[38] M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel, “Low-complexity single-image super-resolution based on nonnegative neighbor embedding,” in BMVC, 2012, pp. 135.1–135.10.

[39] D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in ICCV, 2001, pp. 416–

[40] Y. Matsui, K. Ito, Y. Aramaki, A. Fujimoto, T. Ogawa, T. Yamasaki, and K. Aizawa, “Sketch-based manga retrieval using manga109 dataset,” Multimed. Tools Appl., vol. 76, pp. 21811–21838, 2017.

[41] H. Wang, X. Chen, B. Ni, Y. Liu, and J. Liu, “Omni aggregation networks for lightweight image super-resolution,” in CVPR, 2023, pp. 22378–22387.

[42] D. Lee, S. Yun, and Y. Ro, “Emulating self-attention with convolution for efficient image super-resolution,” in ICCV, 2025, pp. 24467–24477.

[43] X. Wang, J. Li, J. Li, S. Wang, L. Yan, and Y. Xu, “A collaborative network of mamba and cnn for lightweight image super-resolution,” IEEE Trans. Consum. Electron., vol. 71, no. 2, pp. 3591–3604, 2025.

[44] B. Li, H. Zhao, W. Wang, P. Hu, Y. Gou, and X. Peng, “Mair: A locality- and continuity-preserving mamba for image restoration,” in CVPR, 2025, pp. 7491–7501.

[45] J. Cai, H. Zeng, H. Yong, Z. Cao, and L. Zhang, “Toward real-world single image super-resolution: A new benchmark and a new model,” in CVPR, 2019, pp. 3086–3095.

[46] R. Timofte, E. Agustsson, L. Van Gool, M.-H. Yang, and L. Zhang, “Ntire 2017 challenge on single image super-resolution: Methods and results,” in CVPRW, 2017, pp. 126–135.

[47] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004.

[48] Y. Liu, Y. Tian, Y. Zhao, H. Yu, L. Xie, Y. Wang, Q. Ye, J. Jiao, and Y. Liu, “Vmamba: Visual state space model,” in NeurIPS, vol. 37, 2024, pp. 103031–103063.

[49] J. Gu and C. Dong, “Interpreting super-resolution networks with local attribution maps,” in CVPR, 2021, pp. 9199–9208.