pith. sign in

arxiv: 2605.17470 · v2 · pith:XQTSFI4Xnew · submitted 2026-05-17 · 💻 cs.CV · cs.MM· eess.IV

EchoSR: Efficient Context Harnessing for Lightweight Image Super-Resolution

Pith reviewed 2026-05-20 14:25 UTC · model grok-4.3

classification 💻 cs.CV cs.MMeess.IV
keywords lightweight super-resolutioncontext fusionmulti-scale modelingimage upscalingefficient neural networkshierarchical contextcomputer vision
0
0 comments X

The pith

EchoSR splits feature processing into local, multi-scale, and global stages with overlapping fusion to deliver higher-quality lightweight super-resolution at roughly twice the speed.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents EchoSR as a way to improve image super-resolution when computing resources are limited. It separates the work into three distinct stages that each focus on a different kind of context: nearby pixels, features at many different sizes, and the overall scene layout. These stages are then joined by a cross-scale overlapping fusion step that mixes the information without adding much extra work. Tests on standard image benchmarks show the method produces sharper results than earlier lightweight approaches while running about twice as fast. Readers would care if this makes detailed image enlargement practical on phones or other small hardware.

Core claim

EchoSR decouples feature learning into disentangled local, multi-scale, and global modeling stages through an efficient context-harnessing strategy, and further promotes seamless cross-scale integration via a cross-scale overlapping fusion mechanism, consistently outperforming state-of-the-art lightweight super-resolution methods across multiple benchmarks while achieving approximately 2x faster speed.

What carries the argument

Disentangled local, multi-scale, and global modeling stages together with a cross-scale overlapping fusion mechanism that unifies multi-scale receptive field modeling and hierarchical context fusion.

If this is right

  • Lightweight super-resolution models can reach higher reconstruction accuracy without large increases in computation.
  • The separation into local, multi-scale, and global stages followed by fusion supports efficient handling of context at different ranges.
  • Faster inference makes real-time upscaling feasible in settings with tight power or memory limits.
  • The same design choices produce gains on multiple common test sets for single-image super-resolution.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The stage-separation idea could be tried in other efficiency-focused tasks such as image denoising or low-light enhancement.
  • Adding a temporal stage to the same disentanglement pattern might adapt the method for video super-resolution.
  • Checking performance on uncurated phone-camera photos could show whether benchmark gains carry over to everyday use.
  • If the fusion step proves general, it might reduce the need for hand-tuned scale-specific layers in other vision networks.

Load-bearing premise

The proposed disentangled stages and cross-scale overlapping fusion will combine into coherent results that deliver the claimed quality and speed gains without hidden extra costs or extra tuning.

What would settle it

Side-by-side timing and quality measurements on the same hardware and datasets where EchoSR fails to run approximately twice as fast or fails to exceed the PSNR and SSIM scores of prior top lightweight methods.

Figures

Figures reproduced from arXiv: 2605.17470 by Binhao Wang, Hanli Zhao, Kaihao Zhang, Shihao Zhao, Tao Wang, Wanglong Lu.

Figure 1
Figure 1. Figure 1: Visualization of the effective receptive field (ERF) (top) and the feature [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Comparisons on the Urban100 test set at ×2 scale with input resolution of 1024 × 1024. The area of each circle indicates peak memory usage during inference. EchoSR demonstrates the best balance among performance, memory consumption, and inference latency. achieves spatial structural rectification by aligning features from different spatial hierarchies. This mechanism ensures a gradual and coherent transiti… view at source ↗
Figure 3
Figure 3. Figure 3: Overview of our EchoSR architecture for lightweight image super-resolution. CHB extracts local, multi-scale, and global features in parallel, while [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Visual comparison of EchoSR (ours) and SOTA methods on the Urban100 benchmark for [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Visual comparisons of EchoSR (ours) and SOTA methods on the Urban100 benchmark for [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Visual comparison of pixel-wise error maps on the Urban100 ( [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Visual comparisons of EchoSR (ours) and SOTA methods on the Manga109 benchmark for [PITH_FULL_IMAGE:figures/full_fig_p010_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Peak GPU memory usage (left) and average inference latency (right) of SR methods at different input resolutions under the [PITH_FULL_IMAGE:figures/full_fig_p012_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Comparisons on the Urban100 test set at ×2 scale with input resolution of 256 × 256. The area of each circle indicates the computational complexity (MACs). EchoSR (ours) demonstrates a superior trade-off among reconstruction performance, computational efficiency, and inference speed. lation [8] and channel rearrangement [35], [41], which can come at the cost of neglecting the interaction of information flo… view at source ↗
Figure 10
Figure 10. Figure 10: Visualization of the ERF across different SR models, including Transformer-based (SwinIR, HiT-SIR), Mamba-based (MambaIR, MaIR), and CNN [PITH_FULL_IMAGE:figures/full_fig_p013_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Visual comparisons of EchoSR-lite (ours) and SOTA tiny methods on the Urban100 benchmark for [PITH_FULL_IMAGE:figures/full_fig_p013_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Visual comparisons of EchoSR (ours) and SOTA methods on the RealSR dataset for [PITH_FULL_IMAGE:figures/full_fig_p014_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Visualization of feature maps in our MRFE module. We showcase outputs from different branches within MRFE. The identity mapping branch [PITH_FULL_IMAGE:figures/full_fig_p015_13.png] view at source ↗
Figure 15
Figure 15. Figure 15: Visualization of COFB. After processing by the COFB module, the [PITH_FULL_IMAGE:figures/full_fig_p016_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: Visualization of the ERF for the three parallel branches in the [PITH_FULL_IMAGE:figures/full_fig_p017_16.png] view at source ↗
Figure 17
Figure 17. Figure 17: Quantitative analysis of ERF area ratio relative change across various cumulative contribution score thresholds ( [PITH_FULL_IMAGE:figures/full_fig_p018_17.png] view at source ↗
read the original abstract

Image super-resolution (SR) aims to reconstruct high-quality, high-resolution (HR) images from low-resolution (LR) inputs and plays a critical role in various downstream applications. Despite recent advancements, balancing reconstruction fidelity and computational efficiency remains a fundamental challenge, particularly in resource-constrained scenarios. While existing lightweight methods attempt to expand receptive fields, many of them either incur substantial computational overhead, naively scale up kernel sizes, or lack mechanisms for coherent multi-scale integration, limiting their overall effectiveness and scalability. To address these limitations, we propose EchoSR, an efficient context-harnessing framework for lightweight image super-resolution, which unifies multi-scale receptive field modeling and hierarchical context fusion. EchoSR decouples feature learning into disentangled local, multi-scale, and global modeling stages through an efficient context-harnessing strategy, and further promotes seamless cross-scale integration via a cross-scale overlapping fusion mechanism. Extensive experiments have shown that EchoSR consistently outperforms state-of-the-art lightweight super-resolution methods across multiple benchmarks, while also achieving a faster speed $(\sim 2\times)$. The source code is available at https://github.com/funnyWang-Echoes/EchoSR.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes EchoSR, a lightweight image super-resolution framework that decouples feature learning into disentangled local, multi-scale, and global modeling stages via an efficient context-harnessing strategy and introduces a cross-scale overlapping fusion mechanism for hierarchical context integration. It claims that this unification enables consistent outperformance over state-of-the-art lightweight SR methods (e.g., IMDN, RFDN, CARN) across multiple benchmarks while delivering approximately 2x faster inference speed, with source code released.

Significance. If the empirical claims hold under rigorous verification, the work would advance lightweight SR by addressing receptive-field expansion without naive kernel scaling or excessive overhead, offering a practical unification of multi-scale modeling and fusion that could benefit real-time applications on edge devices. The public code release supports reproducibility, which strengthens the contribution relative to purely empirical papers lacking such artifacts.

major comments (2)
  1. [§4] §4 (Experiments) and associated tables: the headline claim of ~2x faster speed and superior PSNR/SSIM is load-bearing for the central contribution, yet the manuscript provides no ablation that removes only the cross-scale overlapping fusion block while holding stage channel counts and other parameters fixed; without this, it is impossible to isolate whether fusion overhead negates the reported latency gains under standardized PyTorch/CUDA timing at fixed input resolutions.
  2. [§3.2] §3.2 (Cross-scale overlapping fusion): the mechanism description asserts coherent integration without substantial computational overhead, but contains no FLOPs or memory-traffic bound on the overlapping feature exchange; this directly risks the efficiency claim when compared to prior lightweight baselines at identical parameter/FLOP budgets.
minor comments (2)
  1. [Figure 2] Figure 2 (architecture diagram): the flow between local/multi-scale/global branches and the fusion module would benefit from explicit arrow labels indicating tensor shapes or channel counts to clarify the disentanglement.
  2. [§4.1] §4.1 (Datasets and metrics): specify the exact training/validation splits and whether results are averaged over multiple random seeds with standard deviations, as the abstract asserts 'consistent' outperformance.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and outline the revisions planned to strengthen the empirical validation of our efficiency claims.

read point-by-point responses
  1. Referee: §4 (Experiments) and associated tables: the headline claim of ~2x faster speed and superior PSNR/SSIM is load-bearing for the central contribution, yet the manuscript provides no ablation that removes only the cross-scale overlapping fusion block while holding stage channel counts and other parameters fixed; without this, it is impossible to isolate whether fusion overhead negates the reported latency gains under standardized PyTorch/CUDA timing at fixed input resolutions.

    Authors: We agree that an ablation isolating the cross-scale overlapping fusion block (with all other stage channel counts and hyperparameters held fixed) would provide clearer evidence for the source of the reported latency gains. In the revised manuscript we will add this experiment to §4. The variant without the fusion block will be evaluated on the same benchmarks, hardware, and standardized PyTorch/CUDA timing protocol used for the main results, allowing direct quantification of any overhead introduced by the fusion mechanism. revision: yes

  2. Referee: §3.2 (Cross-scale overlapping fusion): the mechanism description asserts coherent integration without substantial computational overhead, but contains no FLOPs or memory-traffic bound on the overlapping feature exchange; this directly risks the efficiency claim when compared to prior lightweight baselines at identical parameter/FLOP budgets.

    Authors: We acknowledge that explicit FLOPs and memory-traffic bounds for the overlapping feature exchange would better support the efficiency assertions. In the revised §3.2 we will insert a dedicated complexity analysis that derives the additional FLOPs and memory traffic of the cross-scale overlapping fusion and compares these quantities to the overall model budget as well as to the corresponding costs in the cited lightweight baselines (IMDN, RFDN, CARN) at matched parameter and FLOP counts. Empirical measurements on the same hardware will also be reported. revision: yes

Circularity Check

0 steps flagged

No circularity in EchoSR empirical architecture proposal

full rationale

The paper introduces EchoSR as an empirical neural architecture for lightweight super-resolution, with claims resting on benchmark experiments and speed measurements rather than any closed-form derivation or prediction. No equations, fitted parameters renamed as outputs, or self-citation chains are present in the provided text that would reduce the central claims to inputs by construction. The design choices (disentangled stages and fusion) are presented as engineering decisions validated externally via comparisons to IMDN, RFDN, etc., making the work self-contained against independent benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The central claim rests on standard deep-learning assumptions plus several design choices introduced in the paper. No machine-checked proofs or parameter-free derivations are mentioned.

free parameters (1)
  • stage channel counts and fusion kernel sizes
    Design hyperparameters that define the local, multi-scale, and global branches and the overlapping fusion; these are chosen to balance efficiency and performance.
axioms (1)
  • domain assumption Disentangling feature learning into independent local, multi-scale, and global stages plus cross-scale overlapping fusion yields coherent integration without substantial overhead.
    Invoked in the abstract description of the framework as the basis for the efficiency and performance claims.
invented entities (1)
  • EchoSR context-harnessing modules (local/multi-scale/global branches and cross-scale overlapping fusion) no independent evidence
    purpose: To unify multi-scale receptive field modeling and hierarchical context fusion in a lightweight manner.
    New architectural components introduced by the paper; no independent evidence outside the claimed experiments is provided in the abstract.

pith-pipeline@v0.9.0 · 5749 in / 1575 out tokens · 36356 ms · 2026-05-20T14:25:08.664770+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages · 1 internal anchor

  1. [1]

    S. Liu, W. Li, D. He, G. Wang, Y . Huang, Ssefusion: Salient semantic enhancement for multimodal medical image fusion with mamba and dy- namic spiking neural networks, Information Fusion 119 (2025) 103031

  2. [2]

    J. Qu, D. Huang, Y . Shi, J. Liu, W. Tang, Entropy-aware dynamic path selection network for multi-modality medical image fusion, Information Fusion 123 (2025) 103312

  3. [3]

    D. K. Jain, X. Zhao, C. Gan, P. K. Shukla, A. Jain, S. Sharma, Fusion- driven deep feature network for enhanced object detection and tracking in video surveillance systems, Information Fusion 109 (2024) 102429

  4. [4]

    Zhang, T

    W. Zhang, T. Li, Y . Zhang, G. Pei, X. Jiang, Y . Yao, Ltformer: A light-weight transformer-based self-supervised matching network for heterogeneous remote sensing images, Information Fusion 109 (2024) 102425

  5. [5]

    J. Liu, R. Xu, Y . Duan, T. Guo, G. Shi, F. Luo, Mdgf-cd: Land-cover change detection with multi-level diffformer feature grouping fusion for vhr remote sensing images, Information Fusion 120 (2025) 103110

  6. [6]

    W. Lu, J. Wang, X. Jin, X. Jiang, H. Zhao, Facemug: A multimodal generative and fusion framework for local facial editing, IEEE Trans. Vis. Comput. Gr. (2024) 1–15

  7. [7]

    W. Lu, J. Wang, T. Wang, K. Zhang, X. Jiang, H. Zhao, Visual style prompt learning using diffusion models for blind face restoration, Pattern Recognit. 161 (2025) 111312

  8. [8]

    Y . Wang, T. Su, Y . Li, J. Cao, G. Wang, X. Liu, Ddistill-sr: Reparameter- ized dynamic distillation network for lightweight image super-resolution, IEEE Trans. Multim. 25 (2023) 7222–7234

  9. [9]

    B. Lim, S. Son, H. Kim, S. Nah, K. M. Lee, Enhanced deep residual net- works for single image super-resolution, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, 2017, pp. 1132–1140

  10. [10]

    Z. Hui, X. Gao, Y . Yang, X. Wang, Lightweight image super-resolution with information multi-distillation network, in: ACM Int. Conf. Multi- media, 2019, pp. 2024–2032

  11. [11]

    Liang, J

    J. Liang, J. Cao, G. Sun, K. Zhang, L. V . Gool, R. Timofte, Swinir: Image restoration using swin transformer, in: Proc. IEEE Int. Conf. Comput. Vis. Workshops, 2021, pp. 1833–1844

  12. [12]

    Z. Chen, Y . Zhang, J. Gu, L. Kong, X. Yang, F. Yu, Dual aggrega- tion transformer for image super-resolution, in: Proc. IEEE Int. Conf. Comput. Vis., 2023, pp. 12278–12287

  13. [13]

    H. Choi, J. Lee, J. Yang, N-gram in swin transformers for efficient lightweight image super-resolution, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2023, pp. 2071–2081

  14. [14]

    Y . Zhou, Z. Li, C. Guo, S. Bai, M. Cheng, Q. Hou, Srformer: Permuted self-attention for single image super-resolution, in: Proc. IEEE Int. Conf. Comput. Vis., 2023, pp. 12734–12745

  15. [15]

    A. Gu, T. Dao, Mamba: Linear-time sequence modeling with selective state spaces, in: First Conference on Language Modeling, 2024

  16. [16]

    H. Guo, J. Li, T. Dai, Z. Ouyang, X. Ren, S. Xia, Mambair: A simple baseline for image restoration with state-space model, in: Proc. Eur. Conf. Comput. Vis., V ol. 15076, 2024, pp. 222–241

  17. [17]

    B. Li, H. Zhao, W. Wang, P. Hu, Y . Gou, X. Peng, Mair: A locality- and continuity-preserving mamba for image restoration, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2025

  18. [18]

    H. Feng, L. Wang, Y . Li, A. Du, LKASR: large kernel attention for lightweight image super-resolution, Knowl. Based Syst. 252 (2022) 109376

  19. [19]

    Y . Wang, Y . Li, G. Wang, X. Liu, Multi-scale attention network for single image super-resolution, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, 2024, pp. 5950–5960

  20. [20]

    X. Ding, X. Zhang, J. Han, G. Ding, Scaling up your kernels to 31×31: Revisiting large kernel design in cnns, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2022, pp. 11953–11965

  21. [21]

    W. Yu, M. Luo, P. Zhou, C. Si, Y . Zhou, X. Wang, J. Feng, S. Yan, Metaformer is actually what you need for vision, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., IEEE, 2022, pp. 10809–10819

  22. [22]

    J. Kim, J. K. Lee, K. M. Lee, Accurate image super-resolution using very deep convolutional networks, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 1646–1654

  23. [23]

    Zamfir, Z

    E. Zamfir, Z. Wu, N. Mehta, Y . Zhang, R. Timofte, See more details: Efficient image super-resolution by experts mining, in: Proc. Int. Conf. Mach. Learn., 2024

  24. [24]

    Zhang, H

    X. Zhang, H. Zeng, S. Guo, L. Zhang, Efficient long-range attention network for image super-resolution, in: Proc. Eur. Conf. Comput. Vis., V ol. 13677, 2022, pp. 649–667

  25. [25]

    Zhang, Y

    X. Zhang, Y . Zhang, F. Yu, Hit-sr: Hierarchical transformer for efficient image super-resolution, in: Proc. Eur. Conf. Comput. Vis., V ol. 15098, 2024, pp. 483–500

  26. [26]

    A. Gu, T. Dao, Mamba: Linear-time sequence modeling with selective state spaces, CoRR abs/2312.00752 (2023)

  27. [27]

    M. Guo, C. Lu, Z. Liu, M. Cheng, S. Hu, Visual attention network, Comput. Vis. Media 9 (4) (2023) 733–752

  28. [28]

    S. Liu, T. Chen, X. Chen, X. Chen, Q. Xiao, B. Wu, T. K ¨arkk¨ainen, M. Pechenizkiy, D. C. Mocanu, Z. Wang, More convnets in the 2020s: Scaling up kernels beyond 51x51 using sparsity, in: Proc. Int. Conf. Learn. Represent., 2023

  29. [29]

    W. Yu, P. Zhou, S. Yan, X. Wang, Inceptionnext: When inception meets convnext, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2024, pp. 5672–5683

  30. [30]

    X. Ding, Y . Zhang, Y . Ge, S. Zhao, L. Song, X. Yue, Y . Shan, Unireplknet: A universal perception large-kernel convnet for audio, video, point cloud, time-ignoreseries and image recognition, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2024, pp. 5513–5524

  31. [31]

    G. Wu, J. Jiang, J. Jiang, X. Liu, Transforming image super-resolution: A convformer-based efficient approach, IEEE Trans. Image Process. 33 (2024) 6071–6082

  32. [32]

    Z. Liu, H. Mao, C. Wu, C. Feichtenhofer, T. Darrell, S. Xie, A convnet for the 2020s, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2022, pp. 11966–11976

  33. [33]

    M. Tan, Q. V . Le, Mixconv: Mixed depthwise convolutional kernels, in: Proc. Brit. Mach. Vis. Conf., BMV A Press, 2019, p. 74

  34. [34]

    C. Dong, C. C. Loy, K. He, X. Tang, Learning a deep convolutional network for image super-resolution, in: Proc. Eur. Conf. Comput. Vis., V ol. 8692, 2014, pp. 184–199

  35. [35]

    L. Sun, J. Pan, J. Tang, Shufflemixer: An efficient convnet for image super-resolution, in: Proc. Adv. Neural Inf. Process. Syst., 2022

  36. [36]

    Behjati, P

    P. Behjati, P. Rodr ´ıguez, C. Fern´andez, I. Hupont, A. Mehri, J. Gonz`alez, Single image super-resolution based on directional variance attention network, Pattern Recognit. 133 (2023) 108997

  37. [37]

    H. Wang, X. Chen, B. Ni, Y . Liu, J. Liu, Omni aggregation networks for lightweight image super-resolution, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2023, pp. 22378–22387

  38. [38]

    A. Li, L. Zhang, Y . Liu, C. Zhu, Exploring frequency-inspired opti- mization in transformer for efficient single image super-resolution, IEEE Trans. Pattern Anal. Mach. Intell. 47 (4) (2025) 3141–3158

  39. [39]

    Timofte, E

    R. Timofte, E. Agustsson, L. V . Gool, M. Yang, NTIRE 2017 challenge on single image super-resolution: Methods and results, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, 2017, pp. 1110–1121

  40. [40]

    B. Lim, S. Son, H. Kim, S. Nah, K. M. Lee, Enhanced deep residual net- works for single image super-resolution, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 1132–1140

  41. [41]

    L. Sun, J. Dong, J. Tang, J. Pan, Spatially-adaptive feature modulation for efficient image super-resolution, in: Proc. IEEE Int. Conf. Comput. Vis., 2023, pp. 13144–13153

  42. [42]

    S. Li, Z. Wang, Z. Liu, C. Tan, H. Lin, D. Wu, Z. Chen, J. Zheng, S. Z. Li, Moganet: Multi-order gated aggregation network, in: Proc. Int. Conf. Learn. Represent., 2024

  43. [43]

    Y . Wang, T. Zhang, Osffnet: Omni-stage feature fusion network for lightweight image super-resolution, in: Proc. AAAI Conf. Artif. Intell., 2024, pp. 5660–5668

  44. [44]

    F. Li, R. Cong, J. Wu, H. Bai, M. Wang, Y . Zhao, Srconvnet: A transformer-style convnet for lightweight image super-resolution, Int. J. Comput. Vis. 133 (1) (2025) 173–189

  45. [45]

    W. Luo, Y . Li, R. Urtasun, R. S. Zemel, Understanding the effective receptive field in deep convolutional neural networks, in: Adv. Neural Inform. Process. Syst., 2016, pp. 4898–4906

  46. [46]

    Y . Blau, T. Michaeli, The perception-distortion tradeoff, in: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 6228–6237

  47. [47]

    Zheng, L

    M. Zheng, L. Sun, J. Dong, J. Pan, Smfanet: A lightweight self- modulation feature aggregation network for efficient image super- resolution, in: Proc. Eur. Conf. Comput. Vis., V ol. 15108, 2024, pp. 359–375

  48. [48]

    X. Wang, L. Xie, C. Dong, Y . Shan, Real-esrgan: Training real-world blind super-resolution with pure synthetic data, in: Proc. IEEE Int. Conf. Comput. Vis. Workshops, 2021, pp. 1905–1914

  49. [49]

    Vaswani, N

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, in: in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 5998–6008

  50. [50]

    W. Wang, E. Xie, X. Li, D. Fan, K. Song, D. Liang, T. Lu, P. Luo, L. Shao, PVT v2: Improved baselines with pyramid vision transformer, Comput. Vis. Media 8 (3) (2022) 415–424