
arxiv: 2504.16455 · v2 · submitted 2025-04-23 · 💻 cs.CV

Cross Paradigm Representation and Alignment Transformer for Image Deraining

Pith reviewed 2026-05-22 17:52 UTC · model grok-4.3

classification 💻 cs.CV
keywords image deraining · transformer · cross-paradigm representation · adaptive alignment · self-attention · image restoration

The pith

The CPRAformer integrates spatial-channel and global-local attention paradigms through alignment to enhance image deraining.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper aims to solve the problem of irregular rain patterns in images by creating a unified framework that combines two complementary representation paradigms in a Transformer architecture. It uses specialized self-attention modules for channel dependencies and spatial details, then aligns them with a frequency-based module to allow better feature interaction. A reader should care because current single-paradigm models have trouble with complex rain overlaps, and this cross-paradigm method could lead to more accurate image restoration in real-world scenarios like autonomous driving or photography.

Core claim

The paper claims that the Cross Paradigm Representation and Alignment Transformer extracts the most valuable interactive fusion information by bridging gaps within and between spatial-channel and global-local paradigms using sparse prompt channel self-attention, spatial pixel refinement self-attention, and the Adaptive Alignment Frequency Module for progressive alignment and complementarity.

What carries the argument

The Adaptive Alignment Frequency Module (AAFM), which performs two-stage progressive alignment and interaction between the features produced by the two self-attention mechanisms in order to reduce the information gap between them.
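To make "frequency-based" fusion concrete, here is a minimal sketch of one generic way two feature maps can be combined in the Fourier domain: low frequencies are taken from one branch and high frequencies from the other. This is an illustrative assumption, not the paper's AAFM; the function name `frequency_band_fusion`, the fixed radial `cutoff`, and the hard band split are all hypothetical choices made for the example.

```python
import numpy as np

def frequency_band_fusion(global_feat, local_feat, cutoff=0.25):
    """Fuse two (H, W) maps: low frequencies from global_feat,
    high frequencies from local_feat. Illustrative only, not AAFM."""
    h, w = global_feat.shape
    fy = np.fft.fftfreq(h)[:, None]   # vertical frequency, cycles per sample
    fx = np.fft.fftfreq(w)[None, :]   # horizontal frequency, cycles per sample
    # Radial low-pass mask; the cutoff value is an arbitrary assumption.
    low_pass = (np.sqrt(fy ** 2 + fx ** 2) <= cutoff).astype(float)
    g = np.fft.fft2(global_feat)
    l = np.fft.fft2(local_feat)
    fused = low_pass * g + (1.0 - low_pass) * l
    # The mask is symmetric, so the inverse transform of real inputs is real.
    return np.fft.ifft2(fused).real

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 8))   # stand-in for a channel-branch feature map
b = rng.standard_normal((8, 8))   # stand-in for a spatial-branch feature map
fused = frequency_band_fusion(a, b)
print(fused.shape)
```

Fusing a map with itself returns the map unchanged, which is a quick sanity check that the two masks partition the spectrum. A learned module would presumably replace the fixed mask with data-dependent weights.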

If this is right

  • Achieves state-of-the-art results on eight benchmark deraining datasets.
  • Validates robustness across other image restoration tasks.
  • Enables deep interaction and fusion of complementary features from different paradigms.
  • Supports improved performance in downstream vision applications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The framework could be extended to handle multiple degradation types simultaneously by adding more paradigms.
  • Optimizing the AAFM might lead to more efficient models for real-time applications.
  • Similar alignment strategies may improve performance in related tasks such as image deblurring or super-resolution.

Load-bearing premise

The spatial-channel and global-local paradigms provide complementary information that can be reliably aligned by the frequency module without losing critical details or creating new problems in the restored image.

What would settle it

An ablation study that disables the Adaptive Alignment Frequency Module and measures the drop in performance metrics on the benchmark datasets would settle it: little to no drop would indicate that the alignment is not essential.
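The test described above reduces to a per-dataset comparison. A minimal sketch, assuming hypothetical dataset names and placeholder PSNR values (none of these numbers come from the paper) and an arbitrary threshold for what counts as a negligible drop:

```python
def ablation_drop(full_psnr, ablated_psnr):
    """Per-dataset PSNR drop in dB when a module is disabled."""
    return {name: full_psnr[name] - ablated_psnr[name] for name in full_psnr}

def is_essential(drops, min_mean_drop_db=0.1):
    """Call the module essential only if the mean drop exceeds a threshold.
    The 0.1 dB default is an illustrative assumption."""
    mean_drop = sum(drops.values()) / len(drops)
    return mean_drop > min_mean_drop_db

# Placeholder numbers for illustration only; not results from the paper.
full = {"dataset_a": 30.0, "dataset_b": 32.0}
without_aafm = {"dataset_a": 29.5, "dataset_b": 31.0}

drops = ablation_drop(full, without_aafm)
print(drops, is_essential(drops))
```

In practice the comparison would also need matched parameter counts and training schedules, otherwise a drop could reflect lost capacity rather than lost alignment.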

Figures

Figures reproduced from arXiv: 2504.16455 by Guangwei Gao, Guojun Qi, Juncheng Li, Shun Zou, Yi Zou.

Figure 1: Feature patterns obtained from four perspectives
Figure 2: The overall architecture of our proposed CPRAformer.
Figure 3: Comparison of different self-attention mechanisms.
Figure 4: The qualitative comparison on Test100 [77]. See the
Figure 5: The qualitative comparison on raindrop datasets [41]. Our result has the best visual quality and details.
Figure 8: Ablation study of EPGO
Figure 9: The qualitative comparison on hazy images. Our result has the best visual quality and details.
Figure 10: Semantic segmentation results on Deeplab V3 [

Abstract

Transformer-based networks have achieved strong performance in low-level vision tasks like image deraining by utilizing spatial or channel-wise self-attention. However, irregular rain patterns and complex geometric overlaps challenge single-paradigm architectures, necessitating a unified framework to integrate complementary global-local and spatial-channel representations. To address this, we propose a novel Cross Paradigm Representation and Alignment Transformer (CPRAformer). Its core idea is the hierarchical representation and alignment, leveraging the strengths of both paradigms (spatial-channel and global-local) to aid image reconstruction. It bridges the gap within and between paradigms, aligning and coordinating them to enable deep interaction and fusion of features. Specifically, we use two types of self-attention in the Transformer blocks: sparse prompt channel self-attention (SPC-SA) and spatial pixel refinement self-attention (SPR-SA). SPC-SA enhances global channel dependencies through dynamic sparsity, while SPR-SA focuses on spatial rain distribution and fine-grained texture recovery. To address the feature misalignment and knowledge differences between them, we introduce the Adaptive Alignment Frequency Module (AAFM), which aligns and interacts with features in a two-stage progressive manner, enabling adaptive guidance and complementarity. This reduces the information gap within and between paradigms. Through this unified cross-paradigm dynamic interaction framework, we achieve the extraction of the most valuable interactive fusion information from the two paradigms. Extensive experiments demonstrate that our model achieves state-of-the-art performance on eight benchmark datasets and further validates CPRAformer's robustness in other image restoration tasks and downstream applications.
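The abstract describes SPC-SA as channel self-attention with dynamic sparsity but gives no formula. One common reading of that phrase is top-k masking of the channel-to-channel attention matrix, sketched below. This is an assumption about the general technique, not a reproduction of the paper's module: the function name `topk_channel_attention`, the cosine-similarity scoring, and the fixed `keep` parameter are hypothetical, and the prompt component is omitted.

```python
import numpy as np

def topk_channel_attention(q, k, v, keep):
    """Channel-wise self-attention with top-k masking (illustrative sketch).

    q, k, v: arrays of shape (C, N) with C channels and N = H*W pixels.
    keep: number of attention scores retained in each channel row.
    """
    c, _ = q.shape
    if not 1 <= keep <= c:
        raise ValueError("keep must be between 1 and the channel count")
    # L2-normalise each channel so scores are cosine similarities.
    qn = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-8)
    kn = k / (np.linalg.norm(k, axis=1, keepdims=True) + 1e-8)
    scores = qn @ kn.T                                  # (C, C)
    # Per-row threshold: the keep-th largest score.
    kth = np.sort(scores, axis=1)[:, -keep][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    # Softmax over surviving entries; masked entries get weight 0.
    masked = masked - masked.max(axis=1, keepdims=True)
    weights = np.exp(masked)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
q, k, v = rng.standard_normal((3, 4, 6))   # 4 channels, 6 pixels each
out, weights = topk_channel_attention(q, k, v, keep=2)
print(out.shape, (weights > 0).sum(axis=1))
```

Because the attention matrix is C x C rather than N x N, its cost does not grow quadratically with image size, which is why channel attention is often described as the global branch in restoration transformers.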

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes the Cross Paradigm Representation and Alignment Transformer (CPRAformer) for image deraining. It integrates two self-attention paradigms within Transformer blocks: Sparse Prompt Channel Self-Attention (SPC-SA) to enhance global channel dependencies via dynamic sparsity, and Spatial Pixel Refinement Self-Attention (SPR-SA) to focus on spatial rain distribution and texture recovery. These are bridged by the Adaptive Alignment Frequency Module (AAFM), which performs two-stage progressive alignment in frequency space to enable adaptive guidance, complementarity, and reduction of intra- and inter-paradigm information gaps. The central claim is that this unified cross-paradigm dynamic interaction framework extracts the most valuable fusion information, achieving state-of-the-art performance on eight benchmark datasets while showing robustness on other restoration tasks and downstream applications.

Significance. If the results hold under rigorous verification, the work offers a concrete approach to overcoming limitations of single-paradigm transformers in low-level vision by explicitly aligning complementary spatial-channel and global-local representations. The introduction of SPC-SA, SPR-SA, and especially the frequency-based AAFM provides named, implementable mechanisms that could generalize beyond deraining. Credit is due for framing the problem as a cross-paradigm alignment task and for extending evaluation to robustness and downstream tasks, though these strengths remain conditional on the quality of the supporting ablations and quantitative evidence.

major comments (3)
  1. [Abstract and §4] Abstract and §4 (Experiments): The SOTA claim on eight benchmarks is stated without any quantitative PSNR/SSIM tables, baseline comparisons, or effect-size numbers in the abstract and is only summarized at high level in the provided text. This makes the magnitude of improvement over prior single-paradigm transformers impossible to assess directly and leaves the central claim unverified in the absence of the full experimental section.
  2. [§3.3] §3.3 (AAFM description): The text states that AAFM 'enables adaptive guidance and complementarity' and 'reduces the information gap' between SPC-SA and SPR-SA, yet no ablation isolating AAFM (e.g., full model vs. direct concatenation of the two attention outputs or vs. single-paradigm baselines) is referenced. Because the SOTA and robustness claims rest on the premise that AAFM reliably bridges knowledge differences without introducing artifacts or losing rain cues, this missing control is load-bearing for the central contribution.
  3. [§4.2] §4.2 (Ablation studies): If an ablation table exists, it should explicitly report the performance drop when AAFM is replaced by simpler fusion; without such a row the reported gains cannot be attributed to the cross-paradigm alignment mechanism rather than increased parameter count or training protocol.
minor comments (2)
  1. [Figures] Ensure all figures include clear captions that distinguish SPC-SA, SPR-SA, and AAFM outputs so readers can visually verify the claimed complementarity.
  2. [§3.3] The notation for frequency-domain operations inside AAFM should be defined once in §3.3 and used consistently; occasional undefined symbols (e.g., frequency alignment operator) appear in the method description.
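Major comment 1 turns on PSNR numbers, so it helps to recall how the metric is computed. A minimal pure-Python sketch of the standard definition, assuming 8-bit images given as nested lists; the helper name `psnr` and the toy 2x2 images are illustrative and unrelated to the paper's data:

```python
import math

def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    ref = [p for row in reference for p in row]
    out = [p for row in restored for p in row]
    if len(ref) != len(out):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    if mse == 0:
        return math.inf          # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

clean = [[0, 0], [0, 0]]
derained = [[255, 0], [0, 0]]    # toy example: one fully wrong pixel out of four
print(psnr(clean, derained))
```

Because the scale is logarithmic, a fixed dB gain corresponds to a fixed ratio of mean squared errors, which is why the referee asks for effect sizes against the strongest baseline rather than a bare state-of-the-art claim.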

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, clarifying the content of the manuscript and indicating revisions that will strengthen the presentation of our results and ablations.

point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (Experiments): The SOTA claim on eight benchmarks is stated without any quantitative PSNR/SSIM tables, baseline comparisons, or effect-size numbers in the abstract and is only summarized at high level in the provided text. This makes the magnitude of improvement over prior single-paradigm transformers impossible to assess directly and leaves the central claim unverified in the absence of the full experimental section.

    Authors: We agree that the abstract would benefit from quantitative highlights to immediately convey the scale of improvements. While the full experimental section (§4) already contains detailed PSNR/SSIM tables with baseline comparisons across all eight benchmarks, we will revise the abstract to include concise effect-size numbers (e.g., average PSNR gain over the strongest prior method) for better accessibility without exceeding length limits. revision: yes

  2. Referee: [§3.3] §3.3 (AAFM description): The text states that AAFM 'enables adaptive guidance and complementarity' and 'reduces the information gap' between SPC-SA and SPR-SA, yet no ablation isolating AAFM (e.g., full model vs. direct concatenation of the two attention outputs or vs. single-paradigm baselines) is referenced. Because the SOTA and robustness claims rest on the premise that AAFM reliably bridges knowledge differences without introducing artifacts or losing rain cues, this missing control is load-bearing for the central contribution.

    Authors: We thank the referee for this observation. Section 4.2 already includes ablations comparing the full model to single-paradigm baselines and to the model without AAFM. To more precisely isolate AAFM's role, we will add an explicit comparison of the full model versus direct concatenation (or addition) of SPC-SA and SPR-SA outputs, reporting the resulting performance difference to confirm that the frequency-based alignment provides the claimed complementarity without artifacts. revision: yes

  3. Referee: [§4.2] §4.2 (Ablation studies): If an ablation table exists, it should explicitly report the performance drop when AAFM is replaced by simpler fusion; without such a row the reported gains cannot be attributed to the cross-paradigm alignment mechanism rather than increased parameter count or training protocol.

    Authors: We agree that explicit controls are essential for attribution. Our existing ablation table in §4.2 demonstrates performance drops when AAFM is removed. We will revise the table to add a dedicated row for AAFM replaced by simpler fusion (direct concatenation), explicitly reporting the PSNR/SSIM drops and including a brief discussion of parameter counts to rule out confounding factors and directly link gains to the cross-paradigm alignment mechanism. revision: yes

Circularity Check

0 steps flagged

No circularity: architecture and claims are independently specified without reduction to fitted inputs or self-citation loops

full rationale

The paper defines CPRAformer via explicit architectural choices (SPC-SA for channel dependencies, SPR-SA for spatial refinement, and AAFM for two-stage frequency alignment) that are motivated by stated limitations of single-paradigm transformers rather than derived from or equivalent to the target SOTA metrics. No equations or modules are shown to be fitted to the evaluation datasets and then re-presented as predictions; the complementarity assumption is an explicit design premise, not a self-referential definition. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 3 invented entities

The central claim rests on the untested premise that the two attention paradigms are complementary enough for alignment to yield net gains, plus standard transformer assumptions about attention scaling and feature fusion.

axioms (2)
  • domain assumption · Self-attention mechanisms can be specialized into sparse channel and spatial pixel forms without losing essential rain pattern information.
    Invoked when defining SPC-SA and SPR-SA as the two core blocks.
  • domain assumption · Frequency-domain alignment can resolve misalignment between spatial-channel and global-local features.
    Central to the introduction and function of the Adaptive Alignment Frequency Module.
invented entities (3)
  • Sparse Prompt Channel Self-Attention (SPC-SA) · no independent evidence
    purpose: Enhance global channel dependencies through dynamic sparsity
    New attention variant introduced to capture one paradigm.
  • Spatial Pixel Refinement Self-Attention (SPR-SA) · no independent evidence
    purpose: Focus on spatial rain distribution and fine-grained texture recovery
    New attention variant introduced to capture the complementary paradigm.
  • Adaptive Alignment Frequency Module (AAFM) · no independent evidence
    purpose: Align and interact features from the two paradigms in a two-stage progressive manner
    New module to bridge the paradigms.


Reference graph

Works this paper leans on

80 extracted references · 80 canonical work pages · 1 internal anchor

  1. [1]

    Hongming Chen, Xiang Chen, Jiyang Lu, and Yufeng Li. 2024. Rethinking multi-scale representations in deep deraining transformer. In AAAI, Vol. 38. 1046–1053

  2. [2]

    Liangyu Chen, Xiaojie Chu, Xiangyu Zhang, and Jian Sun. 2022. Simple baselines for image restoration. In ECCV. Springer, 17–33

  3. [3]

    Liangyu Chen, Xin Lu, Jie Zhang, Xiaojie Chu, and Chengpeng Chen. 2021. Hinet: Half instance normalization network for image restoration. In CVPR. 182–192

  4. [4]

    Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. 2018. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In ECCV

  5. [5]

    Sixiang Chen, Tian Ye, Yun Liu, and Erkang Chen. 2024. Dual-former: Hybrid self-attention transformer for efficient image restoration. Digital Signal Processing 149 (2024), 104485

  6. [6]

    Xiang Chen, Hao Li, Mingqiang Li, and Jinshan Pan. 2023. Learning a sparse transformer network for effective image deraining. In CVPR. 5896–5905

  7. [7]

    Xiang Chen, Jinshan Pan, and Jiangxin Dong. 2024. Bidirectional multi-scale implicit neural representations for image deraining. In CVPR. 25627–25636

  8. [8]

    Xiang Chen, Jinshan Pan, Jiyang Lu, Zhentao Fan, and Hao Li. 2023. Hybrid cnn-transformer feature fusion for single image deraining. In AAAI, Vol. 37. 378–386

  9. [9]

    Zixuan Chen, Zewei He, and Zhe-Ming Lu. 2024. DEA-Net: Single image dehazing based on detail-enhanced convolution and content-guided attention. TIP 33 (2024), 1002–1015

  10. [10]

    Zheng Chen, Yulun Zhang, Jinjin Gu, Linghe Kong, Xiaokang Yang, and Fisher Yu. 2023. Dual Aggregation Transformer for Image Super-Resolution. In ICCV

  11. [11]

    Yuning Cui and Alois Knoll. 2023. Exploring the potential of channel interactions for image restoration. Knowledge-Based Systems 282 (2023), 111156

  12. [12]

    Yuning Cui, Wenqi Ren, Xiaochun Cao, and Alois Knoll. 2023. Image restoration via frequency selection. TPAMI 46, 2 (2023), 1093–1108

  13. [13]

    Yuning Cui, Wenqi Ren, and Alois Knoll. 2024. Omni-kernel network for image restoration. In AAAI, Vol. 38. 1426–1434

  14. [14]

    Yuning Cui, Wenqi Ren, Sining Yang, Xiaochun Cao, and Alois Knoll. 2023. Irnext: Rethinking convolutional network design for image restoration. In ICML

  15. [15]

    Yuning Cui, Syed Waqas Zamir, Salman Khan, Alois Knoll, Mubarak Shah, and Fahad Shahbaz Khan. 2025. AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation. In ICLR

  16. [16]

    Hang Dong, Jinshan Pan, Lei Xiang, Zhe Hu, Xinyi Zhang, Fei Wang, and Ming-Hsuan Yang. 2020. Multi-scale boosted dehazing network with dense feature fusion. In CVPR. 2157–2167

  17. [17]

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

  18. [19]

    Xueyang Fu, Jiabin Huang, Delu Zeng, Yue Huang, Xinghao Ding, and John Paisley. 2017. Removing rain from single images via a deep detail network. In CVPR. 3855–3863

  19. [20]

    Xueyang Fu, Jie Xiao, Yurui Zhu, Aiping Liu, Feng Wu, and Zheng-Jun Zha. 2023. Continual image deraining with hypergraph convolutional networks. TPAMI 45, 8 (2023), 9534–9551

  20. [21]

    Ning Gao, Xingyu Jiang, Xiuhui Zhang, and Yue Deng. 2024. Efficient Frequency-Domain Image Deraining with Contrastive Regularization. In ECCV. Springer, 240–257

  21. [22]

    Shuhang Gu, Deyu Meng, Wangmeng Zuo, and Lei Zhang. 2017. Joint convolutional analysis and synthesis sparse representation for single image layer separation. In ICCV. 1708–1716

  22. [23]

    Kui Jiang, Zhongyuan Wang, Chen Chen, Zheng Wang, Laizhong Cui, and Chia-Wen Lin. 2022. Magic ELF: Image deraining meets association learning and transformer. ACMMM (2022)

  23. [24]

    Kui Jiang, Zhongyuan Wang, Zheng Wang, Peng Yi, Junjun Jiang, Jinsheng Xiao, and Chia-Wen Lin. 2022. Danet: Image deraining via dynamic association learning. In IJCAI. 980–986

  24. [25]

    Kui Jiang, Zhongyuan Wang, Peng Yi, Chen Chen, Baojin Huang, Yimin Luo, Jiayi Ma, and Junjun Jiang. 2020. Multi-scale progressive fusion network for single image deraining. In CVPR. 8346–8355

  25. [26]

    Kui Jiang, Zhongyuan Wang, Peng Yi, Chen Chen, Zheng Wang, Xiao Wang, Junjun Jiang, and Chia-Wen Lin. 2021. Rain-free and residue hand-in-hand: A progressive coupled network for real-time image deraining. IEEE Transactions on Image Processing 30 (2021), 7404–7418

  26. [27]

    Xingyu Jiang, Xiuhui Zhang, Ning Gao, and Yue Deng. 2024. When Fast Fourier Transform Meets Transformer for Image Restoration. In ECCV. Springer, 381–402

  27. [28]

    Li-Wei Kang, Chia-Wen Lin, and Yu-Hsiang Fu. 2011. Automatic single-image-based rain streaks removal via image decomposition. TIP 21, 4 (2011), 1742–1755

  28. [29]

    Boyi Li, Wenqi Ren, Dengpan Fu, Dacheng Tao, Dan Feng, Wenjun Zeng, and Zhangyang Wang. 2018. Benchmarking single-image dehazing and beyond. TIP 28, 1 (2018), 492–505

  29. [30]

    Pengpeng Li, Jiyu Jin, Guiyue Jin, Lei Fan, Xiao Gao, Tianyu Song, and Xiang Chen. 2022. Deep scale-space mining network for single image deraining. In CVPR. 4276–4285

  30. [31]

    Xia Li, Jianlong Wu, Zhouchen Lin, Hong Liu, and Hongbin Zha. 2018. Recurrent squeeze-and-excitation context aggregation net for single image deraining. In ECCV. 254–269

  31. [32]

    Yu Li, Robby T Tan, Xiaojie Guo, Jiangbo Lu, and Michael S Brown. 2016. Rain streak removal using layer priors. In CVPR. 2736–2744

  32. [33]

    Yu Li, Robby T Tan, Xiaojie Guo, Jiangbo Lu, and Michael S Brown. 2016. Rain streak removal using layer priors. In ICCV. 2736–2744

  33. [34]

    Yawei Li, Kai Zhang, Jiezhang Cao, Radu Timofte, and Luc Van Gool. 2021. Localvit: Bringing locality to vision transformers. arXiv preprint arXiv:2104.05707 (2021)

  34. [35]

    Yuanchu Liang, Saeed Anwar, and Yang Liu. 2022. Drt: A lightweight single image deraining recursive transformer. In CVPR. 589–598

  35. [36]

    Ye Liu, Lei Zhu, Shunda Pei, Huazhu Fu, Jing Qin, Qing Zhang, Liang Wan, and Wei Feng. 2021. From synthetic to real: Image dehazing collaborating with unlabeled real data. In ACMMM. 50–58

  36. [37]

    LiPing Lu, Qian Xiong, Bingrong Xu, and Duanfeng Chu. 2024. Mixdehazenet: Mix structure block for image dehazing network. In IJCNN. IEEE, 1–10

  37. [38]

    Pinjun Luo, Guoqiang Xiao, Xinbo Gao, and Song Wu. 2023. LKD-Net: Large kernel convolution network for single image dehazing. In ICME. IEEE, 1601–1606

  38. [39]

    Namuk Park and Songkuk Kim. 2022. How do vision transformers work? arXiv preprint arXiv:2202.06709 (2022)

  39. [40]

    Yan-Tsung Peng and Wei-Hua Li. 2023. Rain2Avoid: Self-Supervised Single Image Deraining. In ICASSP. IEEE, 1–5

  40. [41]

    Rui Qian, Robby T. Tan, Wenhan Yang, Jiajun Su, and Jiaying Liu. 2018. Attentive Generative Adversarial Network for Raindrop Removal From a Single Image. In CVPR

  41. [42]

    Xu Qin, Zhilin Wang, Yuanchao Bai, Xiaodong Xie, and Huizhu Jia. 2020. FFA-Net: Feature fusion attention network for single image dehazing. In AAAI, Vol. 34. 11908–11915

  42. [43]

    Yuwei Qiu, Kaihao Zhang, Chenxi Wang, Wenhan Luo, Hongdong Li, and Zhi Jin. Mb-taylorformer: Multi-branch efficient transformer expanded by taylor formula for image dehazing. In ICCV. 12802–12813

  44. [45]

    Chun Ren, Danfeng Yan, Yuanqiang Cai, and Yangchun Li. 2023. Semi-swinderain: Semi-supervised image deraining network using swin transformer. In ICASSP. IEEE, 1–5

  45. [46]

    Dongwei Ren, Wangmeng Zuo, Qinghua Hu, Pengfei Zhu, and Deyu Meng. 2019. Progressive image deraining networks: A better and simpler baseline. In CVPR. 3937–3946

  46. [47]

    Dai Shi. 2024. Transnext: Robust foveal visual perception for vision transformers. In CVPR. 17773–17783

  47. [48]

    Yuda Song, Zhuqing He, Hui Qian, and Xin Du. 2023. Vision Transformers for Single Image Dehazing. TIP 32 (2023), 1927–1941

  48. [49]

    Yuda Song, Yang Zhou, Hui Qian, and Xin Du. 2022. Rethinking Performance Gains in Image Dehazing Networks. arXiv preprint arXiv:2209.11448 (2022)

  49. [50]

    Jian-Nan Su, Min Gan, Guang-Yong Chen, Wenzhong Guo, and CL Philip Chen. High-similarity-pass attention for single image super-resolution. TIP 33 (2024), 610–624

  51. [52]

    Shangquan Sun, Wenqi Ren, Xinwei Gao, Rui Wang, and Xiaochun Cao. 2024. Restoring Images in Adverse Weather Conditions via Histogram Transformer. ECCV (2024)

  52. [53]

    Zhengzhong Tu, Hossein Talebi, Han Zhang, Feng Yang, Peyman Milanfar, Alan Bovik, and Yinxiao Li. 2022. Maxvit: Multi-axis vision transformer. In ECCV. Springer, 459–479

  53. [54]

    Jeya Maria Jose Valanarasu, Rajeev Yasarla, and Vishal M Patel. 2022. Transweather: Transformer-based restoration of images degraded by adverse weather conditions. In CVPR. 2353–2363

  54. [55]

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. NeurIPS 30 (2017)

  55. [56]

    Cong Wang, Jinshan Pan, Wei Wang, Jiangxin Dong, Mengzhu Wang, Yakun Ju, and Junyang Chen. 2023. Promptrestorer: A prompting image restoration method with degradation perception. NIPS 36 (2023), 8898–8912

  56. [57]

    Cong Wang, Xiaoying Xing, Yutong Wu, Zhixun Su, and Junyang Chen. 2020. Dcsfn: Deep cross-scale fusion network for single image rain removal. In ACMMM. 1643–1651

  57. [58]

    Peihao Wang, Wenqing Zheng, Tianlong Chen, and Zhangyang Wang. 2022. Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice. In International Conference on Learning Representations

  58. [59]

    Qiong Wang, Kui Jiang, Jinyi Lai, Zheng Wang, and Jianhui Zhang. 2023. Hpcnet: A hybrid progressive coupled network for image deraining. In ICME. IEEE, 2747–2752

  59. [60]

    Qiong Wang, Kui Jiang, Zheng Wang, Wenqi Ren, Jianhui Zhang, and Chia-Wen Lin. 2023. Multi-scale fusion and decomposition network for single image deraining. TIP 33 (2023), 191–204

  60. [61]

    Tianyu Wang, Xin Yang, Ke Xu, Shaozhe Chen, Qiang Zhang, and Rynson WH Lau. 2019. Spatial attentive single-image deraining with a high quality real rain dataset. In CVPR. 12270–12279

  61. [62]

    Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, and Ling Shao. 2021. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In ICCV. 568–578

  62. [63]

    Zhendong Wang, Xiaodong Cun, Jianmin Bao, Wengang Zhou, Jianzhuang Liu, and Houqiang Li. 2022. Uformer: A general u-shaped transformer for image restoration. In CVPR. 17683–17693

  63. [64]

    Boxue Xiao, Zhuoran Zheng, Xiang Chen, Chen Lv, Yunliang Zhuang, and Tao Wang. 2022. Single UHD Image Dehazing via Interpretable Pyramid Network. arXiv:2202.08589

  64. [65]

    Jie Xiao, Xueyang Fu, Aiping Liu, Feng Wu, and Zheng-Jun Zha. 2022. Image de-raining transformer. TPAMI 45, 11 (2022), 12978–12995

  65. [66]

    Jie Xiao, Xueyang Fu, Aiping Liu, Feng Wu, and Zheng-Jun Zha. 2023. Image De-Raining Transformer. TPAMI 45, 11 (2023), 12978–12995. doi:10.1109/TPAMI.2022.3183612

  66. [67]

    Yi Xiao, Qiangqiang Yuan, Kui Jiang, Jiang He, Chia-Wen Lin, and Liangpei Zhang. 2024. TTST: A top-k token selective transformer for remote sensing image super-resolution. TIP 33 (2024), 738–752

  67. [68]

    Wenhan Yang, Robby T Tan, Jiashi Feng, Jiaying Liu, Zongming Guo, and Shuicheng Yan. 2017. Deep joint rain detection and removal from a single image. In CVPR. 1357–1366

  68. [69]

    Wenhan Yang, Robby T Tan, Shiqi Wang, Yuming Fang, and Jiaying Liu. 2020. Single image deraining: From model-based to data-driven and beyond. TPAMI 43, 11 (2020), 4059–4077

  69. [70]

    Qiaosi Yi, Juncheng Li, Qinyan Dai, Faming Fang, Guixu Zhang, and Tieyong Zeng. 2021. Structure-preserving deraining with residue channel prior guidance. In ICCV. 4238–4247

  70. [71]

    Qiaosi Yi, Juncheng Li, Qinyan Dai, Faming Fang, Guixu Zhang, and Tieyong Zeng. 2021. Structure-Preserving Deraining with Residue Channel Prior Guidance. In ICCV. 4218–4227

  71. [72]

    Yi Yu, Wenhan Yang, Yap-Peng Tan, and Alex C Kot. 2022. Towards robust rain removal against adversarial attacks: A comprehensive benchmark analysis and beyond. In CVPR. 6013–6022

  72. [73]

    Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. 2022. Restormer: Efficient transformer for high-resolution image restoration. In CVPR. 5728–5739

  73. [74]

    Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. 2021. Multi-stage progressive image restoration. In CVPR. 14821–14831

  74. [75]

    Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. 2022. Learning enriched features for fast image restoration and enhancement. TPAMI 45, 2 (2022), 1934–1948

  75. [76]

    He Zhang and Vishal M Patel. 2017. Convolutional sparse and low-rank coding-based rain streak removal. In WACV. IEEE, 1259–1267

  76. [77]

    He Zhang and Vishal M Patel. 2018. Density-aware single image de-raining using a multi-stream dense network. In CVPR. 695–704

  77. [78]

    He Zhang and Vishal M Patel. 2018. Density-aware Single Image De-raining using a Multi-stream Dense Network. In CVPR

  78. [79]

    He Zhang, Vishwanath Sindagi, and Vishal M Patel. 2019. Image de-raining using a conditional generative adversarial network. TCSVT (2019)

  79. [80]

    Mingjun Zheng, Long Sun, Jiangxin Dong, and Jinshan Pan. 2024. SMFANet: A lightweight self-modulation feature aggregation network for efficient image super-resolution. In European Conference on Computer Vision. Springer, 359–375

  80. [81]

    Shihao Zhou, Duosheng Chen, Jinshan Pan, Jinglei Shi, and Jufeng Yang. 2024. Adapt or perish: Adaptive sparse transformer with attentive feature refinement for image restoration. In CVPR. 2952–2963