pith. machine review for the scientific record.

arxiv: 2605.02794 · v1 · submitted 2026-05-04 · 💻 cs.CV

Recognition: 3 theorem links

· Lean Theorem

Edge-Efficient Image Restoration: Transformer Distillation into State-Space Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 18:09 UTC · model grok-4.3

classification 💻 cs.CV
keywords image restoration · state-space models · transformer distillation · hybrid architectures · edge computing · network search · Mamba · efficient inference

The pith

Distilling transformer features into state-space models and using multi-objective search yields hybrid networks up to 3.4 times faster on edge CPUs for image restoration.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops hybrid architectures for image restoration that combine the global modeling power of transformers with the linear-time efficiency of state-space models. Lightweight SSM blocks are trained to replicate the features of transformer blocks through distillation, allowing them to substitute for slower components. A search procedure then assembles task-specific hybrids by favoring SSM blocks while still meeting quality targets. On Snapdragon 8 Elite hardware this produces large measured speedups for deblurring, deraining, and denoising while restoration metrics remain close to the pure transformer baseline. The method thereby offers a practical route to edge deployment without repeated hardware profiling.

Core claim

Lightweight state-space model blocks trained as feature-distilled surrogates of transformer blocks can be combined through Efficient Network Search into hybrid U-Net architectures that optimize restoration quality while penalizing transformer usage. On a Snapdragon 8 Elite CPU this delivers inference up to 3.4 times faster for deblurring, 1.74 times faster for deraining, and 1.17 times faster for denoising, while maintaining competitive quality.

What carries the argument

Efficient Network Search (ENS), a multi-objective strategy that selects hybrid configurations from pre-aligned transformer and SSM blocks by maximizing restoration quality and penalizing transformer blocks as a latency proxy.
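
The paper's search code is not reproduced here, so the following is a minimal sketch of the selection rule as described: one choice per stage between a transformer block and its distilled SSM surrogate, scored as validation quality minus a weighted transformer-count penalty. The stage count, the penalty weight `lam`, and the `evaluate_quality` hook are illustrative assumptions, not the paper's values.

```python
# Hypothetical sketch of the ENS selection rule; block names, the penalty
# weight `lam`, and the quality hook are assumptions, not the paper's code.
from itertools import product

def ens_score(quality, config, lam=0.1):
    # Multi-objective score: restoration quality minus a penalty on
    # transformer-block count, the paper's lightweight latency proxy.
    n_transformer = sum(1 for block in config if block == "transformer")
    return quality - lam * n_transformer

def ens_search(evaluate_quality, n_stages=4, lam=0.1):
    # Enumerate hybrids: each U-Net stage is either the original
    # transformer block or its feature-distilled SSM surrogate.
    candidates = product(("transformer", "ssm"), repeat=n_stages)
    return max(candidates, key=lambda c: ens_score(evaluate_quality(c), c, lam))
```

Because the penalty is only a proxy, just the final selected hybrid needs on-device profiling, which is the property the pith highlights.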

If this is right

  • ENS-Deblurring runs in 2973 ms, 3.4 times faster than the transformer baseline.
  • ENS-Deraining runs in 5816 ms, 1.74 times faster than the transformer baseline.
  • ENS-Denoising runs in 8666 ms, 1.17 times faster than the transformer baseline.
  • The discovered hybrids keep competitive PSNR and SSIM on standard restoration benchmarks.
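
These ratios are internally consistent with the 10119.52 ms Restormer baseline quoted in the abstract; a quick arithmetic check:

```python
# Check the reported speedups against the abstract's Restormer baseline.
baseline_ms = 10119.52  # Restormer inference on Snapdragon 8 Elite CPU

for name, latency_ms in [("ENS-Deblurring", 2973),
                         ("ENS-Deraining", 5816),
                         ("ENS-Denoising", 8666)]:
    print(f"{name}: {baseline_ms / latency_ms:.2f}x faster")
# Prints 3.40x, 1.74x, and 1.17x, matching the claimed factors.
```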

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same distillation-plus-search pattern could be tested on other vision tasks where attention latency limits mobile deployment.
  • Replacing the transformer-usage penalty with alternative efficiency proxies might allow the search to target different hardware constraints without new measurements.
  • If the hybrids prove stable across input resolutions, they could support real-time restoration pipelines in consumer camera applications.

Load-bearing premise

Feature distillation from transformers into SSM blocks preserves enough task-specific information for fine-grained restoration without substantial quality loss.
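
The abstract does not spell the loss out; the simulated rebuttal below describes it as a weighted combination of feature-level MSE and a task-specific restoration loss. A minimal PyTorch sketch under that assumption, with the weights `alpha` and `beta` and the L1 task term as illustrative choices:

```python
# Sketch of feature-wise distillation, assuming the weighted combination
# of feature-level MSE and task loss described in the simulated rebuttal.
# `alpha`, `beta`, and the L1 task term are illustrative, not the paper's.
import torch.nn.functional as F

def distillation_loss(student_feat, teacher_feat, restored, clean,
                      alpha=1.0, beta=1.0):
    # Align the lightweight SSM surrogate's features to the frozen
    # transformer block's features, while supervising the restored image.
    feat_loss = F.mse_loss(student_feat, teacher_feat.detach())
    task_loss = F.l1_loss(restored, clean)
    return alpha * feat_loss + beta * task_loss
```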

What would settle it

Measuring runtime and restoration metrics of an ENS-selected hybrid on Snapdragon 8 Elite hardware and finding either no speedup or a clear drop in quality relative to the Restormer baseline would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.02794 by Sharan Kumar Allur, Sowmya Vajrala, Sravanth Kodavanti, Srinivas Soumitri Miriyala, Vikram Nelvoy Rajendiran.

Figure 1
Figure 1: a) Computational characteristics of Transformers (quadratic self-attention), Mamba, and Restormer (linear Transformer) as profiled on Qualcomm chipset SM8750 on an Android smartphone. The Y axis in all three plots is in log-scale. It can be observed that FLOPs and memory consumption are lower for Mamba than Transformer, whereas Restormer is far more efficient than Mamba. However, when profiled for on-device… view at source ↗
Figure 2
Figure 2: (a) ERF visualization during hybrid training. Restormer shows global coverage, while standalone MambaIR exhibits limited spatial context. Feature distillation expands the MambaIR ERF, and after end-to-end fine-tuning it closely matches Restormer, indicating successful transfer of global context. (b) Attention heatmaps of Restormer blocks. In the hybrid model, attention becomes more globally distributed t… view at source ↗
Figure 3
Figure 3: a) Pictorial representation of Feature-wise Knowledge Distillation b) Alternatives considered for each base block c) Correlation between latency and penalty view at source ↗
Figure 4
Figure 4: Visual Quality comparison across different Image Restoration tasks. State-of-the-art (SOTA) being compared in case of Deraining is MPRNet [67], Motion Deblurring is DBGAN [70], Defocus Deblurring is KBNet [72], and Denoising is MaIR [31]… view at source ↗
read the original abstract

We propose a modular framework for hybrid image restoration that integrates transformer and state-space model (SSM) blocks with a focus on improving runtime efficiency on edge hardware. While transformers provide strong global modeling through self-attention, their attention kernels incur substantial latency on mobile devices, especially for high-resolution inputs. In contrast, SSMs such as Mamba offer lineartime sequence modeling with lower runtime overhead but may underperform on fine grained restoration tasks. To balance accuracy and efficiency, we train lightweight SSM blocks as feature-distilled surrogates of transformer blocks and use them to construct hybrid U-Net-style architectures. To automatically discover effective block combinations, we introduce Efficient Network Search (ENS), a multi-objective search strategy that selects task-specific hybrid configurations from pre-aligned components. ENS optimizes restoration quality while penalizing transformer usage, serving as a lightweight proxy for latency and enabling architecture discovery without repeated hardware profiling. On a Snapdragon 8 Elite CPU, the Restormer baseline requires 10119.52 ms for inference. In contrast, ENS-discovered hybrids significantly reduce runtime: ENS-Deblurring runs in 2973 ms (3.4x faster), ENS-Deraining in 5816 ms (1.74x faster), and ENS-Denoising in 8666 ms (1.17x faster), while maintaining competitive restoration quality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a modular hybrid framework for image restoration that combines transformer blocks (for global modeling) with state-space model (SSM) blocks (e.g., Mamba-style for linear-time efficiency). Lightweight SSM blocks are trained via feature distillation to act as surrogates for transformer blocks; these pre-aligned components are then assembled into U-Net-style hybrids. Efficient Network Search (ENS) is introduced as a multi-objective procedure that optimizes restoration quality while penalizing transformer-block count as a proxy for latency, enabling architecture search without repeated hardware profiling. On Snapdragon 8 Elite, the resulting ENS-Deblurring, ENS-Deraining, and ENS-Denoising models report speedups of 3.4×, 1.74×, and 1.17× over Restormer while claiming competitive quality.

Significance. If the empirical claims hold, the work offers a practical route to edge-deployable restoration models by exploiting distillation to transfer transformer capabilities into efficient SSMs and using search to automate hybrid design. The concrete Snapdragon runtime numbers are a strength, as is the avoidance of per-candidate hardware measurements during search. However, the proxy-based optimization and limited validation of quality preservation constrain the reliability and generalizability of the efficiency gains.

major comments (3)
  1. [ENS procedure] The search objective penalizes transformer-block count as a latency proxy without any reported correlation analysis, ablation, or validation against actual Snapdragon timings (which also depend on sequence length, scan efficiency, and memory patterns). This assumption is load-bearing for the central claim that ENS 'automatically discover[s] effective block combinations' for latency-efficient hybrids; the final reported timings only validate the selected models, not the search procedure itself.
  2. [Experimental results] The headline speedups (2973 ms, 5816 ms, 8666 ms) and 'competitive restoration quality' are asserted, yet the manuscript provides no explicit quality metrics (PSNR/SSIM values, exact baselines, training protocols, or statistical significance tests). Without these, it is impossible to verify whether the quality claim holds or whether the speedups come at an unacceptable performance cost.
  3. [Distillation method] The assumption that feature distillation from transformers into lightweight SSM blocks preserves sufficient task-specific information for fine-grained restoration is central to the hybrid construction, but the manuscript does not detail the distillation loss, feature-alignment strategy, or any ablation showing minimal quality degradation. This directly affects the weakest assumption identified in the review.
minor comments (2)
  1. [Abstract] The abstract contains minor language issues: 'fine grained' should be hyphenated as 'fine-grained'; 'lineartime' should be 'linear-time'.
  2. [Figures and notation] Ensure all acronyms (ENS, SSM, etc.) are defined on first use and that figure captions clearly label the hardware platform and metric for each runtime bar.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, providing clarifications and committing to revisions where the manuscript can be strengthened.

read point-by-point responses
  1. Referee: [ENS procedure] The search objective penalizes transformer-block count as a latency proxy without any reported correlation analysis, ablation, or validation against actual Snapdragon timings (which also depend on sequence length, scan efficiency, and memory patterns). This assumption is load-bearing for the central claim that ENS 'automatically discover[s] effective block combinations' for latency-efficient hybrids; the final reported timings only validate the selected models, not the search procedure itself.

    Authors: We agree that a direct validation of the proxy would strengthen the ENS procedure. The transformer-block penalty is motivated by the well-established higher computational and memory costs of self-attention compared to linear-time SSMs on edge hardware. To address the concern, we will add a correlation analysis between the proxy objective and measured Snapdragon latencies across a sampled set of hybrid architectures, along with an ablation on varying penalty weights and their effect on discovered models and runtimes. These additions will be included in the revised manuscript. revision: yes

  2. Referee: [Experimental results] The headline speedups (2973 ms, 5816 ms, 8666 ms) and 'competitive restoration quality' are asserted, yet the manuscript provides no explicit quality metrics (PSNR/SSIM values, exact baselines, training protocols, or statistical significance tests). Without these, it is impossible to verify whether the quality claim holds or whether the speedups come at an unacceptable performance cost.

    Authors: We acknowledge that the presentation of quantitative results could be more explicit in the main text. The full experimental section contains PSNR/SSIM tables on standard benchmarks (GoPro, Rain100H, BSD68) with direct comparisons to Restormer and other baselines, along with training protocols (Adam optimizer, specific epoch counts and learning rates). We will revise to prominently feature these metrics in the main body, include exact numerical values, and add any available statistical details from repeated runs. revision: yes

  3. Referee: [Distillation method] The assumption that feature distillation from transformers into lightweight SSM blocks preserves sufficient task-specific information for fine-grained restoration is central to the hybrid construction, but the manuscript does not detail the distillation loss, feature-alignment strategy, or any ablation showing minimal quality degradation. This directly affects the weakest assumption identified in the review.

    Authors: We agree that additional methodological details and validation are warranted. The revised manuscript will specify the distillation loss (a weighted combination of feature-level MSE and task-specific restoration loss) and the layer-wise feature alignment strategy. We will also add an ablation study comparing restoration quality of distilled SSM blocks against both non-distilled SSMs and the original transformer blocks to quantify any degradation. revision: yes
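
As a concrete shape for the validation committed to in response 1, a rank correlation between the transformer-count proxy and measured latency over sampled hybrids would look roughly like the sketch below; the (count, latency) pairs are placeholders, not measurements.

```python
# Placeholder sketch of the proxy-vs-latency validation from response 1;
# the (transformer count, latency in ms) pairs are illustrative data only.
from scipy.stats import spearmanr

samples = [(0, 2900.0), (1, 4100.0), (2, 6000.0), (3, 8300.0), (4, 10100.0)]
transformer_count = [n for n, _ in samples]
measured_ms = [ms for _, ms in samples]

rho, pval = spearmanr(transformer_count, measured_ms)
print(f"Spearman rho = {rho:.2f} (p = {pval:.4f})")  # rho near 1 would support the proxy
```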

Circularity Check

0 steps flagged

No significant circularity; claims rest on independent empirical measurements.

full rationale

The paper presents no mathematical derivation chain that reduces to its inputs by construction. Runtime claims (e.g., ENS-Deblurring at 2973 ms on Snapdragon 8 Elite) are supported by direct post-search hardware profiling of the selected hybrids, not by the transformer-count proxy used inside ENS. The proxy serves only as a search heuristic and does not redefine or force the reported latency numbers. No self-definitional equations, fitted-input predictions, load-bearing self-citations, or ansatz smuggling appear; the work is empirical and validated against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The framework assumes that SSM blocks can be trained as faithful surrogates of transformer blocks via distillation and that a multi-objective search can discover task-optimal hybrids without direct latency profiling. No new physical entities are postulated.

free parameters (2)
  • ENS objective weights
    The relative weighting between restoration quality and the transformer-usage penalty is chosen to balance the two objectives; the specific values are not stated in the abstract.
  • Distillation loss coefficients
    Hyperparameters controlling how closely SSM features must match transformer features during training are free parameters that affect the quality-efficiency trade-off.
axioms (2)
  • domain assumption Feature distillation from transformer to SSM blocks preserves sufficient information for restoration tasks
    Invoked when the authors state that lightweight SSM blocks serve as surrogates while maintaining competitive quality.
  • domain assumption ENS search without hardware profiling is a valid proxy for actual edge latency
    The abstract presents ENS as enabling architecture discovery without repeated hardware profiling.

pith-pipeline@v0.9.0 · 5564 in / 1621 out tokens · 44589 ms · 2026-05-08T18:09:38.853896+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

78 extracted references · 25 canonical work pages · 3 internal anchors

[1] Abdelhamed, A., Lin, S., Brown, M.S.: A high-quality denoising dataset for smartphone cameras. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 1692–1700 (2018)
[2] Abuolaim, A., Brown, M.S.: Defocus deblurring using dual-pixel data. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part X. pp. 111–126. Springer (2020)
[3] Abuolaim, A., Delbracio, M., Kelly, D., Brown, M.S., Milanfar, P.: Learning to reduce defocus blur by realistically modeling dual-pixel data. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 2289–2298 (2021)
[4] Beltagy, I., Peters, M.E., Cohan, A.: Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150 (2020)
[5] Chen, H., Wang, Y., Guo, J., et al.: Pre-trained image processing transformer. In: CVPR (2021)
[6] Chen, J., Zhang, Y., Xu, Y., et al.: MambaIR: Efficient state space model for image restoration. arXiv preprint arXiv:2402.15648 (2024)
[7] Chen, L., He, J., Fan, Y., et al.: Simple baseline for image restoration with transformer. In: ECCV (2022)
[8] Chen, R., Song, K., et al.: Dvmsr: Distilled mamba for lightweight super-resolution. arXiv preprint arXiv:2404.11778 (2024)
[9] Chen, Y., Xie, L., Lin, W., et al.: Hat: Image restoration using hierarchical aggregation transformer. In: CVPR (2023)
[10] Cheng, S., Wang, Y., Huang, H., Liu, D., Fan, H., Liu, S.: Nbnet: Noise basis learning for image denoising with subspace projection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4896–4906 (2021)
[11] Cho, S.J., Ji, S.W., Hong, J.P., Jung, S.W., Ko, S.J.: Rethinking coarse-to-fine approach in single image deblurring. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 4641–4650 (2021)
[12] Choromanski, K., Likhosherstov, V., Dohan, D., et al.: Rethinking attention with performers. In: ICLR (2021)
[14] Dao, T., Gu, A., et al.: Jamba: Hybrid transformer-mamba with MoE for scalable long-context language modeling. arXiv preprint arXiv:2404.14219 (2024)
[15] Dong, X., Bao, J., Chen, D., et al.: Cswin transformer: A general vision transformer backbone with cross-shaped windows. In: CVPR (2022)
[16] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. ICLR (2021)
[17] Fu, X., Huang, J., Ding, X., Liao, Y., Paisley, J.: Clearing the skies: A deep network architecture for single-image rain removal. IEEE Transactions on Image Processing 26(6), 2944–2956 (2017)
[18] Gu, A., Dao, T., Chen, X., et al.: Combining recurrent state spaces and linear attention for long-context tasks. arXiv preprint arXiv:2312.17143 (2023)
[19] Gu, A., Dao, T., Fu, A.R., et al.: Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752 (2023)
[20] Guo, C.: Awesome mamba in low-level vision (2024), https://github.com/csguoh/Awesome-Mamba-in-Low-Level-Vision
[21] Hu, L., Zhang, Y., et al.: Restormamba: Restoration with enhanced synergistic mamba for vision tasks. IEEE Transactions on Image Processing (2024), DOI: 10.1109/TIP.2024.3367824
[22] Jiang, K., Wang, Z., Yi, P., Chen, C., Huang, B., Luo, Y., Ma, J., Jiang, J.: Multi-scale progressive fusion network for single image deraining. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 8346–8355 (2020)
[23] Jiang, Q., Xu, Y., et al.: Tinyvim: Frequency-aware hybrid vision model for compact restoration. arXiv preprint arXiv:2403.10123 (2024)
[24] Kang, Y., Zhang, Y., et al.: Serpent: Structured SSM for high-resolution image restoration. arXiv preprint arXiv:2501.13353 (2024)
[25] Karaali, A., Jung, C.R.: Edge-based defocus blur estimation with adaptive scale selection. IEEE Transactions on Image Processing 27(3), 1126–1137 (2017)
[26] Kupyn, O., Budzan, V., Mykhailych, M., Mishkin, D., Matas, J.: Deblurgan: Blind motion deblurring using conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 8183–8192 (2018)
[27] Kupyn, O., Martyniuk, T., Wu, J., Wang, Z.: Deblurgan-v2: Deblurring (orders-of-magnitude) faster and better. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 8878–8887 (2019)
[28] Lee, J., Lee, S., Cho, S., Lee, S.: Deep defocus map estimation using domain adaptation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12222–12230 (2019)
[29] Lee, J., Son, H., Rim, J., Cho, S., Lee, S.: Iterative filter adaptive network for single image defocus deblurring. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 2034–2042 (2021)
[30] Lee, Y., Han, D., Kim, J.: Drsformer: Efficient vision transformer for high-quality image restoration. arXiv preprint arXiv:2301.00945 (2023)
[31] Li, B., Zhao, H., Wang, W., Hu, P., Gou, Y., Peng, X.: MaIR: A locality- and continuity-preserving mamba for image restoration. arXiv preprint arXiv:2412.20066 (2024)
[32] Li, S., Araujo, I.B., Ren, W., Wang, Z., Tokuda, E.K., Hirata Junior, R., Cesar-Junior, R., Zhang, J., Guo, X., Cao, X.: Single image deraining: A comprehensive benchmark analysis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 3838–3847 (2019)
[33] Li, X., Wu, J., Lin, Z., Liu, H., Zha, H.: Recurrent squeeze-and-excitation context aggregation net for single image deraining. In: Proceedings of the European Conference on Computer Vision (ECCV). pp. 254–269 (2018)
[34] Li, Y., Shen, Z., Zhang, Y., et al.: Matir: Mixed attention and transition state space for image restoration. arXiv preprint arXiv:2403.11423 (2024)
[35] Liang, J., Cao, J., Sun, K., et al.: Swinir: Image restoration using swin transformer. In: ICCVW (2021)
[36] Lin, J., Fang, B., et al.: Hymba: Hybrid memory-augmented mamba with meta-tokens. arXiv preprint arXiv:2403.06578 (2024)
[37] Liu, C., Zhang, D., Lu, G., Yin, W., Wang, J., Luo, G.: Srmamba-t: Exploring the hybrid mamba-transformer network for single image super-resolution. Neurocomputing p. 129488 (2025)
[38] Liu, X., He, H., Hu, X., et al.: Cu-mamba: Channel and spatial-aware SSM for image restoration. arXiv preprint arXiv:2401.16583 (2024)
[39] Liu, Z., Lin, Y., Cao, Y., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV (2021)
[40] Luo, L., Chen, Y., He, X., et al.: Rformer: A recurrent vision transformer for image restoration. In: CVPR (2023)
[41] Luo, W., Li, Y., Urtasun, R., et al.: Understanding the effective receptive field in deep convolutional neural networks. In: NeurIPS (2016)
[42] Luo, Z., Gustafsson, F.K., Zhao, Z., Sjölund, J., Schön, T.B.: Image restoration with mean-reverting stochastic differential equations. arXiv preprint arXiv:2301.11699 (2023)
[43] Nah, S., Hyun Kim, T., Mu Lee, K.: Deep multi-scale convolutional neural network for dynamic scene deblurring. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 3883–3891 (2017)
[44] Park, D., Kang, D.U., Kim, J., Chun, S.Y.: Multi-temporal recurrent neural networks for progressive non-uniform single image deblurring with incremental temporal training. In: European Conference on Computer Vision. pp. 327–343. Springer (2020)
[45] Purohit, K., Suin, M., Rajagopalan, A., Boddeti, V.N.: Spatially-adaptive image restoration using distortion-guided networks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 2309–2319 (2021)
[46] Ren, C., He, X., Wang, C., Zhao, Z.: Adaptive consistency prior based deep network for image denoising. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 8596–8606 (2021)
[47] Ren, D., Zuo, W., Hu, Q., Zhu, P., Meng, D.: Progressive image deraining networks: A better and simpler baseline. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 3937–3946 (2019)
[48] Rim, J., Lee, H., Won, J., Cho, S.: Real-world blur dataset for learning and benchmarking deblurring algorithms. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXV. pp. 184–
[49] Shen, Z., Wang, W., Lu, X., Shen, J., Ling, H., Xu, T., Shao, L.: Human-aware motion deblurring. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 5572–5581 (2019)
[50] Shi, J., Xu, L., Jia, J.: Just noticeable defocus blur detection and estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 657–665 (2015)
[51] Snoek, J., Larochelle, H., Adams, R.P.: Practical bayesian optimization of machine learning algorithms. In: NeurIPS (2012)
[52] Son, H., Lee, J., Cho, S., Lee, S.: Single image defocus deblurring using kernel-sharing parallel atrous convolutions. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 2642–2650 (2021)
[53] Tan, H., Gu, A., et al.: Hi-mamba: Hierarchical recurrent SSM for vision restoration. arXiv preprint arXiv:2501.16583 (2024)
[54] Tang, Y., Xu, Y., Zhang, Y.: A survey on vision mamba models: Applications and architectures. arXiv preprint arXiv:2402.04523 (2024)
[55] Tao, X., Gao, H., Shen, X., Wang, J., Jia, J.: Scale-recurrent network for deep image deblurring. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 8174–8182 (2018)
[56] Vaswani, A., Shazeer, N., Parmar, N., et al.: Attention is all you need. NeurIPS (2017)
[57] Wang, C., Zhang, Y., Lin, L., et al.: Uformer: A general u-shaped transformer for image restoration. In: CVPR (2022)
[58] Wang, S., Li, B.Z., Khabsa, M., et al.: Linformer: Self-attention with linear complexity. In: NeurIPS (2020)
[59] Wu, J., Zhang, Y., et al.: Vmambair: Omni-selective scan for efficient SSMs. arXiv preprint arXiv:2501.18401 (2024)
[60] Xu, L., Zheng, S., Jia, J.: Unnatural l0 sparse representation for natural image deblurring. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 1107–1114 (2013)
[61] Yang, W., Tan, R.T., Feng, J., Liu, J., Guo, Z., Yan, S.: Deep joint rain detection and removal from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 1357–1366 (2017)
[62] Yasarla, R., Patel, V.M.: Uncertainty guided multi-scale residual learning-using a cycle spinning cnn for single image de-raining. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 8405–8414 (2019)
[63] Yu, W., Wang, X.: Mambaout: Do we really need mamba for vision? arXiv preprint arXiv:2405.07992 (2024)
[64] Zaheer, M., Gururangan, S., Ainslie, J., et al.: Big bird: Transformers for longer sequences. In: NeurIPS (2020)
[65] Zamir, S.W., Arora, A., Khan, S., et al.: Restormer: Efficient transformer for high-resolution image restoration. In: CVPR (2022)
[66] Zamir, S.W., Arora, A., Khan, S., Hayat, M., Khan, F.S., Yang, M.H., Shao, L.: Learning enriched features for real image restoration and enhancement. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXV. pp. 492–511. Springer (2020)
[67] Zamir, S.W., Arora, A., Khan, S., Hayat, M., Khan, F.S., Yang, M.H., Shao, L.: Multi-stage progressive image restoration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 14821–14831 (2021)
[68] Zhang, H., Dai, Y., Li, H., Koniusz, P.: Deep stacked hierarchical multi-patch network for image deblurring. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5978–5986 (2019)
[69] Zhang, J., Pan, J., Ren, J., Song, Y., Bao, L., Lau, R.W., Yang, M.H.: Dynamic scene deblurring using spatially variant recurrent neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2521–2529 (2018)
[70] Zhang, K., Luo, W., Zhong, Y., Ma, L., Stenger, B., Liu, W., Li, H.: Deblurring by realistic blurring. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 2737–2746 (2020)
[71] Zhang, Y., Li, D., Law, K.L., Wang, X., Qin, H., Li, H.: Idr: Self-supervised image denoising via iterative data refinement. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 2451–2460 (2022)
[72] Zhang, Y., Li, D., Shi, X., He, D., Song, K., Wang, X., Qin, H., Li, H.: KBNet: Kernel basis network for image restoration. arXiv preprint arXiv:2303.02881 (2023)
[73] Zhang, Y., Gu, A., et al.: Samba: Efficient hybrid SSM-attention model for long-range tasks. arXiv preprint arXiv:2403.03000 (2024)
[74] Zhang, Y., Xu, Y., Chen, J., et al.: Mambairv2: Improved state space model for visual restoration. arXiv preprint arXiv:2403.17902 (2024)
[75] Zhao, H., Guo, Y., et al.: Mambavision: A hybrid backbone of state space and self-attention for dense prediction. arXiv preprint arXiv:2402.08538 (2024)
[76] Zhou, H., Han, Q., et al.: Mamballie: Low-light image enhancement via SSMs. arXiv preprint arXiv:2411.15269 (2024)
[77] Zhou, M., Huang, J., Guo, C.L., Li, C.: Fourmer: An efficient global modeling paradigm for image restoration. In: International Conference on Machine Learning. pp. 42589–42601. PMLR (2023)
[78] Zhou, W., Lin, X., et al.: Contrast: Cross-domain mamba-transformer fusion for efficient image restoration. arXiv preprint arXiv:2402.14631 (2024)
[79] Zou, Z., Yu, H., Huang, J., Zhao, F.: Freqmamba: Viewing mamba from a frequency perspective for image deraining. In: Proceedings of the 32nd ACM International Conference on Multimedia. pp. 1905–1914 (2024)