
arxiv: 2604.10081 · v1 · submitted 2026-04-11 · 💻 cs.CV · cs.AI

MatRes: Zero-Shot Test-Time Model Adaptation for Simultaneous Matching and Restoration

Pith reviewed 2026-05-10 16:44 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords image restoration · geometric matching · test-time adaptation · zero-shot learning · correspondence estimation · image degradation · computer vision

The pith

Enforcing conditional similarity at matched points on one image pair lets a test-time method improve both restoration and geometric alignment without any training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MatRes as a way to handle real-world image pairs that are both degraded and taken from different viewpoints. These two problems interfere when solved separately, but the method claims they can help each other when a lightweight update is applied at test time. The update uses only the given pair, leaves all original models frozen, and requires no extra data or offline learning. A reader would care because many everyday photos come in mixed-quality sets of the same scene, and solving the tasks together could give better results than fixing one then the other.

Core claim

MatRes is a zero-shot test-time adaptation framework that jointly improves restoration quality and correspondence estimation using only a single low-quality and high-quality image pair. By enforcing conditional similarity at corresponding locations, MatRes updates only lightweight modules while keeping all pretrained components frozen, requiring no offline training or additional supervision. Extensive experiments across diverse combinations show that MatRes yields significant gains in both restoration and geometric alignment compared to using either restoration or matching models alone.

What carries the argument

Enforcement of conditional similarity at corresponding locations across the input pair, which updates only lightweight adaptation modules to make restoration and matching reinforce each other.
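A minimal sketch of what a similarity term evaluated only at corresponding locations might look like. The summary does not define "conditional similarity", so cosine similarity is used here as a stand-in; the names `cosine` and `similarity_loss`, the dictionary layout, and the toy feature values are all hypothetical and not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def similarity_loss(feats_lq, feats_hq, matches):
    """Mean (1 - cosine similarity) over matched locations only.

    feats_lq, feats_hq: dicts mapping (row, col) -> feature vector.
    matches: list of ((row, col) in LQ image, (row, col) in HQ image).
    Assumption: cosine similarity stands in for the paper's similarity measure.
    """
    if not matches:
        return 0.0
    total = sum(1.0 - cosine(feats_lq[p], feats_hq[q]) for p, q in matches)
    return total / len(matches)

# Toy features, made up for illustration only.
feats_lq = {(0, 0): [1.0, 0.0], (0, 1): [0.0, 1.0]}
feats_hq = {(1, 1): [2.0, 0.0], (1, 2): [1.0, 1.0]}
matches = [((0, 0), (1, 1)), ((0, 1), (1, 2))]
loss = similarity_loss(feats_lq, feats_hq, matches)
print(round(loss, 4))
```

The point of the sketch is that locations without a correspondence contribute nothing, so the quality of the matches directly controls which features are pulled together.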

If this is right

  • Restoration and matching can be performed on the same pair without one task harming the other.
  • The approach works on any combination of existing pretrained restoration and matching models.
  • No new training data or retraining is needed to handle viewpoint changes plus degradation.
  • Users can capture multiple shots of a scene and obtain both a cleaned image and reliable point matches from them.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same idea of using correspondence to guide adaptation might apply to other coupled tasks such as denoising followed by object detection.
  • If the method scales, it could reduce reliance on large clean training sets for restoration models.
  • Practical pipelines for mobile photography or surveillance could incorporate this joint step instead of separate restoration and alignment stages.

Load-bearing premise

That the single-pair conditional similarity signal is enough to drive mutual gains in restoration and matching while all original models stay frozen and no labels are available.

What would settle it

Running MatRes on a degraded pair and finding that neither the restored image quality metrics nor the number of correct correspondences improves over applying the two models independently.
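The check described above could be sketched as follows. PSNR is used as the restoration metric because it appears in the paper's reference list; the function names, the flat-list image representation, and every number below are hypothetical placeholders, not results from the paper.

```python
import math

def psnr(img, ref, max_val=255.0):
    """Peak signal-to-noise ratio between two equally sized flat pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(img, ref)) / len(ref)
    return math.inf if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)

def claim_refuted(psnr_indep, psnr_joint, correct_indep, correct_joint):
    """True when neither restoration quality nor correct matches improve."""
    return psnr_joint <= psnr_indep and correct_joint <= correct_indep

# Placeholder pixel values and match counts, for illustration only.
reference = [10, 20, 30, 40]
restored_independent = [12, 18, 33, 37]
restored_joint = [11, 19, 31, 39]

p_indep = psnr(restored_independent, reference)
p_joint = psnr(restored_joint, reference)
refuted = claim_refuted(p_indep, p_joint, correct_indep=40, correct_joint=55)
print(round(p_indep, 2), round(p_joint, 2), refuted)
```

A fuller test would repeat this over many pairs and degradation types rather than rely on a single example.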

Figures

Figures reproduced from arXiv: 2604.10081 by Kanggeon Lee, Kyoung Mu Lee, Soochahn Lee.

Figure 1
Figure 1: Zero-shot Test-Time Adaptation. MATRES enhances degraded and viewpoint-shifted image pairs by leveraging the mutual guidance of a matching and a restoration network to produce a restored and aligned output image.
Figure 2
Figure 2: Method Overview. Given an input pair (ILQ, IHQ), the pretrained generative prior model Mθ and the pretrained restoration network Rϕ remain frozen during adaptation (gradient flows through both models but their parameters are not updated). The zero-initialized adapter Hψ is the only trainable module, refined iteratively using losses computed on the IEQ and IHQ image pairs. At each adaptation step, Mθ extr…
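To make the roles in the Figure 2 caption concrete, here is a toy sketch of test-time adaptation with a frozen model and a zero-initialized adapter. It is an assumption-laden illustration: a single scalar weight stands in for the frozen restoration network, a single scalar `psi` stands in for the adapter, and a squared error at matched positions stands in for the paper's losses. None of the names or values come from the paper.

```python
FROZEN_W = 0.8  # hypothetical stand-in for the frozen restoration network

def restore(x, psi):
    """Frozen restorer plus a residual adapter; psi is the only trainable value."""
    return FROZEN_W * x + psi * x

def adapt(lq, hq, matches, steps=50, lr=0.1):
    """Gradient descent on psi only, using squared error at matched positions."""
    psi = 0.0  # zero-initialized, so step 0 reproduces the frozen output
    losses = []
    for _ in range(steps):
        residuals = [restore(lq[i], psi) - hq[j] for i, j in matches]
        losses.append(sum(r * r for r in residuals) / len(matches))
        # d(loss)/d(psi) = mean(2 * residual * lq[i]); FROZEN_W gets no update.
        grad = sum(2.0 * r * lq[i] for r, (i, _) in zip(residuals, matches))
        psi -= lr * grad / len(matches)
    return psi, losses

# Toy 1-D "images": same content, reordered to mimic a viewpoint change.
lq = [1.0, 2.0, 3.0]
hq = [3.0, 1.0, 2.0]
matches = [(0, 1), (1, 2), (2, 0)]  # lq[i] corresponds to hq[j]

psi, losses = adapt(lq, hq, matches)
print(round(psi, 4), round(losses[0], 4), round(losses[-1], 8))
```

Zero initialization means the adapted system starts exactly at the pretrained behaviour, so any change in output is attributable to the test-time updates alone.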
Figure 3
Figure 3: Qualitative Restoration Results. Restoration performance is compared with and without the matching network Mθ across three tasks. For super-resolution (SR), we evaluate EDSR [33], ESRGAN [45], SwinIR [32], and HAT [46]; for denoising, we evaluate DnCNN [47], N2V [35], N2N [34], and Restormer [31]; and for deblurring, we evaluate DeblurGAN [36], SRN [37], Restormer [31], and NAFNet [48]. The green box indi…
Figure 4
Figure 4: Qualitative Matching Results. Geometric transform estimation for five matching networks (LoftUp [26], ART [27], DIFT [28], RoMa [29], Mast3R [30]) across SR, denoising, and deblurring tasks is evaluated with and without the restoration network Rϕ. The green box denotes the ground-truth transform, while the red box represents the estimated transform.
Abstract

Real-world image pairs often exhibit both severe degradations and large viewpoint changes, making image restoration and geometric matching mutually interfering tasks when treated independently. In this work, we propose MatRes, a zero-shot test-time adaptation framework that jointly improves restoration quality and correspondence estimation using only a single low-quality and high-quality image pair. By enforcing conditional similarity at corresponding locations, MatRes updates only lightweight modules while keeping all pretrained components frozen, requiring no offline training or additional supervision. Extensive experiments across diverse combinations show that MatRes yields significant gains in both restoration and geometric alignment compared to using either restoration or matching models alone. MatRes offers a practical and widely applicable solution for real-world scenarios where users commonly capture multiple images of a scene with varying viewpoints and quality, effectively addressing the often-overlooked mutual interference between matching and restoration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes MatRes, a zero-shot test-time adaptation framework for jointly performing image restoration and geometric matching on a single degraded low-quality image paired with a high-quality reference. It updates only lightweight adaptation modules by enforcing conditional similarity at corresponding locations while freezing all pretrained restoration and matching models, requiring no offline training or extra supervision. Experiments across diverse image combinations reportedly demonstrate significant gains in both restoration quality and correspondence accuracy over using either task model in isolation.

Significance. If the central claims hold, the work is significant for addressing the mutual interference between restoration and matching in real-world scenarios with degradations and viewpoint changes. The zero-shot, single-pair, test-time nature without additional supervision or training offers a practical solution for common user-captured image sets, and the extensive experiments across combinations provide empirical support for the mutual improvement idea.

major comments (1)
  1. [§3] §3 (Method): The central mechanism relies on initial correspondences produced by the frozen matching model on the degraded input to define locations for conditional similarity enforcement. The manuscript does not specify an explicit initialization strategy, robustness filter, or iterative refinement loop to ensure these initial matches are reliable enough under severe degradations and viewpoint changes; if the starting correspondences are too noisy, the lightweight modules receive unreliable gradients and the mutual improvement may not materialize.
minor comments (2)
  1. [Abstract] Abstract and §4: The phrase 'significant gains' is used repeatedly; replace with concrete quantitative improvements (e.g., PSNR deltas, matching accuracy percentages) and include error bars or statistical tests to allow readers to assess effect sizes.
  2. [§4] §4 (Experiments): Clarify the exact combinations of restoration and matching backbones tested and whether the reported gains are consistent across all pairs or driven by a subset of easier cases.
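The reporting requested in the minor comments could be sketched like this: per-pair PSNR deltas summarized with a mean, a standard error, and the fraction of pairs that improved. The function name is hypothetical and the PSNR values are placeholders, not numbers from the paper.

```python
import math
import statistics

def paired_delta_summary(baseline, adapted):
    """Mean delta, standard error of the mean, and share of pairs that improved."""
    deltas = [a - b for a, b in zip(adapted, baseline)]
    mean = statistics.mean(deltas)
    sem = statistics.stdev(deltas) / math.sqrt(len(deltas))
    improved = sum(1 for d in deltas if d > 0) / len(deltas)
    return mean, sem, improved

# Placeholder per-pair PSNR values in dB, for illustration only.
baseline_psnr = [24.0, 26.0, 25.0, 27.0]
adapted_psnr = [25.0, 26.5, 26.5, 28.0]

mean_delta, sem_delta, frac_improved = paired_delta_summary(baseline_psnr, adapted_psnr)
print(round(mean_delta, 3), round(sem_delta, 3), frac_improved)
```

Reporting the fraction of improved pairs alongside the mean speaks to the second minor comment, since it shows whether an average gain is broad or driven by a few cases.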

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential significance of MatRes for real-world image pairs with degradations and viewpoint changes. We address the single major comment point-by-point below and have prepared revisions to improve clarity.

Point-by-point responses
  1. Referee: [§3] §3 (Method): The central mechanism relies on initial correspondences produced by the frozen matching model on the degraded input to define locations for conditional similarity enforcement. The manuscript does not specify an explicit initialization strategy, robustness filter, or iterative refinement loop to ensure these initial matches are reliable enough under severe degradations and viewpoint changes; if the starting correspondences are too noisy, the lightweight modules receive unreliable gradients and the mutual improvement may not materialize.

    Authors: We appreciate this observation on the initialization of correspondences. In the revised manuscript we have added an explicit statement in Section 3 clarifying that initial correspondences are obtained directly by running the frozen pretrained matching model on the degraded input paired with the high-quality reference; no additional preprocessing or selection is applied at this stage. We deliberately omit a separate robustness filter or iterative refinement loop at initialization to preserve the zero-shot, single-pair, no-training character of the framework. Instead, the conditional similarity loss applied during test-time adaptation of the lightweight modules serves as the mechanism for refinement: gradients update only the adaptation parameters while the core models remain frozen, allowing the system to improve effective alignment even when some initial matches are noisy. Our experiments across diverse degradation and viewpoint combinations already demonstrate that mutual gains occur in practice, including under severe conditions where standalone matching would fail. To further address the concern, the revision includes a short sensitivity discussion and an additional ablation that perturbs the initial matches to quantify robustness. revision: yes
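The perturbation step of such an ablation might look like the sketch below, which jitters the high-quality side of each initial match with Gaussian noise and measures the resulting displacement. This covers only the perturbation, not the adaptation run that would follow; the function names, the noise model, and the coordinates are hypothetical and not taken from the paper.

```python
import math
import random

def perturb_matches(matches, sigma, seed=0):
    """Add Gaussian jitter (std sigma, in pixels) to the HQ side of each match."""
    rng = random.Random(seed)
    return [
        ((x, y), (u + rng.gauss(0.0, sigma), v + rng.gauss(0.0, sigma)))
        for (x, y), (u, v) in matches
    ]

def mean_endpoint_error(matches, reference):
    """Average Euclidean distance between perturbed and reference HQ points."""
    dists = [
        math.hypot(u - ru, v - rv)
        for (_, (u, v)), (_, (ru, rv)) in zip(matches, reference)
    ]
    return sum(dists) / len(dists)

# Made-up initial matches: ((x, y) in LQ image, (u, v) in HQ image).
initial = [
    ((10.0, 12.0), (14.0, 15.0)),
    ((40.0, 8.0), (43.0, 11.0)),
    ((25.0, 30.0), (29.0, 33.0)),
]

for sigma in (0.0, 1.0, 4.0):
    noisy = perturb_matches(initial, sigma)
    print(sigma, round(mean_endpoint_error(noisy, initial), 3))
```

In a full ablation, each perturbed set would replace the initial matches before adaptation, and restoration and matching metrics would be plotted against `sigma`.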

Circularity Check

0 steps flagged

No significant circularity; method is a new empirical framework

Full rationale

The paper introduces MatRes as a zero-shot test-time adaptation framework that enforces conditional similarity at corresponding locations on a single degraded/high-quality pair to update only lightweight modules while freezing all pretrained models. No equations or steps reduce by construction to fitted inputs, self-definitions, or self-citation chains. Claims rest on the proposed adaptation procedure plus extensive experiments across combinations, which are independent of the target results. The skeptic concern about bootstrap stability from poor initial matches is a validity issue, not circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, no explicit free parameters, axioms, or invented entities are detailed beyond reliance on pretrained models and the conditional similarity idea.

axioms (1)
  • domain assumption: Pretrained restoration and matching models exist and can be kept frozen during lightweight adaptation.
    The framework explicitly keeps all pretrained components frozen.



Reference graph

Works this paper leans on

74 extracted references · 74 canonical work pages

  1. [1]

    Deep learning in medical image registration: a survey. Machine Vision and Applications, 31(1):8, 2020

    Grant Haskins, Uwe Kruger, and Pingkun Yan. Deep learning in medical image registration: a survey. Machine Vision and Applications, 31(1):8, 2020. 2

  2. [2]

    Local feature matching using deep learning: A survey. Information Fusion, 107:102344, 2024

    Shibiao Xu, Shunpeng Chen, Rongtao Xu, Changwei Wang, Peng Lu, and Li Guo. Local feature matching using deep learning: A survey. Information Fusion, 107:102344, 2024. 2

  3. [3]

    Deep learning reforms image matching: A survey and outlook. arXiv preprint arXiv:2506.04619, 2025

    Shihua Zhang, Zizhuo Li, Kaining Zhang, Yifan Lu, Yuxin Deng, Linfeng Tang, Xingyu Jiang, and Jiayi Ma. Deep learning reforms image matching: A survey and outlook. arXiv preprint arXiv:2506.04619, 2025. 2

  4. [4]

    Brief review of image denoising techniques. Visual computing for industry, biomedicine, and art, 2(1):7, 2019

    Linwei Fan, Fan Zhang, Hui Fan, and Caiming Zhang. Brief review of image denoising techniques. Visual computing for industry, biomedicine, and art, 2(1):7, 2019. 2

  5. [5]

    Deep learning for image super-resolution: A survey. IEEE transactions on pattern analysis and machine intelligence, 43(10):3365–3387, 2020

    Zhihao Wang, Jian Chen, and Steven CH Hoi. Deep learning for image super-resolution: A survey. IEEE transactions on pattern analysis and machine intelligence, 43(10):3365–3387, 2020. 2

  6. [6]

    Deep image deblurring: A survey. International Journal of Computer Vision, 130(9):2103–2130, 2022

    Kaihao Zhang, Wenqi Ren, Wenhan Luo, Wei-Sheng Lai, Björn Stenger, Ming-Hsuan Yang, and Hongdong Li. Deep image deblurring: A survey. International Journal of Computer Vision, 130(9):2103–2130, 2022. 2

  7. [7]

    A survey of deep learning approaches to image restoration. Neurocomputing, 487:46–65, 2022

    Jingwen Su, Boyan Xu, and Hujun Yin. A survey of deep learning approaches to image restoration. Neurocomputing, 487:46–65, 2022. 2

  8. [8]

    Image denoising: The deep learning revolution and beyond—a survey paper. SIAM Journal on Imaging Sciences, 16(3):1594–1654,

    Michael Elad, Bahjat Kawar, and Gregory Vaksman. Image denoising: The deep learning revolution and beyond—a survey paper. SIAM Journal on Imaging Sciences, 16(3):1594–1654,

  9. [9]

    A comprehensive review of deep learning-based real-world image restoration. IEEE Access, 11:21049–21067, 2023

    Lujun Zhai, Yonghui Wang, Suxia Cui, and Yu Zhou. A comprehensive review of deep learning-based real-world image restoration. IEEE Access, 11:21049–21067, 2023. 2

  10. [10]

    Efficient image denoising using deep learning: A brief survey. Information Fusion, page 103013, 2025

    Bo Jiang, Jinxing Li, Yao Lu, Qing Cai, Huaibo Song, and Guangming Lu. Efficient image denoising using deep learning: A brief survey. Information Fusion, page 103013, 2025. 2

  11. [11]

    A survey on all-in-one image restoration: Taxonomy, evaluation and future trends. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

    Junjun Jiang, Zengyuan Zuo, Gang Wu, Kui Jiang, and Xianming Liu. A survey on all-in-one image restoration: Taxonomy, evaluation and future trends. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025. 2

  12. [12]

    Multi-shot imaging: joint alignment, deblurring and resolution-enhancement

    Haichao Zhang and Lawrence Carin. Multi-shot imaging: joint alignment, deblurring and resolution-enhancement. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2925–2932, 2014. 2

  13. [13]

    Deep burst super-resolution

    Goutam Bhat, Martin Danelljan, Luc Van Gool, and Radu Timofte. Deep burst super-resolution. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9209–9218, 2021. 2

  14. [14]

    Lucas-kanade reloaded: End-to-end super-resolution from raw image bursts

    Bruno Lecouat, Jean Ponce, and Julien Mairal. Lucas-kanade reloaded: End-to-end super-resolution from raw image bursts. In Proceedings of the IEEE/CVF international conference on computer vision, pages 2370–2379, 2021. 2

  15. [15]

    A differentiable two-stage alignment scheme for burst image reconstruction with large shift

    Shi Guo, Xi Yang, Jianqi Ma, Gaofeng Ren, and Lei Zhang. A differentiable two-stage alignment scheme for burst image reconstruction with large shift. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17472–17481, 2022. 2

  16. [16]

    Stereo video deblurring

    Anita Sellent, Carsten Rother, and Stefan Roth. Stereo video deblurring. In European conference on computer vision, pages 558–575. Springer, 2016. 2

  17. [17]

    Spatio-temporal transformer network for video restoration

    Tae Hyun Kim, Mehdi SM Sajjadi, Michael Hirsch, and Bernhard Scholkopf. Spatio-temporal transformer network for video restoration. In Proceedings of the European conference on computer vision (ECCV), pages 106–122, 2018. 2

  18. [18]

    Joint stereo video deblurring, scene flow estimation and moving object segmentation. IEEE Transactions on Image Processing, 29:1748–1761, 2019

    Liyuan Pan, Yuchao Dai, Miaomiao Liu, Fatih Porikli, and Quan Pan. Joint stereo video deblurring, scene flow estimation and moving object segmentation. IEEE Transactions on Image Processing, 29:1748–1761, 2019. 2

  19. [19]

    Tdan: Temporally-deformable alignment network for video super-resolution

    Yapeng Tian, Yulun Zhang, Yun Fu, and Chenliang Xu. Tdan: Temporally-deformable alignment network for video super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020. 2

  20. [20]

    Hpatches: A benchmark and evaluation of handcrafted and learned local descriptors, 2017

    Vassileios Balntas, Karel Lenc, Andrea Vedaldi, and Krystian Mikolajczyk. Hpatches: A benchmark and evaluation of handcrafted and learned local descriptors, 2017. 2

  21. [21]

    Microsoft coco: Common objects in context, 2015

    Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, and Piotr Dollár. Microsoft coco: Common objects in context, 2015. 2

  22. [22]

    Deep lucas-kanade homography for multimodal image alignment

    Yiming Zhao, Xinming Huang, and Ziming Zhang. Deep lucas-kanade homography for multimodal image alignment. In CVPR,

  23. [23]

    Megadepth: Learning single-view depth prediction from internet photos, 2018

    Zhengqi Li and Noah Snavely. Megadepth: Learning single-view depth prediction from internet photos, 2018. 2

  24. [24]

    Scannet: Richly-annotated 3d reconstructions of indoor scenes

    Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. Scannet: Richly-annotated 3d reconstructions of indoor scenes. In Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2017. 2

  25. [25]

    LoRA: Low-rank adaptation of large language models

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022. 2

  26. [26]

    Loftup: Learning a coordinate-based feature upsampler for vision foundation models, 2025

    Haiwen Huang, Anpei Chen, Volodymyr Havrylov, Andreas Geiger, and Dan Zhang. Loftup: Learning a coordinate-based feature upsampler for vision foundation models, 2025. 3, 5, 6, 8

  27. [27]

    Auto-regressive transformation for image alignment

    Kanggeon Lee, Soochahn Lee, and Kyoung Mu Lee. Auto-regressive transformation for image alignment. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 13569–13579, October 2025. 3, 5, 6, 8

  28. [28]

    Emergent correspondence from image diffusion. arXiv preprint arXiv:2306.03881, 2023

    Luming Tang, Menglin Jia, Qianqian Wang, Cheng Perng Phoo, and Bharath Hariharan. Emergent correspondence from image diffusion. arXiv preprint arXiv:2306.03881, 2023. 3, 4, 5, 6, 8

  29. [29]

    RoMa: Robust Dense Feature Matching

    Johan Edstedt, Qiyu Sun, Georg Bökman, Mårten Wadenbäck, and Michael Felsberg. RoMa: Robust Dense Feature Matching. IEEE Conference on Computer Vision and Pattern Recognition,

  30. [30]

    Grounding image matching in 3d with mast3r, 2024

    Vincent Leroy, Yohann Cabon, and Jerome Revaud. Grounding image matching in 3d with mast3r, 2024. 3, 5, 6, 8

  31. [31]

    Restormer: Efficient transformer for high-resolution image restoration

    Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. Restormer: Efficient transformer for high-resolution image restoration. In CVPR, 2022. 3, 5, 6, 7, 8

  32. [32]

    Swinir: Image restoration using swin transformer. arXiv preprint arXiv:2108.10257,

    Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, and Radu Timofte. Swinir: Image restoration using swin transformer. arXiv preprint arXiv:2108.10257, 2021. 3, 5, 6, 7

  33. [33]

    Enhanced deep residual networks for single image super-resolution

    Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee. Enhanced deep residual networks for single image super-resolution. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, July 2017. 3, 5, 6, 7, 8

  34. [34]

    Noise2noise: Learning image restoration without clean data, 2018

    Jaakko Lehtinen, Jacob Munkberg, Jon Hasselgren, Samuli Laine, Tero Karras, Miika Aittala, and Timo Aila. Noise2noise: Learning image restoration without clean data, 2018. 3, 5, 6, 7

  35. [35]

    Noise2void-learning denoising from single noisy images

    Alexander Krull, Tim-Oliver Buchholz, and Florian Jug. Noise2void-learning denoising from single noisy images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2129–2137, 2019. 3, 5, 6, 7

  36. [36]

    Deblurgan: Blind motion deblurring using conditional adversarial networks. ArXiv e-prints,

    Orest Kupyn, Volodymyr Budzan, Mykola Mykhailych, Dmytro Mishkin, and Jiri Matas. Deblurgan: Blind motion deblurring using conditional adversarial networks. ArXiv e-prints,

  37. [37]

    Scale-recurrent network for deep image deblurring

    Xin Tao, Hongyun Gao, Xiaoyong Shen, Jue Wang, and Jiaya Jia. Scale-recurrent network for deep image deblurring. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. 3, 5, 6, 7

  38. [38]

    A tale of two features: Stable diffusion complements dino for zero-shot semantic correspondence

    Junyi Zhang, Charles Herrmann, Junhwa Hur, Luisa Polania Cabrera, Varun Jampani, Deqing Sun, and Ming-Hsuan Yang. A tale of two features: Stable diffusion complements dino for zero-shot semantic correspondence. In Advances in Neural Information Processing Systems (NeurIPS) 2023, 2023. 3, 4

  39. [39]

    Zero-shot image feature consensus with deep functional maps. arXiv preprint arXiv:2403.12038, 2024

    Xinle Cheng, Congyue Deng, Adam Harley, Yixin Zhu, and Leonidas Guibas. Zero-shot image feature consensus with deep functional maps. arXiv preprint arXiv:2403.12038, 2024. 3, 4, 5

  40. [40]

    Retinaregnet: A zero-shot approach for retinal image registration. Computers in Biology and Medicine, 186:109645,

    Vishal Balaji Sivaraman, Muhammad Imran, Qingyue Wei, Preethika Muralidharan, Michelle R Tamplin, Isabella M Grumbach, Randy H Kardon, Jui-Kai Wang, Yuyin Zhou, and Wei Shao. Retinaregnet: A zero-shot approach for retinal image registration. Computers in Biology and Medicine, 186:109645,

  41. [41]

    Denoising diffusion implicit models. International Conference on Learning Representations (ICLR), 2021

    Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. International Conference on Learning Representations (ICLR), 2021. 4

  42. [42]

    High-resolution image synthesis with latent diffusion models. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, 2022

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, 2022. 4

  43. [43]

    Visual autoregressive modeling: Scalable image generation via next-scale prediction, 2024

    Keyu Tian, Yi Jiang, Zehuan Yuan, Bingyue Peng, and Liwei Wang. Visual autoregressive modeling: Scalable image generation via next-scale prediction, 2024. 4

  44. [44]

    An image is worth 16x16 words: Transformers for image recognition at scale, 2021

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale, 2021. 4

  45. [45]

    Esrgan: Enhanced super-resolution generative adversarial networks

    Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Yu Qiao, and Chen Change Loy. Esrgan: Enhanced super-resolution generative adversarial networks. In The European Conference on Computer Vision Workshops (ECCVW), September 2018. 5, 6, 7, 8

  46. [46]

    Activating more pixels in image super-resolution transformer

    Xiangyu Chen, Xintao Wang, Jiantao Zhou, Yu Qiao, and Chao Dong. Activating more pixels in image super-resolution transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 22367–22377, June 2023. 5, 6, 7

  47. [47]

    Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017

    Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and Lei Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017. 5, 6, 7

  48. [48]

    Simple baselines for image restoration

    Liangyu Chen, Xiaojie Chu, Xiangyu Zhang, and Jian Sun. Simple baselines for image restoration. arXiv preprint arXiv:2204.04676, 2022. 5, 6, 7

  49. [49]

    Toward real-world single image super-resolution: A new benchmark and a new model

    Jianrui Cai, Hui Zeng, Hongwei Yong, Zisheng Cao, and Lei Zhang. Toward real-world single image super-resolution: A new benchmark and a new model. In Proceedings of the IEEE International Conference on Computer Vision, 2019. 5, 7, 8

  50. [50]

    Ntire 2017 challenge on single image super-resolution: Dataset and study

    Eirikur Agustsson and Radu Timofte. Ntire 2017 challenge on single image super-resolution: Dataset and study. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, July 2017. 5, 7, 8

  51. [51]

    Low-complexity single-image super-resolution based on nonnegative neighbor embedding

    Marco Bevilacqua, Aline Roumy, Christine Guillemot, and Marie-Line Alberi-Morel. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In Proceedings of the British Machine Vision Conference (BMVC),

  52. [52]

    On single image scale-up using sparse representations

    Roman Zeyde, Michael Elad, and Matan Protter. On single image scale-up using sparse representations. In Proceedings of the International Conference on Curves and Surfaces, 2010. 5, 7, 8

  53. [53]

    Single image super-resolution from transformed self-exemplars

    Jia-Bin Huang, Abhishek Singh, and Narendra Ahuja. Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. 5, 7, 8

  54. [54]

    A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics

    David Martin, Charless Fowlkes, Doron Tal, and Jitendra Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2001. 5, 7, 8

  55. [55]

    Deep multi-scale convolutional neural network for dynamic scene deblurring

    Seungjun Nah, Tae Hyun Kim, and Kyoung Mu Lee. Deep multi-scale convolutional neural network for dynamic scene deblurring. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), July 2017. 5, 7, 8

  56. [56]

    Human-aware motion deblurring

    Ziyi Shen, Wenguan Wang, Xiankai Lu, Jianbing Shen, Haibin Ling, Tingfa Xu, and Ling Shao. Human-aware motion deblurring. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2019. 5, 7, 8

  57. [57]

    Real-world blur dataset for learning and benchmarking deblurring algorithms

    Jaesung Rim, Haeyun Lee, Jucheol Won, and Sunghyun Cho. Real-world blur dataset for learning and benchmarking deblurring algorithms. In Proceedings of the European Conference on Computer Vision (ECCV), 2020. 5, 7, 8

  58. [58]

    Deep learning face attributes in the wild

    Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), December 2015. 5, 7, 8

  59. [59]

    Autolut: Lut-based image super-resolution with automatic sampling and adaptive residual learning, 2025

    Yuheng Xu, Shijie Yang, Xin Liu, Jie Liu, Jie Tang, and Gangshan Wu. Autolut: Lut-based image super-resolution with automatic sampling and adaptive residual learning, 2025. 5, 7, 8

  60. [60]

    Catanet: Efficient content-aware token aggregation for lightweight image super-resolution. arXiv preprint arXiv:2503.06896, 2025

    Xin Liu, Jie Liu, Jie Tang, and Gangshan Wu. Catanet: Efficient content-aware token aggregation for lightweight image super-resolution. arXiv preprint arXiv:2503.06896, 2025. 5, 7, 8

  61. [61]

    Im-lut: Interpolation mixing look-up tables for image super-resolution, 2025

    Sejin Park, Sangmin Lee, Kyong Hwan Jin, and Seung-Won Jung. Im-lut: Interpolation mixing look-up tables for image super-resolution, 2025. 5, 7

  62. [62]

    Multi-stage progressive image restoration

    Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. Multi-stage progressive image restoration. In CVPR,

  63. [63]

    Robust image denoising through adversarial frequency mixup

    Donghun Ryou, Inju Ha, Hyewon Yoo, Dongwan Kim, and Bohyung Han. Robust image denoising through adversarial frequency mixup. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2723–2732, 2024. 5, 7, 8

  64. [64]

    Maxim: Multi-axis mlp for image processing. CVPR, 2022

    Zhengzhong Tu, Hossein Talebi, Han Zhang, Feng Yang, Peyman Milanfar, Alan Bovik, and Yinxiao Li. Maxim: Multi-axis mlp for image processing. CVPR, 2022. 5, 7

  65. [65]

    Blur2blur: Blur conversion for unsupervised image deblurring on unknown domains

    Bang-Dang Pham, Phong Tran, Anh Tran, Cuong Pham, Rang Nguyen, and Minh Hoai. Blur2blur: Blur conversion for unsupervised image deblurring on unknown domains. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024. 5, 7, 8

  66. [66]

    Stripformer: Strip transformer for fast image deblurring, 2022

    Fu-Jen Tsai, Yan-Tsung Peng, Yen-Yu Lin, Chung-Chi Tsai, and Chia-Wen Lin. Stripformer: Strip transformer for fast image deblurring, 2022. 5, 7, 8

  67. [67]

    Hierarchical integration diffusion model for realistic image deblurring

    Zheng Chen, Yulun Zhang, Liu Ding, Xia Bin, Jinjin Gu, Linghe Kong, and Xin Yuan. Hierarchical integration diffusion model for realistic image deblurring. In NeurIPS, 2023. 5, 7

  68. [68]

    Minima: Modality invariant image matching

    Jiangwei Ren, Xingyu Jiang, Zizhuo Li, Dingkang Liang, Xin Zhou, and Xiang Bai. Minima: Modality invariant image matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025. 5, 8

  69. [69]

    Learning affine correspondences by integrating geometric constraints

    Pengju Sun, Banglei Guan, Zhenbao Yu, Yang Shang, Qifeng Yu, and Daniel Barath. Learning affine correspondences by integrating geometric constraints. In IEEE Conference on Computer Vision and Pattern Recognition, pages 27038–27048,

  70. [70]

    Matchanything: Universal cross-modality image matching with large-scale pre-training

    Xingyi He, Hao Yu, Sida Peng, Dongli Tan, Zehong Shen, Hujun Bao, and Xiaowei Zhou. Matchanything: Universal cross-modality image matching with large-scale pre-training. In Arxiv, 2025. 5, 8

  71. [71]

    Decoupled weight decay regularization, 2019

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization, 2019. 5

  72. [72]

    The dual-bootstrap iterative closest point algorithm with application to retinal image registration. IEEE Transactions on Medical Imaging, 22, 2003

    Charles Stewart, Chia-Ling Tsai, and Badrinath Roysam. The dual-bootstrap iterative closest point algorithm with application to retinal image registration. IEEE Transactions on Medical Imaging, 22, 2003. 7

  73. [73]

    Fire: Fundus image registration dataset. Modeling and Artificial Intelligence in Ophthalmology, 2017

    Carlos Hernandez-Matas, Xenophon Zabulis, Areti Triantafyllou, Panagiota Anyfanti, Stella Douma, and Antonis A Argyros. Fire: Fundus image registration dataset. Modeling and Artificial Intelligence in Ophthalmology, 2017. 7

  74. [74]

    Image quality metrics: Psnr vs. ssim

    Alain Horé and Djemel Ziou. Image quality metrics: Psnr vs. ssim. In 2010 20th International Conference on Pattern Recognition, pages 2366–2369, 2010. 7