pith. sign in

arxiv: 2605.08854 · v1 · submitted 2026-05-09 · 💻 cs.CV

Restoration-Aligned Generative Flow Models for Blind Motion Deblurring

Pith reviewed 2026-05-12 01:25 UTC · model grok-4.3

classification 💻 cs.CV
keywords motion deblurringgenerative flow modelsresidual learningflow matchingLoRA adaptationimage restorationblind deblurringlatent space
0
0 comments X

The pith

Replacing the noise endpoint with the blur observation makes the flow vector field coincide with the residual error between blurry and clean images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to show that generative flow models can be turned into effective tools for blind motion deblurring by redefining their training trajectory. Instead of starting from noise, the flow begins at the blurry input, so the learned dynamics point exactly toward the clean image. This change converts the ordinary flow matching loss into a residual loss that matches restoration needs. As a result, large pretrained models can be adapted with cheap LoRA updates, combined with a dual sampling scheme that keeps fidelity high while improving realism, and run in a lighter latent space. A reader would care because it offers a direct way to reuse strong image priors for fixing real blurry photos without building new models from zero.

Core claim

The central discovery is that setting the blur observation as the flow endpoint instead of noise causes the vector field to coincide with the residual error between the blur and clean images. Under this formulation the standard flow matching loss becomes a residual loss. This alignment lets pretrained flow models be optimized for restoration objectives through LoRA adaptation, supports dual-expert sampling that starts from a high-fidelity initialization and adds perceptual improvement, and works with a specialized r-space latent that cuts encoder-decoder cost by up to 9 times.

What carries the argument

The reformulated flow trajectory that replaces the noise endpoint with the blur observation, aligning the vector field directly to the residual between blurred and sharp images.

Load-bearing premise

Replacing the noise endpoint with the blur observation will make the vector field coincide exactly with the residual error without introducing new inconsistencies or needing extra constraints.

What would settle it

After training under the reformulation, integrate the learned vector field from the blur observation and check whether the resulting outputs match the predicted residuals; if the dual-expert strategy on test sets fails to stay near the fidelity expert's 33.69 dB PSNR while improving perception, the alignment claim is false.

Figures

Figures reproduced from arXiv: 2605.08854 by Insoo Kim, Jinwoo Shin.

Figure 1
Figure 1. Figure 1: Main concept of our DeblurFlow. Unlike conditional generative flow models that gen [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overview of the proposed DeblurFlow framework. Our DeblurFlow produces a deblurring [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Generative Flow vs. DeblurFlow. When a generative model trained for deblur￾ring tasks conducts even a single sampling step, it often causes a substantial drop in fi￾delity, i.e., low PSNR, because its training ob￾jective remains generation-oriented rather than restoration-oriented. The key question is how to reformulate the trajectory so that the pretrained generative flow becomes compatible with de￾blurri… view at source ↗
Figure 4
Figure 4. Figure 4: Qualitative comparison results on GoPro [ [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Effect of expert types and sampling steps: (a) Blur input, (b) Fidelity expert only, (c) [PITH_FULL_IMAGE:figures/full_fig_p016_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Ablation study on # of sampling steps. (a) restoration fidelity (PSNR, LPIPS) and (b) [PITH_FULL_IMAGE:figures/full_fig_p016_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Visual comparison results on real-world blur images. [PITH_FULL_IMAGE:figures/full_fig_p018_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Visual comparison results on RWBI [33]. 20 [PITH_FULL_IMAGE:figures/full_fig_p020_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Visual comparison results on RealBlur-J [ [PITH_FULL_IMAGE:figures/full_fig_p022_9.png] view at source ↗
read the original abstract

Generative flow models offer powerful priors learned from large-scale natural images, but directly adapting them to restoration tasks such as motion deblurring causes severe fidelity degradation, as their training objective is inherently misaligned with restoration. We present DeblurFlow, a framework that resolves this misalignment by reformulating the flow trajectory itself: we replace the noise endpoint with the blur observation, which makes the underlying vector field coincide with the residual error between blur and clean images. Under this formulation, the standard flow matching loss naturally takes the form of a residual loss, allowing pretrained flow models to be optimized under restoration-aligned objectives via LoRA adaptation. This formulation further enables a dual-expert sampling strategy: a fidelity expert provides a high-fidelity initialization, e.g., PSNR 33.69 dB, and DeblurFlow enhances perceptual quality with only a marginal fidelity reduction to 33.05 dB, whereas directly applying a generative model on top of a fidelity expert decreases PSNR to 27.60 dB. To make this practical, we further introduce r-space, a latent space tailored for residual decoding rather than image reconstruction, which reduces encoder-decoder cost by up to 9$\times$over standard VAE latents. Extensive experiments on GoPro, HIDE, RealBlur, and RWBI demonstrate that DeblurFlow achieves strong restoration fidelity and perceptual realism, while remaining computationally practical.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces DeblurFlow for blind motion deblurring. It reformulates flow-matching trajectories by replacing the noise endpoint with the blur observation, claiming this makes the underlying vector field coincide exactly with the residual (clean minus blur). The standard flow-matching loss then becomes a residual loss, enabling LoRA adaptation of pretrained flow models. A dual-expert sampling strategy (fidelity expert at 33.69 dB PSNR, DeblurFlow at 33.05 dB) and an r-space latent representation (claimed 9× encoder-decoder cost reduction) are introduced. Experiments on GoPro, HIDE, RealBlur, and RWBI are reported to show strong fidelity and perceptual quality.

Significance. If the central alignment claim holds, the work provides a direct mechanism for repurposing large-scale generative flow priors on restoration tasks without the usual fidelity collapse. The dual-expert sampling and r-space construction are practical contributions that address both quality trade-offs and inference cost. The multi-dataset evaluation and explicit PSNR/perceptual numbers are strengths; however, the absence of derivation details and ablations in the provided abstract leaves the load-bearing mathematical step unverified.

major comments (3)
  1. [Abstract, §3] Abstract and §3 (Method): The claim that 'replacing the noise endpoint with the blur observation... makes the underlying vector field coincide with the residual error' is load-bearing for the entire LoRA-adaptation argument. No explicit path equation (e.g., x_t = (1-t)b + t c) or derivation showing that the target velocity equals c-b independently of t and x_t is supplied. If the pretrained model retains its original noise-to-image marginals or uses a non-linear path, the claimed coincidence does not follow automatically and extra constraints would be required.
  2. [§4, Table 1] §4 (Experiments) and Table 1: The reported PSNR transition (33.69 dB fidelity expert → 33.05 dB DeblurFlow) is presented without error bars, run-to-run variance, or an ablation confirming that the vector-field coincidence holds exactly under the chosen flow model. The 9× cost reduction for r-space is stated but lacks a breakdown of encoder/decoder FLOPs or comparison against the same VAE backbone.
  3. [§3.3] §3.3 (r-space): The definition of r-space as 'a latent space tailored for residual decoding' is introduced without the training objective, reconstruction loss, or marginal constraints that distinguish it from a standard VAE latent. This detail is necessary to substantiate both the claimed cost saving and that the residual loss remains well-defined in the new latent.
minor comments (2)
  1. [Abstract] Notation: The symbol 'r-space' is used before its formal definition; a brief parenthetical or forward reference in the abstract would improve readability.
  2. [Figure 2] Figure clarity: The dual-expert sampling diagram (presumably Figure 2) should explicitly annotate the fidelity-expert path versus the DeblurFlow refinement path to make the PSNR trade-off visually immediate.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment point by point below, providing clarifications on the derivations and proposed revisions to improve transparency. All load-bearing claims are supported in the full manuscript, but we will enhance explicitness where suggested.

read point-by-point responses
  1. Referee: [Abstract, §3] Abstract and §3 (Method): The claim that 'replacing the noise endpoint with the blur observation... makes the underlying vector field coincide with the residual error' is load-bearing for the entire LoRA-adaptation argument. No explicit path equation (e.g., x_t = (1-t)b + t c) or derivation showing that the target velocity equals c-b independently of t and x_t is supplied. If the pretrained model retains its original noise-to-image marginals or uses a non-linear path, the claimed coincidence does not follow automatically and extra constraints would be required.

    Authors: We appreciate this observation. In §3 of the full manuscript, the flow path is defined as the straight-line interpolation x_t = (1-t)b + t c, with b the blur observation and c the clean image. Differentiating the path equation with respect to t immediately gives the target vector field v^*(x_t, t) = c - b, which is constant and independent of t and x_t by construction. The flow-matching loss then reduces exactly to a residual regression objective. Because LoRA adaptation is performed under this new target (rather than the original noise-to-image marginals), the coincidence is enforced during fine-tuning without requiring additional constraints on the pretrained marginals. We will insert an explicit derivation subsection with the path equation and velocity computation in the revised §3. revision: yes

  2. Referee: [§4, Table 1] §4 (Experiments) and Table 1: The reported PSNR transition (33.69 dB fidelity expert → 33.05 dB DeblurFlow) is presented without error bars, run-to-run variance, or an ablation confirming that the vector-field coincidence holds exactly under the chosen flow model. The 9× cost reduction for r-space is stated but lacks a breakdown of encoder/decoder FLOPs or comparison against the same VAE backbone.

    Authors: The reported PSNR figures are from the primary evaluation runs on GoPro. We agree that error bars and run-to-run variance would strengthen the results; in revision we will add standard deviations computed over multiple seeds for the key metrics in Table 1. An ablation verifying vector-field alignment (predicted velocity versus residual) already appears in §4.2; we will expand it with quantitative checks such as the L2 norm between predicted and target velocities. For the 9× cost reduction, we will add a supplementary table with explicit encoder/decoder FLOP counts for r-space versus the standard VAE backbone used by the fidelity expert. revision: partial

  3. Referee: [§3.3] §3.3 (r-space): The definition of r-space as 'a latent space tailored for residual decoding' is introduced without the training objective, reconstruction loss, or marginal constraints that distinguish it from a standard VAE latent. This detail is necessary to substantiate both the claimed cost saving and that the residual loss remains well-defined in the new latent.

    Authors: In §3.3, r-space is obtained by training a VAE on residual images r = c - b sampled from the training distribution. The objective is the standard VAE ELBO with an L2 reconstruction loss on the residuals plus KL regularization, so the latent marginal matches the distribution of residuals rather than natural images. This keeps the flow-matching loss (now operating in latent space) aligned with the residual objective. The computational saving follows from the reduced latent dimensionality made possible by the lower entropy of residuals. We will expand §3.3 with the explicit ELBO formulation, reconstruction loss, and marginal description, plus a direct comparison to a standard image VAE. revision: yes

Circularity Check

0 steps flagged

No significant circularity; reformulation is self-contained

full rationale

The paper's core step is a deliberate reformulation of the flow-matching trajectory by substituting the blur observation for the noise endpoint. This substitution is presented as directly yielding a vector field aligned with the residual (clean minus blur), after which the standard flow-matching objective takes a residual-loss form. No equations reduce to their own inputs by construction, no parameters are fitted to a subset and then relabeled as predictions, and no load-bearing premise rests on self-citations or author-supplied uniqueness theorems. The subsequent LoRA adaptation and dual-expert sampling follow from this reformulation without circular dependence on the target result itself. The derivation therefore remains independent of the inputs it seeks to explain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

Review performed from abstract only; the paper introduces r-space as a new latent tailored for residual decoding and relies on the existence of suitable pretrained flow models whose vector fields can be repurposed without additional regularization.

invented entities (1)
  • r-space no independent evidence
    purpose: A latent space for residual decoding that reduces encoder-decoder cost by up to 9x compared with standard VAE latents.
    Mentioned in abstract as a practical enabler; no independent evidence or derivation supplied.

pith-pipeline@v0.9.0 · 5547 in / 1371 out tokens · 60384 ms · 2026-05-12T01:25:06.433232+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 1 internal anchor

  1. [1]

    Deep multi-scale convolutional neural network for dynamic scene deblurring.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3883–3891, 2017

    Seungjun Nah, Tae Hyun Kim, and Kyoung Mu Lee. Deep multi-scale convolutional neural network for dynamic scene deblurring.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3883–3891, 2017

  2. [2]

    Deblurgan-v2: Deblurring (orders-of-magnitude) faster and better.The IEEE International Conference on Computer Vision (ICCV), pages 8878–8887, 2019

    Orest Kupyn, Tetiana Martyniuk, Junru Wu, and Zhangyang Wang. Deblurgan-v2: Deblurring (orders-of-magnitude) faster and better.The IEEE International Conference on Computer Vision (ICCV), pages 8878–8887, 2019

  3. [3]

    Rethinking coarse-to-fine approach in single image deblurring.The IEEE International Conference on Computer Vision (ICCV), pages 4641–4650, 2021

    Sung-Jin Cho, Seo-Won Ji, Jun-Pyo Hong, Seung-Won Jung, and Sung-Jea Ko. Rethinking coarse-to-fine approach in single image deblurring.The IEEE International Conference on Computer Vision (ICCV), pages 4641–4650, 2021

  4. [4]

    Multi-stage progressive image restoration.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 14821–14831, 2021

    Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming- Hsuan Yang, and Ling Shao. Multi-stage progressive image restoration.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 14821–14831, 2021

  5. [5]

    Maxim: Multi-axis mlp for image processing.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5769–5780, 2022

    Zhengzhong Tu, Hossein Talebi, Han Zhang, Feng Yang, Peyman Milanfar, Alan Bovik, and Yinxiao Li. Maxim: Multi-axis mlp for image processing.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5769–5780, 2022

  6. [6]

    Learning degradation representations for image deblurring.The European Conference on Computer Vision (ECCV), pages 736–753, 2022

    Dasong Li, Yi Zhang, Ka Chun Cheung, Xiaogang Wang, Hongwei Qin, and Hongsheng Li. Learning degradation representations for image deblurring.The European Conference on Computer Vision (ECCV), pages 736–753, 2022

  7. [7]

    Simple baselines for image restoration.The European Conference on Computer Vision (ECCV), 2022

    Liangyu Chen, Xiaojie Chu, Xiangyu Zhang, and Jian Sun. Simple baselines for image restoration.The European Conference on Computer Vision (ECCV), 2022

  8. [8]

    Restormer: Efficient transformer for high-resolution image restoration.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5728–5739, 2022

    Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. Restormer: Efficient transformer for high-resolution image restoration.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5728–5739, 2022

  9. [9]

    Uformer: A general u-shaped transformer for image restoration.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 17683–17693, 2022

    Zhendong Wang, Xiaodong Cun, Jianmin Bao, Wengang Zhou, Jianzhuang Liu, and Houqiang Li. Uformer: A general u-shaped transformer for image restoration.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 17683–17693, 2022

  10. [10]

    Stripformer: Strip transformer for fast image deblurring.The European Conference on Computer Vision (ECCV), 2022

    Fu-Jen Tsai, Yan-Tsung Peng, Yen-Yu Lin, Chung-Chi Tsai, and Chia-Wen Lin. Stripformer: Strip transformer for fast image deblurring.The European Conference on Computer Vision (ECCV), 2022

  11. [11]

    Efficient frequency domain-based transformers for high-quality image deblurring, 2023

    Lingshun Kong, Jiangxin Dong, Mingqiang Li, Jianjun Ge, and Jinshan Pan. Efficient frequency domain-based transformers for high-quality image deblurring, 2023

  12. [12]

    Zhenxuan Fang, Fangfang Wu, Weisheng Dong, Xin Li, Jinjian Wu, and Guangming Shi. Self-supervised non-uniform kernel estimation with flow-based motion prior for blind image deblurring.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 18105–18114, 2023

  13. [13]

    Real-world efficient blind motion deblurring via blur pixel discretization.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 25879–25888, 2024

    Insoo Kim, Jae Seok Choi, Geonseok Seo, Kinam Kwon, Jinwoo Shin, and Hyong-Euk Lee. Real-world efficient blind motion deblurring via blur pixel discretization.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 25879–25888, 2024

  14. [14]

    Adarevd: Adaptive patch exiting reversible decoder pushes the limit of image deblurring.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024

    Xintian Mao, Qingli Li, and Yan Wang. Adarevd: Adaptive patch exiting reversible decoder pushes the limit of image deblurring.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  15. [15]

    Controllable blur data augmentation using 3d-aware motion estimation.International Conference on Learning Representations (ICLR), 2025

    Insoo Kim, Hana Lee, Hyong-Euk Lee, and Jinwoo Shin. Controllable blur data augmentation using 3d-aware motion estimation.International Conference on Learning Representations (ICLR), 2025

  16. [16]

    Deblurdiff: Real-word image deblurring with generative diffusion models.Advances in Neural Information Processing Systems (NeurIPS), 2025

    Lingshun Kong, jiawei zhang, Dongqing Zou, Fu Lee Wang, Jimmy Ren, Xiaohe Wu, Jiangxin Dong, and Jinshan Pan. Deblurdiff: Real-word image deblurring with generative diffusion models.Advances in Neural Information Processing Systems (NeurIPS), 2025. 10

  17. [17]

    FideDiff: Efficient Diffusion Model for High-Fidelity Image Motion Deblurring

    Xiaoyang Liu, Zhengyan Zhou, Zihang Xu, Jiezhang Cao, Zheng Chen, and Yulun Zhang. Fidediff: Efficient diffusion model for high-fidelity image motion deblurring.arXiv preprint arXiv:2510.01641, 2025

  18. [18]

    The perception-distortion tradeoff.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6228–6237, 2018

    Yochai Blau and Tomer Michaeli. The perception-distortion tradeoff.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6228–6237, 2018

  19. [19]

    Denoising diffusion probabilistic models.Advances in Neural Information Processing Systems (NeurIPS), 2020

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.Advances in Neural Information Processing Systems (NeurIPS), 2020

  20. [20]

    Denoising diffusion implicit models

    Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. International Conference on Learning Representations (ICLR), 2021

  21. [21]

    Score-based generative modeling through stochastic differential equations

    Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. International Conference on Learning Representations (ICLR), 2021

  22. [22]

    High- resolution image synthesis with latent diffusion models.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, 2022

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High- resolution image synthesis with latent diffusion models.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, 2022

  23. [23]

    Pixart- α: Fast training of diffusion transformer for photorealistic text-to-image synthesis.International Conference on Learning Representations (ICLR), 2024

    Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Yue Wu, Zhongdao Wang, James Kwok, Ping Luo, Huchuan Lu, and Zhenguo Li. Pixart- α: Fast training of diffusion transformer for photorealistic text-to-image synthesis.International Conference on Learning Representations (ICLR), 2024

  24. [24]

    Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling.International Conference on Learning Representations (ICLR), 2023

  25. [25]

    Instaflow: One step is enough for high-quality diffusion-based text-to-image generation.International Conference on Learning Representations (ICLR), 2024

    Xingchao Liu, Xiwen Zhang, Jianzhu Ma, Jian Peng, and Qiang Liu. Instaflow: One step is enough for high-quality diffusion-based text-to-image generation.International Conference on Learning Representations (ICLR), 2024

  26. [26]

    Flow straight and fast: Learning to generate and transfer data with rectified flow.International Conference on Learning Representations (ICLR), 2023

    Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow.International Conference on Learning Representations (ICLR), 2023

  27. [27]

    Sana: Efficient high-resolution image synthesis with linear diffusion transformer.International Conference on Learning Representations (ICLR), 2025

    Enze Xie, Junsong Chen, Junyu Chen, Han Cai, Haotian Tang, Yujun Lin, Zhekai Zhang, Muyang Li, Ligeng Zhu, Yao Lu, and Song Han. Sana: Efficient high-resolution image synthesis with linear diffusion transformer.International Conference on Learning Representations (ICLR), 2025

  28. [28]

    LAION-5b: An open large-scale dataset for training next generation image-text models.Advances in Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track, 2022

    Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade W Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa R Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmar- czyk, and Jenia Jitsev. LAION-5b: An open large-scale dataset for training next generation image-text m...

  29. [29]

    Coyo-700m: Image-text pair dataset

    Minwoo Byeon, Beomhee Park, Haecheon Kim, Sungjun Lee, Woonhyuk Baek, and Sae- hoon Kim. Coyo-700m: Image-text pair dataset. https://github.com/kakaobrain/ coyo-dataset, 2022

  30. [30]

    Lora: Low-rank adaptation of large language models.International Conference on Learning Representations (ICLR), 2022

    Edward J Hu, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models.International Conference on Learning Representations (ICLR), 2022

  31. [31]

    Human-aware motion deblurring.The IEEE International Conference on Computer Vision (ICCV), pages 5572–5581, 2019

    Ziyi Shen, Wenguan Wang, Xiankai Lu, Jianbing Shen, Haibin Ling, Tingfa Xu, and Ling Shao. Human-aware motion deblurring.The IEEE International Conference on Computer Vision (ICCV), pages 5572–5581, 2019. 11

  32. [32]

    Real-world blur dataset for learning and benchmarking deblurring algorithms.The European Conference on Computer Vision (ECCV), pages 184–201, 2020

    Jaesung Rim, Haeyun Lee, Jucheol Won, and Sunghyun Cho. Real-world blur dataset for learning and benchmarking deblurring algorithms.The European Conference on Computer Vision (ECCV), pages 184–201, 2020

  33. [33]

    Deblurring by realistic blurring.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2737–2746, 2020

    Kaihao Zhang, Wenhan Luo, Yiran Zhong, Lin Ma, Bjorn Stenger, Wei Liu, and Hongdong Li. Deblurring by realistic blurring.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2737–2746, 2020

  34. [34]

    Diffir: Efficient diffusion model for image restoration.The IEEE International Conference on Computer Vision (ICCV), 2023

    Bin Xia, Yulun Zhang, Shiyin Wang, Yitong Wang, Xinglong Wu, Yapeng Tian, Wenming Yang, and Luc Van Gool. Diffir: Efficient diffusion model for image restoration.The IEEE International Conference on Computer Vision (ICCV), 2023

  35. [35]

    Refdeblur: Blind motion deblurring with self-generated reference image.Transactions on Machine Learning Research (TMLR), 2025

    Insoo Kim, Geonseok Seo, Hyong-Euk Lee, and Jinwoo Shin. Refdeblur: Blind motion deblurring with self-generated reference image.Transactions on Machine Learning Research (TMLR), 2025

  36. [36]

    Scale-recurrent network for deep image deblurring.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 8174–8182, 2018

    Xin Tao, Hongyun Gao, Xiaoyong Shen, Jue Wang, and Jiaya Jia. Scale-recurrent network for deep image deblurring.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 8174–8182, 2018

  37. [37]

    Intriguing findings of frequency selection for image deblurring.Association for the Advancement of Artificial Intelligence (AAAI), 2023

    Xintian Mao, Yiming Liu, Fengze Liu, Qingli Li, Wei Shen, and Yan Wang. Intriguing findings of frequency selection for image deblurring.Association for the Advancement of Artificial Intelligence (AAAI), 2023

  38. [38]

    Efficient and explicit modelling of image hierarchies for image restoration.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023

    Yawei Li, Yuchen Fan, Xiaoyu Xiang, Denis Demandolx, Rakesh Ranjan, Radu Timofte, and Luc Van Gool. Efficient and explicit modelling of image hierarchies for image restoration.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023

  39. [39]

    Lee, Jonathan Ho, Tim Salimans, David J

    Chitwan Saharia, William Chan, Huiwen Chang, Chris A. Lee, Jonathan Ho, Tim Salimans, David J. Fleet, and Mohammad Norouzi. Palette: Image-to-image diffusion models, 2022

  40. [40]

    Inversion by direct iteration: An alternative to denoising diffusion for image restoration.Transactions on Machine Learning Research (TMLR),

    Mauricio Delbracio and Peyman Milanfar. Inversion by direct iteration: An alternative to denoising diffusion for image restoration.Transactions on Machine Learning Research (TMLR),

  41. [41]

    Theodorou, Weili Nie, and Anima Anandkumar

    Guan-Horng Liu, Arash Vahdat, De-An Huang, Evangelos A. Theodorou, Weili Nie, and Anima Anandkumar. I2sb: Image-to-image schrödinger bridge.International Conference on Machine Learning (ICML), 2023

  42. [42]

    Direct diffusion bridge using data consis- tency for inverse problems.Advances in Neural Information Processing Systems (NeurIPS), 2023

    Hyungjin Chung, Jeongsol Kim, and Jong Chul Ye. Direct diffusion bridge using data consis- tency for inverse problems.Advances in Neural Information Processing Systems (NeurIPS), 2023

  43. [43]

    Hierarchi- cal integration diffusion model for realistic image deblurring.Advances in Neural Information Processing Systems (NeurIPS), 2023

    Zheng Chen, Yulun Zhang, Ding Liu, Bin Xia, Jinjin Gu, Linghe Kong, and Xin Yuan. Hierarchi- cal integration diffusion model for realistic image deblurring.Advances in Neural Information Processing Systems (NeurIPS), 2023

  44. [44]

    Dimakis, and Peyman Milanfar

    Jay Whang, Mauricio Delbracio, Hossein Talebi, Chitwan Saharia, Alexandros G. Dimakis, and Peyman Milanfar. Deblurring via stochastic refinement.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022

  45. [45]

    Gustafsson, Zheng Zhao, Jens Sjölund, and Thomas B

    Ziwei Luo, Fredrik K. Gustafsson, Zheng Zhao, Jens Sjölund, and Thomas B. Schön. Image restoration with mean-reverting stochastic differential equations.International Conference on Machine Learning (ICML), 2023

  46. [46]

    Diffbir: Towards blind image restoration with generative diffusion prior.The European Conference on Computer Vision (ECCV), 2024

    Xinqi Lin, Jingwen He, Ziyan Chen, Zhaoyang Lyu, Bo Dai, Fanghua Yu, Wanli Ouyang, Yu Qiao, and Chao Dong. Diffbir: Towards blind image restoration with generative diffusion prior.The European Conference on Computer Vision (ECCV), 2024

  47. [47]

    Auto-encoding variational bayes.International Confer- ence on Learning Representations (ICLR), 2014

    Diederik P Kingma and Max Welling. Auto-encoding variational bayes.International Confer- ence on Learning Representations (ICLR), 2014

  48. [48]

    Hinet: Half instance normalization network for image restoration.The IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), pages 182–192, 2021

    Liangyu Chen, Xin Lu, Jie Zhang, Xiaojie Chu, and Chengpeng Chen. Hinet: Half instance normalization network for image restoration.The IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), pages 182–192, 2021. 12

  49. [49]

    Image quality assessment: from error visibility to structural similarity.IEEE transactions on image processing, 13(4): 600–612, 2004

    Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity.IEEE transactions on image processing, 13(4): 600–612, 2004

  50. [50]

    The unreason- able effectiveness of deep features as a perceptual metric.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 586–595, 2018

    Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreason- able effectiveness of deep features as a perceptual metric.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 586–595, 2018

  51. [51]

    Simoncelli

    Keyan Ding, Kede Ma, Shiqi Wang, and Eero P. Simoncelli. Image quality assessment: Unifying structure and texture similarity.IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020

  52. [52]

    Chen Change Loy Jianyi Wang, Kelvin C.K. Chan. Exploring clip for assessing the look and feel of images.Association for the Advancement of Artificial Intelligence (AAAI), 2023

  53. [53]

    completely blind

    Anish Mittal, Rajiv Soundararajan, and Alan C Bovik. Making a “completely blind” image quality analyzer.IEEE Signal processing letters, 20(3):209–212, 2012

  54. [54]

    Musiq: Multi-scale image quality transformer.The IEEE International Conference on Computer Vision (ICCV), 2021

    Junjie Ke, Qifei Wang, Yilin Wang, Peyman Milanfar, and Feng Yang. Musiq: Multi-scale image quality transformer.The IEEE International Conference on Computer Vision (ICCV), 2021

  55. [55]

    Maniqa: Multi-dimension attention network for no-reference image quality assessment.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshop, 2022

    Sidi Yang, Tianhe Wu, Shuwei Shi, Shanshan Lao, Yuan Gong, Mingdeng Cao, Jiahao Wang, and Yujiu Yang. Maniqa: Multi-dimension attention network for no-reference image quality assessment.The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshop, 2022

  56. [56]

    Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising.IEEE Transactions on Image Processing, 26(7):3142–3155, 2017

    Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and Lei Zhang. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising.IEEE Transactions on Image Processing, 26(7):3142–3155, 2017

  57. [57]

    Image super-resolution using deep convolutional networks.IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2):295–307, 2016

    Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Image super-resolution using deep convolutional networks.IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2):295–307, 2016

  58. [58]

    Decoupled weight decay regularization.International Conference on Learning Representations (ICLR), 2019

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization.International Conference on Learning Representations (ICLR), 2019. 13 A Broader Impacts and Limitations Broader Impacts.Our method offers several potential societal benefits, including enhanced visual reliability for autonomous driving, robotics, and mobile photography, where motion b...