pith. machine review for the scientific record

arxiv: 2604.04924 · v1 · submitted 2026-04-06 · 💻 cs.CV · cs.AI

Recognition: no theorem link

Your Pre-trained Diffusion Model Secretly Knows Restoration

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 19:03 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords diffusion models · image restoration · prompt embeddings · all-in-one restoration · pre-trained models · diffusion bridge · video restoration

The pith

Pre-trained diffusion models contain inherent restoration behavior that is unlocked by learning prompt embeddings at the text encoder output.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that pre-trained diffusion models already hold the ability to restore degraded images and video. This capacity is accessed by optimizing prompt embeddings directly at the text encoder rather than through text prompts or full model changes. The authors identify that standard prompt learning fails because noising degraded images during training does not match the reverse denoising path used at inference. They fix this misalignment by training the prompts inside a diffusion bridge that creates a consistent trajectory from noisy degraded inputs to clean outputs. The result converts existing models such as WAN and FLUX into effective restoration systems using only lightweight prompt adjustments.
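
The pith's claim has a simple skeleton: freeze every backbone weight and descend only on a small conditioning vector. The sketch below is a toy, not the paper's method — a frozen linear map stands in for the pre-trained diffusion model and text encoder, one degraded/clean pair stands in for training data, and all names and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a frozen pre-trained backbone: a fixed linear map of
# (input, prompt embedding) -> output. Its weights are never updated.
d_in, d_emb = 8, 4
W_x = rng.normal(size=(d_in, d_in)) / np.sqrt(d_in)
W_e = rng.normal(size=(d_in, d_emb))

def backbone(x, e):
    """Frozen 'denoiser': only the prompt embedding e is a free variable."""
    return W_x @ x + W_e @ e

# One degraded/clean pair (placeholders for real images).
x_clean = rng.normal(size=d_in)
x_deg = x_clean + 0.5 * rng.normal(size=d_in)

# Optimize ONLY the prompt embedding, since the backbone is frozen.
loss = lambda e: 0.5 * np.sum((backbone(x_deg, e) - x_clean) ** 2)
e = np.zeros(d_emb)
lr = 1e-2
loss_before = loss(e)
for _ in range(3000):
    grad_e = W_e.T @ (backbone(x_deg, e) - x_clean)  # exact gradient wrt e
    e -= lr * grad_e
loss_after = loss(e)
```

The only point of the toy is the parameter asymmetry: whatever restoration the optimized output exhibits is carried by the frozen map's structure plus a conditioning vector with far fewer parameters than the backbone.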

Core claim

Pre-trained diffusion models inherently possess restoration behavior, which can be unlocked by directly learning prompt embeddings at the output of the text encoder. This behavior is largely inaccessible through text prompts and text-token embedding optimization. Naive prompt learning is unstable because the forward noising process using degraded images is misaligned with the reverse sampling trajectory. Training prompts within a diffusion bridge formulation aligns training and inference dynamics and enforces a coherent denoising path from noisy degraded states to clean images. This converts the pre-trained WAN video model and FLUX image models into high-performing restoration models.

What carries the argument

Learned prompt embeddings at the text encoder output trained inside a diffusion bridge formulation that aligns the forward noising of degraded images with the reverse sampling trajectory.
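
As a sketch of what "bridge alignment" can mean — a generic pinned interpolation in the spirit of DDBM [61] and the EBR-style bridge [21]; the paper's exact construction may differ — training states are drawn on a path whose endpoints are the clean image and the degraded input:

```latex
% Generic bridge interpolation between clean x_0 and degraded x_deg
% (illustrative schedule; not necessarily the paper's exact one).
x_t = (1 - t)\,x_0 + t\,x_{\mathrm{deg}} + \sigma\sqrt{t(1-t)}\,\epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I),\quad t \in [0, 1].
```

Because both endpoints are pinned, the forward corruption seen during prompt training lies on the same trajectory the reverse sampler traverses at inference; naive prompt learning instead applies the model's original forward noising to $x_{\mathrm{deg}}$, whose path need not pass near the clean image at all.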

If this is right

  • The approach delivers competitive performance and generalization across diverse degradations using only prompt changes.
  • Existing pre-trained models can be turned into restoration systems without fine-tuning or added control modules.
  • The method works on both image models such as FLUX and video models such as WAN.
  • Restoration quality remains high while keeping the adaptation lightweight.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same prompt-embedding technique could be tested on other generative tasks to see whether latent capabilities surface without retraining.
  • If the bridge alignment proves general, it might simplify adaptation of large models across many downstream problems.
  • Further checks on additional model families would clarify how widely the hidden restoration behavior exists.

Load-bearing premise

The restoration behavior is truly inherent in the pre-trained model rather than produced by the prompt optimization and bridge method, and the misalignment between noising and sampling is the main source of instability.

What would settle it

An experiment in which the same prompt optimization and bridge formulation fails to produce restoration on a diffusion model that lacks the claimed inherent priors, or succeeds equally well without the bridge, would disprove the central claim.
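
A toy version of that control experiment can be written down directly. In this hedged NumPy sketch (everything here is illustrative, not the paper's setup), two frozen linear "backbones" receive the identical prompt-optimization treatment: one whose prompt directions happen to span the degradation, standing in for a useful pre-trained prior, and one randomly initialized. Only the first can cancel the degradation; an analogous gap, or its absence, on real models is what the control would measure.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, k = 8, 2

d = rng.normal(size=dim)          # fixed additive "degradation"
x_clean = rng.normal(size=dim)
x_deg = x_clean + d

def best_prompt_loss(W_e):
    # Frozen backbone f(x, e) = x + W_e @ e; solve for the prompt e that
    # best cancels the degradation, and report the residual error.
    e, *_ = np.linalg.lstsq(W_e, -d, rcond=None)
    return np.linalg.norm(x_deg + W_e @ e - x_clean)

# "Pre-trained" stand-in: its prompt directions span the degradation.
W_pre = np.column_stack([-d, rng.normal(size=dim)])
# "Random-init" stand-in: generic prompt directions, no restoration prior.
W_rand = rng.normal(size=(dim, k))

loss_pre = best_prompt_loss(W_pre)
loss_rand = best_prompt_loss(W_rand)
```

If the same prompt recipe succeeded equally well on the random-init backbone (here: `loss_rand` comparable to `loss_pre`), the "inherent prior" reading would lose its support — which is exactly the disconfirming outcome described above.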

Figures

Figures reproduced from arXiv: 2604.04924 by Sudarshan Rajagopalan, Vishal M. Patel.

Figure 1
Figure 1: Popular editing-based approaches such as SDEdit [36] and Prompt-to-prompt [18] with Null-text inversion [37] (P2P + NTI) work well for high-level editing but perform poorly for restoration tasks, in this case dehazing.
Figure 2
Figure 2: Text token-space prompting is ineffective for restoration: even with optimized token prompts (textual inversion / prompt tuning), the model tends to denoise without removing degradations, whereas embedding-space optimization enables restoration from the same noisy degraded input.
Figure 3
Figure 3: (a) We freeze the diffusion backbone and optimize only the conditioning: token-space prompt optimization fails, while embedding-space (text-encoder output) optimization elicits restoration. (b) Naive tuning yields states anchored at z_deg; DDBM [61] is pinned at both endpoints; our desired EBR-style [21] bridge starts from noisy degraded inputs and denoises monotonically as the content transitions toward …
Figure 4
Figure 4: Qualitative comparisons of the pre-trained FLUX model using our learned prompts with state-of-the-art AiOR approaches. Our approach enables the pre-trained FLUX to achieve remarkable restoration performance. WBSnow denotes the snow subset of the WeatherBench [15] dataset.
Figure 5
Figure 5: Qualitative comparisons of the pre-trained WAN model using our learned prompts with state-of-the-art AiOR approaches. ViWS-Net and AverNet are video restoration approaches while others are proposed for image restoration. Our prompts elicit the strong restoration potential of the pre-trained WAN model. AAU: AAURainSnow [2], NTU: real test set of NTU-Rain [7].
Figure 6
Figure 6: Although naive prompt training enhances the image, it produces several artifacts due to trajectory mismatch. DDBM with prompt enhances the image marginally. EBR provides best results.
Figure 7
Figure 7: Plot of normalized scores (Sec. 4.4) for determining the best value of T0 across different candidates: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6. A higher score is better.
read the original abstract

Pre-trained diffusion models have enabled significant advancements in All-in-One Restoration (AiOR), offering improved perceptual quality and generalization. However, diffusion-based restoration methods primarily rely on fine-tuning or Control-Net style modules to leverage the pre-trained diffusion model's priors for AiOR. In this work, we show that these pre-trained diffusion models inherently possess restoration behavior, which can be unlocked by directly learning prompt embeddings at the output of the text encoder. Interestingly, this behavior is largely inaccessible through text prompts and text-token embedding optimization. Furthermore, we observe that naive prompt learning is unstable because the forward noising process using degraded images is misaligned with the reverse sampling trajectory. To resolve this, we train prompts within a diffusion bridge formulation that aligns training and inference dynamics, enforcing a coherent denoising path from noisy degraded states to clean images. Building on these insights, we introduce our lightweight learned prompts on the pre-trained WAN video model and FLUX image models, converting them into high-performing restoration models. Extensive experiments demonstrate that our approach achieves competitive performance and generalization across diverse degradations, while avoiding fine-tuning and restoration-specific control modules.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that pre-trained diffusion models inherently possess restoration behavior for all-in-one image/video restoration, which can be unlocked by directly optimizing prompt embeddings at the text-encoder output (rather than via text prompts or token embeddings). Naive prompt learning is unstable due to misalignment between forward noising on degraded inputs and reverse sampling; this is resolved by training inside a diffusion-bridge formulation that enforces coherent denoising paths. The method is applied as lightweight learned prompts to the pre-trained WAN video model and FLUX image model, yielding competitive performance and generalization across degradations without any fine-tuning or restoration-specific control modules.

Significance. If the central claim holds, the work provides a lightweight, fine-tuning-free route to repurpose large pre-trained diffusion models for restoration, which could reduce adaptation costs and improve generalization. The diffusion-bridge alignment and text-encoder-output embedding optimization are practical contributions that avoid ControlNet-style modules.

major comments (2)
  1. [Abstract] Abstract: the claim that pre-trained models 'inherently possess restoration behavior' is load-bearing for the paper's novelty yet rests on empirical observation without a control experiment that applies the identical prompt-optimization recipe and diffusion-bridge formulation to a randomly-initialized network of the same architecture. Without this, the results remain consistent with the bridge simply learning a restoration mapping in embedding space rather than unlocking an intrinsic prior.
  2. [Abstract] The abstract states that 'extensive experiments demonstrate competitive performance' but supplies no quantitative metrics, baselines, ablation tables, or validation details for the diffusion-bridge alignment; this absence makes it impossible to evaluate whether the reported gains depend on the pre-trained weights or on the specific bridge construction.
minor comments (2)
  1. Clarify the precise mathematical definition of the diffusion bridge (e.g., the forward and reverse processes, any additional loss terms, and how alignment between training and inference trajectories is enforced).
  2. Specify the optimization details for the learned prompt embeddings, including the exact loss, learning rate schedule, and number of trainable parameters relative to the full model.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive feedback. We address the major comments point by point below, acknowledging where the manuscript can be strengthened and outlining the planned revisions.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that pre-trained models 'inherently possess restoration behavior' is load-bearing for the paper's novelty yet rests on empirical observation without a control experiment that applies the identical prompt-optimization recipe and diffusion-bridge formulation to a randomly-initialized network of the same architecture. Without this, the results remain consistent with the bridge simply learning a restoration mapping in embedding space rather than unlocking an intrinsic prior.

    Authors: We agree that a control experiment on a randomly initialized network of identical architecture would provide stronger causal evidence that the observed restoration behavior originates from the pre-trained weights rather than being learned entirely by the prompt optimization and bridge. Our current support for the claim rests on (i) the method requiring no fine-tuning of the diffusion model itself, (ii) the failure of naive prompt learning on the same pre-trained models, and (iii) the success of the bridge formulation only when applied to pre-trained checkpoints. We will add a dedicated limitations paragraph in the revised manuscript discussing this point and the computational impracticality of full random-initialization controls at the scale of WAN and FLUX. If space and resources permit, we will also report a small-scale ablation on a smaller pre-trained model versus its randomly initialized counterpart. revision: partial

  2. Referee: [Abstract] The abstract states that 'extensive experiments demonstrate competitive performance' but supplies no quantitative metrics, baselines, ablation tables, or validation details for the diffusion-bridge alignment; this absence makes it impossible to evaluate whether the reported gains depend on the pre-trained weights or on the specific bridge construction.

    Authors: The abstract is written as a high-level summary and therefore omits detailed numbers and tables; all quantitative results, baselines, and ablations (including those isolating the diffusion-bridge alignment) appear in Sections 4 and 5. To address the concern, we will revise the abstract to include one or two concrete performance highlights (e.g., average PSNR/SSIM on standard benchmarks and a brief statement on the bridge ablation) while remaining within the word limit. This will make the abstract more self-contained without duplicating the full experimental section. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical prompt optimization on pre-trained models with bridge alignment

full rationale

The paper presents an empirical method: direct optimization of prompt embeddings at the text-encoder output, combined with a diffusion-bridge formulation to align forward noising on degraded inputs with reverse sampling. The central claim of 'inherent' restoration behavior is supported by observed performance on pre-trained WAN and FLUX models after this training, without any derivation that reduces a result to its own fitted inputs by construction, self-referential definitions, or load-bearing self-citations. No equations or steps in the provided text equate a prediction to a parameter fit or rename a known result as a new derivation. The bridge is introduced as a practical fix for observed misalignment rather than a definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work is primarily empirical with no explicit mathematical axioms, free parameters beyond learned prompts, or invented entities; the diffusion bridge is a formulation choice rather than a new postulate.

pith-pipeline@v0.9.0 · 5491 in / 1159 out tokens · 36882 ms · 2026-05-10T19:03:48.583105+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

64 extracted references · 24 canonical work pages · 6 internal anchors

  1. [1] Hazerd: an outdoor dataset for dehazing algorithms (2017). https://doi.org/10.21227/H2001Q
  2. [2] Bahnsen, C.H., Moeslund, T.B.: AAU RainSnow traffic surveillance dataset (2018). https://doi.org/10.34740/KAGGLE/DSV/105294
  3. [3] Brack, M., Friedrich, F., Kornmeier, K., Tsaban, L., Schramowski, P., Kersting, K., Passos, A.L.: Limitless image editing using text-to-image models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 8861–8870
  4. [4] Cai, J., Gu, S., Zhang, L.: Learning a deep single image contrast enhancer from multi-exposure images. IEEE Transactions on Image Processing 27(4), 2049–2062 (2018)
  5. [5] Cao, M., Wang, X., Qi, Z., Shan, Y., Qie, X., Zheng, Y.: MasaCtrl: Tuning-free mutual self-attention control for consistent image synthesis and editing. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 22560–22570 (2023)
  6. [6] Chen, H., Ren, J., Gu, J., Wu, H., Lu, X., Cai, H., Zhu, L.: Snow removal in video: A new dataset and a novel method. In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV). pp. 13165–13176. IEEE (2023)
  7. [7] Chen, J., Tan, C.H., Hou, J., Chau, L.P., Li, H.: Robust video content alignment and compensation for rain removal in a CNN framework. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 6286–6295 (2018)
  8. [8] Cheng, Z., Zhou, L., Chen, D., Tang, N., Luo, X., Qu, Y.: UniLDiff: Unlocking the power of diffusion priors for all-in-one image restoration. arXiv preprint arXiv:2507.23685 (2025)
  9. [9] Chu, Y., Luo, G., Chen, F.: A real haze video database for haze level evaluation. In: 2021 13th International Conference on Quality of Multimedia Experience (QoMEX). pp. 69–72 (2021). https://doi.org/10.1109/QoMEX51781.2021.9465461
  10. [10] Conde, M.V., Geigle, G., Timofte, R.: InstructIR: High-quality image restoration following human instructions. In: European Conference on Computer Vision. pp. 1–21. Springer (2025)
  11. [11] Deng, S., Ren, W., Yan, Y., Wang, T., Song, F., Cao, X.: Multi-scale separable network for ultra-high-definition video deblurring. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 14030–14039 (2021)
  12. [12] Ding, K., Ma, K., Wang, S., Simoncelli, E.P.: Image quality assessment: Unifying structure and texture similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(5), 2567–2581 (2020)
  13. [13] Fei, B., Lyu, Z., Pan, L., Zhang, J., Yang, W., Luo, T., Zhang, B., Dai, B.: Generative diffusion prior for unified image restoration and enhancement. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 9935–9946 (2023)
  14. [14] Gao, P., Zhuo, L., Liu, C., Du, R., Luo, X., Qiu, L., Zhang, Y., et al.: Lumina-T2X: Transforming text into any modality, resolution, and duration via flow-based large diffusion transformers. arXiv preprint arXiv:2405.05945 (2024)
  15. [15] Guan, Q., Yang, Q., Chen, X., Song, T., Jin, G., Jin, J.: WeatherBench: A real-world benchmark dataset for all-in-one adverse weather image restoration. In: Proceedings of the 33rd ACM International Conference on Multimedia. pp. 12607–12613 (2025)
  16. [16] Guo, Y., Gao, Y., Lu, Y., Zhu, H., Liu, R.W., He, S.: OneRestore: A universal restoration framework for composite degradation. In: European Conference on Computer Vision. pp. 255–272. Springer (2024)
  17. [17] Guo, Y., Xiao, X., Chang, Y., Deng, S., Yan, L.: From sky to the ground: A large-scale benchmark and simple baseline towards real rain removal. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 12097–12107 (October 2023)
  18. [18] Hertz, A., Mokady, R., Tenenbaum, J., Aberman, K., Pritch, Y., Cohen-Or, D.: Prompt-to-prompt image editing with cross attention control. arXiv preprint arXiv:2208.01626 (2022)
  19. [19] Hu, J., Jin, L., Yao, Z., Lu, Y.: Universal image restoration pre-training via degradation classification. arXiv preprint arXiv:2501.15510 (2025)
  20. [20] Jiang, Y., Zhang, Z., Xue, T., Gu, J.: AutoDIR: Automatic all-in-one image restoration with latent diffusion. arXiv preprint arXiv:2310.10123 (2023)
  21. [21] Jinhui, H., Zhu, Z., Hou, J.: Consistency geodesic bridge: Image restoration with pretrained diffusion models. In: The Fourteenth International Conference on Learning Representations
  22. [22] Ju, X., Zeng, A., Bian, Y., Liu, S., Xu, Q.: PnP inversion: Boosting diffusion-based editing with 3 lines of code. In: The Twelfth International Conference on Learning Representations (2023)
  23. [23] Ke, J., Wang, Q., Wang, Y., Milanfar, P., Yang, F.: MUSIQ: Multi-scale image quality transformer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 5148–5157 (2021)
  24. [24] Labs, B.F.: FLUX. https://github.com/black-forest-labs/flux (2024)
  25. [25] Li, B., Ren, W., Fu, D., Tao, D., Feng, D., Zeng, W., Wang, Z.: Benchmarking single-image dehazing and beyond. IEEE Transactions on Image Processing 28(1), 492–505 (2019). https://doi.org/10.1109/TIP.2018.2867951
  26. [26] Li, C., Guo, C., Han, L., Jiang, J., Cheng, M.M., Gu, J., Loy, C.C.: Low-light image and video enhancement using deep learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (2021)
  27. [27] Li, H., Chen, X., Dong, J., Tang, J., Pan, J.: FoundIR: Unleashing million-scale training data to advance foundation models for image restoration. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 12626–12636 (2025)
  28. [28] Li, R., Tan, R.T., Cheong, L.F.: All in one bad weather removal using architectural search. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 3172–3182 (2020). https://doi.org/10.1109/CVPR42600.2020.00324
  29. [29] Lin, W., Wei, X., Zhang, R., Zhuo, L., Zhao, S., Huang, S., Teng, H., Xie, J., Qiao, Y., Gao, P., et al.: PixWizard: Versatile image-to-image visual assistant with open-language instructions. arXiv preprint arXiv:2409.15278 (2024)
  30. [30] Liu, T., Xu, M., Wang, Z.: Removing rain in videos: a large-scale database and a two-stream ConvLSTM approach. In: 2019 IEEE International Conference on Multimedia and Expo (ICME). pp. 664–669. IEEE (2019)
  31. [31] Liu, Y., Ke, Z., Liu, F., Zhao, N., Lau, R.W.: Diff-Plugin: Revitalizing details for diffusion-based low-level tasks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4197–4208 (2024)
  32. [32] Liu, Y.F., Jaw, D.W., Huang, S.C., Hwang, J.N.: DesnowNet: Context-aware deep network for snow removal. IEEE Transactions on Image Processing 27(6), 3064–3073 (2018). https://doi.org/10.1109/TIP.2018.2806202
  33. [33] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)
  34. [34] Mandal, D., Chattopadhyay, S., Tong, G., Chakravarthula, P.: UniCoRN: Latent diffusion-based unified controllable image restoration network across multiple degradations. arXiv preprint arXiv:2503.15868 (2025)
  35. [35] Martin, S., Gagneux, A., Hagemann, P., Steidl, G.: PnP-Flow: Plug-and-play image restoration with flow matching. arXiv preprint arXiv:2410.02423 (2024)
  36. [36] Meng, C., He, Y., Song, Y., Song, J., Wu, J., Zhu, J.Y., Ermon, S.: SDEdit: Guided image synthesis and editing with stochastic differential equations. arXiv preprint arXiv:2108.01073 (2021)
  37. [37] Mokady, R., Hertz, A., Aberman, K., Pritch, Y., Cohen-Or, D.: Null-text inversion for editing real images using guided diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 6038–6047 (2023)
  38. [38] Nah, S., Hyun Kim, T., Mu Lee, K.: Deep multi-scale convolutional neural network for dynamic scene deblurring. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 3883–3891 (2017)
  39. [39] Podell, D., English, Z., Lacey, K., Blattmann, A., Dockhorn, T., Müller, J., Penna, J., Rombach, R.: SDXL: Improving latent diffusion models for high-resolution image synthesis. arXiv preprint arXiv:2307.01952 (2023)
  40. [40] Potlapalli, V., Zamir, S.W., Khan, S.H., Shahbaz Khan, F.: PromptIR: Prompting for all-in-one image restoration. Advances in Neural Information Processing Systems 36 (2024)
  41. [41] Rajagopalan, S., Nair, N.G., Paranjape, J.N., Patel, V.M.: GenDeg: Diffusion-based degradation synthesis for generalizable all-in-one image restoration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 28144–28154 (2025)
  42. [42] Rajagopalan, S., Narayan, K., Patel, V.M.: RestoreVAR: Visual autoregressive generation for all-in-one image restoration. arXiv preprint arXiv:2505.18047 (2025)
  43. [43] Rajagopalan, S., Patel, V.M.: AWRaCLe: All-weather image restoration using visual in-context learning. arXiv preprint arXiv:2409.00263 (2024)
  44. [44] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10684–10695 (2022)
  45. [45] Song, J., Meng, C., Ermon, S.: Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502 (2020)
  46. [46] Tian, X., Liao, X., Liu, X., Li, M., Ren, C.: Degradation-aware feature perturbation for all-in-one image restoration. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 28165–28175 (2025)
  47. [47] Tumanyan, N., Geyer, M., Bagon, S., Dekel, T.: Plug-and-play diffusion features for text-driven image-to-image translation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1921–1930 (2023)
  48. [48] Valanarasu, J.J., Yasarla, R., Patel, V.M.: TransWeather: Transformer-based restoration of images degraded by adverse weather conditions. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 2343–2353 (2022)
  49. [49] Wan, T., Wang, A., Ai, B., Wen, B., Mao, C., Xie, C.W., Chen, D., Yu, F., Zhao, H., Yang, J., et al.: Wan: Open and advanced large-scale video generative models. arXiv preprint arXiv:2503.20314 (2025)
  50. [50] Wang, J., Chan, K.C., Loy, C.C.: Exploring CLIP for assessing the look and feel of images. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 37, pp. 2555–2563 (2023)
  51. [51] Wang, R., Xu, X., Fu, C.W., Lu, J., Yu, B., Jia, J.: Seeing dynamic scene in the dark: A high-quality video dataset with mechatronic alignment. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 9700–9709 (2021)
  52. [52] Wei, C., Wang, W., Yang, W., Liu, J.: Deep Retinex decomposition for low-light enhancement. arXiv preprint arXiv:1808.04560 (2018)
  53. [53] Wu, H., Zhang, E., Liao, L., Chen, C., Hou, J., Wang, A., Sun, W., Yan, Q., Lin, W.: Exploring video quality assessment on user generated contents from aesthetic and technical perspectives. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 20144–20154 (2023)
  54. [54] Yang, Y., Aviles-Rivero, A.I., Fu, H., Liu, Y., Wang, W., Zhu, L.: Video adverse-weather-component suppression network via weather messenger and adversarial backpropagation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 13200–13210 (2023)
  55. [55] Zamir, S.W., Arora, A., Khan, S., Hayat, M., Khan, F.S., Yang, M.H., Shao, L.: Multi-stage progressive image restoration. In: CVPR (2021)
  56. [56] Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 586–595 (2018)
  57. [57] Zhang, X., Dong, H., Pan, J., Zhu, C., Tai, Y., Wang, C., Li, J., Huang, F., Wang, F.: Learning to restore hazy video: A new real-world dataset and a new method. In: CVPR. pp. 9239–9248 (2021)
  58. [58] Zhang, Y., Zhang, H., Chai, X., Cheng, Z., Xie, R., Song, L., Zhang, W.: Diff-Restorer: Unleashing visual prompts for diffusion-based universal image restoration. IEEE Transactions on Circuits and Systems for Video Technology (2025)
  59. [59] Zhao, H., Tian, L., Xiao, X., Hu, P., Gou, Y., Peng, X.: AverNet: All-in-one video restoration for time-varying unknown degradations. Advances in Neural Information Processing Systems 37, 127296–127316 (2024)
  60. [60] Zheng, K., He, G., Chen, J., Bao, F., Zhu, J.: Diffusion bridge implicit models. arXiv preprint arXiv:2405.15885 (2024)
  61. [61] Zhou, L., Lou, A., Khanna, S., Ermon, S.: Denoising diffusion bridge models. arXiv preprint arXiv:2309.16948 (2023)
  62. [62] Zhou, S., Li, C., Change Loy, C.: LEDNet: Joint low-light enhancement and deblurring in the dark. In: European Conference on Computer Vision. pp. 573–589. Springer (2022)
  63. [63] Zhou, Y., Ren, D., Emerton, N., Lim, S., Large, T.: Image restoration for under-display camera. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 9179–9188 (2021)
  64. [64] Zhuo, L., Du, R., Xiao, H., Li, Y., Liu, D., Huang, R., Liu, W., Zhu, X., Wang, F.Y., Ma, Z., et al.: Lumina-Next: Making Lumina-T2X stronger and faster with Next-DiT. Advances in Neural Information Processing Systems 37, 131278–131315 (2024)