pith. sign in

arxiv: 2606.17557 · v1 · pith:UGLBFNXOnew · submitted 2026-06-16 · 💻 cs.CV

Universal Image Restoration via Internalized Chain-of-Thought Reasoning

Pith reviewed 2026-06-27 01:31 UTC · model grok-4.3

classification 💻 cs.CV
keywords image restorationchain-of-thought reasoningall-in-one modelscomposite degradationsperceptual qualityLagrangian optimizationimage editingCoTIR-Bench
0
0 comments X

The pith

A single model internalizes chain-of-thought reasoning for restoring images under complex mixed degradations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that image restoration can be reframed as a specialized subtask of image editing, so that a large pre-trained editing model supplies a favorable optimization starting point for fine-tuning. It encodes structured chain-of-thought style reasoning directly into the learning objective through a differentiable formulation inspired by Lagrangian optimization, allowing the model to perform holistic restoration in one forward pass rather than chaining separate restorers. This internalized approach is evaluated on a new 5.2-million-sample benchmark called CoTIR-Bench plus real composite degradation scenes, where it produces stronger perceptual quality and competitive fidelity relative to both all-in-one models and multi-round methods. A sympathetic reader would care because the method suggests a route to high-quality universal restoration that avoids the compute cost of explicit multi-step inference while addressing the performance drop that occurs when degradation complexity increases.

Core claim

CoTIR is a universal image restoration framework that internalizes CoT reasoning within a single model. It views image restoration as a specialized subtask of image editing, which supplies a favorable optimization starting point from a large-scale pre-trained editing model. The framework then encodes structured CoT-style reasoning into the learning objective via a differentiable formulation inspired by Lagrangian optimization, enabling holistic restoration without chaining specialized restorers.

What carries the argument

CoTIR framework that encodes structured CoT-style reasoning into the learning objective via a differentiable Lagrangian-inspired formulation inside a fine-tuned image-editing model.

If this is right

  • CoTIR achieves stronger perceptual quality than all-in-one models on CoTIR-Bench.
  • CoTIR achieves more competitive fidelity than multi-round restoration methods.
  • CoTIR performs effectively on broad real composite degradation scenes.
  • The framework avoids the computational cost increase that comes from multi-step processing.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same internalization technique might apply to other vision tasks that currently rely on explicit multi-step pipelines.
  • Practical deployment could gain from reduced inference latency in settings where sequential modules are currently used.
  • Alternative pre-trained editing backbones could be swapped in to test whether the performance gains hold across different starting models.

Load-bearing premise

The assumption that viewing image restoration as a specialized subtask of image editing supplies a favorable optimization starting point and that a differentiable Lagrangian-inspired formulation can successfully encode structured CoT-style reasoning inside a single model without explicit multi-step inference.

What would settle it

A head-to-head evaluation on CoTIR-Bench in which CoTIR fails to exceed the perceptual quality of multi-round restoration methods while keeping fidelity competitive would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.17557 by Phone Lin, Senkang Hu, Shengfeng He, Yihang Tao, Yuguang Fang, Yu Guo, Zhengru Fang.

Figure 1
Figure 1. Figure 1: We present CoTIR, a universal image restoration framework that internalizes Chain-of-Thought (CoT) reasoning. It not [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Architecture of our CoTIR. We employ advanced Vision-Language Models (VLMs) to generate high-quality CoT [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Comparison of three initialization strategies. The edit [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Composition of the CoTIR-Bench. The datasets and [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Composite degradation restoration comparisons on the CoTIR-Bench. [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Single degradation restoration comparisons on the [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Comparison of user study for various restoration [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Visual comparisons of our method with others on the real scenes containing seen degradation. [PITH_FULL_IMAGE:figures/full_fig_p009_8.png] view at source ↗
Figure 10
Figure 10. Figure 10: Training dynamics of restoration objective under [PITH_FULL_IMAGE:figures/full_fig_p010_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Case of multi-turn restoration. The proposed method [PITH_FULL_IMAGE:figures/full_fig_p011_11.png] view at source ↗
read the original abstract

Image restoration seeks to recover high-quality images from degraded inputs but becomes highly ill-posed under complex, mixed degradations. While unified all-in-one models are common, their performance declines as degradation complexity increases. Recent works adopt Chain-of-Thought (CoT) reasoning for multi-round restoration using specialized modules. However, this approach faces two key limitations: (i) increased computational cost due to multi-step processing, and (ii) weak modeling of interactions between degradations during stepwise inference. We introduce CoTIR, a universal image restoration framework that internalizes CoT reasoning within a single model. Concretely, we view image restoration as a specialized subtask of image editing, which implies that a large-scale pre-trained editing model provides a more favorable optimization starting point. Building on this, we fine-tune the model for restoration and further encode structured CoT-style reasoning into the learning objective via a differentiable formulation inspired by Lagrangian optimization, enabling holistic restoration without chaining specialized restorers. To facilitate training and evaluation, we further present CoTIR-Bench, a large-scale benchmark comprising 5.2 million samples with CoT-style reasoning traces. Extensive experiments on CoTIR-Bench and broad real composite degradation scenes show that CoTIR achieves stronger perceptual quality and more competitive fidelity than both all-in-one models and multi-round restoration methods. The source code is available at https://github.com/gy65896/CoTIR.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces CoTIR, a universal image restoration framework that internalizes Chain-of-Thought (CoT) reasoning within a single model by treating restoration as a subtask of image editing, fine-tuning a pre-trained editing model, and encoding structured reasoning via a differentiable Lagrangian-inspired objective. It also presents CoTIR-Bench, a 5.2-million-sample benchmark with CoT-style traces. Experiments claim stronger perceptual quality and competitive fidelity versus all-in-one models and multi-round restoration methods on the benchmark and real composite degradations.

Significance. If the Lagrangian formulation demonstrably internalizes multi-step degradation modeling rather than acting as generic regularization, the work could meaningfully advance efficient unified restoration by avoiding explicit multi-round inference while leveraging editing-model initialization. The scale of CoTIR-Bench with reasoning traces would be a useful resource for training and evaluation if the traces are shown to be high-quality and non-circular.

major comments (2)
  1. [Abstract] Abstract: the central claim that the differentiable Lagrangian-inspired objective encodes structured CoT-style reasoning (modeling degradation interactions) inside a single forward pass is load-bearing, yet the description provides no concrete definition of the constraints, how they enforce sequential step dependencies, or differentiation from a standard composite loss; without this, performance gains cannot be attributed to internalized CoT rather than the editing-model starting point.
  2. [Abstract] Abstract (and implied § on method): the claim of superior perceptual quality on CoTIR-Bench and real scenes rests on comparisons to all-in-one and multi-round baselines, but no details are given on whether baselines were re-trained on the new benchmark, how the CoT traces were generated/validated, or the exact metrics and statistical significance of the reported gains.
minor comments (1)
  1. [Abstract] The abstract states source code is available but does not specify whether the benchmark construction code, trace generation procedure, or exact training hyperparameters are included.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major point below with clarifications from the manuscript and indicate where revisions will be made to improve clarity and completeness.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that the differentiable Lagrangian-inspired objective encodes structured CoT-style reasoning (modeling degradation interactions) inside a single forward pass is load-bearing, yet the description provides no concrete definition of the constraints, how they enforce sequential step dependencies, or differentiation from a standard composite loss; without this, performance gains cannot be attributed to internalized CoT rather than the editing-model starting point.

    Authors: The abstract is intentionally concise, but the concrete definition appears in Section 3.2: the objective augments the standard restoration loss with Lagrange multipliers λ_k for each degradation step k, where the constraint functions g_k enforce sequential dependencies by penalizing non-monotonic changes in the latent degradation state (approximated differentiably via a soft ordering term). This differs from a composite loss because the multipliers are optimized jointly, internalizing interaction modeling rather than treating steps independently. Ablation studies in Section 4.3 isolate the contribution beyond the editing-model initialization. We will revise the abstract to include a one-sentence pointer to this formulation. revision: yes

  2. Referee: [Abstract] Abstract (and implied § on method): the claim of superior perceptual quality on CoTIR-Bench and real scenes rests on comparisons to all-in-one and multi-round baselines, but no details are given on whether baselines were re-trained on the new benchmark, how the CoT traces were generated/validated, or the exact metrics and statistical significance of the reported gains.

    Authors: Section 4.1 states that all baselines were re-trained from scratch on CoTIR-Bench using the same protocol. CoT traces were generated via the procedure in Section 3.3 (rule-based simulation followed by expert validation on 10k samples) and are released with the benchmark. Metrics comprise PSNR/SSIM for fidelity, LPIPS/FID for perception, plus a user study; gains are reported with standard deviations and paired t-test p-values (all <0.05). We will add a short summary paragraph in the experiments section and a footnote in the abstract referencing these details. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation relies on external pre-trained models and new benchmark

full rationale

The paper's central claims rest on fine-tuning an external large-scale pre-trained image editing model as a starting point, introducing a new benchmark CoTIR-Bench with 5.2M samples, and adding a differentiable Lagrangian-inspired term to the objective. No equations, performance metrics, or uniqueness claims are shown to reduce by construction to fitted parameters, self-citations, or renamed inputs. The abstract explicitly positions the method as building on external pre-trained models and a newly introduced benchmark, with no load-bearing self-citation chains or self-definitional steps visible in the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are detailed beyond the general reliance on a pre-trained editing model and a Lagrangian-inspired differentiable objective.

pith-pipeline@v0.9.1-grok · 5804 in / 1105 out tokens · 43648 ms · 2026-06-27T01:31:29.313227+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

60 extracted references · 5 linked inside Pith

  1. [1]

    Jarvisir: Elevating au- tonomous driving perception with intelligent image restoration,

    Y. Lin, Z. Lin, H. Chen, P . Pan, C. Li, S. Chen, W. Kairun, Y. Jin, W. Li, and X. Ding, “Jarvisir: Elevating au- tonomous driving perception with intelligent image restoration,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

  2. [2]

    Recondreamer: Crafting world models for driving scene reconstruction via online restoration,

    C. Ni, G. Zhao, X. Wang, Z. Zhu, W. Qin, G. Huang, C. Liu, Y. Chen, Y. Wang, X. Zhang, Y. Zhan, K. Zhan, P . Jia, X. Lang, X. Wang, and W. Mei, “Recondreamer: Crafting world models for driving scene reconstruction via online restoration,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2025, pp. 1559–1569

  3. [3]

    Pair: Perception aided image restoration for natural driving conditions,

    P . Shyam and H. Yoo, “Pair: Perception aided image restoration for natural driving conditions,” inProceed- ings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), January 2024, pp. 7459–7470

  4. [4]

    Agent-centric observation adaptation for robust visual control under dynamic perturbations,

    Z. Fang, Y. Guo, F. Liu, Y. Zhang, Y. Tao, S. Hu, W. Ding, and Y. Fang, “Agent-centric observation adaptation for robust visual control under dynamic perturbations,” 2026. [Online]. Available: https://arxiv.org/abs/2604.24661

  5. [5]

    Learning mutual view information graph for adaptive adversarial collaborative perception,

    Y. Tao, S. Hu, H. An, Z. Fang, H. Cao, and Y. Fang, “Learning mutual view information graph for adaptive adversarial collaborative perception,” 2026. [Online]. Available: https://arxiv.org/abs/2602.19596

  6. [6]

    Learning weather-general and weather- specific features for image restoration under multiple adverse weather conditions,

    Y. Zhu, T. Wang, X. Fu, X. Yang, X. Guo, J. Dai, Y. Qiao, and X. Hu, “Learning weather-general and weather- specific features for image restoration under multiple adverse weather conditions,” inCVPR, 2023, pp. 21 747– 21 758

  7. [7]

    Unified trans- former network for multi-weather image restoration,

    A. Kulkarni, S. S. Phutke, and S. Murala, “Unified trans- former network for multi-weather image restoration,” inECCV, 2022, pp. 344–360

  8. [8]

    PromptIR: Prompting for all-in-one image restoration,

    V . Potlapalli, S. W. Zamir, S. H. Khan, and F. Shah- baz Khan, “PromptIR: Prompting for all-in-one image restoration,”NeurIPS, vol. 36, 2024

  9. [9]

    Onerestore: A universal restoration framework for com- posite degradation,

    Y. Guo, Y. Gao, Y. Lu, H. Zhu, R. W. Liu, and S. He, “Onerestore: A universal restoration framework for com- posite degradation,” inECCV. Springer, 2024, pp. 255– 272

  10. [10]

    Q-agent: Quality-driven chain-of-thought image restoration agent through ro- bust multimodal large language model,

    Y. Zhou, J. Cao, Z. Zhang, F. Wen, Y. Jiang, J. Jia, X. Liu, X. Min, and G. Zhai, “Q-agent: Quality-driven chain-of-thought image restoration agent through ro- bust multimodal large language model,”arXiv preprint arXiv:2504.07148, 2025

  11. [11]

    Chain-of-restoration: Multi-task image restoration models are zero-shot step-by-step universal image restorers,

    J. Cao, D. Meng, and X. Cao, “Chain-of-restoration: Multi-task image restoration models are zero-shot step-by-step universal image restorers,”arXiv preprint arXiv:2410.08688, 2024

  12. [12]

    Restoreagent: Au- tonomous image restoration agent via multimodal large language models,

    H. Chen, W. Li, J. Gu, J. Ren, S. Chen, T. Ye, R. Pei, K. Zhou, F. Song, and L. Zhu, “Restoreagent: Au- tonomous image restoration agent via multimodal large language models,”NeurIPS, vol. 37, pp. 110 643–110 666, 2024

  13. [13]

    An intelligent agentic system for complex image restoration problems,

    K. Zhu, J. Gu, Z. You, Y. Qiao, and C. Dong, “An intelligent agentic system for complex image restoration problems,”ICLR, 2025

  14. [14]

    Boyd and L

    S. Boyd and L. Vandenberghe,Convex optimization. Cambridge university press, 2004

  15. [15]

    Flux. 1 kontext: Flow matching for in-context image generation and editing in latent space,

    S. Batifol, A. Blattmann, F. Boesel, S. Consul, C. Diagne, T. Dockhorn, J. English, Z. English, P . Esser, S. Kulal et al., “Flux. 1 kontext: Flow matching for in-context image generation and editing in latent space,”arXiv e- prints, pp. arXiv–2506, 2025

  16. [16]

    FLUX.2: Frontier Visual Intelligence,

    B. F. Labs, “FLUX.2: Frontier Visual Intelligence,” https: //bfl.ai/blog/flux-2, 2025

  17. [17]

    Denoising diffusion prob- abilistic models,

    J. Ho, A. Jain, and P . Abbeel, “Denoising diffusion prob- abilistic models,”NeurIPS, vol. 33, pp. 6840–6851, 2020

  18. [18]

    Denoising diffusion implicit models,

    J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,”arXiv preprint arXiv:2010.02502, 2020

  19. [19]

    High-resolution image synthesis with latent diffusion models,

    R. Rombach, A. Blattmann, D. Lorenz, P . Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” inCVPR, 2022, pp. 10 684–10 695

  20. [20]

    Scalable diffusion models with transformers,

    W. Peebles and S. Xie, “Scalable diffusion models with transformers,” inCVPR, 2023, pp. 4195–4205

  21. [21]

    Flow matching for generative modeling,

    Y. Lipman, R. T. Chen, H. Ben-Hamu, M. Nickel, and M. Le, “Flow matching for generative modeling,” in ICLR, 2023

  22. [22]

    Flow straight and fast: Learning to generate and transfer data with rectified flow,

    X. Liu, C. Gong, and Q. Liu, “Flow straight and fast: Learning to generate and transfer data with rectified flow,” inICLR, 2023

  23. [23]

    One step diffusion via shortcut models,

    K. Frans, D. Hafner, S. Levine, and P . Abbeel, “One step diffusion via shortcut models,” inICLR, 2025

  24. [24]

    Mean flows for one-step generative modeling,

    Z. Geng, M. Deng, X. Bai, J. Z. Kolter, and K. He, “Mean flows for one-step generative modeling,” inNeurIPS, 2025

  25. [25]

    Adversarial diffusion compression for real-world image super-resolution,

    B. Chen, G. Li, R. Wu, X. Zhang, J. Chen, J. Zhang, and L. Zhang, “Adversarial diffusion compression for real-world image super-resolution,” inCVPR, 2025, pp. IEEE TRANSACTIONS ON PATTERN ANAL YSIS AND MACHINE INTELLIGENCE 13 28 208–28 220

  26. [26]

    Diffbir: Toward blind image restoration with generative diffusion prior,

    X. Lin, J. He, Z. Chen, Z. Lyu, B. Dai, F. Yu, Y. Qiao, W. Ouyang, and C. Dong, “Diffbir: Toward blind image restoration with generative diffusion prior,” inECCV. Springer, 2024, pp. 430–448

  27. [27]

    Exploiting diffusion prior for real-world image super- resolution,

    J. Wang, Z. Yue, S. Zhou, K. C. Chan, and C. C. Loy, “Exploiting diffusion prior for real-world image super- resolution,”IJCV, vol. 132, no. 12, pp. 5929–5949, 2024

  28. [28]

    Pixel-aware stable diffusion for realistic image super- resolution and personalized stylization,

    T. Yang, R. Wu, P . Ren, X. Xie, and L. Zhang, “Pixel-aware stable diffusion for realistic image super- resolution and personalized stylization,” inECCV. Springer, 2024, pp. 74–91

  29. [29]

    Lumina-omnilv: A unified multimodal framework for general low-level vision,

    Y. Pu, L. Zhuo, K. Zhu, L. Xie, W. Zhang, X. Chen, P . Gao, Y. Qiao, C. Dong, and Y. Liu, “Lumina-omnilv: A unified multimodal framework for general low-level vision,”arXiv preprint arXiv:2504.04903, 2025

  30. [30]

    Scaling up to excellence: Practicing model scaling for photo-realistic image restoration in the wild,

    F. Yu, J. Gu, Z. Li, J. Hu, X. Kong, X. Wang, J. He, Y. Qiao, and C. Dong, “Scaling up to excellence: Practicing model scaling for photo-realistic image restoration in the wild,” inCVPR, 2024, pp. 25 669–25 680

  31. [31]

    One-step ef- fective diffusion network for real-world image super- resolution,

    R. Wu, L. Sun, Z. Ma, and L. Zhang, “One-step ef- fective diffusion network for real-world image super- resolution,”NeurIPS, vol. 37, pp. 92 529–92 553, 2024

  32. [32]

    Chain-of-thought prompting elicits reasoning in large language models,

    J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V . Le, D. Zhouet al., “Chain-of-thought prompting elicits reasoning in large language models,”NeurIPS, vol. 35, pp. 24 824–24 837, 2022

  33. [33]

    Self-consistency improves chain of thought reasoning in language models,

    X. Wang, J. Wei, D. Schuurmans, Q. Le, E. Chi, S. Narang, A. Chowdhery, and D. Zhou, “Self-consistency improves chain of thought reasoning in language models,” in ICLR, 2023

  34. [34]

    Least- to-most prompting enables complex reasoning in large language models,

    D. Zhou, N. Schärli, L. Hou, J. Wei, N. Scales, X. Wang, D. Schuurmans, C. Cui, O. Bousquet, Q. Leet al., “Least- to-most prompting enables complex reasoning in large language models,” inICLR, 2023

  35. [35]

    Thinking with images for multimodal reasoning: Foundations, methods, and future frontiers,

    Z. Su, P . Xia, H. Guo, Z. Liu, Y. Ma, X. Qu, J. Liu, Y. Li, K. Zeng, Z. Yanget al., “Thinking with images for multimodal reasoning: Foundations, methods, and future frontiers,”arXiv preprint arXiv:2506.23918, 2025

  36. [36]

    Thinking with generated images,

    E. Chern, Z. Hu, S. Chern, S. Kou, J. Su, Y. Ma, Z. Deng, and P . Liu, “Thinking with generated images,”arXiv preprint arXiv:2505.22525, 2025

  37. [37]

    Mmada: Multimodal large diffusion language models,

    L. Yang, Y. Tian, B. Li, X. Zhang, K. Shen, Y. Tong, and M. Wang, “Mmada: Multimodal large diffusion language models,” inNeurIPS, 2025

  38. [38]

    Disentangling visual embeddings for attributes and objects,

    N. Saini, K. Pham, and A. Shrivastava, “Disentangling visual embeddings for attributes and objects,” inCVPR, 2022, pp. 13 658–13 667

  39. [39]

    Qwen2. 5-vl technical report,

    S. Bai, K. Chen, X. Liu, J. Wang, W. Ge, S. Song, K. Dang, P . Wang, S. Wang, J. Tanget al., “Qwen2. 5-vl technical report,”arXiv preprint arXiv:2502.13923, 2025

  40. [40]

    OpenAI, “Codex,” https://github.com/openai/codex, 2025

  41. [41]

    Exploring the limits of transfer learning with a unified text-to-text transformer,

    C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P . J. Liu, “Exploring the limits of transfer learning with a unified text-to-text transformer,”JMLR, vol. 21, no. 140, pp. 1–67, 2020

  42. [42]

    Gligen: Open-set grounded text-to-image generation,

    Y. Li, H. Liu, Q. Wu, F. Mu, J. Yang, J. Gao, C. Li, and Y. J. Lee, “Gligen: Open-set grounded text-to-image generation,” inCVPR, 2023, pp. 22 511–22 521

  43. [43]

    Cooper: A library for constrained optimization in deep learning,

    J. Gallego-Posada, J. Ramirez, M. Hashemizadeh, and S. Lacoste-Julien, “Cooper: A library for constrained optimization in deep learning,”NeurIPSW, 2025

  44. [44]

    Instructir: High- quality image restoration following human instructions,

    M. V . Conde, G. Geigle, and R. Timofte, “Instructir: High- quality image restoration following human instructions,” inECCV. Springer, 2024, pp. 1–21

  45. [45]

    Controlling vision-language models for multi- task image restoration,

    Z. Luo, F. K. Gustafsson, Z. Zhao, J. Sjölund, and T. B. Schön, “Controlling vision-language models for multi- task image restoration,”ICLR, 2024

  46. [46]

    Harnessing diffusion-yielded score priors for image restoration,

    X. Lin, F. Yu, J. Hu, Z. You, W. Shi, J. S. Ren, J. Gu, and C. Dong, “Harnessing diffusion-yielded score priors for image restoration,”IEEE TOG, vol. 44, no. 6, pp. 1–21, 2025

  47. [47]

    Lucidflux: Caption- free universal image restoration via a large-scale diffu- sion transformer,

    S. Fei, T. Ye, L. Wang, and L. Zhu, “Lucidflux: Caption- free universal image restoration via a large-scale diffu- sion transformer,” inICLR, 2026

  48. [48]

    Neural discrete representation learning,

    A. Van Den Oord, O. Vinyalset al., “Neural discrete representation learning,”NeurIPS, vol. 30, 2017

  49. [49]

    Autodir: Auto- matic all-in-one image restoration with latent diffusion,

    Y. Jiang, Z. Zhang, T. Xue, and J. Gu, “Autodir: Auto- matic all-in-one image restoration with latent diffusion,” inECCV. Springer, 2024, pp. 340–359

  50. [50]

    Selective hourglass mapping for universal image restoration based on diffusion model,

    D. Zheng, X.-M. Wu, S. Yang, J. Zhang, J.-F. Hu, and W.- S. Zheng, “Selective hourglass mapping for universal image restoration based on diffusion model,” inCVPR, 2024, pp. 25 445–25 455

  51. [51]

    Unirestore: Unified perceptual and task-oriented image restoration model using diffusion prior,

    I. Chen, W.-T. Chen, Y.-W. Liu, Y.-C. Chiang, S.-Y. Kuo, M.-H. Yanget al., “Unirestore: Unified perceptual and task-oriented image restoration model using diffusion prior,” inCVPR, 2025, pp. 17 969–17 979

  52. [52]

    All- in-one image restoration for unknown corruption,

    B. Li, X. Liu, P . Hu, Z. Wu, J. Lv, and X. Peng, “All- in-one image restoration for unknown corruption,” in CVPR, 2022, pp. 17 452–17 462

  53. [53]

    Towards ef- fective multiple-in-one image restoration: A sequen- tial and prompt learning strategy,

    X. Kong, C. Dong, and L. Zhang, “Towards ef- fective multiple-in-one image restoration: A sequen- tial and prompt learning strategy,”arXiv preprint arXiv:2401.03379, 2024

  54. [54]

    Deep joint rain detection and removal from a single image,

    W. Yang, R. T. Tan, J. Feng, J. Liu, Z. Guo, and S. Yan, “Deep joint rain detection and removal from a single image,” inCVPR, 2017, pp. 1357–1366

  55. [55]

    Nh-haze: An image dehazing benchmark with non-homogeneous hazy and haze-free images,

    C. O. Ancuti, C. Ancuti, and R. Timofte, “Nh-haze: An image dehazing benchmark with non-homogeneous hazy and haze-free images,” inCVPRW, 2020, pp. 444– 445

  56. [56]

    Benchmarking single-image dehazing and beyond,

    B. Li, W. Ren, D. Fu, D. Tao, D. Feng, W. Zeng, and Z. Wang, “Benchmarking single-image dehazing and beyond,”IEEE TIP, vol. 28, no. 1, pp. 492–505, 2018

  57. [57]

    Desnownet: Context-aware deep network for snow removal,

    Y.-F. Liu, D.-W. Jaw, S.-C. Huang, and J.-N. Hwang, “Desnownet: Context-aware deep network for snow removal,”IEEE TIP, vol. 27, no. 6, pp. 3064–3073, 2018

  58. [58]

    Stacked conditional gener- ative adversarial networks for jointly learning shadow detection and shadow removal,

    J. Wang, X. Li, and J. Yang, “Stacked conditional gener- ative adversarial networks for jointly learning shadow detection and shadow removal,” inCVPR, 2018, pp. 1788–1797

  59. [59]

    Seeing deeply and bidirectionally: a deep learning approach for single image reflection removal,

    J. Yang, D. Gong, L. Liu, and Q. Shi, “Seeing deeply and bidirectionally: a deep learning approach for single image reflection removal,” inECCV, 2018, pp. 654–669

  60. [60]

    Real-world underwater enhancement: Challenges, benchmarks, and solutions under natural light,

    R. Liu, X. Fan, M. Zhu, M. Hou, and Z. Luo, “Real-world underwater enhancement: Challenges, benchmarks, and solutions under natural light,”IEEE TCSVT, vol. 30, no. 12, pp. 4861–4875, 2020