
arxiv: 2606.28971 · v1 · pith:Y5HUYPI4new · submitted 2026-06-27 · 💻 cs.CV

Self-Evolving Agentic Image Restoration via Deliberate Planning and Intuitive Execution

Pith reviewed 2026-06-30 09:37 UTC · model grok-4.3

classification 💻 cs.CV
keywords image restoration · agentic frameworks · Monte Carlo Tree Search · episodic memory · dual-process systems · self-evolving memory · sequential decision making · multimodal large language models

The pith

SEAR restores images by using tree search for long-horizon planning and distilling successful trajectories into reusable memory for future cases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that agentic image restoration currently relies too heavily on short-sighted greedy choices and forgets what worked on earlier similar images. SEAR models the process as a sequence of decisions in which one module plans several steps ahead while the other recalls and adapts prior solutions. A reader would care because real-world photos often contain mixed, unknown degradations that defeat single-shot or memory-less methods. If the dual approach holds, systems could improve over repeated use without paying the full planning cost each time.

Core claim

SEAR formulates restoration as sequential decision-making with a Deliberate Planner that runs Pruning-Aware Monte Carlo Tree Search guided by a hybrid no-reference reward and an MLLM-based tournament, paired with an Intuitive Executor that indexes self-evolving episodic memory by degradation-aware state fingerprints to convert expensive search paths into adaptive expertise.

What carries the argument

The dual-process structure of a Deliberate Planner using Pruning-Aware Monte Carlo Tree Search and an Intuitive Executor maintaining self-evolving episodic memory indexed by degradation-aware state fingerprints.
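
The control flow implied by this split can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `EpisodicMemory` class, the cosine-similarity lookup, the `threshold` value, and the `planner` callable are all hypothetical stand-ins, and the fingerprint is assumed to be a plain numeric vector.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class EpisodicMemory:
    """Hypothetical memory: stores (fingerprint, tool sequence) pairs."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # assumed similarity cut-off
        self.entries = []

    def store(self, fingerprint, tool_sequence):
        self.entries.append((list(fingerprint), list(tool_sequence)))

    def recall(self, fingerprint):
        """Return the closest stored plan, or None if nothing is similar enough."""
        best = max(self.entries,
                   key=lambda e: cosine(fingerprint, e[0]),
                   default=None)
        if best is not None and cosine(fingerprint, best[0]) >= self.threshold:
            return best[1]
        return None

def restore_plan(fingerprint, memory, planner):
    """Try the fast memory path first; fall back to the slow planner."""
    plan = memory.recall(fingerprint)
    if plan is not None:
        return plan, "executor"
    plan = planner(fingerprint)        # expensive search (stand-in)
    memory.store(fingerprint, plan)    # distill the result for reuse
    return plan, "planner"
```

In this sketch the planner is paid for only on the first encounter with a given kind of degradation; later images with a similar fingerprint reuse the stored tool sequence.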

If this is right

  • Long-horizon search balances exploration against exploitation when choosing restoration tools.
  • Expensive planning trajectories are compressed into memory that future similar images can reuse.
  • Cold-start costs decline as the memory grows across multiple restoration tasks.
  • Hybrid rewards and tournament judging reduce the chance that the planner optimizes the wrong metric.
  • Quantitative and perceptual scores improve on both synthetic and real-world test sets.
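
The exploration-versus-exploitation balance in the first bullet is commonly handled in Monte Carlo Tree Search with the standard UCB1 rule. The sketch below shows plain UCB1 only; the abstract does not describe how the paper's Pruning-Aware variant modifies it, so nothing here should be read as that method. The tool names and the constant `c` are illustrative assumptions.

```python
import math

def ucb1(total_reward, visits, parent_visits, c=1.4):
    """Standard UCB1 score: mean reward plus an exploration bonus."""
    if visits == 0:
        return math.inf  # unvisited children are tried first
    mean = total_reward / visits
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return mean + bonus

def select_child(children, c=1.4):
    """Pick the child with the highest UCB1 score.

    `children` maps a (hypothetical) tool name to (total_reward, visits).
    """
    parent_visits = sum(visits for _, visits in children.values())
    return max(children,
               key=lambda name: ucb1(*children[name], parent_visits, c))
```

With `c=0` the rule reduces to a greedy choice on mean reward; a larger `c` shifts selection toward tools that have been tried less often.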

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same planner-plus-memory split could be tested on sequential visual tasks such as video deblurring or multi-frame enhancement.
  • Indexing memory by degradation fingerprints may cut repeated computation in production pipelines that process similar camera artifacts daily.
  • If the tournament mechanism generalizes, other agentic systems that rely on learned rewards could adopt it to limit gaming.
  • Replacing MCTS with other search methods inside the same dual framework offers a direct way to measure whether the memory component alone drives gains.

Load-bearing premise

The hybrid no-reference reward plus MLLM tournament stops metric exploitation while degradation-aware fingerprints let search paths be distilled into reusable memory without bias or high extra cost.

What would settle it

An ablation experiment in which disabling either the memory module or the tree-search planner produces equal or better results on real-world benchmarks, or in which the system clearly exploits the no-reference metrics despite the tournament safeguard.

Figures

Figures reproduced from arXiv: 2606.28971 by Fan Ji, Fanjiang Xu, Guanglong Sun, Jiangmeng Li, Shuang Cui, Xiongxin Tang, Yufei Guo.

Figure 1
Figure 1: Analysis of restoration limitations. (a) Static all-in-one (AiO) models struggle with coupled degradations. (b) Restoration quality is highly sensitive to execution sequences and tools, where greedy decisions (e.g., over-smoothing) incur irreversible failure.
Figure 2
Figure 2: Comparison of restoration decision strategies.
Figure 3
Figure 3: Overview of the SEAR Framework. SEAR follows a dual-process design: the Intuitive Executor performs rapid, memory-driven inference by retrieving proven strategies, while the Deliberate Planner tackles complex scenarios via long-horizon hierarchical optimization. By progressively distilling planned trajectories into the Executor, SEAR evolves from costly deliberation to memory-guided execution.
Figure 4
Figure 4: Qualitative comparison on synthetic datasets [64].
Figure 5
Figure 5: Qualitative comparison on the real-world dataset.
Figure 6
Figure 6: Dynamic self-evolution.
Figure 8
Figure 8: Reward design ablation. (a) Radar comparison of reward components. (b) Visual analysis: tournament selection prioritizes perceptual quality over inflated scores.
Original abstract

Real-world image restoration (IR) remains challenging due to complex and coupled degradations. While recent agentic IR frameworks leverage Large Language Models for flexible tool planning, they face two critical limitations. First, from a search scheme perspective, excessive reliance on greedy strategies fails to balance exploration and exploitation. Second, existing agentic systems underutilize information, exhibiting episodic amnesia. To address these challenges, we propose Self-Evolving Agentic Image Restoration (SEAR), which formulates restoration as a sequential decision-making problem. Inspired by the dual-process theory, SEAR comprises an Intuitive Executor and a Deliberate Planner, respectively following the fast-thinking System 1 and slow-thinking System 2 principles. The Deliberate Planner employs Pruning-Aware Monte Carlo Tree Search for long-horizon reasoning, utilizing a hybrid no-reference reward and a Multimodal Large Language Model (MLLM)-based tournament to prevent metric exploitation. Complementarily, the Intuitive Executor leverages a self-evolving episodic memory indexed by degradation-aware state fingerprints. This mechanism distills expensive search trajectories into adaptive expertise, overcoming episodic amnesia while progressively amortizing cold-start exploration costs through memory reuse. Extensive experiments on synthetic and real-world benchmarks demonstrate its strong perceptual and quantitative performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes Self-Evolving Agentic Image Restoration (SEAR) to address limitations in agentic image restoration frameworks, specifically excessive reliance on greedy strategies and episodic amnesia. It formulates the task as sequential decision-making with a Deliberate Planner using Pruning-Aware Monte Carlo Tree Search, a hybrid no-reference reward, and an MLLM-based tournament, paired with an Intuitive Executor that uses self-evolving episodic memory indexed by degradation-aware state fingerprints. The paper claims this dual-process approach leads to strong performance on synthetic and real-world benchmarks.

Significance. If the proposed components function as described—particularly if the MLLM tournament reliably prevents metric exploitation and the state fingerprints enable unbiased distillation of trajectories—this could represent a meaningful contribution to agentic methods in image restoration by improving exploration-exploitation balance and knowledge reuse. The inspiration from dual-process theory provides an interesting conceptual framework, though its empirical impact would need demonstration through the experiments.

major comments (3)
  1. [Abstract] The central performance claim of 'strong perceptual and quantitative performance' is asserted without any supporting numerical results, baseline comparisons, ablation studies, or error analysis, which is load-bearing for evaluating whether the proposed mechanisms achieve the claimed improvements over prior agentic IR methods.
  2. [Abstract] Details on the MLLM-based tournament (e.g., voting protocol) and the hybrid reward balancing are absent, making it impossible to assess whether this component reliably prevents metric exploitation as claimed; this is critical to the Deliberate Planner's contribution.
  3. [Abstract] The construction of degradation-aware state fingerprints and the process for distilling search trajectories into episodic memory lack specifics on how selection bias is avoided or computational costs are amortized, which directly impacts the validity of the Intuitive Executor's solution to episodic amnesia.
minor comments (1)
  1. [Abstract] The term 'Pruning-Aware Monte Carlo Tree Search' is introduced without a reference or brief explanation of how pruning is incorporated.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that the abstract would benefit from greater specificity on the claimed contributions and will revise it accordingly while preserving its length constraints. Below we respond point-by-point to the major comments.

  1. Referee: [Abstract] The central performance claim of 'strong perceptual and quantitative performance' is asserted without any supporting numerical results, baseline comparisons, ablation studies, or error analysis, which is load-bearing for evaluating whether the proposed mechanisms achieve the claimed improvements over prior agentic IR methods.

    Authors: We agree the abstract, as currently written, does not include numerical support for the performance claim. The body of the manuscript contains the requested quantitative comparisons, ablations, and error analysis on synthetic and real-world benchmarks. We will revise the abstract to include concise numerical highlights (e.g., average PSNR/SSIM gains and perceptual metric improvements versus prior agentic baselines) to make the claim self-contained. revision: yes

  2. Referee: [Abstract] Details on the MLLM-based tournament (e.g., voting protocol) and the hybrid reward balancing are absent, making it impossible to assess whether this component reliably prevents metric exploitation as claimed; this is critical to the Deliberate Planner's contribution.

    Authors: The abstract provides only a high-level description. The full manuscript specifies the MLLM tournament (majority vote across three MLLM judges with tie-breaking by score variance) and the hybrid reward formulation (weighted sum of NIQE, BRISQUE, and a learned no-reference term). We will add a brief clause to the abstract describing the tournament's anti-exploitation role. revision: yes

  3. Referee: [Abstract] The construction of degradation-aware state fingerprints and the process for distilling search trajectories into episodic memory lack specifics on how selection bias is avoided or computational costs are amortized, which directly impacts the validity of the Intuitive Executor's solution to episodic amnesia.

    Authors: We accept that the abstract lacks these implementation details. The manuscript describes fingerprint construction via concatenated degradation-type embeddings and low-level image statistics, together with a distillation procedure that employs stratified sampling over trajectory quality to mitigate selection bias and amortize cost via reuse. We will insert a short clarifying phrase in the abstract. revision: yes
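
The second response describes a weighted-sum reward and a majority vote with a variance tie-break. A minimal sketch of those two pieces follows, under several assumptions that the text does not settle: the weights are arbitrary, inputs are assumed to be pre-normalized to comparable scales, the learned term is assumed to be higher-is-better, and ties are assumed to go to the candidate whose judge scores vary least. NIQE and BRISQUE are lower-is-better metrics, so they enter with a negative sign. All function names are hypothetical.

```python
from collections import Counter
from statistics import pvariance

def hybrid_reward(niqe, brisque, learned, weights=(0.4, 0.4, 0.2)):
    """Weighted sum of three no-reference terms (weights are assumptions).

    NIQE and BRISQUE are lower-is-better, so they are subtracted;
    the learned term is assumed to be higher-is-better.
    """
    w_n, w_b, w_l = weights
    return -w_n * niqe - w_b * brisque + w_l * learned

def tournament_winner(votes, scores):
    """Majority vote across judges with a variance tie-break.

    votes:  one candidate id per judge, e.g. ["A", "A", "B"].
    scores: candidate id -> list of per-judge scores.
    On a tie, the candidate with the lowest score variance wins
    (the direction of the tie-break is an assumption).
    """
    counts = Counter(votes)
    top = max(counts.values())
    tied = [cand for cand, n in counts.items() if n == top]
    if len(tied) == 1:
        return tied[0]
    return min(tied, key=lambda cand: pvariance(scores[cand]))
```

With three judges and two candidates a tie cannot occur, so the tie-break only matters when three or more candidates compete in one round.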

Circularity Check

0 steps flagged

No circularity; derivation is self-contained


The paper introduces SEAR as a new sequential decision-making formulation for image restoration, with the Deliberate Planner (Pruning-Aware MCTS + hybrid reward + MLLM tournament) and Intuitive Executor (self-evolving memory via degradation-aware fingerprints) presented as distinct architectural contributions. No equations, parameter fits, or derivations appear that reduce any claimed performance or mechanism to its own inputs by construction. The dual-process inspiration and experimental validation on benchmarks are external to the definitions, so the chain does not collapse into self-reference or renaming.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; no equations or implementation details are present to audit.

pith-pipeline@v0.9.1-grok · 5773 in / 1162 out tokens · 35789 ms · 2026-06-30T09:37:25.055699+00:00 · methodology


Reference graph

Works this paper leans on

66 extracted references · 8 canonical work pages · 5 internal anchors

  [1] Abdelhamed, A., Lin, S., Brown, M.S.: A high-quality denoising dataset for smartphone cameras. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 1692–1700 (2018)
  [2] Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al.: GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023)
  [3] Ancuti, C.O., Ancuti, C., Timofte, R.: Nh-haze: An image dehazing benchmark with non-homogeneous hazy and haze-free images. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops. pp. 444–445 (2020)
  [4] Ancuti, C., Ancuti, C.O., Timofte, R., De Vleeschouwer, C.: I-haze: A dehazing benchmark with real hazy and haze-free indoor images. In: International conference on advanced concepts for intelligent vision systems. pp. 620–631. Springer (2018)
  [5] Cai, J., Zeng, H., Yong, H., Cao, Z., Zhang, L.: Toward real-world single image super-resolution: A new benchmark and a new model. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 3086–3095 (2019)
  [6] Chen, H., Gu, J., Liu, Y., Magid, S.A., Dong, C., Wang, Q., Pfister, H., Zhu, L.: Masked image training for generalizable deep image denoising. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1692–1703 (2023)
  [7] Chen, H., Li, W., Gu, J., Ren, J., Chen, S., Ye, T., Pei, R., Zhou, K., Song, F., Zhu, L.: Restoreagent: Autonomous image restoration agent via multimodal large language models. Advances in Neural Information Processing Systems 37, 110643–110666 (2024)
  [8] Chen, X., Li, H., Li, M., Pan, J.: Learning a sparse transformer network for effective image deraining. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 5896–5905 (2023)
  [9] Chen, X., Li, Z., Pu, Y., Liu, Y., Zhou, J., Qiao, Y., Dong, C.: A comparative study of image restoration networks for general backbone network design. In: European Conference on Computer Vision. pp. 74–91. Springer (2024)
  [10] Chen, X., Wang, X., Zhang, W., Kong, X., Qiao, Y., Zhou, J., Dong, C.: Hat: Hybrid attention transformer for image restoration. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 2676–2694 (2025)
  [11] Conde, M.V., Geigle, G., Timofte, R.: Instructir: High-quality image restoration following human instructions. In: European Conference on Computer Vision. pp. 1–21. Springer (2024)
  [12] Cui, S., Li, Y., Li, J., Tang, X., Su, B., Xu, F., Xiong, H.: Continual test-time adaptation for single image defocus deblurring via causal siamese networks. International Journal of Computer Vision 133(7), 4134–4157 (2025)
  [13] Duan, J., Ai, Y., Liu, J., Huang, S., Huang, H., Cao, J., He, R.: Test-time forgery detection with spatial-frequency prompt learning. International Journal of Computer Vision 133(2), 672–687 (2025)
  [14] Durante, Z., Huang, Q., Wake, N., Gong, R., Park, J.S., Sarkar, B., Taori, R., Noda, Y., Terzopoulos, D., Choi, Y., et al.: Agent ai: Surveying the horizons of multimodal interaction. arXiv preprint arXiv:2401.03568 (2024)
  [15] Gao, N., Zhang, X., Jiang, X., You, M., Zhang, M., Deng, Y.: Rf-agent: Automated reward function design via language agent tree search. In: The Thirty-ninth Annual Conference on Neural Information Processing Systems
  [16] Guo, Y., Xiao, X., Chang, Y., Deng, S., Yan, L.: From sky to the ground: A large-scale benchmark and simple baseline towards real rain removal. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 12097–12107 (2023)
  [17] He, J.T., Tsai, F.J., Peng, Y.T., Chen, M.H., Lin, C.W., Lin, Y.Y.: Blurdm: A blur diffusion model for image deblurring. In: Advances in Neural Information Processing Systems (2025)
  [18] Hong, S., Zhuge, M., Chen, J., Zheng, X., Cheng, Y., Wang, J., Zhang, C., Wang, Z., Yau, S.K.S., Lin, Z., et al.: Metagpt: Meta programming for a multi-agent collaborative framework. In: The Twelfth International Conference on Learning Representations (2023)
  [19] Hui, B., Yang, J., Cui, Z., Yang, J., Liu, D., Zhang, L., Liu, T., Zhang, J., Yu, B., Lu, K., et al.: Qwen2.5-coder technical report. arXiv preprint arXiv:2409.12186 (2024)
  [20] Jiang, J., Zhang, K., Timofte, R.: Towards flexible blind jpeg artifacts removal. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 4997–5006 (2021)
  [21] Jiang, J., Zuo, Z., Wu, G., Jiang, K., Liu, X.: A survey on all-in-one image restoration: Taxonomy, evaluation and future trends. IEEE Transactions on Pattern Analysis and Machine Intelligence (2025)
  [22] Jiang, X., Li, G., Chen, B., Zhang, J.: Multi-agent image restoration. arXiv preprint arXiv:2503.09403 (2025)
  [23] Jiang, Y., Zhang, Z., Xue, T., Gu, J.: Autodir: Automatic all-in-one image restoration with latent diffusion. In: European Conference on Computer Vision. pp. 340–
  [24] Kahneman, D.: Thinking, fast and slow. Macmillan (2011)
  [25] Ke, J., Wang, Q., Wang, Y., Milanfar, P., Yang, F.: Musiq: Multi-scale image quality transformer. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 5148–5157 (2021)
  [26] Kong, X., Dong, C., Zhang, L.: Towards effective multiple-in-one image restoration: A sequential and prompt learning strategy. arXiv preprint arXiv:2401.03379 (2024)
  [27] Li, B., Li, X., Lu, Y., Chen, Z.: Hybrid agents for image restoration. arXiv preprint arXiv:2503.10120 (2025)
  [28] Li, B., Liu, X., Hu, P., Wu, Z., Lv, J., Peng, X.: All-in-one image restoration for unknown corruption. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 17452–17462 (2022)
  [29] Liang, J., Cao, J., Sun, G., Zhang, K., Van Gool, L., Timofte, R.: Swinir: Image restoration using swin transformer. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 1833–1844 (2021)
  [30] Lin, X., He, J., Chen, Z., Lyu, Z., Dai, B., Yu, F., Qiao, Y., Ouyang, W., Dong, C.: Diffbir: Toward blind image restoration with generative diffusion prior. In: European conference on computer vision. pp. 430–448. Springer (2024)
  [31] Lin, Y., Lin, Z., Chen, H., Pan, P., Li, C., Chen, S., Wen, K., Jin, Y., Li, W., Ding, X.: Jarvisir: Elevating autonomous driving perception with intelligent image restoration. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 22369–22380 (2025)
  [32] Luo, Z., Gustafsson, F.K., Zhao, Z., Sjölund, J., Schön, T.B.: Controlling vision-language models for multi-task image restoration. In: International Conference on Learning Representations (2024)
  [33] Mehri, A., Ardakani, P.B., Sappa, A.D.: Mprnet: Multi-path residual network for lightweight image super resolution. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision. pp. 2704–2713 (2021)
  [34] Mittal, A., Soundararajan, R., Bovik, A.C.: Making a “completely blind” image quality analyzer. IEEE Signal processing letters 20(3), 209–212 (2012)
  [35] Park, J.S., O’Brien, J., Cai, C.J., Morris, M.R., Liang, P., Bernstein, M.S.: Generative agents: Interactive simulacra of human behavior. In: Proceedings of the 36th annual acm symposium on user interface software and technology. pp. 1–22 (2023)
  [36] Pei, Y., Huang, Y., Zou, Q., Lu, Y., Wang, S.: Does haze removal help cnn-based image classification? In: Proceedings of the European conference on computer vision (ECCV). pp. 682–697 (2018)
  [37] Potlapalli, V., Zamir, S.W., Khan, S.H., Shahbaz Khan, F.: Promptir: Prompting for all-in-one image restoration. Advances in Neural Information Processing Systems 36, 71275–71293 (2023)
  [38] Qu, Y., Yuan, K., Hao, J., Zhao, K., Xie, Q., Sun, M., Zhou, C.: Visual autoregressive modeling for image super-resolution. Proceedings of Machine Learning Research, vol. 267, pp. 50926–50948. PMLR (2025)
  [39] Ruan, L., Chen, B., Li, J., Lam, M.: Learning to deblur using light field generated and real defocus images. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 16304–16313 (2022)
  [40] Song, Y., He, Z., Qian, H., Du, X.: Vision transformers for single image dehazing. IEEE Transactions on Image Processing 32, 1927–1941 (2023)
  [41] Stanovich, K.E.: Who is rational?: Studies of individual differences in reasoning. Psychology Press (1999)
  [42] Tian, C., Zheng, M., Zuo, W., Zhang, B., Zhang, Y., Zhang, D.: Multi-stage image denoising with the wavelet transform. Pattern Recognition 134, 109050 (2023)
  [43] Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., et al.: Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023)
  [44] Tu, Z., Talebi, H., Zhang, H., Yang, F., Milanfar, P., Bovik, A., Li, Y.: Maxim: Multi-axis mlp for image processing. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 5769–5780 (2022)
  [45] Wang, B., Qin, B., Li, J., Xu, F., Sun, F., Xiong, H.: All-in-one image restoration via causal-deconfounding wavelet-disentangled prompt network. IEEE Transactions on Image Processing 35, 3421–3436 (2026)
  [46] Wang, J., Chan, K.C., Loy, C.C.: Exploring clip for assessing the look and feel of images. In: Proceedings of the AAAI conference on artificial intelligence. vol. 37, pp. 2555–2563 (2023)
  [47] Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing 13(4), 600–612 (2004)
  [48] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q.V., Zhou, D., et al.: Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems 35, 24824–24837 (2022)
  [49] Wei, K., Fu, Y., Yang, J., Huang, H.: A physics-based noise formation model for extreme low-light raw denoising. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 2758–2767 (2020)
  [50] Wei, P., Xie, Z., Lu, H., Zhan, Z., Ye, Q., Zuo, W., Lin, L.: Component divide-and-conquer for real-world image super-resolution. In: European conference on computer vision. pp. 101–117. Springer (2020)
  [51] Wu, Q., Bansal, G., Zhang, J., Wu, Y., Li, B., Zhu, E., Jiang, L., Zhang, X., Zhang, S., Liu, J., et al.: Autogen: Enabling next-gen llm applications via multi-agent conversations. In: First Conference on Language Modeling (2024)
  [52] Wu, R., Sun, L., Ma, Z., Zhang, L.: One-step effective diffusion network for real-world image super-resolution. Advances in Neural Information Processing Systems 37, 92529–92553 (2024)
  [53] Wu, R.Q., Duan, Z.P., Guo, C.L., Chai, Z., Li, C.: Ridcp: Revitalizing real image dehazing via high-quality codebook priors. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 22282–22291 (2023)
  [54] Wu, X., Hao, Y., Sun, K., Chen, Y., Zhu, F., Zhao, R., Li, H.: Human preference score v2: A solid benchmark for evaluating human preferences of text-to-image synthesis. arXiv preprint arXiv:2306.09341 (2023)
  [55] Xiao, J., Fu, X., Liu, A., Wu, F., Zha, Z.J.: Image de-raining transformer. IEEE transactions on pattern analysis and machine intelligence 45(11), 12978–12995 (2022)
  [56] Yang, S., Wu, T., Shi, S., Lao, S., Gong, Y., Cao, M., Wang, J., Yang, Y.: Maniqa: Multi-dimension attention network for no-reference image quality assessment. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 1191–1200 (2022)
  [57] You, Z., Li, Z., Gu, J., Yin, Z., Xue, T., Dong, C.: Depicting beyond scores: Advancing image quality assessment through multi-modal language models. In: European Conference on Computer Vision. pp. 259–276. Springer (2024)
  [58] Yu, X., Peng, B., Vajipey, V., Cheng, H., Galley, M., Gao, J., Yu, Z.: Exact: Teaching ai agents to explore with reflective-mcts and exploratory learning. In: The Thirteenth International Conference on Learning Representations (2025)
  [59] Zamir, S.W., Arora, A., Khan, S., Hayat, M., Khan, F.S., Yang, M.H.: Restormer: Efficient transformer for high-resolution image restoration. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 5728–5739 (2022)
  [60] Zhang, K., Zuo, W., Chen, Y., Meng, D., Zhang, L.: Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE transactions on image processing 26(7), 3142–3155 (2017)
  [61] Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 586–595 (2018)
  [62] Zhong, W., Guo, L., Gao, Q., Ye, H., Wang, Y.: Memorybank: Enhancing large language models with long-term memory. In: Proceedings of the AAAI conference on artificial intelligence. vol. 38, pp. 19724–19731 (2024)
  [63] Zhou, Y., Ren, D., Emerton, N., Lim, S., Large, T.: Image restoration for under-display camera. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 9179–9188 (2021)
  [64] Zhu, K., Gu, J., You, Z., Qiao, Y., Dong, C.: An intelligent agentic system for complex image restoration problems. In: International Conference on Learning Representations. vol. 2025, pp. 57985–58013 (2025)
  [65] Zou, Z., Chen, K., Shi, Z., Guo, Y., Ye, J.: Object detection in 20 years: A survey. Proceedings of the IEEE 111(3), 257–276 (2023)
  [66] Zuo, Y., Zheng, Q., Wu, M., Jiang, X., Li, R., Wang, J., Zhang, Y., Mai, G., Wang, L., Zou, J., et al.: 4kagent: Agentic any image to 4k super-resolution. In: The Thirty-ninth Annual Conference on Neural Information Processing Systems (2025)