arxiv: 2605.22104 · v1 · pith:XKAXGGXHnew · submitted 2026-05-21 · 💻 cs.CV

OPERA: An Agent for Image Restoration with End-to-End Joint Planning-Execution Optimization

Pith reviewed 2026-05-22 07:15 UTC · model grok-4.3

classification 💻 cs.CV
keywords image restoration · reinforcement learning · agent-based methods · tool composition · co-training · multi-degradation · end-to-end optimization

The pith

OPERA jointly optimizes planning and execution of restoration tools using reinforcement learning to handle mixed image degradations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to show that agent-based image restoration is held back by limited planning spaces and tools that are trained separately without learning to work together. It proposes an end-to-end framework where reinforcement learning directly selects sequences of tools based on final output quality, while co-training lets the tools adapt to each other's outputs in sequence. A reader would care because everyday photos often combine several degradations at once, such as noise plus blur, and current single models or uncoordinated agents fall short. If the joint approach works, restoration systems could become more flexible for real photographs without needing hand-designed rules for every degradation type.

Core claim

OPERA jointly optimizes restoration planning and tool execution in an end-to-end manner. On the planning side, it uses reinforcement learning to directly optimize tool composition over a combinatorial plan space, with the final restoration quality as the reward. On the execution side, it introduces agent-guided co-training of restoration tools, enabling them to learn cooperative behaviors under sequential composition.

What carries the argument

The end-to-end joint optimization loop in which reinforcement learning searches tool sequences for maximum final quality while co-training adapts each tool to the outputs of prior tools in the sequence.
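
As a rough illustration of the planning side, the snippet below sketches group-relative advantage normalization in the style of GRPO, which the Figure 2 caption names as the training algorithm. Everything else here is an assumption made for illustration: the `sample_plan` stand-in policy, the tool names, and the toy reward are hypothetical and are not taken from the paper. The population standard deviation is used; implementations differ on that detail.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Plan = List[str]


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize terminal rewards within one sampled group (GRPO-style)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]


def sample_plan(tools: Sequence[str], max_len: int, rng: random.Random) -> Plan:
    """Hypothetical stand-in for the policy: a random ordered subset of tools."""
    k = rng.randint(1, min(max_len, len(tools)))
    return rng.sample(list(tools), k)


def score_group(
    tools: Sequence[str],
    max_len: int,
    group_size: int,
    reward_fn: Callable[[Plan], float],
    seed: int = 0,
) -> List[Tuple[Plan, float]]:
    """Sample a group of complete plans, score each once at the end, normalize."""
    rng = random.Random(seed)
    plans = [sample_plan(tools, max_len, rng) for _ in range(group_size)]
    rewards = [reward_fn(plan) for plan in plans]  # terminal reward only
    return list(zip(plans, group_relative_advantages(rewards)))


if __name__ == "__main__":
    # Toy reward (hypothetical): pretend shorter plans restore better.
    toy_reward = lambda plan: 1.0 / len(plan)
    for plan, adv in score_group(["denoise", "derain", "dehaze"], 3, 4, toy_reward):
        print(plan, round(adv, 3))
```

A plan whose terminal reward equals the group mean receives zero advantage, so only relative quality within the sampled group drives the policy update.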

Load-bearing premise

Reinforcement learning can stably search the space of tool sequences without getting lost in sparse rewards or huge combinatorial explosion, and co-training will produce genuine cooperation rather than just independent improvements.
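
To give a feel for the combinatorial growth this premise worries about, here is a minimal sketch that counts candidate plans. It assumes a plan is an ordered sequence of tools with a maximum length; whether tools may repeat is left as a flag. Both the plan definition and the example sizes are assumptions of this sketch, since the page only says "combinatorial plan space".

```python
from math import perm


def plan_space_size(num_tools: int, max_len: int, allow_repeats: bool = False) -> int:
    """Count ordered tool sequences of length 1..max_len."""
    if allow_repeats:
        # Each of the k positions can hold any tool.
        return sum(num_tools ** k for k in range(1, max_len + 1))
    # Without repeats a plan cannot be longer than the tool set.
    longest = min(max_len, num_tools)
    return sum(perm(num_tools, k) for k in range(1, longest + 1))


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to show the growth rate.
    for n in (5, 10, 20):
        print(n, plan_space_size(n, 3), plan_space_size(n, 3, allow_repeats=True))
```

With 10 hypothetical tools and plans of up to 3 steps, the count is already 820 without repeats and 1110 with repeats, and only one terminal reward is observed per sampled plan.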

What would settle it

Training runs where the reinforcement learning policy shows no improvement over random or fixed tool ordering, or where co-trained tools give the same results as independently trained tools when applied in sequence.

Figures

Figures reproduced from arXiv: 2605.22104 by Feng Zhu, Ming Liu, Shuyang Xie, Wangmeng Zuo, Yihan Zeng.

Figure 1: Empirical study of cooperative multi-tool image restoration. Zoom in to see image details.
Figure 2: Overview of our OPERA framework. (a) Planning Optimization: The restoration agent is trained via Group Relative Policy Optimization (GRPO) to generate complete restoration plans end-to-end, receiving rewards based on final image quality. (b) Execution Optimization: At inference time, the agent generates a restoration plan that is executed by specialized tools. The tools are jointly optimized under agent gu…
Figure 3: Qualitative comparison on benchmarks from AgenticIR [48].
Figure 4: The planning optimization GRPO training dynamics.
Figure 5: Comparison of mean response length during training with and without consistency reward.
Figure 6: Qualitative comparison across degradation categories of Groups A and B on the AgenticIR
Figure 7: Qualitative comparison on Group C triple-degradation categories (part 2/2). Metrics
Original abstract

Real-world image restoration is challenging due to complex and interacting mixed degradations. Recent agent-based approaches address this problem by composing multiple task-specific restoration tools. However, empirical analysis reveals that their performance is fundamentally limited by implicitly constrained planning spaces and the lack of coordination among independently pretrained tools. To address these issues, we propose OPERA (Optimized Planning-Execution Restoration Agent), a framework that jointly optimizes restoration planning and tool execution in an end-to-end manner. On the planning side, OPERA uses reinforcement learning to directly optimize tool composition over a combinatorial plan space, with the final restoration quality as the reward. On the execution side, OPERA introduces agent-guided co-training of restoration tools, enabling them to learn cooperative behaviors under sequential composition. Extensive experiments on multi-degradation benchmarks and real-world datasets demonstrate that OPERA consistently outperforms both all-in-one restoration models and existing agent-based methods across diverse and complex degradation scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes OPERA, an agent-based framework for real-world image restoration under complex mixed degradations. It jointly optimizes planning and execution end-to-end: reinforcement learning is used to optimize tool composition over a combinatorial plan space with final restoration quality as the reward, while agent-guided co-training enables cooperative behaviors among restoration tools. Extensive experiments on multi-degradation benchmarks and real-world datasets are reported to show consistent outperformance over all-in-one models and prior agent-based methods.

Significance. If the central claims hold under scrutiny, the work would demonstrate a viable path for end-to-end optimization of planning-execution loops in agentic vision systems, addressing limitations of independently pretrained tools and constrained planners. The combination of RL-driven combinatorial planning with co-training could influence future agent designs for sequential decision tasks in computer vision.

major comments (2)
  1. [Abstract and §3] Abstract and §3 (method): the central claim that RL directly optimizes tool composition over the combinatorial plan space using only terminal restoration quality as reward is load-bearing for the 'end-to-end joint optimization' headline. Standard policy-gradient methods face well-known sparse-reward and credit-assignment difficulties in long-horizon combinatorial spaces; the manuscript does not describe reward shaping, hierarchical decomposition, variance-reduction baselines, or any other mitigation, leaving the stability of the claimed optimization unclear.
  2. [§4] §4 (experiments): while the abstract asserts consistent outperformance, the reported results must include ablations that isolate the contribution of the RL planner versus the co-training component. Without such controls (e.g., comparing against a fixed planner with co-trained tools), it is impossible to verify that the joint optimization, rather than independent tool improvements, drives the gains.
minor comments (2)
  1. [Abstract and §3] Notation for the RL policy and value functions should be introduced once and used consistently; currently the abstract and method description mix 'plan space' and 'tool composition' without a clear formal definition.
  2. [§4] Figure captions and axis labels in the experimental section would benefit from explicit mention of the exact degradation combinations and metrics (PSNR/SSIM/LPIPS) used for each comparison.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which helps clarify key aspects of our method and experiments. We address each major comment below and will incorporate revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (method): the central claim that RL directly optimizes tool composition over the combinatorial plan space using only terminal restoration quality as reward is load-bearing for the 'end-to-end joint optimization' headline. Standard policy-gradient methods face well-known sparse-reward and credit-assignment difficulties in long-horizon combinatorial spaces; the manuscript does not describe reward shaping, hierarchical decomposition, variance-reduction baselines, or any other mitigation, leaving the stability of the claimed optimization unclear.

    Authors: We appreciate the referee's observation on the challenges of sparse rewards in long-horizon RL for combinatorial planning. Section 3 formulates the planning as RL with terminal restoration quality as reward and employs a policy-gradient approach, but we acknowledge that explicit discussion of stability measures is needed. In the revision, we will expand §3 to detail the specific RL algorithm (including variance-reduction baselines), reward scaling, and any episode-length handling that supports stable optimization in our setting. revision: yes

  2. Referee: [§4] §4 (experiments): while the abstract asserts consistent outperformance, the reported results must include ablations that isolate the contribution of the RL planner versus the co-training component. Without such controls (e.g., comparing against a fixed planner with co-trained tools), it is impossible to verify that the joint optimization, rather than independent tool improvements, drives the gains.

    Authors: We agree that isolating the RL planner's contribution from the co-training is important to substantiate the joint optimization claim. Our current experiments compare against all-in-one models and prior agent methods, but we will add targeted ablations in the revised §4, including a fixed-planner variant with co-trained tools and an RL-planner variant without co-training, to demonstrate that the end-to-end joint optimization is responsible for the observed gains. revision: yes
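
The ablation design discussed in this exchange can be laid out as a 2x2 grid over the two components. The sketch below enumerates the four variants and computes an interaction term, which is one simple way to ask whether the joint gain exceeds the sum of the two individual gains. The `AblationConfig` type and the scoring interface are hypothetical, and no scores from the paper are used.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List


@dataclass(frozen=True)
class AblationConfig:
    rl_planner: bool   # False = fixed planner (hypothetical baseline)
    co_training: bool  # False = independently pretrained tools


def ablation_grid() -> List[AblationConfig]:
    """All four combinations of the two components."""
    return [AblationConfig(p, c) for p, c in product([False, True], repeat=2)]


def interaction_effect(scores: Dict[AblationConfig, float]) -> float:
    """Joint gain minus the sum of the two single-component gains.

    Assumes higher scores are better (e.g. PSNR). A value near zero
    suggests the components contribute additively; a clearly positive
    value suggests they help each other.
    """
    base = scores[AblationConfig(False, False)]
    rl_only = scores[AblationConfig(True, False)]
    co_only = scores[AblationConfig(False, True)]
    joint = scores[AblationConfig(True, True)]
    return (joint - base) - (rl_only - base) - (co_only - base)


if __name__ == "__main__":
    for cfg in ablation_grid():
        print(cfg)
```

In practice each cell would be averaged over several seeds before the interaction is read, since a single run per cell cannot separate a real effect from training noise.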

Circularity Check

0 steps flagged

No significant circularity in derivation chain

Full rationale

The paper's core description in the abstract frames OPERA as using RL to optimize tool sequences with terminal restoration quality as the explicit reward signal. This is a standard RL setup for directly targeting the desired objective metric rather than a self-referential loop or fitted parameter renamed as a prediction. No equations, self-citations, or ansatzes are quoted that reduce the claimed joint planning-execution optimization to its inputs by construction. The method is presented as an independent algorithmic contribution with external benchmarks for validation, satisfying the criteria for a self-contained derivation without load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The abstract relies on the unstated premise that RL can tractably search the combinatorial plan space and that co-training produces cooperative rather than merely additive tool behavior; no free parameters or invented entities are explicitly named.

axioms (2)
  • [domain assumption] Reinforcement learning can directly optimize tool composition over a combinatorial plan space using final restoration quality as reward.
    Invoked in the planning-side description; no justification or implementation detail supplied.
  • [domain assumption] Agent-guided co-training enables restoration tools to learn cooperative behaviors under sequential composition.
    Invoked in the execution-side description; no mechanism or loss term described.

pith-pipeline@v0.9.0 · 5696 in / 1501 out tokens · 33671 ms · 2026-05-22T07:15:49.759696+00:00 · methodology

Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages · 7 internal anchors

  [1] Yunhao Ba, Howard Zhang, Ethan Yang, Akira Suzuki, Arnold Pfahnl, Chethan Chinder Chandrappa, Celso de Melo, Suya You, Stefano Soatto, Alex Wong, and Achuta Kadambi. Not just streaks: Towards ground truth for single image deraining. In ECCV, 2022.

  [2] Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, et al. Qwen2.5-vl technical report. arXiv preprint arXiv:2502.13923, 2025.

  [3] Hanting Chen, Yasheng Wang, Kai Han, Dong Li, Lin Li, Zhenni Bi, Jinpeng Li, Haoyu Wang, Fei Mi, Mingjian Zhu, et al. Pangu embedded: An efficient dual-system llm reasoner with metacognition. arXiv preprint arXiv:2505.22375, 2025.

  [4] Haoyu Chen, Wenbo Li, Jinjin Gu, Jingjing Ren, Sixiang Chen, Tian Ye, Renjing Pei, Kaiwen Zhou, Fenglong Song, and Lei Zhu. Restoreagent: Autonomous image restoration agent via multimodal large language models. Advances in Neural Information Processing Systems, 37:110643–110666, 2024.

  [5] Xiang Chen, Jinshan Pan, and Jiangxin Dong. Bidirectional multi-scale implicit neural representations for image deraining. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 25627–25636, 2024.

  [6] Xiangyu Chen, Zheyuan Li, Yuandong Pu, Yihao Liu, Jiantao Zhou, Yu Qiao, and Chao Dong. A comparative study of image restoration networks for general backbone network design. In European Conference on Computer Vision, pages 74–91. Springer, 2024.

  [7] Zixuan Chen, Zewei He, and Zhe-Ming Lu. Dea-net: Single image dehazing based on detail-enhanced convolution and content-guided attention. IEEE transactions on image processing, 33:1002–1015, 2024.

  [8] Marcos V Conde, Gregor Geigle, and Radu Timofte. Instructir: High-quality image restoration following human instructions. In European Conference on Computer Vision, pages 1–21. Springer, 2024.

  [9] Yuxin Feng, Long Ma, Xiaozhe Meng, Fan Zhou, Risheng Liu, and Zhuo Su. Advancing real-world image dehazing: Perspective, modules, and training. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(12):9303–9320, 2024.

  [10] Jiayi Fu, Siyu Liu, Zikun Liu, Chun-Le Guo, Hyunhee Park, Ruiqi Wu, Guoqing Wang, and Chongyi Li. Iterative predictor-critic code decoding for real-world image dehazing. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 12700–12709, 2025.

  [11] Ning Gao, Xingyu Jiang, Xiuhui Zhang, and Yue Deng. Efficient frequency-domain image deraining with contrastive regularization. In European conference on computer vision, pages 240–257. Springer, 2024.

  [12] Chun-Le Guo, Qixin Yan, Saeed Anwar, Runmin Cong, Wenqi Ren, and Chongyi Li. Image dehazing transformer with transmission-aware 3d position embedding. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5812–5820, 2022.

  [13] Yun Guo, Xueyao Xiao, Yi Chang, Shumin Deng, and Luxin Yan. From sky to the ground: A large-scale benchmark and simple baseline towards real rain removal. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 12097–12107, October 2023.

  [14] Junjun Jiang, Zengyuan Zuo, Gang Wu, Kui Jiang, and Xianming Liu. A survey on all-in-one image restoration: Taxonomy, evaluation and future trends. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.

  [15] Xu Jiang, Gehui Li, Bin Chen, and Jian Zhang. Multi-agent image restoration. arXiv preprint arXiv:2503.09403, 2025.

  [16] Yitong Jiang, Zhaoyang Zhang, Tianfan Xue, and Jinwei Gu. Autodir: Automatic all-in-one image restoration with latent diffusion. In European Conference on Computer Vision, pages 340–359. Springer, 2024.

  [17] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In European conference on computer vision, pages 694–711. Springer, 2016.

  [18] Junjie Ke, Qifei Wang, Yilin Wang, Peyman Milanfar, and Feng Yang. Musiq: Multi-scale image quality transformer. In Proceedings of the IEEE/CVF international conference on computer vision, pages 5148–5157, 2021.

  [19] Xiangtao Kong, Chao Dong, and Lei Zhang. Towards effective multiple-in-one image restoration: A sequential and prompt learning strategy. arXiv preprint arXiv:2401.03379, 2024.

  [20] Boyi Li, Wenqi Ren, Dengpan Fu, Dacheng Tao, Dan Feng, Wenjun Zeng, and Zhangyang Wang. Benchmarking single-image dehazing and beyond. IEEE transactions on image processing, 28(1):492–505, 2018.

  [21] Boyun Li, Xiao Liu, Peng Hu, Zhongqin Wu, Jiancheng Lv, and Xi Peng. All-in-one image restoration for unknown corruption. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 17452–17462, 2022.

  [22] Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, and Radu Timofte. Swinir: Image restoration using swin transformer. In Proceedings of the IEEE/CVF international conference on computer vision, pages 1833–1844, 2021.

  [23] Jianglin Lu, Yuanwei Wu, Ziyi Zhao, Hongcheng Wang, Felix Jimenez, Abrar Majeedi, and Yun Fu. Simplecall: A lightweight image restoration agent in label-free environments with mllm perceptual feedback. arXiv preprint arXiv:2512.18599, 2025. (Canonical work page title: Restore-R1: Efficient Image Restoration Agents via Reinforcement Learning with Multimodal LLM Perceptual Feedback.)

  [24] Ziwei Luo, Fredrik K Gustafsson, Zheng Zhao, Jens Sjölund, and Thomas B Schön. Controlling vision-language models for multi-task image restoration. In ICLR, 2024.

  [25] Vaishnav Potlapalli, Syed Waqas Zamir, Salman H Khan, and Fahad Shahbaz Khan. Promptir: Prompting for all-in-one image restoration. Advances in Neural Information Processing Systems, 36:71275–71293, 2023.

  [26] Dongwei Ren, Wangmeng Zuo, Qinghua Hu, Pengfei Zhu, and Deyu Meng. Progressive image deraining networks: A better and simpler baseline. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3937–3946, 2019.

  [27] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.

  [28] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024.

  [29] Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, and Chuan Wu. Hybridflow: A flexible and efficient rlhf framework. arXiv preprint arXiv:2409.19256, 2024.

  [30] Aaditya Singh, Adam Fry, Adam Perelman, Adam Tart, Adi Ganesh, Ahmed El-Kishky, Aidan McLaughlin, Aiden Low, AJ Ostrow, Akhila Ananthram, et al. Openai gpt-5 system card. arXiv preprint arXiv:2601.03267, 2025.

  [31] Kwai Keye Team, Biao Yang, Bin Wen, Changyi Liu, Chenglong Chu, Chengru Song, Chongling Rao, Chuan Yi, Da Li, Dunju Zang, et al. Kwai keye-vl technical report. arXiv preprint arXiv:2507.01949, 2025.

  [32] Jeya Maria Jose Valanarasu, Rajeev Yasarla, and Vishal M Patel. Transweather: Transformer-based restoration of images degraded by adverse weather conditions. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2353–2363, 2022.

  [33] Jianyi Wang, Kelvin CK Chan, and Chen Change Loy. Exploring clip for assessing the look and feel of images. In Proceedings of the AAAI conference on artificial intelligence, volume 37, pages 2555–2563, 2023.

  [34] Tianyu Wang, Xin Yang, Ke Xu, Shaozhe Chen, Qiang Zhang, and Rynson WH Lau. Spatial attentive single-image deraining with a high quality real rain dataset. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12270–12279, 2019.

  [35] Zhendong Wang, Xiaodong Cun, Jianmin Bao, Wengang Zhou, Jianzhuang Liu, and Houqiang Li. Uformer: A general u-shaped transformer for image restoration. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 17683–17693, 2022.

  [36] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4):600–612, 2004.

  [37] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, 35:24824–24837, 2022.

  [38] Qiuxia Wu, Yu Sun, Panpan Cai, and Wenxiong Kang. Scdformer: Spatial and channel denoising transformer for human pose estimation using millimeter-wave radar. In 2025 IEEE International Joint Conference on Biometrics (IJCB), pages 1–10. IEEE, 2025.

  [39] Tianhe Wu, Jian Zou, Jie Liang, Lei Zhang, and Kede Ma. VisualQuality-R1: Reasoning-induced image quality assessment via reinforcement learning to rank. arXiv preprint arXiv:2505.14460, 2025.

  [40] Jiaqi Xu, Mengyang Wu, Xiaowei Hu, Chi-Wing Fu, Qi Dou, and Pheng-Ann Heng. Towards real-world adverse weather image restoration: Enhancing clearness and semantics with vision-language models. In European Conference on Computer Vision, pages 147–164. Springer, 2024.

  [41] Sidi Yang, Tianhe Wu, Shuwei Shi, Shanshan Lao, Yuan Gong, Mingdeng Cao, Jiahao Wang, and Yujiu Yang. Maniqa: Multi-dimension attention network for no-reference image quality assessment. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1191–1200, 2022.

  [42] Ke Yu, Chao Dong, Liang Lin, and Chen Change Loy. Crafting a toolchain for image restoration by deep reinforcement learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2443–2452, 2018.

  [43] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5728–5739, 2022.

  [44] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. Multi-stage progressive image restoration. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14821–14831, 2021.

  [45] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018.

  [46] Yi-Fan Zhang, Xingyu Lu, Xiao Hu, Chaoyou Fu, Bin Wen, Tianke Zhang, Changyi Liu, Kaiyu Jiang, Kaibing Chen, Kaiyu Tang, et al. R1-reward: Training multimodal reward model through stable reinforcement learning. arXiv preprint arXiv:2505.02835, 2025.

  [47] Yingjie Zhou, Jiezhang Cao, Zicheng Zhang, Farong Wen, Yanwei Jiang, Jun Jia, Xiaohong Liu, Xiongkuo Min, and Guangtao Zhai. Q-agent: Quality-driven chain-of-thought image restoration agent through robust multimodal large language model. arXiv preprint arXiv:2504.07148, 2025.

  [48] Kaiwen Zhu, Jinjin Gu, Zhiyuan You, Yu Qiao, and Chao Dong. An intelligent agentic system for complex image restoration problems. In The Thirteenth International Conference on Learning Representations, 2025.

  [49] Yushen Zuo, Qi Zheng, Mingyang Wu, Xinrui Jiang, Renjie Li, Jian Wang, Yide Zhang, Gengchen Mai, Lihong V. Wang, James Zou, Xiaoyu Wang, Ming-Hsuan Yang, and Zhengzhong Tu. 4kagent: Agentic any image to 4k super-resolution. 2025.

  A Appendix Overview. Appendix B provides detailed experimental settings for the empirical studies presented in Section 3. Ap...

  [50] Evaluate the Reasoning Process
       - The reasoning process must NOT be empty
       - It must contain meaningful, coherent, and logical reasoning steps
       - It should include analysis of constraints, assumptions, or decision logic
       - If the reasoning process is missing, empty, superficial, or logically flawed, mark it as unreasonable

  [51] Check Consistency Between Reasoning Process and Final Plan
       - The final plan must be logically derivable from the reasoning process
       - There should be no contradictions between the reasoning process and the final plan
       - If the reasoning supports one conclusion but the final plan states another, mark them as inconsistent

  [52] Provide a Clear Judgment and Explanation
       Only output a single “Yes” or “No”. Do not provide other explanations or text.

       Table 14: Full prompt used for planning agent. (Columns: Usage, Prompt.)
       System Prompt: You are a professional image restoration assistant. You will be given an image as input. Your task is to:

  [53] Visually analyze the image and identify what degradations it contains

  [54] Design an optimal sequence of restoration tool calls to enhance the image quality.
       # Possible Degradations:
       - noise
       - rain
       - haze
       - defocus_blur
       - motion_blur
       - low_resolution
       - jpeg
       # Tools from Restormer
       - restormer.gaussian_denoise_15
       - restormer.gaussian_denoise_25
       - restormer.gaussian_denoise_50
       - restormer.derain
       - restormer.defocus_deblur
       - restorm...