OPERA: An Agent for Image Restoration with End-to-End Joint Planning-Execution Optimization

Feng Zhu; Ming Liu; Shuyang Xie; Wangmeng Zuo; Yihan Zeng

arxiv: 2605.22104 · v1 · pith:XKAXGGXHnew · submitted 2026-05-21 · 💻 cs.CV


Pith reviewed 2026-05-22 07:15 UTC · model grok-4.3

classification 💻 cs.CV

keywords image restoration · reinforcement learning · agent-based methods · tool composition · co-training · multi-degradation · end-to-end optimization


The pith

OPERA jointly optimizes planning and execution of restoration tools using reinforcement learning to handle mixed image degradations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to show that agent-based image restoration is held back by limited planning spaces and tools that are trained separately without learning to work together. It proposes an end-to-end framework where reinforcement learning directly selects sequences of tools based on final output quality, while co-training lets the tools adapt to each other's outputs in sequence. A reader would care because everyday photos often combine several degradations at once, such as noise plus blur, and current single models or uncoordinated agents fall short. If the joint approach works, restoration systems could become more flexible for real photographs without needing hand-designed rules for every degradation type.

Core claim

OPERA jointly optimizes restoration planning and tool execution in an end-to-end manner. On the planning side, it uses reinforcement learning to directly optimize tool composition over a combinatorial plan space, with the final restoration quality as the reward. On the execution side, it introduces agent-guided co-training of restoration tools, enabling them to learn cooperative behaviors under sequential composition.
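The Figure 2 caption names Group Relative Policy Optimization (GRPO) as the planner's training algorithm. As a minimal sketch of how a terminal-only reward can still rank candidate plans, the block below computes group-relative advantages in the style of GRPO [28]: several plans are sampled for one image, each receives a single final-quality score, and that score is normalized against the group's own mean and standard deviation. The tool names, the reward values, and the choice of the population standard deviation are illustrative assumptions, not details taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each plan's terminal reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; eps guards sigma == 0
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical plans sampled for one degraded image (tool names are made up).
sampled_plans = [
    ["denoise", "deblur"],
    ["deblur", "denoise"],
    ["denoise"],
    ["derain", "denoise", "deblur"],
]
# Hypothetical final-quality scores, one per complete plan (terminal reward only).
rewards = [0.82, 0.74, 0.61, 0.79]

advantages = group_relative_advantages(rewards)
# The plan with the largest advantage is the one the policy update favors most.
best_index = max(range(len(rewards)), key=advantages.__getitem__)
best_plan = sampled_plans[best_index]
```

Because the baseline is the group mean, no learned value function is needed; whether this is enough to handle credit assignment inside long plans is exactly what the referee report below asks about.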

What carries the argument

The end-to-end joint optimization loop in which reinforcement learning searches tool sequences for maximum final quality while co-training adapts each tool to the outputs of prior tools in the sequence.

Load-bearing premise

Reinforcement learning can stably search the space of tool sequences without getting lost in sparse rewards or huge combinatorial explosion, and co-training will produce genuine cooperation rather than just independent improvements.
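To make the combinatorial concern concrete, the following sketch counts ordered tool sequences up to a maximum length. The tool count of 7 is an illustrative assumption (it matches the number of degradation types in the extracted planning prompt, not the actual size of OPERA's tool set), and `plan_space_size` is a hypothetical helper.

```python
from math import perm


def plan_space_size(num_tools, max_len, allow_repeats=False):
    """Count ordered tool sequences of length 1..max_len."""
    total = 0
    for length in range(1, max_len + 1):
        if allow_repeats:
            total += num_tools ** length        # any tool at every step
        else:
            total += perm(num_tools, length)    # each tool used at most once
    return total


print(plan_space_size(7, 3))                      # 259
print(plan_space_size(7, 3, allow_repeats=True))  # 399
```

Even at this small scale, only one terminal reward is observed per sampled sequence, which is why the stability of the search matters.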

What would settle it

Training runs where the reinforcement learning policy shows no improvement over random or fixed tool ordering, or where co-trained tools give the same results as independently trained tools when applied in sequence.
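One way to operationalize the first check is a paired comparison between the learned policy's plans and random tool orderings on the same images. The sketch below uses an exact one-sided sign-flip permutation test; the scores are hypothetical, and the paper does not state that it uses this test.

```python
from itertools import product


def paired_sign_flip_pvalue(policy_scores, baseline_scores):
    """Exact one-sided p-value for 'policy beats baseline' on paired scores."""
    diffs = [p - b for p, b in zip(policy_scores, baseline_scores)]
    observed = sum(diffs)
    hits = 0
    total = 0
    # Under the null, each paired difference is equally likely to have either sign.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if sum(s * d for s, d in zip(signs, diffs)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical per-image quality scores (higher is better).
policy = [0.81, 0.77, 0.90, 0.68, 0.74, 0.85]
random_order = [0.70, 0.71, 0.80, 0.60, 0.69, 0.75]
p_value = paired_sign_flip_pvalue(policy, random_order)
```

A large p-value on held-out images would be the "no improvement over random ordering" outcome described above. The enumeration is exponential in the number of pairs, so this exact form only suits small evaluation sets.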

Figures

Figures reproduced from arXiv: 2605.22104 by Feng Zhu, Ming Liu, Shuyang Xie, Wangmeng Zuo, Yihan Zeng.

**Figure 2.** Overview of our OPERA framework. (a) Planning Optimization: The restoration agent is trained via Group Relative Policy Optimization (GRPO) to generate complete restoration plans end-to-end, receiving rewards based on final image quality. (b) Execution Optimization: At inference time, the agent generates a restoration plan that is executed by specialized tools. The tools are jointly optimized under agent gu…

**Figure 3.** Qualitative comparison on benchmarks from AgenticIR [48].

**Figure 4.** The planning optimization GRPO training dynamics.

**Figure 5.** Comparison of mean response length during training with and without consistency reward.

**Figure 6.** Qualitative comparison across degradation categories of Groups A and B on the AgenticIR

**Figure 7.** Qualitative comparison on Group C triple-degradation categories (part 2/2). Metrics

read the original abstract

Real-world image restoration is challenging due to complex and interacting mixed degradations. Recent agent-based approaches address this problem by composing multiple task-specific restoration tools. However, empirical analysis reveals that their performance is fundamentally limited by implicitly constrained planning spaces and the lack of coordination among independently pretrained tools. To address these issues, we propose OPERA (Optimized Planning-Execution Restoration Agent), a framework that jointly optimizes restoration planning and tool execution in an end-to-end manner. On the planning side, OPERA uses reinforcement learning to directly optimize tool composition over a combinatorial plan space, with the final restoration quality as the reward. On the execution side, OPERA introduces agent-guided co-training of restoration tools, enabling them to learn cooperative behaviors under sequential composition. Extensive experiments on multi-degradation benchmarks and real-world datasets demonstrate that OPERA consistently outperforms both all-in-one restoration models and existing agent-based methods across diverse and complex degradation scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

OPERA uses RL to optimize tool sequences for image restoration and adds co-training for better coordination, but the abstract gives no numbers or implementation details to back the performance claims.


The core idea is straightforward: OPERA runs reinforcement learning over sequences of restoration tools, using final image quality as the reward, while also co-training the tools so they adapt to being used in combination. This is positioned as fixing two limits in prior agent work: narrow planning spaces and uncoordinated independent tools. That joint end-to-end framing is the main new piece.

It directly targets real issues in handling mixed degradations, and the co-training step is a sensible way to encourage tools to complement each other rather than just run separately. If the experiments hold, it could give restoration systems more flexibility on complex real-world cases without needing a single all-in-one model. The approach stays within the computer-vision restoration subfield and builds on existing RL-for-composition ideas, so the novelty is incremental rather than a big shift.

The soft spots are mostly around missing evidence. The abstract asserts consistent gains over baselines but shows no quantitative results, ablations, or specifics on how the RL policy is trained, how the combinatorial space is searched, or whether any reward shaping or variance reduction is used. The sparse-reward and credit-assignment problem in long tool sequences is a legitimate worry here; without those details it is hard to know whether the planner actually learns useful compositions or just rides along with improved individual tools. The full paper would need to show that the RL component delivers measurable gains beyond what simpler search or independent training would achieve.

This is the kind of work that would interest people already working on agent-based or multi-tool restoration pipelines. A reader focused on practical gains for complex degradations could get something useful out of the experiments if they are solid. It is worth sending to peer review so the methods and results can be checked properly.

Referee Report

2 major / 2 minor

Summary. The paper proposes OPERA, an agent-based framework for real-world image restoration under complex mixed degradations. It jointly optimizes planning and execution end-to-end: reinforcement learning is used to optimize tool composition over a combinatorial plan space with final restoration quality as the reward, while agent-guided co-training enables cooperative behaviors among restoration tools. Extensive experiments on multi-degradation benchmarks and real-world datasets are reported to show consistent outperformance over all-in-one models and prior agent-based methods.

Significance. If the central claims hold under scrutiny, the work would demonstrate a viable path for end-to-end optimization of planning-execution loops in agentic vision systems, addressing limitations of independently pretrained tools and constrained planners. The combination of RL-driven combinatorial planning with co-training could influence future agent designs for sequential decision tasks in computer vision.

major comments (2)

[Abstract and §3] Abstract and §3 (method): the central claim that RL directly optimizes tool composition over the combinatorial plan space using only terminal restoration quality as reward is load-bearing for the 'end-to-end joint optimization' headline. Standard policy-gradient methods face well-known sparse-reward and credit-assignment difficulties in long-horizon combinatorial spaces; the manuscript does not describe reward shaping, hierarchical decomposition, variance-reduction baselines, or any other mitigation, leaving the stability of the claimed optimization unclear.
[§4] §4 (experiments): while the abstract asserts consistent outperformance, the reported results must include ablations that isolate the contribution of the RL planner versus the co-training component. Without such controls (e.g., comparing against a fixed planner with co-trained tools), it is impossible to verify that the joint optimization, rather than independent tool improvements, drives the gains.
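The control requested in the second major comment amounts to a 2×2 design: planner (fixed vs. RL) crossed with tools (independent vs. co-trained). A minimal sketch of how the four cells could be decomposed is given below; the condition labels and the PSNR values are hypothetical and are not results from the paper.

```python
def factorial_effects(scores):
    """Decompose a 2x2 ablation into planner, co-training, and interaction terms.

    `scores` maps (planner, tools) -> mean metric, higher is better.
    """
    base = scores[("fixed", "independent")]
    rl_only = scores[("rl", "independent")]
    co_only = scores[("fixed", "cotrained")]
    joint = scores[("rl", "cotrained")]
    return {
        "planner_effect": rl_only - base,      # gain from the RL planner alone
        "cotraining_effect": co_only - base,   # gain from co-training alone
        # A positive interaction means the joint gain exceeds the sum of the parts.
        "interaction": joint - rl_only - co_only + base,
    }


# Hypothetical mean PSNR (dB) for each ablation cell.
example = {
    ("fixed", "independent"): 20.0,
    ("rl", "independent"): 21.0,
    ("fixed", "cotrained"): 20.5,
    ("rl", "cotrained"): 22.5,
}
effects = factorial_effects(example)
```

An interaction term near zero would suggest the gains are additive, that is, independent tool improvements rather than a benefit specific to joint optimization.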

minor comments (2)

[Abstract and §3] Notation for the RL policy and value functions should be introduced once and used consistently; currently the abstract and method description mix 'plan space' and 'tool composition' without a clear formal definition.
[§4] Figure captions and axis labels in the experimental section would benefit from explicit mention of the exact degradation combinations and metrics (PSNR/SSIM/LPIPS) used for each comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which helps clarify key aspects of our method and experiments. We address each major comment below and will incorporate revisions to strengthen the manuscript.


Referee: [Abstract and §3] Abstract and §3 (method): the central claim that RL directly optimizes tool composition over the combinatorial plan space using only terminal restoration quality as reward is load-bearing for the 'end-to-end joint optimization' headline. Standard policy-gradient methods face well-known sparse-reward and credit-assignment difficulties in long-horizon combinatorial spaces; the manuscript does not describe reward shaping, hierarchical decomposition, variance-reduction baselines, or any other mitigation, leaving the stability of the claimed optimization unclear.

Authors: We appreciate the referee's observation on the challenges of sparse rewards in long-horizon RL for combinatorial planning. Section 3 formulates the planning as RL with terminal restoration quality as reward and employs a policy-gradient approach, but we acknowledge that explicit discussion of stability measures is needed. In the revision, we will expand §3 to detail the specific RL algorithm (including variance-reduction baselines), reward scaling, and any episode-length handling that supports stable optimization in our setting.

revision: yes

Referee: [§4] §4 (experiments): while the abstract asserts consistent outperformance, the reported results must include ablations that isolate the contribution of the RL planner versus the co-training component. Without such controls (e.g., comparing against a fixed planner with co-trained tools), it is impossible to verify that the joint optimization, rather than independent tool improvements, drives the gains.

Authors: We agree that isolating the RL planner's contribution from the co-training is important to substantiate the joint optimization claim. Our current experiments compare against all-in-one models and prior agent methods, but we will add targeted ablations in the revised §4, including a fixed-planner variant with co-trained tools and an RL-planner variant without co-training, to demonstrate that the end-to-end joint optimization is responsible for the observed gains.

revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper's core description in the abstract frames OPERA as using RL to optimize tool sequences with terminal restoration quality as the explicit reward signal. This is a standard RL setup for directly targeting the desired objective metric rather than a self-referential loop or fitted parameter renamed as a prediction. No equations, self-citations, or ansatzes are quoted that reduce the claimed joint planning-execution optimization to its inputs by construction. The method is presented as an independent algorithmic contribution with external benchmarks for validation, satisfying the criteria for a self-contained derivation without load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The abstract relies on the unstated premise that RL can tractably search the combinatorial plan space and that co-training produces cooperative rather than merely additive tool behavior; no free parameters or invented entities are explicitly named.

axioms (2)

domain assumption: Reinforcement learning can directly optimize tool composition over a combinatorial plan space using final restoration quality as reward.
Invoked in the planning-side description; no justification or implementation detail supplied.

domain assumption: Agent-guided co-training enables restoration tools to learn cooperative behaviors under sequential composition.
Invoked in the execution-side description; no mechanism or loss term described.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear

Paper passage: OPERA uses reinforcement learning to directly optimize tool composition over a combinatorial plan space, with the final restoration quality as the reward.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages · 7 internal anchors

[1] Yunhao Ba, Howard Zhang, Ethan Yang, Akira Suzuki, Arnold Pfahnl, Chethan Chinder Chandrappa, Celso de Melo, Suya You, Stefano Soatto, Alex Wong, and Achuta Kadambi. Not just streaks: Towards ground truth for single image deraining. In ECCV, 2022.

[2] Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, et al. Qwen2.5-vl technical report. arXiv preprint arXiv:2502.13923, 2025.

[3] Hanting Chen, Yasheng Wang, Kai Han, Dong Li, Lin Li, Zhenni Bi, Jinpeng Li, Haoyu Wang, Fei Mi, Mingjian Zhu, et al. Pangu embedded: An efficient dual-system llm reasoner with metacognition. arXiv preprint arXiv:2505.22375, 2025.

[4] Haoyu Chen, Wenbo Li, Jinjin Gu, Jingjing Ren, Sixiang Chen, Tian Ye, Renjing Pei, Kaiwen Zhou, Fenglong Song, and Lei Zhu. Restoreagent: Autonomous image restoration agent via multimodal large language models. Advances in Neural Information Processing Systems, 37:110643–110666, 2024.

[5] Xiang Chen, Jinshan Pan, and Jiangxin Dong. Bidirectional multi-scale implicit neural representations for image deraining. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 25627–25636, 2024.

[6] Xiangyu Chen, Zheyuan Li, Yuandong Pu, Yihao Liu, Jiantao Zhou, Yu Qiao, and Chao Dong. A comparative study of image restoration networks for general backbone network design. In European Conference on Computer Vision, pages 74–91. Springer, 2024.

[7] Zixuan Chen, Zewei He, and Zhe-Ming Lu. Dea-net: Single image dehazing based on detail-enhanced convolution and content-guided attention. IEEE transactions on image processing, 33:1002–1015, 2024.

[8] Marcos V Conde, Gregor Geigle, and Radu Timofte. Instructir: High-quality image restoration following human instructions. In European Conference on Computer Vision, pages 1–21. Springer, 2024.

[9] Yuxin Feng, Long Ma, Xiaozhe Meng, Fan Zhou, Risheng Liu, and Zhuo Su. Advancing real-world image dehazing: Perspective, modules, and training. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(12):9303–9320, 2024.

[10] Jiayi Fu, Siyu Liu, Zikun Liu, Chun-Le Guo, Hyunhee Park, Ruiqi Wu, Guoqing Wang, and Chongyi Li. Iterative predictor-critic code decoding for real-world image dehazing. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 12700–12709, 2025.

[11] Ning Gao, Xingyu Jiang, Xiuhui Zhang, and Yue Deng. Efficient frequency-domain image deraining with contrastive regularization. In European conference on computer vision, pages 240–257. Springer, 2024.

[12] Chun-Le Guo, Qixin Yan, Saeed Anwar, Runmin Cong, Wenqi Ren, and Chongyi Li. Image dehazing transformer with transmission-aware 3d position embedding. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5812–5820, 2022.

[13] Yun Guo, Xueyao Xiao, Yi Chang, Shumin Deng, and Luxin Yan. From sky to the ground: A large-scale benchmark and simple baseline towards real rain removal. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 12097–12107, October 2023.

[14] Junjun Jiang, Zengyuan Zuo, Gang Wu, Kui Jiang, and Xianming Liu. A survey on all-in-one image restoration: Taxonomy, evaluation and future trends. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.

[15] Xu Jiang, Gehui Li, Bin Chen, and Jian Zhang. Multi-agent image restoration. arXiv preprint arXiv:2503.09403, 2025.

[16] Yitong Jiang, Zhaoyang Zhang, Tianfan Xue, and Jinwei Gu. Autodir: Automatic all-in-one image restoration with latent diffusion. In European Conference on Computer Vision, pages 340–359. Springer, 2024.

[17] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In European conference on computer vision, pages 694–711. Springer, 2016.

[18] Junjie Ke, Qifei Wang, Yilin Wang, Peyman Milanfar, and Feng Yang. Musiq: Multi-scale image quality transformer. In Proceedings of the IEEE/CVF international conference on computer vision, pages 5148–5157, 2021.

[19] Xiangtao Kong, Chao Dong, and Lei Zhang. Towards effective multiple-in-one image restoration: A sequential and prompt learning strategy. arXiv preprint arXiv:2401.03379, 2024.

[20] Boyi Li, Wenqi Ren, Dengpan Fu, Dacheng Tao, Dan Feng, Wenjun Zeng, and Zhangyang Wang. Benchmarking single-image dehazing and beyond. IEEE transactions on image processing, 28(1):492–505, 2018.

[21] Boyun Li, Xiao Liu, Peng Hu, Zhongqin Wu, Jiancheng Lv, and Xi Peng. All-in-one image restoration for unknown corruption. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 17452–17462, 2022.

[22] Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, and Radu Timofte. Swinir: Image restoration using swin transformer. In Proceedings of the IEEE/CVF international conference on computer vision, pages 1833–1844, 2021.

[23] Jianglin Lu, Yuanwei Wu, Ziyi Zhao, Hongcheng Wang, Felix Jimenez, Abrar Majeedi, and Yun Fu. Simplecall: A lightweight image restoration agent in label-free environments with mllm perceptual feedback. arXiv preprint arXiv:2512.18599, 2025. Listed on the Pith work page as "Restore-R1: Efficient Image Restoration Agents via Reinforcement Learning with Multimodal LLM Perceptual Feedback".

[24] Ziwei Luo, Fredrik K Gustafsson, Zheng Zhao, Jens Sjölund, and Thomas B Schön. Controlling vision-language models for multi-task image restoration. In ICLR, 2024.

[25] Vaishnav Potlapalli, Syed Waqas Zamir, Salman H Khan, and Fahad Shahbaz Khan. Promptir: Prompting for all-in-one image restoration. Advances in Neural Information Processing Systems, 36:71275–71293, 2023.

[26] Dongwei Ren, Wangmeng Zuo, Qinghua Hu, Pengfei Zhu, and Deyu Meng. Progressive image deraining networks: A better and simpler baseline. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3937–3946, 2019.

[27] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.

[28] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024.

[29] Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, and Chuan Wu. Hybridflow: A flexible and efficient rlhf framework. arXiv preprint arXiv:2409.19256, 2024.

[30] Aaditya Singh, Adam Fry, Adam Perelman, Adam Tart, Adi Ganesh, Ahmed El-Kishky, Aidan McLaughlin, Aiden Low, AJ Ostrow, Akhila Ananthram, et al. Openai gpt-5 system card. arXiv preprint arXiv:2601.03267, 2025.

[31] Kwai Keye Team, Biao Yang, Bin Wen, Changyi Liu, Chenglong Chu, Chengru Song, Chongling Rao, Chuan Yi, Da Li, Dunju Zang, et al. Kwai keye-vl technical report. arXiv preprint arXiv:2507.01949, 2025.

[32] Jeya Maria Jose Valanarasu, Rajeev Yasarla, and Vishal M Patel. Transweather: Transformer-based restoration of images degraded by adverse weather conditions. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2353–2363, 2022.

[33] Jianyi Wang, Kelvin CK Chan, and Chen Change Loy. Exploring clip for assessing the look and feel of images. In Proceedings of the AAAI conference on artificial intelligence, volume 37, pages 2555–2563, 2023.

[34] Tianyu Wang, Xin Yang, Ke Xu, Shaozhe Chen, Qiang Zhang, and Rynson WH Lau. Spatial attentive single-image deraining with a high quality real rain dataset. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12270–12279, 2019.

[35] Zhendong Wang, Xiaodong Cun, Jianmin Bao, Wengang Zhou, Jianzhuang Liu, and Houqiang Li. Uformer: A general u-shaped transformer for image restoration. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 17683–17693, 2022.

[36] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4):600–612, 2004.

[37] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, 35:24824–24837, 2022.

[38] Qiuxia Wu, Yu Sun, Panpan Cai, and Wenxiong Kang. Scdformer: Spatial and channel denoising transformer for human pose estimation using millimeter-wave radar. In 2025 IEEE International Joint Conference on Biometrics (IJCB), pages 1–10. IEEE, 2025.

[39] Tianhe Wu, Jian Zou, Jie Liang, Lei Zhang, and Kede Ma. VisualQuality-R1: Reasoning-induced image quality assessment via reinforcement learning to rank. arXiv preprint arXiv:2505.14460, 2025.

[40] Jiaqi Xu, Mengyang Wu, Xiaowei Hu, Chi-Wing Fu, Qi Dou, and Pheng-Ann Heng. Towards real-world adverse weather image restoration: Enhancing clearness and semantics with vision-language models. In European Conference on Computer Vision, pages 147–164. Springer, 2024.

[41] Sidi Yang, Tianhe Wu, Shuwei Shi, Shanshan Lao, Yuan Gong, Mingdeng Cao, Jiahao Wang, and Yujiu Yang. Maniqa: Multi-dimension attention network for no-reference image quality assessment. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1191–1200, 2022.

[42] Ke Yu, Chao Dong, Liang Lin, and Chen Change Loy. Crafting a toolchain for image restoration by deep reinforcement learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2443–2452, 2018.

[43] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5728–5739, 2022.

[44] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. Multi-stage progressive image restoration. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14821–14831, 2021.

[45] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018.

[46] Yi-Fan Zhang, Xingyu Lu, Xiao Hu, Chaoyou Fu, Bin Wen, Tianke Zhang, Changyi Liu, Kaiyu Jiang, Kaibing Chen, Kaiyu Tang, et al. R1-reward: Training multimodal reward model through stable reinforcement learning. arXiv preprint arXiv:2505.02835, 2025.

[47] Yingjie Zhou, Jiezhang Cao, Zicheng Zhang, Farong Wen, Yanwei Jiang, Jun Jia, Xiaohong Liu, Xiongkuo Min, and Guangtao Zhai. Q-agent: Quality-driven chain-of-thought image restoration agent through robust multimodal large language model. arXiv preprint arXiv:2504.07148, 2025.

[48] Kaiwen Zhu, Jinjin Gu, Zhiyuan You, Yu Qiao, and Chao Dong. An intelligent agentic system for complex image restoration problems. In The Thirteenth International Conference on Learning Representations, 2025.

[49] Yushen Zuo, Qi Zheng, Mingyang Wu, Xinrui Jiang, Renjie Li, Jian Wang, Yide Zhang, Gengchen Mai, Lihong V. Wang, James Zou, Xiaoyu Wang, Ming-Hsuan Yang, and Zhengzhong Tu. 4kagent: Agentic any image to 4k super-resolution. 2025.

A Appendix Overview
Appendix B provides detailed experimental settings for the empirical studies presented in Section 3. Ap...

1. Evaluate the Reasoning Process
- The reasoning process must NOT be empty
- It must contain meaningful, coherent, and logical reasoning steps
- It should include analysis of constraints, assumptions, or decision logic
- If the reasoning process is missing, empty, superficial, or logically flawed, mark it as unreasonable

2. Check Consistency Between Reasoning Process and Final Plan
- The final plan must be logically derivable from the reasoning process
- There should be no contradictions between the reasoning process and the final plan
- If the reasoning supports one conclusion but the final plan states another, mark them as inconsistent

3. Provide a Clear Judgment and Explanation
Only output a single “Yes” or “No”. Do not provide other explanations or text.

Table 14: Full prompt used for planning agent.

Usage: System Prompt
Prompt:
You are a professional image restoration assistant. You will be given an image as input. Your task is to:
1. Visually analyze the image and identify what degradations it contains
2. Design an optimal sequence of restoration tool calls to enhance the image quality.

# Possible Degradations:
- noise
- rain
- haze
- defocus_blur
- motion_blur
- low_resolution
- jpeg

# Tools from Restormer
- restormer.gaussian_denoise_15
- restormer.gaussian_denoise_25
- restormer.gaussian_denoise_50
- restormer.derain
- restormer.defocus_deblur
- restorm...

[1] Yunhao Ba, Howard Zhang, Ethan Yang, Akira Suzuki, Arnold Pfahnl, Chethan Chinder Chandrappa, Celso de Melo, Suya You, Stefano Soatto, Alex Wong, and Achuta Kadambi. Not just streaks: Towards ground truth for single image deraining. In ECCV, 2022

[2] Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, et al. Qwen2.5-vl technical report. arXiv preprint arXiv:2502.13923, 2025

[3] Hanting Chen, Yasheng Wang, Kai Han, Dong Li, Lin Li, Zhenni Bi, Jinpeng Li, Haoyu Wang, Fei Mi, Mingjian Zhu, et al. Pangu embedded: An efficient dual-system llm reasoner with metacognition. arXiv preprint arXiv:2505.22375, 2025

[4] Haoyu Chen, Wenbo Li, Jinjin Gu, Jingjing Ren, Sixiang Chen, Tian Ye, Renjing Pei, Kaiwen Zhou, Fenglong Song, and Lei Zhu. Restoreagent: Autonomous image restoration agent via multimodal large language models. Advances in Neural Information Processing Systems, 37:110643–110666, 2024

[5] Xiang Chen, Jinshan Pan, and Jiangxin Dong. Bidirectional multi-scale implicit neural representations for image deraining. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 25627–25636, 2024

[6] Xiangyu Chen, Zheyuan Li, Yuandong Pu, Yihao Liu, Jiantao Zhou, Yu Qiao, and Chao Dong. A comparative study of image restoration networks for general backbone network design. In European Conference on Computer Vision, pages 74–91. Springer, 2024

[7] Zixuan Chen, Zewei He, and Zhe-Ming Lu. Dea-net: Single image dehazing based on detail-enhanced convolution and content-guided attention. IEEE transactions on image processing, 33:1002–1015, 2024

[8] Marcos V Conde, Gregor Geigle, and Radu Timofte. Instructir: High-quality image restoration following human instructions. In European Conference on Computer Vision, pages 1–21. Springer, 2024

[9] Yuxin Feng, Long Ma, Xiaozhe Meng, Fan Zhou, Risheng Liu, and Zhuo Su. Advancing real-world image dehazing: Perspective, modules, and training. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(12):9303–9320, 2024

[10] Jiayi Fu, Siyu Liu, Zikun Liu, Chun-Le Guo, Hyunhee Park, Ruiqi Wu, Guoqing Wang, and Chongyi Li. Iterative predictor-critic code decoding for real-world image dehazing. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 12700–12709, 2025

[11] Ning Gao, Xingyu Jiang, Xiuhui Zhang, and Yue Deng. Efficient frequency-domain image deraining with contrastive regularization. In European conference on computer vision, pages 240–257. Springer, 2024

[12] Chun-Le Guo, Qixin Yan, Saeed Anwar, Runmin Cong, Wenqi Ren, and Chongyi Li. Image dehazing transformer with transmission-aware 3d position embedding. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5812–5820, 2022

[13] Yun Guo, Xueyao Xiao, Yi Chang, Shumin Deng, and Luxin Yan. From sky to the ground: A large-scale benchmark and simple baseline towards real rain removal. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 12097–12107, October 2023

[14] Junjun Jiang, Zengyuan Zuo, Gang Wu, Kui Jiang, and Xianming Liu. A survey on all-in-one image restoration: Taxonomy, evaluation and future trends. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

[15] Xu Jiang, Gehui Li, Bin Chen, and Jian Zhang. Multi-agent image restoration. arXiv preprint arXiv:2503.09403, 2025

[16] Yitong Jiang, Zhaoyang Zhang, Tianfan Xue, and Jinwei Gu. Autodir: Automatic all-in-one image restoration with latent diffusion. In European Conference on Computer Vision, pages 340–359. Springer, 2024

[17] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In European conference on computer vision, pages 694–711. Springer, 2016

[18] Junjie Ke, Qifei Wang, Yilin Wang, Peyman Milanfar, and Feng Yang. Musiq: Multi-scale image quality transformer. In Proceedings of the IEEE/CVF international conference on computer vision, pages 5148–5157, 2021

[19] Xiangtao Kong, Chao Dong, and Lei Zhang. Towards effective multiple-in-one image restoration: A sequential and prompt learning strategy. arXiv preprint arXiv:2401.03379, 2024

[20] Boyi Li, Wenqi Ren, Dengpan Fu, Dacheng Tao, Dan Feng, Wenjun Zeng, and Zhangyang Wang. Benchmarking single-image dehazing and beyond. IEEE transactions on image processing, 28(1):492–505, 2018

[21] Boyun Li, Xiao Liu, Peng Hu, Zhongqin Wu, Jiancheng Lv, and Xi Peng. All-in-one image restoration for unknown corruption. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 17452–17462, 2022

[22] Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, and Radu Timofte. Swinir: Image restoration using swin transformer. In Proceedings of the IEEE/CVF international conference on computer vision, pages 1833–1844, 2021

[23] Restore-R1: Efficient Image Restoration Agents via Reinforcement Learning with Multimodal LLM Perceptual Feedback

Jianglin Lu, Yuanwei Wu, Ziyi Zhao, Hongcheng Wang, Felix Jimenez, Abrar Majeedi, and Yun Fu. Simplecall: A lightweight image restoration agent in label-free environments with mllm perceptual feedback. arXiv preprint arXiv:2512.18599, 2025

[24] Ziwei Luo, Fredrik K Gustafsson, Zheng Zhao, Jens Sjölund, and Thomas B Schön. Controlling vision-language models for multi-task image restoration. In ICLR, 2024

[25] Vaishnav Potlapalli, Syed Waqas Zamir, Salman H Khan, and Fahad Shahbaz Khan. Promptir: Prompting for all-in-one image restoration. Advances in Neural Information Processing Systems, 36:71275–71293, 2023

[26] Dongwei Ren, Wangmeng Zuo, Qinghua Hu, Pengfei Zhu, and Deyu Meng. Progressive image deraining networks: A better and simpler baseline. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3937–3946, 2019

[27] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017

[28] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024

[29] Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, and Chuan Wu. Hybridflow: A flexible and efficient rlhf framework. arXiv preprint arXiv:2409.19256, 2024

[30] Aaditya Singh, Adam Fry, Adam Perelman, Adam Tart, Adi Ganesh, Ahmed El-Kishky, Aidan McLaughlin, Aiden Low, AJ Ostrow, Akhila Ananthram, et al. Openai gpt-5 system card. arXiv preprint arXiv:2601.03267, 2025

[31] Kwai Keye Team, Biao Yang, Bin Wen, Changyi Liu, Chenglong Chu, Chengru Song, Chongling Rao, Chuan Yi, Da Li, Dunju Zang, et al. Kwai keye-vl technical report. arXiv preprint arXiv:2507.01949, 2025

[32] Jeya Maria Jose Valanarasu, Rajeev Yasarla, and Vishal M Patel. Transweather: Transformer-based restoration of images degraded by adverse weather conditions. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2353–2363, 2022

[33] Jianyi Wang, Kelvin CK Chan, and Chen Change Loy. Exploring clip for assessing the look and feel of images. In Proceedings of the AAAI conference on artificial intelligence, volume 37, pages 2555–2563, 2023

[34] Tianyu Wang, Xin Yang, Ke Xu, Shaozhe Chen, Qiang Zhang, and Rynson WH Lau. Spatial attentive single-image deraining with a high quality real rain dataset. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12270–12279, 2019

[35] Zhendong Wang, Xiaodong Cun, Jianmin Bao, Wengang Zhou, Jianzhuang Liu, and Houqiang Li. Uformer: A general u-shaped transformer for image restoration. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 17683–17693, 2022

[36] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4):600–612, 2004

[37] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, 35:24824–24837, 2022

[38] Qiuxia Wu, Yu Sun, Panpan Cai, and Wenxiong Kang. Scdformer: Spatial and channel denoising transformer for human pose estimation using millimeter-wave radar. In 2025 IEEE International Joint Conference on Biometrics (IJCB), pages 1–10. IEEE, 2025

[39] Tianhe Wu, Jian Zou, Jie Liang, Lei Zhang, and Kede Ma. VisualQuality-R1: Reasoning-induced image quality assessment via reinforcement learning to rank. arXiv preprint arXiv:2505.14460, 2025

[40] Jiaqi Xu, Mengyang Wu, Xiaowei Hu, Chi-Wing Fu, Qi Dou, and Pheng-Ann Heng. Towards real-world adverse weather image restoration: Enhancing clearness and semantics with vision-language models. In European Conference on Computer Vision, pages 147–164. Springer, 2024

[41] Sidi Yang, Tianhe Wu, Shuwei Shi, Shanshan Lao, Yuan Gong, Mingdeng Cao, Jiahao Wang, and Yujiu Yang. Maniqa: Multi-dimension attention network for no-reference image quality assessment. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1191–1200, 2022

[42] Ke Yu, Chao Dong, Liang Lin, and Chen Change Loy. Crafting a toolchain for image restoration by deep reinforcement learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2443–2452, 2018
