Q-DeepSight: Incentivizing Thinking with Images for Image Quality Assessment and Refinement
Pith reviewed 2026-05-10 06:40 UTC · model grok-4.3
The pith
Q-DeepSight uses interleaved multimodal reasoning with image crops to provide actionable quality feedback for assessment and refinement.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Q-DeepSight performs interleaved Multimodal Chain-of-Thought with tool-augmented evidence acquisition such as crop-and-zoom to determine where quality degrades and why. It is trained via reinforcement learning with Perceptual Curriculum Reward to mitigate reward sparsity and Evidence Gradient Filtering to improve credit assignment for visually-grounded reasoning. This produces state-of-the-art results on benchmarks for natural, restored, and AI-generated images and supports the Perceptual-in-Generation framework in which the model's diagnoses guide iterative enhancement.
What carries the argument
Interleaved Multimodal Chain-of-Thought with tool-augmented evidence acquisition (crop-and-zoom), trained by Perceptual Curriculum Reward and Evidence Gradient Filtering.
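The think-with-image loop described above can be sketched in plain Python. Everything here is illustrative: the `model.step` interface, the step-dictionary schema, and the plain-list image representation are assumptions for the sketch, not the paper's actual API.

```python
# Minimal sketch of an interleaved Multimodal CoT (iMCoT) loop with a
# crop-and-zoom tool. All names (`model.step`, the step dict schema) are
# illustrative assumptions; the paper does not publish its interface.

def crop_and_zoom(image, box):
    """Crop a region (left, top, right, bottom) from a 2D pixel grid;
    the zoom/upsampling step is left abstract in this sketch."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]

def imcot_assess(model, image, max_turns=6):
    """Alternate text reasoning with tool calls until a final verdict."""
    context = [("image", image), ("text", "Assess quality step by step.")]
    for _ in range(max_turns):
        step = model.step(context)                 # thought + optional tool call
        context.append(("text", step["thought"]))
        if step.get("tool") == "crop_and_zoom":    # acquire visual evidence
            patch = crop_and_zoom(image, step["box"])
            context.append(("image", patch))       # interleave evidence into the trace
        elif step.get("final"):                    # verdict: score + localized rationale
            return step["score"], step["rationale"]
    return None, "no verdict within turn budget"
```

The point of the interleaving is that later reasoning steps condition on pixels the model chose to inspect, rather than on a single initial glance.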
Load-bearing premise
The training process using curriculum rewards and evidence filtering will reliably produce rationales grounded in visible image details that improve refinement without creating new errors or reward exploitation.
What would settle it
A test set in which refinements guided by Q-DeepSight's rationales produce lower human preference scores or automated perceptual metrics than refinements guided by baseline feedback or no feedback at all.
Original abstract
Image Quality Assessment (IQA) models are increasingly deployed as perceptual critics to guide generative models and image restoration. This role demands not only accurate scores but also actionable, localized feedback. However, current MLLM-based methods adopt a single-look, language-only paradigm, which departs from human evidence-seeking judgment and yields weakly grounded rationales, limiting their reliability for in-the-loop refinement. We propose Q-DeepSight, a think-with-image framework that emulates this human-like process. It performs interleaved Multimodal Chain-of-Thought (iMCoT) with tool-augmented evidence acquisition (e.g., crop-and-zoom) to explicitly determine where quality degrades and why. To train these long iMCoT trajectories via reinforcement learning, we introduce two techniques: Perceptual Curriculum Reward (PCR) to mitigate reward sparsity and Evidence Gradient Filtering (EGF) to improve credit assignment for visually-grounded reasoning. Q-DeepSight achieves state-of-the-art performance across diverse benchmarks, including natural, restored, and AI-generated content. Furthermore, we demonstrate its practical value with Perceptual-in-Generation (PiG), a training-free framework where Q-DeepSight's diagnoses guide iterative image enhancement, effectively closing the loop between assessment and refinement.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Q-DeepSight, a multimodal framework for image quality assessment that performs interleaved Multimodal Chain-of-Thought (iMCoT) reasoning augmented by tool-based evidence acquisition (e.g., crop-and-zoom). It introduces Perceptual Curriculum Reward (PCR) and Evidence Gradient Filtering (EGF) to train these long trajectories via reinforcement learning, claiming state-of-the-art performance on diverse IQA benchmarks (natural, restored, and AI-generated images) and demonstrating practical utility through the training-free Perceptual-in-Generation (PiG) loop in which the model's localized diagnoses iteratively guide image enhancement.
Significance. If the central claims hold, the work would meaningfully advance IQA by shifting from single-look language-only paradigms to human-like, visually grounded evidence-seeking that yields actionable rationales. The PCR and EGF mechanisms for mitigating reward sparsity and improving credit assignment on long multimodal trajectories, together with the closed-loop PiG demonstration, could influence both perceptual evaluation and in-the-loop refinement of generative models, provided the gains are shown to derive from the proposed techniques rather than base model scale or data curation.
major comments (2)
- [Abstract and §3] Abstract and §3 (Method): the claim of SOTA performance across benchmarks is asserted without any reported baselines, ablation studies, quantitative metrics on rationale localization precision, or error analysis. This leaves the contribution of iMCoT + crop-and-zoom, PCR, and EGF unisolated from base MLLM capabilities or prompt engineering.
- [§4 and §5] §4 (Experiments) and §5 (PiG): no quantitative checks are provided on whether PCR/EGF produce reliably grounded rationales, reduce failure modes under iterative refinement, or avoid reward hacking/sparsity issues on long trajectories. Without localization accuracy metrics or PiG failure-rate analysis, the training-free loop's reliability cannot be assessed.
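Going only by the abstract's one-line descriptions, the two training signals might look like the following sketch. The annealing schedule, tolerance values, and the grounded-region check are invented for illustration, not taken from the paper.

```python
# Hedged sketch of Perceptual Curriculum Reward (PCR) and Evidence Gradient
# Filtering (EGF), reconstructed from the abstract only. All thresholds and
# data structures below are illustrative assumptions.

def perceptual_curriculum_reward(pred, target, progress, tol0=1.0, tol1=0.1):
    """Dense early reward that tightens toward a strict criterion.

    `progress` in [0, 1] is the fraction of training completed; the score
    tolerance anneals from tol0 to tol1, so early trajectories still
    receive signal (one way to mitigate reward sparsity)."""
    tol = tol0 + (tol1 - tol0) * progress
    return 1.0 if abs(pred - target) <= tol else 0.0

def evidence_gradient_filter(trajectories):
    """Zero the advantage of any trajectory whose rationale cites no region
    actually visited by a tool call, so ungrounded reasoning contributes no
    gradient (one plausible reading of 'credit assignment for
    visually-grounded reasoning')."""
    for t in trajectories:
        grounded = bool(t["rationale_regions"] & t["tool_regions"])
        if not grounded:
            t["advantage"] = 0.0
    return trajectories
```

A concrete ablation of the kind the referee asks for would compare training runs with and without each of these two components, holding the base model and prompts fixed.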
minor comments (2)
- [§2] Clarify the precise definitions, differences, and implementation details of iMCoT, PCR, and EGF in the early sections to improve readability and reproducibility.
- [Figures] Ensure all figures showing example iMCoT trajectories include clear annotations of crop-and-zoom steps and final diagnoses for easier verification of grounding.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important areas for strengthening the empirical support and isolation of contributions. We address each point below and commit to substantial revisions that will include new experiments, metrics, and analyses.
Point-by-point responses
-
Referee: [Abstract and §3] Abstract and §3 (Method): the claim of SOTA performance across benchmarks is asserted without any reported baselines, ablation studies, quantitative metrics on rationale localization precision, or error analysis. This leaves the contribution of iMCoT + crop-and-zoom, PCR, and EGF unisolated from base MLLM capabilities or prompt engineering.
Authors: We acknowledge that the current manuscript asserts SOTA results without sufficiently detailed baseline tables or component ablations in the sections referenced. To address this, we will add explicit comparisons against prior IQA methods (both traditional and MLLM-based) in a revised Section 4, along with ablation studies that isolate iMCoT with crop-and-zoom, PCR, and EGF from base model scale and prompt engineering. We will also introduce quantitative localization precision metrics (e.g., IoU overlap with human-annotated degradation regions on available subsets) and expanded error analysis with categorized failure cases. These additions will be cross-referenced in the abstract and Section 3. The revisions will make the source of performance gains transparent. revision: yes
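The localization metric the response proposes (IoU against human-annotated degradation regions) is a standard computation and can be sketched as follows; the box convention is an assumption of this sketch, not the paper's.

```python
# Intersection-over-Union between a predicted degradation box and a
# human-annotated one. Boxes are (left, top, right, bottom) in pixels.

def iou(box_a, box_b):
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # intersection rectangle (empty if the boxes do not overlap)
    ix0, iy0 = max(ax0, bx0), max(ay0, by0)
    ix1, iy1 = min(ax1, bx1), min(ay1, by1)
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```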
-
Referee: [§4 and §5] §4 (Experiments) and §5 (PiG): no quantitative checks are provided on whether PCR/EGF produce reliably grounded rationales, reduce failure modes under iterative refinement, or avoid reward hacking/sparsity issues on long trajectories. Without localization accuracy metrics or PiG failure-rate analysis, the training-free loop's reliability cannot be assessed.
Authors: We agree that additional quantitative validation is required to substantiate the effects of PCR and EGF and the reliability of the PiG loop. In the revision, we will add human-rated rationale grounding scores comparing trajectories with and without PCR/EGF, trajectory statistics to assess sparsity mitigation, and checks for reward hacking (e.g., reward variance and length distributions). For PiG, we will report iterative failure rates, perceptual score trajectories over multiple refinement steps, and divergence cases. These results will be presented in expanded Sections 4 and 5 with corresponding figures and tables. revision: yes
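The PiG loop the abstract describes (diagnose, refine, re-assess) can be sketched as below. `assess` and `refine` are hypothetical callables standing in for Q-DeepSight and an enhancement model, and the plateau stopping rule (`min_gain`) is an invented detail; the failure-rate analysis promised above would instrument exactly this loop.

```python
# Sketch of a training-free Perceptual-in-Generation (PiG) loop: the
# assessor's diagnosis conditions each refinement step, and the loop stops
# when the perceptual score plateaus or regresses. `assess` and `refine`
# are hypothetical stand-ins, not the paper's implementation.

def pig_refine(image, assess, refine, max_iters=5, min_gain=0.05):
    score, diagnosis = assess(image)
    for _ in range(max_iters):
        candidate = refine(image, diagnosis)   # diagnosis-guided enhancement
        new_score, new_diag = assess(candidate)
        if new_score - score < min_gain:       # plateau or regression: stop
            break
        image, score, diagnosis = candidate, new_score, new_diag
    return image, score
```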
Circularity Check
No circularity detected; empirical framework with no self-referential derivations
full rationale
The paper describes an empirical ML framework (iMCoT with crop-and-zoom, trained via PCR and EGF) for IQA and PiG refinement. No equations, first-principles derivations, or parameter-fitting steps appear in the abstract or described method that reduce by construction to the inputs (e.g., no self-definitional rewards, no fitted parameters renamed as predictions, no load-bearing self-citations of uniqueness theorems). Performance claims rest on benchmark results and a training-free loop rather than any closed mathematical chain. This is the common case of a self-contained empirical proposal without the enumerated circularity patterns.
Reference graph
Works this paper leans on
- [1] Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, et al. Qwen3-VL technical report. arXiv preprint arXiv:2511.21631, 2025.
- [2] Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, et al. Qwen2.5-VL technical report. arXiv preprint arXiv:2502.13923, 2025.
- [3] Linhan Cao, Wei Sun, Weixia Zhang, Xiangyang Zhu, Jun Jia, Kaiwei Zhang, Dandan Zhu, Guangtao Zhai, and Xiongkuo Min. VQAThinker: Exploring generalizable and explainable video quality assessment via reinforcement learning. arXiv preprint arXiv:2508.06051, 2025.
- [4] Chaofeng Chen, Jiadi Mo, Jingwen Hou, Haoning Wu, Liang Liao, Wenxiu Sun, Qiong Yan, and Weisi Lin. TOPIQ: A top-down approach from semantics to distortions for image quality assessment. IEEE Transactions on Image Processing, 33:2404–2418, 2024.
- [5] Du Chen, Tianhe Wu, Kede Ma, and Lei Zhang. Toward generalized image quality assessment: Relaxing the perfect reference quality assumption. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 12742–12752, 2025.
- [6] Xiangyu Chen, Xintao Wang, Wenlong Zhang, Xiangtao Kong, Yu Qiao, Jiantao Zhou, and Chao Dong. HAT: Hybrid attention transformer for image restoration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.
- [7] Chaorui Deng, Deyao Zhu, Kunchang Li, Chenhui Gou, Feng Li, Zeyu Wang, Shu Zhong, Weihao Yu, Xiaonan Nie, Ziang Song, et al. Emerging properties in unified multimodal pretraining. arXiv preprint arXiv:2505.14683, 2025.
- [8] Keyan Ding, Kede Ma, Shiqi Wang, and Eero P. Simoncelli. Image quality assessment: Unifying structure and texture similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
- [9] Yuming Fang, Hanwei Zhu, Yan Zeng, Kede Ma, and Zhou Wang. Perceptual quality assessment of smartphone photography. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3677–3686.
- [10] Deepti Ghadiyaram and Alan C. Bovik. Massive online crowdsourced study of subjective and objective picture quality. IEEE Transactions on Image Processing, 25(1):372–387.
- [11] Jinjin Gu, Haoming Cai, Haoyu Chen, Jimmy Ren, Xiaoxing Ye, and Chao Dong. PIPAL: A large-scale image quality assessment dataset for perceptual image restoration. In European Conference on Computer Vision (ECCV), pages 633–651. Springer, 2020.
- [12] Xuan He, Dongfu Jiang, Ge Zhang, Max Ku, Achint Soni, Sherman Siu, Haonan Chen, Abhranil Chandra, Ziyan Jiang, Aaran Arulraj, et al. VideoScore: Building automatic metrics to simulate fine-grained human feedback for video generation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 2105–2123, 2024.
- [13] Jack Hong, Chenxiao Zhao, ChengLin Zhu, Weiheng Lu, Guohai Xu, and Xing Yu. DeepEyesV2: Toward agentic multimodal model. arXiv preprint arXiv:2511.05271, 2025.
- [14] Vlad Hosu, Hanhe Lin, Tamas Sziranyi, and Dietmar Saupe. KonIQ-10k: An ecologically valid database for deep learning of blind image quality assessment. IEEE Transactions on Image Processing, 29:4041–4056, 2020.
- [15] Vlad Hosu, Lorenzo Agnolucci, Oliver Wiedemann, Daisuke Iso, and Dietmar Saupe. UHD-IQA benchmark database: Pushing the boundaries of blind photo quality assessment. In European Conference on Computer Vision, pages 467–482. Springer, 2024.
- [16] Junjie Ke, Qifei Wang, Yilin Wang, Peyman Milanfar, and Feng Yang. MUSIQ: Multi-scale image quality transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5148–5157, 2021.
- [17] Xin Lai, Junyi Li, Wei Li, Tao Liu, Tianjian Li, and Hengshuang Zhao. Mini-o3: Scaling up reasoning patterns and interaction turns for visual search. arXiv preprint arXiv:2509.07969, 2025.
- [18] Shanshan Lao, Yuan Gong, Shuwei Shi, Sidi Yang, Tianhe Wu, Jiahao Wang, Weihao Xia, and Yujiu Yang. Attentions help CNNs see better: Attention-based hybrid image quality assessment network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1140–1149, 2022.
- [19] Eric Cooper Larson and Damon Michael Chandler. Most apparent distortion: Full-reference image quality assessment and the role of strategy. Journal of Electronic Imaging, 19(1):011006, 2010.
- [20] Chunyi Li, Zicheng Zhang, Haoning Wu, Wei Sun, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, and Weisi Lin. AGIQA-3K: An open database for AI-generated image quality assessment. IEEE Transactions on Circuits and Systems for Video Technology, 34(8):6833–6846, 2023.
- [21] Weiqi Li, Xuanyu Zhang, Bin Chen, Jingfen Xie, Yan Wang, Kexin Zhang, Junlin Li, Li Zhang, Jian Zhang, and Shijie Zhao. UARE: A unified vision-language model for image quality assessment, restoration, and enhancement. arXiv preprint arXiv:2512.06750, 2025.
- [22] Weiqi Li, Xuanyu Zhang, Shijie Zhao, Yabin Zhang, Junlin Li, Li Zhang, and Jian Zhang. Q-Insight: Understanding image quality via visual reinforcement learning. arXiv preprint arXiv:2503.22679, 2025.
- [23] Guoqiang Liang, Jianyi Wang, Zhonghua Wu, and Shangchen Zhou. Zoom-IQA: Image quality assessment with reliable region-aware reasoning. arXiv preprint arXiv:2601.02918, 2026.
- [24] Hanhe Lin, Vlad Hosu, and Dietmar Saupe. KADID-10k: A large-scale artificially distorted IQA database. In 2019 Eleventh International Conference on Quality of Multimedia Experience (QoMEX), pages 1–3. IEEE, 2019.
- [25] Xinqi Lin, Jingwen He, Ziyan Chen, Zhaoyang Lyu, Bo Dai, Fanghua Yu, Yu Qiao, Wanli Ouyang, and Chao Dong. DiffBIR: Toward blind image restoration with generative diffusion prior. In European Conference on Computer Vision, pages 430–448. Springer, 2024.
- [26] Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning. Advances in Neural Information Processing Systems, 36:34892–34916, 2023.
- [27] Yiming Liu, Jue Wang, Sunghyun Cho, Adam Finkelstein, and Szymon Rusinkiewicz. A no-reference metric for evaluating the quality of motion deblurring. ACM Transactions on Graphics, 32(6), 2013.
- [28] Anish Mittal, Anush Krishna Moorthy, and Alan Conrad Bovik. No-reference image quality assessment in the spatial domain. IEEE Transactions on Image Processing, 21(12):4695–4708, 2012.
- [29] Anish Mittal, Anush Krishna Moorthy, and Alan Conrad Bovik. No-reference image quality assessment in the spatial domain. IEEE Transactions on Image Processing, 21(12):4695–4708, 2012.
- [30] Anish Mittal, Rajiv Soundararajan, and Alan C. Bovik. Making a "completely blind" image quality analyzer. IEEE Signal Processing Letters, 20(3):209–212, 2012.
- [31] Zhaoqing Pan, Hao Zhang, Jianjun Lei, Yuming Fang, Xiao Shao, Nam Ling, and Sam Kwong. DACNN: Blind image quality assessment via a distortion-aware convolutional neural network. IEEE Transactions on Circuits and Systems for Video Technology, 32(11):7518–7531, 2022.
- [32] Avinab Saha, Sandeep Mishra, and Alan C. Bovik. Re-IQA: Unsupervised learning for image quality assessment in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5846–5855, 2023.
- [33] Team Seed, Yunpeng Chen, Yu Gao, Lixue Gong, Meng Guo, Qiushan Guo, Zhiyao Guo, Xiaoxia Hou, Weilin Huang, Yixuan Huang, et al. Seedream 4.0: Toward next-generation multimodal image generation. arXiv preprint arXiv:2509.20427, 2025.
- [34] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. DeepSeekMath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024.
- [35] Shaolin Su, Qingsen Yan, Yu Zhu, Cheng Zhang, Xin Ge, Jinqiu Sun, and Yanning Zhang. Blindly assess image quality in the wild guided by a self-adaptive hyper network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3667–3676, 2020.
- [36] Zhaochen Su, Linjie Li, Mingyang Song, Yunzhuo Hao, Zhengyuan Yang, Jun Zhang, Guanjie Chen, Jiawei Gu, Juntao Li, Xiaoye Qu, et al. OpenThinkIMG: Learning to think with images via visual tool reinforcement learning. arXiv preprint arXiv:2505.08617, 2025.
- [37] Zhaochen Su, Peng Xia, Hangyu Guo, Zhenhua Liu, Yan Ma, Xiaoye Qu, Jiaqi Liu, Yanshu Li, Kaide Zeng, Zhengyuan Yang, et al. Thinking with images for multimodal reasoning: Foundations, methods, and future frontiers. arXiv preprint arXiv:2506.23918, 2025.
- [38] Lingchen Sun, Rongyuan Wu, Zhiyuan Ma, Shuaizheng Liu, Qiaosi Yi, and Lei Zhang. Pixel-level and semantic-level adjustable super-resolution: A dual-LoRA approach. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 2333–2343, 2025.
- [39] Haozhe Wang, Alex Su, Weiming Ren, Fangzhen Lin, and Wenhu Chen. Pixel Reasoner: Incentivizing pixel-space reasoning with curiosity-driven reinforcement learning. arXiv preprint arXiv:2505.15966, 2025.
- [40] Jianyi Wang, Kelvin C. K. Chan, and Chen Change Loy. Exploring CLIP for assessing the look and feel of images. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 2555–2563, 2023.
- [41] Yufei Wang, Wenhan Yang, Xinyuan Chen, Yaohui Wang, Lanqing Guo, Lap-Pui Chau, Ziwei Liu, Yu Qiao, Alex C. Kot, and Bihan Wen. SinSR: Diffusion-based image super-resolution in a single step. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 25796–25805, 2024.
- [42] Yibin Wang, Zhimin Li, Yuhang Zang, Chunyu Wang, Qinglin Lu, Cheng Jin, and Jiaqi Wang. Unified multimodal chain-of-thought reward model through reinforcement fine-tuning. arXiv preprint arXiv:2505.03318, 2025.
- [43] Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
- [44] Zhenhailong Wang, Xuehang Guo, Sofia Stoica, Haiyang Xu, Hongru Wang, Hyeonjeong Ha, Xiusi Chen, Yangyi Chen, Ming Yan, Fei Huang, et al. Perception-aware policy optimization for multimodal reasoning. arXiv preprint arXiv:2507.06448, 2025.
- [45] Hongyang Wei, Shuaizheng Liu, Chun Yuan, and Lei Zhang. Perceive, understand and restore: Real-world image super-resolution with autoregressive multimodal generative models. arXiv preprint arXiv:2503.11073, 2025.
- [46] Pengxu Wei, Ziwei Xie, Hannan Lu, Zongyuan Zhan, Qixiang Ye, Wangmeng Zuo, and Liang Lin. Component divide-and-conquer for real-world image super-resolution. In European Conference on Computer Vision, pages 101–117. Springer, 2020.
- [47] Haoning Wu, Zicheng Zhang, Weixia Zhang, Chaofeng Chen, Liang Liao, Chunyi Li, Yixuan Gao, Annan Wang, Erli Zhang, Wenxiu Sun, et al. Q-Align: Teaching LMMs for visual scoring via discrete text-defined levels. arXiv preprint arXiv:2312.17090, 2023.
- [48] Rongyuan Wu, Lingchen Sun, Zhiyuan Ma, and Lei Zhang. One-step effective diffusion network for real-world image super-resolution. Advances in Neural Information Processing Systems, 37:92529–92553, 2024.
- [49] Rongyuan Wu, Tao Yang, Lingchen Sun, Zhengqiang Zhang, Shuai Li, and Lei Zhang. SeeSR: Towards semantics-aware real-world image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 25456–25467, 2024.
- [50] Tianhe Wu, Jian Zou, Jie Liang, Lei Zhang, and Kede Ma. VisualQuality-R1: Reasoning-induced image quality assessment via reinforcement learning to rank. arXiv preprint arXiv:2505.14460, 2025.
- [51] Sidi Yang, Tianhe Wu, Shuwei Shi, Shanshan Lao, Yuan Gong, Mingdeng Cao, Jiahao Wang, and Yujiu Yang. MANIQA: Multi-dimension attention network for no-reference image quality assessment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1191–1200, 2022.
- [52] Tao Yang, Rongyuan Wu, Peiran Ren, Xuansong Xie, and Lei Zhang. Pixel-aware stable diffusion for realistic image super-resolution and personalized stylization. In European Conference on Computer Vision, pages 74–91. Springer, 2024.
- [53] Zhiyuan You, Jinjin Gu, Zheyuan Li, Xin Cai, Kaiwen Zhu, Chao Dong, and Tianfan Xue. Descriptive image quality assessment in the wild. arXiv preprint arXiv:2405.18842, 2024.
- [54] Zhiyuan You, Zheyuan Li, Jinjin Gu, Zhenfei Yin, Tianfan Xue, and Chao Dong. Depicting beyond scores: Advancing image quality assessment through multi-modal language models. In European Conference on Computer Vision, pages 259–276. Springer, 2024.
- [55] Zhiyuan You, Xin Cai, Jinjin Gu, Tianfan Xue, and Chao Dong. Teaching large language models to regress accurate image quality scores using score distribution. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 14483–14494, 2025.
- [56] Zongsheng Yue, Jianyi Wang, and Chen Change Loy. ResShift: Efficient diffusion model for image super-resolution by residual shifting. Advances in Neural Information Processing Systems, 36:13294–13307, 2023.
- [57] Aiping Zhang, Zongsheng Yue, Renjing Pei, Wenqi Ren, and Xiaochun Cao. Degradation-guided one-step image super-resolution with diffusion priors. arXiv preprint arXiv:2409.17058, 2024.
- [58] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018.
- [59] Weixia Zhang, Guangtao Zhai, Ying Wei, Xiaokang Yang, and Kede Ma. Blind image quality assessment via vision-language correspondence: A multitask learning perspective. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14071–14081, 2023.
- [60] Ziwei Zheng, Michael Yang, Jack Hong, Chenxiao Zhao, Guohai Xu, Le Yang, Chao Shen, and Xing Yu. DeepEyes: Incentivizing "thinking with images" via reinforcement learning. arXiv preprint arXiv:2505.14362, 2025.
- [61] Hanwei Zhu, Haoning Wu, Yixuan Li, Zicheng Zhang, Baoliang Chen, Lingyu Zhu, Yuming Fang, Guangtao Zhai, Weisi Lin, and Shiqi Wang. Adaptive image quality assessment via teaching large multimodal model to compare. Advances in Neural Information Processing Systems, 37:32611–32629, 2024.
- [62] Kaiwen Zhu, Jinjin Gu, Zhiyuan You, Yu Qiao, and Chao Dong. An intelligent agentic system for complex image restoration problems. arXiv preprint arXiv:2410.17809, 2024.
- [63] Yushen Zuo, Qi Zheng, Mingyang Wu, Xinrui Jiang, Renjie Li, Jian Wang, Yide Zhang, Gengchen Mai, Lihong V. Wang, James Zou, et al. 4KAgent: Agentic any image to 4K super-resolution. arXiv preprint arXiv:2507.07105, 2025.