FinPercep-RM: A Fine-grained Reward Model and Co-evolutionary Curriculum for RL-based Real-world Super-Resolution
Pith reviewed 2026-05-16 18:57 UTC · model grok-4.3
The pith
A fine-grained reward model with perceptual maps and co-evolutionary curriculum stabilizes RL training for real-world image super-resolution.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FinPercep-RM supplies both a global quality score and a spatially localized Perceptual Degradation Map that quantifies local defects; when paired with a Co-evolutionary Curriculum Learning mechanism that jointly ramps the reward model and the ISR generator from coarse global signals to the full fine-grained outputs, RL training becomes stable, reward hacking is suppressed, and the resulting super-resolved images show measurable gains in both global perceptual quality and local realism.
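The coarse-to-fine ramp described above can be pictured with a small sketch. This is an illustration only, not the paper's schedule: the linear ramp, the function names `fine_grained_weight` and `blended_reward`, and the parameters `warmup_steps` and `ramp_steps` are all assumptions introduced here.

```python
def fine_grained_weight(step: int, warmup_steps: int, ramp_steps: int) -> float:
    """Hypothetical curriculum weight in [0, 1] for the fine-grained reward term.

    The weight is 0 during warm-up (global score only), then rises linearly
    to 1 over `ramp_steps` steps. The linear shape is an assumption.
    """
    if warmup_steps < 0 or ramp_steps <= 0:
        raise ValueError("warmup_steps must be >= 0 and ramp_steps must be > 0")
    if step <= warmup_steps:
        return 0.0
    progress = (step - warmup_steps) / ramp_steps
    return min(1.0, progress)


def blended_reward(global_score: float, local_penalty: float, weight: float) -> float:
    """Global score minus the curriculum-weighted local penalty (assumed form)."""
    return global_score - weight * local_penalty


if __name__ == "__main__":
    # Print the weight and the blended reward at a few training steps.
    for step in (0, 100, 300, 500, 1000):
        w = fine_grained_weight(step, warmup_steps=100, ramp_steps=400)
        print(step, w, blended_reward(1.0, 0.5, w))
```

With a weight of 0 the generator sees only the global signal; as the weight approaches 1 the local penalty counts in full, which is one plausible reading of "easy-to-hard".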
What carries the argument
FinPercep-RM, an Encoder-Decoder architecture that outputs a global quality score together with a Perceptual Degradation Map to localize and quantify local defects, combined with the Co-evolutionary Curriculum Learning schedule that synchronizes increasing reward complexity with generator training.
Load-bearing premise
The FGR-30k dataset contains a representative set of subtle real-world super-resolution distortions and the synchronized easy-to-hard curriculum preserves the benefits of fine-grained feedback without creating new training instabilities.
What would settle it
Training an RL-based ISR model with FinPercep-RM but without the CCL schedule either diverges or produces images whose local artifacts remain undetected by the reward model yet still receive high global scores.
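One way to make this test operational is to flag outputs that score high globally while an independent check still finds local defects. The sketch below assumes each sample carries a global score in [0, 1] and an independently measured defect fraction in [0, 1] (for example from human annotation); the function name and both thresholds are hypothetical.

```python
from typing import List, Tuple


def flag_suspected_hacks(
    samples: List[Tuple[float, float]],
    score_threshold: float = 0.8,
    defect_threshold: float = 0.2,
) -> List[int]:
    """Return indices of samples with a high global score AND a high defect fraction.

    Each sample is (global_score, defect_fraction). Both thresholds are
    illustrative assumptions, not values from the paper.
    """
    return [
        index
        for index, (score, defects) in enumerate(samples)
        if score >= score_threshold and defects >= defect_threshold
    ]


if __name__ == "__main__":
    batch = [(0.90, 0.05), (0.85, 0.30), (0.40, 0.50)]
    print(flag_suspected_hacks(batch))
```

A persistently non-empty result for a model trained without the curriculum would be the second outcome named above.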
Original abstract
Reinforcement Learning with Human Feedback (RLHF) has proven effective in image generation field guided by reward models to align human preferences. Motivated by this, adapting RLHF for Image Super-Resolution (ISR) tasks has shown promise in optimizing perceptual quality with Image Quality Assessment (IQA) model as reward models. However, the traditional IQA model usually output a single global score, which are exceptionally insensitive to local and fine-grained distortions. This insensitivity allows ISR models to produce perceptually undesirable artifacts that yield spurious high scores, misaligning optimization objectives with perceptual quality and results in reward hacking. To address this, we propose a Fine-grained Perceptual Reward Model (FinPercep-RM) based on an Encoder-Decoder architecture. While providing a global quality score, it also generates a Perceptual Degradation Map that spatially localizes and quantifies local defects. We specifically introduce the FGR-30k dataset to train this model, consisting of diverse and subtle distortions from real-world super-resolution models. Despite the success of the FinPercep-RM model, its complexity introduces significant challenges in generator policy learning, leading to training instability. To address this, we propose a Co-evolutionary Curriculum Learning (CCL) mechanism, where both the reward model and the ISR model undergo synchronized curricula. The reward model progressively increases in complexity, while the ISR model starts with a simpler global reward for rapid convergence, gradually transitioning to the more complex model outputs. This easy-to-hard strategy enables stable training while suppressing reward hacking. Experiments validates the effectiveness of our method across ISR models in both global quality and local realism on RLHF methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces FinPercep-RM, an Encoder-Decoder reward model that outputs both a global quality score and a spatially localized Perceptual Degradation Map to address the insensitivity of standard IQA models to local distortions in RLHF-based image super-resolution. It presents the FGR-30k dataset of subtle real-world SR artifacts for training and proposes a Co-evolutionary Curriculum Learning (CCL) mechanism that synchronizes progressive complexity increases in the reward model with an easy-to-hard transition in the ISR generator policy, starting from global rewards. The central claim is that this combination enables stable RL training, suppresses reward hacking, and yields improvements in both global perceptual quality and local realism across RLHF ISR methods.
Significance. If the empirical claims hold, the work would be a meaningful contribution to RLHF applications in low-level vision. The spatially explicit reward and synchronized curriculum address a recognized failure mode (reward hacking from global-only scores) in a concrete, deployable way. The introduction of a dedicated fine-grained dataset and the co-evolutionary training protocol are novel elements that could be adopted or extended in subsequent reward-modeling research for generative tasks.
major comments (2)
- [Experiments] Experiments section: The claim that CCL enables stable training while preserving the benefits of the full Perceptual Degradation Map lacks any ablation study. No results compare the ISR model trained with versus without the curriculum (or with different transition schedules), so the assertion that the synchronized easy-to-hard strategy both stabilizes convergence and ultimately improves local realism metrics cannot be verified from the presented evidence.
- [§3.2] §3.2 (FGR-30k dataset description): The dataset is presented as capturing 'diverse and subtle distortions from real-world super-resolution models,' yet no quantitative characterization (e.g., distribution of distortion types, number of source SR models, or human validation of subtlety) is provided. Without these details it is impossible to assess whether the dataset is representative enough to support the claim that FinPercep-RM generalizes beyond the training distribution.
minor comments (2)
- [Abstract] Abstract: 'Experiments validates' is grammatically incorrect and should read 'Experiments validate'.
- [§3.1] Notation: The precise mathematical definition of the Perceptual Degradation Map (how the decoder output is normalized and combined with the global score) is not stated explicitly enough for reproduction; an equation or pseudocode block would clarify the reward formulation used in the RL objective.
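To illustrate the kind of statement the last comment asks for, one possible form of the reward is written out below. It is a hypothetical formulation, not taken from the manuscript: the symbols $s(x)$ for the global score, $M(x)$ for the Perceptual Degradation Map, $\lambda(t)$ for a curriculum weight, and the choice of a spatial mean are all assumptions.

```latex
% Hypothetical reward for a super-resolved image x at training step t.
% s(x): global quality score; M(x): H x W degradation map with entries in [0, 1];
% \lambda(t): curriculum weight in [0, 1], non-decreasing in t.
R(x, t) \;=\; s(x) \;-\; \lambda(t)\,\frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W} M_{ij}(x),
\qquad M_{ij}(x) \in [0, 1],\quad \lambda(t) \in [0, 1].
```

Under this form, $\lambda(t) = 0$ recovers a global-only reward, and $\lambda(t) = 1$ applies the mean local degradation as a full penalty.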
Simulated Author's Rebuttal
Thank you for the constructive feedback and for recognizing the potential of FinPercep-RM and the co-evolutionary curriculum in addressing reward hacking in RLHF-based image super-resolution. We address each major comment below and will revise the manuscript accordingly to strengthen the empirical support and dataset characterization.
Point-by-point responses
- Referee: [Experiments] Experiments section: The claim that CCL enables stable training while preserving the benefits of the full Perceptual Degradation Map lacks any ablation study. No results compare the ISR model trained with versus without the curriculum (or with different transition schedules), so the assertion that the synchronized easy-to-hard strategy both stabilizes convergence and ultimately improves local realism metrics cannot be verified from the presented evidence.
  Authors: We agree that the manuscript would benefit from explicit ablation studies on the Co-evolutionary Curriculum Learning (CCL) mechanism. In the revised version, we will add new experiments that directly compare the ISR generator trained with CCL against baselines without the curriculum and with alternative transition schedules. These ablations will include quantitative metrics on training stability (such as reward variance and convergence curves) as well as local realism scores to verify that the easy-to-hard strategy stabilizes training while retaining the benefits of the full Perceptual Degradation Map.
  revision: yes
- Referee: [§3.2] §3.2 (FGR-30k dataset description): The dataset is presented as capturing 'diverse and subtle distortions from real-world super-resolution models,' yet no quantitative characterization (e.g., distribution of distortion types, number of source SR models, or human validation of subtlety) is provided. Without these details it is impossible to assess whether the dataset is representative enough to support the claim that FinPercep-RM generalizes beyond the training distribution.
  Authors: We acknowledge that the current description of the FGR-30k dataset lacks sufficient quantitative details. In the revised manuscript, we will expand §3.2 to include: the distribution of distortion types, the number and diversity of source super-resolution models used to synthesize the artifacts, and results from human validation studies confirming the subtlety of the distortions. These additions will provide stronger evidence for the dataset's representativeness and support the generalization claims for FinPercep-RM.
  revision: yes
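The first response names reward variance as a stability metric. A minimal sketch of one such measure follows, assuming per-step scalar rewards are logged during RL training; the rolling-window form and the name `rolling_reward_variance` are assumptions, not the authors' protocol.

```python
from statistics import pvariance
from typing import List


def rolling_reward_variance(rewards: List[float], window: int) -> List[float]:
    """Population variance of the reward over each consecutive window of steps.

    Returns one value per full window; a trace that stays low and flat
    suggests stable training, while spikes suggest instability.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    return [
        pvariance(rewards[end - window:end])
        for end in range(window, len(rewards) + 1)
    ]


if __name__ == "__main__":
    logged = [1.0, 1.0, 1.0, 3.0]
    print(rolling_reward_variance(logged, window=2))
```

Comparing this trace for runs with and without the curriculum would give one of the quantitative comparisons the referee requests.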
Circularity Check
No significant circularity in derivation chain
The paper introduces a new Encoder-Decoder-based FinPercep-RM model, a newly constructed FGR-30k dataset of real-world SR distortions, and a Co-evolutionary Curriculum Learning (CCL) mechanism with synchronized easy-to-hard progression. Central claims of stable training, reward-hacking suppression, and improved global/local quality rest on experimental validation of these novel components rather than any self-definitional loops, fitted parameters relabeled as predictions, or load-bearing self-citations. No equations, uniqueness theorems, or ansatzes are shown that reduce outputs to inputs by construction; the derivation chain remains self-contained through independent model design and empirical results.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Traditional IQA models can serve as reward models for RL in ISR but suffer from insensitivity to local distortions
invented entities (3)
- FinPercep-RM: no independent evidence
- FGR-30k dataset: no independent evidence
- CCL mechanism: no independent evidence
Forward citations
Cited by 1 Pith paper
- Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges
The paper introduces the Proxy Compression Hypothesis as a unifying framework explaining reward hacking in RLHF as an emergent result of compressing high-dimensional human objectives into proxy reward signals under op...