arxiv: 2512.22647 · v2 · submitted 2025-12-27 · 💻 cs.CV

FinPercep-RM: A Fine-grained Reward Model and Co-evolutionary Curriculum for RL-based Real-world Super-Resolution

Yidi Liu, Zihao Fan, Jie Huang, Jie Xiao, Dong Li, Wenlong Zhang, Lei Bai, Xueyang Fu, Zheng-Jun Zha

Pith reviewed 2026-05-16 18:57 UTC · model grok-4.3

classification 💻 cs.CV

keywords fine-grained reward model · perceptual degradation map · co-evolutionary curriculum · RLHF · image super-resolution · reward hacking · real-world ISR · FGR-30k dataset

The pith

A fine-grained reward model with perceptual maps and co-evolutionary curriculum stabilizes RL training for real-world image super-resolution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard global image quality scores used as RL rewards for super-resolution let generators produce local artifacts that still receive high scores, creating reward hacking. The paper introduces FinPercep-RM, an encoder-decoder that adds a Perceptual Degradation Map to spatially locate and score those local defects, trained on the new FGR-30k dataset of real super-resolution distortions. Because the richer signal makes policy learning harder and unstable, the authors pair it with Co-evolutionary Curriculum Learning that starts the reward and generator on simple global feedback and gradually shifts to full fine-grained outputs. This synchronized progression keeps training stable and yields images with stronger global quality and fewer visible local flaws across RLHF-based super-resolution methods.

Core claim

FinPercep-RM supplies both a global quality score and a spatially localized Perceptual Degradation Map that quantifies local defects; when paired with a Co-evolutionary Curriculum Learning mechanism that jointly ramps the reward model and the ISR generator from coarse global signals to the full fine-grained outputs, RL training becomes stable, reward hacking is suppressed, and the resulting super-resolved images show measurable gains in both global perceptual quality and local realism.

What carries the argument

FinPercep-RM, an Encoder-Decoder architecture that outputs a global quality score together with a Perceptual Degradation Map to localize and quantify local defects, combined with the Co-evolutionary Curriculum Learning schedule that synchronizes increasing reward complexity with generator training.
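The interplay between the two reward outputs and the curriculum can be illustrated with a minimal sketch. Nothing here is taken from the paper: the linear ramp, the mean-pooled penalty, the value ranges, and the names `curriculum_weight` and `blended_reward` are all assumptions chosen for illustration.

```python
import numpy as np


def curriculum_weight(step: int, warmup_steps: int, ramp_steps: int) -> float:
    """Hypothetical schedule: 0 during warm-up, then a linear ramp up to 1."""
    if ramp_steps <= 0:
        raise ValueError("ramp_steps must be positive")
    if step < warmup_steps:
        return 0.0
    return min(1.0, (step - warmup_steps) / ramp_steps)


def blended_reward(global_score: float, degradation_map: np.ndarray, weight: float) -> float:
    """Global score minus a curriculum-weighted local penalty.

    Assumes global_score lies in [0, 1] and degradation_map holds per-pixel
    defect severities in [0, 1], where higher means worse. Both conventions
    are assumptions of this sketch, not the paper's definitions.
    """
    local_penalty = float(np.mean(degradation_map))
    return global_score - weight * local_penalty


# Toy image with one localized artifact covering a 10x10 patch.
deg_map = np.zeros((64, 64))
deg_map[10:20, 10:20] = 1.0

for step in (0, 500, 1000, 2000):
    w = curriculum_weight(step, warmup_steps=500, ramp_steps=1000)
    print(step, w, blended_reward(0.8, deg_map, w))
```

With weight 0 the generator sees only the global score; as the weight approaches 1 the local penalty counts in full, which is one simple way to realize an easy-to-hard transition.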

Load-bearing premise

The FGR-30k dataset contains a representative set of subtle real-world super-resolution distortions and the synchronized easy-to-hard curriculum preserves the benefits of fine-grained feedback without creating new training instabilities.

What would settle it

Training an RL-based ISR model with FinPercep-RM but without the CCL schedule either diverges or produces images whose local artifacts remain undetected by the reward model yet still receive high global scores.
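One way to operationalize the second failure condition is to count outputs that score well globally while an independently annotated artifact goes unnoticed by the predicted map. The sketch below assumes such annotations exist (for example, human-drawn masks); the function name, the thresholds, and the value ranges are hypothetical and not from the paper.

```python
import numpy as np


def missed_artifact_cases(global_scores, predicted_maps, annotated_masks,
                          score_thresh=0.7, detect_thresh=0.5):
    """Return indices of samples with a high global score whose annotated
    artifact region is not flagged by the predicted degradation map.

    Assumes scores and map values lie in [0, 1]; thresholds are illustrative.
    """
    flagged = []
    for i, (score, pred, mask) in enumerate(
            zip(global_scores, predicted_maps, annotated_masks)):
        mask = np.asarray(mask, dtype=bool)
        if not mask.any():
            continue  # no annotated artifact, nothing to miss
        # Mean predicted severity inside the annotated artifact region.
        detected = float(np.asarray(pred)[mask].mean()) >= detect_thresh
        if score >= score_thresh and not detected:
            flagged.append(i)
    return flagged
```

A non-empty result on a run trained without the curriculum, alongside an empty or much shorter one with it, would be the kind of evidence this test calls for.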

Figures

Figures reproduced from arXiv: 2512.22647 by Dong Li, Jie Huang, Jie Xiao, Lei Bai, Wenlong Zhang, Xueyang Fu, Yidi Liu, Zheng-Jun Zha, Zihao Fan.

**Figure 2.** The overall pipeline of the proposed FinPercep-RM and Co-evolutionary Curriculum Learning (CCL) framework. FinPercep-RM ...

**Figure 3.** FGR-30k construction pipeline. We synthesize fine ...

**Figure 4.** Qualitative comparisons with state-of-the-art Real-ISR methods on RealSR based on RLHF method of REFL [ ...

Original abstract

Reinforcement Learning with Human Feedback (RLHF) has proven effective in image generation field guided by reward models to align human preferences. Motivated by this, adapting RLHF for Image Super-Resolution (ISR) tasks has shown promise in optimizing perceptual quality with Image Quality Assessment (IQA) model as reward models. However, the traditional IQA model usually output a single global score, which are exceptionally insensitive to local and fine-grained distortions. This insensitivity allows ISR models to produce perceptually undesirable artifacts that yield spurious high scores, misaligning optimization objectives with perceptual quality and results in reward hacking. To address this, we propose a Fine-grained Perceptual Reward Model (FinPercep-RM) based on an Encoder-Decoder architecture. While providing a global quality score, it also generates a Perceptual Degradation Map that spatially localizes and quantifies local defects. We specifically introduce the FGR-30k dataset to train this model, consisting of diverse and subtle distortions from real-world super-resolution models. Despite the success of the FinPercep-RM model, its complexity introduces significant challenges in generator policy learning, leading to training instability. To address this, we propose a Co-evolutionary Curriculum Learning (CCL) mechanism, where both the reward model and the ISR model undergo synchronized curricula. The reward model progressively increases in complexity, while the ISR model starts with a simpler global reward for rapid convergence, gradually transitioning to the more complex model outputs. This easy-to-hard strategy enables stable training while suppressing reward hacking. Experiments validates the effectiveness of our method across ISR models in both global quality and local realism on RLHF methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper's main advance is a fine-grained reward model with a degradation map plus a co-evolutionary curriculum to stabilize RL for real-world super-resolution, though the curriculum's claimed benefits rest on limited visible support.

The core contribution is FinPercep-RM, an encoder-decoder reward model that outputs both a global quality score and a spatial Perceptual Degradation Map, trained on the new FGR-30k dataset of subtle real-world SR distortions. They pair it with a Co-evolutionary Curriculum Learning (CCL) setup that starts the ISR generator on simple global rewards and gradually brings in the full map while the reward model itself increases in complexity. This directly targets the known problem that standard IQA rewards let models produce local artifacts that still score high globally, which is a practical issue in RLHF for ISR. The motivation and architecture choice are reasonable and build on existing RLHF ideas without obvious circularity. The dataset and map output are concrete additions that prior global-only models lack. The synchronized easy-to-hard schedule is a sensible engineering response to the added complexity of the fine-grained signal.

That said, the abstract's claim that CCL enables stable training and suppresses reward hacking while preserving local realism gains lacks supporting details on the exact schedule, progression steps, or ablations comparing curriculum versus non-curriculum runs on final metrics. Without those, it's hard to confirm the transition mechanism works as described rather than just delaying instability. Experiments are mentioned across ISR models for global and local quality, but the summary gives no numbers, baselines, or variance, so the strength of the results is unclear from what's here.

This work is aimed at researchers applying RL to perceptual image restoration who already know the reward-hacking pitfalls. A reader focused on reward design or curriculum methods in low-level vision would find the specific architecture and dataset useful even if the training claims need more backing. It deserves peer review because the problem is real, the proposed components are new, and the gaps are fixable with added ablations and metrics rather than fundamental flaws.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces FinPercep-RM, an Encoder-Decoder reward model that outputs both a global quality score and a spatially localized Perceptual Degradation Map to address the insensitivity of standard IQA models to local distortions in RLHF-based image super-resolution. It presents the FGR-30k dataset of subtle real-world SR artifacts for training and proposes a Co-evolutionary Curriculum Learning (CCL) mechanism that synchronizes progressive complexity increases in the reward model with an easy-to-hard transition in the ISR generator policy, starting from global rewards. The central claim is that this combination enables stable RL training, suppresses reward hacking, and yields improvements in both global perceptual quality and local realism across RLHF ISR methods.

Significance. If the empirical claims hold, the work would be a meaningful contribution to RLHF applications in low-level vision. The spatially explicit reward and synchronized curriculum address a recognized failure mode (reward hacking from global-only scores) in a concrete, deployable way. The introduction of a dedicated fine-grained dataset and the co-evolutionary training protocol are novel elements that could be adopted or extended in subsequent reward-modeling research for generative tasks.

major comments (2)

[Experiments] Experiments section: The claim that CCL enables stable training while preserving the benefits of the full Perceptual Degradation Map lacks any ablation study. No results compare the ISR model trained with versus without the curriculum (or with different transition schedules), so the assertion that the synchronized easy-to-hard strategy both stabilizes convergence and ultimately improves local realism metrics cannot be verified from the presented evidence.
[§3.2] §3.2 (FGR-30k dataset description): The dataset is presented as capturing 'diverse and subtle distortions from real-world super-resolution models,' yet no quantitative characterization (e.g., distribution of distortion types, number of source SR models, or human validation of subtlety) is provided. Without these details it is impossible to assess whether the dataset is representative enough to support the claim that FinPercep-RM generalizes beyond the training distribution.

minor comments (2)

[Abstract] Abstract: 'Experiments validates' is grammatically incorrect and should read 'Experiments validate'.
[§3.1] Notation: The precise mathematical definition of the Perceptual Degradation Map (how the decoder output is normalized and combined with the global score) is not stated explicitly enough for reproduction; an equation or pseudocode block would clarify the reward formulation used in the RL objective.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback and for recognizing the potential of FinPercep-RM and the co-evolutionary curriculum in addressing reward hacking in RLHF-based image super-resolution. We address each major comment below and will revise the manuscript accordingly to strengthen the empirical support and dataset characterization.

Referee: [Experiments] Experiments section: The claim that CCL enables stable training while preserving the benefits of the full Perceptual Degradation Map lacks any ablation study. No results compare the ISR model trained with versus without the curriculum (or with different transition schedules), so the assertion that the synchronized easy-to-hard strategy both stabilizes convergence and ultimately improves local realism metrics cannot be verified from the presented evidence.

Authors: We agree that the manuscript would benefit from explicit ablation studies on the Co-evolutionary Curriculum Learning (CCL) mechanism. In the revised version, we will add new experiments that directly compare the ISR generator trained with CCL against baselines without the curriculum and with alternative transition schedules. These ablations will include quantitative metrics on training stability (such as reward variance and convergence curves) as well as local realism scores to verify that the easy-to-hard strategy stabilizes training while retaining the benefits of the full Perceptual Degradation Map.

revision: yes

Referee: [§3.2] §3.2 (FGR-30k dataset description): The dataset is presented as capturing 'diverse and subtle distortions from real-world super-resolution models,' yet no quantitative characterization (e.g., distribution of distortion types, number of source SR models, or human validation of subtlety) is provided. Without these details it is impossible to assess whether the dataset is representative enough to support the claim that FinPercep-RM generalizes beyond the training distribution.

Authors: We acknowledge that the current description of the FGR-30k dataset lacks sufficient quantitative details. In the revised manuscript, we will expand §3.2 to include: the distribution of distortion types, the number and diversity of source super-resolution models used to synthesize the artifacts, and results from human validation studies confirming the subtlety of the distortions. These additions will provide stronger evidence for the dataset's representativeness and support the generalization claims for FinPercep-RM.

revision: yes
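The stability metrics named in the first response (reward variance and convergence curves) could be computed along the following lines. This is a minimal sketch under assumptions: the rewards are taken to be a per-step scalar log, and the window size and the helper names `rolling_variance` and `stability_summary` are hypothetical rather than anything the authors specify.

```python
import numpy as np


def rolling_variance(rewards, window):
    """Population variance of the reward over each sliding window of steps."""
    rewards = np.asarray(rewards, dtype=float)
    if window < 1 or window > rewards.size:
        raise ValueError("window must be between 1 and len(rewards)")
    # Shape (len(rewards) - window + 1, window); requires NumPy >= 1.20.
    windows = np.lib.stride_tricks.sliding_window_view(rewards, window)
    return windows.var(axis=1)


def stability_summary(rewards, window):
    """Summarize a run by the mean and peak of its rolling reward variance."""
    var = rolling_variance(rewards, window)
    return {"mean_var": float(var.mean()), "max_var": float(var.max())}


# Synthetic logs standing in for two ablation arms (not real results).
rng = np.random.default_rng(0)
steps = np.arange(1000)
smooth_run = 1.0 - np.exp(-steps / 200) + rng.normal(0.0, 0.01, steps.size)
noisy_run = 1.0 - np.exp(-steps / 200) + rng.normal(0.0, 0.10, steps.size)

print(stability_summary(smooth_run, window=50))
print(stability_summary(noisy_run, window=50))
```

Reporting these summaries per arm, next to the raw reward curves, would let a reader compare curriculum and non-curriculum runs on the same footing.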

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper introduces a new Encoder-Decoder-based FinPercep-RM model, a newly constructed FGR-30k dataset of real-world SR distortions, and a Co-evolutionary Curriculum Learning (CCL) mechanism with synchronized easy-to-hard progression. Central claims of stable training, reward-hacking suppression, and improved global/local quality rest on experimental validation of these novel components rather than any self-definitional loops, fitted parameters relabeled as predictions, or load-bearing self-citations. No equations, uniqueness theorems, or ansatzes are shown that reduce outputs to inputs by construction; the derivation chain remains self-contained through independent model design and empirical results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 3 invented entities

The central claim rests on the new FinPercep-RM architecture, the FGR-30k dataset, and the CCL training strategy, all introduced in the paper without upstream independent evidence or formal verification.

axioms (1)

domain assumption: Traditional IQA models can serve as reward models for RL in ISR but suffer from insensitivity to local distortions
Stated in the motivation for developing a fine-grained alternative.

invented entities (3)

FinPercep-RM · no independent evidence
purpose: Encoder-decoder model providing global score and perceptual degradation map
Newly proposed reward model architecture.

FGR-30k dataset · no independent evidence
purpose: Training data consisting of diverse subtle real-world SR distortions
New dataset introduced for the reward model.

CCL mechanism · no independent evidence
purpose: Synchronized curriculum for stable policy learning with complex rewards
New training strategy to address instability.

pith-pipeline@v0.9.0 · 5628 in / 1350 out tokens · 65486 ms · 2026-05-16T18:57:03.954943+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges
cs.LG · 2026-04 · unverdicted · novelty 5.0

The paper introduces the Proxy Compression Hypothesis as a unifying framework explaining reward hacking in RLHF as an emergent result of compressing high-dimensional human objectives into proxy reward signals under op...

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · cited by 1 Pith paper · 4 internal anchors
