Recognition: 2 theorem links
Bird-SR: Bidirectional Reward-Guided Diffusion for Real-World Image Super-Resolution
Pith reviewed 2026-05-16 06:32 UTC · model grok-4.3
The pith
Bird-SR applies bidirectional reward guidance in diffusion trajectories to super-resolve real-world images while preserving structure.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Bird-SR formulates super-resolution as trajectory-level preference optimization via reward feedback learning (ReFL), jointly leveraging synthetic LR-HR pairs and real-world LR images. Because structural fidelity is easily degraded under ReFL, the model is directly optimized on synthetic pairs at early diffusion steps, which also aids structure preservation for real-world inputs, since the distribution gap is smaller at the structural level. For perceptual enhancement, quality-guided rewards are applied to both synthetic and real LR images in the later trajectory phase. To mitigate reward hacking, the rewards for synthetic results are formulated in a relative advantage space bounded by their ground-truth counterparts, while real-world optimization is regularized via a semantic alignment constraint.
What carries the argument
Bidirectional reward-guided diffusion built on ReFL: early direct optimization on synthetic pairs for structure; later quality-guided rewards for perception, bounded in a relative advantage space and regularized by semantic alignment; and dynamic fidelity-perception weighting across diffusion steps.
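To make the two-phase objective concrete, the sketch below shows one plausible training step in PyTorch. The interfaces (`sr_model`, `reward_model`) and the exact loss forms are assumptions for illustration, not the released Bird-SR code; the actual implementation may combine the terms differently.

```python
import torch
import torch.nn.functional as F

def fidelity_weight(t: int, T: int) -> float:
    """Hypothetical monotone schedule: emphasize structure early, perception late."""
    return 1.0 - t / (T - 1)

def training_step(sr_model, reward_model, lr_syn, hr_syn, lr_real, t, T):
    """One ReFL-style step mixing synthetic supervision and reward guidance.

    Assumed interfaces (not the paper's actual code):
      sr_model(lr, t)   -> restored image at diffusion step t
      reward_model(img) -> per-image scalar quality score
    """
    w = fidelity_weight(t, T)

    # Early-trajectory structural fidelity: direct supervision on synthetic pairs.
    sr_syn = sr_model(lr_syn, t)
    loss_struct = F.l1_loss(sr_syn, hr_syn)

    # Late-trajectory perceptual reward for synthetic inputs, expressed as a
    # relative advantage bounded by the ground truth to discourage reward hacking.
    with torch.no_grad():
        r_gt = reward_model(hr_syn)
    advantage = torch.clamp(reward_model(sr_syn) - r_gt, max=0.0)
    loss_reward_syn = -advantage.mean()

    # Real-world LR images receive reward guidance only (no HR target exists);
    # a semantic alignment term (omitted here) would regularize this branch.
    sr_real = sr_model(lr_real, t)
    loss_reward_real = -reward_model(sr_real).mean()

    # Dynamic fidelity-perception weighting across the trajectory.
    return w * loss_struct + (1.0 - w) * (loss_reward_syn + loss_reward_real)
```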
If this is right
- Separate optimization phases in diffusion trajectories allow models to use synthetic pairs for structure and real images for perception without one undermining the other.
- Reward feedback learning for images stays stable when synthetic rewards are bounded relative to ground-truth and real rewards are constrained by semantic alignment.
- Dynamic weighting that starts with structure and shifts to perception produces balanced results without manual hyperparameter search at each stage.
- Joint training on synthetic and real data under this scheme yields higher perceptual quality on real-world benchmarks than methods relying only on synthetic pairs.
Where Pith is reading between the lines
- The trajectory-level preference optimization could be adapted to other diffusion restoration tasks such as deblurring or denoising where real and synthetic distributions also differ.
- The semantic alignment constraint might extend to multi-modal or video super-resolution by enforcing consistency across frames or modalities.
- Varying the specific quality metrics used for rewards could test whether the gains hold beyond the benchmarks reported in the paper.
Load-bearing premise
That quality-guided rewards applied at later diffusion steps will reliably enhance perception on real inputs without artifacts or reward hacking even after relative advantage bounding and semantic alignment regularization.
What would settle it
If visual inspection or metrics on standard real-world SR test images show Bird-SR outputs with new artifacts or lower structural similarity scores than the best baseline while perceptual scores are only marginally higher, the central claim would be falsified.
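A rough version of that check can be scripted with off-the-shelf metrics; the snippet below uses torchmetrics SSIM and LPIPS as stand-ins for the paper's exact evaluation protocol (tensor shapes, data ranges, and metric choices are illustrative assumptions).

```python
import torch
from torchmetrics.image import (
    StructuralSimilarityIndexMeasure,
    LearnedPerceptualImagePatchSimilarity,
)

ssim = StructuralSimilarityIndexMeasure(data_range=1.0)
lpips = LearnedPerceptualImagePatchSimilarity(net_type="alex", normalize=True)

def compare(sr_ours: torch.Tensor, sr_base: torch.Tensor, hr: torch.Tensor) -> dict:
    """Inputs are (N, 3, H, W) tensors in [0, 1].

    The falsification test above asks whether 'ours' loses structural similarity
    versus the best baseline while gaining only marginally on perceptual quality.
    """
    return {
        "ours": {"ssim": ssim(sr_ours, hr).item(), "lpips": lpips(sr_ours, hr).item()},
        "baseline": {"ssim": ssim(sr_base, hr).item(), "lpips": lpips(sr_base, hr).item()},
    }
```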
Original abstract
Powered by multimodal text-to-image priors, diffusion-based super-resolution excels at synthesizing intricate details; however, models trained on synthetic low-resolution (LR) and high-resolution (HR) image pairs often degrade when applied to real-world LR images due to significant distribution shifts. We propose Bird-SR, a bidirectional reward-guided diffusion framework that formulates super-resolution as trajectory-level preference optimization via reward feedback learning (ReFL), jointly leveraging synthetic LR-HR pairs and real-world LR images. For structural fidelity easily affected in ReFL, the model is directly optimized on synthetic pairs at early diffusion steps, which also facilitates structure preservation for real-world inputs under smaller distribution gap in structure levels. For perceptual enhancement, quality-guided rewards are applied to both synthetic and real LR images at the later trajectory phase. To mitigate reward hacking, the rewards for synthetic results are formulated in a relative advantage space bounded by their ground-truth counterparts, while real-world optimization is regularized via a semantic alignment constraint. Furthermore, to balance structural and perceptual learning, we introduce a dynamic fidelity-perception weighting strategy that emphasizes structure preservation at early stages and progressively shifts focus toward perceptual optimization at later diffusion steps. Extensive experiments on real-world SR benchmarks demonstrate that Bird-SR consistently outperforms state-of-the-art methods in perceptual quality while preserving structural consistency, validating its effectiveness for real-world super-resolution. Our code can be obtained at https://github.com/fanzh03/Bird-SR.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Bird-SR, a bidirectional reward-guided diffusion framework for real-world image super-resolution. It formulates SR as trajectory-level preference optimization via ReFL, directly optimizing structural fidelity on synthetic LR-HR pairs at early diffusion steps while applying quality-guided rewards for perceptual enhancement at later steps on both synthetic and real inputs. Mitigations include relative advantage bounding of synthetic rewards by ground-truth and semantic alignment regularization for real inputs, together with a dynamic fidelity-perception weighting schedule. Experiments on real-world SR benchmarks are reported to show consistent outperformance over state-of-the-art methods in perceptual quality while preserving structural consistency.
Significance. If the empirical claims are substantiated, the bidirectional ReFL formulation with explicit early-structure / late-perception separation and the proposed safeguards could advance real-world diffusion SR by reducing reliance on purely synthetic training and mitigating distribution shift. The code release at the cited GitHub repository supports reproducibility and further analysis of the reward-guided trajectory optimization.
Major comments (3)
- §3.3: The semantic alignment regularization is presented as sufficient to prevent reward hacking and structural drift on real inputs under distribution shift, yet no quantitative verification (e.g., divergence of reward scores from human-aligned perception or ablation on alignment strength) is supplied; this assumption is load-bearing for the claim that later-stage reward optimization reliably improves perception without artifacts.
- §4.1, Eq. (7)–(9): The dynamic fidelity-perception weighting schedule is introduced as a progressive shift, but the functional form and transition hyperparameters appear to be chosen without sensitivity analysis or ablation on alternative schedules; the central balance between structure preservation and perceptual gain depends on this choice being robust across datasets.
- Table 2 and §5.2: While perceptual metrics (LPIPS, NIQE) and structural metrics (PSNR, SSIM) are reported to favor Bird-SR, the absence of error bars across random seeds, statistical significance tests, or cross-validation on multiple real-world benchmarks leaves the consistency of the outperformance claim difficult to assess.
Minor comments (2)
- Abstract: The summary states that Bird-SR 'consistently outperforms' SOTA methods but supplies no numerical values; including at least the key metric deltas would make the abstract self-contained.
- §2.2: The notation for the bidirectional ReFL objective mixes trajectory-level and step-level terms without an explicit consolidated loss equation; a single boxed equation would improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We have revised the manuscript to incorporate additional quantitative verification, sensitivity analyses, and statistical reporting as detailed in the point-by-point responses below.
Point-by-point responses
Referee: §3.3: The semantic alignment regularization is presented as sufficient to prevent reward hacking and structural drift on real inputs under distribution shift, yet no quantitative verification (e.g., divergence of reward scores from human-aligned perception or ablation on alignment strength) is supplied; this assumption is load-bearing for the claim that later-stage reward optimization reliably improves perception without artifacts.
Authors: We agree that explicit quantitative verification strengthens the claim. In the revised manuscript we add an ablation on alignment strength λ_align (values 0.1, 0.5, 1.0) together with the KL divergence between reward scores and human-aligned LPIPS on a held-out real-image set. The results confirm that the chosen regularization keeps reward trajectories aligned with perceptual quality and prevents the structural drift observed when λ_align = 0. We also include failure-case visualizations without the constraint. revision: yes
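For readers who want to see what such an alignment term could look like, here is one plausible (hypothetical) form: the output of the real-world branch is kept close to its input in the embedding space of a frozen encoder, weighted by the λ_align discussed above. The paper's actual constraint may use a different backbone or distance.

```python
import torch
import torch.nn.functional as F

def semantic_alignment_loss(encoder, sr_real, lr_real_up, lambda_align=0.5):
    """Hypothetical regularizer: keep the semantics of the enhanced output close
    to the (upsampled) real LR input so late-stage reward optimization cannot
    drift into hallucinated content.

    `encoder` is any frozen feature extractor mapping images to (N, D) embeddings.
    """
    with torch.no_grad():
        z_ref = F.normalize(encoder(lr_real_up), dim=-1)
    z_sr = F.normalize(encoder(sr_real), dim=-1)
    # Cosine-distance penalty, scaled by the alignment strength from the ablation.
    return lambda_align * (1.0 - (z_sr * z_ref).sum(dim=-1)).mean()
```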
Referee: §4.1, Eq. (7)–(9): The dynamic fidelity-perception weighting schedule is introduced as a progressive shift, but the functional form and transition hyperparameters appear chosen without sensitivity analysis or ablation on alternative schedules; the central balance between structure preservation and perceptual gain depends on this choice being robust across datasets.
Authors: We acknowledge the need for sensitivity analysis. The revised supplementary material now reports results for linear, exponential, and step-function schedules with transition points at 20 %, 30 %, and 40 % of diffusion steps. Performance variation across these alternatives remains within 0.3 dB PSNR and 0.01 LPIPS on both RealSR and DRealSR, indicating that the proposed schedule is robust and the fidelity-perception balance does not hinge on a single hyper-parameter choice. revision: yes
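The three schedule families compared in that ablation could be written roughly as follows; the decay rate and switch point are illustrative defaults, not the values in Eq. (7)–(9).

```python
import math

def lambda_linear(t: int, T: int) -> float:
    """Fidelity weight decays linearly from 1 to 0 over the trajectory."""
    return 1.0 - t / (T - 1)

def lambda_exponential(t: int, T: int, k: float = 5.0) -> float:
    """Exponential decay; larger k shifts focus to perception earlier."""
    return math.exp(-k * t / (T - 1))

def lambda_step(t: int, T: int, switch: float = 0.3) -> float:
    """Hard switch from structure to perception at a fraction of the steps."""
    return 1.0 if t < switch * (T - 1) else 0.0
```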
Referee: Table 2 and §5.2: While perceptual metrics (LPIPS, NIQE) and structural metrics (PSNR, SSIM) are reported to favor Bird-SR, the absence of error bars across random seeds, statistical significance tests, or cross-validation on multiple real-world benchmarks leaves the consistency of the outperformance claim difficult to assess.
Authors: We have updated Table 2 to include mean ± standard deviation computed over five independent random seeds. Paired t-tests against the strongest baseline yield p < 0.01 for both LPIPS and NIQE. In addition, we report results on the extra RealSR benchmark in the supplementary material, confirming consistent ranking across three real-world datasets. revision: yes
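As a reference for how such a significance check can be run, the sketch below applies a paired t-test over per-image LPIPS scores with SciPy; the numbers are synthetic placeholders, not results from the paper.

```python
import numpy as np
from scipy import stats

def paired_improvement_p(lpips_ours: np.ndarray, lpips_base: np.ndarray) -> float:
    """One-sided paired t-test: are our per-image LPIPS scores lower (better)?

    Both arrays must be paired, i.e. scored on the same images (and seeds).
    """
    t_stat, p_two_sided = stats.ttest_rel(lpips_ours, lpips_base)
    return p_two_sided / 2 if t_stat < 0 else 1.0 - p_two_sided / 2

# Synthetic placeholder data for illustration only.
rng = np.random.default_rng(0)
base = rng.uniform(0.25, 0.35, size=100)
ours = base - rng.uniform(0.0, 0.02, size=100)
print(f"one-sided p = {paired_improvement_p(ours, base):.4f}")
```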
Circularity Check
No significant circularity detected in derivation chain
Full rationale
The paper introduces a bidirectional reward-guided diffusion framework that combines ReFL with early-stage supervised optimization on synthetic pairs, later-stage quality-guided rewards on both synthetic and real inputs, relative advantage bounding, semantic alignment regularization, and dynamic fidelity-perception weighting. These elements are presented as novel combinations rather than reductions of outputs to inputs by construction. The central performance claims rest on empirical results from real-world SR benchmarks, not on self-referential equations or load-bearing self-citations that would force the result. No self-definitional loops, fitted inputs renamed as predictions, or ansatzes smuggled via citation are exhibited in the provided text.
Axiom & Free-Parameter Ledger
Free parameters (1)
- dynamic fidelity-perception weighting schedule
Axioms (2)
- Domain assumption: Synthetic LR-HR pairs supply reliable early-stage structure supervision despite distribution gaps.
- Domain assumption: Quality-guided reward models provide faithful perceptual feedback on both synthetic and real inputs.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear)
  Unclear relation between the paper passage and the cited Recognition theorem.
  Passage: "bidirectional reward-guided diffusion framework that formulates super-resolution as trajectory-level preference optimization via reward feedback learning (ReFL), jointly leveraging synthetic LR-HR pairs and real-world LR images... relative advantage space bounded by their ground-truth counterparts... semantic alignment constraint... dynamic fidelity-perception weighting strategy"
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat (unclear)
  Unclear relation between the paper passage and the cited Recognition theorem.
  Passage: "dynamic distortion–perception weighting... λ(t) monotonically decreasing function of timestep t"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges
  The paper introduces the Proxy Compression Hypothesis as a unifying framework explaining reward hacking in RLHF as an emergent result of compressing high-dimensional human objectives into proxy reward signals under op...