Recognition: unknown
HP-Edit: A Human-Preference Post-Training Framework for Image Editing
Pith reviewed 2026-05-10 03:21 UTC · model grok-4.3
The pith
A scorer trained on a small amount of human-preference data enables scalable post-training of image editing models to better match human taste.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By training HP-Scorer on a small amount of human-preference scoring data on top of a pretrained VLM, the framework obtains an automatic evaluator that scores edited images according to human preferences. This enables efficient construction of the RealPref-50K dataset across eight editing tasks, and the same scorer serves as the reward function for post-training diffusion-based editing models such as Qwen-Image-Edit-2509, yielding outputs that align more closely with human preference in experiments on RealPref-Bench.
What carries the argument
The HP-Scorer: an automatic, human-preference-aligned evaluator, developed from a small amount of human scoring data and a pretrained VLM, that scores editing results.
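The review does not reproduce HP-Scorer's architecture or training loss. For orientation only, one common recipe for such an evaluator (a small trainable head regressed onto 0-5 human ratings on top of a frozen pretrained VLM encoder) can be sketched as below; the backbone interface, hidden size, and MSE objective are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: fit a small scoring head on frozen VLM features to predict
# 0-5 human ratings of (source image, edited image, instruction) triples.
# Hypothetical components; the paper's actual HP-Scorer design may differ.
import torch
import torch.nn as nn

class PreferenceScorer(nn.Module):
    def __init__(self, vlm_backbone: nn.Module, hidden_dim: int = 1024):
        super().__init__()
        self.backbone = vlm_backbone          # frozen pretrained VLM encoder
        for p in self.backbone.parameters():
            p.requires_grad = False
        self.head = nn.Sequential(            # trainable scoring head
            nn.Linear(hidden_dim, 256), nn.GELU(), nn.Linear(256, 1)
        )

    def forward(self, batch: dict) -> torch.Tensor:
        # Assume the backbone maps (source, edited, instruction) to a pooled
        # feature vector of size hidden_dim.
        feats = self.backbone(batch["source"], batch["edited"], batch["instruction"])
        return self.head(feats).squeeze(-1)   # one scalar score per example

def train_step(scorer: PreferenceScorer, optimizer: torch.optim.Optimizer,
               batch: dict) -> float:
    pred = scorer(batch)
    loss = nn.functional.mse_loss(pred, batch["human_score"])  # regress to 0-5 labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```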
If this is right
- Editing models post-trained this way will generate results preferred by humans on common tasks like object editing.
- Large-scale preference datasets can be created without proportional increases in human effort.
- A dedicated benchmark RealPref-Bench allows standardized evaluation of real-world editing performance.
- The gap in applying RLHF techniques to diffusion image editing is addressed through this scalable approach.
Where Pith is reading between the lines
- Similar scorers could be developed for other creative AI tasks like video editing or text-to-image generation to automate preference alignment.
- Over time, the scorer could be updated with new human data to adapt to changing preferences or new editing styles.
- This method might lower barriers for smaller teams to fine-tune advanced editing models without access to massive annotation resources.
Load-bearing premise
The HP-Scorer accurately captures unbiased human preferences for a wide range of editing tasks using only a small amount of initial data and a pretrained VLM.
What would settle it
A direct human study on a held-out set of diverse editing prompts in which raters score the post-trained model's outputs below the base model's would falsify the effectiveness claim.
Original abstract
Common image editing tasks typically adopt powerful generative diffusion models as the leading paradigm for real-world content editing. Meanwhile, although reinforcement learning (RL) methods such as Diffusion-DPO and Flow-GRPO have further improved generation quality, efficiently applying Reinforcement Learning from Human Feedback (RLHF) to diffusion-based editing remains largely unexplored, due to a lack of scalable human-preference datasets and frameworks tailored to diverse editing needs. To fill this gap, we propose HP-Edit, a post-training framework for Human Preference-aligned Editing, and introduce RealPref-50K, a real-world dataset across eight common tasks and balancing common object editing. Specifically, HP-Edit leverages a small amount of human-preference scoring data and a pretrained visual large language model (VLM) to develop HP-Scorer--an automatic, human preference-aligned evaluator. We then use HP-Scorer both to efficiently build a scalable preference dataset and to serve as the reward function for post-training the editing model. We also introduce RealPref-Bench, a benchmark for evaluating real-world editing performance. Extensive experiments demonstrate that our approach significantly enhances models such as Qwen-Image-Edit-2509, aligning their outputs more closely with human preference.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes HP-Edit, a post-training framework for human-preference-aligned image editing. It develops HP-Scorer from a small human-preference seed set plus a pretrained VLM, uses this scorer to automatically label the RealPref-50K dataset across eight editing tasks, and employs the same scorer as the reward signal for RL post-training (e.g., on Qwen-Image-Edit-2509). A new RealPref-Bench is introduced for evaluation, with the central claim that the approach yields outputs significantly better aligned with human preferences than the base model.
Significance. If the HP-Scorer is shown to be accurate and unbiased, the framework would provide a practical route to scalable RLHF for diffusion-based editing models, addressing the noted scarcity of preference datasets and tailored training methods in this domain.
major comments (3)
- [Abstract and Experiments section] The abstract states that 'extensive experiments demonstrate significant enhancement' but supplies no quantitative metrics, baselines, ablation results, or scorer validation statistics (e.g., Pearson/Spearman correlation with held-out human ratings, inter-task consistency, or bias analysis). Because HP-Scorer labels the entire 50K dataset and serves as the RL reward, this omission leaves the central empirical claim without visible supporting evidence. (A sketch of the requested correlation check follows this list.)
- [HP-Scorer development and RealPref-50K construction] HP-Scorer is trained on limited human seed data and then used both to construct RealPref-50K labels and as the reward function for post-training. No independent validation (cross-validation on held-out human judgments, error analysis per editing task, or comparison against direct human scoring) is described; any systematic bias in the scorer would be amplified in the preference dataset and directly shape the policy gradient, undermining the claim of genuine human-preference alignment.
- [Experiments and RealPref-Bench evaluation] The evaluation on RealPref-Bench reports improvements for Qwen-Image-Edit-2509 but does not include standard RLHF baselines (e.g., Diffusion-DPO or Flow-GRPO applied without the HP-Edit pipeline) or ablations that isolate the contribution of the scorer-derived reward versus the dataset alone. This makes it impossible to attribute gains specifically to the proposed framework.
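The scorer-validation statistics requested in the first major comment are cheap to compute once held-out human ratings exist. A minimal sketch, assuming NumPy arrays of scorer outputs, human ratings, and per-example task labels (all names here are illustrative):

```python
# Sketch of the scorer-validation statistics the referee requests:
# linear and rank agreement between HP-Scorer outputs and held-out
# human ratings, overall and per editing task.
import numpy as np
from scipy.stats import pearsonr, spearmanr

def validate_scorer(scorer_scores: np.ndarray, human_scores: np.ndarray,
                    task_ids: np.ndarray) -> dict:
    pearson_r, _ = pearsonr(scorer_scores, human_scores)
    spearman_rho, _ = spearmanr(scorer_scores, human_scores)
    report = {"pearson": pearson_r, "spearman": spearman_rho}
    # Per-task breakdown: a low correlation on any single editing task
    # would flag the systematic bias the referee is worried about.
    for task in np.unique(task_ids):
        mask = task_ids == task
        rho, _ = spearmanr(scorer_scores[mask], human_scores[mask])
        report[f"spearman/{task}"] = rho
    return report
```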
minor comments (2)
- [Method] Clarify the exact size and composition of the initial human seed set used to train HP-Scorer, including how many ratings per editing task.
- [Post-training details] Provide the precise RL objective and hyper-parameters used when the HP-Scorer serves as reward (e.g., PPO or GRPO variant, clipping values); a generic sketch of such an objective follows below.
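To make that request concrete, a generic GRPO-style objective with a scorer-derived reward could look like the following sketch. The group-relative advantage, clipping value, and tensor shapes are illustrative assumptions, not the paper's disclosed settings.

```python
# Generic GRPO-style loss with a scorer-derived reward: group-normalized
# advantages plus a PPO-like clipped probability ratio. Illustrative only;
# the paper's exact objective is what the referee asks to be disclosed.
import torch

def grpo_loss(logp_new: torch.Tensor, logp_old: torch.Tensor,
              rewards: torch.Tensor, clip_eps: float = 0.2) -> torch.Tensor:
    # rewards: (num_groups, samples_per_group) scalar scores from the
    # learned scorer for grouped samples of the same prompt.
    adv = (rewards - rewards.mean(dim=1, keepdim=True)) / (
        rewards.std(dim=1, keepdim=True) + 1e-6
    )                                          # group-relative advantage
    ratio = torch.exp(logp_new - logp_old)     # new/old policy probability ratio
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # PPO-style pessimistic objective, negated for gradient descent.
    return -torch.min(unclipped, clipped).mean()
```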
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, outlining the revisions that will be incorporated into the next version of the manuscript to strengthen the empirical support and clarity of our claims.
Point-by-point responses
-
Referee: [Abstract and Experiments section] The abstract states that 'extensive experiments demonstrate significant enhancement' but supplies no quantitative metrics, baselines, ablation results, or scorer validation statistics (e.g., Pearson/Spearman correlation with held-out human ratings, inter-task consistency, or bias analysis). Because HP-Scorer labels the entire 50K dataset and serves as the RL reward, this omission leaves the central empirical claim without visible supporting evidence.
Authors: We agree that the abstract would benefit from greater specificity to make the central claims immediately evident. The Experiments section already contains quantitative results on RealPref-Bench (including preference alignment metrics and model comparisons), but these are not summarized in the abstract. We will revise the abstract to include key quantitative findings such as win rates on human preference judgments and overall improvement scores. We will also add a dedicated subsection on HP-Scorer validation that reports Pearson and Spearman correlations with held-out human ratings, inter-task consistency, and bias analysis. revision: yes
-
Referee: [HP-Scorer development and RealPref-50K construction] HP-Scorer is trained on limited human seed data and then used both to construct RealPref-50K labels and as the reward function for post-training. No independent validation (cross-validation on held-out human judgments, error analysis per editing task, or comparison against direct human scoring) is described; any systematic bias in the scorer would be amplified in the preference dataset and directly shape the policy gradient, undermining the claim of genuine human-preference alignment.
Authors: We acknowledge the importance of rigorous independent validation for the HP-Scorer given its central role. The current manuscript describes the training procedure but does not include the requested validation details. In the revision we will add a new validation subsection that reports cross-validation results on held-out human judgments, per-task error analysis across the eight editing tasks, and direct comparisons of HP-Scorer outputs against additional human scoring. Any detected biases will be quantified and discussed. revision: yes
-
Referee: [Experiments and RealPref-Bench evaluation] The evaluation on RealPref-Bench reports improvements for Qwen-Image-Edit-2509 but does not include standard RLHF baselines (e.g., Diffusion-DPO or Flow-GRPO applied without the HP-Edit pipeline) or ablations that isolate the contribution of the scorer-derived reward versus the dataset alone. This makes it impossible to attribute gains specifically to the proposed framework.
Authors: We agree that additional baselines and ablations are necessary to isolate the contribution of the HP-Edit framework. The current evaluation focuses on the end-to-end improvement but does not include the suggested comparisons. We will expand the Experiments section to include results from Diffusion-DPO and Flow-GRPO applied directly to the base model (without the HP-Edit pipeline) as well as ablations that separately evaluate the scorer-derived reward signal versus training on RealPref-50K alone. revision: yes
Circularity Check
No circularity: the derivation uses external human seed data plus a pretrained VLM to scale labels and reward, and the claims rest on independent benchmark experiments.
Full rationale
The paper constructs HP-Scorer from a small external human-preference dataset plus a pretrained VLM, then applies the scorer to label RealPref-50K and to supply the RL reward signal. This is a standard semi-supervised scaling step rather than a self-definitional loop or fitted-input prediction. The central claim of improved human alignment is supported by experiments on the separately introduced RealPref-Bench, which is not shown to be constructed from the same scorer outputs in a way that forces the result. No equations, self-citations, or uniqueness theorems are invoked that reduce the final performance gain to the input data by construction. Potential bias propagation is a correctness risk, not a circularity violation.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: A pretrained VLM can be adapted with limited human ratings to serve as an accurate proxy for human editing preferences across eight tasks.
- domain assumption: Using the scorer both to label a larger dataset and to supply the RL reward will produce models that generalize to real-world human preferences.
Forward citations
Cited by 1 Pith paper
-
Power Reinforcement Post-Training of Text-to-Image Models with Super-Linear Advantage Shaping
Super-Linear Advantage Shaping (SLAS) introduces a non-linear geometric policy update for RL post-training of text-to-image models that reshapes the local policy space via advantage-dependent Fisher-Rao weighting to r...
Reference graph
Works this paper leans on
-
[1]
FLUX. https://github.com/black-forest-labs/flux.
-
[2]
Stable Diffusion. https://github.com/Stability-AI/StableDiffusion.
-
[3]
Pixabay. https://pixabay.com.
-
[4]
Eirikur Agustsson and Radu Timofte. NTIRE 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 126–135, 2017.
-
[5]
Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, et al. Qwen2.5-VL technical report. arXiv preprint arXiv:2502.13923, 2025.
-
[6]
Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. Training diffusion models with reinforcement learning. arXiv preprint arXiv:2305.13301, 2023.
-
[7]
Tim Brooks, Aleksander Holynski, and Alexei A Efros. InstructPix2Pix: Learning to follow image editing instructions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18392–18402, 2023.
-
[8]
Chaorui Deng, Deyao Zhu, Kunchang Li, Chenhui Gou, Feng Li, Zeyu Wang, Shu Zhong, Weihao Yu, Xiaonan Nie, Ziang Song, et al. Emerging properties in unified multimodal pretraining. arXiv preprint arXiv:2505.14683, 2025.
-
[9]
Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025.
-
[10]
Zhen Han, Zeyinzi Jiang, Yulin Pan, Jingfeng Zhang, Chaojie Mao, Chenwei Xie, Yu Liu, and Jingren Zhou. ACE: All-round creator and editor following instructions via diffusion transformer. arXiv preprint arXiv:2410.00086, 2024.
-
[11]
Xiaoxuan He, Siming Fu, Yuke Zhao, Wanli Li, Jian Yang, Dacheng Yin, Fengyun Rao, and Bo Zhang. TempFlow-GRPO: When timing matters for GRPO in flow models. arXiv preprint arXiv:2508.04324, 2025.
-
[12]
Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
-
[13]
Zijing Hu, Fengda Zhang, and Kun Kuang. D-Fusion: Direct preference optimization for aligning diffusion models with visually consistent samples. arXiv preprint arXiv:2505.22002, 2025.
-
[14]
Yuzhou Huang, Liangbin Xie, Xintao Wang, Ziyang Yuan, Xiaodong Cun, Yixiao Ge, Jiantao Zhou, Chao Dong, Rui Huang, Ruimao Zhang, et al. SmartEdit: Exploring complex instruction-based image editing with multimodal large language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8362–8371, 2024.
-
[15]
Aaron Jaech, Adam Kalai, Adam Lerer, Adam Richardson, Ahmed El-Kishky, Aiden Low, Alec Helyar, Aleksander Madry, Alex Beutel, Alex Carney, et al. OpenAI o1 system card. arXiv preprint arXiv:2412.16720, 2024.
-
[16]
Dehong Kong, Fan Li, Zhixin Wang, Jiaqi Xu, Renjing Pei, Wenbo Li, and WenQi Ren. Dual prompting image restoration with diffusion transformers. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 12809–12819, 2025.
-
[17]
Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dockhorn, Jack English, Zion English, Patrick Esser, et al. FLUX.1 Kontext: Flow matching for in-context image generation and editing in latent space. arXiv preprint arXiv:2506.15742, 2025.
-
[18]
Fan Li, Zixiao Zhang, Yi Huang, Jianzhuang Liu, Renjing Pei, Bin Shao, and Songcen Xu. MagicEraser: Erasing any objects via semantics-aware control. In European Conference on Computer Vision, pages 215–231. Springer, 2024.
-
[19]
Yawei Li, Kai Zhang, Jingyun Liang, Jiezhang Cao, Ce Liu, Rui Gong, Yulun Zhang, Hao Tang, Yun Liu, Denis Demandolx, et al. LSDIR: A large scale dataset for image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1775–1787, 2023.
-
[20]
Yaowei Li, Yuxuan Bian, Xuan Ju, Zhaoyang Zhang, Junhao Zhuang, Ying Shan, Yuexian Zou, and Qiang Xu. BrushEdit: All-in-one image inpainting and editing. arXiv preprint arXiv:2412.10316, 2024.
-
[21]
Bin Lin, Zongjian Li, Xinhua Cheng, Yuwei Niu, Yang Ye, Xianyi He, Shenghai Yuan, Wangbo Yu, Shaodong Wang, Yunyang Ge, et al. UniWorld: High-resolution semantic encoders for unified visual understanding and generation. arXiv preprint arXiv:2506.03147, 2025.
-
[22]
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer, 2014.
-
[23]
Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. arXiv preprint arXiv:2210.02747, 2022.
-
[24]
Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, and Wanli Ouyang. Flow-GRPO: Training flow matching models via online RL. arXiv preprint arXiv:2505.05470, 2025.
-
[25]
Runtao Liu, Haoyu Wu, Ziqiang Zheng, Chen Wei, Yingqing He, Renjie Pi, and Qifeng Chen. VideoDPO: Omni-preference alignment for video diffusion generation. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 8009–8019, 2025.
-
[26]
Shiyu Liu, Yucheng Han, Peng Xing, Fukun Yin, Rui Wang, Wei Cheng, Jiaqi Liao, Yingming Wang, Honghao Fu, Chunrui Han, et al. Step1X-Edit: A practical framework for general image editing. arXiv preprint arXiv:2504.17761, 2025.
-
[27]
Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv preprint arXiv:2209.03003, 2022.
-
[28]
Ziyu Liu, Yuhang Zang, Xiaoyi Dong, Pan Zhang, Yuhang Cao, Haodong Duan, Conghui He, Yuanjun Xiong, Dahua Lin, and Jiaqi Wang. MIA-DPO: Multi-image augmented direct preference optimization for large vision-language models. arXiv preprint arXiv:2410.17637, 2024.
-
[29]
Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019.
-
[30]
Yifu Luo, Penghui Du, Bo Li, Sinan Du, Tiantian Zhang, Yongzhe Chang, Kai Wu, Kun Gai, and Xueqian Wang. Sample by step, optimize by chunk: Chunk-level GRPO for text-to-image generation. arXiv preprint arXiv:2510.21583, 2025.
-
[31]
Jian Ma, Xujie Zhu, Zihao Pan, Qirong Peng, Xu Guo, Chen Chen, and Haonan Lu. X2Edit: Revisiting arbitrary-instruction image editing through self-constructed data and task-aware representation learning. ICCV, 2025.
-
[32]
Chaojie Mao, Jingfeng Zhang, Yulin Pan, Zeyinzi Jiang, Zhen Han, Yu Liu, and Jingren Zhou. ACE++: Instruction-based image creation and editing via context-aware content filling. arXiv preprint arXiv:2501.02487, 2025.
-
[33]
Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, pages 8162–8171. PMLR, 2021.
-
[34]
Xinran Qin, Zhixin Wang, Fan Li, Haoyu Chen, Renjing Pei, Wenbo Li, and Xiaochun Cao. CamEdit: Continuous camera parameter control for photorealistic image editing. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.
-
[35]
Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems, 36:53728–53741, 2023.
-
[36]
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10674–10685, 2022.
-
[37]
Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. DeepSeekMath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024.
-
[38]
Yichun Shi, Peng Wang, and Weilin Huang. SeedEdit: Align image re-generation to image editing. arXiv preprint arXiv:2411.06686, 2024.
-
[39]
Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pages 2256–2265. PMLR, 2015.
-
[40]
Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020.
-
[41]
Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020.
-
[42]
Haoze Sun, Linfeng Jiang, Fan Li, Renjing Pei, Zhixin Wang, Yong Guo, Jiaqi Xu, Haoyu Chen, Jin Han, Fenglong Song, et al. PocketSR: The super-resolution expert in your pocket mobiles. NeurIPS, 2025.
-
[43]
Richard S Sutton, Andrew G Barto, et al. Reinforcement Learning: An Introduction. MIT Press, Cambridge, 1998.
-
[44]
Dipesh Tamboli, Souradip Chakraborty, Aditya Malusare, Biplab Banerjee, Amrit Singh Bedi, and Vaneet Aggarwal. BalancedDPO: Adaptive multi-metric alignment. arXiv preprint arXiv:2503.12575, 2025.
-
[45]
Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq Joty, and Nikhil Naik. Diffusion model alignment using direct preference optimization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8228–8238, 2024.
-
[46]
Yibin Wang, Zhimin Li, Yuhang Zang, Yujie Zhou, Jiazi Bu, Chunyu Wang, Qinglin Lu, Cheng Jin, and Jiaqi Wang. Pref-GRPO: Pairwise preference reward-based GRPO for stable text-to-image reinforcement learning. arXiv preprint arXiv:2508.20751, 2025.
-
[47]
Zihao Wang, Yuxiang Wei, Fan Li, Renjing Pei, Hang Xu, and Wangmeng Zuo. ACE: Anti-editing concept erasure in text-to-image models. 2025.
-
[48]
Chenyang Wu, Jiayi Fu, Chun-Le Guo, Shuhao Han, and Chongyi Li. VTinker: Guided flow upsampling and texture mapping for high-resolution video frame interpolation. arXiv preprint arXiv:2511.16124, 2025.
-
[49]
Chenfei Wu, Jiahao Li, Jingren Zhou, Junyang Lin, Kaiyuan Gao, Kun Yan, Sheng-ming Yin, Shuai Bai, Xiao Xu, Yilei Chen, et al. Qwen-Image technical report. arXiv preprint arXiv:2508.02324, 2025.
-
[50]
Chenyuan Wu, Pengfei Zheng, Ruiran Yan, Shitao Xiao, Xin Luo, Yueze Wang, Wanli Li, Xiyan Jiang, Yexin Liu, Junjie Zhou, et al. OmniGen2: Exploration to advanced multimodal generation. arXiv preprint arXiv:2506.18871, 2025.
-
[51]
Ziyi Wu, Anil Kag, Ivan Skorokhodov, Willi Menapace, Ashkan Mirzaei, Igor Gilitschenski, Sergey Tulyakov, and Aliaksandr Siarohin. DenseDPO: Fine-grained temporal preference optimization for video diffusion models. arXiv preprint arXiv:2506.03517, 2025.
-
[52]
Shitao Xiao, Yueze Wang, Junjie Zhou, Huaying Yuan, Xingrun Xing, Ruiran Yan, Chaofan Li, Shuting Wang, Tiejun Huang, and Zheng Liu. OmniGen: Unified image generation. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 13294–13304, 2025.
-
[53]
Zeyue Xue, Jie Wu, Yu Gao, Fangyuan Kong, Lingting Zhu, Mengzhao Chen, Zhiheng Liu, Wei Liu, Qiushan Guo, Weilin Huang, et al. DanceGRPO: Unleashing GRPO on visual generation. arXiv preprint arXiv:2505.07818, 2025.
-
[54]
Qifan Yu, Wei Chow, Zhongqi Yue, Kaihang Pan, Yang Wu, Xiaoyang Wan, Juncheng Li, Siliang Tang, Hanwang Zhang, and Yueting Zhuang. AnyEdit: Mastering unified high-quality image editing for any idea. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 26125–26135, 2025.
-
[55]
Kai Zhang, Lingbo Mo, Wenhu Chen, Huan Sun, and Yu Su. MagicBrush: A manually annotated dataset for instruction-guided image editing. Advances in Neural Information Processing Systems, 36:31428–31449, 2023.
-
[56]
Haozhe Zhao, Xiaojian Shawn Ma, Liang Chen, Shuzheng Si, Rujie Wu, Kaikai An, Peiyu Yu, Minjia Zhang, Qing Li, and Baobao Chang. UltraEdit: Instruction-based fine-grained image editing at scale. Advances in Neural Information Processing Systems, 37:3058–3093, 2024.
-
[57]
Huaisheng Zhu, Teng Xiao, and Vasant G Honavar. DSPO: Direct score preference optimization for diffusion model alignment. In The Thirteenth International Conference on Learning Representations, 2025.
Supplementary prompt fragments (extracted as entries [58]-[81])
The remaining extracted entries are not citations; they are fragments of the HP-Scorer evaluation prompts from the paper's supplementary material. Each prompt poses a yes/no checklist over the source image (Image A) and the edited image (Image B), then asks the VLM to rate the editing result from 0 to 5 based on the accuracy and quality of the edit, with 0 meaning the result does not follow the editing instruction at all. The recoverable checklists, apparently one per editing task:
- Object removal: Does Image A contain a clearly identifiable subject or main object? Does the object mentioned in the instruction appear in Image A? Has the object been successfully removed in Image B? Does Image B look visually natural and realistic, without artifacts or corrupted regions, and does the region where the object was removed avoid unnatural blur or shadows?
- Object addition: Is Image A of high quality (clear, undistorted, and visually usable)? Has the target object been successfully added in Image B? Are Image A and Image B meaningfully different (not nearly identical)? Does the region where the object was added look natural, without obvious artifacts, corrupted regions, unnatural blur, or unnatural shadows? Do the added objects follow the editing instruction accurately (category, attributes, position, and other specified details)?
- Object replacement: Does Image A contain a clearly identifiable person or object that the instruction requires to be replaced? Is the described swap logically feasible and unambiguously specified? Has the original object completely disappeared in Image B? Is the replacement object clear and complete, without missing parts or distorted local shapes, and does it meet the instruction's requirements (category, attributes, pose, position, etc.)? Are there no extra objects beyond what the instruction requires? Does Image B completely retain the background and all parts of the original image not mentioned in the instruction? Is Image B consistent with physical and real-world logic (no unsupported floating objects, no object penetration, no obvious compositing artifacts)?
- Background replacement: Does Image A contain a clearly identifiable foreground subject (such as a person or an object)? Does the editing instruction describe a valid background replacement operation? Has the background in Image B changed from Image A in accordance with the instruction? Is the foreground subject preserved correctly (not missing, distorted, or corrupted)? Does Image B look visually natural and realistic, without visible artifacts or unnatural blending?
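As a small illustration of how such a checklist might be packed into a single VLM scoring prompt: the sketch below copies questions from the fragments above, while the wrapper text and the function itself are assumptions for illustration, not the paper's verbatim template.

```python
# Sketch: turn a task-specific checklist plus the 0-5 rubric into one VLM
# scoring prompt. Checklist questions come from the supplementary fragments;
# the surrounding wording is an illustrative assumption.
REMOVAL_CHECKLIST = [
    "Does Image A contain a clearly identifiable subject or main object?",
    "Does the object mentioned in the instruction appear in Image A?",
    "Has the object been successfully removed in Image B?",
    "Does Image B look visually natural and realistic, without artifacts "
    "or corrupted regions?",
]

def build_scoring_prompt(instruction: str, checklist: list[str]) -> str:
    questions = "\n".join(f"- {q}" for q in checklist)
    return (
        f"Editing Instruction: {instruction}\n"
        "Answer each check for the source image (Image A) and the edited "
        "image (Image B):\n"
        f"{questions}\n"
        "Then rate the editing result from 0 to 5 based on the accuracy and "
        "quality of the edit. 0 means the result does not follow the "
        "Editing Instruction at all."
    )
```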