Edit-GRPO: A Locality-Preserving Policy Optimization Framework for Image Editing

Bo Zheng; Houqiang Li; Litong Gong; Shaodong Xu; Tiezheng Ge; Wengang Zhou; Zexian Li; Zhendong Wang

arxiv: 2605.16951 · v1 · pith:VI6JI2JPnew · submitted 2026-05-16 · 💻 cs.CV

Edit-GRPO: A Locality-Preserving Policy Optimization Framework for Image Editing

Shaodong Xu , Zexian Li , Zhendong Wang , Litong Gong , Tiezheng Ge , Wengang Zhou , Bo Zheng , Houqiang Li This is my paper

Pith reviewed 2026-05-19 20:41 UTC · model grok-4.3

classification 💻 cs.CV

keywords image editinglocality preservationpolicy optimizationregion-specific signalscontext distortionboundary consistencyvisual coherence

0 comments

The pith

Edit-GRPO decouples editing and preservation objectives with region-specific signals to keep image edits localized.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that optimization-based image editing often fails at locality because global policy updates alter unintended areas. Edit-GRPO fixes this mismatch by separating the goals for targeted edit regions from those for surrounding preservation regions. Distinct optimization signals are applied according to spatial masks so that updates respect the intended edit boundaries. This produces cleaner results with fewer distortions while still delivering the requested changes. Anyone using AI tools for photo retouching or creative image modification would see fewer side effects on background elements.

Core claim

Edit-GRPO is a locality-preserving policy optimization framework that explicitly decouples editing and preservation objectives. By assigning region-specific optimization signals to edit and non-edit areas, it aligns policy updates with the spatial structure of editing tasks, enabling localized improvements while maintaining global visual coherence. This design effectively suppresses common artifacts such as context distortion and boundary inconsistency.

What carries the argument

Region-specific optimization signals that decouple editing rewards from preservation constraints during policy updates.

If this is right

Targeted content improves without altering surrounding regions.
Artifacts such as context distortion and boundary inconsistency decrease.
Editing performance stays strong across multiple image editing scenarios.
The same decoupling approach applies to a range of optimization-based editing tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework could integrate with automatic region proposal networks to reduce manual mask input.
Extending the region signals to video frames might support consistent edits over time.
Similar objective separation could address locality issues in other generative domains like text-to-image synthesis.

Load-bearing premise

That assigning region-specific optimization signals will align policy updates with spatial structure without needing extra mechanisms to handle boundary effects or context interactions.

What would settle it

An experiment in which Edit-GRPO still produces large unintended changes in non-edit regions on images containing interacting objects near edit boundaries.

Figures

Figures reproduced from arXiv: 2605.16951 by Bo Zheng, Houqiang Li, Litong Gong, Shaodong Xu, Tiezheng Ge, Wengang Zhou, Zexian Li, Zhendong Wang.

**Figure 2.** Figure 2: Visual comparison between global and decoupled optimization. Please zoom in for details. Image editing is fundamentally different from general image generation, as it requires not only precise semantic control over the target content, but also faithful context preservation of the background. However, as illustrated in [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Overview of the proposed Edit-GRPO pipeline. [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Qualitative comparison of editing capabilities before and after policy optimization. Our [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Training reward curves on FLUX.1-Kontext [Dev]. Edit-GRPO consistently improves locality preservation while retaining competitive editing performance. On GEdit-Bench ( [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: Prompt template used for MLLM-based semantic score evaluation. [PITH_FULL_IMAGE:figures/full_fig_p015_6.png] view at source ↗

**Figure 7.** Figure 7: The visualization results of Edit-GRPO-FLUX-Kontext [dev]. [PITH_FULL_IMAGE:figures/full_fig_p016_7.png] view at source ↗

**Figure 8.** Figure 8: The visualization results of Edit-GRPO-Qwen-Image-Edit [2509]. [PITH_FULL_IMAGE:figures/full_fig_p017_8.png] view at source ↗

read the original abstract

A fundamental challenge in image editing lies in preserving spatial locality: edits should improve targeted content without inadvertently altering surrounding regions. However, most optimization-based editing approaches treat images as holistic entities, causing global policy updates that undermine locality and introduce undesired context changes. We observe that this issue stems from a mismatch between localized editing intent and globally applied optimization signals. Motivated by this insight, we propose Edit-GRPO, preserving Locality while optimizing image editing, a locality-preserving policy optimization framework that explicitly decouples editing and preservation objectives. By assigning region-specific optimization signals to edit and non-edit areas, Edit-GRPO aligns policy updates with the spatial structure of editing tasks, enabling localized improvements while maintaining global visual coherence. This design effectively suppresses common artifacts such as context distortion and boundary inconsistency. Extensive experiments across diverse image editing scenarios demonstrate that Edit-GRPO significantly improves locality preservation while maintaining strong editing performance compared to existing optimization-based methods, validating the generality and effectiveness of the proposed framework.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Edit-GRPO splits policy updates by edit and non-edit regions to reduce unwanted changes in image editing, but the boundary leakage risk from global model components looks like a real soft spot.

read the letter

Edit-GRPO takes the standard policy optimization setup used in image editing and adds region-specific signals so that edit areas get one objective while non-edit areas get another. The goal is to stop global updates from altering stuff outside the target zone. That framing matches a practical pain point in optimization-based editing, where holistic signals often create context distortion or boundary artifacts. If the experiments hold, the approach gives a straightforward way to keep edits more contained without changing the base model much. The paper earns credit for spelling out the mismatch between localized intent and global signals and for testing across multiple editing scenarios rather than one narrow case. The claim that it maintains editing strength while gaining locality is the kind of result that could matter for downstream tools. On the soft side, the decoupling assumes the signals stay isolated, but the stress-test concern holds weight here. Most generative models rely on attention or convolutional mixing that crosses region boundaries, so even clean masks might let gradients or features leak and reintroduce inconsistency. The abstract asserts suppression of those artifacts, yet without seeing the exact masking, loss formulation, or ablations on boundary handling, it is hard to know whether the fix is mechanical or just hopeful. The experiments are summarized at a high level, which leaves room for questions about baseline selection and metric sensitivity. This work sits in the applied computer vision lane for people building controllable editing systems or refining RL-style methods on images. A reader who already works with optimization-based editors would pick up a usable design pattern and some comparative numbers. It is not a foundational shift, but the idea is clear enough and the problem is real enough that a serious referee should look at the implementation details and run checks on the locality metrics. I would send it out for review rather than desk reject.

Referee Report

2 major / 2 minor

Summary. The paper proposes Edit-GRPO, a locality-preserving policy optimization framework for image editing. It argues that global optimization signals in existing methods cause undesired context changes, and introduces region-specific optimization signals assigned separately to edit and non-edit regions to decouple editing and preservation objectives. This alignment with spatial structure is claimed to suppress artifacts such as context distortion and boundary inconsistency while preserving editing performance, with validation via extensive experiments across diverse scenarios showing significant improvements in locality preservation over prior optimization-based methods.

Significance. If the experimental claims hold with rigorous controls, the framework could meaningfully advance optimization-based image editing by addressing a core mismatch between localized intent and global updates in generative models. The explicit decoupling via region-specific signals offers a generalizable design principle that may apply beyond the reported scenarios. No machine-checked proofs or parameter-free derivations are present, but the emphasis on spatial alignment and reproducibility of the approach (if code is released) would strengthen its contribution to the field.

major comments (2)

[§3.2] §3.2 (Region-Specific Optimization): The central mechanism assigns distinct signals to edit vs. non-edit regions to achieve locality without additional boundary handling. However, for models relying on global attention or convolutional receptive fields, this does not address potential gradient propagation or context mixing across region boundaries, which directly undermines the claim that the design 'effectively suppresses' boundary inconsistency and context distortion.
[§5] §5 (Experiments): The abstract and results section describe improvements only at a high level ('significantly improves locality preservation') without reporting specific quantitative metrics, baseline comparisons, statistical tests, or ablation on boundary effects. This makes it impossible to verify whether the reported gains are robust or affected by post-hoc region mask choices, which is load-bearing for the central experimental claim.

minor comments (2)

[§3] Notation for the region masks and optimization signals (e.g., how the preservation signal is formulated mathematically) could be clarified with an explicit equation in §3 to avoid ambiguity in implementation.
[§2] The related work section should include a direct comparison table to recent locality-aware editing methods to better position the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, providing clarifications and indicating where revisions have been made to strengthen the presentation and evidence.

read point-by-point responses

Referee: [§3.2] §3.2 (Region-Specific Optimization): The central mechanism assigns distinct signals to edit vs. non-edit regions to achieve locality without additional boundary handling. However, for models relying on global attention or convolutional receptive fields, this does not address potential gradient propagation or context mixing across region boundaries, which directly undermines the claim that the design 'effectively suppresses' boundary inconsistency and context distortion.

Authors: We appreciate the referee's point on the potential limitations of region-specific signals in the presence of global attention mechanisms. In Edit-GRPO, the decoupling is achieved by applying distinct reward functions and computing masked policy gradients separately for edit and non-edit regions. This structure ensures that the preservation objective in non-edit areas actively counters changes that could result from cross-boundary gradient flow. While the original submission relied primarily on empirical validation rather than a dedicated gradient propagation analysis, the consistent reduction in boundary artifacts across experiments supports the effectiveness of this approach. In the revised manuscript, we have expanded the discussion in §3.2 to explicitly address interactions with global attention and added attention map visualizations demonstrating localized update patterns. revision: yes
Referee: [§5] §5 (Experiments): The abstract and results section describe improvements only at a high level ('significantly improves locality preservation') without reporting specific quantitative metrics, baseline comparisons, statistical tests, or ablation on boundary effects. This makes it impossible to verify whether the reported gains are robust or affected by post-hoc region mask choices, which is load-bearing for the central experimental claim.

Authors: We thank the referee for this observation. The experiments in Section 5 report quantitative comparisons using locality preservation metrics and editing performance measures against prior optimization-based methods, along with qualitative results across diverse scenarios. We agree that additional statistical rigor and boundary-specific analysis would enhance verifiability. The revised manuscript now includes explicit numerical values with standard deviations, baseline tables with direct metric comparisons, t-test results for significance, and an ablation study on region mask variations to assess robustness to post-hoc choices. revision: yes

Circularity Check

0 steps flagged

No circularity: framework introduced as explicit design choice

full rationale

The paper motivates Edit-GRPO from an observed mismatch between localized editing intent and global optimization signals, then proposes the framework as a design that assigns region-specific signals to decouple objectives. No equations, fitted parameters, or derivation steps appear in the provided text that would reduce any claim to its own inputs by construction. The approach is presented as a methodological choice rather than a prediction derived from prior results or self-citations, and validation is external via experiments. This keeps the contribution self-contained without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are described in the abstract; the framework is introduced at the level of design motivation rather than formal assumptions or new entities.

pith-pipeline@v0.9.0 · 5722 in / 1065 out tokens · 25871 ms · 2026-05-19T20:41:04.325420+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

By assigning region-specific optimization signals to edit and non-edit areas, Edit-GRPO aligns policy updates with the spatial structure of editing tasks... L=λ_sem L_sem + λ_pres L_pres + β L_KL
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean embed_injective unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We partition the images into edit regions and non-edit regions, and design different rewards separately... logπ_sem_θ = 1/|M| Σ_{k∈M} logπ_θ^(k)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 25 internal anchors

[1]

Humanedit: A high-quality human-rewarded dataset for instruction-based image editing.arXiv preprint arXiv:2412.04280, 2024

Jinbin Bai, Wei Chow, Ling Yang, Xiangtai Li, Juncheng Li, Hanwang Zhang, and Shuicheng Yan. Humanedit: A high-quality human-rewarded dataset for instruction-based image editing.arXiv preprint arXiv:2412.04280, 2024

work page arXiv 2024
[2]

Qwen3-VL Technical Report

Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, Wenbin Ge, Zhifang Guo, Qidong Huang, Jie Huang, Fei Huang, Binyuan Hui, Shutong Jiang, Zhaohai Li, Mingsheng Li, Mei Li, Kaixin Li, Zicheng Lin, Junyang Lin, Xuejing Liu, Jiawei Liu, Chenglong Liu, Yang Liu, Dayiheng Liu, Shixuan ...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[3]

Training Diffusion Models with Reinforcement Learning

Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. Training diffusion models with reinforcement learning.arXiv preprint arXiv:2305.13301, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[4]

arXiv preprint arXiv:2211.09800 , year=

Tim Brooks, Aleksander Holynski, and Alexei A Efros. Instructpix2pix: Learning to follow image editing instructions.arXiv preprint arXiv:2211.09800, 2022

work page arXiv 2022
[5]

HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer

Qi Cai, Jingwen Chen, Yang Chen, Yehao Li, Fuchen Long, Yingwei Pan, Zhaofan Qiu, Yiheng Zhang, Feng Gao, Peihan Xu, Yimeng Wang, Kai Yu, Wenxuan Chen, Ziwei Feng, Zi-Qiang Gong, Jia-Wern Pan, Yingzhi Peng, Rui Tian, Siyu Wang, Bo Zhao, Ting Yao, and Tao Mei. Hidream-i1: A high-efficient image generative foundation model with sparse diffusion transformer....

work page internal anchor Pith review Pith/arXiv arXiv 2025
[6]

SAM 3: Segment Anything with Concepts

Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Zhou, Liliane ...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[7]

Emerging Properties in Unified Multimodal Pretraining

Chaorui Deng, Deyao Zhu, Kunchang Li, Chenhui Gou, Feng Li, Zeyu Wang, Shu Zhong, Weihao Yu, Xiao- Ping Nie, Ziang Song, Shi Guang, and Haoqi Fan. Emerging properties in unified multimodal pretraining. ArXiv, abs/2505.14683, 2025. URLhttps://api.semanticscholar.org/CorpusID:278768720

work page internal anchor Pith review Pith/arXiv arXiv 2025
[8]

Scaling rectified flow transformers for high-resolution image synthesis

Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. Scaling rectified flow transformers for high-resolution image synthesis. InForty-first international conference on machine learning, 2024

work page 2024
[9]

arXiv preprint arXiv:2305.16381 , year=

Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, Moonkyung Ryu, Craig Boutilier, P. Abbeel, Mohammad Ghavamzadeh, Kangwook Lee, and Kimin Lee. Dpok: Reinforcement learning for fine-tuning text-to- image diffusion models.ArXiv, abs/2305.16381, 2023. URL https://api.semanticscholar.org/ CorpusID:258947323

work page arXiv 2023
[10]

Onereward: Unified mask-guided image generation via multi-task human preference learning.arXiv preprint arXiv:2508.21066, 2025

Yuan Gong, Xionghui Wang, Jie Wu, Shiyin Wang, Yitong Wang, and Xinglong Wu. Onereward: Unified mask-guided image generation via multi-task human preference learning.arXiv preprint arXiv:2508.21066, 2025

work page arXiv 2025
[11]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning.arXiv preprint arXiv:2501.12948, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[12]

Denoising diffusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020

work page 2020
[13]

Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dockhorn, Jack English, Zion English, Patrick Esser, et al. Flux. 1 kontext: Flow matching for in-context image generation and editing in latent space.arXiv preprint arXiv:2506.15742, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[14]

MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE

Junzhe Li, Yutao Cui, Tao Huang, Yinping Ma, Chun Fan, Yiming Cheng, Miles Yang, Zhao Zhong, and Liefeng Bo. Mixgrpo: Unlocking flow-based grpo efficiency with mixed ode-sde.arXiv preprint arXiv:2507.21802, 2025. 10

work page internal anchor Pith review Pith/arXiv arXiv 2025
[15]

arXiv:2510.16888 (2025) 2

Zongjian Li, Zheyuan Liu, Qihui Zhang, Bin Lin, Feize Wu, Shenghai Yuan, Zhiyuan Yan, Yang Ye, Wangbo Yu, Yuwei Niu, et al. Uniworld-v2: Reinforce image editing with diffusion negative-aware finetuning and mllm implicit feedback.arXiv preprint arXiv:2510.16888, 2025

work page arXiv 2025
[16]

UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation

Bin Lin, Zongjian Li, Xinhua Cheng, Yuwei Niu, Yang Ye, Xianyi He, Shenghai Yuan, Wangbo Yu, Shaodong Wang, Yunyang Ge, Yatian Pang, and Li Yuan. Uniworld-v1: High-resolution semantic encoders for unified visual understanding and generation.ArXiv, abs/2506.03147, 2025. URL https: //api.semanticscholar.org/CorpusID:279119654

work page internal anchor Pith review Pith/arXiv arXiv 2025
[17]

Flow Matching for Generative Modeling

Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling.arXiv preprint arXiv:2210.02747, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[18]

Flow-GRPO: Training Flow Matching Models via Online RL

Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, and Wanli Ouyang. Flow-grpo: Training flow matching models via online rl.arXiv preprint arXiv:2505.05470, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[19]

Step1X-Edit: A Practical Framework for General Image Editing

Shiyu Liu, Yucheng Han, Peng Xing, Fukun Yin, Rui Wang, Wei Cheng, Jiaqi Liao, Yingming Wang, Honghao Fu, Chunrui Han, et al. Step1x-edit: A practical framework for general image editing.arXiv preprint arXiv:2504.17761, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[20]

Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow.arXiv preprint arXiv:2209.03003, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[21]

Decoupled Weight Decay Regularization

Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization.arXiv preprint arXiv:1711.05101, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[22]

Sdedit: Guided image synthesis and editing with stochastic differential equations

Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. Sdedit: Guided image synthesis and editing with stochastic differential equations. InInternational Conference on Learning Representations, 2021. URL https://api.semanticscholar.org/CorpusID:245704504

work page 2021
[23]

Training language models to follow instructions with human feedback.Advances in neural information processing systems, 35:27730–27744, 2022

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback.Advances in neural information processing systems, 35:27730–27744, 2022

work page 2022
[24]

Peebles and Saining Xie

William S. Peebles and Saining Xie. Scalable diffusion models with transformers.2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 4172–4182, 2022. URL https://api. semanticscholar.org/CorpusID:254854389

work page 2023
[25]

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model.ArXiv, abs/2305.18290,

work page internal anchor Pith review Pith/arXiv arXiv
[26]

URLhttps://api.semanticscholar.org/CorpusID:258959321

work page
[27]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022

work page 2022
[28]

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms.ArXiv, abs/1707.06347, 2017. URL https://api.semanticscholar.org/ CorpusID:28695052

work page internal anchor Pith review Pith/arXiv arXiv 2017
[29]

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models.arXiv preprint arXiv:2402.03300, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[30]

Denoising Diffusion Implicit Models

Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models.ArXiv, abs/2010.02502, 2020. URLhttps://api.semanticscholar.org/CorpusID:222140788

work page internal anchor Pith review Pith/arXiv arXiv 2010
[31]

Score-Based Generative Modeling through Stochastic Differential Equations

Yang Song, Jascha Narain Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations.ArXiv, abs/2011.13456,

work page internal anchor Pith review Pith/arXiv arXiv 2011
[32]

URLhttps://api.semanticscholar.org/CorpusID:227209335

work page
[33]

Joty, and Nikhil Naik

Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq R. Joty, and Nikhil Naik. Diffusion model alignment using direct preference optimization.2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8228–8238, 2023. URLhttps://api.semanticscholar.org/CorpusID...

work page 2024
[34]

Instructedit: Improving automatic masks for diffusion-based image editing with user instructions.ArXiv, abs/2305.18047, 2023

Qian Wang, Biao Zhang, Michael Birsak, and Peter Wonka. Instructedit: Improving automatic masks for diffusion-based image editing with user instructions.ArXiv, abs/2305.18047, 2023. URL https: //api.semanticscholar.org/CorpusID:258959425. 11

work page arXiv 2023
[35]

Unified Reward Model for Multimodal Understanding and Generation

Yibin Wang, Yuhang Zang, Hao Li, Cheng Jin, and Jiaqi Wang. Unified reward model for multimodal understanding and generation.arXiv preprint arXiv:2503.05236, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[36]

Qwen-Image Technical Report

Chenfei Wu, Jiahao Li, Jingren Zhou, Junyang Lin, Kaiyuan Gao, Kun Yan, Sheng-ming Yin, Shuai Bai, Xiao Xu, Yilei Chen, et al. Qwen-image technical report.arXiv preprint arXiv:2508.02324, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[37]

Rewarddance: Reward scaling in visual generation.ArXiv, abs/2509.08826, 2025

Jie Wu, Yu Gao, Zi-Nuo Ye, Ming Li, Liang Li, Hanzhong Guo, Jie Liu, Zeyue Xue, Xiaoxia Hou, Wei Liu, Yangyang Zeng, and Weilin Huang. Rewarddance: Reward scaling in visual generation.ArXiv, abs/2509.08826, 2025. URLhttps://api.semanticscholar.org/CorpusID:281247213

work page arXiv 2025
[38]

Editreward: A human- aligned reward model for instruction-guided image editing.arXiv preprint arXiv:2509.26346, 2025

Keming Wu, Sicong Jiang, Max Ku, Ping Nie, Minghao Liu, and Wenhu Chen. Editreward: A human- aligned reward model for instruction-guided image editing.arXiv preprint arXiv:2509.26346, 2025

work page arXiv 2025
[39]

Omnigen: Unified image generation

Shitao Xiao, Yueze Wang, Junjie Zhou, Huaying Yuan, Xingrun Xing, Ruiran Yan, Chaofan Li, Shuting Wang, Tiejun Huang, and Zheng Liu. Omnigen: Unified image generation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13294–13304, 2025

work page 2025
[40]

Imagereward: Learning and evaluating human preferences for text-to-image generation.ArXiv, abs/2304.05977, 2023

Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. Imagereward: Learning and evaluating human preferences for text-to-image generation.ArXiv, abs/2304.05977, 2023. URLhttps://api.semanticscholar.org/CorpusID:258079316

work page arXiv 2023
[41]

Visionreward: Fine-grained multi-dimensional human preference learning for image and video generation

Jiazheng Xu, Yu Huang, Jiale Cheng, Yuanming Yang, Jiajun Xu, Yuan Wang, Wenbo Duan, Shen Yang, Qunlin Jin, Shurun Li, et al. Visionreward: Fine-grained multi-dimensional human preference learning for image and video generation. InProceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 11269–11277, 2026

work page 2026
[42]

DanceGRPO: Unleashing GRPO on Visual Generation

Zeyue Xue, Jie Wu, Yu Gao, Fangyuan Kong, Lingting Zhu, Mengzhao Chen, Zhiheng Liu, Wei Liu, Qiushan Guo, Weilin Huang, et al. Dancegrpo: Unleashing grpo on visual generation.arXiv preprint arXiv:2505.07818, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[43]

Qwen2.5 Technical Report

Qwen An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxin Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin,...

work page internal anchor Pith review Pith/arXiv arXiv 2024
[44]

ImgEdit: A Unified Image Editing Dataset and Benchmark

Yang Ye, Xianyi He, Zongjian Li, Bin Lin, Shenghai Yuan, Zhiyuan Yan, Bohan Hou, and Li Yuan. Imgedit: A unified image editing dataset and benchmark.ArXiv, abs/2505.20275, 2025. URL https: //api.semanticscholar.org/CorpusID:278911803

work page internal anchor Pith review Pith/arXiv arXiv 2025
[45]

Magicbrush: A manually annotated dataset for instruction-guided image editing.ArXiv, abs/2306.10012, 2023

Kai Zhang, Lingbo Mo, Wenhu Chen, Huan Sun, and Yu Su. Magicbrush: A manually annotated dataset for instruction-guided image editing.ArXiv, abs/2306.10012, 2023. URL https://api.semanticscholar. org/CorpusID:259187796

work page arXiv 2023
[46]

Qihui Zhang, Munan Ning, Zheyuan Liu, Yanbo Wang, Jiayi Ye, Yue Huang, Shuo Yang, Xiao Chen, Yibing Song, and Li Yuan. Upme: An unsupervised peer review framework for multimodal large language model evaluation.2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9165–9174, 2025. URLhttps://api.semanticscholar.org/CorpusID:277113471

work page 2025
[47]

The unreasonable effectiveness of deep features as a perceptual metric

Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018

work page 2018
[48]

DiffusionNFT: Online Diffusion Reinforcement with Forward Process

Kaiwen Zheng, Huayu Chen, Haotian Ye, Haoxiang Wang, Qinsheng Zhang, Kai Jiang, Hang Su, Stefano Ermon, Jun Zhu, and Ming-Yu Liu. Diffusionnft: Online diffusion reinforcement with forward process. arXiv preprint arXiv:2509.16117, 2025. 12 A Implementation Details A.1 Training Details In our implementation, we use FLUX.1-Kontext [Dev] [13] and Qwen-Image-E...

work page internal anchor Pith review Pith/arXiv arXiv 2025

[1] [1]

Humanedit: A high-quality human-rewarded dataset for instruction-based image editing.arXiv preprint arXiv:2412.04280, 2024

Jinbin Bai, Wei Chow, Ling Yang, Xiangtai Li, Juncheng Li, Hanwang Zhang, and Shuicheng Yan. Humanedit: A high-quality human-rewarded dataset for instruction-based image editing.arXiv preprint arXiv:2412.04280, 2024

work page arXiv 2024

[2] [2]

Qwen3-VL Technical Report

Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, Wenbin Ge, Zhifang Guo, Qidong Huang, Jie Huang, Fei Huang, Binyuan Hui, Shutong Jiang, Zhaohai Li, Mingsheng Li, Mei Li, Kaixin Li, Zicheng Lin, Junyang Lin, Xuejing Liu, Jiawei Liu, Chenglong Liu, Yang Liu, Dayiheng Liu, Shixuan ...

work page internal anchor Pith review Pith/arXiv arXiv 2025

[3] [3]

Training Diffusion Models with Reinforcement Learning

Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. Training diffusion models with reinforcement learning.arXiv preprint arXiv:2305.13301, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[4] [4]

arXiv preprint arXiv:2211.09800 , year=

Tim Brooks, Aleksander Holynski, and Alexei A Efros. Instructpix2pix: Learning to follow image editing instructions.arXiv preprint arXiv:2211.09800, 2022

work page arXiv 2022

[5] [5]

HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer

Qi Cai, Jingwen Chen, Yang Chen, Yehao Li, Fuchen Long, Yingwei Pan, Zhaofan Qiu, Yiheng Zhang, Feng Gao, Peihan Xu, Yimeng Wang, Kai Yu, Wenxuan Chen, Ziwei Feng, Zi-Qiang Gong, Jia-Wern Pan, Yingzhi Peng, Rui Tian, Siyu Wang, Bo Zhao, Ting Yao, and Tao Mei. Hidream-i1: A high-efficient image generative foundation model with sparse diffusion transformer....

work page internal anchor Pith review Pith/arXiv arXiv 2025

[6] [6]

SAM 3: Segment Anything with Concepts

Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Zhou, Liliane ...

work page internal anchor Pith review Pith/arXiv arXiv 2025

[7] [7]

Emerging Properties in Unified Multimodal Pretraining

Chaorui Deng, Deyao Zhu, Kunchang Li, Chenhui Gou, Feng Li, Zeyu Wang, Shu Zhong, Weihao Yu, Xiao- Ping Nie, Ziang Song, Shi Guang, and Haoqi Fan. Emerging properties in unified multimodal pretraining. ArXiv, abs/2505.14683, 2025. URLhttps://api.semanticscholar.org/CorpusID:278768720

work page internal anchor Pith review Pith/arXiv arXiv 2025

[8] [8]

Scaling rectified flow transformers for high-resolution image synthesis

Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. Scaling rectified flow transformers for high-resolution image synthesis. InForty-first international conference on machine learning, 2024

work page 2024

[9] [9]

arXiv preprint arXiv:2305.16381 , year=

Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, Moonkyung Ryu, Craig Boutilier, P. Abbeel, Mohammad Ghavamzadeh, Kangwook Lee, and Kimin Lee. Dpok: Reinforcement learning for fine-tuning text-to- image diffusion models.ArXiv, abs/2305.16381, 2023. URL https://api.semanticscholar.org/ CorpusID:258947323

work page arXiv 2023

[10] [10]

Onereward: Unified mask-guided image generation via multi-task human preference learning.arXiv preprint arXiv:2508.21066, 2025

Yuan Gong, Xionghui Wang, Jie Wu, Shiyin Wang, Yitong Wang, and Xinglong Wu. Onereward: Unified mask-guided image generation via multi-task human preference learning.arXiv preprint arXiv:2508.21066, 2025

work page arXiv 2025

[11] [11]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning.arXiv preprint arXiv:2501.12948, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[12] [12]

Denoising diffusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020

work page 2020

[13] [13]

Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dockhorn, Jack English, Zion English, Patrick Esser, et al. Flux. 1 kontext: Flow matching for in-context image generation and editing in latent space.arXiv preprint arXiv:2506.15742, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[14] [14]

MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE

Junzhe Li, Yutao Cui, Tao Huang, Yinping Ma, Chun Fan, Yiming Cheng, Miles Yang, Zhao Zhong, and Liefeng Bo. Mixgrpo: Unlocking flow-based grpo efficiency with mixed ode-sde.arXiv preprint arXiv:2507.21802, 2025. 10

work page internal anchor Pith review Pith/arXiv arXiv 2025

[15] [15]

arXiv:2510.16888 (2025) 2

Zongjian Li, Zheyuan Liu, Qihui Zhang, Bin Lin, Feize Wu, Shenghai Yuan, Zhiyuan Yan, Yang Ye, Wangbo Yu, Yuwei Niu, et al. Uniworld-v2: Reinforce image editing with diffusion negative-aware finetuning and mllm implicit feedback.arXiv preprint arXiv:2510.16888, 2025

work page arXiv 2025

[16] [16]

UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation

Bin Lin, Zongjian Li, Xinhua Cheng, Yuwei Niu, Yang Ye, Xianyi He, Shenghai Yuan, Wangbo Yu, Shaodong Wang, Yunyang Ge, Yatian Pang, and Li Yuan. Uniworld-v1: High-resolution semantic encoders for unified visual understanding and generation.ArXiv, abs/2506.03147, 2025. URL https: //api.semanticscholar.org/CorpusID:279119654

work page internal anchor Pith review Pith/arXiv arXiv 2025

[17] [17]

Flow Matching for Generative Modeling

Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling.arXiv preprint arXiv:2210.02747, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[18] [18]

Flow-GRPO: Training Flow Matching Models via Online RL

Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, and Wanli Ouyang. Flow-grpo: Training flow matching models via online rl.arXiv preprint arXiv:2505.05470, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[19] [19]

Step1X-Edit: A Practical Framework for General Image Editing

Shiyu Liu, Yucheng Han, Peng Xing, Fukun Yin, Rui Wang, Wei Cheng, Jiaqi Liao, Yingming Wang, Honghao Fu, Chunrui Han, et al. Step1x-edit: A practical framework for general image editing.arXiv preprint arXiv:2504.17761, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[20] [20]

Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow.arXiv preprint arXiv:2209.03003, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[21] [21]

Decoupled Weight Decay Regularization

Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization.arXiv preprint arXiv:1711.05101, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[22] [22]

Sdedit: Guided image synthesis and editing with stochastic differential equations

Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. Sdedit: Guided image synthesis and editing with stochastic differential equations. InInternational Conference on Learning Representations, 2021. URL https://api.semanticscholar.org/CorpusID:245704504

work page 2021

[23] [23]

Training language models to follow instructions with human feedback.Advances in neural information processing systems, 35:27730–27744, 2022

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback.Advances in neural information processing systems, 35:27730–27744, 2022

work page 2022

[24] [24]

Peebles and Saining Xie

William S. Peebles and Saining Xie. Scalable diffusion models with transformers.2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 4172–4182, 2022. URL https://api. semanticscholar.org/CorpusID:254854389

work page 2023

[25] [25]

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model.ArXiv, abs/2305.18290,

work page internal anchor Pith review Pith/arXiv arXiv

[26] [26]

URLhttps://api.semanticscholar.org/CorpusID:258959321

work page

[27] [27]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022

work page 2022

[28] [28]

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms.ArXiv, abs/1707.06347, 2017. URL https://api.semanticscholar.org/ CorpusID:28695052

work page internal anchor Pith review Pith/arXiv arXiv 2017

[29] [29]

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models.arXiv preprint arXiv:2402.03300, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[30] [30]

Denoising Diffusion Implicit Models

Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models.ArXiv, abs/2010.02502, 2020. URLhttps://api.semanticscholar.org/CorpusID:222140788

work page internal anchor Pith review Pith/arXiv arXiv 2010

[31] [31]

Score-Based Generative Modeling through Stochastic Differential Equations

Yang Song, Jascha Narain Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations.ArXiv, abs/2011.13456,

work page internal anchor Pith review Pith/arXiv arXiv 2011

[32] [32]

URLhttps://api.semanticscholar.org/CorpusID:227209335

work page

[33] [33]

Joty, and Nikhil Naik

Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq R. Joty, and Nikhil Naik. Diffusion model alignment using direct preference optimization.2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8228–8238, 2023. URLhttps://api.semanticscholar.org/CorpusID...

work page 2024

[34] [34]

Instructedit: Improving automatic masks for diffusion-based image editing with user instructions.ArXiv, abs/2305.18047, 2023

Qian Wang, Biao Zhang, Michael Birsak, and Peter Wonka. Instructedit: Improving automatic masks for diffusion-based image editing with user instructions.ArXiv, abs/2305.18047, 2023. URL https: //api.semanticscholar.org/CorpusID:258959425. 11

work page arXiv 2023

[35] [35]

Unified Reward Model for Multimodal Understanding and Generation

Yibin Wang, Yuhang Zang, Hao Li, Cheng Jin, and Jiaqi Wang. Unified reward model for multimodal understanding and generation.arXiv preprint arXiv:2503.05236, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[36] [36]

Qwen-Image Technical Report

Chenfei Wu, Jiahao Li, Jingren Zhou, Junyang Lin, Kaiyuan Gao, Kun Yan, Sheng-ming Yin, Shuai Bai, Xiao Xu, Yilei Chen, et al. Qwen-image technical report.arXiv preprint arXiv:2508.02324, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[37] [37]

Rewarddance: Reward scaling in visual generation.ArXiv, abs/2509.08826, 2025

Jie Wu, Yu Gao, Zi-Nuo Ye, Ming Li, Liang Li, Hanzhong Guo, Jie Liu, Zeyue Xue, Xiaoxia Hou, Wei Liu, Yangyang Zeng, and Weilin Huang. Rewarddance: Reward scaling in visual generation.ArXiv, abs/2509.08826, 2025. URLhttps://api.semanticscholar.org/CorpusID:281247213

work page arXiv 2025

[38] [38]

Editreward: A human- aligned reward model for instruction-guided image editing.arXiv preprint arXiv:2509.26346, 2025

Keming Wu, Sicong Jiang, Max Ku, Ping Nie, Minghao Liu, and Wenhu Chen. Editreward: A human- aligned reward model for instruction-guided image editing.arXiv preprint arXiv:2509.26346, 2025

work page arXiv 2025

[39] [39]

Omnigen: Unified image generation

Shitao Xiao, Yueze Wang, Junjie Zhou, Huaying Yuan, Xingrun Xing, Ruiran Yan, Chaofan Li, Shuting Wang, Tiejun Huang, and Zheng Liu. Omnigen: Unified image generation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13294–13304, 2025

work page 2025

[40] [40]

Imagereward: Learning and evaluating human preferences for text-to-image generation.ArXiv, abs/2304.05977, 2023

Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. Imagereward: Learning and evaluating human preferences for text-to-image generation.ArXiv, abs/2304.05977, 2023. URLhttps://api.semanticscholar.org/CorpusID:258079316

work page arXiv 2023

[41] [41]

Visionreward: Fine-grained multi-dimensional human preference learning for image and video generation

Jiazheng Xu, Yu Huang, Jiale Cheng, Yuanming Yang, Jiajun Xu, Yuan Wang, Wenbo Duan, Shen Yang, Qunlin Jin, Shurun Li, et al. Visionreward: Fine-grained multi-dimensional human preference learning for image and video generation. InProceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 11269–11277, 2026

work page 2026

[42] [42]

DanceGRPO: Unleashing GRPO on Visual Generation

Zeyue Xue, Jie Wu, Yu Gao, Fangyuan Kong, Lingting Zhu, Mengzhao Chen, Zhiheng Liu, Wei Liu, Qiushan Guo, Weilin Huang, et al. Dancegrpo: Unleashing grpo on visual generation.arXiv preprint arXiv:2505.07818, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[43] [43]

Qwen2.5 Technical Report

Qwen An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxin Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin,...

work page internal anchor Pith review Pith/arXiv arXiv 2024

[44] [44]

ImgEdit: A Unified Image Editing Dataset and Benchmark

Yang Ye, Xianyi He, Zongjian Li, Bin Lin, Shenghai Yuan, Zhiyuan Yan, Bohan Hou, and Li Yuan. Imgedit: A unified image editing dataset and benchmark.ArXiv, abs/2505.20275, 2025. URL https: //api.semanticscholar.org/CorpusID:278911803

work page internal anchor Pith review Pith/arXiv arXiv 2025

[45] [45]

Magicbrush: A manually annotated dataset for instruction-guided image editing.ArXiv, abs/2306.10012, 2023

Kai Zhang, Lingbo Mo, Wenhu Chen, Huan Sun, and Yu Su. Magicbrush: A manually annotated dataset for instruction-guided image editing.ArXiv, abs/2306.10012, 2023. URL https://api.semanticscholar. org/CorpusID:259187796

work page arXiv 2023

[46] [46]

Qihui Zhang, Munan Ning, Zheyuan Liu, Yanbo Wang, Jiayi Ye, Yue Huang, Shuo Yang, Xiao Chen, Yibing Song, and Li Yuan. Upme: An unsupervised peer review framework for multimodal large language model evaluation.2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9165–9174, 2025. URLhttps://api.semanticscholar.org/CorpusID:277113471

work page 2025

[47] [47]

The unreasonable effectiveness of deep features as a perceptual metric

Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018

work page 2018

[48] [48]

DiffusionNFT: Online Diffusion Reinforcement with Forward Process

Kaiwen Zheng, Huayu Chen, Haotian Ye, Haoxiang Wang, Qinsheng Zhang, Kai Jiang, Hang Su, Stefano Ermon, Jun Zhu, and Ming-Yu Liu. Diffusionnft: Online diffusion reinforcement with forward process. arXiv preprint arXiv:2509.16117, 2025. 12 A Implementation Details A.1 Training Details In our implementation, we use FLUX.1-Kontext [Dev] [13] and Qwen-Image-E...

work page internal anchor Pith review Pith/arXiv arXiv 2025