
Sample by Step, Optimize by Chunk: Chunk-Level GRPO for Text-to-Image Generation

3 Pith papers cite this work. Polarity classification is still being indexed.

abstract

Recent progress in post-training flow matching models for text-to-image (T2I) generation with Group Relative Policy Optimization (GRPO) has demonstrated strong potential. However, it is hindered by a critical limitation: inaccurate advantage attribution. In this work, we argue that aggregating consecutive steps into a coherent 'chunk' and shifting the policy optimization paradigm from GRPO's step level to the chunk level can effectively mitigate the negative impact of this issue. Building on this insight, we propose Group Chunking Policy Optimization (GCPO), the first chunk-level reinforcement learning approach for post-training flow matching. Extensive experiments demonstrate that GCPO achieves superior performance on both standard T2I benchmarks and preference alignment, with up to 43% relative gains over GRPO, highlighting the promise of chunk-level policy optimization. The code is available at https://github.com/xingzhejun/GCPO.
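The abstract contrasts step-level and chunk-level policy optimization without giving formulas. As a minimal sketch, assuming the standard GRPO ingredients (rewards normalized within a group of images sampled for one prompt, and a PPO-style clipped importance ratio), the Python below shows one way the unit of optimization could move from a single denoising step to a chunk of consecutive steps. The fixed `chunk_size`, the choice to score a chunk by summing its per-step log-probabilities, and every function name here are hypothetical illustrations, not GCPO's published method; the per-step log-probabilities are taken as given inputs.

```python
import math
import statistics
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantage: normalize each image's reward within its prompt group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def chunk_boundaries(num_steps: int, chunk_size: int) -> List[range]:
    """Hypothetical fixed-size chunking of denoising steps 0..num_steps-1."""
    return [range(s, min(s + chunk_size, num_steps)) for s in range(0, num_steps, chunk_size)]


def clipped_surrogate_loss(
    logp_new: Sequence[Sequence[float]],  # [image][step] log-probs, current policy
    logp_old: Sequence[Sequence[float]],  # [image][step] log-probs, sampling policy
    rewards: Sequence[float],             # one scalar reward per generated image
    chunk_size: int = 1,                  # 1 = step-level; >1 = chunk-level (assumed)
    clip_eps: float = 0.2,
) -> float:
    advantages = group_relative_advantages(rewards)
    terms = []
    for new, old, adv in zip(logp_new, logp_old, advantages):
        for chunk in chunk_boundaries(len(new), chunk_size):
            # Assumed chunk ratio: product of per-step ratios = exp(sum of log diffs).
            ratio = math.exp(sum(new[t] - old[t] for t in chunk))
            clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
            # PPO-style pessimistic bound, negated so it can be minimized.
            terms.append(-min(ratio * adv, clipped * adv))
    return statistics.fmean(terms)


if __name__ == "__main__":
    rewards = [1.0, 0.0]  # toy group of two images (hypothetical values)
    logp_old = [[-1.0, -1.0], [-1.0, -1.0]]
    logp_new = [[-0.9, -0.9], [-0.9, -0.9]]
    print("step-level :", clipped_surrogate_loss(logp_new, logp_old, rewards, chunk_size=1))
    print("chunk-level:", clipped_surrogate_loss(logp_new, logp_old, rewards, chunk_size=2))
```

In this sketch every step and every chunk of a trajectory inherits the same group-relative advantage; the only thing chunking changes is which ratio gets clipped. With `chunk_size=1` each step is clipped on its own ratio, while larger chunks clip the joint ratio of consecutive steps, so the toy example yields different losses for the two settings.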

citation-role summary

background 2

fields

cs.CV 3

years

2026 3

verdicts

UNVERDICTED 3

polarities

background 1 unclear 1
