Delving into RL for Image Generation with CoT: A Study on DPO vs

Chengzhuo Tong, Ziyu Guo, Renrui Zhang, Wenyu Shan, Xinyu Wei, Zhenghao Xing, Hongsheng Li, Pheng-Ann Heng · 2025 · arXiv 2505.17017

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

MICo-150K: A Comprehensive Dataset Advancing Multi-Image Composition

cs.CV · 2025-12-08 · unverdicted · novelty 7.0

MICo-150K is a new 150K-image dataset with 7 tasks, a De&Re real-image subset, MICo-Bench, and Weighted-Ref-VIEScore metric that improves AI models for generating consistent composites from arbitrary numbers of reference images.

MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE

cs.AI · 2025-07-29 · unverdicted · novelty 7.0

MixGRPO speeds up GRPO for flow-based image generators by restricting SDE sampling and optimization to a sliding window while using ODE elsewhere, cutting training time by up to 71% with better alignment performance.

VERTIGO: Visual Preference Optimization for Cinematic Camera Trajectory Generation

cs.CV · 2026-04-02 · conditional · novelty 6.0

VERTIGO post-trains camera trajectory generators with visual preference signals from Unity-rendered previews scored by a cinematically fine-tuned VLM, cutting character off-screen rates from 38% to near zero while improving framing and prompt adherence.

From Broad Exploration to Stable Synthesis: Entropy-Guided Optimization for Autoregressive Image Generation

cs.LG · 2026-03-12 · unverdicted · novelty 6.0

EG-GRPO improves autoregressive text-to-image models by reallocating RL updates according to token entropy, excluding low-entropy tokens from reward signals while adding entropy bonuses to high-entropy ones, yielding state-of-the-art results on standard benchmarks.

Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

cs.CV · 2025-08-28 · unverdicted · novelty 5.0

Pref-GRPO stabilizes T2I RL training by using pairwise win rates from preference models as rewards instead of normalized pointwise scores, while UniGenBench enables finer-grained model evaluation across themes and criteria.

OmniGen2: Towards Instruction-Aligned Multimodal Generation

cs.CV · 2025-06-23 · unverdicted · novelty 5.0

OmniGen2 introduces a unified generative model with two distinct decoding pathways and a decoupled image tokenizer that achieves competitive results on text-to-image and editing benchmarks plus state-of-the-art consistency among open-source models on the new OmniContext benchmark.

HumorGen: Cognitive Synergy for Humor Generation in Large Language Models via Persona-Based Distillation

cs.CL · 2026-03-19

citing papers explorer

Showing 7 of 7 citing papers.

MICo-150K: A Comprehensive Dataset Advancing Multi-Image Composition · cs.CV · 2025-12-08 · unverdicted · none · ref 71
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE · cs.AI · 2025-07-29 · unverdicted · none · ref 38
VERTIGO: Visual Preference Optimization for Cinematic Camera Trajectory Generation · cs.CV · 2026-04-02 · conditional · none · ref 44
From Broad Exploration to Stable Synthesis: Entropy-Guided Optimization for Autoregressive Image Generation · cs.LG · 2026-03-12 · unverdicted · none · ref 25
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning · cs.CV · 2025-08-28 · unverdicted · none · ref 15
OmniGen2: Towards Instruction-Aligned Multimodal Generation · cs.CV · 2025-06-23 · unverdicted · none · ref 74
HumorGen: Cognitive Synergy for Humor Generation in Large Language Models via Persona-Based Distillation · cs.CL · 2026-03-19 · unreviewed · ref 25
