Densegrpo: From sparse to dense reward for flow matching model alignment

Haoyou Deng, Keyu Yan, Chaojie Mao, Xiang Wang, Yu Liu, Changxin Gao, Nong Sang · 2026 · arXiv 2601.20218

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

citation-role summary

background 2

citation-polarity summary

background 1 unclear 1

representative citing papers

Power Reinforcement Post-Training of Text-to-Image Models with Super-Linear Advantage Shaping

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

Super-Linear Advantage Shaping (SLAS) introduces a non-linear geometric policy update for RL post-training of text-to-image models that reshapes the local policy space via advantage-dependent Fisher-Rao weighting to reduce reward hacking and improve performance over GRPO baselines.

FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling

cs.LG · 2026-04-08 · unverdicted · novelty 6.0

Sol-RL decouples FP4-based candidate exploration from BF16 policy optimization in diffusion RL, delivering up to 4.64x faster convergence with maintained or superior alignment performance on models like FLUX.1 and SD3.5.

Wan-Image: Pushing the Boundaries of Generative Visual Intelligence

cs.CV · 2026-04-21 · unverdicted · novelty 3.0

Wan-Image is a unified multi-modal system that integrates LLMs and diffusion transformers to deliver professional-grade image generation features including complex typography, multi-subject consistency, and precise editing, outperforming several prior models in human tests.

citing papers explorer

Showing 3 of 3 citing papers.

Power Reinforcement Post-Training of Text-to-Image Models with Super-Linear Advantage Shaping cs.CV · 2026-05-11 · unverdicted · none · ref 103
Super-Linear Advantage Shaping (SLAS) introduces a non-linear geometric policy update for RL post-training of text-to-image models that reshapes the local policy space via advantage-dependent Fisher-Rao weighting to reduce reward hacking and improve performance over GRPO baselines.
FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling cs.LG · 2026-04-08 · unverdicted · none · ref 21
Sol-RL decouples FP4-based candidate exploration from BF16 policy optimization in diffusion RL, delivering up to 4.64x faster convergence with maintained or superior alignment performance on models like FLUX.1 and SD3.5.
Wan-Image: Pushing the Boundaries of Generative Visual Intelligence cs.CV · 2026-04-21 · unverdicted · none · ref 6
Wan-Image is a unified multi-modal system that integrates LLMs and diffusion transformers to deliver professional-grade image generation features including complex typography, multi-subject consistency, and precise editing, outperforming several prior models in human tests.

Densegrpo: From sparse to dense reward for flow matching model alignment

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer