CreFlow combines LTL compositional rewards with credit-aware NFT and corrective reflow losses in online RL to improve embodied video diffusion models, raising downstream task success by 23.8 percentage points on eight bimanual manipulation tasks.
Seeing What Matters: Visual Preference Policy Optimization for Visual Generation
3 Pith papers cite this work. Polarity classification is still indexing.
abstract
Reinforcement learning (RL) has become a powerful tool for post-training visual generative models, with Group Relative Policy Optimization (GRPO) increasingly used to align generators with human preferences. However, existing GRPO pipelines rely on a single scalar reward per sample, treating each image or video as a holistic entity and ignoring the rich spatial and temporal structure of visual content. This coarse supervision hinders the correction of localized artifacts and the modeling of fine-grained perceptual cues. We introduce Visual Preference Policy Optimization (ViPO), a GRPO variant that lifts scalar feedback into structured, pixel-level advantages. ViPO employs a Perceptual Structuring Module that uses pretrained vision backbones to construct spatially and temporally aware advantage maps, redistributing optimization pressure toward perceptually important regions while preserving the stability of standard GRPO. Across both image and video benchmarks, ViPO consistently outperforms vanilla GRPO, improving in-domain alignment with human-preference rewards and enhancing generalization on out-of-domain evaluations. The method is architecture-agnostic, lightweight, and fully compatible with existing GRPO training pipelines, providing a more expressive and informative learning signal for visual generation.
citation-role summary
citation-polarity summary
fields
cs.CV 3years
2026 3roles
method 1polarities
use method 1representative citing papers
OTCA improves GRPO training for visual generation by estimating step importance in trajectories and adaptively weighting multiple reward objectives.
CogniVerse is a proposed MMRAG framework that combines cognitive reflection for retrieval filtering, Riemannian manifold alignment plus spectral graphs for retrieval, and optimal transport loss for generation, claiming better accuracy, coherence, and lower latency than prior systems.
citing papers explorer
-
Learning to Credit the Right Steps: Objective-aware Process Optimization for Visual Generation
OTCA improves GRPO training for visual generation by estimating step importance in trajectories and adaptively weighting multiple reward objectives.
-
CogniVerse: Revolutionizing Multi-Modal Retrieval-Augmented Generation with Cognitive Reflection and Geometric Reasoning
CogniVerse is a proposed MMRAG framework that combines cognitive reflection for retrieval filtering, Riemannian manifold alignment plus spectral graphs for retrieval, and optimal transport loss for generation, claiming better accuracy, coherence, and lower latency than prior systems.