SOAR is a reward-free on-policy method that supplies dense per-timestep supervision to correct exposure bias in diffusion model denoising trajectories, raising GenEval from 0.70 to 0.78 and OCR from 0.64 to 0.67 over SFT on SD3.5-Medium.
Improving text-to-image consistency via automatic prompt optimization
7 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 7
representative citing papers
PAMELA provides a multi-user rating dataset and personalized reward model that predicts individual image preferences more accurately than prior population-level aesthetic models.
Noise optimization during sampling recovers diversity in mode-collapsed diffusion models while preserving output fidelity.
VideoRepair detects text-video misalignments via MLLM-generated questions and performs localized, region-preserving refinement to improve alignment in existing T2V diffusion models.
RAPO++ is a three-stage prompt optimization framework combining retrieval-augmented refinement, closed-loop test-time scaling, and LLM fine-tuning to enhance text-to-video generation quality.
A post-training pipeline for video generation models combines SFT, RLHF with novel GRPO, prompt enhancement, and inference optimization to improve visual quality, temporal coherence, and instruction following.
AutoVQA-G is a self-improving framework that generates VQA-G datasets with higher visual grounding accuracy than leading multimodal LLMs via iterative CoT verification and prompt refinement.
citing papers explorer
- Self-Correcting Text-to-Video Generation with Misalignment Detection and Localized Refinement