Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation
2 Pith papers cite this work.
Citing papers:
- Mind the Generative Details: Direct Localized Detail Preference Optimization for Video Diffusion Models
  LocalDPO aligns text-to-video diffusion models with human preferences at the spatio-temporal region level by automatically generating localized preference pairs from corrupted real videos and applying a region-aware DPO loss.
- Text Slider: Efficient and Plug-and-Play Continuous Concept Control for Image/Video Synthesis via LoRA Adapters
  Text Slider uses LoRA adapters on pre-trained text encoders to identify low-rank directions for efficient, plug-and-play continuous concept control in diffusion-based image and video synthesis.