BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models
2 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: UNVERDICTED 2
Representative citing papers
SEAL uses semantic embeddings and locality-sensitive hashing to create distortion-free, database-free watermarks for generative images that are conditioned on content for improved forgery resistance.
citing papers explorer
- Self-Correcting Text-to-Video Generation with Misalignment Detection and Localized Refinement
  VideoRepair detects text-video misalignments via MLLM-generated questions and performs localized, region-preserving refinement to improve alignment in existing T2V diffusion models.
- SEAL: Semantic Aware Image Watermarking
  SEAL uses semantic embeddings and locality-sensitive hashing to create distortion-free, database-free watermarks for generative images that are conditioned on content for improved forgery resistance.