BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models

Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Self-Correcting Text-to-Video Generation with Misalignment Detection and Localized Refinement

cs.CV · 2024-11-22 · unverdicted · novelty 7.0

VideoRepair detects text-video misalignments via MLLM-generated questions and performs localized, region-preserving refinement to improve alignment in existing T2V diffusion models.

SEAL: Semantic Aware Image Watermarking

cs.LG · 2025-03-15 · unverdicted · novelty 6.0

SEAL uses semantic embeddings and locality-sensitive hashing to create distortion-free, database-free watermarks for generative images that are conditioned on content for improved forgery resistance.

citing papers explorer

Showing 2 of 2 citing papers.

Self-Correcting Text-to-Video Generation with Misalignment Detection and Localized Refinement
cs.CV · 2024-11-22 · unverdicted · none · ref 17

SEAL: Semantic Aware Image Watermarking
cs.LG · 2025-03-15 · unverdicted · none · ref 19
