BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

years

2026: 3 · 2025: 3

verdicts

unverdicted 6

roles

background 2

polarities

background 2

representative citing papers

Let ViT Speak: Generative Language-Image Pre-training

cs.CV · 2026-05-01 · unverdicted · novelty 5.0

GenLIP pretrains ViTs to generate language tokens from visual tokens via autoregressive language modeling, matching strong baselines on multimodal tasks with less data.

citing papers explorer

Showing 6 of 6 citing papers.