BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi · 2023

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Adaptive Chain-of-Focus Reasoning via Dynamic Visual Search and Zooming for Efficient VLMs

cs.CV · 2025-05-21 · unverdicted · novelty 6.0

Chain-of-Focus enables VLMs to adaptively search and zoom on important image areas via a two-stage SFT and RL pipeline on a custom 3K-sample dataset, yielding 5% gains on the V* benchmark across resolutions from 224 to 4K.

citing papers explorer

Adaptive Chain-of-Focus Reasoning via Dynamic Visual Search and Zooming for Efficient VLMs

cs.CV · 2025-05-21 · unverdicted · none · ref 23

Chain-of-Focus enables VLMs to adaptively search and zoom on important image areas via a two-stage SFT and RL pipeline on a custom 3K-sample dataset, yielding 5% gains on the V* benchmark across resolutions from 224 to 4K.
