InstructBLIP: Towards general-purpose vision-language models with instruction tuning

Wenliang Dai, Junnan Li, Dongxu Li, Anthony Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale N Fung, Steven Hoi · 2023

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Aligning What Vision-Language Models See and Perceive with Adaptive Information Flow

cs.CV · 2026-04-17 · unverdicted · novelty 5.0

An inference-time technique that uses token activation dynamics to adaptively restrict text attention to important visual tokens, improving VLM accuracy on VQA, grounding, counting, OCR, and hallucination benchmarks.

citing papers explorer

Showing 1 of 1 citing paper.

Aligning What Vision-Language Models See and Perceive with Adaptive Information Flow

cs.CV · 2026-04-17 · unverdicted · none · ref 9

An inference-time technique that uses token activation dynamics to adaptively restrict text attention to important visual tokens, improving VLM accuracy on VQA, grounding, counting, OCR, and hallucination benchmarks.
