Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs

Shengbang Tong, Zhuang Liu, Yuexiang Zhai, et al · 2024

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Mechanisms of Object Localization in Vision-Language Models

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

Localization in VLMs relies on a containerization mechanism driven by object-aligned tokens and a narrow set of specialized attention heads in early-to-mid or mid-to-late layers.

citing papers explorer

Showing 1 of 1 citing paper.

Mechanisms of Object Localization in Vision-Language Models

cs.CV · 2026-05-19 · unverdicted · none · ref 31

Localization in VLMs relies on a containerization mechanism driven by object-aligned tokens and a narrow set of specialized attention heads in early-to-mid or mid-to-late layers.
