VLMs encode visual evidence as strongly in failed cases as in successful ones; final-layer logit gaps predict grounding outcomes, and full-sequence activation patching alters 60-84% of outputs to improve visual use.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Arbitration Failure, Not Perceptual Blindness: How Vision-Language Models Resolve Visual-Linguistic Conflicts