The work identifies a small set of attention heads in VLMs that mediate conflicts between parametric knowledge and visual input, and shows that intervening on them steers model behavior while attention patterns provide precise image-region attribution.
Title resolution pending
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2025 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
When Seeing Overrides Knowing: Disentangling Knowledge Conflicts in Vision-Language Models
The work identifies a small set of attention heads in VLMs that mediate conflicts between parametric knowledge and visual input, and shows that intervening on them steers model behavior while attention patterns provide precise image-region attribution.