Spatial attention metrics in VLMs correlate near zero (R≈0.001) with accuracy while self-consistency predicts truth at R=0.429; reliability stems from generation dynamics rather than visual grounding.
Compass: Context-modulated PID attention steering system for hallucination mitigation. arXiv preprint arXiv:2511.14776.
1 Pith paper cites this work.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Visuals Lie, Consistency Speaks: Disentangling Spatial Attention from Reliability in Vision-Language Models