Decoder-based VLMs over-align visual embeddings to the text manifold, causing linguistic bias in the top principal components (PCs) of a universal text subspace. Projecting out this subspace reduces hallucinations on POPE, CHAIR, and AMBER and improves CLAIR.
Yiming Tang, Abhijeet Sinha, and Dianbo Liu
Cited by 1 Pith paper (field: cs.CV; year: 2026; verdict: unverdicted). Polarity classification is still being indexed.
Representative citing paper:
When Language Overwrites Vision: Over-Alignment and Geometric Debiasing in Vision-Language Models
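The projection step named in the summary can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `top_principal_directions` and `project_out`, the choice of `k`, the mean-centering before PCA, and the random stand-in embeddings are all assumptions made for illustration. It estimates the top-k principal directions of a set of text embeddings and removes that subspace from a set of visual embeddings.

```python
import numpy as np


def top_principal_directions(text_emb: np.ndarray, k: int) -> np.ndarray:
    """Return a (k, d) matrix whose rows are the top-k principal directions.

    Assumption: the text subspace is estimated by PCA on mean-centered
    text embeddings; the source does not specify this detail.
    """
    centered = text_emb - text_emb.mean(axis=0, keepdims=True)
    # Rows of vt are orthonormal right singular vectors, sorted by
    # decreasing singular value, so the first k rows span the top-k PCs.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]


def project_out(visual_emb: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Remove from each row the component lying in the span of `basis` rows.

    `basis` must have orthonormal rows, as returned above.
    """
    return visual_emb - (visual_emb @ basis.T) @ basis


# Synthetic stand-ins for real embeddings (hypothetical shapes and values).
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(200, 16))
visual_emb = rng.normal(size=(5, 16))

basis = top_principal_directions(text_emb, k=3)
debiased = project_out(visual_emb, basis)
```

After the projection, `debiased @ basis.T` is numerically zero, so the debiased visual embeddings carry no component along the removed text directions. Which layer the embeddings come from and how many components to remove are left open here.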