Insight over sight: Exploring the vision-knowledge conflicts in multimodal LLMs

Xiaoyuan Liu, Wenxuan Wang, Youliang Yuan, Jen-tse Huang, Qiuzhi Liu, Pinjia He, Zhaopeng Tu · 2024 · arXiv 2410.08145

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Measuring Cross-Modal Synergy: A Benchmark for VLM Explainability

cs.AI · 2026-05-21 · unverdicted · novelty 7.0

Introduces Synergistic Faithfulness metric based on Shapley Interaction Index to evaluate cross-modal synergy in VLM explainers, revealing over-reliance on visual salience in existing methods.

When Seeing Overrides Knowing: Disentangling Knowledge Conflicts in Vision-Language Models

cs.CV · 2025-07-18 · unverdicted · novelty 6.0

The work identifies a small set of attention heads in VLMs that mediate conflicts between parametric knowledge and visual input, and shows that intervening on them steers model behavior while attention patterns provide precise image-region attribution.

citing papers explorer

Showing 2 of 2 citing papers.

Measuring Cross-Modal Synergy: A Benchmark for VLM Explainability · cs.AI · 2026-05-21 · unverdicted · none · ref 15
When Seeing Overrides Knowing: Disentangling Knowledge Conflicts in Vision-Language Models · cs.CV · 2025-07-18 · unverdicted · none · ref 24
