Causal Evidence for Attention Head Imbalance in Modality Conflict Hallucination

Jinrui Jiang; Xinyu Dai; Zhangtai Wu; Zhen Wu

arxiv: 2605.19250 · v1 · pith:P7BNJBU7new · submitted 2026-05-19 · 💻 cs.AI

Pith reviewed 2026-05-20 06:21 UTC · model grok-4.3

classification 💻 cs.AI

keywords modality conflict hallucination · attention heads · causal intervention · multimodal large language models · path patching · hallucination reduction · MACI

The pith

Attention head imbalance in multimodal models favors erroneous text over visual evidence during generation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks why multimodal large language models sometimes follow contradictory text instead of visual input and traces the cause to internal attention mechanisms. Using path patching on five open-source models, it separates attention heads into those that causally promote hallucinations and those that resist them. The driving heads turn out to be more numerous and collectively stronger, while resisting heads are fewer but individually potent. This creates a structural tilt toward the wrong premise. The authors then build a conditional intervention that suppresses only the driving heads when conflict appears, producing the largest drop in hallucinations among tested methods.

Core claim

Across five open-source MLLMs, hallucination-driving attention heads are more broadly distributed and carry greater aggregate causal weight than hallucination-resisting heads, forming an imbalanced routing structure that biases generation toward erroneous textual premises; conditional suppression of the driving heads via MACI yields the largest hallucination reduction on the MMMC benchmark among compared baselines.

What carries the argument

Path-patching identification of hallucination-driving versus hallucination-resisting attention heads and the resulting distributed-versus-localized imbalance in their causal effects on token prediction.
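
To make the identification step concrete, here is a minimal sketch on a toy model, not the authors' implementation. It assumes each head contributes one scalar to a squashed readout (the hypothetical `hallucination_advantage`), that activations are stored per `(layer, head)` pair, and that a positive score marks a driving head, following the sign convention of the importance score quoted further down this page. All numbers are made up.

```python
import math
from typing import Dict, List, Tuple

Head = Tuple[int, int]  # (layer l, head index i)


def hallucination_advantage(head_outputs: Dict[Head, float]) -> float:
    """Toy readout (assumption): squashed sum of per-head scalar contributions.

    Positive values favor the erroneous textual premise over the visual evidence.
    """
    return math.tanh(sum(head_outputs.values()))


def patch_importance(conflict: Dict[Head, float],
                     clean: Dict[Head, float]) -> Dict[Head, float]:
    """Score each head as L(conflict) - L(conflict with that head patched from clean)."""
    base = hallucination_advantage(conflict)
    scores: Dict[Head, float] = {}
    for head in conflict:
        patched = dict(conflict)      # copy the conflict-run activations
        patched[head] = clean[head]   # swap in the clean-run value for one head only
        scores[head] = base - hallucination_advantage(patched)
    return scores


def split_heads(scores: Dict[Head, float],
                eps: float = 1e-9) -> Tuple[List[Head], List[Head]]:
    """Positive score: driving head. Negative score: resisting head."""
    driving = sorted(h for h, s in scores.items() if s > eps)
    resisting = sorted(h for h, s in scores.items() if s < -eps)
    return driving, resisting


# Made-up activations for a toy model with 2 layers and 2 heads per layer.
conflict_run = {(0, 0): 0.4, (0, 1): 0.3, (1, 0): -0.9, (1, 1): 0.2}
clean_run = {(0, 0): 0.1, (0, 1): 0.0, (1, 0): -0.1, (1, 1): 0.2}

scores = patch_importance(conflict_run, clean_run)
driving_heads, resisting_heads = split_heads(scores)
```

In this toy, heads `(0, 0)` and `(0, 1)` come out as driving, `(1, 0)` as resisting, and `(1, 1)` as neutral because its activation is the same in both runs. A real pipeline would patch tensors inside a forward pass rather than scalars in a dictionary.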

If this is right

The imbalance appears consistently across the five tested open-source MLLMs.
Conditional suppression of driving heads improves the hallucination-accuracy trade-off compared with unconditional or random interventions.
The same intervention transfers zero-shot to the SCI-SemanticConflict test.
Ablation experiments confirm that driving and resisting heads exert opposing effects on generation.
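
The "distributed versus localized" imbalance can be summarized with three simple statistics per group: head count, aggregate importance, and the share of that aggregate held by the top-ranked heads. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the scores are made up, and the paper may use different summary measures.

```python
from typing import Dict, List


def group_stats(magnitudes: List[float], top_k: int) -> Dict[str, float]:
    """Count, total weight, and share of the total held by the top_k heads."""
    ranked = sorted(magnitudes, reverse=True)
    total = sum(ranked)
    top_share = sum(ranked[:top_k]) / total if total > 0 else 0.0
    return {"count": len(ranked), "aggregate": total, "top_share": top_share}


def imbalance_summary(scores: List[float], top_k: int = 1) -> Dict[str, Dict[str, float]]:
    """Split signed importance scores by sign and summarize each group."""
    driving = [s for s in scores if s > 0]      # positive score: driving head
    resisting = [-s for s in scores if s < 0]   # negative score: resisting head
    return {
        "driving": group_stats(driving, top_k),
        "resisting": group_stats(resisting, top_k),
    }


# Made-up signed importance scores, one per head.
toy_scores = [0.10, 0.08, 0.07, 0.06, 0.05, 0.04, -0.20, -0.05]
summary = imbalance_summary(toy_scores, top_k=1)
aggregate_ratio = summary["driving"]["aggregate"] / summary["resisting"]["aggregate"]
```

With these toy numbers the driving group has more heads and a larger total, while the resisting group puts most of its weight in a single head, which is the qualitative pattern the paper reports.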

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Training objectives could be adjusted to strengthen the aggregate weight of resisting heads relative to driving heads.
Similar head-level imbalances may exist for other hallucination types such as object or attribute errors.
The routing structure identified here could be monitored at inference time as an early-warning signal for modality conflicts.

Load-bearing premise

Path patching isolates the true causal contribution of each individual attention head to the final output without substantial interference from other heads or from the chosen patching values.

What would settle it

No measurable reduction in modality-conflict hallucinations when the same suppression is applied to randomly selected heads instead of the causally identified driving heads on the MMMC benchmark.
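
A random-head control of this kind can be sketched on a toy model. Everything here is an assumption for illustration: the scalar per-head contributions, the `tanh` readout, and suppression implemented as zeroing a head. On the real benchmark the comparison would be between measured hallucination rates, not a toy readout.

```python
import math
import random
from typing import List, Sequence


def advantage(contributions: Sequence[float], suppressed: Sequence[int] = ()) -> float:
    """Toy hallucination advantage with the suppressed heads zeroed out."""
    off = set(suppressed)
    return math.tanh(sum(c for i, c in enumerate(contributions) if i not in off))


def drop(contributions: Sequence[float], suppressed: Sequence[int]) -> float:
    """Reduction in hallucination advantage caused by suppressing a head set."""
    return advantage(contributions) - advantage(contributions, suppressed)


# Made-up per-head contributions; positive values push toward the erroneous premise.
contributions = [0.30, 0.25, 0.20, 0.15, 0.10, -0.40, -0.05, 0.0, 0.0, -0.02, 0.0, -0.03]
driving = [i for i, c in enumerate(contributions) if c > 0]

rng = random.Random(0)  # fixed seed so the control is repeatable
targeted_drop = drop(contributions, driving)
random_drops: List[float] = [
    drop(contributions, rng.sample(range(len(contributions)), len(driving)))
    for _ in range(200)
]
random_mean_drop = sum(random_drops) / len(random_drops)
```

The random sets have the same size as the targeted set, so any gap between `targeted_drop` and `random_mean_drop` reflects which heads were chosen rather than how many.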

Figures

Figures reproduced from arXiv: 2605.19250 by Jinrui Jiang, Xinyu Dai, Zhangtai Wu, Zhen Wu.

**Figure 1.** Head-level path patching. Top (Conflict run): the model is biased toward the erroneous textual premise. Middle (Clean run): the model identifies the visual evidence given an unbiased query. Bottom (Patching): substituting head (l, i)’s activation with its clean-run counterpart and measuring the change in hallucination advantage indicates whether the head drives or resists hallucination.

**Figure 2.** Hallucination-driving (H+, red) and hallucination-resisting (H−, blue) heads in Qwen2.5-VL-7B. Top: layer-wise importance and per-layer sums. Bottom: ranked heads and cumulative importance. Results for all models are in Appendix B.

**Figure 3.** Causal validation by head ablation. Hallucination rate (%) under Base, …

**Figure 4.** Trade-off between hallucination-rate reduction and non-conflict accuracy

**Figure 5.** Head importance distributions for the remaining four models.

Original abstract

Modality-conflict hallucination occurs when multimodal large language models (MLLMs) prioritize erroneous textual premises over contradictory visual evidence. To understand why visual evidence fails to prevail during generation, we take a mechanistic perspective and examine which internal components drive or resist this failure. We perform head-level causal analysis using path patching across five open-source MLLMs and identify two groups of attention heads with opposing causal roles: hallucination-driving heads and hallucination-resisting heads. We find a consistent asymmetry: driving effects are more broadly distributed and carry greater aggregate weight, whereas resisting effects concentrate in a small number of high-importance heads. Ablation experiments further confirm that these groups exert opposing effects during generation: distributed driving influence and localized resistance together form an imbalanced routing structure that biases generation toward the erroneous premise. Motivated by this finding, we propose MACI (Modality-conflict-Aware Causal Intervention), a conditional intervention that suppresses causally identified hallucination-driving heads only when conflict is detected. Across five MLLMs, MACI achieves the largest hallucination reduction among compared inference-time baselines on the MMMC benchmark with a favorable hallucination-accuracy trade-off, and transfers zero-shot to the SCI-SemanticConflict test.
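
The conditional part of the intervention can be illustrated with a short sketch. This is not the MACI implementation: the abstract does not specify the conflict detector or the suppression rule, so `conflict_score`, `threshold`, and `scale` below are hypothetical placeholders, and head outputs are reduced to scalars.

```python
from typing import Dict, Set, Tuple

Head = Tuple[int, int]  # (layer, head index)


def conditional_suppress(head_outputs: Dict[Head, float],
                         driving_heads: Set[Head],
                         conflict_score: float,
                         threshold: float = 0.5,
                         scale: float = 0.0) -> Dict[Head, float]:
    """Scale down driving heads only when the (hypothetical) detector fires."""
    if conflict_score < threshold:
        return dict(head_outputs)  # no conflict detected: leave the model untouched
    return {
        head: (scale * value if head in driving_heads else value)
        for head, value in head_outputs.items()
    }


# Made-up head outputs and a made-up set of identified driving heads.
outputs = {(0, 0): 0.4, (0, 1): 0.3, (1, 0): -0.9}
driving = {(0, 0), (0, 1)}

no_conflict = conditional_suppress(outputs, driving, conflict_score=0.1)
with_conflict = conditional_suppress(outputs, driving, conflict_score=0.9)
```

Gating on the detector is what separates this design from unconditional pruning: inputs without conflict pass through unchanged, which is one plausible reason for the favorable trade-off with non-conflict accuracy.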

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper finds a distributed-versus-localized imbalance in attention heads for modality-conflict hallucinations via path patching and turns it into a conditional suppression method that beats baselines on MMMC.

The key finding is that hallucination-driving heads are more broadly distributed and carry higher total causal weight than the resisting ones, which creates an imbalanced routing that favors bad text over visuals. Their MACI fix suppresses the drivers only when conflict is detected, and it gets the biggest hallucination drop among the compared inference-time methods across five MLLMs, plus zero-shot transfer to another test. They run the same analysis on five open models and use ablations to show that the two groups really do oppose each other during generation. That consistency and the practical payoff on a known benchmark are the parts that land cleanly. The zero-shot transfer is a small but useful extra.

The softer spot sits in the path-patching protocol itself. Because heads write to the residual stream together, the effect of patching one head can be compensated by the rest, or can depend on the exact clean-run activation chosen as the patch value. The paper needs to show explicit checks for this, such as joint patching of head sets or results across multiple patch sources; otherwise the reported broader distribution of driving heads could partly reflect how the intervention interacts with the model rather than a stable model property. If those controls are present and reported with effect sizes, the asymmetry holds up better.

This is aimed at people working on mechanistic interpretability for vision-language models or on lightweight fixes for multimodal reliability. A reader already using causal interventions in transformers would pick up the specific driving-resisting split and the conditional intervention design. It deserves a serious referee: the causal angle on a concrete failure mode is worth checking in detail even if the intervention protocol needs tightening.

Referee Report

1 major / 3 minor

Summary. The manuscript claims that modality-conflict hallucination in MLLMs arises from an imbalanced attention-head routing structure: across five open-source models, hallucination-driving heads identified via path patching are more broadly distributed and carry greater aggregate causal weight than hallucination-resisting heads. Ablations confirm opposing effects, and the authors propose MACI, a conditional intervention that suppresses driving heads only on detected conflict, yielding the largest hallucination reduction on MMMC among baselines while preserving accuracy and transferring zero-shot to SCI-SemanticConflict.

Significance. If the path-patching results prove robust, the work supplies concrete causal evidence for why visual evidence is overridden by textual premises and demonstrates a practical, targeted mitigation strategy. Consistency across five models and the ablation confirmation are strengths; the favorable hallucination-accuracy trade-off of MACI would be a useful contribution to inference-time hallucination control if the underlying head classifications are stable.

major comments (1)

[path-patching protocol and definition of driving versus resisting heads] The central claim of an imbalanced routing structure (broader distribution and higher aggregate causal weight for driving heads) depends on the assumption that single-head path patching isolates each head's causal contribution to final-token prediction. Because heads interact through the residual stream, the effect of patching one head's clean-run output into the conflict run can be compensated by the remaining heads or altered by the specific clean-run activation chosen as the patch value. Without reported controls such as joint patching of candidate head sets or comparisons across multiple patch sources, the reported asymmetry risks being an artifact of the intervention rather than a stable computational property.
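
One way to probe this concern is an additivity check: patch a set of heads jointly and compare the result with the sum of the single-head effects. The sketch below is a toy illustration, not a control taken from the paper. The readouts, head names, and activation values are all assumptions.

```python
import math
from typing import Callable, Dict, Iterable

Acts = Dict[str, float]  # head name -> scalar activation (toy stand-in for a tensor)


def patch_effect(readout: Callable[[Acts], float],
                 conflict: Acts, clean: Acts, heads: Iterable[str]) -> float:
    """Change in readout when the given heads take their clean-run values."""
    patched = dict(conflict)
    for head in heads:
        patched[head] = clean[head]
    return readout(conflict) - readout(patched)


def additivity_gap(readout: Callable[[Acts], float],
                   conflict: Acts, clean: Acts, heads: Iterable[str]) -> float:
    """Joint patching effect minus the sum of single-head patching effects."""
    head_list = list(heads)
    joint = patch_effect(readout, conflict, clean, head_list)
    singles = sum(patch_effect(readout, conflict, clean, [h]) for h in head_list)
    return joint - singles


def linear_readout(acts: Acts) -> float:
    return sum(acts.values())


def saturating_readout(acts: Acts) -> float:
    return math.tanh(sum(acts.values()))


# Made-up activations: heads "a" and "b" differ between runs, "c" does not.
conflict = {"a": 1.0, "b": 1.0, "c": 0.5}
clean = {"a": 0.0, "b": 0.0, "c": 0.5}

gap_linear = additivity_gap(linear_readout, conflict, clean, ["a", "b"])
gap_saturating = additivity_gap(saturating_readout, conflict, clean, ["a", "b"])
```

With a linear readout the gap is zero, so single-head scores add up. With a saturating readout the joint effect is much larger than the sum of the single-head effects, which is the kind of interaction that single-head scores alone would hide.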

minor comments (3)

The abstract and methods description provide no quantitative effect sizes, error bars, or statistical tests for the reported differences in distribution and aggregate weight between driving and resisting heads.
Details on the conflict detector used inside MACI (how conflict is detected at inference time and the precise suppression rule) are not specified, making it difficult to assess reproducibility or failure modes.
The manuscript would benefit from explicit comparison of the path-patching results against alternative causal methods (e.g., activation patching with multiple source runs or attribution patching) to strengthen the claim that the identified imbalance is method-independent.
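
As an illustration of the kind of uncertainty estimate the first minor comment asks for, the sketch below computes a percentile bootstrap interval for the mean gap between aggregate driving weight and aggregate resisting weight. The per-example gaps are made up, and the function name and defaults are assumptions, not something reported in the paper.

```python
import random
from statistics import mean
from typing import List, Tuple


def bootstrap_mean_ci(values: List[float], n_boot: int = 2000,
                      level: float = 0.95, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed for repeatability
    n = len(values)
    # Resample with replacement, then sort the resampled means.
    boot_means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = min(n_boot - 1, int((1 + level) / 2 * n_boot))
    return boot_means[lo_idx], boot_means[hi_idx]


# Made-up per-example gaps: aggregate driving weight minus aggregate resisting weight.
gaps = [0.12, 0.18, 0.09, 0.21, 0.15, 0.11, 0.17, 0.14]
observed_gap = mean(gaps)
ci_low, ci_high = bootstrap_mean_ci(gaps)
```

An interval that excludes zero would support the claim that the driving group carries more aggregate weight. With only eight toy values the interval says little, and a real analysis would resample over benchmark examples and report results per model.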

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our path-patching analysis. We address the methodological concern in detail below and describe the revisions we will undertake.

Referee: [path-patching protocol and definition of driving versus resisting heads] The central claim of an imbalanced routing structure (broader distribution and higher aggregate causal weight for driving heads) depends on the assumption that single-head path patching isolates each head's causal contribution to final-token prediction. Because heads interact through the residual stream, the effect of patching one head's clean-run output into the conflict run can be compensated by the remaining heads or altered by the specific clean-run activation chosen as the patch value. Without reported controls such as joint patching of candidate head sets or comparisons across multiple patch sources, the reported asymmetry risks being an artifact of the intervention rather than a stable computational property.

Authors: We appreciate the referee's observation regarding residual-stream interactions and the assumptions underlying single-head path patching. Our protocol follows the standard single-head intervention used in mechanistic interpretability to attribute effects to individual components. While compensation by other heads is possible in principle, we validated the opposing roles through group-level ablation experiments that intervene simultaneously on the full sets of driving and resisting heads; these collective interventions confirm the distributed driving influence and localized resistance, thereby providing evidence that the asymmetry is not solely an artifact of isolated patching. The same imbalance pattern is reproduced across five architecturally distinct MLLMs, further supporting that the finding reflects a stable property rather than a patching-source artifact.

In the revised manuscript we will expand the Methods section to explicitly discuss the residual-stream interaction concern, clarify why single-head patching was chosen for head identification, and explain how the group ablations serve as a control for collective effects. We will also add a dedicated limitations paragraph addressing the assumptions of the protocol.

revision: partial

Circularity Check

0 steps flagged

No significant circularity detected

The paper applies the established external technique of path-patching to quantify the causal effect of each attention head on hallucination rates under modality conflict. Heads are then partitioned into driving and resisting groups according to the sign of those measured effects; the reported broader distribution and higher aggregate causal weight of the driving group is an empirical summary statistic computed directly from the same set of intervention results across the five models. This constitutes an observation about the measured distribution rather than a quantity defined in terms of itself or a parameter fitted and then relabeled as a prediction. The subsequent ablation checks and the design of MACI follow from these measurements without any self-referential reduction or load-bearing self-citation that would make the central claim equivalent to its inputs by construction. The derivation therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the validity of path patching as a causal probe and on the assumption that the MMMC benchmark isolates modality conflict without confounding factors. No explicit free parameters or invented entities are described in the abstract; the conflict detector inside MACI is an unstated modeling choice.

axioms (1)

domain assumption: Path patching on attention heads isolates their causal contribution to the final generation without substantial side effects from other components.
Invoked when the authors label heads as driving or resisting on the basis of patching outcomes.

pith-pipeline@v0.9.0 · 5751 in / 1500 out tokens · 46001 ms · 2026-05-20T06:21:28.387167+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem.

We perform head-level causal analysis using path patching across five open-source MLLMs and identify two groups of attention heads with opposing causal roles: hallucination-driving heads and hallucination-resisting heads.
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

Importance Score. ... $\bar{I}_{l,i} = \ldots\, L(x_{\mathrm{cf}}) - L\bigl(x_{\mathrm{cf}}^{(l,i)\leftarrow \mathrm{cl}}\bigr)$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

