Sparse Neuron Ablation Triggers Catastrophic Collapse of the Language Core in Large Vision-Language Models

Andrea Cavallaro; Cen Lu; Yung-Chen Tang

Large Vision-Language Models (LVLMs) have shown impressive multimodal understanding capabilities, yet the structures that sustain their functionality remain poorly understood from a mechanistic interpretability standpoint. We propose Consistently Activated Neurons (CAN), a progressive neuron ablation method to identify critical neurons whose removal triggers catastrophic collapse, and use it to investigate structural vulnerabilities in representative 7B LVLMs. Experiments reveal that catastrophic collapse can be triggered by ablating as few as four neurons in LLaVA-1.5-7b-hf and a few thousand in InstructBLIP-vicuna-7b, both representing a small fraction of model parameters. Notably, critical neurons are predominantly localized in the language model, particularly in its down-projection layer, rather than in the vision components. We also observe a consistent two-stage collapse pattern: initial expressive degradation followed by sudden, complete collapse. These findings reveal that LVLM functionality depends on a sparse subset of neurons concentrated in the language backbone, offering mechanistic insights into how their functionality is structured and where these models are most vulnerable.
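
The abstract does not spell out how CAN selects or orders neurons, so the following is only a toy sketch of the general idea of progressive ablation, not the authors' procedure. It assumes (hypothetically) that "consistently activated" means a hidden neuron ranks among the `top_k` largest activations for a high fraction of inputs, and it measures the effect of ablation as the relative change in the output of a small random down-projection. All names (`consistency_scores`, `progressive_ablation`, `top_k`, `budgets`) and the random two-layer network are illustrative assumptions.

```python
import random

def consistency_scores(hidden, top_k):
    """Fraction of inputs for which each neuron ranks in the top_k by |activation|.
    (Assumed scoring rule; the abstract does not define one.)"""
    n_neurons = len(hidden[0])
    counts = [0] * n_neurons
    for row in hidden:
        ranked = sorted(range(n_neurons), key=lambda j: -abs(row[j]))
        for j in ranked[:top_k]:
            counts[j] += 1
    return [c / len(hidden) for c in counts]

def project(hidden, w_down, ablated=frozenset()):
    """Down-projection with the listed hidden neurons zeroed out."""
    n_out = len(w_down[0])
    return [
        [sum(h * w_down[j][k] for j, h in enumerate(row) if j not in ablated)
         for k in range(n_out)]
        for row in hidden
    ]

def relative_change(a, b):
    """||a - b|| / ||b||, taken over all entries of the two output matrices."""
    num = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    den = sum(y ** 2 for rb in b for y in rb)
    return (num / den) ** 0.5

def progressive_ablation(hidden, w_down, order, budgets):
    """Ablate the first n neurons of `order` for each n in `budgets`."""
    baseline = project(hidden, w_down)
    return [
        (n, relative_change(project(hidden, w_down, frozenset(order[:n])), baseline))
        for n in budgets
    ]

# Toy stand-in for one MLP block: random inputs, up-projection, ReLU, down-projection.
rng = random.Random(0)
n_inputs, d_in, d_hidden, d_out, top_k = 32, 8, 64, 8, 4
x = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(n_inputs)]
w_up = [[rng.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_in)]
w_down = [[rng.gauss(0, 1) for _ in range(d_out)] for _ in range(d_hidden)]
hidden = [
    [max(0.0, sum(xi * w_up[i][j] for i, xi in enumerate(row))) for j in range(d_hidden)]
    for row in x
]

scores = consistency_scores(hidden, top_k)
order = sorted(range(d_hidden), key=lambda j: -scores[j])  # most consistent first
curve = progressive_ablation(hidden, w_down, order, budgets=[0, 1, 2, 4, 8, d_hidden])
for n, change in curve:
    print(f"ablated={n:3d}  relative output change={change:.3f}")
```

In a real LVLM the same loop would presumably run over hooked activations of the language model's MLP layers and score collapse with a task metric rather than an output norm; a random toy network like this one is not expected to reproduce the two-stage collapse the abstract reports.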
