Finding Culture-Sensitive Neurons in Vision-Language Models

Ivan Titov; Rochelle Choenni; Rohit Saxena; Xiutian Zhao

arXiv: 2510.24942 · v2 · submitted 2025-10-28 · 💻 cs.LG · cs.AI · cs.CL

Xiutian Zhao, Rochelle Choenni, Rohit Saxena, Ivan Titov

Pith reviewed 2026-05-18 02:18 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CL

keywords culture-sensitive neurons · vision-language models · neuron ablation · cultural visual question answering · model interpretability · ConAct selector · CVQA benchmark

The pith

Vision-language models contain neurons whose selective ablation impairs performance only on matching cultural questions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether vision-language models encode cultural information in identifiable neurons rather than spreading it uniformly. Using the CVQA benchmark, the authors flag neurons that respond more to inputs from one of 25 cultural groups, then deactivate those neurons in three different models and measure the resulting drops in visual question answering accuracy. The experiments show that targeted deactivation hurts accuracy on questions tied to the corresponding culture far more than on unrelated ones, and that a new margin-based selector called ConAct identifies these units more effectively than earlier probability or entropy approaches. Layer analysis further indicates the sensitive neurons cluster in particular decoder layers in a model-specific pattern.

Core claim

Neurons whose activations are selectively higher for inputs from a given cultural context can be located with activation-based selectors; when these units are ablated, accuracy falls sharply on questions about that same culture while remaining largely intact for other cultures. The new Contrastive Activation Margin selector outperforms prior methods at surfacing such units, and the neurons are concentrated in specific layers rather than distributed evenly across the network.
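
The selection step described above can be illustrated with a minimal sketch. It assumes each neuron comes with a per-culture activation rate (the fraction of that culture's inputs on which the neuron fires) and scores a neuron by the gap between its highest and second-highest rate. The function names, the input format, the toy rates, and this exact scoring rule are assumptions made for illustration; the paper's precise ConAct definition is not given in the text here.

```python
def margin_scores(rates):
    """rates: {neuron_id: {culture: activation rate in [0, 1]}} (hypothetical format).
    Returns {neuron_id: (preferred_culture, margin)} where margin is the gap
    between the highest and second-highest activation rate."""
    scores = {}
    for neuron, per_culture in rates.items():
        ranked = sorted(per_culture.items(), key=lambda kv: kv[1], reverse=True)
        (top_culture, top_rate), (_, second_rate) = ranked[0], ranked[1]
        scores[neuron] = (top_culture, top_rate - second_rate)
    return scores


def select_for_culture(rates, culture, k):
    """Return up to k neurons that prefer `culture`, largest margin first."""
    scores = margin_scores(rates)
    candidates = [(m, n) for n, (c, m) in scores.items() if c == culture]
    candidates.sort(reverse=True)
    return [n for _, n in candidates[:k]]


# Toy activation rates, invented for this example (not data from the paper).
toy_rates = {
    "n0": {"A": 0.90, "B": 0.10, "C": 0.05},  # strongly prefers culture A
    "n1": {"A": 0.50, "B": 0.48, "C": 0.45},  # fires for nearly everything
    "n2": {"A": 0.20, "B": 0.70, "C": 0.10},  # prefers culture B
    "n3": {"A": 0.60, "B": 0.30, "C": 0.30},  # mildly prefers culture A
}
selected_a = select_for_culture(toy_rates, "A", k=2)
print(selected_a)
```

In this toy setup the unit that fires for every culture ("n1") gets a near-zero margin and is ranked last among the candidates for culture A, which is the intuition behind a margin-style selector.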

What carries the argument

Culture-sensitive neurons identified by activation selectivity and tested through targeted ablation on the CVQA benchmark.

If this is right

Deactivating a small set of culture-flagged neurons reduces accuracy on questions from the corresponding culture while leaving other cultures largely unaffected.
The Contrastive Activation Margin method locates these neurons more reliably than probability- or entropy-based alternatives.
The sensitive neurons concentrate in particular decoder layers, with the exact layers varying by model architecture.
The pattern appears consistently across three vision-language models and twenty-five cultural groups.
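
What "deactivating" a neuron means can be sketched on a toy network. The example below assumes zero-ablation (forcing the selected hidden units to output zero); the text here does not say which form of deactivation the authors use, and the two-layer network, its weights, and the inputs are hypothetical.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def forward(x, w_in, w_out, ablate=frozenset()):
    """Toy two-layer network. Hidden units whose index is in `ablate`
    are forced to zero (zero-ablation, an assumption of this sketch)."""
    hidden = []
    for j, weights in enumerate(w_in):
        act = relu(sum(w * xi for w, xi in zip(weights, x)))
        hidden.append(0.0 if j in ablate else act)
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


# Hypothetical weights: hidden unit 0 reads feature 0, hidden unit 1 reads feature 1.
w_in = [[1.0, 0.0], [0.0, 1.0]]
w_out = [[1.0, 0.0], [0.0, 1.0]]

x_a = [2.0, 0.0]  # input that relies on hidden unit 0
x_b = [0.0, 3.0]  # input that relies on hidden unit 1

base_a = forward(x_a, w_in, w_out)
ablated_a = forward(x_a, w_in, w_out, ablate={0})
ablated_b = forward(x_b, w_in, w_out, ablate={0})
print(base_a, ablated_a, ablated_b)
```

Ablating unit 0 wipes out the response to `x_a` and leaves `x_b` untouched, which mirrors the selective effect claimed in the list above. In a real model, the same masking would typically be applied to the chosen units of a decoder MLP layer, for example through a forward hook.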

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the neurons prove causal, targeted editing of only those units could adjust cultural behavior without full retraining.
The same selectivity approach might reveal specialized neurons for other attributes such as language variant or visual domain.
Layer-specific clustering points to possible training interventions that strengthen cultural alignment by focusing updates on those layers.

Load-bearing premise

The flagged neurons are causally responsible for cultural processing instead of merely correlating with other non-cultural features present in the test questions.

What would settle it

If ablating the selected neurons produced roughly equal performance drops across all cultural groups rather than disproportionately large drops only on the matching group, the claim of culture-specific sensitivity would not hold.
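
This test can be written down as a comparison of accuracy drops. The sketch below assumes accuracies are available per evaluated culture, both before ablation and after ablating the neurons flagged for each target culture; it then contrasts the drop on the matching culture with the mean drop on all others. The function name, the data layout, and every number are invented for illustration and are not results from the paper.

```python
def deactivation_gaps(base_acc, ablated_acc):
    """base_acc: {culture: accuracy without ablation}.
    ablated_acc: {ablated_culture: {evaluated_culture: accuracy after ablation}}.
    Returns, per ablated culture, the self drop, mean cross drop, and their gap."""
    gaps = {}
    for target, accs in ablated_acc.items():
        self_drop = base_acc[target] - accs[target]
        cross = [base_acc[c] - a for c, a in accs.items() if c != target]
        cross_drop = sum(cross) / len(cross)
        gaps[target] = {
            "self": self_drop,
            "cross": cross_drop,
            "gap": self_drop - cross_drop,
        }
    return gaps


# Made-up accuracies for three cultures (illustrative only).
base = {"A": 0.80, "B": 0.70, "C": 0.60}
after = {
    "A": {"A": 0.50, "B": 0.68, "C": 0.59},  # neurons flagged for A ablated
    "B": {"A": 0.79, "B": 0.45, "C": 0.58},  # neurons flagged for B ablated
}
gaps = deactivation_gaps(base, after)
for culture, g in gaps.items():
    print(culture, {k: round(v, 3) for k, v in g.items()})
```

A gap near zero for every culture would correspond to the "roughly equal drops" outcome described above; a large positive gap corresponds to the culture-specific pattern the paper reports.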

Original abstract

Despite their impressive performance, vision-language models (VLMs) still struggle on culturally situated inputs. To understand how VLMs process culturally grounded information, we study the presence of culture-sensitive neurons, i.e., neurons whose activations show preferential sensitivity to inputs associated with particular cultural contexts. We examine whether such neurons are important for culturally diverse visual question answering and where they are located. Using the CVQA benchmark, we identify neurons of culture selectivity and perform diagnostic tests by deactivating the neurons flagged by various identification methods. Experiments on three VLMs across 25 cultural groups demonstrate the existence of neurons whose ablation disproportionately harms performance on questions about the corresponding cultures, while having limited effects on others. Moreover, we introduce a new margin-based selector Contrastive Activation Margin (ConAct) and show that it outperforms probability- and entropy-based methods in identifying neurons associated with cultural selectivity. Finally, our layer-wise analyses reveal that such neurons are not uniformly distributed: they cluster in specific decoder layers in a model-dependent way.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper claims culture-sensitive neurons exist in VLMs and introduces ConAct to find them, but the abstract alone leaves the causal link unproven.

The core point here is that the authors report neurons in three VLMs whose removal hurts performance more on CVQA questions tied to one of 25 cultural groups than on others, and they offer a new margin-based selector called ConAct that beats simpler probability or entropy baselines. They also note these neurons cluster in particular decoder layers depending on the model. That is the actual new piece: a targeted way to surface units that appear selective for cultural content in vision-language settings, plus the layer-wise distribution result.

The ablation tests give some empirical grounding that these units matter for the task, which is a reasonable first step toward mechanistic interpretability on this topic. Credit to them for moving beyond generic activation analysis and testing multiple identification methods head-to-head on a real benchmark.

The soft spot is exactly what the stress-test flags. Without the full methods it is impossible to tell whether the flagged neurons are responding to genuinely cultural features or to correlated but non-cultural signals such as image style, lighting, question phrasing, or object co-occurrence that happen to differ across the 25 groups. The abstract mentions diagnostic tests but gives no detail on statistical thresholds, baseline ablations of random or non-selective neurons, or explicit controls for those confounds. That gap makes the causal claim harder to evaluate right now.

The work is aimed at people doing interpretability on multimodal models and at teams trying to diagnose or mitigate cultural performance gaps. A reader already working on neuron-level analysis or fairness benchmarks would get the most out of the ConAct comparison and the layer findings. It is coherent enough and engages the right literature to deserve a serious referee, even though the current evidence is preliminary.

I would send it to review with a request for the full experimental controls and any additional checks on whether the selectivity holds after accounting for obvious visual or linguistic covariates.

Referee Report

2 major / 1 minor

Summary. The paper claims that vision-language models contain culture-sensitive neurons identifiable via activation patterns on the CVQA benchmark. Using ablation on neurons selected by multiple methods (including the introduced Contrastive Activation Margin or ConAct selector) across three VLMs and 25 cultural groups, it reports that deactivating these neurons disproportionately reduces performance on culture-specific questions while having limited effects on others. It further claims that such neurons cluster in specific decoder layers in a model-dependent manner and that ConAct outperforms probability- and entropy-based selectors.

Significance. If the ablation results can be shown to isolate cultural processing rather than correlated non-cultural features, the work would provide useful empirical evidence for localized cultural representations in VLMs. The layer-wise distribution findings and the comparative evaluation of neuron selectors could inform future interpretability studies and efforts to improve cultural robustness in multimodal models.

major comments (2)

[Abstract] The central claim that ablation 'disproportionately harms performance on questions about the corresponding cultures' is load-bearing for the existence of culture-sensitive neurons, yet the abstract supplies no quantitative definition of 'disproportionate,' no statistical thresholds, and no description of controls for confounds such as image style, question phrasing, or non-diagnostic visual content that co-vary with the 25 cultural groups. Without these, it is not possible to distinguish causal cultural sensitivity from correlation.
[Abstract] The superiority of the new ConAct selector is asserted but the abstract gives only the label 'margin-based' with no equation, no precise definition of the contrastive margin, and no details on the experimental comparison (e.g., number of neurons selected, exact metrics, or statistical significance) against probability- and entropy-based methods. This information is required to evaluate the methodological contribution.

minor comments (1)

[Abstract] The abstract would be clearer if it named the three VLMs and briefly indicated the total number of neurons or layers examined.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for these detailed comments on the abstract. We agree that greater precision in the abstract will strengthen the presentation of our claims and methods. We will revise the abstract in the next version to address both points.

Referee: [Abstract] The central claim that ablation 'disproportionately harms performance on questions about the corresponding cultures' is load-bearing for the existence of culture-sensitive neurons, yet the abstract supplies no quantitative definition of 'disproportionate,' no statistical thresholds, and no description of controls for confounds such as image style, question phrasing, or non-diagnostic visual content that co-vary with the 25 cultural groups. Without these, it is not possible to distinguish causal cultural sensitivity from correlation.

Authors: We accept this critique of the abstract's brevity. In the full paper the term 'disproportionately' is defined via direct comparison of ablation-induced accuracy drops on culture-matched CVQA items versus a matched control set of non-cultural and cross-cultural items, with significance evaluated by statistical tests across runs and groups. The experimental design includes controls for image style (via augmentation), question phrasing (standardized templates), and visual content (by holding images fixed while varying cultural context). We will revise the abstract to include a concise statement of this quantitative comparison and the controls employed.
revision: yes

Referee: [Abstract] The superiority of the new ConAct selector is asserted but the abstract gives only the label 'margin-based' with no equation, no precise definition of the contrastive margin, and no details on the experimental comparison (e.g., number of neurons selected, exact metrics, or statistical significance) against probability- and entropy-based methods. This information is required to evaluate the methodological contribution.

Authors: We agree the abstract is too terse on the new selector. The full manuscript defines the contrastive margin explicitly and compares ConAct against the baselines by selecting the same number of top-ranked neurons and measuring their ablation impact on CVQA accuracy for the 25 groups. We will expand the abstract to supply a brief definition of the margin and to state that ConAct is evaluated by the magnitude and statistical reliability of the resulting performance degradation relative to the probability- and entropy-based alternatives.
revision: yes
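
A matched-budget comparison of selectors, as described in this response, might look like the sketch below. All three scoring rules are assumptions for illustration: the probability score is the largest share of a neuron's normalized activation rates, the entropy score is the negative entropy of those shares, and the margin score is the raw gap between the top two rates. None of them is taken from the paper's definitions, and the toy neurons are invented.

```python
import math


def normalize(rates):
    """Turn per-culture activation rates into shares that sum to 1."""
    total = sum(rates.values())
    return {c: r / total for c, r in rates.items()}


def prob_score(rates):
    # Assumed probability-style score: largest normalized share.
    return max(normalize(rates).values())


def entropy_score(rates):
    # Assumed entropy-style score: negative entropy, so higher = more selective.
    return sum(p * math.log(p) for p in normalize(rates).values() if p > 0)


def margin_score(rates):
    # Assumed margin-style score: gap between the top two raw rates.
    top, second = sorted(rates.values(), reverse=True)[:2]
    return top - second


def top_k(neurons, score_fn, k):
    """Select the k highest-scoring neurons, so every selector gets the same budget."""
    return sorted(neurons, key=lambda n: score_fn(neurons[n]), reverse=True)[:k]


# Invented activation rates for three cultures.
neurons = {
    "weak": {"A": 0.02, "B": 0.001, "C": 0.001},  # rarely fires, but mostly for A
    "strong": {"A": 0.90, "B": 0.30, "C": 0.30},  # fires often, clearly more for A
    "flat": {"A": 0.50, "B": 0.50, "C": 0.50},    # no preference
}
picks = {
    "prob": top_k(neurons, prob_score, 1),
    "entropy": top_k(neurons, entropy_score, 1),
    "margin": top_k(neurons, margin_score, 1),
}
print(picks)
```

With the same budget of one neuron, the normalized scores pick the rarely firing unit while the raw margin picks the frequently firing one. The sketch only shows that selectors can disagree at a fixed budget; which selection matters more for accuracy is what the ablation comparison is meant to measure.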

Circularity Check

0 steps flagged

No circularity: empirical ablation study on external benchmark

The abstract describes an empirical investigation: neuron identification via ConAct and other selectors, followed by ablation tests on the CVQA benchmark across three VLMs and 25 cultural groups. No equations, derivations, fitted parameters, or self-citations are present that reduce any reported effect to quantities defined by the same cultural data. The central claim rests on observable performance differences after intervention, which are independent of the identification procedure by construction and falsifiable against the external benchmark.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review is limited to the abstract; no explicit free parameters, invented entities, or ad-hoc axioms are stated. The work implicitly relies on standard interpretability assumptions about the causal role of individual neurons.

axioms (1)

domain assumption: Changes in model output after targeted neuron deactivation reflect the causal contribution of those neurons to the observed behavior.
This premise supports the diagnostic ablation tests described in the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: We adapt activation-based neuron analysis... using counters K(c), S(c), normalized P(c), M(c); introduce Contrastive Activation Selection (CAS) measuring gap between top and second culture

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: Ablating top-r% neurons... self-deactivation vs cross-deactivation gaps

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.