Don't Look at the Numbers: Visual Anchoring Bias and Layer-wise Representation in VLMs

M. Shalankin

arxiv: 2605.11218 · v1 · submitted 2026-05-11 · 💻 cs.AI



Pith reviewed 2026-05-13 01:59 UTC · model grok-4.3

classification 💻 cs.AI

keywords visual anchoring bias · vision-language models · quality judgment · layer-wise representation · numeric anchors · representation dynamics · model fusion


The pith

Numeric anchors embedded in images systematically bias Vision-Language Model quality judgments across multiple architectures.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether numbers placed directly on images distort how Vision-Language Models rate image quality. It finds consistent bias across six models from five families, with anchors shifting ratings more than severe genuine quality degradation does. Internal analysis shows a dissociation: the layers where the anchors become readily decodable are not the layers that best support quality judgments. The work also maps whether each model fuses the anchor information at early or later stages of processing.

Core claim

Embedded numeric anchors on images systematically bias Vision-Language Model quality judgments across six VLMs from five architectural families (ANOVA eta^2 = 0.18-0.77, all p < 0.001). Anchor effects are 2.5x larger than severe image quality degradation, confirming bias is not reducible to visual changes. Layer-wise probing reveals consistent dissociation: layers where anchor classification saturates (L12-L34) are suboptimal for quality prediction, with optimal layers deeper (R^2 = 0.69-0.91). Fusion analysis identifies architecture-dependent integration -- instant fusion at L1-L2 in two models versus partial or no fusion in three others.
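The η² values above are the proportion of score variance explained by the anchor condition. A minimal sketch, assuming a one-way design with the anchor condition as the only factor (the page does not give the paper's exact ANOVA layout); the condition labels and scores are invented for illustration, not data from the paper.

```python
def eta_squared(scores_by_condition):
    """One-way ANOVA effect size: eta^2 = SS_between / SS_total.

    scores_by_condition maps a condition label (e.g. "clean", "anchor=2")
    to the quality scores a model produced under that condition.
    """
    all_scores = [s for group in scores_by_condition.values() for s in group]
    grand_mean = sum(all_scores) / len(all_scores)
    # Total variation of every score around the grand mean.
    ss_total = sum((s - grand_mean) ** 2 for s in all_scores)
    # Variation explained by differences between condition means.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in scores_by_condition.values()
    )
    return ss_between / ss_total if ss_total > 0 else 0.0


# Hypothetical scores for four images under three conditions.
toy = {
    "clean": [5, 5, 6, 6],
    "anchor=2": [3, 3, 4, 4],
    "anchor=8": [7, 7, 8, 8],
}
print(round(eta_squared(toy), 3))  # 32 / 35 -> 0.914
```

With more than one factor (for example anchor × degradation), partial η² would be the usual choice and would generally differ from this value.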

What carries the argument

Layer-wise probing that identifies dissociation between layers saturating on anchor classification and deeper layers optimal for quality prediction, plus architecture-dependent early versus late fusion of visual-numeric information.
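The dissociation compares two per-layer curves: anchor-classification probe accuracy, and the R² of a probe predicting the quality score. Assuming both curves have already been computed from trained probes (the probes themselves are not shown), the bookkeeping can be sketched as below. The 0.95 saturation threshold, the 0-based layer indices (the paper labels layers L1, L2, ...), and the toy numbers are assumptions, not values from the paper.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination of a quality probe's predictions."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def breakthrough_layer(accuracy_by_layer, threshold=0.95):
    """First layer whose anchor-classification accuracy reaches threshold."""
    for layer, acc in enumerate(accuracy_by_layer):
        if acc >= threshold:
            return layer
    return None  # the anchor never saturates under this threshold


def optimal_layer(r2_by_layer):
    """Layer whose quality-score probe attains the highest R^2."""
    return max(range(len(r2_by_layer)), key=lambda i: r2_by_layer[i])


def dissociation(accuracy_by_layer, r2_by_layer, threshold=0.95):
    """Return (saturation layer, best quality layer, whether best lies deeper)."""
    sat = breakthrough_layer(accuracy_by_layer, threshold)
    best = optimal_layer(r2_by_layer)
    return sat, best, sat is not None and best > sat


# Hypothetical per-layer probe results.
acc = [0.10, 0.35, 0.70, 0.96, 0.99, 0.99, 0.99]
r2 = [0.20, 0.30, 0.45, 0.55, 0.62, 0.81, 0.74]
print(dissociation(acc, r2))  # (3, 5, True)
```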

Load-bearing premise

The addition of numeric anchors isolates a pure anchoring effect rather than interacting with the models' prior training patterns or prompt phrasing.

What would settle it

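Running the same quality judgment task after replacing numeric anchors with matched non-numeric visual patterns of equal visual complexity, and finding that the patterns shift judgments as much as the digits do (leaving no bias attributable to the numbers themselves), would falsify the claim.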
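The control compares numeric overlays with matched non-numeric ones. One hedged way to score it is a directional "pull" toward the anchor value: generic visual clutter may move scores, but only an anchoring effect should move them toward the specific number shown. A minimal sketch, assuming each non-numeric pattern is paired with the anchor value it stands in for; the `anchor_pull` metric and all scores are hypothetical illustrations, not the paper's procedure.

```python
def anchor_pull(clean, perturbed, anchors):
    """Mean fraction of the distance to the anchor that each score moves.

    0 means no pull, 1 means the score lands on the anchor value. Pairs
    where the clean score already equals the anchor are skipped, because
    the fraction is undefined there.
    """
    pulls = [(p - c) / (a - c) for c, p, a in zip(clean, perturbed, anchors) if a != c]
    return sum(pulls) / len(pulls) if pulls else 0.0


clean = [5, 5, 6, 6]
anchors = [2, 8, 2, 8]           # anchor shown, or the value a matched pattern stands in for
numeric = [3.5, 6.5, 4.0, 7.0]   # hypothetical scores with digits overlaid
pattern = [5, 5, 6, 6]           # hypothetical scores with a matched non-numeric overlay
print(anchor_pull(clean, numeric, anchors))  # 0.5
print(anchor_pull(clean, pattern, anchors))  # 0.0
```

If the digits carried no signal of their own, the numeric pull would drop to the level of the pattern control.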

Figures

Figures reproduced from arXiv: 2605.11218 by M. Shalankin.

**Figure 1.** Conceptual framework of visual anchoring bias in VLMs. Clean and anchored inputs (images with overlaid numeric ratings) are processed through the transformer’s layered architecture. Score probing examines quality representations across layers; the fusion layer identifies where text and vision signals integrate; the anchor breakthrough layer marks where text classification saturates. The resulting ∆ (…

**Figure 2.** Example of the text overlay attack. The original image (left) is presented alongside versions with embedded anchors (center: anchor=2, right: anchor=8). The numeric cue is rendered as a semi-transparent text overlay on the image.
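Semi-transparent text is usually produced by alpha blending the glyph colour into the underlying pixels. A minimal grayscale sketch, assuming the digit has already been rasterized into a binary mask (rasterizing glyphs from a font is not shown); the `overlay` helper, the `ink` value, and the `alpha=0.5` opacity are hypothetical, since the caption does not state them.

```python
def overlay(image, mask, ink=255, alpha=0.5):
    """Alpha-blend a glyph colour into a grayscale image where mask is set.

    image: list of rows of grayscale pixel values (0-255)
    mask:  same shape; 1 where the rasterized anchor digit covers a pixel
    Masked pixels become alpha * ink + (1 - alpha) * pixel; others are unchanged.
    """
    return [
        [round(alpha * ink + (1 - alpha) * px) if m else px
         for px, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


image = [[100, 100], [100, 100]]
mask = [[1, 0], [0, 0]]          # hypothetical one-pixel "glyph"
print(overlay(image, mask))      # [[178, 100], [100, 100]]
```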

**Figure 3.** Examples of text overlay attack with dif…

**Figure 4.** Anchor susceptibility across six VLMs. The chart shows η² (proportion of score variance explained by anchors), mean |∆| (average score shift), and anchor-score correlations for each model, ordered from most to least susceptible. The models form a clear susceptibility ranking:

| Model | η² | Mean \|∆\| | r |
|---|---|---|---|
| Qwen3-VL-8B | 0.77 | 2.42 | 0.79 |
| Qwen3-VL-4B | 0.55 | 2.24 | 0.53 |
| MiniCPM-V-4 | 0.44 | 1.60 | 0.63 |
| Gemma-3-4b | 0.63 | 1.36 | 0.77 |
| Q… | | | |
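The mean |∆| and r columns can be computed from paired scores. A minimal sketch, assuming ∆ is the anchored score minus the clean score for the same image and r is the Pearson correlation between the anchor value and the anchored score; the caption does not spell out either definition, and the numbers below are invented.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def susceptibility(anchors, clean_scores, anchored_scores):
    """Return (mean |delta|, anchor-score r) for one model."""
    deltas = [a - c for c, a in zip(clean_scores, anchored_scores)]
    mean_abs_delta = sum(abs(d) for d in deltas) / len(deltas)
    return mean_abs_delta, pearson_r(anchors, anchored_scores)


# Hypothetical scores: two images, each shown with anchor 2 and anchor 8.
anchors = [2, 8, 2, 8]
clean = [5, 5, 6, 6]
anchored = [4, 6, 5, 7]
mad, r = susceptibility(anchors, clean, anchored)
print(mad, round(r, 3))  # 1.0 0.894
```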

**Figure 5.** Layer-wise anchor classification accuracy

**Figure 6.** Dual-axis view of layer-wise classification

**Figure 7.** Cosine similarity between text-injected and clean image representations across transformer layers for five VLMs. Four distinct patterns reveal fundamentally different cross-modal integration strategies: instant fusion (Gemma-3, Gemma-4), gradual growth (MiniCPM), near-fusion with divergence (Qwen3.5), and DROP at breakthrough (Qwen3-VL-4B). Cross-Phase Timing. Synthesizing all four probing phases rev…
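The similarity curves in Figure 7 can be sketched as a per-layer cosine similarity. This assumes one pooled hidden-state vector per layer for each condition, and defines the fusion layer as the first layer whose similarity reaches a threshold. Both the pooling and the 0.99 threshold are assumptions, not the paper's stated criterion. Under this reading, instant fusion would appear as a fusion layer among the first one or two layers.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def fusion_profile(clean_by_layer, injected_by_layer, threshold=0.99):
    """Per-layer similarity curve and the first layer reaching the threshold."""
    sims = [cosine(c, t) for c, t in zip(clean_by_layer, injected_by_layer)]
    fusion = next((i for i, s in enumerate(sims) if s >= threshold), None)
    return sims, fusion


# Hypothetical 2-D pooled states for three layers (0-based indices).
clean = [[1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
injected = [[0.0, 1.0], [1.0, 0.9], [1.0, 1.0]]
sims, fusion = fusion_profile(clean, injected)
print([round(s, 3) for s in sims], fusion)  # [0.0, 0.999, 1.0] 1
```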

**Figure 8.** Cross-phase timing summary across five probed VLMs. Each row shows the progression of score probe breakthrough, fusion layer, anchor classification breakthrough, and optimal quality prediction layer. Architectures exhibit distinct temporal ordering of these milestones.
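A per-model timing summary reduces to sorting that model's milestone layers. A minimal sketch with hypothetical milestone names and layer numbers (not read off the figure). Milestones a model never reaches, such as a fusion layer in a model that does not fuse, are listed last.

```python
def milestone_order(milestones):
    """Milestone names sorted by layer; unreached ones (None) are listed last."""
    reached = sorted(
        (layer, name) for name, layer in milestones.items() if layer is not None
    )
    missing = [name for name, layer in milestones.items() if layer is None]
    return [name for _, name in reached] + missing


# Hypothetical layer indices for one model.
model_a = {
    "score_probe_breakthrough": 5,
    "fusion_layer": 1,
    "anchor_breakthrough": 12,
    "optimal_quality_layer": 20,
}
print(milestone_order(model_a))
# ['fusion_layer', 'score_probe_breakthrough', 'anchor_breakthrough', 'optimal_quality_layer']
```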


Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows numeric anchors on images shift VLM quality ratings more than real degradation does, with some layer-wise patterns, but the design leaves room for prompt and training-data confounds.


The core finding is that embedding numbers in images moves quality judgments in six VLMs from five families, with anchor effects 2.5 times larger than those of a severe-degradation control. The layer probing adds something: anchor detection saturates earlier than the layers that best track the quality score, and fusion timing differs by architecture. That dissociation and the cross-model comparison are what go beyond prior bias checks in vision-language work.

The behavioral results are reported with clear statistics and hold across the tested models, which is useful for anyone running quality pipelines. The comparison to image degradation helps rule out crude pixel-level explanations.

The main soft spot is isolation. Numbers are both visual and textual, so the effect could come from the model treating the digits as extra prompt context, or from training co-occurrences, rather than from pure visual anchoring. The abstract gives no detail on controls for prompt wording or for non-numeric markers, and the stress-test concern on that point stands until the full methods are available. If those controls are missing or weak, the causal story linking behavior to the reported layer dynamics loses force.

This is relevant for robustness and evaluation work. A reader who cares about VLM reliability or representational analysis will get concrete numbers and architecture comparisons to think about. It is worth sending to referees even if the visual-bias claim needs tightening.

Referee Report

3 major / 1 minor

Summary. The manuscript claims that embedding numeric anchors on images systematically biases quality judgments in six VLMs spanning five architectural families, with ANOVA effect sizes of eta^2 = 0.18-0.77 (all p < 0.001). Anchor effects are reported as 2.5x larger than those from severe image quality degradation, and layer-wise probing shows dissociation (anchor classification saturates at L12-L34 while quality prediction is optimal deeper, R^2 = 0.69-0.91), with architecture-dependent fusion patterns (instant at L1-L2 in two models, partial or absent in others).

Significance. If the results hold after addressing potential confounds, this would be a significant contribution to understanding biases in VLMs for quality assessment tasks. The cross-architecture replication and the magnitude relative to image degradation provide useful evidence, while the layer-wise and fusion analyses offer mechanistic insights into how VLMs integrate numeric and visual information.

major comments (3)

[Abstract] The reported ANOVA results and 2.5x comparison to image degradation control for low-level visual changes but do not address whether numeric anchors are processed as additional textual context in the prompt or via training-data co-occurrences, undermining the claim that the bias is specifically visual anchoring.
[Layer-wise probing results] The dissociation between anchor saturation (L12-L34) and deeper quality-prediction layers, along with the fusion analysis, presupposes that the behavioral effect arises from visual feature integration; if the effect is instead driven by prompt or data-dependent mechanisms, these representational findings lose their explanatory force for the central bias claim.
[Experimental design, implied in methods and results] No details are provided on exact stimuli, prompt formatting, data exclusion criteria, or controls for textual interpretation of embedded numbers, making it impossible to confirm that the manipulation isolates visual anchoring bias.

minor comments (1)

[Abstract] The abstract would benefit from specifying the number of images, trials per condition, and exact prompt wording used for quality judgments to improve reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important clarifications needed regarding the specificity of the visual anchoring claim and methodological transparency. We address each major point below and have revised the manuscript to strengthen the evidence for visual processing while adding missing details.


Referee: [Abstract] The reported ANOVA results and 2.5x comparison to image degradation control for low-level visual changes but do not address whether numeric anchors are processed as additional textual context in the prompt or via training-data co-occurrences, undermining the claim that the bias is specifically visual anchoring.

Authors: The anchors are embedded directly in the image pixels with no numeric content in the accompanying text prompts, which remain identical across conditions. The 2.5x comparison is to low-level visual degradation (e.g., blur, noise) rather than textual manipulation. While we did not include an explicit textual-number-in-prompt control, the layer-wise results show anchor information emerging in visual encoder layers before language fusion, consistent with visual rather than purely prompt-driven processing. We will add explicit discussion of training-data co-occurrence alternatives and note this as a boundary condition in the revised abstract and discussion. revision: partial
Referee: [Layer-wise probing results] The dissociation between anchor saturation (L12-L34) and deeper quality-prediction layers, along with the fusion analysis, presupposes that the behavioral effect arises from visual feature integration; if the effect is instead driven by prompt or data-dependent mechanisms, these representational findings lose their explanatory force for the central bias claim.

Authors: The probing is performed on visual encoder activations from image inputs containing the anchors; prompts contain no numbers. The observed dissociation (early anchor saturation vs. later optimal quality prediction) and architecture-specific fusion patterns therefore reflect how visually embedded numeric information alters the visual representations that later support the quality judgment. We agree the explanatory link should be stated more explicitly and will expand the discussion to contrast visual integration against prompt-only or data-co-occurrence accounts, including why the layer-wise pattern would be unlikely under a purely textual mechanism. revision: yes
Referee: [Experimental design] No details are provided on exact stimuli, prompt formatting, data exclusion criteria, or controls for textual interpretation of embedded numbers, making it impossible to confirm that the manipulation isolates visual anchoring bias.

Authors: We acknowledge this omission in the submitted version. The revised manuscript will include a dedicated Methods subsection with: (i) example stimuli showing anchor placement and formatting, (ii) verbatim prompt templates, (iii) precise data exclusion rules (e.g., response validity filters), and (iv) an additional control condition in which numeric values are supplied only via text in the prompt (no image embedding) to directly test textual vs. visual routes. These additions will allow readers to evaluate the isolation of the visual anchoring effect. revision: yes
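The proposed text-only control amounts to a third condition alongside clean and image-anchored trials. A minimal sketch of how such trials might be enumerated, assuming a fixed prompt. The prompt wording, the `Trial` fields, and the "Reference score" phrasing are hypothetical, since the page does not give the paper's templates. The key property is that the image route and the text route never carry the number at the same time.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical prompt; it deliberately contains no digits.
BASE_PROMPT = "Rate the quality of this image."


@dataclass(frozen=True)
class Trial:
    image_id: str
    condition: str          # "clean", "image_anchor" or "text_anchor"
    anchor: Optional[int]   # anchor value; None for the clean trial
    overlay_anchor: bool    # True if the anchor is rendered into the pixels
    prompt: str


def build_trials(image_id: str, anchors: List[int]) -> List[Trial]:
    trials = [Trial(image_id, "clean", None, False, BASE_PROMPT)]
    for a in anchors:
        # Image route: the number lives only in the pixels; prompt unchanged.
        trials.append(Trial(image_id, "image_anchor", a, True, BASE_PROMPT))
        # Text route: the number lives only in the prompt; image stays clean.
        trials.append(
            Trial(image_id, "text_anchor", a, False,
                  f"{BASE_PROMPT} Reference score: {a}.")
        )
    return trials


for t in build_trials("img0", [2, 8]):
    print(t.condition, t.anchor, t.overlay_anchor, repr(t.prompt))
```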

Circularity Check

0 steps flagged

No circularity: purely empirical study with statistical results and no derivation chain

full rationale

The manuscript reports experimental findings on numeric anchor effects in VLMs via ANOVA (eta^2 values), R^2 from layer probing, and fusion analysis across models. No equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the provided text or abstract. All claims rest on direct behavioral measurements and representational probes rather than any reduction of outputs to inputs by construction. The work is self-contained as an empirical investigation without theoretical derivations that could introduce circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Empirical study relying on standard statistical tests; no new free parameters, axioms beyond basic ANOVA assumptions, or invented entities.

axioms (1)

standard math: Standard assumptions of ANOVA and linear regression hold for the reported eta-squared and R-squared values.
Invoked implicitly when reporting statistical significance and effect sizes.
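For a one-way design, the reported η² and the ANOVA F statistic are two views of the same sums of squares. The listed assumptions matter for the p-values, while η² and R² themselves are descriptive ratios. The standard identities (general, not specific to this paper) are:

$$
\eta^2 = \frac{SS_{\text{between}}}{SS_{\text{between}} + SS_{\text{within}}},
\qquad
F = \frac{SS_{\text{between}} / df_{\text{between}}}{SS_{\text{within}} / df_{\text{within}}}
\;\Longrightarrow\;
\eta^2 = \frac{F \, df_{\text{between}}}{F \, df_{\text{between}} + df_{\text{within}}}
$$

$$
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
$$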

pith-pipeline@v0.9.0 · 5424 in / 1191 out tokens · 46515 ms · 2026-05-13T01:59:44.847909+00:00 · methodology


Reference graph

Works this paper leans on

10 extracted references · 10 canonical work pages · 1 internal anchor

[1] Bleeker M.J.R., Hendriksen M., Yates A., de Rijke M., Demonstrating and Reducing Shortcuts in Vision-Language Representation Learning, Transactions on Machine Learning Research, 2024.

[2] Cheng H., Xiao E., Wang Y., Zhang L., Zhang Q., Cao J., Xu K., Sun M., Hao X., Gu J., Xu R., Exploring Typographic Visual Prompts Injection Threats in Cross-Modality Generation Models, arXiv:2503.11519, 2025.

[3] Echterhoff J., Liu Y., Alessa A., McAuley J., He Z., Cognitive Bias in Decision-Making with LLMs, arXiv:2403.00811, 2024.

[4] Hufe L., Venhoff C., Purelku E., Dreyer M., Lapuschkin S., Samek W., Dyslexify: A Mechanistic Defense Against Typographic Attacks in CLIP, arXiv:2508.20570, 2025.

[5] Li Q., Ye Z., Feng X., Zhong W., Ma W., Feng X., Causal Tracing of Object Representations in Large Vision Language Models: Mechanistic Interpretability and Hallucination Mitigation, arXiv:2511.05923, 2025.

[6] Lou J., Sun Y., Anchoring Bias in Large Language Models: An Experimental Study, arXiv:2412.06593, 2024.

[7] Shi C., Yu Y., Yang S., Vision Function Layer in Multimodal LLMs, arXiv:2509.24791, 2025.

[8] Steinberg J., Gal O., Where Vision Becomes Text: Locating the OCR Routing Bottleneck in Vision-Language Models, arXiv:2602.22918, 2026.

[9] Suri G., Slater L.R., Ziaee A., Nguyen M., Do Large Language Models Show Decision Heuristics Similar to Humans? A Case Study Using GPT-3.5, arXiv:2305.04400, 2023.

[10] Wang Z., Han Z., Chen S., Xue F., Ding Z., Xiao X., Tresp V., Torr P., Gu J., Stop Reasoning! When Multimodal LLM with Chain-of-Thought Reasoning Meets Adversarial Image, arXiv:2402.14899, 2024.
