
arxiv: 2605.11218 · v1 · submitted 2026-05-11 · 💻 cs.AI

Don't Look at the Numbers: Visual Anchoring Bias and Layer-wise Representation in VLMs

Pith reviewed 2026-05-13 01:59 UTC · model grok-4.3

classification 💻 cs.AI
keywords visual anchoring bias · vision-language models · quality judgment · layer-wise representation · numeric anchors · representation dynamics · model fusion

The pith

Numeric anchors embedded in images systematically bias Vision-Language Model quality judgments across multiple architectures.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether numbers placed directly on images distort how Vision-Language Models rate image quality. It finds consistent bias across six models from five families, with the distortion from anchors proving stronger than the distortion from severe actual quality loss. Internal analysis shows the bias arises because layers that readily detect the anchors are not the same layers that best support quality judgments. The work maps how different models fuse the anchor information at early or later stages of processing.

Core claim

Embedded numeric anchors on images systematically bias Vision-Language Model quality judgments across six VLMs from five architectural families (ANOVA eta^2 = 0.18-0.77, all p < 0.001). Anchor effects are 2.5x larger than severe image quality degradation, confirming bias is not reducible to visual changes. Layer-wise probing reveals consistent dissociation: layers where anchor classification saturates (L12-L34) are suboptimal for quality prediction, with optimal layers deeper (R^2 = 0.69-0.91). Fusion analysis identifies architecture-dependent integration -- instant fusion at L1-L2 in two models versus partial or no fusion in three others.
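The eta^2 values quoted above are the share of score variance attributable to the anchor condition. A minimal sketch of that calculation for a one-way layout follows. The data layout (a dict from anchor value to scores), the function name `eta_squared`, and the toy numbers are assumptions made for illustration; they are not taken from the paper.

```python
from statistics import mean


def eta_squared(groups):
    """Return SS_between / SS_total for a one-way layout.

    `groups` maps each anchor value to the list of quality scores produced
    under that anchor (a hypothetical layout, not the paper's data format).
    """
    all_scores = [s for scores in groups.values() for s in scores]
    grand_mean = mean(all_scores)
    # Total variation of every score around the grand mean.
    ss_total = sum((s - grand_mean) ** 2 for s in all_scores)
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(scores) * (mean(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    return ss_between / ss_total


# Toy scores invented for illustration: ratings drift toward the anchor value.
toy = {
    2: [3.0, 4.0, 3.5, 4.5],
    5: [5.0, 5.5, 4.5, 6.0],
    8: [7.0, 6.5, 7.5, 8.0],
}
print(round(eta_squared(toy), 3))
```

A value near 1 means the anchor condition accounts for almost all score variation; a value near 0 means the scores barely depend on it.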

What carries the argument

Layer-wise probing that identifies dissociation between layers saturating on anchor classification and deeper layers optimal for quality prediction, plus architecture-dependent early versus late fusion of visual-numeric information.
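One way to read the dissociation is as a comparison of two per-layer curves: probe accuracy for the anchor value and R^2 for quality prediction. The sketch below assumes those curves have already been computed and only locates the two milestones. The saturation rule (first layer within a tolerance of the maximum accuracy), the function names, and the toy curves are all assumptions for illustration; the paper's exact criteria are not given in this summary.

```python
def saturation_layer(accuracy_by_layer, tolerance=0.01):
    """Index of the first layer whose accuracy is within `tolerance` of the maximum.

    This saturation rule is an assumed definition, not the paper's.
    """
    best = max(accuracy_by_layer)
    for layer, acc in enumerate(accuracy_by_layer):
        if acc >= best - tolerance:
            return layer


def optimal_layer(r2_by_layer):
    """Index of the layer with the highest quality-prediction R^2."""
    return max(range(len(r2_by_layer)), key=lambda i: r2_by_layer[i])


# Toy per-layer curves invented for illustration (layers indexed from 0).
anchor_acc = [0.10, 0.35, 0.80, 0.97, 1.00, 1.00, 1.00, 1.00]
quality_r2 = [0.05, 0.10, 0.30, 0.45, 0.60, 0.72, 0.80, 0.78]

sat = saturation_layer(anchor_acc)
opt = optimal_layer(quality_r2)
print(f"anchor probe saturates at layer {sat}; quality R^2 peaks at layer {opt}")
```

In this toy case the anchor probe saturates before the quality probe peaks, which is the ordering the summary describes.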

Load-bearing premise

The addition of numeric anchors isolates a pure anchoring effect rather than interacting with the models' prior training patterns or prompt phrasing.

What would settle it

Running the same quality judgment task after replacing numeric anchors with matched non-numeric visual patterns of equal visual complexity and finding no remaining bias would falsify the claim.

Figures

Figures reproduced from arXiv: 2605.11218 by M. Shalankin.

Figure 1. Conceptual framework of visual anchoring bias in VLMs. Clean and anchored inputs (images with overlaid numeric ratings) are processed through the transformer's layered architecture. Score probing examines quality representations across layers; the fusion layer identifies where text and vision signals integrate; the anchor breakthrough layer marks where text classification saturates. The resulting ∆ (…

Figure 2. Example of the text overlay attack. The original image (left) is presented alongside versions with embedded anchors (center: anchor=2, right: anchor=8). The numeric cue is rendered as semi-transparent text overlay on the image.

Figure 3. Examples of text overlay attack with dif…

Figure 4. Anchor susceptibility across six VLMs. The chart shows η² (proportion of score variance explained by anchors), mean |∆| (average score shift), and anchor-score correlations for each model, ordered from most to least susceptible. The models form a clear susceptibility ranking:

    Model          η²     Mean |∆|   r
    Qwen3-VL-8B    0.77   2.42       0.79
    Qwen3-VL-4B    0.55   2.24       0.53
    MiniCPM-V-4    0.44   1.60       0.63
    Gemma-3-4b     0.63   1.36       0.77
    Q…

Figure 5. Layer-wise anchor classification accuracy

Figure 6. Dual-axis view of layer-wise classification

Figure 7. Cosine similarity between text-injected and clean image representations across transformer layers for five VLMs. The four distinct patterns (instant fusion in Gemma-3 and Gemma-4, gradual growth in MiniCPM, near-fusion with divergence in Qwen3.5, and DROP at breakthrough in Qwen3-VL-4B) reveal fundamentally different cross-modal integration strategies. Cross-Phase Timing. Synthesizing all four probing phases rev…

Figure 8. Cross-phase timing summary across five probed VLMs. Each row shows the progression of score probe breakthrough, fusion layer, anchor classification breakthrough, and optimal quality prediction layer. Architectures exhibit distinct temporal ordering of these milestones.
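The fusion measure in Figure 7 is a per-layer cosine similarity between clean and text-injected representations. The sketch below shows that comparison on plain Python lists. It assumes one pooled vector per layer for each input; how the paper pools or selects token positions is not stated here, and the helper names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_profile(clean_layers, anchored_layers):
    """Per-layer cosine similarity between clean and anchored representations.

    Each argument is a list with one vector per layer (assumed to be a pooled
    hidden state; the pooling choice is an assumption of this sketch).
    """
    return [cosine(c, a) for c, a in zip(clean_layers, anchored_layers)]


# Toy 2-D "hidden states" for three layers, invented for illustration.
clean = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
anchored = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

profile = similarity_profile(clean, anchored)
print([round(s, 3) for s in profile])
```

A profile that is already high at the first layers would correspond to the instant-fusion pattern, while a profile that stays low or falls at some depth would correspond to partial or absent fusion.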
Original abstract

Embedded numeric anchors on images systematically bias Vision-Language Model quality judgments across six VLMs from five architectural families (ANOVA eta^2 = 0.18-0.77, all p < 0.001). Anchor effects are 2.5x larger than severe image quality degradation, confirming bias is not reducible to visual changes. Layer-wise probing reveals consistent dissociation: layers where anchor classification saturates (L12-L34) are suboptimal for quality prediction, with optimal layers deeper (R^2 = 0.69-0.91). Fusion analysis identifies architecture-dependent integration -- instant fusion at L1-L2 in two models versus partial or no fusion in three others. These results establish a causal account of visual anchoring bias, linking behavioral susceptibility to representation dynamics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript claims that embedding numeric anchors on images systematically biases quality judgments in six VLMs spanning five architectural families, with ANOVA effect sizes of eta^2 = 0.18-0.77 (all p < 0.001). Anchor effects are reported as 2.5x larger than those from severe image quality degradation, and layer-wise probing shows dissociation (anchor classification saturates at L12-L34 while quality prediction is optimal deeper, R^2 = 0.69-0.91), with architecture-dependent fusion patterns (instant at L1-L2 in two models, partial or absent in others).

Significance. If the results hold after addressing potential confounds, this would be a significant contribution to understanding biases in VLMs for quality assessment tasks. The cross-architecture replication and the magnitude relative to image degradation provide useful evidence, while the layer-wise and fusion analyses offer mechanistic insights into how VLMs integrate numeric and visual information.

major comments (3)
  1. [Abstract] The reported ANOVA results and the 2.5x comparison with image degradation control for low-level visual changes, but they do not address whether the numeric anchors are processed as additional textual context in the prompt or via training-data co-occurrences. This undermines the claim that the bias is specifically visual anchoring.
  2. [Layer-wise probing results] The dissociation between anchor saturation (L12-L34) and deeper quality-prediction layers, along with the fusion analysis, presupposes that the behavioral effect arises from visual feature integration. If the effect is instead driven by prompt-dependent or data-dependent mechanisms, these representational findings lose their explanatory force for the central bias claim.
  3. [Experimental design] (implied in methods and results) No details are provided on exact stimuli, prompt formatting, data exclusion criteria, or controls for textual interpretation of embedded numbers, making it impossible to confirm that the manipulation isolates visual anchoring bias.
minor comments (1)
  1. [Abstract] The abstract would benefit from specifying the number of images, trials per condition, and exact prompt wording used for quality judgments to improve reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important clarifications needed regarding the specificity of the visual anchoring claim and methodological transparency. We address each major point below and have revised the manuscript to strengthen the evidence for visual processing while adding missing details.

  1. Referee: [Abstract] The reported ANOVA results and the 2.5x comparison with image degradation control for low-level visual changes, but they do not address whether the numeric anchors are processed as additional textual context in the prompt or via training-data co-occurrences, undermining the claim that the bias is specifically visual anchoring.

    Authors: The anchors are embedded directly in the image pixels with no numeric content in the accompanying text prompts, which remain identical across conditions. The 2.5x comparison is to low-level visual degradation (e.g., blur, noise) rather than textual manipulation. While we did not include an explicit textual-number-in-prompt control, the layer-wise results show anchor information emerging in visual encoder layers before language fusion, consistent with visual rather than purely prompt-driven processing. We will add explicit discussion of training-data co-occurrence alternatives and note this as a boundary condition in the revised abstract and discussion. revision: partial

  2. Referee: [Layer-wise probing results] The dissociation between anchor saturation (L12-L34) and deeper quality-prediction layers, along with the fusion analysis, presupposes that the behavioral effect arises from visual feature integration; if the effect is instead driven by prompt or data-dependent mechanisms, these representational findings lose their explanatory force for the central bias claim.

    Authors: The probing is performed on visual encoder activations from image inputs containing the anchors; prompts contain no numbers. The observed dissociation (early anchor saturation vs. later optimal quality prediction) and architecture-specific fusion patterns therefore reflect how visually embedded numeric information alters the visual representations that later support the quality judgment. We agree the explanatory link should be stated more explicitly and will expand the discussion to contrast visual integration against prompt-only or data-co-occurrence accounts, including why the layer-wise pattern would be unlikely under a purely textual mechanism. revision: yes

  3. Referee: [Experimental design] No details are provided on exact stimuli, prompt formatting, data exclusion criteria, or controls for textual interpretation of embedded numbers, making it impossible to confirm that the manipulation isolates visual anchoring bias.

    Authors: We acknowledge this omission in the submitted version. The revised manuscript will include a dedicated Methods subsection with: (i) example stimuli showing anchor placement and formatting, (ii) verbatim prompt templates, (iii) precise data exclusion rules (e.g., response validity filters), and (iv) an additional control condition in which numeric values are supplied only via text in the prompt (no image embedding) to directly test textual vs. visual routes. These additions will allow readers to evaluate the isolation of the visual anchoring effect. revision: yes
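The control promised in item (iv) amounts to crossing the anchor value with the route by which it reaches the model. A minimal sketch of that condition grid follows. The anchor values 2 and 8 come from the Figure 2 caption; the full anchor set, the condition names, and the dict layout are assumptions for illustration, not the authors' protocol.

```python
from itertools import product

# Anchor values shown in Figure 2; the paper's full set is not listed here.
ANCHORS = [2, 8]
# "image": number rendered into the pixels; "prompt": number given only as text.
ROUTES = ["image", "prompt"]


def build_conditions(anchors=ANCHORS):
    """Enumerate a clean baseline plus one condition per (route, anchor) pair."""
    conditions = [{"name": "clean", "image_anchor": None, "prompt_anchor": None}]
    for route, value in product(ROUTES, anchors):
        conditions.append({
            "name": f"{route}_anchor_{value}",
            "image_anchor": value if route == "image" else None,
            "prompt_anchor": value if route == "prompt" else None,
        })
    return conditions


for condition in build_conditions():
    print(condition)
```

Comparing score shifts between the `image_*` and `prompt_*` conditions at the same anchor value is what would separate a visual route from a textual one.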

Circularity Check

0 steps flagged

No circularity: purely empirical study with statistical results and no derivation chain


The manuscript reports experimental findings on numeric anchor effects in VLMs via ANOVA (eta^2 values), R^2 from layer probing, and fusion analysis across models. No equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the provided text or abstract. All claims rest on direct behavioral measurements and representational probes rather than any reduction of outputs to inputs by construction. The work is self-contained as an empirical investigation without theoretical derivations that could introduce circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Empirical study relying on standard statistical tests; no new free parameters, axioms beyond basic ANOVA assumptions, or invented entities.

axioms (1)
  • [standard math] Standard assumptions of ANOVA and linear regression hold for the reported eta-squared and R-squared values.
    Invoked implicitly when reporting statistical significance and effect sizes.


