Don't Look at the Numbers: Visual Anchoring Bias and Layer-wise Representation in VLMs
Pith reviewed 2026-05-13 01:59 UTC · model grok-4.3
The pith
Numeric anchors embedded in images systematically bias Vision-Language Model quality judgments across multiple architectures.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Embedded numeric anchors on images systematically bias Vision-Language Model quality judgments across six VLMs from five architectural families (ANOVA eta^2 = 0.18-0.77, all p < 0.001). Anchor effects are 2.5x larger than severe image quality degradation, confirming bias is not reducible to visual changes. Layer-wise probing reveals consistent dissociation: layers where anchor classification saturates (L12-L34) are suboptimal for quality prediction, with optimal layers deeper (R^2 = 0.69-0.91). Fusion analysis identifies architecture-dependent integration -- instant fusion at L1-L2 in two models versus partial or no fusion in three others.
What carries the argument
Layer-wise probing that identifies dissociation between layers saturating on anchor classification and deeper layers optimal for quality prediction, plus architecture-dependent early versus late fusion of visual-numeric information.
Load-bearing premise
The addition of numeric anchors isolates a pure anchoring effect rather than interacting with the models' prior training patterns or prompt phrasing.
What would settle it
Running the same quality judgment task after replacing numeric anchors with matched non-numeric visual patterns of equal visual complexity and finding no remaining bias would falsify the claim.
Figures
read the original abstract
Embedded numeric anchors on images systematically bias Vision-Language Model quality judgments across six VLMs from five architectural families (ANOVA eta^2 = 0.18-0.77, all p < 0.001). Anchor effects are 2.5x larger than severe image quality degradation, confirming bias is not reducible to visual changes. Layer-wise probing reveals consistent dissociation: layers where anchor classification saturates (L12-L34) are suboptimal for quality prediction, with optimal layers deeper (R^2 = 0.69-0.91). Fusion analysis identifies architecture-dependent integration -- instant fusion at L1-L2 in two models versus partial or no fusion in three others. These results establish a causal account of visual anchoring bias, linking behavioral susceptibility to representation dynamics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that embedding numeric anchors on images systematically biases quality judgments in six VLMs spanning five architectural families, with ANOVA effect sizes of eta^2 = 0.18-0.77 (all p < 0.001). Anchor effects are reported as 2.5x larger than those from severe image quality degradation, and layer-wise probing shows dissociation (anchor classification saturates at L12-L34 while quality prediction is optimal deeper, R^2 = 0.69-0.91), with architecture-dependent fusion patterns (instant at L1-L2 in two models, partial or absent in others).
Significance. If the results hold after addressing potential confounds, this would be a significant contribution to understanding biases in VLMs for quality assessment tasks. The cross-architecture replication and the magnitude relative to image degradation provide useful evidence, while the layer-wise and fusion analyses offer mechanistic insights into how VLMs integrate numeric and visual information.
major comments (3)
- [Abstract] Abstract: The reported ANOVA results and 2.5x comparison to image degradation control for low-level visual changes but do not address whether numeric anchors are processed as additional textual context in the prompt or via training-data co-occurrences, undermining the claim that the bias is specifically visual anchoring.
- [Layer-wise probing results] Layer-wise probing results: The dissociation between anchor saturation (L12-L34) and deeper quality-prediction layers, along with the fusion analysis, presupposes that the behavioral effect arises from visual feature integration; if the effect is instead driven by prompt or data-dependent mechanisms, these representational findings lose their explanatory force for the central bias claim.
- [Experimental design] Experimental design (implied in methods and results): No details are provided on exact stimuli, prompt formatting, data exclusion criteria, or controls for textual interpretation of embedded numbers, making it impossible to confirm that the manipulation isolates visual anchoring bias.
minor comments (1)
- [Abstract] The abstract would benefit from specifying the number of images, trials per condition, and exact prompt wording used for quality judgments to improve reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important clarifications needed regarding the specificity of the visual anchoring claim and methodological transparency. We address each major point below and have revised the manuscript to strengthen the evidence for visual processing while adding missing details.
read point-by-point responses
-
Referee: [Abstract] The reported ANOVA results and 2.5x comparison to image degradation control for low-level visual changes but do not address whether numeric anchors are processed as additional textual context in the prompt or via training-data co-occurrences, undermining the claim that the bias is specifically visual anchoring.
Authors: The anchors are embedded directly in the image pixels with no numeric content in the accompanying text prompts, which remain identical across conditions. The 2.5x comparison is to low-level visual degradation (e.g., blur, noise) rather than textual manipulation. While we did not include an explicit textual-number-in-prompt control, the layer-wise results show anchor information emerging in visual encoder layers before language fusion, consistent with visual rather than purely prompt-driven processing. We will add explicit discussion of training-data co-occurrence alternatives and note this as a boundary condition in the revised abstract and discussion. revision: partial
-
Referee: [Layer-wise probing results] The dissociation between anchor saturation (L12-L34) and deeper quality-prediction layers, along with the fusion analysis, presupposes that the behavioral effect arises from visual feature integration; if the effect is instead driven by prompt or data-dependent mechanisms, these representational findings lose their explanatory force for the central bias claim.
Authors: The probing is performed on visual encoder activations from image inputs containing the anchors; prompts contain no numbers. The observed dissociation (early anchor saturation vs. later optimal quality prediction) and architecture-specific fusion patterns therefore reflect how visually embedded numeric information alters the visual representations that later support the quality judgment. We agree the explanatory link should be stated more explicitly and will expand the discussion to contrast visual integration against prompt-only or data-co-occurrence accounts, including why the layer-wise pattern would be unlikely under a purely textual mechanism. revision: yes
-
Referee: [Experimental design] No details are provided on exact stimuli, prompt formatting, data exclusion criteria, or controls for textual interpretation of embedded numbers, making it impossible to confirm that the manipulation isolates visual anchoring bias.
Authors: We acknowledge this omission in the submitted version. The revised manuscript will include a dedicated Methods subsection with: (i) example stimuli showing anchor placement and formatting, (ii) verbatim prompt templates, (iii) precise data exclusion rules (e.g., response validity filters), and (iv) an additional control condition in which numeric values are supplied only via text in the prompt (no image embedding) to directly test textual vs. visual routes. These additions will allow readers to evaluate the isolation of the visual anchoring effect. revision: yes
Circularity Check
No circularity: purely empirical study with statistical results and no derivation chain
full rationale
The manuscript reports experimental findings on numeric anchor effects in VLMs via ANOVA (eta^2 values), R^2 from layer probing, and fusion analysis across models. No equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the provided text or abstract. All claims rest on direct behavioral measurements and representational probes rather than any reduction of outputs to inputs by construction. The work is self-contained as an empirical investigation without theoretical derivations that could introduce circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math Standard assumptions of ANOVA and linear regression hold for the reported eta-squared and R-squared values
Reference graph
Works this paper leans on
-
[1]
Bleeker M.J.R., Hendriksen M., Yates A., de Rijke M.,Demonstrating and Reducing Shortcuts in Vision-Language Representation Learning, Transactions on Machine Learning Research, 2024. 28 M.SHALANKIN
work page 2024
- [2]
- [3]
- [4]
- [5]
- [6]
- [7]
-
[8]
Steinberg J., Gal O.,Where Vision Becomes Text: Locating the OCR Routing Bottleneck in Vision-Language Models, arXiv:2602.22918, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
- [9]
- [10]
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.