Cognitive Alignment At No Cost: Inducing Human Attention Biases For Interpretable Vision Transformers
Pith reviewed 2026-05-10 02:02 UTC · model grok-4.3
The pith
Fine-tuning vision transformers on human saliency maps induces human-like attention biases at no cost to accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Fine-tuning the self-attention weights of ViT-B/16 on human saliency fixation maps, compared against a shuffled control, produces significantly higher alignment across five saliency metrics and induces three hallmark human biases: reversal of the baseline large-object preference toward small objects, amplified animacy preference, and lowered attention entropy. Bayesian parity analysis supplies decisive-to-very-strong evidence that these changes leave classification accuracy intact on ImageNet, ImageNet-C, and ObjectNet. The same procedure applied to ResNet-50 instead reduces both alignment and accuracy, indicating that the modular self-attention mechanism uniquely permits dissociation of spatial priority from representational logic.
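The "lowered attention entropy" bias describes how spread out the model's attention is. A minimal sketch of how such an entropy could be computed, assuming it is the Shannon entropy of the CLS-token attention over the 196 patch tokens of ViT-B/16 at 224 x 224 input; the summary does not say which layer, head, or query token is used, so that choice is an assumption. Lower entropy means attention concentrated on fewer patches.

```python
import numpy as np

def attention_entropy(attn, eps=1e-12):
    """Shannon entropy (in nats) of each attention distribution.

    attn: array of shape (..., n_patches). Each row is assumed to be an
    attention distribution over patches (e.g. CLS-to-patch attention after
    softmax, with the CLS column dropped); rows are renormalized defensively.
    """
    p = np.asarray(attn, dtype=float)
    p = p / p.sum(axis=-1, keepdims=True)
    return -(p * np.log(p + eps)).sum(axis=-1)

n = 196                        # 14 x 14 patch grid for ViT-B/16 at 224 x 224
uniform = np.full(n, 1.0 / n)  # maximally diffuse attention
peaked = np.zeros(n)
peaked[0] = 1.0                # all attention on a single patch
print(attention_entropy(uniform))  # close to log(196)
print(attention_entropy(peaked))   # close to 0
```

The entropy is bounded above by log(196) for a 196-patch grid, so values near that bound correspond to the near-uniform, "extreme" entropy the abstract says fine-tuning diminished.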
What carries the argument
Fine-tuning of ViT self-attention weights on human saliency fixation maps, isolated via comparison to a shuffled control.
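The summary does not state which loss ties the attention weights to the fixation maps. A minimal sketch of one plausible objective, assuming the human map is average-pooled onto the 14 x 14 patch grid and the loss is the KL divergence from that target to the model's CLS-to-patch attention; both the pooling and the KL objective are assumptions, not the authors' stated method, and the function names are hypothetical. Only the forward computation is shown; actual fine-tuning would use an autodiff framework with gradients restricted to the self-attention parameters.

```python
import numpy as np

PATCH = 16  # ViT-B/16 patch size in pixels

def saliency_to_patch_target(sal_map, patch=PATCH):
    """Average-pool an (H, W) fixation map onto the patch grid and normalize it to sum to 1."""
    h, w = sal_map.shape
    gh, gw = h // patch, w // patch
    pooled = sal_map[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch).mean(axis=(1, 3))
    pooled = pooled.ravel()
    return pooled / pooled.sum()

def kl_alignment_loss(target, attn, eps=1e-12):
    """KL(target || attn): penalizes attention that misses human-fixated patches."""
    t = target / target.sum()
    a = attn / attn.sum()
    return float(np.sum(t * (np.log(t + eps) - np.log(a + eps))))

rng = np.random.default_rng(0)
sal = rng.random((224, 224))            # placeholder fixation map
target = saliency_to_patch_target(sal)  # shape (196,)
attn = rng.dirichlet(np.ones(196))      # placeholder CLS-to-patch attention
print(kl_alignment_loss(target, attn))
```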
Load-bearing premise
The human saliency fixation maps capture semantically relevant signals that the shuffled control can separate from nonspecific supervision effects.
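The summary does not describe how the control maps are shuffled. A minimal sketch of one common choice, assumed here: reassign fixation maps across images with a derangement, so that every image is paired with some other image's map. The function name `shuffled_control` is hypothetical.

```python
import numpy as np

def shuffled_control(maps, seed=0):
    """Reassign maps across images so that no image keeps its own map.

    Uses a random cyclic derangement: images are visited in a random order and
    each one receives the map of the next image in that order.
    Returns (reassigned maps, index array where assignment[i] is the source map of image i).
    """
    n = len(maps)
    if n < 2:
        raise ValueError("need at least two maps to shuffle across images")
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    assignment = np.empty(n, dtype=int)
    assignment[order] = np.roll(order, -1)  # image order[i] gets map order[i + 1]
    return [maps[j] for j in assignment], assignment

maps = [np.full((4, 4), k, dtype=float) for k in range(5)]  # placeholder maps
shuffled, assignment = shuffled_control(maps, seed=1)
print(assignment)
```

Because this scheme keeps the exact set of maps and only breaks the image-to-map pairing, aggregate low-level statistics such as center bias are preserved by construction; a within-map pixel shuffle would not have that property.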
What would settle it
Retraining with the shuffled maps yields the same alignment gains and bias changes as the real maps, or classification accuracy drops on ImageNet, ImageNet-C, or ObjectNet after fine-tuning with the real maps.
Original abstract
For state-of-the-art image understanding, Vision Transformers (ViTs) have become the standard architecture, but their processing diverges substantially from human attentional characteristics. We investigate whether this cognitive gap can be narrowed by fine-tuning the self-attention weights of Google's ViT-B/16 on human saliency fixation maps. To isolate the effects of semantically relevant signals from generic human supervision, the tuned model is compared against a shuffled control. Fine-tuning significantly improved alignment across five saliency metrics and induced three hallmark human-like biases: tuning reversed the baseline's anti-human large-object bias toward small objects, amplified the animacy preference, and diminished extreme attention entropy. Bayesian parity analysis provides decisive-to-very-strong evidence that this cognitive alignment comes at no cost to the model's original classification performance on in-distribution (ImageNet), corrupted (ImageNet-C), and out-of-distribution (ObjectNet) benchmarks. An equivalent procedure applied to a ResNet-50 Convolutional Neural Network (CNN) instead degraded both alignment and accuracy, suggesting that the ViT's modular self-attention mechanism is uniquely suited for dissociating spatial priority from representational logic. These findings demonstrate that biologically grounded priors can be instilled as a free emergent property of human-aligned attention to improve transformer interpretability.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that fine-tuning the self-attention layers of ViT-B/16 on human saliency fixation maps improves alignment with human attention patterns across five saliency metrics, induces three human-like biases (reversal of anti-human large-object bias toward small objects, amplified animacy preference, and reduced attention entropy), and that this alignment incurs no cost to classification accuracy. This is supported by comparison to a shuffled control (to isolate semantic signals) and Bayesian parity analysis showing decisive-to-very-strong evidence of equivalent performance on ImageNet, ImageNet-C, and ObjectNet. The same procedure applied to ResNet-50 degrades both alignment and accuracy, suggesting ViT self-attention is uniquely suited for this dissociation.
Significance. If the empirical results and controls hold, the work is significant for demonstrating that biologically grounded attention priors can be instilled in transformers as an emergent property that enhances interpretability without performance trade-offs on in-distribution, corrupted, and out-of-distribution benchmarks. The architectural contrast with CNNs and the use of a shuffled control provide a concrete way to separate generic supervision from semantically relevant human biases, with potential implications for more human-aligned vision models.
major comments (2)
- [Bayesian parity analysis (abstract and results)] The central 'at no cost' claim rests on Bayesian parity analysis providing decisive-to-very-strong evidence of equivalent accuracy across ImageNet, ImageNet-C, and ObjectNet. However, the abstract (and presumably the corresponding results section) reports neither the exact model specification, prior choices (e.g., normal vs. Cauchy on the performance difference), ROPE width, numerical Bayes factor values, nor any sensitivity/robustness checks. Equivalence Bayes factors are known to be sensitive to these choices; without them the strength of evidence cannot be verified and the claim does not fully follow from the reported data.
- [Methods (shuffled control description)] The shuffled control is presented as isolating semantically relevant signals from generic human supervision, yet the manuscript provides no quantitative comparison of how well the shuffled maps preserve low-level statistics (e.g., center bias, entropy) versus the original fixation maps. If the control fails to fully match these statistics, the attribution of bias induction specifically to semantic content is weakened.
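One way to run the check the referee asks for is to compute per-map entropy and a center-bias score for the original and control maps and compare their summaries. A minimal sketch, assuming a center-bias score defined as the mass-weighted distance from the image center (one of several possible definitions); the Gaussian maps are synthetic placeholders, not fixation data. Entropy is invariant to any permutation of pixels, so a within-map pixel shuffle leaves it unchanged while destroying center bias, whereas a cross-image reassignment leaves both summaries unchanged at the set level.

```python
import numpy as np

def map_entropy(m, eps=1e-12):
    """Shannon entropy (nats) of a saliency map treated as a distribution over pixels."""
    p = m.ravel() / m.sum()
    return float(-(p * np.log(p + eps)).sum())

def center_bias(m):
    """Mass-weighted mean distance from the image center, divided by the half-diagonal.

    Lower values mean the map's mass sits closer to the center (stronger center bias).
    """
    h, w = m.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    dist = np.hypot(ys - cy, xs - cx) / np.hypot(cy, cx)
    return float((m / m.sum() * dist).sum())

def summarize(maps):
    """Mean entropy and mean center-bias score over a set of maps."""
    return {
        "entropy_mean": float(np.mean([map_entropy(m) for m in maps])),
        "center_bias_mean": float(np.mean([center_bias(m) for m in maps])),
    }

# Synthetic, center-weighted placeholder maps (not real fixation data).
ys, xs = np.mgrid[0:32, 0:32]
orig = [np.exp(-((ys - 15.5) ** 2 + (xs - 15.5) ** 2) / (2 * s ** 2)) for s in (4.0, 6.0, 8.0)]

rng = np.random.default_rng(0)
pixel_shuffled = [rng.permutation(m.ravel()).reshape(m.shape) for m in orig]
cross_image = orig[1:] + orig[:1]  # same maps, reassigned to different images

print(summarize(orig))
print(summarize(pixel_shuffled))  # entropy unchanged, center-bias score rises
print(summarize(cross_image))     # matches the original summary up to rounding
```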
minor comments (2)
- [Abstract] The abstract states that five saliency metrics were used but does not name them; explicitly listing the metrics (e.g., NSS and CC) would improve immediate readability.
- [Abstract] The claim that the procedure 'reversed the baseline's anti-human large-object bias' would benefit from a brief quantitative statement of the effect size or statistical test in the abstract or early results.
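The two metrics named in the comments above have standard definitions: NSS averages the z-scored predicted map at human fixation locations, and CC is the Pearson correlation between the predicted map and a fixation density map. A minimal sketch of both, assuming the model's 14 x 14 patch attention is upsampled to pixel resolution by nearest-neighbor repetition; which five metrics the paper uses is not listed here (AUC, SIM, and KL divergence are other common choices), and all maps below are random placeholders.

```python
import numpy as np

def nss(pred, fixations):
    """Normalized Scanpath Saliency: mean z-scored prediction at fixated pixels."""
    z = (pred - pred.mean()) / pred.std()
    return float(z[fixations.astype(bool)].mean())

def cc(pred, density):
    """Pearson linear correlation coefficient between two maps of equal shape."""
    return float(np.corrcoef(pred.ravel(), density.ravel())[0, 1])

rng = np.random.default_rng(0)
attn_grid = rng.random((14, 14))              # placeholder for a model's patch attention
pred = np.kron(attn_grid, np.ones((16, 16)))  # nearest-neighbor upsampling to 224 x 224
fixations = rng.random((224, 224)) > 0.999    # placeholder binary fixation map
density = rng.random((224, 224))              # placeholder fixation density map
print(nss(pred, fixations), cc(pred, density))
```

NSS is reported in standard-deviation units (0 means chance), while CC lies in [-1, 1]; both increase as the model's attention moves toward human fixations.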
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which have helped us improve the clarity and rigor of our manuscript. We address each of the major comments in detail below.
- Referee: [Bayesian parity analysis (abstract and results)] The central 'at no cost' claim rests on Bayesian parity analysis providing decisive-to-very-strong evidence of equivalent accuracy across ImageNet, ImageNet-C, and ObjectNet. However, the abstract (and presumably the corresponding results section) reports neither the exact model specification, prior choices (e.g., normal vs. Cauchy on the performance difference), ROPE width, numerical Bayes factor values, nor any sensitivity/robustness checks. Equivalence Bayes factors are known to be sensitive to these choices; without them the strength of evidence cannot be verified and the claim does not fully follow from the reported data.
  Authors: We acknowledge that the manuscript does not provide the full specification of the Bayesian parity analysis. We will revise the Methods and Results sections to include the exact model specification, prior choices (normal prior on the performance difference), ROPE width, numerical Bayes factor values, and sensitivity/robustness checks under alternative priors. This will allow verification of the evidence strength for the equivalence claim. Revision: yes.
- Referee: [Methods (shuffled control description)] The shuffled control is presented as isolating semantically relevant signals from generic human supervision, yet the manuscript provides no quantitative comparison of how well the shuffled maps preserve low-level statistics (e.g., center bias, entropy) versus the original fixation maps. If the control fails to fully match these statistics, the attribution of bias induction specifically to semantic content is weakened.
  Authors: We agree that quantifying the preservation of low-level statistics in the shuffled control would strengthen the claim that it isolates semantic content. We will add this comparison to the Methods section, reporting metrics such as center bias and entropy for the original versus shuffled maps to demonstrate that the procedure primarily disrupts semantic alignment while largely preserving spatial statistics. Revision: yes.
Circularity Check
Empirical fine-tuning and Bayesian evaluation of alignment vs. accuracy are self-contained
The paper describes an experimental procedure: fine-tuning ViT-B/16 self-attention on human saliency maps, comparison to a shuffled control baseline, measurement of alignment metrics, and direct accuracy evaluation on ImageNet/ImageNet-C/ObjectNet. Bayesian parity analysis is applied post-hoc to the observed performance differences. No equations, parameters, or claims reduce by construction to their own inputs; the shuffled control supplies an independent contrast not derived from the success metrics. No self-citation chains, ansatzes, or renamings of known results are load-bearing for the central claim.
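To make the post-hoc parity test concrete, here is a minimal sketch of a ROPE-based (interval) Bayes factor for "the accuracy difference lies within +/- rope". It swaps in a simpler model than the authors' (the rebuttal mentions a normal prior on the performance difference): independent Beta(1, 1) priors on each model's accuracy, binomial likelihoods, and Monte Carlo estimates of prior and posterior mass inside the ROPE. It also ignores that both models are scored on the same images. The function name, ROPE width, and counts are hypothetical. On the Jeffreys-style scale commonly used, a Bayes factor above 100 is labeled decisive and 30 to 100 very strong.

```python
import numpy as np

def rope_bayes_factor(correct_a, n_a, correct_b, n_b, rope=0.01,
                      n_draws=200_000, seed=0):
    """Interval Bayes factor for |acc_a - acc_b| < rope versus outside it.

    Independent Beta(1, 1) priors on each accuracy, binomial likelihoods,
    Monte Carlo estimates of prior and posterior mass inside the ROPE.
    Returns (Bayes factor, posterior probability inside the ROPE).
    """
    rng = np.random.default_rng(seed)
    prior_diff = rng.beta(1, 1, n_draws) - rng.beta(1, 1, n_draws)
    post_diff = (rng.beta(1 + correct_a, 1 + n_a - correct_a, n_draws)
                 - rng.beta(1 + correct_b, 1 + n_b - correct_b, n_draws))
    p_prior = float(np.mean(np.abs(prior_diff) < rope))
    p_post = float(np.mean(np.abs(post_diff) < rope))
    # Clamp so the odds stay finite; with extreme data the result is then a bound.
    p_post = min(max(p_post, 1.0 / n_draws), 1.0 - 1.0 / n_draws)
    prior_odds = p_prior / (1.0 - p_prior)
    post_odds = p_post / (1.0 - p_post)
    return post_odds / prior_odds, p_post

# Illustrative counts only (not the paper's results): 50,000 validation images.
bf, p_in = rope_bayes_factor(40_500, 50_000, 40_480, 50_000, rope=0.01)
print(f"BF(inside ROPE) = {bf:.3g}, P(inside ROPE | data) = {p_in:.4f}")
```

Changing `rope` or replacing the Beta(1, 1) priors changes the Bayes factor, which is the sensitivity the referee asks the authors to report.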