VLMs show systematic drops in counting accuracy as visual and linguistic complexity rise, with modest gains from targeted attention reweighting in the decoder.
Example images for the Object category, Color pattern, showing different object colors
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2025 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers
Can Vision-Language Models Count? A Synthetic Benchmark and Analysis of Attention-Based Interventions