CLIP-Count: Towards Text-Guided Zero-Shot Object Counting
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- Does it Really Count? Assessing Semantic Grounding in Text-Guided Class-Agnostic Counting
  Text-guided class-agnostic counting models exhibit significant weaknesses in grounding textual prompts to visual objects, as demonstrated by new negative-label and distractor tests on a multi-category dataset.
- Unveiling the Visual Counting Bottleneck in Vision-Language Models
  VLMs fail at visual counting extrapolation because they cannot project visual magnitudes onto symbolic tokens, despite intact perceptual representations, supporting a fractured magnitude hypothesis.