CLIP-Count: Towards Text-Guided Zero-Shot Object Counting

Jiang, R · 2023 · arXiv 1783.361178

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Does it Really Count? Assessing Semantic Grounding in Text-Guided Class-Agnostic Counting

cs.CV · 2026-05-04 · unverdicted · novelty 7.0

Text-guided class-agnostic counting models exhibit significant weaknesses in grounding textual prompts to visual objects, as demonstrated by new negative-label and distractor tests on a multi-category dataset.

Unveiling the Visual Counting Bottleneck in Vision-Language Models

cs.MM · 2026-05-28 · unverdicted · novelty 6.0

VLMs fail at visual counting extrapolation because they cannot project visual magnitudes onto symbolic tokens, despite intact perceptual representations, supporting a fractured magnitude hypothesis.

citing papers explorer

Does it Really Count? Assessing Semantic Grounding in Text-Guided Class-Agnostic Counting

cs.CV · 2026-05-04 · unverdicted · none · ref 31

Unveiling the Visual Counting Bottleneck in Vision-Language Models

cs.MM · 2026-05-28 · unverdicted · none · ref 22
