Vision-language models show some human-like patterns in visual search effort (flat for features, rising for conjunctions) but diverge on target-present vs absent slopes and enumeration accuracy when reasoning tokens proxy reaction time.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Do vision-language models search like humans? Reasoning tokens as a reaction-time analog in classic visual-search paradigms