Response Wide Shut? Surprising Observations in Basic Vision Language Model Capabilities
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
VLMs Trace Without Tracking: Diagnosing Failures in Visual Path Following
VLMs frequently switch away from a target visual path to nearby similar distractors in controlled tracing tasks, with standard scaling, reasoning, and instruction interventions providing only partial mitigation.