VLMs frequently switch away from a target visual path to nearby similar distractors in controlled tracing tasks, with standard scaling, reasoning, and instruction interventions providing only partial mitigation.
Understanding the limits of vision language models through the lens of the binding problem
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
VLMs Trace Without Tracking: Diagnosing Failures in Visual Path Following
VLMs frequently switch away from a target visual path to nearby similar distractors in controlled tracing tasks, with standard scaling, reasoning, and instruction interventions providing only partial mitigation.