Only reference items clearly visible in the provided images
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.CV (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
SleepWalk: A Three-Tier Benchmark for Stress-Testing Instruction-Guided Vision-Language Navigation
SleepWalk is a new benchmark showing that frontier vision-language models (VLMs) struggle with spatially grounded trajectory prediction in 3D environments; performance declines sharply as task difficulty increases across its three tiers.