A new diagnostic benchmark decomposes LLM spatial navigation into three cognitive scales and shows that cross-scale aggregation, not single-level deficits, causes failure beyond small mazes.
4 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
Representative citing papers:
SVoT uses RL with GRPO to train MLLMs on interleaved textual and visual reasoning chains for multi-hop spatial tasks, achieving up to 65% accuracy gains on new domains with quantitative state verification.
SpecFlow represents intermediate visual thoughts in fixed-size DCT space and uses classifier-free guidance to steer updates from textual thoughts, achieving up to 2.1x lower computation and KV cache costs.
Confidence-based decoding and training in masked diffusion models shortcut long-range dependencies in reasoning, producing errors on complex inputs that random masking avoids.
citing papers explorer
- SVoT: State-aware Visualization-of-Thought for Spatial Reasoning via Reinforcement Learning
  SVoT uses RL with GRPO to train MLLMs on interleaved textual and visual reasoning chains for multi-hop spatial tasks, achieving up to 65% accuracy gains on new domains with quantitative state verification.
- The Confidence Shortcut: A Reasoning Failure Mode of Masked Diffusion Models
  Confidence-based decoding and training in masked diffusion models shortcut long-range dependencies in reasoning, producing errors on complex inputs that random masking avoids.