A configurable library for generating and manipulating maze datasets. arXiv preprint arXiv:2309.10498, 2023
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
citing papers explorer
- SVoT: State-aware Visualization-of-Thought for Spatial Reasoning via Reinforcement Learning
SVoT uses RL with GRPO to train MLLMs on interleaved textual and visual reasoning chains for multi-hop spatial tasks, achieving up to 65% accuracy gains on new domains with quantitative state verification.
- Spectral-Progressive Thought Flow for Lightweight Multimodal Reasoning
SpecFlow represents intermediate visual thoughts in fixed-size DCT space and uses classifier-free guidance to steer updates from textual thoughts, achieving up to 2.1x lower computation and KV cache costs.
- The Confidence Shortcut: A Reasoning Failure Mode of Masked Diffusion Models
Confidence-based decoding and training in masked diffusion models shortcut long-range dependencies in reasoning, producing errors on complex inputs that random masking avoids.