Faithful GRPO cuts inconsistent chain-of-thought reasoning in visual spatial tasks from 24.5% to 1.7%, while raising visual grounding scores by 13% and improving final-answer accuracy across seven benchmarks.
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Faithful GRPO: Improving Visual Spatial Reasoning in Multimodal Language Models via Constrained Policy Optimization