Marble: A hard benchmark for multimodal spatial reasoning and planning
2 Pith papers cite this work. Polarity classification is still indexing.
Pith papers citing it: 2
Fields: cs.AI (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
A new consistency-verifier RL framework with OT-GRPO raises spatial reasoning accuracy in LRMs to near supervised levels using only internal geometric and semantic checks.
citing papers explorer
- Forecasting Future Behavior as a Learning Task
  Behavior Forecasters trained on LRM trajectories outperform larger models in predicting repeatability and input sensitivity at low cost.
- The Art of Interrogation: Consistency Amplifies Factuality in Spatial Reasoning
  A new consistency-verifier RL framework with OT-GRPO raises spatial reasoning accuracy in LRMs to near supervised levels using only internal geometric and semantic checks.