Thinking with spatial code for physical-world video reasoning. arXiv preprint arXiv:2603.05591,
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
OneCanvas aggregates multi-view 3D patches onto one panoramic canvas with continuous angular placement and 3D embeddings, enabling pretrained VLMs to achieve SOTA on SQA3D and VSI-Bench with an order of magnitude less compute via a new spatial pretraining curriculum.
citing papers explorer
- Ouroboros-Spatial: Closing the Data-Model Loop for Spatial Reasoning
  A closed-loop self-evolving training system for spatial reasoning in MLLMs that iteratively generates QA pairs matched to the model's current capabilities via confidence feedback, achieving gains with an order of magnitude less data.
- OneCanvas: 3D Scene Understanding via Panoramic Reprojection
  OneCanvas aggregates multi-view 3D patches onto one panoramic canvas with continuous angular placement and 3D embeddings, enabling pretrained VLMs to achieve SOTA on SQA3D and VSI-Bench with an order of magnitude less compute via a new spatial pretraining curriculum.