Closed-loop VLM agents using multi-view reasoning, object-centered visualization, and single-axis rotation prediction achieve superior text-guided 6D pose rearrangement for target objects in scenes.
It can help you infer information that is difficult to verify from the current images, such as detecting collisions between objects based on their original sizes
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Text-Guided 6D Object Pose Rearrangement via Closed-Loop VLM Agents
Closed-loop VLM agents using multi-view reasoning, object-centered visualization, and single-axis rotation prediction achieve superior text-guided 6D pose rearrangement for target objects in scenes.