Multiple view geometry in computer vision
3 Pith papers cite this work. Polarity classification is still indexing.
citation summary
verdicts: UNVERDICTED 3
roles: background 1
polarities: background 1
citing papers explorer
- How to Spin an Object: First, Get the Shape Right
  Camera-Relative Object Coordinates (CROCS) as an intermediate geometry representation in two-stage image-to-3D models yields superior novel-view quality, geometric accuracy, and multiview consistency over depth maps, visual features, and other pointmap alternatives.
- Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations
  RIGVid shows that filtered AI-generated videos can serve as effective supervision for complex robotic manipulation tasks without any real demonstrations.
- 3D Reconstruction with Spatial Memory
  Spann3R uses a learned spatial memory to regress per-image pointmaps directly in a shared global coordinate system, removing the need for optimization-based alignment after per-pair predictions.