pith. machine review for the scientific record. sign in

arxiv: 2412.10273 · v3 · submitted 2024-12-13 · 💻 cs.CV · cs.LG

Recognition: unknown

How to Spin an Object: First, Get the Shape Right

Authors on Pith no claims yet
classification 💻 cs.CV cs.LG
keywords crocsgenerationgeometryunpicappearanceconsistencyenablesframework
0
0 comments X
read the original abstract

Image-to-3D models increasingly rely on hierarchical generation to disentangle geometry and texture. However, the design choices underlying these two-stage models--particularly the optimal choice of intermediate geometric representations--remain largely understudied. To investigate this, we introduce unPIC (undo-a-Picture), a modular framework for empirical analysis of image-to-3D pipelines. By factorizing the generation process into a multiview-geometry prior followed by an appearance decoder, unPIC enables a rigorous comparison of intermediate geometry representations. Through this framework, we identify that a specific representation, Camera-Relative Object Coordinates (CROCS), significantly outperforms alternatives such as depth maps, pretrained visual features, and other pointmap-based representations. We demonstrate that CROCS is not only easier for the first-stage geometry prior to predict, but also serves as an effective conditioning signal for ensuring 360-degree consistency during appearance decoding. Another advantage is that CROCS enables fully feedforward, direct 3D point cloud generation without requiring a separate post-hoc reconstruction step. Our unPIC formulation utilizing CROCS achieves superior novel-view quality, geometric accuracy, and multiview consistency; it outperforms leading baselines, including InstantMesh, Direct3D, CAT3D, Free3D, and EscherNet, on datasets of real-world 3D captures like Google Scanned Objects and the Digital Twin Catalog.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.