ShapeUP: Scalable Image-Conditioned 3D Editing

Inbar Gat , Dana Cohen-Bar , Guy Levy , Elad Richardson , Daniel Cohen-Or

Authors on Pith no claims yet

classification 💻 cs.CV cs.GR

keywords editingshapeupscalablevisualconsistencyeditedfoundationimage-conditioned

read the original abstract

Recent advancements in 3D foundation models have enabled the generation of high-fidelity assets, yet precise 3D manipulation remains a significant challenge. Existing 3D editing frameworks often face a difficult trade-off between visual controllability, geometric consistency, and scalability. Specifically, optimization-based methods are prohibitively slow, multi-view 2D propagation techniques suffer from visual drift, and training-free latent manipulation methods are inherently bound by frozen priors and cannot directly benefit from scaling. In this work, we present ShapeUP, a scalable, image-conditioned 3D editing framework that formulates editing as a supervised latent-to-latent translation within a native 3D representation. This formulation allows ShapeUP to build on a pretrained 3D foundation model, leveraging its strong generative prior while adapting it to editing through supervised training. In practice, ShapeUP is trained on triplets consisting of a source 3D shape, an edited 2D image, and the corresponding edited 3D shape, and learns a direct mapping using a 3D Diffusion Transformer (DiT). This image-as-prompt approach enables fine-grained visual control over both local and global edits and achieves implicit, mask-free localization, while maintaining strict structural consistency with the original asset. Our extensive evaluations demonstrate that ShapeUP consistently outperforms current trained and training-free baselines in both identity preservation and edit fidelity, offering a robust and scalable paradigm for native 3D content creation.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Velocity-Space 3D Asset Editing
cs.GR 2026-05 unverdicted novelty 7.0

VS3D performs local 3D asset editing by injecting reconstruction-anchored source signals, partial-mean guidance, and twin-agreement residuals into the velocity sampler to control edit strength and preserve identity.