SPAD: Spatially Aware Multiview Diffusers

Aliaksandr Siarohin; Bernard Ghanem; Guocheng Qian; Igor Gilitschenski; Jian Ren; Michael Vasilkovsky; Riza Alp Guler; Sergey Tulyakov; Yash Kant; Ziyi Wu

arxiv: 2402.05235 · v1 · pith:RATDRQDEnew · submitted 2024-02-07 · 💻 cs.CV

keywords: spad · camera · cross-view · generation · images · multi-view · novel · objaverse

We present SPAD, a novel approach for creating consistent multi-view images from text prompts or single images. To enable multi-view generation, we repurpose a pretrained 2D diffusion model by extending its self-attention layers with cross-view interactions, and fine-tune it on a high-quality subset of Objaverse. We find that a naive extension of the self-attention proposed in prior work (e.g., MVDream) leads to content copying between views. Therefore, we explicitly constrain the cross-view attention based on epipolar geometry. To further enhance 3D consistency, we utilize Plücker coordinates derived from camera rays and inject them as positional encoding. This allows SPAD to reason effectively about spatial proximity in 3D. In contrast to recent works that can only generate views at fixed azimuth and elevation, SPAD offers full camera control and achieves state-of-the-art results in novel view synthesis on unseen objects from the Objaverse and Google Scanned Objects datasets. Finally, we demonstrate that text-to-3D generation using SPAD prevents the multi-face Janus issue. More details are available on our webpage: https://yashkant.github.io/spad
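
As a rough illustration of the Plücker coordinates mentioned in the abstract, the sketch below computes the 6-vector (d, o × d) for a camera ray with origin o and unit direction d. It is a minimal sketch only: the abstract does not state the ordering or normalization convention SPAD uses, nor how the coordinates are injected into the attention layers, so the (direction, moment) ordering, the pinhole camera model, and the helper names `pixel_ray` and `plucker` are all assumptions made for this example.

```python
import math


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Cross product a x b of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)


def pixel_ray(u, v, fx, fy, cx, cy, cam_to_world_rot, cam_center):
    """World-space ray through pixel (u, v) of a pinhole camera.

    Hypothetical helper: assumes intrinsics (fx, fy, cx, cy), a 3x3
    camera-to-world rotation given as row tuples, and the camera center
    in world coordinates.
    """
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    d_world = tuple(dot(row, d_cam) for row in cam_to_world_rot)
    return cam_center, normalize(d_world)


def plucker(origin, direction):
    """Plücker coordinates (d, m) of a ray, with moment m = origin x d.

    Assumed convention: direction first, moment second.
    """
    d = normalize(direction)
    m = cross(origin, d)
    return d + m  # tuple concatenation gives a 6-tuple


# Example: identity rotation, camera two units behind the world origin.
identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
o, d = pixel_ray(48, 32, 64.0, 64.0, 32.0, 32.0, identity, (0.0, 0.0, -2.0))
ray_code = plucker(o, d)
```

Two properties make this representation attractive as a per-pixel positional signal: the moment is always orthogonal to the direction, and the 6-vector does not change if the origin slides along the ray, so it describes the ray itself rather than a particular point on it.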

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation
cs.CV · 2024-06 · unverdicted · novelty 6.0

CamCo equips image-to-video generators with Plücker-coordinate camera inputs and epipolar attention to improve 3D consistency and camera controllability.

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
cs.CV · 2024-04 · unverdicted · novelty 6.0

InstantMesh produces diverse, high-quality 3D meshes from single images in seconds by combining a multi-view diffusion model with a sparse-view large reconstruction model and optimizing directly on meshes.