Viewpoint tokens learned on a mixed 3D-rendered and photorealistic dataset enable precise camera control in text-to-image generation while factorizing geometry from appearance and transferring to unseen object categories.
MVDiffusion: Enabling holistic multi-view image generation with correspondence-aware diffusion. arXiv preprint arXiv:2307.01097
7 Pith papers cite this work.
fields
cs.CV 7
representative citing papers
BoostDream refines coarse feed-forward text-to-3D assets via 3D distillation, multi-view SDS loss from a 2D diffusion model, and prompt-consistent normal maps to produce higher-quality results more efficiently than standard SDS.
SyncDreamer produces multiview-consistent images from a single input image by jointly modeling their distribution and synchronizing intermediate diffusion states via 3D-aware attention.
MVDream is a multi-view diffusion model that functions as a generalizable 3D prior, enabling more consistent text-to-3D generation and few-shot 3D concept learning from 2D examples.
Restore3D restores shape and texture of broken 3D objects via multi-view image refinement with a Mask Self-Perceiver and coarse-to-fine mesh reconstruction, outperforming baselines on synthetic and real benchmarks.
Native3D introduces a direct 3D scene generation method using unified mesh-texture representation and 3D REPA Loss for semantic alignment, claimed to outperform prior 2D-dependent approaches.
DecoRec decomposes single-view 3D scene reconstruction into per-object diffusion reconstructions followed by a differentiable rendering and diffusion-guided merging pipeline.
citing papers explorer
BoostDream: Efficient Refining for High-Quality Text-to-3D Generation from Multi-View Diffusion