Viewpoint tokens learned on a mixed 3D-rendered and photorealistic dataset enable precise camera control in text-to-image generation while factorizing geometry from appearance and transferring to unseen object categories.
pixelnerf: Neural radiance fields from one or few images
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 2verdicts
UNVERDICTED 2representative citing papers
C3G creates compact 3D Gaussian representations with 2K points by guiding placement via learnable tokens that aggregate multi-view features through attention, yielding better efficiency and performance than dense methods.
citing papers explorer
-
Camera Control for Text-to-Image Generation via Learning Viewpoint Tokens
Viewpoint tokens learned on a mixed 3D-rendered and photorealistic dataset enable precise camera control in text-to-image generation while factorizing geometry from appearance and transferring to unseen object categories.
-
C3G: Learning Compact 3D Representations with 2K Gaussians
C3G creates compact 3D Gaussian representations with 2K points by guiding placement via learnable tokens that aggregate multi-view features through attention, yielding better efficiency and performance than dense methods.