GenRecon lifts object-level generative priors to scene-scale reconstruction by chunking scenes and using projection-based conditioning on multi-view features, claiming 16% better results than prior methods.
hub Canonical reference
SyncDreamer: Generating Multiview-consistent Images from a Single-view Image
Canonical reference. 82% of citing Pith papers cite this work as background.
abstract
In this paper, we present a novel diffusion model called that generates multiview-consistent images from a single-view image. Using pretrained large-scale 2D diffusion models, recent work Zero123 demonstrates the ability to generate plausible novel views from a single-view image of an object. However, maintaining consistency in geometry and colors for the generated images remains a challenge. To address this issue, we propose a synchronized multiview diffusion model that models the joint probability distribution of multiview images, enabling the generation of multiview-consistent images in a single reverse process. SyncDreamer synchronizes the intermediate states of all the generated images at every step of the reverse process through a 3D-aware feature attention mechanism that correlates the corresponding features across different views. Experiments show that SyncDreamer generates images with high consistency across different views, thus making it well-suited for various 3D generation tasks such as novel-view-synthesis, text-to-3D, and image-to-3D.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
Introduces the first passive source attribution benchmark for 22 generative 3D models and a Transformer achieving 97.22% accuracy under full supervision and 77.17% with 1% training data.
Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture with a new paired dataset.
R-DMesh generates high-fidelity 4D meshes aligned to video by disentangling base mesh, motion, and a learned rectification jump offset inside a VAE, then using Triflow Attention and rectified-flow diffusion.
Img2CADSeq generates standard CAD sequences from images via a multi-stage pipeline with three-level hierarchical codebook encoding, importance-guided compression, and contrastive point-cloud conditioning of a VQ-Diffusion model, outperforming prior methods on new CAD-220K and PrintCAD datasets.
ConFixGS repairs feedforward 3D Gaussian Splatting with confidence-aware diffusion priors, delivering up to 3.68 dB PSNR gains and halved FID scores on Waymo, nuScenes, and KITTI novel view synthesis tasks.
A framework generates consistent multi-view scenes from one freehand sketch via a ~9k-sample dataset, Parallel Camera-Aware Attention Adapters, and Sparse Correspondence Supervision Loss, outperforming baselines in realism and consistency.
SafeMind is a differentiable framework that combines probabilistic control barrier functions, semantic context encoding, and meta-adaptive risk calibration to deliver safer, lower-energy quadruped locomotion under uncertainty.
Video diffusion models can be adapted into permutation-invariant generators for sparse novel view synthesis by treating the problem as video completion and removing temporal order cues.
Stroke of Surprise is a framework that generates vector sketches undergoing semantic transformation from one concept to another by adding strokes, using dual-branch SDS and overlay loss for optimization.
CoMoVi co-generates 3D human motions and 2D videos synchronously in a single diffusion denoising loop using 3D-to-2D projection and dual-branch diffusion with 3D-2D cross attentions.
Affostruction reconstructs full 3D object geometry from partial RGBD views and grounds text-based affordances on both visible and unobserved surfaces, reporting large gains over prior methods.
DreamGaussian creates high-quality textured 3D meshes from single-view images in 2 minutes via generative Gaussian Splatting with mesh extraction and UV refinement.
GeoFlow adds a geometry-consistency reward based on rigid camera flow and object appearance preservation, integrated via reinforcement fine-tuning to improve geometric coherence in video generation.
REVIVE 3D generates voluminous 3D assets from flat 2D images via an inflated prior construction followed by latent-space refinement, plus new metrics for volume and flatness validated by user study.
FurnSet improves single-view 3D scene reconstruction by using per-object CLS tokens and set-aware self-attention to group and jointly reconstruct repeated object instances, with added scene-object conditioning and layout optimization.
Viewpoint tokens learned on a mixed 3D-rendered and photorealistic dataset enable precise camera control in text-to-image generation while factorizing geometry from appearance and transferring to unseen object categories.
LaviGen turns 3D generative models into an autoregressive layout generator that models geometric and physical constraints, delivering 19% higher physical plausibility and 65% faster inference on the LayoutVLM benchmark.
ReplicateAnyScene performs fully automated zero-shot video-to-compositional-3D reconstruction by cascading alignments of generic priors from vision foundation models across textual, visual, and spatial dimensions.
SegviGen shows pretrained 3D generative models can be repurposed for part segmentation via voxel colorization, beating prior methods by 40% interactively and 15% on full segmentation using only 0.32% of labeled data.
Kaleido is a masked autoregressive generative model that unifies 3D view synthesis and video modeling by pre-training a single transformer on video data, achieving SOTA zero-shot and many-view performance on view synthesis benchmarks.
BulletGen enhances 4D dynamic scene reconstruction from monocular videos by supervising Gaussian optimization with diffusion-generated frames aligned at a bullet-time step, achieving SOTA on novel-view synthesis and tracking.
InstantMesh produces diverse, high-quality 3D meshes from single images in seconds by combining a multi-view diffusion model with a sparse-view large reconstruction model and optimizing directly on meshes.
Stable Video Diffusion scales latent video diffusion models via text-to-image pretraining, video pretraining on curated data, and high-quality finetuning to produce competitive text-to-video and image-to-video results while enabling motion LoRA and multi-view 3D applications.
citing papers explorer
-
GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction
GenRecon lifts object-level generative priors to scene-scale reconstruction by chunking scenes and using projection-based conditioning on multi-view features, claiming 16% better results than prior methods.
-
Who Generated This 3D Asset? Learning Source Attribution for Generative 3D Models
Introduces the first passive source attribution benchmark for 22 generative 3D models and a Transformer achieving 97.22% accuracy under full supervision and 77.17% with 1% training data.
-
Functionalization via Structure Completion and Motion Rectification
Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture with a new paired dataset.
-
R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow
R-DMesh generates high-fidelity 4D meshes aligned to video by disentangling base mesh, motion, and a learned rectification jump offset inside a VAE, then using Triflow Attention and rectified-flow diffusion.
-
Img2CADSeq: Image-to-CAD Generation via Sequence-Based Diffusion
Img2CADSeq generates standard CAD sequences from images via a multi-stage pipeline with three-level hierarchical codebook encoding, importance-guided compression, and contrastive point-cloud conditioning of a VQ-Diffusion model, outperforming prior methods on new CAD-220K and PrintCAD datasets.
-
ConFixGS: Learning to Fix Feedforward 3D Gaussian Splatting with Confidence-Aware Diffusion Priors in Driving Scenes
ConFixGS repairs feedforward 3D Gaussian Splatting with confidence-aware diffusion priors, delivering up to 3.68 dB PSNR gains and halved FID scores on Waymo, nuScenes, and KITTI novel view synthesis tasks.
-
Geometrically Consistent Multi-View Scene Generation from Freehand Sketches
A framework generates consistent multi-view scenes from one freehand sketch via a ~9k-sample dataset, Parallel Camera-Aware Attention Adapters, and Sparse Correspondence Supervision Loss, outperforming baselines in realism and consistency.
-
SafeMind: A Risk-Aware Differentiable Control Framework for Adaptive and Safe Quadruped Locomotion
SafeMind is a differentiable framework that combines probabilistic control barrier functions, semantic context encoding, and meta-adaptive risk calibration to deliver safer, lower-energy quadruped locomotion under uncertainty.
-
Novel View Synthesis as Video Completion
Video diffusion models can be adapted into permutation-invariant generators for sparse novel view synthesis by treating the problem as video completion and removing temporal order cues.
-
Stroke of Surprise: Progressive Semantic Illusions in Vector Sketching
Stroke of Surprise is a framework that generates vector sketches undergoing semantic transformation from one concept to another by adding strokes, using dual-branch SDS and overlay loss for optimization.
-
CoMoVi: Co-Generation of 3D Human Motions and Realistic Videos
CoMoVi co-generates 3D human motions and 2D videos synchronously in a single diffusion denoising loop using 3D-to-2D projection and dual-branch diffusion with 3D-2D cross attentions.
-
Affostruction: 3D Affordance Grounding with Generative Reconstruction
Affostruction reconstructs full 3D object geometry from partial RGBD views and grounds text-based affordances on both visible and unobserved surfaces, reporting large gains over prior methods.
-
DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation
DreamGaussian creates high-quality textured 3D meshes from single-view images in 2 minutes via generative Gaussian Splatting with mesh extraction and UV refinement.
-
GeoFlow: Enforcing Implicit Geometric Consistency in Video Generation
GeoFlow adds a geometry-consistency reward based on rigid camera flow and object appearance preservation, integrated via reinforcement fine-tuning to improve geometric coherence in video generation.
-
REVIVE 3D: Refinement via Encoded Voluminous Inflated prior for Volume Enhancement
REVIVE 3D generates voluminous 3D assets from flat 2D images via an inflated prior construction followed by latent-space refinement, plus new metrics for volume and flatness validated by user study.
-
FurnSet: Exploiting Repeats for 3D Scene Reconstruction
FurnSet improves single-view 3D scene reconstruction by using per-object CLS tokens and set-aware self-attention to group and jointly reconstruct repeated object instances, with added scene-object conditioning and layout optimization.
-
Camera Control for Text-to-Image Generation via Learning Viewpoint Tokens
Viewpoint tokens learned on a mixed 3D-rendered and photorealistic dataset enable precise camera control in text-to-image generation while factorizing geometry from appearance and transferring to unseen object categories.
-
Repurposing 3D Generative Model for Autoregressive Layout Generation
LaviGen turns 3D generative models into an autoregressive layout generator that models geometric and physical constraints, delivering 19% higher physical plausibility and 65% faster inference on the LayoutVLM benchmark.
-
ReplicateAnyScene: Zero-Shot Video-to-3D Composition via Textual-Visual-Spatial Alignment
ReplicateAnyScene performs fully automated zero-shot video-to-compositional-3D reconstruction by cascading alignments of generic priors from vision foundation models across textual, visual, and spatial dimensions.
-
SegviGen: Repurposing 3D Generative Model for Part Segmentation
SegviGen shows pretrained 3D generative models can be repurposed for part segmentation via voxel colorization, beating prior methods by 40% interactively and 15% on full segmentation using only 0.32% of labeled data.
-
Scaling Sequence-to-Sequence Generative Neural Rendering
Kaleido is a masked autoregressive generative model that unifies 3D view synthesis and video modeling by pre-training a single transformer on video data, achieving SOTA zero-shot and many-view performance on view synthesis benchmarks.
-
BulletGen: Improving 4D Reconstruction with Bullet-Time Generation
BulletGen enhances 4D dynamic scene reconstruction from monocular videos by supervising Gaussian optimization with diffusion-generated frames aligned at a bullet-time step, achieving SOTA on novel-view synthesis and tracking.
-
InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
InstantMesh produces diverse, high-quality 3D meshes from single images in seconds by combining a multi-view diffusion model with a sparse-view large reconstruction model and optimizing directly on meshes.
-
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
Stable Video Diffusion scales latent video diffusion models via text-to-image pretraining, video pretraining on curated data, and high-quality finetuning to produce competitive text-to-video and image-to-video results while enabling motion LoRA and multi-view 3D applications.
-
Efficient 3D Content Reconstruction and Generation
Presents Instant3D for rapid text/image-to-3D generation via multi-view diffusion plus feed-forward reconstruction, and FastMap for 10x faster structure-from-motion with comparable accuracy.
-
DreamEdit3D: Personalization of Multi-View Diffusion Models for 3D Editing
DreamEdit3D learns separate token embeddings for segmented object components via two-phase multi-view optimization to enable text-guided 3D editing with consistent image generation and mesh reconstruction.
-
DecoRec: Decomposed 3D Scene Reconstruction from Single-View Images via Object-Level Diffusion
DecoRec decomposes single-view 3D scene reconstruction into per-object diffusion reconstructions followed by a differentiable rendering and diffusion-guided merging pipeline.
-
Pose-Aware Diffusion for 3D Generation
PAD synthesizes 3D geometry in observation space via depth unprojection as anchor to eliminate pose ambiguity in image-to-3D generation.
-
DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation
LGAA is a modular adapter framework that lifts multi-view diffusion models to produce 2D Gaussian Splats with PBR channels for high-quality relightable 3D mesh extraction using data-efficient finetuning on 69k instances.
-
KFC-W: Generating 3D-Consistent Videos from Unposed Internet Photos
KFC-W is a self-supervised 3D-aware video model trained on videos and multiview internet photos that produces geometrically consistent interpolations between unposed input images without any 3D annotations.
-
Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model
Zero123++ produces high-quality 3D-consistent multi-view images from a single input by fine-tuning Stable Diffusion with targeted conditioning and training methods.
-
Landscape-Awareness for Geometric View Diffusion Model
A score-based method is introduced to guide optimization in geometric view diffusion models toward correct viewpoints, improving convergence and sample efficiency over naive multistart strategies.
-
AnimateAnyMesh++: A Flexible 4D Foundation Model for High-Fidelity Text-Driven Mesh Animation
AnimateAnyMesh++ animates arbitrary 3D meshes from text using an expanded 300K-identity DyMesh-XL dataset, a power-law topology-aware DyMeshVAE-Flex, and a variable-length rectified-flow generator to produce semantically accurate, temporally coherent animations in seconds.
-
Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material
Hunyuan3D 2.1 is a two-part system with DiT for shape generation and Paint for texture synthesis that produces high-fidelity 3D assets with PBR materials.
- Stream3D: Sequential Multi-View 3D Generation via Evidential Memory
- Rays as Pixels: Learning A Joint Distribution of Videos and Camera Trajectories