SyncDreamer: Generating Multiview-consistent Images from a Single-view Image

Yuan Liu, Cheng Lin, Zijiao Zeng, Xiaoxiao Long, Lingjie Liu, Taku Komura · 2023 · cs.CV · arXiv 2309.03453

Canonical reference. 82% of citing Pith papers cite this work as background.

40 Pith papers citing it

abstract

In this paper, we present a novel diffusion model called SyncDreamer that generates multiview-consistent images from a single-view image. Using pretrained large-scale 2D diffusion models, recent work Zero123 demonstrates the ability to generate plausible novel views from a single-view image of an object. However, maintaining consistency in geometry and colors for the generated images remains a challenge. To address this issue, we propose a synchronized multiview diffusion model that models the joint probability distribution of multiview images, enabling the generation of multiview-consistent images in a single reverse process. SyncDreamer synchronizes the intermediate states of all the generated images at every step of the reverse process through a 3D-aware feature attention mechanism that correlates the corresponding features across different views. Experiments show that SyncDreamer generates images with high consistency across different views, thus making it well-suited for various 3D generation tasks such as novel-view-synthesis, text-to-3D, and image-to-3D.

citation-role summary

background 10 · baseline 1

citation-polarity summary

background 9 · baseline 1 · unclear 1

representative citing papers

COSY: Compositional 3DGS Synthesis for Disentangled Human Head Editing

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

COSY uses independent per-component 3DGS generators plus context tokens to achieve disentangled semantic editing of human heads without masks or classifiers.

GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

GenRecon lifts object-level generative priors to scene-scale reconstruction by chunking scenes and using projection-based conditioning on multi-view features, claiming 16% better results than prior methods.

Who Generated This 3D Asset? Learning Source Attribution for Generative 3D Models

cs.CV · 2026-05-18 · unverdicted · novelty 7.0

Introduces the first passive source attribution benchmark for 22 generative 3D models and a Transformer achieving 97.22% accuracy under full supervision and 77.17% with 1% training data.

Functionalization via Structure Completion and Motion Rectification

cs.CV · 2026-05-18 · unverdicted · novelty 7.0

Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture with a new paired dataset.

Img2CADSeq: Image-to-CAD Generation via Sequence-Based Diffusion

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

Img2CADSeq generates standard CAD sequences from images via a multi-stage pipeline with three-level hierarchical codebook encoding, importance-guided compression, and contrastive point-cloud conditioning of a VQ-Diffusion model, outperforming prior methods on new CAD-220K and PrintCAD datasets.

ConFixGS: Learning to Fix Feedforward 3D Gaussian Splatting with Confidence-Aware Diffusion Priors in Driving Scenes

cs.CV · 2026-05-10 · unverdicted · novelty 7.0

ConFixGS repairs feedforward 3D Gaussian Splatting with confidence-aware diffusion priors, delivering up to 3.68 dB PSNR gains and halved FID scores on Waymo, nuScenes, and KITTI novel view synthesis tasks.

Geometrically Consistent Multi-View Scene Generation from Freehand Sketches

cs.CV · 2026-04-15 · unverdicted · novelty 7.0

A framework generates consistent multi-view scenes from one freehand sketch via a ~9k-sample dataset, Parallel Camera-Aware Attention Adapters, and Sparse Correspondence Supervision Loss, outperforming baselines in realism and consistency.

SafeMind: A Risk-Aware Differentiable Control Framework for Adaptive and Safe Quadruped Locomotion

cs.RO · 2026-04-10 · unverdicted · novelty 7.0

SafeMind is a differentiable framework that combines probabilistic control barrier functions, semantic context encoding, and meta-adaptive risk calibration to deliver safer, lower-energy quadruped locomotion under uncertainty.

Novel View Synthesis as Video Completion

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

Video diffusion models can be adapted into permutation-invariant generators for sparse novel view synthesis by treating the problem as video completion and removing temporal order cues.

Stroke of Surprise: Progressive Semantic Illusions in Vector Sketching

cs.CV · 2026-02-12 · unverdicted · novelty 7.0

Stroke of Surprise is a framework that generates vector sketches undergoing semantic transformation from one concept to another by adding strokes, using dual-branch SDS and overlay loss for optimization.

CoMoVi: Co-Generation of 3D Human Motions and Realistic Videos

cs.CV · 2026-01-15 · unverdicted · novelty 7.0

CoMoVi co-generates 3D human motions and 2D videos synchronously in a single diffusion denoising loop using 3D-to-2D projection and dual-branch diffusion with 3D-2D cross attentions.

Affostruction: 3D Affordance Grounding with Generative Reconstruction

cs.CV · 2026-01-14 · unverdicted · novelty 7.0

Affostruction reconstructs full 3D object geometry from partial RGBD views and grounds text-based affordances on both visible and unobserved surfaces, reporting large gains over prior methods.

DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation

cs.CV · 2023-09-28 · unverdicted · novelty 7.0

DreamGaussian creates high-quality textured 3D meshes from single-view images in 2 minutes via generative Gaussian Splatting with mesh extraction and UV refinement.

GeoEdit: Geometry-Aware Object Editing via Dual-Branch Denoising

cs.CV · 2026-06-29 · unverdicted · novelty 6.0

GeoEdit introduces a Lift-Manipulate-Render-Denoise pipeline with dual-branch denoising and variance-homogeneous injection for 3D-consistent object editing in single photos.

GeoFace: Consistent Multi-View Face Generation with Geometry-Constrained Diffusion

cs.CV · 2026-06-26 · unverdicted · novelty 6.0

GeoFace generates consistent multi-view face images and 3D geometry from one input via a dual-stream diffusion framework with geometry-guided attention alignment.

Stream3D: Sequential Multi-View 3D Generation via Evidential Memory

cs.CV · 2026-05-20 · unverdicted · novelty 6.0 · 2 refs

Stream3D is a training-free method that maintains a fixed-size evidential memory of past frames to convert frozen view-conditioned 3D generators into consistent streaming generators.

GeoFlow: Enforcing Implicit Geometric Consistency in Video Generation

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

GeoFlow adds a geometry-consistency reward based on rigid camera flow and object appearance preservation, integrated via reinforcement fine-tuning to improve geometric coherence in video generation.

R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow

cs.CV · 2026-05-13 · unverdicted · novelty 6.0 · 2 refs

R-DMesh proposes a VAE-based disentanglement of base mesh, motion trajectories, and rectification offset plus Triflow Attention and rectified-flow diffusion to produce 4D meshes aligned to video despite initial pose mismatch.

REVIVE 3D: Refinement via Encoded Voluminous Inflated prior for Volume Enhancement

cs.CV · 2026-04-30 · unverdicted · novelty 6.0

REVIVE 3D generates voluminous 3D assets from flat 2D images via an inflated prior construction followed by latent-space refinement, plus new metrics for volume and flatness validated by user study.

FurnSet: Exploiting Repeats for 3D Scene Reconstruction

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

FurnSet improves single-view 3D scene reconstruction by using per-object CLS tokens and set-aware self-attention to group and jointly reconstruct repeated object instances, with added scene-object conditioning and layout optimization.

Camera Control for Text-to-Image Generation via Learning Viewpoint Tokens

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

Viewpoint tokens learned on a mixed 3D-rendered and photorealistic dataset enable precise camera control in text-to-image generation while factorizing geometry from appearance and transferring to unseen object categories.

Repurposing 3D Generative Model for Autoregressive Layout Generation

cs.CV · 2026-04-17 · unverdicted · novelty 6.0

LaviGen turns 3D generative models into an autoregressive layout generator that models geometric and physical constraints, delivering 19% higher physical plausibility and 65% faster inference on the LayoutVLM benchmark.

ReplicateAnyScene: Zero-Shot Video-to-3D Composition via Textual-Visual-Spatial Alignment

cs.CV · 2026-04-12 · unverdicted · novelty 6.0

ReplicateAnyScene performs fully automated zero-shot video-to-compositional-3D reconstruction by cascading alignments of generic priors from vision foundation models across textual, visual, and spatial dimensions.

SegviGen: Repurposing 3D Generative Model for Part Segmentation

cs.CV · 2026-03-17 · unverdicted · novelty 6.0

SegviGen shows pretrained 3D generative models can be repurposed for part segmentation via voxel colorization, beating prior methods by 40% interactively and 15% on full segmentation using only 0.32% of labeled data.

citing papers explorer

Showing 40 of 40 citing papers.

COSY: Compositional 3DGS Synthesis for Disentangled Human Head Editing cs.CV · 2026-05-22 · unverdicted · none · ref 29 · internal anchor
COSY uses independent per-component 3DGS generators plus context tokens to achieve disentangled semantic editing of human heads without masks or classifiers.
GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction cs.CV · 2026-05-22 · unverdicted · none · ref 21 · internal anchor
GenRecon lifts object-level generative priors to scene-scale reconstruction by chunking scenes and using projection-based conditioning on multi-view features, claiming 16% better results than prior methods.
Who Generated This 3D Asset? Learning Source Attribution for Generative 3D Models cs.CV · 2026-05-18 · unverdicted · none · ref 59 · internal anchor
Introduces the first passive source attribution benchmark for 22 generative 3D models and a Transformer achieving 97.22% accuracy under full supervision and 77.17% with 1% training data.
Functionalization via Structure Completion and Motion Rectification cs.CV · 2026-05-18 · unverdicted · none · ref 81 · internal anchor
Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture with a new paired dataset.
Img2CADSeq: Image-to-CAD Generation via Sequence-Based Diffusion cs.CV · 2026-05-13 · unverdicted · none · ref 21 · internal anchor
Img2CADSeq generates standard CAD sequences from images via a multi-stage pipeline with three-level hierarchical codebook encoding, importance-guided compression, and contrastive point-cloud conditioning of a VQ-Diffusion model, outperforming prior methods on new CAD-220K and PrintCAD datasets.
ConFixGS: Learning to Fix Feedforward 3D Gaussian Splatting with Confidence-Aware Diffusion Priors in Driving Scenes cs.CV · 2026-05-10 · unverdicted · none · ref 76 · internal anchor
ConFixGS repairs feedforward 3D Gaussian Splatting with confidence-aware diffusion priors, delivering up to 3.68 dB PSNR gains and halved FID scores on Waymo, nuScenes, and KITTI novel view synthesis tasks.
Geometrically Consistent Multi-View Scene Generation from Freehand Sketches cs.CV · 2026-04-15 · unverdicted · none · ref 31 · internal anchor
A framework generates consistent multi-view scenes from one freehand sketch via a ~9k-sample dataset, Parallel Camera-Aware Attention Adapters, and Sparse Correspondence Supervision Loss, outperforming baselines in realism and consistency.
SafeMind: A Risk-Aware Differentiable Control Framework for Adaptive and Safe Quadruped Locomotion cs.RO · 2026-04-10 · unverdicted · none · ref 16 · internal anchor
SafeMind is a differentiable framework that combines probabilistic control barrier functions, semantic context encoding, and meta-adaptive risk calibration to deliver safer, lower-energy quadruped locomotion under uncertainty.
Novel View Synthesis as Video Completion cs.CV · 2026-04-09 · unverdicted · none · ref 25 · internal anchor
Video diffusion models can be adapted into permutation-invariant generators for sparse novel view synthesis by treating the problem as video completion and removing temporal order cues.
Stroke of Surprise: Progressive Semantic Illusions in Vector Sketching cs.CV · 2026-02-12 · unverdicted · none · ref 75 · internal anchor
Stroke of Surprise is a framework that generates vector sketches undergoing semantic transformation from one concept to another by adding strokes, using dual-branch SDS and overlay loss for optimization.
CoMoVi: Co-Generation of 3D Human Motions and Realistic Videos cs.CV · 2026-01-15 · unverdicted · none · ref 53 · internal anchor
CoMoVi co-generates 3D human motions and 2D videos synchronously in a single diffusion denoising loop using 3D-to-2D projection and dual-branch diffusion with 3D-2D cross attentions.
Affostruction: 3D Affordance Grounding with Generative Reconstruction cs.CV · 2026-01-14 · unverdicted · none · ref 24 · internal anchor
Affostruction reconstructs full 3D object geometry from partial RGBD views and grounds text-based affordances on both visible and unobserved surfaces, reporting large gains over prior methods.
DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation cs.CV · 2023-09-28 · unverdicted · none · ref 119 · internal anchor
DreamGaussian creates high-quality textured 3D meshes from single-view images in 2 minutes via generative Gaussian Splatting with mesh extraction and UV refinement.
GeoEdit: Geometry-Aware Object Editing via Dual-Branch Denoising cs.CV · 2026-06-29 · unverdicted · none · ref 28 · internal anchor
GeoEdit introduces a Lift-Manipulate-Render-Denoise pipeline with dual-branch denoising and variance-homogeneous injection for 3D-consistent object editing in single photos.
GeoFace: Consistent Multi-View Face Generation with Geometry-Constrained Diffusion cs.CV · 2026-06-26 · unverdicted · none · ref 42 · internal anchor
GeoFace generates consistent multi-view face images and 3D geometry from one input via a dual-stream diffusion framework with geometry-guided attention alignment.
Stream3D: Sequential Multi-View 3D Generation via Evidential Memory cs.CV · 2026-05-20 · unverdicted · none · ref 42 · 2 links · internal anchor
Stream3D is a training-free method that maintains a fixed-size evidential memory of past frames to convert frozen view-conditioned 3D generators into consistent streaming generators.
GeoFlow: Enforcing Implicit Geometric Consistency in Video Generation cs.CV · 2026-05-18 · unverdicted · none · ref 47 · internal anchor
GeoFlow adds a geometry-consistency reward based on rigid camera flow and object appearance preservation, integrated via reinforcement fine-tuning to improve geometric coherence in video generation.
R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow cs.CV · 2026-05-13 · unverdicted · none · ref 99 · 2 links · internal anchor
R-DMesh proposes a VAE-based disentanglement of base mesh, motion trajectories, and rectification offset plus Triflow Attention and rectified-flow diffusion to produce 4D meshes aligned to video despite initial pose mismatch.
REVIVE 3D: Refinement via Encoded Voluminous Inflated prior for Volume Enhancement cs.CV · 2026-04-30 · unverdicted · none · ref 26 · internal anchor
REVIVE 3D generates voluminous 3D assets from flat 2D images via an inflated prior construction followed by latent-space refinement, plus new metrics for volume and flatness validated by user study.
FurnSet: Exploiting Repeats for 3D Scene Reconstruction cs.CV · 2026-04-22 · unverdicted · none · ref 28 · internal anchor
FurnSet improves single-view 3D scene reconstruction by using per-object CLS tokens and set-aware self-attention to group and jointly reconstruct repeated object instances, with added scene-object conditioning and layout optimization.
Camera Control for Text-to-Image Generation via Learning Viewpoint Tokens cs.CV · 2026-04-21 · unverdicted · none · ref 18 · internal anchor
Viewpoint tokens learned on a mixed 3D-rendered and photorealistic dataset enable precise camera control in text-to-image generation while factorizing geometry from appearance and transferring to unseen object categories.
Repurposing 3D Generative Model for Autoregressive Layout Generation cs.CV · 2026-04-17 · unverdicted · none · ref 58 · internal anchor
LaviGen turns 3D generative models into an autoregressive layout generator that models geometric and physical constraints, delivering 19% higher physical plausibility and 65% faster inference on the LayoutVLM benchmark.
ReplicateAnyScene: Zero-Shot Video-to-3D Composition via Textual-Visual-Spatial Alignment cs.CV · 2026-04-12 · unverdicted · none · ref 32 · internal anchor
ReplicateAnyScene performs fully automated zero-shot video-to-compositional-3D reconstruction by cascading alignments of generic priors from vision foundation models across textual, visual, and spatial dimensions.
SegviGen: Repurposing 3D Generative Model for Part Segmentation cs.CV · 2026-03-17 · unverdicted · none · ref 40 · internal anchor
SegviGen shows pretrained 3D generative models can be repurposed for part segmentation via voxel colorization, beating prior methods by 40% interactively and 15% on full segmentation using only 0.32% of labeled data.
Scaling Sequence-to-Sequence Generative Neural Rendering cs.CV · 2025-10-05 · unverdicted · none · ref 8 · internal anchor
Kaleido is a masked autoregressive generative model that unifies 3D view synthesis and video modeling by pre-training a single transformer on video data, achieving SOTA zero-shot and many-view performance on view synthesis benchmarks.
BulletGen: Improving 4D Reconstruction with Bullet-Time Generation cs.GR · 2025-06-23 · unverdicted · none · ref 47 · internal anchor
BulletGen enhances 4D dynamic scene reconstruction from monocular videos by supervising Gaussian optimization with diffusion-generated frames aligned at a bullet-time step, achieving SOTA on novel-view synthesis and tracking.
InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models cs.CV · 2024-04-10 · unverdicted · none · ref 24 · internal anchor
InstantMesh produces diverse, high-quality 3D meshes from single images in seconds by combining a multi-view diffusion model with a sparse-view large reconstruction model and optimizing directly on meshes.
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets cs.CV · 2023-11-25 · conditional · none · ref 58 · internal anchor
Stable Video Diffusion scales latent video diffusion models via text-to-image pretraining, video pretraining on curated data, and high-quality finetuning to produce competitive text-to-video and image-to-video results while enabling motion LoRA and multi-view 3D applications.
VolFill: Single-View Amodal 3D Scene Reconstruction with Volumetric Flow Matching cs.CV · 2026-05-29 · unverdicted · none · ref 48 · internal anchor
VolFill uses a hybrid 3D VAE to compress sparse truncated unsigned distance function grids into latent space and a latent Diffusion Transformer to denoise complete scenes, conditioned on geometry foundation models, outperforming baselines on SCRREAM and NRGB-D datasets.
Efficient 3D Content Reconstruction and Generation cs.CV · 2026-05-18 · unverdicted · none · ref 144 · internal anchor
Presents Instant3D for rapid text/image-to-3D generation via multi-view diffusion plus feed-forward reconstruction, and FastMap for 10x faster structure-from-motion with comparable accuracy.
DreamEdit3D: Personalization of Multi-View Diffusion Models for 3D Editing cs.CV · 2026-05-16 · unverdicted · none · ref 24 · internal anchor
DreamEdit3D learns separate token embeddings for segmented object components via two-phase multi-view optimization to enable text-guided 3D editing with consistent image generation and mesh reconstruction.
DecoRec: Decomposed 3D Scene Reconstruction from Single-View Images via Object-Level Diffusion cs.CV · 2026-05-16 · unverdicted · none · ref 10 · internal anchor
DecoRec decomposes single-view 3D scene reconstruction into per-object diffusion reconstructions followed by a differentiable rendering and diffusion-guided merging pipeline.
Pose-Aware Diffusion for 3D Generation cs.CV · 2026-05-01 · unverdicted · none · ref 25 · internal anchor
PAD synthesizes 3D geometry in observation space via depth unprojection as anchor to eliminate pose ambiguity in image-to-3D generation.
DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation cs.CV · 2025-09-09 · unverdicted · none · ref 18 · internal anchor
LGAA is a modular adapter framework that lifts multi-view diffusion models to produce 2D Gaussian Splats with PBR channels for high-quality relightable 3D mesh extraction using data-efficient finetuning on 69k instances.
KFC-W: Generating 3D-Consistent Videos from Unposed Internet Photos cs.CV · 2024-11-20 · unverdicted · none · ref 42 · internal anchor
KFC-W is a self-supervised 3D-aware video model trained on videos and multiview internet photos that produces geometrically consistent interpolations between unposed input images without any 3D annotations.
Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model cs.CV · 2023-10-23 · unverdicted · none · ref 13 · internal anchor
Zero123++ produces high-quality 3D-consistent multi-view images from a single input by fine-tuning Stable Diffusion with targeted conditioning and training methods.
Landscape-Awareness for Geometric View Diffusion Model cs.CV · 2026-05-19 · unverdicted · none · ref 29 · internal anchor
A score-based method is introduced to guide optimization in geometric view diffusion models toward correct viewpoints, improving convergence and sample efficiency over naive multistart strategies.
AnimateAnyMesh++: A Flexible 4D Foundation Model for High-Fidelity Text-Driven Mesh Animation cs.CV · 2026-04-29 · unverdicted · none · ref 37 · internal anchor
AnimateAnyMesh++ animates arbitrary 3D meshes from text using an expanded 300K-identity DyMesh-XL dataset, a power-law topology-aware DyMeshVAE-Flex, and a variable-length rectified-flow generator to produce semantically accurate, temporally coherent animations in seconds.
Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material cs.CV · 2025-06-18 · unverdicted · none · ref 21 · internal anchor
Hunyuan3D 2.1 is a two-part system with DiT for shape generation and Paint for texture synthesis that produces high-fidelity 3D assets with PBR materials.
Rays as Pixels: Learning A Joint Distribution of Videos and Camera Trajectories cs.CV · 2026-04-10 · unreviewed · ref 12 · internal anchor
