hub Canonical reference

ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation

2023 · arXiv 2312.02201

Canonical reference. 100% of citing Pith papers cite this work as background.

20 Pith papers citing it

Background 100% of classified citations

citation-role summary

background 7

citation-polarity summary

background 7

representative citing papers

Variance Reduction on the Camera Axis: Multi-View Score Distillation for 3D

cs.CV · 2026-06-29 · unverdicted · novelty 7.0

MV-SDI aggregates K-view gradients per step via accumulation and antithetic pairs at fixed UNet budget, raising CLIP R-Precision from 74.8% to 83.8% (K=2) and halving steps while keeping the 2D prior frozen.

Functionalization via Structure Completion and Motion Rectification

cs.CV · 2026-05-18 · unverdicted · novelty 7.0

Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture with a new paired dataset.

HiFiVe: High-Fidelity Vehicle Generation Leveraging Auto-Regressive 2D Generative Priors

cs.CV · 2026-06-24 · unverdicted · novelty 6.0

HiFiVe is a training-free framework using an auto-regressive texture refinement pipeline with depth-based warping, multi-view fusion, and symmetry to enhance both texture and geometry fidelity in vehicle generation from 2D priors.

Velox: Learning Representations of 4D Geometry and Appearance

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

Velox compresses dynamic point clouds into latent tokens that support geometry via 4D surface modeling and appearance via 3D Gaussians, showing strong results on video-to-4D generation, tracking, and image-to-4D cloth simulation.

REVIVE 3D: Refinement via Encoded Voluminous Inflated prior for Volume Enhancement

cs.CV · 2026-04-30 · unverdicted · novelty 6.0

REVIVE 3D generates voluminous 3D assets from flat 2D images via an inflated prior construction followed by latent-space refinement, plus new metrics for volume and flatness validated by user study.

Any3DAvatar: Fast and High-Quality Full-Head 3D Avatar Reconstruction from Single Portrait Image

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

Any3DAvatar reconstructs full-head 3D Gaussian avatars from one image via one-step denoising on a Plücker-aware scaffold plus auxiliary view supervision, beating prior single-image methods on fidelity while running substantially faster.

TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models

cs.CV · 2025-02-10 · unverdicted · novelty 6.0

TripoSG generates high-fidelity 3D meshes from input images via a large-scale rectified flow transformer and hybrid-trained 3D VAE on a custom 2-million-sample dataset, claiming state-of-the-art fidelity and generalization.

CAT3D: Create Anything in 3D with Multi-View Diffusion Models

cs.CV · 2024-05-16 · conditional · novelty 6.0

A multi-view diffusion model generates consistent novel views from sparse images to enable fast 3D scene reconstruction.

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models

cs.CV · 2024-04-10 · unverdicted · novelty 6.0

InstantMesh produces diverse, high-quality 3D meshes from single images in seconds by combining a multi-view diffusion model with a sparse-view large reconstruction model and optimizing directly on meshes.

DreamEdit3D: Personalization of Multi-View Diffusion Models for 3D Editing

cs.CV · 2026-05-16 · unverdicted · novelty 5.0

DreamEdit3D learns separate token embeddings for segmented object components via two-phase multi-view optimization to enable text-guided 3D editing with consistent image generation and mesh reconstruction.

DecoRec: Decomposed 3D Scene Reconstruction from Single-View Images via Object-Level Diffusion

cs.CV · 2026-05-16 · unverdicted · novelty 5.0

DecoRec decomposes single-view 3D scene reconstruction into per-object diffusion reconstructions followed by a differentiable rendering and diffusion-guided merging pipeline.

Pose-Aware Diffusion for 3D Generation

cs.CV · 2026-05-01 · unverdicted · novelty 5.0

PAD synthesizes 3D geometry in observation space via depth unprojection as anchor to eliminate pose ambiguity in image-to-3D generation.

Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation

cs.CV · 2026-04-20 · unverdicted · novelty 5.0

Asset Harvester converts sparse in-the-wild object observations from AV driving logs into complete simulation-ready 3D assets via data curation, geometry-aware preprocessing, and a SparseViewDiT model that couples sparse-view multiview generation with 3D Gaussian lifting.

DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation

cs.CV · 2025-09-09 · unverdicted · novelty 5.0

LGAA is a modular adapter framework that lifts multi-view diffusion models to produce 2D Gaussian Splats with PBR channels for high-quality relightable 3D mesh extraction using data-efficient finetuning on 69k instances.

Qwen-Image Technical Report

cs.CV · 2025-08-04 · unverdicted · novelty 5.0

Qwen-Image is a foundation model that reaches state-of-the-art results in image generation and editing by combining a large-scale text-focused data pipeline with curriculum learning and dual semantic-reconstructive encoding for editing consistency.

3DCarGen: Scalable 3D Car Generation via 3D-consistent Multi-view Synthesis

cs.CV · 2026-06-23 · unverdicted · novelty 4.0

3DCarGen synthesizes 3D-consistent multi-view images from one input photo, builds a coarse 3D Gaussian representation, then generates arbitrary views and recovers detailed meshes with color-normal optimization for real-world car images.

AnimateAnyMesh++: A Flexible 4D Foundation Model for High-Fidelity Text-Driven Mesh Animation

cs.CV · 2026-04-29 · unverdicted · novelty 4.0

AnimateAnyMesh++ animates arbitrary 3D meshes from text using an expanded 300K-identity DyMesh-XL dataset, a power-law topology-aware DyMeshVAE-Flex, and a variable-length rectified-flow generator to produce semantically accurate, temporally coherent animations in seconds.

Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details

cs.CV · 2025-06-19 · unverdicted · novelty 4.0

Hunyuan3D 2.5's LATTICE model with 10B parameters generates detailed 3D shapes from images and uses multi-view PBR for textures, outperforming prior methods in fidelity and mesh quality.

Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

cs.CV · 2025-01-21 · unverdicted · novelty 4.0

Hunyuan3D 2.0 scales flow-based diffusion transformers and texture synthesis models to generate high-resolution textured 3D assets that outperform prior state-of-the-art in geometry, alignment, and texture quality.

R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow

cs.CV · 2026-05-13 · 2 refs

citing papers explorer

Showing 20 of 20 citing papers.

Variance Reduction on the Camera Axis: Multi-View Score Distillation for 3D · cs.CV · 2026-06-29 · unverdicted · none · ref 43
Functionalization via Structure Completion and Motion Rectification · cs.CV · 2026-05-18 · unverdicted · none · ref 50
HiFiVe: High-Fidelity Vehicle Generation Leveraging Auto-Regressive 2D Generative Priors · cs.CV · 2026-06-24 · unverdicted · none · ref 33
Velox: Learning Representations of 4D Geometry and Appearance · cs.CV · 2026-05-06 · unverdicted · none · ref 93
REVIVE 3D: Refinement via Encoded Voluminous Inflated prior for Volume Enhancement · cs.CV · 2026-04-30 · unverdicted · none · ref 52
Any3DAvatar: Fast and High-Quality Full-Head 3D Avatar Reconstruction from Single Portrait Image · cs.CV · 2026-04-15 · unverdicted · none · ref 42
TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models · cs.CV · 2025-02-10 · unverdicted · none · ref 188
CAT3D: Create Anything in 3D with Multi-View Diffusion Models · cs.CV · 2024-05-16 · conditional · none · ref 9
InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models · cs.CV · 2024-04-10 · unverdicted · none · ref 50
DreamEdit3D: Personalization of Multi-View Diffusion Models for 3D Editing · cs.CV · 2026-05-16 · unverdicted · none · ref 43
DecoRec: Decomposed 3D Scene Reconstruction from Single-View Images via Object-Level Diffusion · cs.CV · 2026-05-16 · unverdicted · none · ref 72
Pose-Aware Diffusion for 3D Generation · cs.CV · 2026-05-01 · unverdicted · none · ref 46
Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation · cs.CV · 2026-04-20 · unverdicted · none · ref 41
DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation · cs.CV · 2025-09-09 · unverdicted · none · ref 77
Qwen-Image Technical Report · cs.CV · 2025-08-04 · unverdicted · none · ref 28
3DCarGen: Scalable 3D Car Generation via 3D-consistent Multi-view Synthesis · cs.CV · 2026-06-23 · unverdicted · none · ref 23
AnimateAnyMesh++: A Flexible 4D Foundation Model for High-Fidelity Text-Driven Mesh Animation · cs.CV · 2026-04-29 · unverdicted · none · ref 36
Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details · cs.CV · 2025-06-19 · unverdicted · none · ref 14
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation · cs.CV · 2025-01-21 · unverdicted · none · ref 94
R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow · cs.CV · 2026-05-13 · unreviewed · ref 109 · 2 links
