hub Canonical reference

Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model

Ruoxi Shi, Hansheng Chen, Zhuoyang Zhang, Minghua Liu, Chao Xu, Xinyue Wei · 2023 · cs.CV · arXiv 2310.15110

Canonical reference. 88% of citing Pith papers cite this work as background.

31 Pith papers citing it

Background 88% of classified citations

open full Pith review browse 31 citing papers arXiv PDF

abstract

We report Zero123++, an image-conditioned diffusion model for generating 3D-consistent multi-view images from a single input view. To take full advantage of pretrained 2D generative priors, we develop various conditioning and training schemes to minimize the effort of finetuning from off-the-shelf image diffusion models such as Stable Diffusion. Zero123++ excels in producing high-quality, consistent multi-view images from a single image, overcoming common issues like texture degradation and geometric misalignment. Furthermore, we showcase the feasibility of training a ControlNet on Zero123++ for enhanced control over the generation process. The code is available at https://github.com/SUDO-AI-3D/zero123plus.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 7 method 1

citation-polarity summary

background 7 use method 1

representative citing papers

Stream3D: Sequential Multi-View 3D Generation via Evidential Memory

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

Stream3D is a training-free method that maintains temporal consistency in 3D generation from monocular streams by dynamically caching a fixed number of informative historical frames using an evidence score.

R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow

cs.CV · 2026-05-13 · unverdicted · novelty 7.0 · 2 refs

R-DMesh generates high-fidelity 4D meshes aligned to video by disentangling base mesh, motion, and a learned rectification jump offset inside a VAE, then using Triflow Attention and rectified-flow diffusion.

UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models

cs.CV · 2026-04-19 · unverdicted · novelty 7.0 · 2 refs

UniGeo unifies geometric guidance across three levels in video models to reduce geometric drift and improve consistency in camera-controllable image editing.

Geometrically Consistent Multi-View Scene Generation from Freehand Sketches

cs.CV · 2026-04-15 · unverdicted · novelty 7.0

A framework generates consistent multi-view scenes from one freehand sketch via a ~9k-sample dataset, Parallel Camera-Aware Attention Adapters, and Sparse Correspondence Supervision Loss, outperforming baselines in realism and consistency.

Novel View Synthesis as Video Completion

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

Video diffusion models can be adapted into permutation-invariant generators for sparse novel view synthesis by treating the problem as video completion and removing temporal order cues.

SVG360: Editable Multiview Vector Graphics from a Single SVG

cs.CV · 2025-11-20 · unverdicted · novelty 7.0

SVG360 lifts a single SVG to a view-conditioned representation, uses spatial memory to propagate consistent parts across views, and applies structure-aware vectorization to produce editable multiview SVGs.

PacTure: Efficient PBR Texture Generation on Packed Views with Visual Autoregressive Models

cs.CV · 2025-05-28 · unverdicted · novelty 7.0

PacTure uses view packing and next-scale autoregressive prediction to generate consistent multi-view PBR textures faster than prior sequential or cross-attention methods.

Materialist: Physically Based Editing Using Single-Image Inverse Rendering

cs.CV · 2025-01-07 · unverdicted · novelty 7.0

Materialist performs single-image inverse rendering via neural-initialized progressive differentiable rendering to enable physically consistent material editing, object insertion, relighting, and transparency edits without full scene geometry.

GeoFlow: Enforcing Implicit Geometric Consistency in Video Generation

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

GeoFlow adds a geometry-consistency reward based on rigid camera flow and object appearance preservation, integrated via reinforcement fine-tuning to improve geometric coherence in video generation.

GeoQuery: Geometry-Query Diffusion for Sparse-View Reconstruction

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

GeoQuery replaces corrupted rendering features with geometry-aligned proxy queries and restricts cross-view attention to local windows, enabling robust diffusion-based refinement under extreme view sparsity.

Generative 3D Gaussians with Learned Density Control

cs.GR · 2026-05-08 · unverdicted · novelty 6.0

DeG models 3D Gaussians via learned octree density and uses VecSeq Sobol re-indexing to turn set generation into sequence modeling, claiming SOTA quality in single-image-to-3D.

PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

PhysForge generates physics-grounded 3D assets via a VLM-planned Hierarchical Physical Blueprint and a KineVoxel Injection diffusion model, backed by the new PhysDB dataset of 150,000 annotated assets.

Stylistic Attribute Control in Latent Diffusion Models

cs.CV · 2026-05-04 · unverdicted · novelty 6.0

A technique for parametric stylistic control in latent diffusion models learns disentangled directions from synthetic datasets and applies them via guidance composition while preserving semantics.

Sparse-View 3D Gaussian Splatting in the Wild

cs.CV · 2026-04-30 · unverdicted · novelty 6.0

A new sparse-view 3D Gaussian splatting method for unconstrained scenes with distractors combines diffusion-based reference-guided refinement and sparsity-aware Gaussian replication to achieve better rendering quality.

Camera Control for Text-to-Image Generation via Learning Viewpoint Tokens

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

Viewpoint tokens learned on a mixed 3D-rendered and photorealistic dataset enable precise camera control in text-to-image generation while factorizing geometry from appearance and transferring to unseen object categories.

Any3DAvatar: Fast and High-Quality Full-Head 3D Avatar Reconstruction from Single Portrait Image

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

Any3DAvatar reconstructs full-head 3D Gaussian avatars from one image via one-step denoising on a Plücker-aware scaffold plus auxiliary view supervision, beating prior single-image methods on fidelity while running substantially faster.

SIC3D: Style Image Conditioned Text-to-3D Gaussian Splatting Generation

cs.CV · 2026-04-09 · unverdicted · novelty 6.0

SIC3D generates text-to-3D objects with Gaussian splatting then stylizes them using Variational Stylized Score Distillation loss plus scaling regularization to improve style match and geometry fidelity.

Realiz3D: 3D Generation Made Photorealistic via Domain-Aware Learning

cs.GR · 2026-03-25 · conditional · novelty 6.0

Realiz3D decouples visual domain from 3D controls in diffusion models via domain-aware residual adapters to enable photorealistic controllable generation.

Scaling Sequence-to-Sequence Generative Neural Rendering

cs.CV · 2025-10-05 · unverdicted · novelty 6.0

Kaleido is a masked autoregressive generative model that unifies 3D view synthesis and video modeling by pre-training a single transformer on video data, achieving SOTA zero-shot and many-view performance on view synthesis benchmarks.

TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models

cs.CV · 2025-02-10 · unverdicted · novelty 6.0

TripoSG generates high-fidelity 3D meshes from input images via a large-scale rectified flow transformer and hybrid-trained 3D VAE on a custom 2-million-sample dataset, claiming state-of-the-art fidelity and generalization.

CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation

cs.CV · 2024-06-04 · unverdicted · novelty 6.0

CamCo equips image-to-video generators with Plücker-coordinate camera inputs and epipolar attention to improve 3D consistency and camera controllability.

Evaluating Real-World Robot Manipulation Policies in Simulation

cs.RO · 2024-05-09 · conditional · novelty 6.0

SIMPLER simulated environments yield policy performance that correlates strongly with real-world robot manipulation results and captures similar sensitivity to distribution shifts.

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models

cs.CV · 2024-04-10 · unverdicted · novelty 6.0

InstantMesh produces diverse, high-quality 3D meshes from single images in seconds by combining a multi-view diffusion model with a sparse-view large reconstruction model and optimizing directly on meshes.

DecoRec: Decomposed 3D Scene Reconstruction from Single-View Images via Object-Level Diffusion

cs.CV · 2026-05-16 · unverdicted · novelty 5.0

DecoRec decomposes single-view 3D scene reconstruction into per-object diffusion reconstructions followed by a differentiable rendering and diffusion-guided merging pipeline.

citing papers explorer

Showing 31 of 31 citing papers.

Stream3D: Sequential Multi-View 3D Generation via Evidential Memory cs.CV · 2026-05-20 · unverdicted · none · ref 56 · internal anchor
Stream3D is a training-free method that maintains temporal consistency in 3D generation from monocular streams by dynamically caching a fixed number of informative historical frames using an evidence score.
R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow cs.CV · 2026-05-13 · unverdicted · none · ref 129 · 2 links · internal anchor
R-DMesh generates high-fidelity 4D meshes aligned to video by disentangling base mesh, motion, and a learned rectification jump offset inside a VAE, then using Triflow Attention and rectified-flow diffusion.
UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models cs.CV · 2026-04-19 · unverdicted · none · ref 61 · 2 links · internal anchor
UniGeo unifies geometric guidance across three levels in video models to reduce geometric drift and improve consistency in camera-controllable image editing.
Geometrically Consistent Multi-View Scene Generation from Freehand Sketches cs.CV · 2026-04-15 · unverdicted · none · ref 40 · internal anchor
A framework generates consistent multi-view scenes from one freehand sketch via a ~9k-sample dataset, Parallel Camera-Aware Attention Adapters, and Sparse Correspondence Supervision Loss, outperforming baselines in realism and consistency.
Novel View Synthesis as Video Completion cs.CV · 2026-04-09 · unverdicted · none · ref 34 · internal anchor
Video diffusion models can be adapted into permutation-invariant generators for sparse novel view synthesis by treating the problem as video completion and removing temporal order cues.
SVG360: Editable Multiview Vector Graphics from a Single SVG cs.CV · 2025-11-20 · unverdicted · none · ref 34 · internal anchor
SVG360 lifts a single SVG to a view-conditioned representation, uses spatial memory to propagate consistent parts across views, and applies structure-aware vectorization to produce editable multiview SVGs.
PacTure: Efficient PBR Texture Generation on Packed Views with Visual Autoregressive Models cs.CV · 2025-05-28 · unverdicted · none · ref 75 · internal anchor
PacTure uses view packing and next-scale autoregressive prediction to generate consistent multi-view PBR textures faster than prior sequential or cross-attention methods.
Materialist: Physically Based Editing Using Single-Image Inverse Rendering cs.CV · 2025-01-07 · unverdicted · none · ref 58 · internal anchor
Materialist performs single-image inverse rendering via neural-initialized progressive differentiable rendering to enable physically consistent material editing, object insertion, relighting, and transparency edits without full scene geometry.
GeoFlow: Enforcing Implicit Geometric Consistency in Video Generation cs.CV · 2026-05-18 · unverdicted · none · ref 60 · internal anchor
GeoFlow adds a geometry-consistency reward based on rigid camera flow and object appearance preservation, integrated via reinforcement fine-tuning to improve geometric coherence in video generation.
GeoQuery: Geometry-Query Diffusion for Sparse-View Reconstruction cs.CV · 2026-05-12 · unverdicted · none · ref 43 · internal anchor
GeoQuery replaces corrupted rendering features with geometry-aligned proxy queries and restricts cross-view attention to local windows, enabling robust diffusion-based refinement under extreme view sparsity.
Generative 3D Gaussians with Learned Density Control cs.GR · 2026-05-08 · unverdicted · none · ref 46 · internal anchor
DeG models 3D Gaussians via learned octree density and uses VecSeq Sobol re-indexing to turn set generation into sequence modeling, claiming SOTA quality in single-image-to-3D.
PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World cs.CV · 2026-05-06 · unverdicted · none · ref 16 · internal anchor
PhysForge generates physics-grounded 3D assets via a VLM-planned Hierarchical Physical Blueprint and a KineVoxel Injection diffusion model, backed by the new PhysDB dataset of 150,000 annotated assets.
Stylistic Attribute Control in Latent Diffusion Models cs.CV · 2026-05-04 · unverdicted · none · ref 20 · internal anchor
A technique for parametric stylistic control in latent diffusion models learns disentangled directions from synthetic datasets and applies them via guidance composition while preserving semantics.
Sparse-View 3D Gaussian Splatting in the Wild cs.CV · 2026-04-30 · unverdicted · none · ref 50 · internal anchor
A new sparse-view 3D Gaussian splatting method for unconstrained scenes with distractors combines diffusion-based reference-guided refinement and sparsity-aware Gaussian replication to achieve better rendering quality.
Camera Control for Text-to-Image Generation via Learning Viewpoint Tokens cs.CV · 2026-04-21 · unverdicted · none · ref 35 · internal anchor
Viewpoint tokens learned on a mixed 3D-rendered and photorealistic dataset enable precise camera control in text-to-image generation while factorizing geometry from appearance and transferring to unseen object categories.
Any3DAvatar: Fast and High-Quality Full-Head 3D Avatar Reconstruction from Single Portrait Image cs.CV · 2026-04-15 · unverdicted · none · ref 36 · internal anchor
Any3DAvatar reconstructs full-head 3D Gaussian avatars from one image via one-step denoising on a Plücker-aware scaffold plus auxiliary view supervision, beating prior single-image methods on fidelity while running substantially faster.
SIC3D: Style Image Conditioned Text-to-3D Gaussian Splatting Generation cs.CV · 2026-04-09 · unverdicted · none · ref 28 · internal anchor
SIC3D generates text-to-3D objects with Gaussian splatting then stylizes them using Variational Stylized Score Distillation loss plus scaling regularization to improve style match and geometry fidelity.
Realiz3D: 3D Generation Made Photorealistic via Domain-Aware Learning cs.GR · 2026-03-25 · conditional · none · ref 31 · internal anchor
Realiz3D decouples visual domain from 3D controls in diffusion models via domain-aware residual adapters to enable photorealistic controllable generation.
Scaling Sequence-to-Sequence Generative Neural Rendering cs.CV · 2025-10-05 · unverdicted · none · ref 14 · internal anchor
Kaleido is a masked autoregressive generative model that unifies 3D view synthesis and video modeling by pre-training a single transformer on video data, achieving SOTA zero-shot and many-view performance on view synthesis benchmarks.
TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models cs.CV · 2025-02-10 · unverdicted · none · ref 59 · internal anchor
TripoSG generates high-fidelity 3D meshes from input images via a large-scale rectified flow transformer and hybrid-trained 3D VAE on a custom 2-million-sample dataset, claiming state-of-the-art fidelity and generalization.
CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation cs.CV · 2024-06-04 · unverdicted · none · ref 43 · internal anchor
CamCo equips image-to-video generators with Plücker-coordinate camera inputs and epipolar attention to improve 3D consistency and camera controllability.
Evaluating Real-World Robot Manipulation Policies in Simulation cs.RO · 2024-05-09 · conditional · none · ref 61 · internal anchor
SIMPLER simulated environments yield policy performance that correlates strongly with real-world robot manipulation results and captures similar sensitivity to distribution shifts.
InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models cs.CV · 2024-04-10 · unverdicted · none · ref 41 · internal anchor
InstantMesh produces diverse, high-quality 3D meshes from single images in seconds by combining a multi-view diffusion model with a sparse-view large reconstruction model and optimizing directly on meshes.
DecoRec: Decomposed 3D Scene Reconstruction from Single-View Images via Object-Level Diffusion cs.CV · 2026-05-16 · unverdicted · none · ref 12 · internal anchor
DecoRec decomposes single-view 3D scene reconstruction into per-object diffusion reconstructions followed by a differentiable rendering and diffusion-guided merging pipeline.
DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation cs.CV · 2025-09-09 · unverdicted · none · ref 16 · internal anchor
LGAA is a modular adapter framework that lifts multi-view diffusion models to produce 2D Gaussian Splats with PBR channels for high-quality relightable 3D mesh extraction using data-efficient finetuning on 69k instances.
Sketch2MinSurf: Vision-Language Guided Generation of Editable Minimal Surfaces from Hand-Drawn Sketches cs.CV · 2026-05-20 · unverdicted · none · ref 33 · internal anchor
Hybrid vision-language and geometric optimization framework generates editable minimal surfaces from sketches, reporting 0.844 topological similarity on 100 test sketches.
Landscape-Awareness for Geometric View Diffusion Model cs.CV · 2026-05-19 · unverdicted · none · ref 46 · internal anchor
A score-based method is introduced to guide optimization in geometric view diffusion models toward correct viewpoints, improving convergence and sample efficiency over naive multistart strategies.
AnimateAnyMesh++: A Flexible 4D Foundation Model for High-Fidelity Text-Driven Mesh Animation cs.CV · 2026-04-29 · unverdicted · none · ref 38 · internal anchor
AnimateAnyMesh++ animates arbitrary 3D meshes from text using an expanded 300K-identity DyMesh-XL dataset, a power-law topology-aware DyMeshVAE-Flex, and a variable-length rectified-flow generator to produce semantically accurate, temporally coherent animations in seconds.
Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details cs.CV · 2025-06-19 · unverdicted · none · ref 12 · internal anchor
Hunyuan3D 2.5's LATTICE model with 10B parameters generates detailed 3D shapes from images and uses multi-view PBR for textures, outperforming prior methods in fidelity and mesh quality.
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation cs.CV · 2025-01-21 · unverdicted · none · ref 78 · internal anchor
Hunyuan3D 2.0 scales flow-based diffusion transformers and texture synthesis models to generate high-resolution textured 3D assets that outperform prior state-of-the-art in geometry, alignment, and texture quality.
Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material cs.CV · 2025-06-18 · unverdicted · none · ref 22 · internal anchor
Hunyuan3D 2.1 is a two-part system with DiT for shape generation and Paint for texture synthesis that produces high-fidelity 3D assets with PBR materials.

Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer