R-DMesh generates high-fidelity 4D meshes aligned to video by disentangling base mesh, motion, and a learned rectification jump offset inside a VAE, then using Triflow Attention and rectified-flow diffusion.
hub Canonical reference
Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model
Canonical reference. 88% of citing Pith papers cite this work as background.
abstract
We report Zero123++, an image-conditioned diffusion model for generating 3D-consistent multi-view images from a single input view. To take full advantage of pretrained 2D generative priors, we develop various conditioning and training schemes to minimize the effort of finetuning from off-the-shelf image diffusion models such as Stable Diffusion. Zero123++ excels in producing high-quality, consistent multi-view images from a single image, overcoming common issues like texture degradation and geometric misalignment. Furthermore, we showcase the feasibility of training a ControlNet on Zero123++ for enhanced control over the generation process. The code is available at https://github.com/SUDO-AI-3D/zero123plus.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
UniGeo unifies geometric guidance across three levels in video models to reduce geometric drift and improve consistency in camera-controllable image editing.
A framework generates consistent multi-view scenes from one freehand sketch via a ~9k-sample dataset, Parallel Camera-Aware Attention Adapters, and Sparse Correspondence Supervision Loss, outperforming baselines in realism and consistency.
Video diffusion models can be adapted into permutation-invariant generators for sparse novel view synthesis by treating the problem as video completion and removing temporal order cues.
SVG360 lifts a single SVG to a view-conditioned representation, uses spatial memory to propagate consistent parts across views, and applies structure-aware vectorization to produce editable multiview SVGs.
PacTure uses view packing and next-scale autoregressive prediction to generate consistent multi-view PBR textures faster than prior sequential or cross-attention methods.
Materialist performs single-image inverse rendering via neural-initialized progressive differentiable rendering to enable physically consistent material editing, object insertion, relighting, and transparency edits without full scene geometry.
GeoFlow adds a geometry-consistency reward based on rigid camera flow and object appearance preservation, integrated via reinforcement fine-tuning to improve geometric coherence in video generation.
GeoQuery replaces corrupted rendering features with geometry-aligned proxy queries and restricts cross-view attention to local windows, enabling robust diffusion-based refinement under extreme view sparsity.
DeG models 3D Gaussians via learned octree density and uses VecSeq Sobol re-indexing to turn set generation into sequence modeling, claiming SOTA quality in single-image-to-3D.
PhysForge generates physics-grounded 3D assets via a VLM-planned Hierarchical Physical Blueprint and a KineVoxel Injection diffusion model, backed by the new PhysDB dataset of 150,000 annotated assets.
A technique for parametric stylistic control in latent diffusion models learns disentangled directions from synthetic datasets and applies them via guidance composition while preserving semantics.
A new sparse-view 3D Gaussian splatting method for unconstrained scenes with distractors combines diffusion-based reference-guided refinement and sparsity-aware Gaussian replication to achieve better rendering quality.
Viewpoint tokens learned on a mixed 3D-rendered and photorealistic dataset enable precise camera control in text-to-image generation while factorizing geometry from appearance and transferring to unseen object categories.
Any3DAvatar reconstructs full-head 3D Gaussian avatars from one image via one-step denoising on a Plücker-aware scaffold plus auxiliary view supervision, beating prior single-image methods on fidelity while running substantially faster.
SIC3D generates text-to-3D objects with Gaussian splatting then stylizes them using Variational Stylized Score Distillation loss plus scaling regularization to improve style match and geometry fidelity.
Realiz3D decouples visual domain from 3D controls in diffusion models via domain-aware residual adapters to enable photorealistic controllable generation.
Kaleido is a masked autoregressive generative model that unifies 3D view synthesis and video modeling by pre-training a single transformer on video data, achieving SOTA zero-shot and many-view performance on view synthesis benchmarks.
TripoSG generates high-fidelity 3D meshes from input images via a large-scale rectified flow transformer and hybrid-trained 3D VAE on a custom 2-million-sample dataset, claiming state-of-the-art fidelity and generalization.
CamCo equips image-to-video generators with Plücker-coordinate camera inputs and epipolar attention to improve 3D consistency and camera controllability.
SIMPLER simulated environments yield policy performance that correlates strongly with real-world robot manipulation results and captures similar sensitivity to distribution shifts.
InstantMesh produces diverse, high-quality 3D meshes from single images in seconds by combining a multi-view diffusion model with a sparse-view large reconstruction model and optimizing directly on meshes.
DecoRec decomposes single-view 3D scene reconstruction into per-object diffusion reconstructions followed by a differentiable rendering and diffusion-guided merging pipeline.
LGAA is a modular adapter framework that lifts multi-view diffusion models to produce 2D Gaussian Splats with PBR channels for high-quality relightable 3D mesh extraction using data-efficient finetuning on 69k instances.
citing papers explorer
-
InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
InstantMesh produces diverse, high-quality 3D meshes from single images in seconds by combining a multi-view diffusion model with a sparse-view large reconstruction model and optimizing directly on meshes.