pith. machine review for the scientific record.

arxiv: 2308.16512 · v4 · submitted 2023-08-31 · 💻 cs.CV

MVDream: Multi-view Diffusion for 3D Generation

Pith reviewed 2026-05-15 08:31 UTC · model grok-4.3

classification 💻 cs.CV
keywords multi-view diffusion · 3D generation · text-to-3D · score distillation sampling · consistent multi-view images · 3D prior · few-shot 3D learning

The pith

A multi-view diffusion model trained on both 2D and 3D data acts as a generalizable 3D prior that improves consistency in text-to-3D generation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MVDream, a diffusion model designed to produce consistent images across multiple viewpoints from a single text prompt. It achieves this by training jointly on 2D image-text pairs and 3D data, merging the wide coverage of standard 2D diffusion with the geometric coherence of rendered 3D views. This trained model functions as an implicit 3D prior that works independently of any particular 3D shape format. When plugged into score distillation sampling, it yields more stable and consistent 3D outputs than methods relying solely on 2D diffusion. The same model also supports personalizing new 3D concepts from a small number of 2D reference images.

Core claim

MVDream shows that a multi-view diffusion model learned from both 2D and 3D data is implicitly a generalizable 3D prior agnostic to 3D representations. Applied via Score Distillation Sampling, it markedly improves the consistency and stability of existing 2D-lifting approaches to 3D generation. It further enables few-shot concept learning from 2D examples for 3D output, analogous to DreamBooth but in the 3D setting.

What carries the argument

The multi-view diffusion model trained jointly on 2D and 3D data, which generates viewpoint-consistent images and thereby encodes an implicit 3D prior usable in score distillation sampling.
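
For orientation, the way such a prior enters score distillation sampling can be written out. This is the standard SDS gradient in the form introduced by DreamFusion, extended here with a camera argument for a set of V views. The notation (g, c, w(t), the noise schedule symbols) is an assumption made for illustration and is not taken from the paper, which may use a modified loss.

```latex
% Standard SDS gradient (DreamFusion form), written for a set of V views.
% \theta: parameters of the 3D representation; g: differentiable renderer;
% c = (c_1, \dots, c_V): camera poses; y: text prompt;
% \hat{\epsilon}_\phi: noise prediction of the multi-view diffusion model.
\begin{aligned}
x &= g(\theta, c), \qquad x_t = \alpha_t x + \sigma_t \epsilon, \quad \epsilon \sim \mathcal{N}(0, I), \\
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta)
  &= \mathbb{E}_{t,\,c,\,\epsilon}\!\left[ w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y, c, t) - \epsilon\bigr)\,\frac{\partial x}{\partial \theta} \right].
\end{aligned}
```

In this form the 3D representation appears only through the renderer g, which is one way to read the claim that the prior is agnostic to the representation.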

If this is right

  • Existing 2D-lifting pipelines for text-to-3D can be upgraded to higher consistency simply by swapping in the multi-view diffusion prior.
  • Few-shot personalization of 3D objects becomes feasible from ordinary 2D photographs without explicit 3D data.
  • The same prior can be used with any 3D representation because it does not depend on a specific geometry format.
  • Training cost for new 3D generators decreases because the model already supplies multi-view consistency.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be extended to video or dynamic scenes by adding temporal consistency as another training signal.
  • If the implicit prior holds across domains, similar joint 2D-3D training might improve consistency in other generative tasks such as novel-view synthesis.
  • Downstream applications could combine this prior with faster inference methods to make real-time 3D content creation more practical.

Load-bearing premise

Joint training on 2D and 3D data yields a prior that generalizes to new text prompts and shapes without overfitting to the specific training renderings or degrading single-view image quality.

What would settle it

A direct comparison showing that score distillation sampling with MVDream produces no measurable gain in multi-view consistency or output stability over standard 2D diffusion baselines on a fixed set of text prompts would falsify the central claim.
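
One way such a comparison could be scored is a paired test over per-prompt consistency values. The sketch below is a generic sign-flip permutation test and is not a procedure from the paper; the function name `paired_permutation_test`, the score lists, and the resample count are all hypothetical.

```python
import random
from statistics import mean


def paired_permutation_test(baseline, candidate, n_resamples=10000, seed=0):
    """Sign-flip permutation test on per-prompt score differences.

    baseline, candidate: consistency scores for the same prompts, in the
    same order (hypothetical inputs; any scalar metric works).
    Returns (mean_gain, two_sided_p_value).
    """
    if len(baseline) != len(candidate) or not baseline:
        raise ValueError("need two equal-length, non-empty score lists")
    diffs = [c - b for b, c in zip(baseline, candidate)]
    observed = mean(diffs)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the sign of each difference is arbitrary.
        flipped = mean(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) >= abs(observed) - 1e-12:
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    p_value = (hits + 1) / (n_resamples + 1)
    return observed, p_value
```

A mean gain near zero with a large p-value on a fixed prompt set would be the "no measurable gain" outcome described above.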

read the original abstract

We introduce MVDream, a diffusion model that is able to generate consistent multi-view images from a given text prompt. Learning from both 2D and 3D data, a multi-view diffusion model can achieve the generalizability of 2D diffusion models and the consistency of 3D renderings. We demonstrate that such a multi-view diffusion model is implicitly a generalizable 3D prior agnostic to 3D representations. It can be applied to 3D generation via Score Distillation Sampling, significantly enhancing the consistency and stability of existing 2D-lifting methods. It can also learn new concepts from a few 2D examples, akin to DreamBooth, but for 3D generation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces MVDream, a multi-view diffusion model trained jointly on 2D and 3D data to generate consistent multi-view images from text prompts. It claims this model functions as an implicit generalizable 3D prior agnostic to representations, which can be applied via Score Distillation Sampling (SDS) to enhance consistency and stability in existing 2D-lifting 3D generation methods, and extended to few-shot 3D concept learning akin to DreamBooth.

Significance. If validated, the work offers a practical bridge between the generalizability of 2D diffusion models and the multi-view consistency of 3D renderings, potentially improving text-to-3D pipelines without explicit 3D representations. The empirical demonstrations of SDS-based generation and few-shot adaptation provide concrete value for 3D content creation applications, though the strength depends on rigorous quantitative support for the transfer claims.

major comments (2)
  1. [§4.1] (3D Generation via SDS): The central claim of 'significantly enhancing the consistency and stability of existing 2D-lifting methods' lacks load-bearing quantitative evidence; no metrics (e.g., multi-view consistency scores, CLIP similarity across views, or direct comparisons to DreamFusion baselines) or ablation isolating the multi-view prior's contribution are reported, leaving the enhancement unverified.
  2. [§5] (Few-shot 3D Concept Learning): The generalizability of the 3D prior to novel text prompts and shapes rests on the untested assumption that joint 2D+3D training avoids overfitting to the specific 3D training renderings; no ablation on 3D data contribution, no diversity statistics for the 3D corpus, and no out-of-distribution shape/prompt tests are provided to support the transfer claim.
minor comments (2)
  1. [§3.2] (Model Architecture): The 'multi-view conditioning strength' hyperparameter is introduced without explicit notation, a stated range, or a sensitivity analysis, complicating reproducibility of the reported results.
  2. [Figure 3] (Qualitative Results): The caption does not specify the exact text prompts or camera poses used for the multi-view generations, making the claimed consistency improvements harder to interpret.
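
As an illustration of the kind of metric the first major comment asks for, a cross-view similarity score can be computed from per-view image embeddings. This is a minimal sketch, assuming the embeddings come from some image encoder such as CLIP and are passed in as plain lists of floats; the function names are hypothetical and no encoder is called here.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("zero-length embedding")
    return dot / (norm_u * norm_v)


def multiview_consistency(view_embeddings):
    """Mean pairwise cosine similarity over all pairs of views.

    view_embeddings: one embedding per rendered view of the same object
    (assumed to be precomputed by an external image encoder).
    """
    pairs = list(combinations(view_embeddings, 2))
    if not pairs:
        raise ValueError("need at least two views")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)
```

Embedding similarity is only a proxy: views of a geometrically consistent object legitimately differ in appearance, so a score like this would need to be reported alongside baselines rather than read in isolation.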

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback. We address the major comments point-by-point below and will revise the manuscript accordingly to strengthen the quantitative support for our claims.

Point-by-point responses
  1. Referee: [§4.1] (3D Generation via SDS): The central claim of 'significantly enhancing the consistency and stability of existing 2D-lifting methods' lacks load-bearing quantitative evidence; no metrics (e.g., multi-view consistency scores, CLIP similarity across views, or direct comparisons to DreamFusion baselines) or ablation isolating the multi-view prior's contribution are reported, leaving the enhancement unverified.

    Authors: We agree that the current manuscript relies primarily on qualitative results for this claim. In the revision we will add quantitative metrics including multi-view consistency scores and average CLIP similarity across generated views, plus direct numerical comparisons against DreamFusion baselines. We will also include an ablation that isolates the multi-view prior's contribution by comparing against a 2D-only diffusion baseline under identical SDS settings.
    Revision: yes

  2. Referee: [§5] (Few-shot 3D Concept Learning): The generalizability of the 3D prior to novel text prompts and shapes rests on the untested assumption that joint 2D+3D training avoids overfitting to the specific 3D training renderings; no ablation on 3D data contribution, no diversity statistics for the 3D corpus, and no out-of-distribution shape/prompt tests are provided to support the transfer claim.

    Authors: We acknowledge that additional controls are needed to substantiate the transfer claim. The revised version will report (1) an ablation measuring performance with and without the 3D training data, (2) basic diversity statistics (e.g., object category coverage and viewpoint distribution) for the 3D corpus, and (3) qualitative and quantitative results on out-of-distribution shapes and prompts not seen during training.
    Revision: yes

Circularity Check

0 steps flagged

No significant circularity; core claim is empirical training result

The paper claims that joint training on 2D and 3D data yields a multi-view diffusion model that acts as a generalizable 3D prior, demonstrated via application to SDS for 3D generation. This rests on external datasets and standard diffusion training rather than any self-definitional loop, fitted parameter renamed as prediction, or load-bearing self-citation chain. No equations or derivations reduce the claimed prior to its inputs by construction. Citations to external prior work (e.g., DreamBooth or SDS) are not central to the derivation and do not force the result. The claim is an empirical one that can be checked against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach assumes standard diffusion training dynamics extend to multi-view conditioning and that SDS can leverage the learned prior without additional 3D-specific losses.

free parameters (1)
  • multi-view conditioning strength
    Weighting between 2D and 3D training signals is chosen to balance consistency and generalizability.
axioms (1)
  • Domain assumption: Joint 2D-3D training yields a prior that is agnostic to explicit 3D representations.
    Invoked when claiming the model can be used directly with SDS for any 3D representation.
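
One plausible reading of the free parameter above is a per-step mixing probability between 3D multi-view batches and ordinary 2D image-text batches. The sketch below shows only that reading; the name `p_3d`, the function `choose_batch_sources`, and per-step sampling itself are assumptions, since the ledger does not say how the weighting is implemented.

```python
import random


def choose_batch_sources(num_steps, p_3d, seed=0):
    """Pick a data source for each training step.

    p_3d is a hypothetical mixing probability: each step draws a batch of
    rendered multi-view data with probability p_3d, else a 2D batch.
    """
    if not 0.0 <= p_3d <= 1.0:
        raise ValueError("p_3d must lie in [0, 1]")
    rng = random.Random(seed)
    return ["3d" if rng.random() < p_3d else "2d" for _ in range(num_steps)]


schedule = choose_batch_sources(num_steps=10, p_3d=0.7)
```

Under this reading, pushing `p_3d` toward 1 would favor consistency with the 3D renderings, and pushing it toward 0 would favor the coverage of the 2D data, which is the trade-off the ledger entry describes.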

pith-pipeline@v0.9.0 · 5426 in / 1263 out tokens · 37067 ms · 2026-05-15T08:31:50.509985+00:00 · methodology

Forward citations

Cited by 20 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Mind the Gap: Geometrically Accurate Generative Reconstruction from Disjoint Views

    cs.CV 2026-05 unverdicted novelty 8.0

    GLADOS reconstructs 3D geometry from disjoint views by generating intermediate perspectives, performing robust coarse alignment that tolerates generative inconsistencies, and iteratively expanding context for consistency.

  2. R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow

    cs.CV 2026-05 unverdicted novelty 7.0

    R-DMesh generates high-fidelity 4D meshes aligned to video by disentangling base mesh, motion, and a learned rectification jump offset inside a VAE, then using Triflow Attention and rectified-flow diffusion.

  3. Geometrically Consistent Multi-View Scene Generation from Freehand Sketches

    cs.CV 2026-04 unverdicted novelty 7.0

    A framework generates consistent multi-view scenes from one freehand sketch via a ~9k-sample dataset, Parallel Camera-Aware Attention Adapters, and Sparse Correspondence Supervision Loss, outperforming baselines in re...

  4. SparseCam4D: Spatio-Temporally Consistent 4D Reconstruction from Sparse Cameras

    cs.CV 2026-03 unverdicted novelty 7.0

    SparseCam4D achieves spatio-temporally consistent high-fidelity 4D reconstruction from sparse cameras via a Spatio-Temporal Distortion Field that corrects inconsistencies in generative observations.

  5. GeoQuery: Geometry-Query Diffusion for Sparse-View Reconstruction

    cs.CV 2026-05 unverdicted novelty 6.0

    GeoQuery replaces corrupted rendering features with geometry-aligned proxy queries and restricts cross-view attention to local windows, enabling robust diffusion-based refinement under extreme view sparsity.

  6. Beyond Thinking: Imagining in 360$^\circ$ for Humanoid Visual Search

    cs.CV 2026-05 unverdicted novelty 6.0

    Imagining in 360° decouples visual search into a single-step probabilistic semantic layout predictor and an actor, removing the need for multi-turn CoT reasoning and trajectory annotations while improving efficiency i...

  7. Velox: Learning Representations of 4D Geometry and Appearance

    cs.CV 2026-05 unverdicted novelty 6.0

    Velox compresses dynamic point clouds into latent tokens that support geometry via 4D surface modeling and appearance via 3D Gaussians, showing strong results on video-to-4D generation, tracking, and image-to-4D cloth...

  8. Structured 3D Latents Are Surprisingly Powerful: Unleashing Generalizable Style with 2D Diffusion

    cs.CV 2026-05 unverdicted novelty 6.0

    DiLAST optimizes 3D latents via guidance from a 2D diffusion model to enable generalizable style transfer for OOD styles in 3D asset generation.

  9. REVIVE 3D: Refinement via Encoded Voluminous Inflated prior for Volume Enhancement

    cs.CV 2026-04 unverdicted novelty 6.0

    REVIVE 3D generates voluminous 3D assets from flat 2D images via an inflated prior construction followed by latent-space refinement, plus new metrics for volume and flatness validated by user study.

  10. Camera Control for Text-to-Image Generation via Learning Viewpoint Tokens

    cs.CV 2026-04 unverdicted novelty 6.0

    Viewpoint tokens learned on a mixed 3D-rendered and photorealistic dataset enable precise camera control in text-to-image generation while factorizing geometry from appearance and transferring to unseen object categories.

  11. AnyLift: Scaling Motion Reconstruction from Internet Videos via 2D Diffusion

    cs.CV 2026-04 unverdicted novelty 6.0

    A two-stage method synthesizes multi-view 2D motion data from internet video keypoints and trains a camera-conditioned diffusion model to recover globally consistent 3D human motion and HOI in world space.

  12. HandDreamer: Zero-Shot Text to 3D Hand Model Generation using Corrective Hand Shape Guidance

    cs.CV 2026-04 unverdicted novelty 6.0

    HandDreamer is the first zero-shot text-to-3D method for hands that uses MANO initialization, skeleton-guided diffusion, and corrective shape guidance to produce view-consistent models.

  13. Stepper: Stepwise Immersive Scene Generation with Multiview Panoramas

    cs.CV 2026-03 unverdicted novelty 6.0

    Stepper uses stepwise panoramic expansion with a multi-view 360-degree diffusion model and geometry reconstruction to produce high-fidelity, structurally consistent immersive 3D scenes from text.

  14. InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models

    cs.CV 2024-04 unverdicted novelty 6.0

    InstantMesh produces diverse, high-quality 3D meshes from single images in seconds by combining a multi-view diffusion model with a sparse-view large reconstruction model and optimizing directly on meshes.

  15. Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

    cs.CV 2023-11 conditional novelty 6.0

    Stable Video Diffusion scales latent video diffusion models via text-to-image pretraining, video pretraining on curated data, and high-quality finetuning to produce competitive text-to-video and image-to-video results...

  16. R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow

    cs.CV 2026-05 unverdicted novelty 5.0

    R-DMesh uses a VAE with a learned rectification jump offset and Triflow Attention inside a rectified-flow diffusion transformer to produce video-aligned 4D meshes despite initial pose misalignment.

  17. Pose-Aware Diffusion for 3D Generation

    cs.CV 2026-05 unverdicted novelty 5.0

    PAD synthesizes 3D geometry in observation space via depth unprojection as anchor to eliminate pose ambiguity in image-to-3D generation.

  18. Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation

    cs.CV 2026-04 unverdicted novelty 5.0

    Asset Harvester converts sparse in-the-wild object observations from AV driving logs into complete simulation-ready 3D assets via data curation, geometry-aware preprocessing, and a SparseViewDiT model that couples spa...

  19. AnimateAnyMesh++: A Flexible 4D Foundation Model for High-Fidelity Text-Driven Mesh Animation

    cs.CV 2026-04 unverdicted novelty 4.0

    AnimateAnyMesh++ animates arbitrary 3D meshes from text using an expanded 300K-identity DyMesh-XL dataset, a power-law topology-aware DyMeshVAE-Flex, and a variable-length rectified-flow generator to produce semantica...

  20. Cosmos World Foundation Model Platform for Physical AI

    cs.CV 2025-01 unverdicted novelty 3.0

    The Cosmos platform supplies open-source pre-trained world models and supporting tools for building fine-tunable digital world simulations to train Physical AI.
