pith. machine review for the scientific record.

arXiv: 2209.14988 · v1 · submitted 2022-09-29 · 💻 cs.CV · cs.LG · stat.ML

DreamFusion: Text-to-3D using 2D Diffusion

Pith reviewed 2026-05-11 11:19 UTC · model grok-4.3

classification: 💻 cs.CV · cs.LG · stat.ML
keywords: text-to-3D · diffusion models · Neural Radiance Fields · probability density distillation · 3D generation · score distillation sampling · NeRF optimization

The pith

A 2D text-to-image diffusion model can serve as a prior to optimize a Neural Radiance Field into a consistent 3D model from text alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that large pretrained 2D diffusion models can drive text-to-3D synthesis by guiding the optimization of a 3D scene representation. It replaces the need for 3D datasets or 3D denoising architectures with a distillation loss applied to random 2D renderings of the 3D model. A reader would care because this turns existing image generators into 3D creators without additional training data or model changes. The resulting models support free viewpoint rendering, relighting, and composition into other scenes.

Core claim

A loss based on probability density distillation turns a pretrained 2D diffusion model into a prior that optimizes a randomly initialized Neural Radiance Field so its renderings from random angles achieve low loss under the text prompt; the resulting 3D model requires no 3D training data and no changes to the diffusion model.

What carries the argument

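The argument rests on the probability density distillation loss, named Score Distillation Sampling (SDS) in the paper, which converts the frozen 2D diffusion model's noise-prediction residuals into gradient updates for the 3D parameters via random-view renderings.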
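
For reference, the gradient that this loss supplies can be written as below. This is a sketch in the notation used elsewhere on this page (φ for the frozen diffusion model, θ for the NeRF parameters, x = g(θ, c) for a rendering from camera c). The symbols y (text conditioning), t (noise level), ε (sampled Gaussian noise), α_t and σ_t (noise schedule), and w(t) (a weighting function) are assumed here for illustration and do not appear in the text above.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}\bigl(\phi,\; x = g(\theta, c)\bigr)
  = \mathbb{E}_{t,\,\epsilon}\!\left[\, w(t)\,
      \bigl(\hat{\epsilon}_\phi(z_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta} \,\right],
\qquad
z_t = \alpha_t\, x + \sigma_t\, \epsilon,
\quad \epsilon \sim \mathcal{N}(0, I).
```

In this form the update needs only the rendering Jacobian ∂x/∂θ; no gradient is propagated through the diffusion network itself, which is consistent with the model being used without modification.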

If this is right

  • The 3D model can be viewed from any angle after optimization.
  • Arbitrary illumination can be applied to relight the model.
  • The model can be composited into arbitrary 3D environments.
  • No 3D training data or 3D-specific architectures are required.
  • Existing 2D diffusion models can be used without modification.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The success implies that 2D diffusion models have already learned enough 3D structure from their image-text training data to support 3D inference.
  • The same distillation approach could be tested on other 3D representations such as meshes or Gaussian splats.
  • Extending the random-view sampling to include temporal consistency might enable text-to-4D generation as a direct next step.

Load-bearing premise

Random 2D renderings scored by the 2D model will produce geometrically consistent 3D structure without explicit multi-view constraints or 3D supervision.
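
The random-view sampling this premise depends on can be sketched as follows. This is a minimal illustration, not the paper's camera model: the helper name `sample_camera`, the sphere radius, and the elevation range are all hypothetical placeholders chosen so the example runs.

```python
import numpy as np

def sample_camera(rng, radius=1.5, elev_range_deg=(-10.0, 60.0)):
    """Sample a camera on a sphere around the origin, looking at the origin.

    Hypothetical helper: radius and elevation range are placeholder values.
    Returns the camera position and a 3x3 camera-to-world rotation whose
    columns are the camera's right, up, and backward axes.
    """
    azim = rng.uniform(0.0, 2.0 * np.pi)
    elev = np.deg2rad(rng.uniform(*elev_range_deg))
    position = radius * np.array([
        np.cos(elev) * np.cos(azim),
        np.cos(elev) * np.sin(azim),
        np.sin(elev),
    ])
    forward = -position / np.linalg.norm(position)  # unit vector toward the origin
    world_up = np.array([0.0, 0.0, 1.0])
    right = np.cross(forward, world_up)
    right /= np.linalg.norm(right)                  # safe: elevation never reaches 90 degrees
    up = np.cross(right, forward)
    rotation = np.stack([right, up, -forward], axis=1)
    return position, rotation

rng = np.random.default_rng(0)
position, rotation = sample_camera(rng)
```

Each optimization step would draw a fresh pose like this, render the shared 3D model from it, and score only that one image. Nothing in the sampler ties one view to another.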

What would settle it

The premise would fail if the optimized NeRF produced renderings from novel viewpoints that violate the original text prompt or exhibit geometric inconsistencies such as floating artifacts or incorrect depth ordering.

Original abstract

Recent breakthroughs in text-to-image synthesis have been driven by diffusion models trained on billions of image-text pairs. Adapting this approach to 3D synthesis would require large-scale datasets of labeled 3D data and efficient architectures for denoising 3D data, neither of which currently exist. In this work, we circumvent these limitations by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis. We introduce a loss based on probability density distillation that enables the use of a 2D diffusion model as a prior for optimization of a parametric image generator. Using this loss in a DeepDream-like procedure, we optimize a randomly-initialized 3D model (a Neural Radiance Field, or NeRF) via gradient descent such that its 2D renderings from random angles achieve a low loss. The resulting 3D model of the given text can be viewed from any angle, relit by arbitrary illumination, or composited into any 3D environment. Our approach requires no 3D training data and no modifications to the image diffusion model, demonstrating the effectiveness of pretrained image diffusion models as priors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. The paper introduces DreamFusion, a method for text-to-3D synthesis that optimizes a Neural Radiance Field (NeRF) using a Score Distillation Sampling (SDS) loss derived from a fixed pretrained 2D text-to-image diffusion model. This enables generation of 3D models from text prompts via gradient descent on random-view renderings, without 3D training data or modifications to the diffusion model, and supports applications such as novel view synthesis, relighting, and compositing.

Significance. If the empirical results hold, the work is significant for showing that 2D diffusion priors can be effectively distilled into 3D representations through the SDS loss, bypassing the lack of large-scale 3D datasets. It provides concrete evidence via optimization on diverse text prompts and demonstrates practical outputs. A key strength is that the external pretrained model is used as-is, with no modification.

major comments (1)
  1. §3.2: The SDS loss is applied independently to single-view renderings sampled from random cameras with no cross-view consistency term, depth regularizer, or multi-view correspondence loss. The central claim that this yields coherent 3D geometry therefore depends on the unanalyzed assumption that the shared NeRF parameters will avoid view-inconsistent minima; while §4 reports successful examples, the manuscript provides no derivation or empirical stress test of when this consistency emerges.
minor comments (3)
  1. §3.1: The notation distinguishing the diffusion model parameters φ from the NeRF parameters θ could be made more explicit to avoid confusion with standard diffusion literature.
  2. §4: Figure captions would benefit from including the exact text prompt and camera sampling details for each example to improve reproducibility.
  3. Abstract: The phrase 'DeepDream-like procedure' is used without a brief definition or reference, which may reduce accessibility for readers unfamiliar with that technique.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive assessment and constructive feedback on our work. We address the major comment below.

Point-by-point responses
  1. Referee: §3.2: The SDS loss is applied independently to single-view renderings sampled from random cameras with no cross-view consistency term, depth regularizer, or multi-view correspondence loss. The central claim that this yields coherent 3D geometry therefore depends on the unanalyzed assumption that the shared NeRF parameters will avoid view-inconsistent minima; while §4 reports successful examples, the manuscript provides no derivation or empirical stress test of when this consistency emerges.

    Authors: We agree that the SDS loss is applied to individual renderings without explicit cross-view terms. Coherence emerges because the NeRF parameters are shared and jointly optimized over a distribution of random viewpoints; a view-inconsistent solution would produce high average loss across the sampled poses. The manuscript relies on this mechanism and demonstrates its effectiveness through the diverse successful examples in §4, but does not include a formal derivation or systematic stress tests for failure modes. In revision we will expand §3.2 with a short discussion of how shared parameters promote consistency and will add a small number of challenging cases illustrating when inconsistencies can occur.

    Revision: partial
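
The shared-parameter mechanism in this response can be illustrated with a toy problem. The sketch below is not SDS and not a NeRF: θ is a single 3D point, each "view" is an orthographic projection from a camera rotated about the vertical axis, and the per-view 2D target is consistent across views by construction (which is exactly the property the referee says is assumed rather than shown). All names are hypothetical.

```python
import numpy as np

def projection_matrix(azim):
    """2x3 orthographic projection for a camera rotated by azim about the z axis."""
    right = np.array([-np.sin(azim), np.cos(azim), 0.0])
    up = np.array([0.0, 0.0, 1.0])
    return np.stack([right, up])

def optimize(target, fixed_azim=None, steps=2000, lr=0.1, seed=0):
    """Fit a shared 3D point using only per-view 2D losses.

    With fixed_azim set, every step uses the same view; otherwise a new
    random view is drawn each step. No cross-view term is ever computed.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(3)
    for _ in range(steps):
        azim = fixed_azim if fixed_azim is not None else rng.uniform(0.0, 2.0 * np.pi)
        P = projection_matrix(azim)
        residual = P @ theta - P @ target   # 2D error in this single view
        theta -= lr * (P.T @ residual)      # gradient of 0.5 * ||residual||^2
    return theta

target = np.array([1.0, -0.5, 0.3])
single_view = optimize(target, fixed_azim=0.0)  # depth along the view axis stays unconstrained
random_views = optimize(target)                 # all three coordinates are recovered
```

With one fixed view the coordinate along the viewing direction never moves from its initial value, while random views pin down the full point even though each step sees only one 2D projection. Whether a real diffusion prior supplies per-view signals that agree this well is the open question raised in the report.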

Circularity Check

0 steps flagged

No circularity; SDS loss derived from external fixed diffusion prior

Full rationale

The paper introduces the SDS loss in §3.2 as a probability density distillation term taken directly from the score function of a frozen, pretrained 2D diffusion model φ. NeRF parameters θ are then optimized by gradient descent on random-view renderings x = g(θ, c) to minimize L_SDS(φ, x). Because φ is external and fixed, and the loss contains no self-referential terms or fitted parameters that are later renamed as predictions, no step in the derivation chain reduces the target 3D geometry to an input by construction. The multi-view consistency is an empirical outcome of shared θ under the external prior rather than a tautology.
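
The optimization described here can be sketched on a toy problem where the frozen prior is known in closed form. Everything below is an assumption made for illustration: the "rendering" is the identity map g(θ, c) = θ, the image prior is a Gaussian N(μ, s²I) so the ideal noise prediction can be written exactly, the weighting is w(t) = 1, and the noise schedule and the sampling range for t are placeholders. It shows the shape of the update loop, not the paper's implementation.

```python
import numpy as np

def eps_hat(z_t, alpha, sigma, mu, s):
    """Exact noise prediction for a Gaussian prior N(mu, s^2 I).

    Toy stand-in for the frozen diffusion model phi: it is fixed and
    never updated during optimization.
    """
    return sigma * (z_t - alpha * mu) / (alpha**2 * s**2 + sigma**2)

def sds_optimize(mu, s=0.1, steps=5000, lr=0.01, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=mu.shape)       # random initialization
    for _ in range(steps):
        x = theta                           # toy rendering, so dx/dtheta is the identity
        t = rng.uniform(0.02, 0.98)         # noise level (placeholder range)
        alpha, sigma = np.sqrt(1.0 - t), np.sqrt(t)  # variance-preserving schedule (assumption)
        eps = rng.normal(size=x.shape)
        z_t = alpha * x + sigma * eps       # noised rendering
        grad = eps_hat(z_t, alpha, sigma, mu, s) - eps  # residual times identity Jacobian
        theta = theta - lr * grad
    return theta

mu = np.array([0.5, -1.0, 2.0])
theta = sds_optimize(mu)
```

In this toy setting θ is pulled toward the prior's mean μ, which is external to θ. That matches the point of the circularity check: the target comes from the fixed prior, not from the parameters being optimized.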

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 1 invented entity

The central contribution is the new distillation loss; everything else rests on standard assumptions of gradient-based optimization and the semantic coverage of the pretrained 2D model.

free parameters (1)
  • optimization hyperparameters
    Learning rate, number of views per iteration, and loss weighting coefficients are chosen empirically.
axioms (2)
  • domain assumption: Gradient descent on the distillation loss will converge to a 3D representation whose renderings match the text prompt
    Invoked throughout the optimization procedure in Section 3.
  • domain assumption: A 2D diffusion model trained on image-text pairs encodes sufficient 3D-consistent semantic information
    Core premise for using the model as a 3D prior without 3D data.
invented entities (1)
  • Score Distillation Sampling (SDS) loss · no independent evidence
    purpose: Distills gradients from the 2D diffusion density into updates for 3D parameters
    Newly defined in this paper as the mechanism that transfers 2D knowledge to 3D optimization.

pith-pipeline@v0.9.0 · 5512 in / 1540 out tokens · 45629 ms · 2026-05-11T11:19:58.112831+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • Foundation/AlexanderDuality alexander_duality_circle_linking contradicts

    CONTRADICTS: the theorem conflicts with this paper passage, or marks a claim that would need revision before publication.

    The core method in §3.2 optimizes NeRF parameters θ via the SDS loss L_SDS(φ, x = g(θ, c)) applied to renderings x from randomly sampled cameras c, where φ is the frozen 2D diffusion model. Each term in the loss depends only on a single 2D image and its noise prediction; no cross-view term, depth consistency regularizer, or multi-view correspondence loss is present.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 45 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ReConText3D: Replay-based Continual Text-to-3D Generation

    cs.CV 2026-04 conditional novelty 8.0

    ReConText3D is the first replay-memory framework for continual text-to-3D generation that prevents catastrophic forgetting on new textual categories while preserving quality on previously seen classes.

  2. R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow

    cs.CV 2026-05 unverdicted novelty 7.0

    R-DMesh generates high-fidelity 4D meshes aligned to video by disentangling base mesh, motion, and a learned rectification jump offset inside a VAE, then using Triflow Attention and rectified-flow diffusion.

  3. GTA: Advancing Image-to-3D World Generation via Geometry Then Appearance Video Diffusion

    cs.CV 2026-05 unverdicted novelty 7.0

    GTA generates 3D worlds from single images via a two-stage video diffusion process that prioritizes geometry before appearance to improve structural consistency.

  4. 3DReflecNet: A Large-Scale Dataset for 3D Reconstruction of Reflective, Transparent, and Low-Texture Objects

    cs.CV 2026-05 unverdicted novelty 7.0

    3DReflecNet is a 22 TB+ dataset of over 120,000 synthetic and 1,000 real objects with millions of multi-view frames for benchmarking 3D reconstruction on reflective, transparent, and low-texture surfaces.

  5. Generative Modeling with Orbit-Space Particle Flow Matching

    cs.GR 2026-05 unverdicted novelty 7.0

    OGPP is a particle flow-matching method using orbit-space canonicalization and geometric paths that achieves lower error and fewer steps than prior approaches on 3D benchmarks.

  6. SpatialGrammar: A Domain-Specific Language for LLM-Based 3D Indoor Scene Generation

    cs.AI 2026-04 unverdicted novelty 7.0

    SpatialGrammar provides a grid-based DSL and compiler that lets LLMs generate collision-free 3D indoor scenes more reliably than raw-coordinate or code-based approaches.

  7. GSCompleter: A Distillation-Free Plugin for Metric-Aware 3D Gaussian Splatting Completion in Seconds

    cs.CV 2026-04 unverdicted novelty 7.0

    GSCompleter completes sparse 3D Gaussian Splatting scenes via a distillation-free generate-then-register pipeline using Stereo-Anchor lifting and Ray-Constrained Registration, delivering SOTA results on three benchmarks.

  8. TransSplat: Unbalanced Semantic Transport for Language-Driven 3DGS Editing

    cs.CV 2026-04 unverdicted novelty 7.0

    TransSplat uses unbalanced semantic transport to match edited 2D evidence with 3D Gaussians and recover a shared 3D edit field, yielding better local accuracy and structural consistency than prior view-consistency methods.

  9. Guiding Distribution Matching Distillation with Gradient-Based Reinforcement Learning

    cs.LG 2026-04 unverdicted novelty 7.0

    GDMD replaces raw-sample rewards with distillation-gradient rewards in RL-guided diffusion distillation, yielding 4-step models that surpass their multi-step teachers on GenEval and human preference metrics.

  10. Brain3D: EEG-to-3D Decoding of Visual Representations via Multimodal Reasoning

    cs.CV 2026-04 unverdicted novelty 7.0

    A multimodal pipeline decodes EEG into 3D meshes via EEG-to-image, MLLM reasoning, diffusion, and single-image-to-3D conversion, reporting 85.4% 10-way accuracy and 0.648 CLIPScore.

  11. SEM-ROVER: Semantic Voxel-Guided Diffusion for Large-Scale Driving Scene Generation

    cs.CV 2026-04 unverdicted novelty 7.0

    SEM-ROVER generates large multiview-consistent 3D urban driving scenes via semantic-conditioned diffusion on Σ-Voxfield voxel grids with progressive outpainting and deferred rendering.

  12. THOM: Generating Physically Plausible Hand-Object Meshes From Text

    cs.CV 2026-04 unverdicted novelty 7.0

    THOM is a training-free two-stage framework that generates physically plausible hand-object 3D meshes directly from text by combining text-guided Gaussians with contact-aware physics optimization and VLM refinement.

  13. Vibe Coding XR: Accelerating AI + XR Prototyping with XR Blocks and Gemini

    cs.HC 2026-03 unverdicted novelty 7.0

    XR Blocks supplies an LLM-optimized Reality Model and Vibe Coding XR workflow that converts high-level prompts into working physics-aware XR applications with high one-shot success.

  14. Toward Visually Realistic Simulation: A Benchmark for Evaluating Robot Manipulation in Simulation

    cs.RO 2026-05 unverdicted novelty 6.0

    VISER is a new visually realistic simulation benchmark for robot manipulation tasks that uses PBR materials and MLLM-assisted asset generation, achieving 0.92 Pearson correlation with real-world policy performance.

  15. Structured 3D Latents Are Surprisingly Powerful: Unleashing Generalizable Style with 2D Diffusion

    cs.CV 2026-05 unverdicted novelty 6.0

    DiLAST optimizes 3D latents via guidance from a 2D diffusion model to enable generalizable style transfer for OOD styles in 3D asset generation.

  16. InpaintSLat: Inpainting Structured 3D Latents via Initial Noise Optimization

    cs.CV 2026-05 unverdicted novelty 6.0

    Optimizing initial noise via backpropagation approximation and spectral parameterization in structured 3D latent diffusion yields higher contextual consistency and prompt alignment in training-free inpainting.

  17. REVIVE 3D: Refinement via Encoded Voluminous Inflated prior for Volume Enhancement

    cs.CV 2026-04 unverdicted novelty 6.0

    REVIVE 3D generates voluminous 3D assets from flat 2D images via an inflated prior construction followed by latent-space refinement, plus new metrics for volume and flatness validated by user study.

  18. Sparse-View 3D Gaussian Splatting in the Wild

    cs.CV 2026-04 unverdicted novelty 6.0

    A new sparse-view 3D Gaussian splatting method for unconstrained scenes with distractors combines diffusion-based reference-guided refinement and sparsity-aware Gaussian replication to achieve better rendering quality.

  19. FluSplat: Sparse-View 3D Editing without Test-Time Optimization

    cs.CV 2026-04 unverdicted novelty 6.0

    FluSplat trains a model with geometric alignment constraints on multi-view edits to produce consistent 3D scene edits from sparse views in a single forward pass without test-time optimization.

  20. Camera Control for Text-to-Image Generation via Learning Viewpoint Tokens

    cs.CV 2026-04 unverdicted novelty 6.0

    Viewpoint tokens learned on a mixed 3D-rendered and photorealistic dataset enable precise camera control in text-to-image generation while factorizing geometry from appearance and transferring to unseen object categories.

  21. Deepfake Detection Generalization with Diffusion Noise

    cs.CV 2026-04 unverdicted novelty 6.0

    ANL uses diffusion noise prediction and attention to regularize deepfake detectors for better generalization to unseen synthesis methods without added inference cost.

  22. Beyond Voxel 3D Editing: Learning from 3D Masks and Self-Constructed Data

    cs.CV 2026-04 unverdicted novelty 6.0

    BVE framework enables text-guided 3D editing beyond voxel limits by combining self-constructed data, lightweight semantic injection, and annotation-free masking to preserve local invariance.

  23. Grasp in Gaussians: Fast Monocular Reconstruction of Dynamic Hand-Object Interactions

    cs.CV 2026-04 unverdicted novelty 6.0

    GraG reconstructs dynamic 3D hand-object interactions from monocular video 6.4x faster than prior work by using compact Sum-of-Gaussians tracking initialized from large models and refined with 2D losses.

  24. Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting

    cs.RO 2026-04 unverdicted novelty 6.0

    Habitat-GS integrates 3D Gaussian Splatting scene rendering and Gaussian avatars into Habitat-Sim, yielding agents with stronger cross-domain generalization and effective human-aware navigation.

  25. ReplicateAnyScene: Zero-Shot Video-to-3D Composition via Textual-Visual-Spatial Alignment

    cs.CV 2026-04 unverdicted novelty 6.0

    ReplicateAnyScene performs fully automated zero-shot video-to-compositional-3D reconstruction by cascading alignments of generic priors from vision foundation models across textual, visual, and spatial dimensions.

  26. Rays as Pixels: Learning A Joint Distribution of Videos and Camera Trajectories

    cs.CV 2026-04 unverdicted novelty 6.0

    A video diffusion model learns a joint distribution over videos and camera trajectories by representing cameras as pixel-aligned ray encodings (raxels) denoised jointly with video frames via decoupled attention.

  27. TouchAnything: Diffusion-Guided 3D Reconstruction from Sparse Robot Touches

    cs.CV 2026-04 unverdicted novelty 6.0

    TouchAnything reconstructs accurate 3D object geometries from only a few tactile contacts by optimizing for consistency with a pretrained visual diffusion prior.

  28. Guiding a Diffusion Model by Swapping Its Tokens

    cs.CV 2026-04 unverdicted novelty 6.0

    Self-Swap Guidance steers diffusion sampling by swapping dissimilar token latents to enable CFG-like improvements for both conditional and unconditional generation.

  29. 3DrawAgent: Teaching LLM to Draw in 3D with Early Contrastive Experience

    cs.CV 2026-04 unverdicted novelty 6.0

    3DrawAgent lets LLMs create complex 3D sketches from text prompts by using pairwise comparisons of their own outputs to self-improve spatial drawing skills without parameter updates.

  30. DailyArt: Discovering Articulation from Single Static Images via Latent Dynamics

    cs.CV 2026-04 unverdicted novelty 6.0

    DailyArt recovers full joint parameters of articulated objects from a single static image by synthesizing an opened state and comparing discrepancies, supporting downstream part-level novel state synthesis.

  31. MemoryDiorama: Generating Dynamic 3D Diorama from Everyday Photos for Memory Recall

    cs.HC 2026-04 unverdicted novelty 6.0

    MemoryDiorama generates animated 3D dioramas from photos via LLM scene analysis and generative components, yielding richer autobiographical recall than photo-only or static diorama baselines.

  32. HandDreamer: Zero-Shot Text to 3D Hand Model Generation using Corrective Hand Shape Guidance

    cs.CV 2026-04 unverdicted novelty 6.0

    HandDreamer is the first zero-shot text-to-3D method for hands that uses MANO initialization, skeleton-guided diffusion, and corrective shape guidance to produce view-consistent models.

  33. MPDiT: Multi-Patch Global-to-Local Transformer Architecture For Efficient Flow Matching and Diffusion Model

    cs.CV 2026-03 unverdicted novelty 6.0

    MPDiT uses a hierarchical multi-patch design in transformers to lower computation in diffusion models by handling coarse global features first then fine local details, plus faster-converging embeddings.

  34. Teaching an Agent to Sketch One Part at a Time

    cs.AI 2026-03 unverdicted novelty 6.0

    A multi-modal LM agent is trained to produce vector sketches part-by-part via supervised fine-tuning and process-reward RL on the new ControlSketch-Part dataset with automatic part annotations.

  35. R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow

    cs.CV 2026-05 unverdicted novelty 5.0

    R-DMesh uses a VAE with a learned rectification jump offset and Triflow Attention inside a rectified-flow diffusion transformer to produce video-aligned 4D meshes despite initial pose misalignment.

  36. RealDiffusion: Physics-informed Attention for Multi-character Storybook Generation

    cs.CV 2026-05 unverdicted novelty 5.0

    RealDiffusion uses heat diffusion as a dissipative prior and a region-aware stochastic process inside a training-free physics-informed attention mechanism to improve multi-character coherence while preserving narrativ...

  37. SpatialPrompt: XR-Based Spatial Intent Expression as Executable Constraints for AI Generative 3D Design

    cs.HC 2026-05 unverdicted novelty 5.0

    SpatialPrompt turns spatial sketches and voice prompts into executable constraints for controllable AI 3D generation in XR, enabling iterative collaborative creation with color-coded contributions.

  38. ST-Gen4D: Embedding 4D Spatiotemporal Cognition into World Model for 4D Generation

    cs.CV 2026-05 unverdicted novelty 5.0

    ST-Gen4D uses a world model that fuses global appearance and local dynamic graphs into a 4D cognition representation to guide consistent 4D Gaussian generation.

  39. Pose-Aware Diffusion for 3D Generation

    cs.CV 2026-05 unverdicted novelty 5.0

    PAD synthesizes 3D geometry in observation space via depth unprojection as anchor to eliminate pose ambiguity in image-to-3D generation.

  40. Unposed-to-3D: Learning Simulation-Ready Vehicles from Real-World Images

    cs.CV 2026-04 unverdicted novelty 5.0

    Unposed-to-3D learns simulation-ready 3D vehicle models from unposed real images by predicting camera parameters for photometric self-supervision, then adding scale prediction and harmonization.

  41. UniMesh: Unifying 3D Mesh Understanding and Generation

    cs.CV 2026-04 unverdicted novelty 5.0

    UniMesh unifies 3D mesh generation and understanding in one model via a Mesh Head interface, Chain of Mesh iterative editing, and an Actor-Evaluator self-reflection loop.

  42. "From remembering to shaping": Narrating Shared Experiences by Co-Designing Cultural Heritage Artifacts in Collaborative VR

    cs.HC 2026-04 unverdicted novelty 5.0

    A collaborative VR workflow with GenAI lets users merge prompts and creatively repurpose outputs to co-create 3D artifacts that narrate shared cultural heritage experiences.

  43. Hitem3D 2.0: Multi-View Guided Native 3D Texture Generation

    cs.CV 2026-04 unverdicted novelty 5.0

    Hitem3D 2.0 combines multi-view image synthesis with native 3D texture projection to improve completeness, cross-view consistency, and geometry alignment over prior methods.

  44. AnimateAnyMesh++: A Flexible 4D Foundation Model for High-Fidelity Text-Driven Mesh Animation

    cs.CV 2026-04 unverdicted novelty 4.0

    AnimateAnyMesh++ animates arbitrary 3D meshes from text using an expanded 300K-identity DyMesh-XL dataset, a power-law topology-aware DyMeshVAE-Flex, and a variable-length rectified-flow generator to produce semantica...

  45. LMMs Meet Object-Centric Vision: Understanding, Segmentation, Editing and Generation

    cs.CV 2026-04 unverdicted novelty 3.0

    This review organizes literature on large multimodal models and object-centric vision into four themes—understanding, referring segmentation, editing, and generation—while summarizing paradigms, strategies, and challe...


  52. [61]

    Computer Graphics Forum , year=

    Advances in neural rendering , author=. Computer Graphics Forum , year=

  53. [62]

    ICCV , year=

    Nerfies: Deformable neural radiance fields , author=. ICCV , year=

  54. [64]

    Schwarz, Katja and Liao, Yiyi and Niemeyer, Michael and Geiger, Andreas , journal =

  55. [65]

    Distill , year =

    Mordvintsev, Alexander and Pezzotti, Nicola and Schubert, Ludwig and Olah, Chris , title =. Distill , year =

  56. [66]

    CVPR , year=

    Towards Implicit Text-Guided 3D Shape Generation , author=. CVPR , year=

  57. [68]

    An Empirical Bayes Approach to Statistics

    Robbins, Herbert E. An Empirical Bayes Approach to Statistics. Breakthroughs in Statistics: Foundations and Basic Theory. 1992

  58. [69]

    ICML , year =

    Learning the Stein Discrepancy for Training and Evaluating Energy-Based Models without Sampling , author =. ICML , year =

  59. [70]

    CVPR , year =

    Zhai, Xiaohua and Wang, Xiao and Mustafa, Basil and Steiner, Andreas and Keysers, Daniel and Kolesnikov, Alexander and Beyer, Lucas , title =. CVPR , year =

  60. [71]

    NeurIPS , year=

    Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance , author=. NeurIPS , year=

  61. [72]

    Barron and Pratul P

    Dor Verbin and Peter Hedman and Ben Mildenhall and Todd Zickler and Jonathan T. Barron and Pratul P. Srinivasan , journal=

  62. [73]

    and Liu, Ce and Lensch, Hendrik P.A

    Boss, Mark and Braun, Raphael and Jampani, Varun and Barron, Jonathan T. and Liu, Ce and Lensch, Hendrik P.A. , journal =

  63. [74]

    3DV , year=

    GAN2X: Non-Lambertian Inverse Rendering of Image GANs , author=. 3DV , year=

  64. [75]

    ICLR , year=

    Auto-Encoding Variational Bayes , author=. ICLR , year=

  65. [76]

    NeurIPS , year=

    Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains , author=. NeurIPS , year=

  66. [77]

    Hierarchical Text-Conditional Image Generation with CLIP Latents

    Ramesh, Aditya and Dhariwal, Prafulla and Nichol, Alex and Chu, Casey and Chen, Mark , keywords =. Hierarchical Text-Conditional Image Generation with CLIP Latents , publisher =. 2022 , copyright =. doi:10.48550/ARXIV.2204.06125 , url =

  67. [78]

    ArXiv , year=

    Diffusion Models Beat GANs on Image Synthesis , author=. ArXiv , year=

  68. [79]

    ICML , year=

    GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models , author=. ICML , year=

  69. [80]

    Fleet, and Mohammad Norouzi

    Saharia, Chitwan and Ho, Jonathan and Chan, William and Salimans, Tim and Fleet, David J. and Norouzi, Mohammad , keywords =. Image Super-Resolution via Iterative Refinement , publisher =. 2021 , copyright =. doi:10.48550/ARXIV.2104.07636 , url =

  70. [81]

    2015 , URL =

    Inceptionism: Going Deeper into Neural Networks , author =. 2015 , URL =

  71. [83]

    Do Deep Generative Models Know What They Don't Know?

    Nalisnick, Eric and Matsukawa, Akihiro and Teh, Yee Whye and Gorur, Dilan and Lakshminarayanan, Balaji , keywords =. Do Deep Generative Models Know What They Don't Know? , publisher =. 2018 , copyright =. doi:10.48550/ARXIV.1810.09136 , url =

  72. [84]

    Srinivasan and Peter Hedman and Ricardo Martin-Brualla and Jonathan T

    Ben Mildenhall and Dor Verbin and Pratul P. Srinivasan and Peter Hedman and Ricardo Martin-Brualla and Jonathan T. Barron , year=

  73. [85]

    Sticking the landing: Simple, lower-variance gradient estimators for variational inference, 2017

    Roeder, Geoffrey and Wu, Yuhuai and Duvenaud, David , keywords =. Sticking the Landing: Simple, Lower-Variance Gradient Estimators for Variational Inference , publisher =. 2017 , copyright =. doi:10.48550/ARXIV.1703.09194 , url =

  74. [86]

    Liu , title =

    Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu , title =. Journal of Machine Learning Research , year =

  75. [87]

    arXiv preprint arXiv:2002.09018 , year=

    Anil, Rohan and Gupta, Vineet and Koren, Tomer and Regan, Kevin and Singer, Yoram , keywords =. Scalable Second Order Optimization for Deep Learning , publisher =. 2020 , copyright =. doi:10.48550/ARXIV.2002.09018 , url =

  76. [88]

    Palette: Image-to-image diffusion models

    Saharia, Chitwan and Chan, William and Chang, Huiwen and Lee, Chris A. and Ho, Jonathan and Salimans, Tim and Fleet, David J. and Norouzi, Mohammad , keywords =. Palette: Image-to-Image Diffusion Models , publisher =. 2021 , copyright =. doi:10.48550/ARXIV.2111.05826 , url =

  77. [89]

    arXiv preprint arXiv:2002.09018 , year=

    Rohan Anil, Vineet Gupta, Tomer Koren, Kevin Regan, and Yoram Singer. Scalable second order optimization for deep learning, 2020. URL https://arxiv.org/abs/2002.09018

  78. [90]

    Layer Normalization

    Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. Layer normalization. arXiv:1607.06450, 2016

  79. [91]

    Mip-NeRF : A multiscale representation for anti-aliasing neural radiance fields

    Jonathan T Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, and Pratul P Srinivasan. Mip-NeRF : A multiscale representation for anti-aliasing neural radiance fields. ICCV, 2021

  80. [92]

    Barron, Ben Mildenhall, Dor Verbin, Pratul P

    Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Mip-NeRF 360: Unbounded anti-aliased neural radiance fields. CVPR, 2022
