arXiv: 2209.14988 · v1 · submitted 2022-09-29 · cs.CV · cs.LG · stat.ML

Recognition: 2 theorem links

DreamFusion: Text-to-3D using 2D Diffusion

Ben Poole, Ajay Jain, Jonathan T. Barron, Ben Mildenhall

Pith reviewed 2026-05-11 11:19 UTC · model grok-4.3

keywords: text-to-3D · diffusion models · Neural Radiance Fields · probability density distillation · 3D generation · score distillation sampling · NeRF optimization

The pith

A 2D text-to-image diffusion model can serve as a prior to optimize a Neural Radiance Field into a consistent 3D model from text alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that large pretrained 2D diffusion models can drive text-to-3D synthesis by guiding the optimization of a 3D scene representation. It replaces the need for 3D datasets or 3D denoising architectures with a distillation loss applied to random 2D renderings of the 3D model. A reader would care because this turns existing image generators into 3D creators without additional training data or model changes. The resulting models support free viewpoint rendering, relighting, and composition into other scenes.

Core claim

A loss based on probability density distillation turns a pretrained 2D diffusion model into a prior that optimizes a randomly initialized Neural Radiance Field so its renderings from random angles achieve low loss under the text prompt; the resulting 3D model requires no 3D training data and no changes to the diffusion model.

What carries the argument

The probability density distillation loss, which converts the 2D diffusion model's denoising gradients into updates for 3D parameters via random-view renderings.
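For orientation, the gradient this loss supplies is usually written as shown below. This is a sketch of the published score distillation formulation, not a formula that appears on this page. The symbols y (text conditioning), t (noise level), ε (sampled Gaussian noise), w(t) (a timestep weighting), and α_t, σ_t (noise schedule) are introduced here as assumptions; φ, θ, c, and g follow the notation used in the audit sections further down.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}\bigl(\phi,\; x = g(\theta, c)\bigr)
  = \mathbb{E}_{t,\,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(z_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad
z_t = \alpha_t\, x + \sigma_t\, \epsilon,
\quad
\epsilon \sim \mathcal{N}(0, I).
```

The Jacobian of the diffusion network with respect to its input is omitted from this expression, so the frozen model is only evaluated and never differentiated through. Only the renderer term ∂x/∂θ carries the signal back to the 3D parameters.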

If this is right

The 3D model can be viewed from any angle after optimization.
Arbitrary illumination can be applied to relight the model.
The model can be composited into arbitrary 3D environments.
No 3D training data or 3D-specific architectures are required.
Existing 2D diffusion models can be used without modification.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The success implies that 2D diffusion models have already learned enough 3D structure from their image-text training data to support 3D inference.
The same distillation approach could be tested on other 3D representations such as meshes or Gaussian splats.
Extending the random-view sampling to include temporal consistency might enable text-to-4D generation as a direct next step.

Load-bearing premise

Random 2D renderings scored by the 2D model will produce geometrically consistent 3D structure without explicit multi-view constraints or 3D supervision.

What would settle it

The premise would be falsified if the optimized NeRF produced renderings from novel viewpoints that violate the text prompt or exhibit geometric inconsistencies such as floating artifacts or incorrect depth ordering.

Original abstract

Recent breakthroughs in text-to-image synthesis have been driven by diffusion models trained on billions of image-text pairs. Adapting this approach to 3D synthesis would require large-scale datasets of labeled 3D data and efficient architectures for denoising 3D data, neither of which currently exist. In this work, we circumvent these limitations by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis. We introduce a loss based on probability density distillation that enables the use of a 2D diffusion model as a prior for optimization of a parametric image generator. Using this loss in a DeepDream-like procedure, we optimize a randomly-initialized 3D model (a Neural Radiance Field, or NeRF) via gradient descent such that its 2D renderings from random angles achieve a low loss. The resulting 3D model of the given text can be viewed from any angle, relit by arbitrary illumination, or composited into any 3D environment. Our approach requires no 3D training data and no modifications to the image diffusion model, demonstrating the effectiveness of pretrained image diffusion models as priors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

DreamFusion shows a practical way to optimize NeRFs from text using a distillation loss on a frozen 2D diffusion model, and the results hold up empirically on many prompts even without explicit 3D consistency terms.

The main takeaway is that this paper gives a working recipe for text-to-3D by turning a pretrained 2D diffusion model into a prior for NeRF optimization. They derive a probability density distillation loss that scores random 2D renderings and back-propagates into the 3D parameters, all without any 3D training data or changes to the image model. That formulation is new relative to the diffusion and NeRF papers they cite, and it is what lets the method run on ordinary text prompts.

The results are usable: the generated models can be viewed from arbitrary angles, relit, and dropped into other scenes, and the examples cover a decent range of objects and styles. The implementation details are concrete enough that others could reproduce the core loop.

The soft spot is the absence of any cross-view term or geometric regularizer. The loss is applied independently to each random camera, so 3D coherence comes only from the shared NeRF weights and whatever implicit structure the 2D prior supplies. The paper shows this often works but does not prove or bound the consistency, and some outputs exhibit duplicated features or view-dependent artifacts. Later papers confirmed these issues, but the original experiments already flag them as occasional rather than solved.

The math itself is a direct application of the diffusion score and looks solid. Citations to the relevant diffusion and NeRF baselines are appropriate.

This is for researchers in generative graphics or vision who want to experiment with text-driven 3D without large 3D datasets. A reader who needs a starting point for follow-up work or wants to understand how 2D priors can bootstrap 3D will get concrete value. The paper deserves a serious referee because the technique is original and the empirical demonstration is strong enough to have influenced the field.

Referee Report

1 major / 3 minor

Summary. The paper introduces DreamFusion, a method for text-to-3D synthesis that optimizes a Neural Radiance Field (NeRF) using a Score Distillation Sampling (SDS) loss derived from a fixed pretrained 2D text-to-image diffusion model. This enables generation of 3D models from text prompts via gradient descent on random-view renderings, without 3D training data or modifications to the diffusion model, and supports applications such as novel view synthesis, relighting, and compositing.

Significance. If the empirical results hold, the work is significant for showing that 2D diffusion priors can be effectively distilled into 3D representations through the SDS loss, bypassing the lack of large-scale 3D datasets. It provides concrete evidence through optimization on diverse text prompts and demonstrates practical outputs. A key strength is that the external pretrained model is used frozen, with no additional fitted parameters.

major comments (1)

[§3.2] The SDS loss is applied independently to single-view renderings sampled from random cameras with no cross-view consistency term, depth regularizer, or multi-view correspondence loss. The central claim that this yields coherent 3D geometry therefore depends on the unanalyzed assumption that the shared NeRF parameters will avoid view-inconsistent minima; while §4 reports successful examples, the manuscript provides no derivation or empirical stress test of when this consistency emerges.

minor comments (3)

[§3.1] The notation distinguishing the diffusion model parameters φ from the NeRF parameters θ could be made more explicit to avoid confusion with standard diffusion literature.
[§4] Figure captions would benefit from including the exact text prompt and camera sampling details for each example to improve reproducibility.
[Abstract] The phrase 'DeepDream-like procedure' is used without a brief definition or reference, which may reduce accessibility for readers unfamiliar with that technique.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive assessment and constructive feedback on our work. We address the major comment below.

Referee: [§3.2] The SDS loss is applied independently to single-view renderings sampled from random cameras with no cross-view consistency term, depth regularizer, or multi-view correspondence loss. The central claim that this yields coherent 3D geometry therefore depends on the unanalyzed assumption that the shared NeRF parameters will avoid view-inconsistent minima; while §4 reports successful examples, the manuscript provides no derivation or empirical stress test of when this consistency emerges.

Authors: We agree that the SDS loss is applied to individual renderings without explicit cross-view terms. Coherence emerges because the NeRF parameters are shared and jointly optimized over a distribution of random viewpoints; a view-inconsistent solution would produce high average loss across the sampled poses. The manuscript relies on this mechanism and demonstrates its effectiveness through the diverse successful examples in §4, but does not include a formal derivation or systematic stress tests for failure modes. In revision we will expand §3.2 with a short discussion of how shared parameters promote consistency and will add a small number of challenging cases illustrating when inconsistencies can occur.
Revision: partial

Circularity Check

0 steps flagged

No circularity; SDS loss derived from external fixed diffusion prior

The paper introduces the SDS loss in §3.2 as a probability density distillation term taken directly from the score function of a frozen, pretrained 2D diffusion model φ. NeRF parameters θ are then optimized by gradient descent on random-view renderings x = g(θ, c) to minimize L_SDS(φ, x). Because φ is external and fixed, and the loss contains no self-referential terms or fitted parameters that are later renamed as predictions, no step in the derivation chain reduces the target 3D geometry to an input by construction. The multi-view consistency is an empirical outcome of shared θ under the external prior rather than a tautology.
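The loop described above can be illustrated with a deliberately tiny stand-in. This is a toy sketch, not the paper's implementation. The "scene" θ is a plain vector, a "camera" is a random subset of its coordinates, and the frozen "diffusion model" is the exact noise predictor for a Gaussian image distribution centred on a hypothetical target `mu_scene`. The names `toy_denoiser` and `sds_optimize`, the noise schedule, the weighting w(t) = 1, and all constants are assumptions made for illustration.

```python
import numpy as np


def toy_denoiser(z_t, alpha, sigma, mu_view, s=0.1):
    """Frozen stand-in for the diffusion model (assumption, not a real network).

    If clean images are distributed as N(mu_view, s^2 I), the exact
    posterior-mean noise prediction for z_t = alpha * x + sigma * eps is
    sigma * (z_t - alpha * mu_view) / (alpha^2 s^2 + sigma^2).
    """
    return sigma * (z_t - alpha * mu_view) / (alpha**2 * s**2 + sigma**2)


def sds_optimize(mu_scene, steps=2000, lr=0.05, seed=0):
    """Optimize a randomly initialized 'scene' using SDS-style gradients only."""
    rng = np.random.default_rng(seed)
    d = mu_scene.shape[0]
    theta = rng.normal(size=d)  # random initialization of the scene parameters

    for _ in range(steps):
        # Random "camera": each view observes a random subset of coordinates.
        mask = rng.random(d) < 0.5
        x = theta[mask]  # "rendering" x = g(theta, c)

        # Sample a noise level and perturb the rendering.
        t = rng.uniform(0.2, 0.98)
        sigma, alpha = t, np.sqrt(1.0 - t**2)
        eps = rng.normal(size=x.shape)
        z_t = alpha * x + sigma * eps

        # Query the frozen model; it is evaluated, never differentiated.
        eps_hat = toy_denoiser(z_t, alpha, sigma, mu_scene[mask])

        # SDS-style gradient with w(t) = 1. The renderer here is a coordinate
        # selection, so dx/dtheta scatters the residual back into theta.
        grad = np.zeros(d)
        grad[mask] = eps_hat - eps
        theta -= lr * grad

    return theta


if __name__ == "__main__":
    target = np.linspace(-1.0, 1.0, 8)
    result = sds_optimize(target)
    print("distance to target:", np.linalg.norm(result - target))
```

No single view sees the whole vector and no cross-view term appears in the update, yet all coordinates settle because every view writes into the same θ. The toy says nothing about whether the same holds for a real NeRF and a real image prior, which is the open question raised in the referee report.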

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 1 invented entity

The central contribution is the new distillation loss; everything else rests on standard assumptions of gradient-based optimization and the semantic coverage of the pretrained 2D model.

free parameters (1)

optimization hyperparameters
Learning rate, number of views per iteration, and loss weighting coefficients are chosen empirically.

axioms (2)

domain assumption: Gradient descent on the distillation loss will converge to a 3D representation whose renderings match the text prompt.
Invoked throughout the optimization procedure in Section 3.
domain assumption: A 2D diffusion model trained on image-text pairs encodes sufficient 3D-consistent semantic information.
Core premise for using the model as a 3D prior without 3D data.

invented entities (1)

Score Distillation Sampling (SDS) loss · no independent evidence
purpose: Distills gradients from the 2D diffusion density into updates for 3D parameters
Newly defined in this paper as the mechanism that transfers 2D knowledge to 3D optimization.

pith-pipeline@v0.9.0 · 5512 in / 1540 out tokens · 45629 ms · 2026-05-11T11:19:58.112831+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Foundation/AlexanderDuality alexander_duality_circle_linking contradicts

The core method in §3.2 optimizes NeRF parameters θ via the SDS loss L_SDS(φ, x = g(θ, c)) applied to renderings x from randomly sampled cameras c, where φ is the frozen 2D diffusion model. Each term in the loss depends only on a single 2D image and its noise prediction; no cross-view term, depth consistency regularizer, or multi-view correspondence loss is present.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 45 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

ReConText3D: Replay-based Continual Text-to-3D Generation
cs.CV 2026-04 conditional novelty 8.0

ReConText3D is the first replay-memory framework for continual text-to-3D generation that prevents catastrophic forgetting on new textual categories while preserving quality on previously seen classes.
R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow
cs.CV 2026-05 unverdicted novelty 7.0

R-DMesh generates high-fidelity 4D meshes aligned to video by disentangling base mesh, motion, and a learned rectification jump offset inside a VAE, then using Triflow Attention and rectified-flow diffusion.
GTA: Advancing Image-to-3D World Generation via Geometry Then Appearance Video Diffusion
cs.CV 2026-05 unverdicted novelty 7.0

GTA generates 3D worlds from single images via a two-stage video diffusion process that prioritizes geometry before appearance to improve structural consistency.
3DReflecNet: A Large-Scale Dataset for 3D Reconstruction of Reflective, Transparent, and Low-Texture Objects
cs.CV 2026-05 unverdicted novelty 7.0

3DReflecNet is a 22 TB+ dataset of over 120,000 synthetic and 1,000 real objects with millions of multi-view frames for benchmarking 3D reconstruction on reflective, transparent, and low-texture surfaces.
Generative Modeling with Orbit-Space Particle Flow Matching
cs.GR 2026-05 unverdicted novelty 7.0

OGPP is a particle flow-matching method using orbit-space canonicalization and geometric paths that achieves lower error and fewer steps than prior approaches on 3D benchmarks.
SpatialGrammar: A Domain-Specific Language for LLM-Based 3D Indoor Scene Generation
cs.AI 2026-04 unverdicted novelty 7.0

SpatialGrammar provides a grid-based DSL and compiler that lets LLMs generate collision-free 3D indoor scenes more reliably than raw-coordinate or code-based approaches.
GSCompleter: A Distillation-Free Plugin for Metric-Aware 3D Gaussian Splatting Completion in Seconds
cs.CV 2026-04 unverdicted novelty 7.0

GSCompleter completes sparse 3D Gaussian Splatting scenes via a distillation-free generate-then-register pipeline using Stereo-Anchor lifting and Ray-Constrained Registration, delivering SOTA results on three benchmarks.
TransSplat: Unbalanced Semantic Transport for Language-Driven 3DGS Editing
cs.CV 2026-04 unverdicted novelty 7.0

TransSplat uses unbalanced semantic transport to match edited 2D evidence with 3D Gaussians and recover a shared 3D edit field, yielding better local accuracy and structural consistency than prior view-consistency methods.
Guiding Distribution Matching Distillation with Gradient-Based Reinforcement Learning
cs.LG 2026-04 unverdicted novelty 7.0

GDMD replaces raw-sample rewards with distillation-gradient rewards in RL-guided diffusion distillation, yielding 4-step models that surpass their multi-step teachers on GenEval and human preference metrics.
Brain3D: EEG-to-3D Decoding of Visual Representations via Multimodal Reasoning
cs.CV 2026-04 unverdicted novelty 7.0

A multimodal pipeline decodes EEG into 3D meshes via EEG-to-image, MLLM reasoning, diffusion, and single-image-to-3D conversion, reporting 85.4% 10-way accuracy and 0.648 CLIPScore.
SEM-ROVER: Semantic Voxel-Guided Diffusion for Large-Scale Driving Scene Generation
cs.CV 2026-04 unverdicted novelty 7.0

SEM-ROVER generates large multiview-consistent 3D urban driving scenes via semantic-conditioned diffusion on Σ-Voxfield voxel grids with progressive outpainting and deferred rendering.
THOM: Generating Physically Plausible Hand-Object Meshes From Text
cs.CV 2026-04 unverdicted novelty 7.0

THOM is a training-free two-stage framework that generates physically plausible hand-object 3D meshes directly from text by combining text-guided Gaussians with contact-aware physics optimization and VLM refinement.
Vibe Coding XR: Accelerating AI + XR Prototyping with XR Blocks and Gemini
cs.HC 2026-03 unverdicted novelty 7.0

XR Blocks supplies an LLM-optimized Reality Model and Vibe Coding XR workflow that converts high-level prompts into working physics-aware XR applications with high one-shot success.
Toward Visually Realistic Simulation: A Benchmark for Evaluating Robot Manipulation in Simulation
cs.RO 2026-05 unverdicted novelty 6.0

VISER is a new visually realistic simulation benchmark for robot manipulation tasks that uses PBR materials and MLLM-assisted asset generation, achieving 0.92 Pearson correlation with real-world policy performance.
Structured 3D Latents Are Surprisingly Powerful: Unleashing Generalizable Style with 2D Diffusion
cs.CV 2026-05 unverdicted novelty 6.0

DiLAST optimizes 3D latents via guidance from a 2D diffusion model to enable generalizable style transfer for OOD styles in 3D asset generation.
InpaintSLat: Inpainting Structured 3D Latents via Initial Noise Optimization
cs.CV 2026-05 unverdicted novelty 6.0

Optimizing initial noise via backpropagation approximation and spectral parameterization in structured 3D latent diffusion yields higher contextual consistency and prompt alignment in training-free inpainting.
REVIVE 3D: Refinement via Encoded Voluminous Inflated prior for Volume Enhancement
cs.CV 2026-04 unverdicted novelty 6.0

REVIVE 3D generates voluminous 3D assets from flat 2D images via an inflated prior construction followed by latent-space refinement, plus new metrics for volume and flatness validated by user study.
Sparse-View 3D Gaussian Splatting in the Wild
cs.CV 2026-04 unverdicted novelty 6.0

A new sparse-view 3D Gaussian splatting method for unconstrained scenes with distractors combines diffusion-based reference-guided refinement and sparsity-aware Gaussian replication to achieve better rendering quality.
FluSplat: Sparse-View 3D Editing without Test-Time Optimization
cs.CV 2026-04 unverdicted novelty 6.0

FluSplat trains a model with geometric alignment constraints on multi-view edits to produce consistent 3D scene edits from sparse views in a single forward pass without test-time optimization.
Camera Control for Text-to-Image Generation via Learning Viewpoint Tokens
cs.CV 2026-04 unverdicted novelty 6.0

Viewpoint tokens learned on a mixed 3D-rendered and photorealistic dataset enable precise camera control in text-to-image generation while factorizing geometry from appearance and transferring to unseen object categories.
Deepfake Detection Generalization with Diffusion Noise
cs.CV 2026-04 unverdicted novelty 6.0

ANL uses diffusion noise prediction and attention to regularize deepfake detectors for better generalization to unseen synthesis methods without added inference cost.
Beyond Voxel 3D Editing: Learning from 3D Masks and Self-Constructed Data
cs.CV 2026-04 unverdicted novelty 6.0

BVE framework enables text-guided 3D editing beyond voxel limits by combining self-constructed data, lightweight semantic injection, and annotation-free masking to preserve local invariance.
Grasp in Gaussians: Fast Monocular Reconstruction of Dynamic Hand-Object Interactions
cs.CV 2026-04 unverdicted novelty 6.0

GraG reconstructs dynamic 3D hand-object interactions from monocular video 6.4x faster than prior work by using compact Sum-of-Gaussians tracking initialized from large models and refined with 2D losses.
Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting
cs.RO 2026-04 unverdicted novelty 6.0

Habitat-GS integrates 3D Gaussian Splatting scene rendering and Gaussian avatars into Habitat-Sim, yielding agents with stronger cross-domain generalization and effective human-aware navigation.
ReplicateAnyScene: Zero-Shot Video-to-3D Composition via Textual-Visual-Spatial Alignment
cs.CV 2026-04 unverdicted novelty 6.0

ReplicateAnyScene performs fully automated zero-shot video-to-compositional-3D reconstruction by cascading alignments of generic priors from vision foundation models across textual, visual, and spatial dimensions.
Rays as Pixels: Learning A Joint Distribution of Videos and Camera Trajectories
cs.CV 2026-04 unverdicted novelty 6.0

A video diffusion model learns a joint distribution over videos and camera trajectories by representing cameras as pixel-aligned ray encodings (raxels) denoised jointly with video frames via decoupled attention.
TouchAnything: Diffusion-Guided 3D Reconstruction from Sparse Robot Touches
cs.CV 2026-04 unverdicted novelty 6.0

TouchAnything reconstructs accurate 3D object geometries from only a few tactile contacts by optimizing for consistency with a pretrained visual diffusion prior.
Guiding a Diffusion Model by Swapping Its Tokens
cs.CV 2026-04 unverdicted novelty 6.0

Self-Swap Guidance steers diffusion sampling by swapping dissimilar token latents to enable CFG-like improvements for both conditional and unconditional generation.
3DrawAgent: Teaching LLM to Draw in 3D with Early Contrastive Experience
cs.CV 2026-04 unverdicted novelty 6.0

3DrawAgent lets LLMs create complex 3D sketches from text prompts by using pairwise comparisons of their own outputs to self-improve spatial drawing skills without parameter updates.
DailyArt: Discovering Articulation from Single Static Images via Latent Dynamics
cs.CV 2026-04 unverdicted novelty 6.0

DailyArt recovers full joint parameters of articulated objects from a single static image by synthesizing an opened state and comparing discrepancies, supporting downstream part-level novel state synthesis.
MemoryDiorama: Generating Dynamic 3D Diorama from Everyday Photos for Memory Recall
cs.HC 2026-04 unverdicted novelty 6.0

MemoryDiorama generates animated 3D dioramas from photos via LLM scene analysis and generative components, yielding richer autobiographical recall than photo-only or static diorama baselines.
HandDreamer: Zero-Shot Text to 3D Hand Model Generation using Corrective Hand Shape Guidance
cs.CV 2026-04 unverdicted novelty 6.0

HandDreamer is the first zero-shot text-to-3D method for hands that uses MANO initialization, skeleton-guided diffusion, and corrective shape guidance to produce view-consistent models.
MPDiT: Multi-Patch Global-to-Local Transformer Architecture For Efficient Flow Matching and Diffusion Model
cs.CV 2026-03 unverdicted novelty 6.0

MPDiT uses a hierarchical multi-patch design in transformers to lower computation in diffusion models by handling coarse global features first then fine local details, plus faster-converging embeddings.
Teaching an Agent to Sketch One Part at a Time
cs.AI 2026-03 unverdicted novelty 6.0

A multi-modal LM agent is trained to produce vector sketches part-by-part via supervised fine-tuning and process-reward RL on the new ControlSketch-Part dataset with automatic part annotations.
R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow
cs.CV 2026-05 unverdicted novelty 5.0

R-DMesh uses a VAE with a learned rectification jump offset and Triflow Attention inside a rectified-flow diffusion transformer to produce video-aligned 4D meshes despite initial pose misalignment.
RealDiffusion: Physics-informed Attention for Multi-character Storybook Generation
cs.CV 2026-05 unverdicted novelty 5.0

RealDiffusion uses heat diffusion as a dissipative prior and a region-aware stochastic process inside a training-free physics-informed attention mechanism to improve multi-character coherence while preserving narrativ...
SpatialPrompt: XR-Based Spatial Intent Expression as Executable Constraints for AI Generative 3D Design
cs.HC 2026-05 unverdicted novelty 5.0

SpatialPrompt turns spatial sketches and voice prompts into executable constraints for controllable AI 3D generation in XR, enabling iterative collaborative creation with color-coded contributions.
ST-Gen4D: Embedding 4D Spatiotemporal Cognition into World Model for 4D Generation
cs.CV 2026-05 unverdicted novelty 5.0

ST-Gen4D uses a world model that fuses global appearance and local dynamic graphs into a 4D cognition representation to guide consistent 4D Gaussian generation.
Pose-Aware Diffusion for 3D Generation
cs.CV 2026-05 unverdicted novelty 5.0

PAD synthesizes 3D geometry in observation space via depth unprojection as anchor to eliminate pose ambiguity in image-to-3D generation.
Unposed-to-3D: Learning Simulation-Ready Vehicles from Real-World Images
cs.CV 2026-04 unverdicted novelty 5.0

Unposed-to-3D learns simulation-ready 3D vehicle models from unposed real images by predicting camera parameters for photometric self-supervision, then adding scale prediction and harmonization.
UniMesh: Unifying 3D Mesh Understanding and Generation
cs.CV 2026-04 unverdicted novelty 5.0

UniMesh unifies 3D mesh generation and understanding in one model via a Mesh Head interface, Chain of Mesh iterative editing, and an Actor-Evaluator self-reflection loop.
"From remembering to shaping": Narrating Shared Experiences by Co-Designing Cultural Heritage Artifacts in Collaborative VR
cs.HC 2026-04 unverdicted novelty 5.0

A collaborative VR workflow with GenAI lets users merge prompts and creatively repurpose outputs to co-create 3D artifacts that narrate shared cultural heritage experiences.
Hitem3D 2.0: Multi-View Guided Native 3D Texture Generation
cs.CV 2026-04 unverdicted novelty 5.0

Hitem3D 2.0 combines multi-view image synthesis with native 3D texture projection to improve completeness, cross-view consistency, and geometry alignment over prior methods.
AnimateAnyMesh++: A Flexible 4D Foundation Model for High-Fidelity Text-Driven Mesh Animation
cs.CV 2026-04 unverdicted novelty 4.0

AnimateAnyMesh++ animates arbitrary 3D meshes from text using an expanded 300K-identity DyMesh-XL dataset, a power-law topology-aware DyMeshVAE-Flex, and a variable-length rectified-flow generator to produce semantica...
LMMs Meet Object-Centric Vision: Understanding, Segmentation, Editing and Generation
cs.CV 2026-04 unverdicted novelty 3.0

This review organizes literature on large multimodal models and object-centric vision into four themes—understanding, referring segmentation, editing, and generation—while summarizing paradigms, strategies, and challe...

Reference graph

Works this paper leans on

149 extracted references · 149 canonical work pages · cited by 44 Pith papers · 9 internal anchors

[1]

Scaling Learning Algorithms Towards

Bengio, Yoshua and LeCun, Yann , booktitle =. Scaling Learning Algorithms Towards

work page
[2]

and Osindero, Simon and Teh, Yee Whye , journal =

Hinton, Geoffrey E. and Osindero, Simon and Teh, Yee Whye , journal =. A Fast Learning Algorithm for Deep Belief Nets , volume =

work page
[3]

2016 , publisher=

Deep learning , author=. 2016 , publisher=

work page 2016
[4]

UAI , year=

Probability Distillation: A Caveat and Alternatives , author=. UAI , year=

work page
[5]

Wei Ping and Kainan Peng and Jitong Chen , journal=

work page
[6]

SIGGRAPH , year=

A signal-processing framework for inverse rendering , author=. SIGGRAPH , year=

work page
[7]

Barron, Jonathan T and Mildenhall, Ben and Tancik, Matthew and Hedman, Peter and Martin-Brualla, Ricardo and Srinivasan, Pratul P , journal=

work page
[8]

Barron and Ben Mildenhall and Dor Verbin and Pratul P

Jonathan T. Barron and Ben Mildenhall and Dor Verbin and Pratul P. Srinivasan and Peter Hedman , journal=

work page
[9]

Barron and Jitendra Malik , Title =

Jonathan T. Barron and Jitendra Malik , Title =. TPAMI , year=

work page
[10]

Land and John J

Edwin H. Land and John J. McCann , journal =. Lightness and Retinex Theory , year =

work page
[11]

Courville and Christopher J

Jae Hyun Lim and Aaron C. Courville and Christopher J. Pal and Chin. ICML , year =

work page
[12]

ICML , year =

Deep Unsupervised Learning using Nonequilibrium Thermodynamics , author =. ICML , year =

work page
[13]

NeurIPS , year=

Variational Diffusion Models , author=. NeurIPS , year=

work page
[14]

Denoising Diffusion Probabilistic Models , year =

Ho, Jonathan and Jain, Ajay and Abbeel, Pieter , journal =. Denoising Diffusion Probabilistic Models , year =

work page
[16]

Sara and Lopes, Rapha Gontijo and Salimans, Tim and Ho, Jonathan and Fleet, David J and Norouzi, Mohammad , keywords =

Saharia, Chitwan and Chan, William and Saxena, Saurabh and Li, Lala and Whang, Jay and Denton, Emily and Ghasemipour, Seyed Kamyar Seyed and Ayan, Burcu Karagol and Mahdavi, S. Sara and Lopes, Rapha Gontijo and Salimans, Tim and Ho, Jonathan and Fleet, David J and Norouzi, Mohammad , keywords =. Photorealistic Text-to-Image Diffusion Models with Deep Lang...

work page
[17]

NeurIPS , year=

Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling , author=. NeurIPS , year=

work page
[18]

Henzler, Philipp and Mitra, Niloy J and and Ritschel, Tobias , journal=

work page
[19]

and Monteiro, Marco and Kellnhofer, Petr and Wu, Jiajun and Wetzstein, Gordon , title =

Chan, Eric R. and Monteiro, Marco and Kellnhofer, Petr and Wu, Jiajun and Wetzstein, Gordon , title =. CVPR , year=

work page
[20]

Chan and Connor Z

Eric R. Chan and Connor Z. Lin and Matthew A. Chan and Koki Nagano and Boxiao Pan and Shalini De Mello and Orazio Gallo and Leonidas Guibas and Jonathan Tremblay and Sameh Khamis and Tero Karras and Gordon Wetzstein , title =. arXiv , year =

work page
[21]

and Abbeel, Pieter and Poole, Ben , title =

Jain, Ajay and Mildenhall, Ben and Barron, Jonathan T. and Abbeel, Pieter and Poole, Ben , title =. CVPR , year =

work page
[23]

Srinivasan, Pratul P and Deng, Boyang and Zhang, Xiuming and Tancik, Matthew and Mildenhall, Ben and Barron, Jonathan T , journal=

work page
[24]

1760 , publisher=

Photometria sive de mensura et gradibus luminis, colorum et umbrae , author=. 1760 , publisher=

work page
[25]

IEEE TVCG , year =

Nelson Max , title =. IEEE TVCG , year =

work page
[26]

Srinivasan and Matthew Tancik and Jonathan T

Ben Mildenhall and Pratul P. Srinivasan and Matthew Tancik and Jonathan T. Barron and Ravi Ramamoorthi and Ren Ng , year=

work page
[27]

Nguyen-Phuoc, Thu and Li, Chuan and Theis, Lucas and Richardt, Christian and Yang, Yong-Liang , journal =

work page
[28]

ICML , year=

Learning transferable visual models from natural language supervision , author=. ICML , year=

work page
[29]

ICCV , year=

PointFlow: 3D Point Cloud Generation with Continuous Normalizing Flows , author=. ICCV , year=

work page
[30]

ECCV , year=

Learning Gradient Fields for Shape Generation , author=. ECCV , year=

work page
[31]

ICCV , year =

Zhou, Linqi and Du, Yilun and Wu, Jiajun , title =. ICCV , year =

work page
[32]

Jiatao Gu, Lingjie Liu, Peng Wang, and Christian Theobalt. arXiv:2110.08985.
[33]

Unconstrained Scene Generation with Locally Conditioned Radiance Fields. arXiv.
[35]

Can Wang, Menglei Chai, Mingming He, Dongdong Chen, and Jing Liao. CVPR.
[36]

Aditya Sanghi, Hang Chu, Joseph G. Lambourne, Ye Wang, Chin-Yi Cheng, and Marco Fumero.
[38]

Nasir Mohammad Khalid, Tianhao Xie, Eugene Belilovsky, and Tiberiu Popa. SIGGRAPH Asia 2022 Conference Papers.
[39]

Aapo Hyv. Estimation of Non-Normalized Statistical Models by Score Matching.
[40]

Score-Based Generative Modeling through Stochastic Differential Equations. ICLR.
[41]

A connection between score matching and denoising autoencoders. Neural Computation.
[42]

Yang Song and Stefano Ermon. NeurIPS.
[43]

Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, and Luc Van Gool. RePaint: Inpainting using Denoising Diffusion Probabilistic Models. 2022. doi:10.48550/ARXIV.2201.09865
[44]

Fangzhou Hong, Mingyuan Zhang, Liang Pan, Zhongang Cai, Lei Yang, and Ziwei Liu.
[45]

Roy Or-El, Xuan Luo, Mengyi Shan, Eli Shechtman, Jeong Joon Park, and Ira Kemelmacher-Shlizerman. Style
[46]

From data to functa: Your data point is a function and you can treat it like one. ICML.
[47]

Attention is all you need. NeurIPS.
[48]

Elman Mansimov, Emilio Parisotto, Jimmy Ba, and Ruslan Salakhutdinov. ICLR.
[49]

Zero-shot text-to-image generation. ICML.
[50]

Tianyang Hu, Zixiang Chen, Hanxi Sun, Jincheng Bai, Mao Ye, and Guang Cheng. 2018.
[51]

DiffWave: A Versatile Diffusion Model for Audio Synthesis. ICLR.
[54]

Yuxuan Zhang, Wenzheng Chen, Huan Ling, Jun Gao, Yinan Zhang, Antonio Torralba, and Sanja Fidler. Image
[57]

Deep residual learning for image recognition. CVPR.
[58]

Dan Hendrycks and Kevin Gimpel. Gaussian Error Linear Units (
[60]

Christoph Schuhmann, Romain Beaumont, Cade W. Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Patrick Schramowski, Srivatsa R. Kundurthy, Katherine Crowson, Richard Vencu, Ludwig Schmidt, Robert Kaczmarczyk, and Jenia Jitsev.
[61]

Advances in neural rendering. Computer Graphics Forum.
[62]

Nerfies: Deformable neural radiance fields. ICCV.
[64]

Katja Schwarz, Yiyi Liao, Michael Niemeyer, and Andreas Geiger.
[65]

Alexander Mordvintsev, Nicola Pezzotti, Ludwig Schubert, and Chris Olah. Distill.
[66]

Towards Implicit Text-Guided 3D Shape Generation. CVPR.
[68]

Herbert E. Robbins. An Empirical Bayes Approach to Statistics. Breakthroughs in Statistics: Foundations and Basic Theory, 1992.
[69]

Learning the Stein Discrepancy for Training and Evaluating Energy-Based Models without Sampling. ICML.
[70]

Xiaohua Zhai, Xiao Wang, Basil Mustafa, Andreas Steiner, Daniel Keysers, Alexander Kolesnikov, and Lucas Beyer. CVPR.
[71]

Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance. NeurIPS.
[72]

Dor Verbin, Peter Hedman, Ben Mildenhall, Todd Zickler, Jonathan T. Barron, and Pratul P. Srinivasan.
[73]

Mark Boss, Raphael Braun, Varun Jampani, Jonathan T. Barron, Ce Liu, and Hendrik P.A. Lensch.
[74]

GAN2X: Non-Lambertian Inverse Rendering of Image GANs. 3DV.
[75]

Auto-Encoding Variational Bayes. ICLR.
[76]

Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains. NeurIPS.
[77]

Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical Text-Conditional Image Generation with CLIP Latents. 2022. doi:10.48550/ARXIV.2204.06125
[78]

Diffusion Models Beat GANs on Image Synthesis. arXiv.
[79]

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models. ICML.
[80]

Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J. Fleet, and Mohammad Norouzi. Image Super-Resolution via Iterative Refinement. 2021. doi:10.48550/ARXIV.2104.07636
[81]

Inceptionism: Going Deeper into Neural Networks. 2015.
[83]

Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, and Balaji Lakshminarayanan. Do Deep Generative Models Know What They Don't Know? 2018. doi:10.48550/ARXIV.1810.09136
[84]

Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, Peter Hedman, Ricardo Martin-Brualla, and Jonathan T. Barron.
[85]

Geoffrey Roeder, Yuhuai Wu, and David Duvenaud. Sticking the Landing: Simple, Lower-Variance Gradient Estimators for Variational Inference. 2017. doi:10.48550/ARXIV.1703.09194
[86]

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Journal of Machine Learning Research.
[87]

Rohan Anil, Vineet Gupta, Tomer Koren, Kevin Regan, and Yoram Singer. Scalable Second Order Optimization for Deep Learning. arXiv preprint arXiv:2002.09018, 2020. doi:10.48550/ARXIV.2002.09018
[88]

Chitwan Saharia, William Chan, Huiwen Chang, Chris A. Lee, Jonathan Ho, Tim Salimans, David J. Fleet, and Mohammad Norouzi. Palette: Image-to-Image Diffusion Models. 2021. doi:10.48550/ARXIV.2111.05826
[89]

Rohan Anil, Vineet Gupta, Tomer Koren, Kevin Regan, and Yoram Singer. Scalable second order optimization for deep learning, 2020. URL https://arxiv.org/abs/2002.09018
[90]

Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. Layer normalization. arXiv:1607.06450, 2016
[91]

Jonathan T Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, and Pratul P Srinivasan. Mip-NeRF: A multiscale representation for anti-aliasing neural radiance fields. ICCV, 2021
[92]

Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Mip-NeRF 360: Unbounded anti-aliased neural radiance fields. CVPR, 2022
