GenAssets: Generating in-the-wild 3D Assets in Latent Space
Pith reviewed 2026-05-08 12:21 UTC · model grok-4.3
The pith
A 3D latent diffusion model generates complete high-quality assets from sparse in-the-wild driving sensor data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose a 3D latent diffusion model that learns on in-the-wild LiDAR and camera data captured by a sensor platform and generates high-quality 3D assets with complete geometry and appearance. Key to our method is a reconstruct-then-generate approach that first leverages occlusion-aware neural rendering trained over multiple scenes to build a high-quality latent space for objects, and then trains a diffusion model that operates on the latent space. We show our method outperforms existing reconstruction and generation based methods, unlocking diverse and scalable content creation for simulation.
What carries the argument
The reconstruct-then-generate pipeline: occlusion-aware neural rendering trained across multiple scenes to produce a latent space for partially observed objects, followed by a diffusion model that samples complete assets inside that latent space.
If this is right
- Generated assets render consistently from arbitrary viewpoints rather than only near the original observations.
- The method produces complete geometry and appearance for traffic participants even from single-pass driving captures.
- It scales content creation for multi-sensor simulation without requiring dense multi-view captures or manual modeling.
- Outperforms both pure neural-rendering reconstruction and standard diffusion generation on in-the-wild driving scenes.
Where Pith is reading between the lines
- The same latent-space reconstruction step could be applied to other domains that suffer from sparse, occluded observations such as indoor robotics or aerial mapping.
- Once the latent space exists, the diffusion stage could be conditioned on additional attributes like vehicle type or weather to further increase scenario variety in simulation.
- Integration of these assets into closed-loop simulators would allow testing of perception and planning modules on far more diverse object configurations than real data alone provides.
Load-bearing premise
An occlusion-aware neural rendering model trained over multiple scenes can reliably construct a high-quality latent space for objects observed under sparse viewpoints and partial occlusions in driving data.
What would settle it
If assets generated by the model, when rendered from completely novel viewpoints far from any training observation, fail to match the appearance and geometry statistics of held-out real sensor captures of the same object categories, the claim that the latent space supports faithful completion would not hold.
Figures
read the original abstract
High-quality 3D assets for traffic participants are critical for multi-sensor simulation, which is essential for the safe end-to-end development of autonomy. Building assets from in-the-wild data is key for diversity and realism, but existing neural-rendering based reconstruction methods are slow and generate assets that render well only from viewpoints close to the original observations, limiting their usefulness in simulation. Recent diffusion-based generative models build complete and diverse assets, but perform poorly on in-the-wild driving scenes, where observed actors are captured under sparse and limited fields of view, and are partially occluded. In this work, we propose a 3D latent diffusion model that learns on in-the-wild LiDAR and camera data captured by a sensor platform and generates high-quality 3D assets with complete geometry and appearance. Key to our method is a "reconstruct-then-generate" approach that first leverages occlusion-aware neural rendering trained over multiple scenes to build a high-quality latent space for objects, and then trains a diffusion model that operates on the latent space. We show our method outperforms existing reconstruction and generation based methods, unlocking diverse and scalable content creation for simulation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes GenAssets, a 3D latent diffusion model that learns from in-the-wild LiDAR and camera data captured by a sensor platform. It uses a reconstruct-then-generate pipeline: an occlusion-aware neural rendering model is trained jointly over multiple scenes to construct a latent space for objects observed under sparse viewpoints and partial occlusions, after which a diffusion model operates in that latent space to generate 3D assets with complete geometry and appearance. The abstract asserts that this outperforms existing reconstruction and generation baselines for multi-sensor simulation of traffic participants.
Significance. If the central assumption holds, the approach could enable scalable generation of diverse, complete 3D assets from real driving data, addressing the slowness of per-scene neural reconstruction and the failure of standard diffusion models on limited-view, occluded observations. This would support more realistic simulation for autonomy development.
major comments (2)
- [Abstract] Abstract: The claim that the method 'outperforms existing reconstruction and generation based methods' is unsupported by any quantitative metrics, ablation studies, tables, or experimental details. This prevents verification of the central claim.
- [Abstract] Abstract (reconstruct-then-generate description): The pipeline assumes that the occlusion-aware neural renderer, trained jointly over multiple scenes, produces object latents encoding complete unobserved geometry and appearance rather than imputing from dataset priors. No analysis, ablations, or evidence is supplied to show that the latent space recovers missing parts from sparse, occluded driving views (<90° total viewpoint range, frequent partial occlusions) instead of collapsing to averages. This assumption is load-bearing, as every generated asset is decoded from samples in this latent space.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important points about supporting claims in the abstract and providing evidence for the core assumptions in our reconstruct-then-generate pipeline. We address each major comment below and commit to revisions that will strengthen the paper without altering its central contributions.
read point-by-point responses
-
Referee: [Abstract] Abstract: The claim that the method 'outperforms existing reconstruction and generation based methods' is unsupported by any quantitative metrics, ablation studies, tables, or experimental details. This prevents verification of the central claim.
Authors: We agree that the abstract would be strengthened by including specific quantitative support for the performance claim. The full manuscript contains detailed experimental results in Sections 4 and 5, including tables with metrics such as PSNR, IoU for geometry, and FID for appearance, demonstrating consistent improvements over reconstruction and diffusion baselines. To address this directly, we will revise the abstract to incorporate key numerical results (e.g., average gains of X% on primary metrics) while maintaining its concise nature. revision: yes
-
Referee: [Abstract] Abstract (reconstruct-then-generate description): The pipeline assumes that the occlusion-aware neural renderer, trained jointly over multiple scenes, produces object latents encoding complete unobserved geometry and appearance rather than imputing from dataset priors. No analysis, ablations, or evidence is supplied to show that the latent space recovers missing parts from sparse, occluded driving views (<90° total viewpoint range, frequent partial occlusions) instead of collapsing to averages. This assumption is load-bearing, as every generated asset is decoded from samples in this latent space.
Authors: This is a substantive point about the properties of the learned latent space. Our joint multi-scene training is intended to promote completion of unobserved geometry through shared priors across diverse observations, and we provide supporting evidence via qualitative comparisons and quantitative metrics showing that our generated assets are more complete than per-scene baselines. That said, we acknowledge the absence of targeted analysis isolating recovery of missing parts versus dataset averaging. We will add a dedicated ablation subsection (including visualizations of decoded outputs from progressively sparser/occluded inputs against held-out ground truth) to directly demonstrate the latent space behavior. revision: yes
Circularity Check
No circularity: descriptive pipeline with no equations or self-referential reductions
full rationale
The paper describes a reconstruct-then-generate method that first trains an occlusion-aware neural renderer across scenes to produce object latents, then trains a diffusion model on those latents. No equations, derivations, or fitted-parameter predictions appear in the provided text. The central claim is an empirical method statement rather than a mathematical reduction; the neural-rendering step is presented as an enabling component whose validity is external to the diffusion stage. No self-citation chains, ansatzes smuggled via prior work, or renamings of known results are load-bearing for the output. The reader's assessment of score 2.0 is consistent with absence of circular structure.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Renderdiffusion: Image diffusion for 3d reconstruction, in- painting and generation
Titas Anciukevi ˇcius, Zexiang Xu, Matthew Fisher, Paul Henderson, Hakan Bilen, Niloy J Mitra, and Paul Guerrero. Renderdiffusion: Image diffusion for 3d reconstruction, in- painting and generation. InCVPR, 2023. 3
2023
-
[2]
Barron, Ben Mildenhall, Dor Verbin, Pratul P
Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Mip-nerf 360: Unbounded anti-aliased neural radiance fields.CVPR, 2022. 13
2022
-
[3]
Zip-nerf: Anti-aliased grid- based neural radiance fields
Jonathan T Barron, Ben Mildenhall, Dor Verbin, Pratul P Srinivasan, and Peter Hedman. Zip-nerf: Anti-aliased grid- based neural radiance fields. InICCV, 2023. 22
2023
-
[4]
Gaudi: A neural architect for immersive 3d scene generation.NeurIPS, 2022
Miguel Angel Bautista, Pengsheng Guo, Samira Abnar, Walter Talbott, Alexander Toshev, Zhuoyuan Chen, Lau- rent Dinh, Shuangfei Zhai, Hanlin Goh, Daniel Ulbricht, et al. Gaudi: A neural architect for immersive 3d scene generation.NeurIPS, 2022. 5
2022
-
[5]
Demystifying mmd gans.arXiv, 2018
Mikołaj Bi ´nkowski, Danica J Sutherland, Michael Arbel, and Arthur Gretton. Demystifying mmd gans.arXiv, 2018. 7, 16
2018
-
[6]
nuscenes: A multi- modal dataset for autonomous driving
Holger Caesar, Varun Bankiti, Alex H Lang, Sourabh V ora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuscenes: A multi- modal dataset for autonomous driving. InCVPR, 2020. 8, 18
2020
-
[7]
Lightplane: Highly-scalable components for neu- ral 3d fields.arXiv, 2024
Ang Cao, Justin Johnson, Andrea Vedaldi, and David Novotny. Lightplane: Highly-scalable components for neu- ral 3d fields.arXiv, 2024. 3
2024
-
[8]
pi-gan: Periodic implicit genera- tive adversarial networks for 3d-aware image synthesis
Eric R Chan, Marco Monteiro, Petr Kellnhofer, Jiajun Wu, and Gordon Wetzstein. pi-gan: Periodic implicit genera- tive adversarial networks for 3d-aware image synthesis. In CVPR, 2021. 2
2021
-
[9]
Efficient geometry-aware 3d generative adversarial networks
Eric R Chan, Connor Z Lin, Matthew A Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas J Guibas, Jonathan Tremblay, Sameh Khamis, et al. Efficient geometry-aware 3d generative adversarial networks. InCVPR, 2022. 2, 3, 6, 7, 8, 14, 15, 16
2022
-
[10]
pixelsplat: 3d gaussian splats from im- age pairs for scalable generalizable 3d reconstruction
David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann. pixelsplat: 3d gaussian splats from im- age pairs for scalable generalizable 3d reconstruction. In CVPR, 2024. 6, 7, 14, 15
2024
-
[11]
Single-stage dif- fusion nerf: A unified approach to 3d generation and recon- struction
Hansheng Chen, Jiatao Gu, Anpei Chen, Wei Tian, Zhuowen Tu, Lingjie Liu, and Hao Su. Single-stage dif- fusion nerf: A unified approach to 3d generation and recon- struction. InICCV, 2023. 2, 3, 5, 6, 7, 8, 15, 16
2023
-
[12]
Geosim: Realistic video simu- lation via geometry-aware composition for self-driving
Yun Chen, Frieda Rong, Shivam Duggal, Shenlong Wang, Xinchen Yan, Sivabalan Manivasagam, Shangjie Xue, Ersin Yumer, and Raquel Urtasun. Geosim: Realistic video simu- lation via geometry-aware composition for self-driving. In CVPR, 2021. 2
2021
-
[13]
G3r: Gradient guided gen- eralizable reconstruction
Yun Chen, Jingkang Wang, Ze Yang, Sivabalan Mani- vasagam, and Raquel Urtasun. G3r: Gradient guided gen- eralizable reconstruction. InECCV, 2025. 2, 6, 7, 14, 15
2025
-
[14]
Omnire: Omni urban scene reconstruction.arXiv, 2024
Ziyu Chen, Jiawei Yang, Jiahui Huang, Riccardo de Lu- tio, Janick Martinez Esturo, Boris Ivanovic, Or Litany, Zan Gojcic, Sanja Fidler, Marco Pavone, et al. Omnire: Omni urban scene reconstruction.arXiv, 2024. 1, 2, 6, 22
2024
-
[15]
Objaverse: A universe of annotated 3d objects
Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Objaverse: A universe of annotated 3d objects. InCVPR, 2023. 2
2023
-
[16]
Diffusion models beat gans on image synthesis.NeurIPS, 2021
Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis.NeurIPS, 2021. 5
2021
-
[17]
CARLA: An open urban driving simulator
Alexey Dosovitskiy, German Ros, Felipe Codevilla, Anto- nio Lopez, and Vladlen Koltun. CARLA: An open urban driving simulator. InCoRL, 2017. 1
2017
-
[18]
Taming transformers for high-resolution image synthesis
Patrick Esser, Robin Rombach, and Bjorn Ommer. Taming transformers for high-resolution image synthesis. InCVPR,
-
[19]
Dynamic 3d gaussian fields for urban areas.arXiv, 2024
Tobias Fischer, Jonas Kulhanek, Samuel Rota Bul `o, Lorenzo Porzi, Marc Pollefeys, and Peter Kontschieder. Dynamic 3d gaussian fields for urban areas.arXiv, 2024. 1
2024
-
[20]
De- tail me more: Improving gan’s photo-realism of complex scenes
Raghudeep Gadde, Qianli Feng, and Aleix M Martinez. De- tail me more: Improving gan’s photo-realism of complex scenes. InICCV, 2021. 5
2021
-
[21]
Get3d: A generative model of high quality 3d tex- tured shapes learned from images.NeurIPS, 2022
Jun Gao, Tianchang Shen, Zian Wang, Wenzheng Chen, Kangxue Yin, Daiqing Li, Or Litany, Zan Gojcic, and Sanja Fidler. Get3d: A generative model of high quality 3d tex- tured shapes learned from images.NeurIPS, 2022. 2, 3
2022
-
[22]
Cat3d: Create anything in 3d with multi-view diffusion models.arXiv, 2024
Ruiqi Gao, Aleksander Holynski, Philipp Henzler, Arthur Brussee, Ricardo Martin-Brualla, Pratul Srinivasan, Jonathan T Barron, and Ben Poole. Cat3d: Create anything in 3d with multi-view diffusion models.arXiv, 2024. 3
2024
-
[23]
Generative adversarial nets.NeurIPS,
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets.NeurIPS,
-
[24]
Nerfdiff: Single-image view synthesis with nerf-guided distillation from 3d-aware diffusion
Jiatao Gu, Alex Trevithick, Kai-En Lin, Joshua M Susskind, Christian Theobalt, Lingjie Liu, and Ravi Ra- mamoorthi. Nerfdiff: Single-image view synthesis with nerf-guided distillation from 3d-aware diffusion. InICML,
-
[25]
Gans trained by a two time-scale update rule converge to a local nash equi- librium.NeurIPS, 2017
Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equi- librium.NeurIPS, 2017. 6, 7, 16
2017
-
[26]
Denoising dif- fusion probabilistic models.NeurIPS, 2020
Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising dif- fusion probabilistic models.NeurIPS, 2020. 3, 5, 13
2020
-
[27]
LRM: Large reconstruction model for single im- age to 3d.arXiv, 2023
Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. LRM: Large reconstruction model for single im- age to 3d.arXiv, 2023. 2
2023
-
[28]
Rangeldm: Fast realistic lidar point cloud generation
Qianjiang Hu, Zhimin Zhang, and Wei Hu. Rangeldm: Fast realistic lidar point cloud generation. InECCV, 2025. 3
2025
-
[29]
VEGS: View extrapolation of ur- ban scenes in 3d gaussian splatting using learned priors
Sungwon Hwang, Min-Jung Kim, Taewoong Kang, Jayeon Kang, and Jaegul Choo. VEGS: View extrapolation of ur- ban scenes in 3d gaussian splatting using learned priors. arXiv, 2024. 2, 3
2024
-
[30]
Image-to-image translation with conditional adver- sarial networks
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adver- sarial networks. InCVPR, 2017. 4
2017
-
[31]
Codenerf: Disentan- gled neural radiance fields for object categories
Wonbong Jang and Lourdes Agapito. Codenerf: Disentan- gled neural radiance fields for object categories. InICCV,
-
[32]
A style-based generator architecture for generative adversarial networks
Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. InCVPR, 2019. 2
2019
-
[33]
3d gaussian splatting for real-time radiance field rendering.ACM Trans
Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Trans. Graph., 2023. 22
2023
-
[34]
Autosplat: Constrained gaussian splatting for autonomous driving scene reconstruction.arXiv, 2024
Mustafa Khan, Hamidreza Fazlali, Dhruv Sharma, Tong- tong Cao, Dongfeng Bai, Yuan Ren, and Bingbing Liu. Autosplat: Constrained gaussian splatting for autonomous driving scene reconstruction.arXiv, 2024. 1, 2
2024
-
[35]
Auto-encoding variational bayes
Diederik P Kingma. Auto-encoding variational bayes. arXiv, 2013. 3, 5, 13
2013
-
[36]
Adam: A method for stochastic optimization.ICLR, 2015
Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization.ICLR, 2015. 14
2015
-
[37]
Segment anything
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. InICCV, 2023. 6, 15, 16
2023
-
[38]
Instant3d: Fast text-to-3d with sparse-view generation and large reconstruction model
Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu, Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, and Sai Bi. Instant3d: Fast text-to-3d with sparse-view generation and large reconstruction model. arXiv, 2023. 3
2023
-
[39]
Bevformer: Learning bird’s-eye-view representation from multi-camera images via spatiotemporal transformers
Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chong- hao Sima, Tong Lu, Yu Qiao, and Jifeng Dai. Bevformer: Learning bird’s-eye-view representation from multi-camera images via spatiotemporal transformers. InECCV, 2022. 8
2022
-
[40]
Magic3d: High- resolution text-to-3d content creation
Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. Magic3d: High- resolution text-to-3d content creation. InCVPR, 2023. 2, 3
2023
-
[41]
Neural scene rasterization for large scene rendering in real time
Jeffrey Yunfan Liu, Yun Chen, Ze Yang, Jingkang Wang, Sivabalan Manivasagam, and Raquel Urtasun. Neural scene rasterization for large scene rendering in real time. InICCV,
-
[42]
One-2-3-45: Any sin- gle image to 3d mesh in 45 seconds without per-shape op- timization.NeurIPS, 2024
Minghua Liu, Chao Xu, Haian Jin, Linghao Chen, Mukund Varma T, Zexiang Xu, and Hao Su. One-2-3-45: Any sin- gle image to 3d mesh in 45 seconds without per-shape op- timization.NeurIPS, 2024. 3
2024
-
[43]
Meshformer: High- quality mesh generation with 3d-guided reconstruction model.arXiv, 2024
Minghua Liu, Chong Zeng, Xinyue Wei, Ruoxi Shi, Ling- hao Chen, Chao Xu, Mengqi Zhang, Zhaoning Wang, Xi- aoshuai Zhang, Isabella Liu, et al. Meshformer: High- quality mesh generation with 3d-guided reconstruction model.arXiv, 2024. 20
2024
-
[44]
Zero-1-to-3: Zero-shot one image to 3d object
Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tok- makov, Sergey Zakharov, and Carl V ondrick. Zero-1-to-3: Zero-shot one image to 3d object. InICCV, 2023. 3
2023
-
[45]
Meshdif- fusion: Score-based generative 3d mesh modeling.arXiv,
Zhen Liu, Yao Feng, Michael J Black, Derek Nowrouzezahrai, Liam Paull, and Weiyang Liu. Meshdif- fusion: Score-based generative 3d mesh modeling.arXiv,
-
[46]
Wonder3d: Single image to 3d using cross-domain diffusion
Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin, Yuan Liu, Zhiyang Dou, Lingjie Liu, Yuexin Ma, Song-Hai Zhang, Marc Habermann, Christian Theobalt, et al. Wonder3d: Single image to 3d using cross-domain diffusion. InCVPR,
-
[47]
Diffusion probabilistic models for 3d point cloud generation
Shitong Luo and Wei Hu. Diffusion probabilistic models for 3d point cloud generation. InCVPR, 2021. 3
2021
-
[48]
Towards zero domain gap: A comprehensive study of realistic lidar simulation for autonomy testing
Sivabalan Manivasagam, Ioan Andrei B ˆarsan, Jingkang Wang, Ze Yang, and Raquel Urtasun. Towards zero domain gap: A comprehensive study of realistic lidar simulation for autonomy testing. InICCV, 2023. 1
2023
-
[49]
Lt3sd: Latent trees for 3d scene diffusion.arXiv, 2024
Quan Meng, Lei Li, Matthias Nießner, and Angela Dai. Lt3sd: Latent trees for 3d scene diffusion.arXiv, 2024. 3
2024
-
[50]
Nerf: Representing scenes as neural radiance fields for view syn- thesis.Communications of the ACM, 2021
Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view syn- thesis.Communications of the ACM, 2021. 2
2021
-
[51]
Autorf: Learning 3d object radiance fields from single view observations
Norman M ¨uller, Andrea Simonelli, Lorenzo Porzi, Samuel Rota Bul `o, Matthias Nießner, and Peter Kontschieder. Autorf: Learning 3d object radiance fields from single view observations. InCVPR, 2022. 2
2022
-
[52]
Diffrf: Rendering-guided 3d radiance field diffusion
Norman M ¨uller, Yawar Siddiqui, Lorenzo Porzi, Samuel Rota Bulo, Peter Kontschieder, and Matthias Nießner. Diffrf: Rendering-guided 3d radiance field diffusion. InCVPR, 2023. 2, 3, 5, 6
2023
-
[53]
Extracting triangular 3d models, materials, and lighting from images
Jacob Munkberg, Jon Hasselgren, Tianchang Shen, Jun Gao, Wenzheng Chen, Alex Evans, Thomas M ¨uller, and Sanja Fidler. Extracting triangular 3d models, materials, and lighting from images. InCVPR, 2022. 2
2022
-
[54]
Giraffe: Repre- senting scenes as compositional generative neural feature fields
Michael Niemeyer and Andreas Geiger. Giraffe: Repre- senting scenes as compositional generative neural feature fields. InCVPR, 2021. 2
2021
-
[55]
Au- todecoding latent 3d diffusion models.NeurIPS, 2023
Evangelos Ntavelis, Aliaksandr Siarohin, Kyle Olszewski, Chaoyang Wang, Luc V Gool, and Sergey Tulyakov. Au- todecoding latent 3d diffusion models.NeurIPS, 2023. 2, 3
2023
-
[56]
Unisurf: Unifying neural implicit surfaces and radiance fields for multi-view reconstruction
Michael Oechsle, Songyou Peng, and Andreas Geiger. Unisurf: Unifying neural implicit surfaces and radiance fields for multi-view reconstruction. InICCV, 2021. 2
2021
-
[57]
Neural scene graphs for dynamic scenes
Julian Ost, Fahim Mannan, Nils Thuerey, Julian Knodt, and Felix Heide. Neural scene graphs for dynamic scenes. In CVPR, 2021. 2
2021
-
[58]
SDXL: Improving latent diffusion mod- els for high-resolution image synthesis
Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas M ¨uller, Joe Penna, and Robin Rombach. SDXL: Improving latent diffusion mod- els for high-resolution image synthesis. InICLR, 2024. 3
2024
-
[59]
Dreamfusion: Text-to-3d using 2d diffusion.arXiv,
Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Milden- hall. Dreamfusion: Text-to-3d using 2d diffusion.arXiv,
-
[60]
Neural lighting simulation for urban scenes
Ava Pun, Gary Sun, Jingkang Wang, Yun Chen, Ze Yang, Sivabalan Manivasagam, Wei-Chiu Ma, and Raquel Ur- tasun. Neural lighting simulation for urban scenes. In NeurIPS, 2023. 2, 22
2023
-
[61]
Towards realis- tic scene generation with lidar diffusion models
Haoxi Ran, Vitor Guizilini, and Yue Wang. Towards realis- tic scene generation with lidar diffusion models. InCVPR,
-
[62]
Xcube: Large-scale 3d generative modeling using sparse voxel hierarchies
Xuanchi Ren, Jiahui Huang, Xiaohui Zeng, Ken Museth, Sanja Fidler, and Francis Williams. Xcube: Large-scale 3d generative modeling using sparse voxel hierarchies. In CVPR, 2024. 3 10
2024
-
[63]
SCube: Instant large-scale scene reconstruc- tion using voxsplats.arXiv, 2024
Xuanchi Ren, Yifan Lu, Hanxue Liang, Zhangjie Wu, Huan Ling, Mike Chen, Sanja Fidler, Francis Williams, and Ji- ahui Huang. SCube: Instant large-scale scene reconstruc- tion using voxsplats.arXiv, 2024. 2
2024
-
[64]
L3dg: Latent 3d gaussian diffusion
Barbara Roessle, Norman M ¨uller, Lorenzo Porzi, Samuel Rota Bul `o, Peter Kontschieder, Angela Dai, and Matthias Nießner. L3dg: Latent 3d gaussian diffusion. arXiv, 2024. 2, 3
2024
-
[65]
High-resolution image synthesis with latent diffusion models
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-resolution image synthesis with latent diffusion models. InCVPR, 2022. 2, 3, 5
2022
-
[66]
Lgsvl simulator: A high fidelity simulator for autonomous driving
Guodong Rong, Byung Hyun Shin, Hadi Tabatabaee, Qiang Lu, Steve Lemke, M ¯artin ¸ˇs Mo ˇzeiko, Eric Boise, Geehoon Uhm, Mark Gerow, and Shalin Mehta. Lgsvl simulator: A high fidelity simulator for autonomous driving. InITSC,
-
[67]
U- net: Convolutional networks for biomedical image segmen- tation
Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U- net: Convolutional networks for biomedical image segmen- tation. InMICCAI, 2015. 3, 5, 13
2015
-
[68]
Progressive distillation for fast sampling of diffusion models.arXiv, 2022
Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models.arXiv, 2022. 13
2022
-
[69]
Adv3d: Gener- ating safety-critical 3d objects through closed-loop simula- tion
Jay Sarva, Jingkang Wang, James Tu, Yuwen Xiong, Siva- balan Manivasagam, and Raquel Urtasun. Adv3d: Gener- ating safety-critical 3d objects through closed-loop simula- tion. InCoRL, 2023. 1
2023
-
[70]
AirSim: High-fidelity visual and physical simula- tion for autonomous vehicles
Shital Shah, Debadeepta Dey, Chris Lovett, and Ashish Kapoor. AirSim: High-fidelity visual and physical simula- tion for autonomous vehicles. InField and service robotics,
-
[71]
Gina-3d: Learning to generate implicit neural assets in the wild
Bokui Shen, Xinchen Yan, Charles R Qi, Mahyar Najibi, Boyang Deng, Leonidas Guibas, Yin Zhou, and Dragomir Anguelov. Gina-3d: Learning to generate implicit neural assets in the wild. InCVPR, 2023. 2
2023
-
[72]
Mvdream: Multi-view diffusion for 3d generation.arXiv, 2023
Yichun Shi, Peng Wang, Jianglong Ye, Mai Long, Kejie Li, and Xiao Yang. Mvdream: Multi-view diffusion for 3d generation.arXiv, 2023. 3
2023
-
[73]
3d neural field genera- tion using triplane diffusion
J Ryan Shue, Eric Ryan Chan, Ryan Po, Zachary Ankner, Jiajun Wu, and Gordon Wetzstein. 3d neural field genera- tion using triplane diffusion. InCVPR, 2023. 3, 5
2023
-
[74]
Deep unsupervised learning using nonequilibrium thermodynamics
Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. InICML, 2015. 5
2015
-
[75]
Denois- ing diffusion implicit models.arXiv, 2020
Jiaming Song, Chenlin Meng, and Stefano Ermon. Denois- ing diffusion implicit models.arXiv, 2020. 3, 5
2020
-
[76]
Score- based generative modeling through stochastic differential equations.arXiv, 2020
Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score- based generative modeling through stochastic differential equations.arXiv, 2020. 3, 5
2020
-
[77]
Viewset diffusion:(0-) image-conditioned 3d gen- erative models from 2d data
Stanislaw Szymanowicz, Christian Rupprecht, and Andrea Vedaldi. Viewset diffusion:(0-) image-conditioned 3d gen- erative models from 2d data. InICCV, 2023. 3
2023
-
[78]
Block-nerf: Scalable large scene neural view synthesis
Matthew Tancik, Vincent Casser, Xinchen Yan, Sabeek Pradhan, Ben Mildenhall, Pratul P Srinivasan, Jonathan T Barron, and Henrik Kretzschmar. Block-nerf: Scalable large scene neural view synthesis. InCVPR, 2022. 1
2022
-
[79]
Dreamgaussian: Generative gaussian splatting for efficient 3d content creation.arXiv, 2023
Jiaxiang Tang, Jiawei Ren, Hang Zhou, Ziwei Liu, and Gang Zeng. Dreamgaussian: Generative gaussian splatting for efficient 3d content creation.arXiv, 2023. 3
2023
-
[80]
Lgm: Large multi-view gaussian model for high-resolution 3d content creation
Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, and Ziwei Liu. Lgm: Large multi-view gaussian model for high-resolution 3d content creation. In ECCV, 2025. 3
2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.