pith. machine review for the scientific record.

arxiv: 2604.23629 · v2 · submitted 2026-04-26 · 💻 cs.GR

Recognition: no theorem link

From Visual Synthesis to Interactive Worlds: Toward Production-Ready 3D Asset Generation


Pith reviewed 2026-05-12 01:50 UTC · model grok-4.3

classification 💻 cs.GR
keywords 3D asset generation · production pipeline · interactive worlds · generative modeling · topology optimization · UV parameterization · PBR materials · scene assembly

The pith

Organizing 3D generative methods by asset tiers and production stages reveals which outputs meet engine-ready standards for interactive use.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey examines three-dimensional content generation by moving beyond visual appearance alone to the full requirements of interactive applications. It structures the literature along two axes: asset tiers of general objects, characters, and scenes, and the production lifecycle running from data foundations and geometry synthesis through topology optimization, UV unwrapping, PBR materials, rigging, and scene assembly. The goal is to evaluate not only what models can create but whether the results can be loaded directly into real-time engines and simulators without manual fixes. A reader would care because game development, embodied AI, world simulation, digital twins, and spatial computing all depend on assets that satisfy engine constraints on topology, parameterization, materials, rigging, and physical layout. The work also gathers evaluation metrics across geometric fidelity, appearance, usability, and scene-level physical plausibility while listing open problems in data, controllability, and end-to-end assetization.

Core claim

The survey establishes that, despite rapid progress in generative modeling, a persistent gap remains between current outputs and the production-ready standard required by interactive applications, and that a two-dimensional taxonomy organized around asset tiers and the full production lifecycle provides the clearest way to assess which methods produce assets directly usable in downstream engines and simulation platforms.

What carries the argument

The two-dimensional taxonomy that crosses three asset tiers (general objects, characters, scenes) with the vertical production lifecycle stages from data and geometry synthesis through topology, UV, PBR appearance, rigging, and scene assembly.
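The two-axis structure is concrete enough to sketch as a data structure. A minimal illustration in Python, with the tier and stage names taken from the survey and the method placements purely hypothetical:

```python
# Sketch of the survey's two-dimensional taxonomy as a lookup grid.
# Tier and stage names follow the paper; method placements below are
# illustrative only, not the survey's actual classification.
TIERS = ("general_objects", "characters", "scenes")
STAGES = (
    "data_foundations", "geometry_synthesis", "topology_optimization",
    "uv_unwrapping", "pbr_appearance", "rigging", "scene_assembly",
)

# taxonomy[(tier, stage)] -> list of methods covering that cell
taxonomy = {(t, s): [] for t in TIERS for s in STAGES}

def place(method, tier, stage):
    """File a method under one (tier, stage) cell, validating both axes."""
    if tier not in TIERS or stage not in STAGES:
        raise ValueError(f"unknown cell: ({tier}, {stage})")
    taxonomy[(tier, stage)].append(method)

# Hypothetical placements for illustration:
place("MeshGPT", "general_objects", "topology_optimization")
place("Paint3D", "general_objects", "pbr_appearance")
place("RigNet", "characters", "rigging")

# A column scan answers "which lifecycle stages does tier X cover?"
covered = [s for s in STAGES if taxonomy[("general_objects", s)]]
print(covered)
```

The point of the grid view is exactly the survey's: empty cells in a tier's column mark the stages where no method yet delivers engine-ready output.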

Load-bearing premise

That organizing the literature around asset tiers and production lifecycle stages accurately identifies which methods produce assets that satisfy engine-level constraints without further processing.

What would settle it

A broad empirical check showing that outputs from a majority of recent generative methods already satisfy topology, UV parameterization, PBR materials, skeletal rigging, and physics-aware scene constraints when imported directly into standard real-time engines.
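Parts of such a check are mechanical. As a hedged sketch of what "valid UV parameterization" means operationally (the per-face UV input format here is a simplification for illustration, not any engine's real import API), out-of-range coordinates and degenerate UV triangles can be flagged directly:

```python
# Minimal UV audit sketch: flag out-of-range coordinates and degenerate
# (zero-area) UV triangles, two defects that force manual fixes on import.
# Input format (a list of three (u, v) pairs per face) is an assumption
# made for this example.

def uv_issues(faces_uv):
    """Return a list of (face_index, problem) pairs."""
    issues = []
    for i, tri in enumerate(faces_uv):
        (u0, v0), (u1, v1), (u2, v2) = tri
        if not all(0.0 <= c <= 1.0 for c in (u0, v0, u1, v1, u2, v2)):
            issues.append((i, "uv out of [0,1]"))
        # Twice the signed area of the UV triangle; zero means the face
        # is mapped to a line or point and cannot carry texture detail.
        area2 = (u1 - u0) * (v2 - v0) - (u2 - u0) * (v1 - v0)
        if area2 == 0.0:
            issues.append((i, "degenerate uv triangle"))
    return issues

good = [((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))]
bad = [((0.0, 0.0), (2.0, 0.0), (0.0, 1.0)),   # coordinate out of range
       ((0.0, 0.0), (0.5, 0.5), (1.0, 1.0))]   # collinear UVs, zero area
print(uv_issues(good))  # []
print(uv_issues(bad))
```

Running a battery of such audits over many methods' raw outputs is what the settling experiment would amount to, scaled up and extended to topology, materials, and rigging.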

Figures

Figures reproduced from arXiv: 2604.23629 by Chunchao Guo, Dazhao Du, Jiafeng Wu, Jian Liu, Song Guo, Zhuofan Lou.

Figure 1: Pipeline-centric taxonomy bridging 3D synthesis and production-ready assets.
Figure 2: Method taxonomy tree spanning five survey branches: 3D representations, data foundations, …
Figure 3: Taxonomy of 3D shape representations and corresponding generative modeling paradigms.
Figure 4: Single-image 3D reconstruction results from open-source and closed-source models on the same …
Figure 5: Character and avatar generation methods (2024–2025). Full-body synthesis: TADA! [202], …
Original abstract

Three-dimensional content generation has progressed from producing isolated, visually plausible shapes to constructing structured assets that can be deployed in real-time interactive environments. This trajectory is driven by converging demands from game development, embodied AI, world simulation, digital twins, and spatial computing, all of which require 3D content that goes beyond surface appearance to satisfy engine-level constraints on topology, UV parameterization, physically based materials, skeletal rigging, and physics-aware scene layout. Despite rapid advances in generative modeling, a persistent gap separates the outputs of current methods from the production-ready standard expected by interactive applications. This survey addresses that gap by organizing the literature around the asset production pipeline rather than algorithmic families. Along the horizontal axis we distinguish three asset tiers, namely general objects, characters, and scenes, while the vertical axis traces each tier through the full production lifecycle from data foundations and geometry synthesis through topology optimization, UV unwrapping, PBR appearance, rigging, and scene assembly. Through this two-dimensional taxonomy we assess not only what current methods can generate but whether their outputs are directly usable in downstream engines and simulation platforms. We further consolidate evaluation metrics and protocols that span geometric fidelity, appearance quality, asset usability, and scene-level physical plausibility. The survey concludes by identifying open challenges in data quality, generation controllability, end-to-end assetization, and physically grounded generation, and by situating production-ready 3D content as foundational infrastructure for emerging interactive world models and embodied intelligent systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript is a literature survey on 3D asset generation. It organizes prior work via a two-dimensional taxonomy with horizontal axis of asset tiers (general objects, characters, scenes) and vertical axis of the production lifecycle (data foundations, geometry synthesis, topology optimization, UV unwrapping, PBR appearance, rigging, scene assembly). The central claim is that rapid advances in generative modeling have not closed a persistent gap to production-ready assets satisfying engine constraints on topology, UVs, materials, rigging, and physics layout; the survey assesses current methods' direct usability in downstream engines, consolidates metrics spanning geometric fidelity to scene-level physical plausibility, and identifies open challenges in data quality, controllability, end-to-end assetization, and physically grounded generation.

Significance. If the taxonomy accurately reflects usability, the survey supplies a useful organizing framework that shifts emphasis from algorithmic families to pipeline requirements. This can guide research toward assets deployable in games, simulation, digital twins, and embodied AI, while the consolidated metrics and challenge list provide concrete directions for closing the identified production gap.

major comments (1)
  1. [Taxonomy description and usability assessment (abstract and main taxonomy sections)] The assessment that current methods fail to produce directly usable assets (central to the gap claim and taxonomy evaluation) rests on qualitative categorization of the cited literature rather than direct empirical validation. No engine-level tests (e.g., importing outputs into Unity/Unreal to measure failure rates from non-manifold edges, invalid UV seams, or missing skeletal hierarchies) are described, which risks overstating or understating the gap by missing constraints not explicit in paper abstracts or results sections.
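The topology half of the engine-level test the referee describes reduces to edge bookkeeping. A toy sketch of the non-manifold check, standing in for what an importer validates; a production check would use a real mesh library rather than this:

```python
from collections import Counter

# Non-manifold edge sketch: in a closed manifold triangle mesh, every
# undirected edge is shared by exactly two faces. Edges on one face
# (open boundary) or three or more faces (non-manifold) are what trip
# up engine importers.

def non_manifold_edges(faces):
    """faces: list of (i, j, k) vertex-index triangles."""
    counts = Counter()
    for a, b, c in faces:
        for e in ((a, b), (b, c), (c, a)):
            counts[tuple(sorted(e))] += 1
    return sorted(e for e, n in counts.items() if n != 2)

# A tetrahedron is closed and manifold: no offending edges.
tet = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
print(non_manifold_edges(tet))  # []

# Gluing a fifth face onto edge (0, 1) makes that edge 3-valent, and the
# new face's other two edges become boundary edges.
broken = tet + [(0, 1, 4)]
print(non_manifold_edges(broken))  # [(0, 1), (0, 4), (1, 4)]
```

The referee's point is that nothing this simple is reported: the survey's usability labels come from reading papers, not from running checks like this on their outputs.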

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and for recognizing the survey's potential to guide research toward production-ready 3D assets. We address the major comment below, providing an honest account of our methodology as a literature survey and describing the revisions we will make.

Point-by-point responses
  1. Referee: The assessment that current methods fail to produce directly usable assets (central to the gap claim and taxonomy evaluation) rests on qualitative categorization of the cited literature rather than direct empirical validation. No engine-level tests (e.g., importing outputs into Unity/Unreal to measure failure rates from non-manifold edges, invalid UV seams, or missing skeletal hierarchies) are described, which risks overstating or understating the gap by missing constraints not explicit in paper abstracts or results sections.

    Authors: We agree that the usability assessment in the taxonomy is derived from qualitative synthesis of the capabilities, limitations, and output characteristics reported across the cited papers, rather than from new direct empirical tests such as importing assets into Unity or Unreal to measure specific failure rates. As this is a survey, performing comprehensive engine-level validation on dozens of methods would require re-implementation, standardized testing protocols, and resources that fall outside the scope of a literature review. We believe the central gap claim remains supported by the consistent absence of production features (e.g., guaranteed manifold topology, valid UV parameterization, complete skeletal hierarchies) in the documented outputs. To improve transparency, we will revise the taxonomy sections and add a brief limitations discussion explicitly noting the reliance on published results and the possibility that some engine-specific constraints may not be fully captured in the literature. This will clarify the methodology without altering the survey's conclusions or taxonomy structure. revision: yes
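The "complete skeletal hierarchies" feature named in the response is likewise testable without an engine import. A sketch under an assumed {joint: parent} encoding (illustrative, not any real rig file's schema):

```python
# Skeleton completeness sketch: verify a joint hierarchy forms a single
# rooted tree -- exactly one root, every named parent exists, no cycles.
# The {joint: parent} dict encoding is an assumption for this example.

def skeleton_problems(parents):
    """parents: dict mapping joint name -> parent name (None for root)."""
    problems = []
    roots = [j for j, p in parents.items() if p is None]
    if len(roots) != 1:
        problems.append(f"expected 1 root, found {len(roots)}")
    for joint, parent in parents.items():
        if parent is not None and parent not in parents:
            problems.append(f"{joint}: missing parent {parent!r}")
    for joint in parents:
        seen, cur = set(), joint
        while cur is not None:
            if cur in seen:
                problems.append(f"cycle through {joint!r}")
                break
            seen.add(cur)
            cur = parents.get(cur)
    return problems

rig = {"hips": None, "spine": "hips", "head": "spine",
       "l_arm": "spine", "r_arm": "spine"}
print(skeleton_problems(rig))  # []
print(skeleton_problems({"hips": None, "spine": "pelvis"}))
```

A limitations section of the kind the authors promise could cite exactly which of these structural properties each surveyed method documents, and which are left unstated.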

Circularity Check

0 steps flagged

No circularity: survey taxonomy synthesizes external literature without self-referential derivations

Full rationale

The paper is a literature survey that proposes a 2D taxonomy (asset tiers × production lifecycle stages) to organize prior work on 3D generation. It makes no new quantitative predictions, fits no parameters to data, and advances no equations or uniqueness theorems. The central claim of a 'persistent gap' to production-ready assets is supported by qualitative mapping of cited external papers rather than any reduction to the authors' own inputs or self-citations. No load-bearing step reduces by construction to a fitted value or prior self-citation; the taxonomy is an organizational framework whose validity rests on the accuracy of the cited literature, not on internal self-definition.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a literature survey with no new mathematical derivations, fitted parameters, axioms, or postulated entities introduced by the authors.

pith-pipeline@v0.9.0 · 5579 in / 1100 out tokens · 58373 ms · 2026-05-12T01:50:46.216175+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

278 extracted references · 278 canonical work pages · 5 internal anchors

  1. [1]

    Dreamfusion: Text-to-3d using 2d diffusion

    Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion. In International Conference on Learning Representations (ICLR), 2023

  2. [2]

    Magic3d: High-resolution text-to-3d content creation

    Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. Magic3d: High-resolution text-to-3d content creation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 300–309, 2023

  3. [3]

    Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation

    Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, and Jun Zhu. Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation. In Advances in Neural Information Processing Systems (NeurIPS), 2023

  4. [4]

    Michelangelo: Conditional 3d shape generation based on shape-image-text aligned latent representation

    Zibo Zhao, Wen Liu, Xin Chen, Xianfang Zeng, Rui Wang, Pei Cheng, Bin Fu, Tao Chen, Gang Yu, and Shenghua Gao. Michelangelo: Conditional 3d shape generation based on shape-image-text aligned latent representation. In Advances in Neural Information Processing Systems (NeurIPS), 2023

  5. [5]

    Clay: A controllable large-scale generative model for creating high-quality 3d assets

    Longwen Zhang, Ziyu Wang, Qixuan Zhang, Qiwei Qiu, Anqi Pang, Haoran Jiang, Wei Yang, Lan Xu, and Jingyi Yu. Clay: A controllable large-scale generative model for creating high-quality 3d assets. ACM Trans. Graph., 43(4):1–20, 2024

  6. [6]

    Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

    Zibo Zhao, Zeqiang Lai, Qingxiang Lin, Yunfei Zhao, Haolin Liu, Shuhui Yang, Yifei Feng, Mingxin Yang, Sheng Zhang, Xianghui Yang, et al. Hunyuan3d 2.0: Scaling diffusion models for high resolution textured 3d assets generation. arXiv preprint arXiv:2501.12202, 2025

  7. [7]

    LRM: Large reconstruction model for single image to 3d

    Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. LRM: Large reconstruction model for single image to 3d. In International Conference on Learning Representations (ICLR), 2024

  8. [8]

    Gs-lrm: Large reconstruction model for 3d gaussian splatting

    Kai Zhang, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, and Zexiang Xu. Gs-lrm: Large reconstruction model for 3d gaussian splatting. In European Conference on Computer Vision (ECCV), pages 1–19, 2024. doi: 10.1007/978-3-031-72670-5_1

  9. [9]

    Triposg: High-fidelity 3d shape synthesis using large-scale rectified flow models

    Yangguang Li, Zi-Xin Zou, Zexiang Liu, Dehu Wang, Yuan Liang, Zhipeng Yu, Xingchao Liu, Yuan-Chen Guo, Ding Liang, Wanli Ouyang, and Yan-Pei Cao. Triposg: High-fidelity 3d shape synthesis using large-scale rectified flow models, 2025. URL https://arxiv.org/abs/2502.06608

  10. [10]

    Genie: Generative interactive environments

    Jake Bruce, Michael D. Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, Yusuf Aytar, Sarah Bechtle, Feryal M. P. Behbahani, Stephanie C. Y. Chan, Nicolas Heess, Lucy Gonzalez, Simon Osindero, Sherjil Ozair, Scott E. Reed, Jingwei Zhang, Konrad Zolna, Jeff Clune, Nando de Fr...

  11. [11]

    Diffusion models are real-time game engines

    Dani Valevski, Yaniv Leviathan, Moab Arar, and Shlomi Fruchter. Diffusion models are real-time game engines. In International Conference on Learning Representations (ICLR), 2025

  12. [12]

    Cosmos World Foundation Model Platform for Physical AI

    NVIDIA. Cosmos world foundation model platform for physical ai. arXiv preprint arXiv:2501.03575, 2025

  13. [13]

    Infinite photorealistic worlds using procedural generation

    Alexander Raistrick, Lahav Lipson, Zeyu Ma, Lingjie Mei, Mingzhe Wang, Yiming Zuo, Karhan Kayan, Hongyu Wen, Beining Han, Yihan Wang, et al. Infinite photorealistic worlds using procedural generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12630–12641, 2023

  14. [14]

    Procthor: Large-scale embodied ai using procedural generation

    Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Kiana Ehsani, Jordi Salvador, Winson Han, Eric Kolve, Aniruddha Kembhavi, and Roozbeh Mottaghi. Procthor: Large-scale embodied ai using procedural generation. In Advances in Neural Information Processing Systems (NeurIPS), volume 35, pages 5982–5994, 2022

  15. [15]

    Holodeck: Language guided generation of 3d embodied AI environments

    Yue Yang, Fan-Yun Sun, Luca Weihs, Eli VanderBilt, Alvaro Herrasti, Winson Han, Jiajun Wu, Nick Haber, Ranjay Krishna, Lingjie Liu, Chris Callison-Burch, Mark Yatskar, Aniruddha Kembhavi, and Christopher Clark. Holodeck: Language guided generation of 3d embodied AI environments. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recog...

  16. [16]

    3d-gpt: Procedural 3d modeling with large language models

    Chunyi Sun, Junlin Han, Weijian Deng, Xinlong Wang, Zishan Qin, and Stephen Gould. 3d-gpt: Procedural 3d modeling with large language models. In Proceedings of the International Conference on 3D Vision (3DV), pages 1253–1263, 2025

  17. [17]

    Scenecraft: An LLM agent for synthesizing 3d scenes as blender code

    Ziniu Hu, Ahmet Iscen, Aashi Jain, Thomas Kipf, Yisong Yue, David A. Ross, Cordelia Schmid, and Alireza Fathi. Scenecraft: An LLM agent for synthesizing 3d scenes as blender code. In International Conference on Machine Learning (ICML), pages 19252–19282, 2024

  18. [18]

    Habitat: A platform for embodied AI research

    Manolis Savva, Jitendra Malik, Devi Parikh, Dhruv Batra, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, and Vladlen Koltun. Habitat: A platform for embodied AI research. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 9338–9346, 2019

  19. [19]

    Isaac gym: High performance gpu-based physics simulation for robot learning

    Viktor Makoviychuk, Lukasz Wawrzyniak, Yunrong Guo, Michelle Lu, Kier Storey, Miles Macklin, David Hoeller, Nikita Rudin, Arthur Allshire, Ankur Handa, and Gavriel State. Isaac gym: High performance gpu-based physics simulation for robot learning. In Advances in Neural Information Processing Systems (NeurIPS), 2021

  20. [20]

    Infinigen indoors: Photorealistic indoor scenes using procedural generation

    Alexander Raistrick, Lingjie Mei, Karhan Kayan, David Yan, Yiming Zuo, Beining Han, Hongyu Wen, Meenal Parakh, Stamatis Alexandropoulos, Lahav Lipson, Zeyu Ma, and Jia Deng. Infinigen indoors: Photorealistic indoor scenes using procedural generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 21783–21...

  21. [21]

    Paint3d: Paint anything 3d with lighting-less texture diffusion models

    Xianfang Zeng, Xin Chen, Zhongqi Qi, Wen Liu, Zibo Zhao, Zhibin Wang, Bin Fu, Yong Liu, and Gang Yu. Paint3d: Paint anything 3d with lighting-less texture diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4252–4262, 2024

  22. [22]

    Triposr: Fast 3d object reconstruction from a single image

    Dmitry Tochilkin, David Pankratz, Zexiang Liu, Zixuan Huang, Adam Letts, Yangguang Li, Ding Liang, Christian Laforte, Varun Jampani, and Yan-Pei Cao. Triposr: Fast 3d object reconstruction from a single image. arXiv preprint arXiv:2403.02151, 2024

  23. [23]

    Meshgpt: Generating triangle meshes with decoder-only transformers

    Yawar Siddiqui, Antonio Alliegro, Alexey Artemov, Tatiana Tommasi, Daniele Sirigatti, Vladislav Rosov, Angela Dai, and Matthias Nießner. Meshgpt: Generating triangle meshes with decoder-only transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 19615–19625, 2024

  24. [24]

    Meshanything: Artist-created mesh generation with autoregressive transformers

    Yiwen Chen, Tong He, Di Huang, Weicai Ye, Sijin Chen, Jiaxiang Tang, Zhongang Cai, Lei Yang, Gang Yu, Guosheng Lin, and Chi Zhang. Meshanything: Artist-created mesh generation with autoregressive transformers. In International Conference on Learning Representations (ICLR), 2025

  25. [25]

    Deepmesh: Auto-regressive artist-mesh creation with reinforcement learning

    Ruowen Zhao, Junliang Ye, Zhengyi Wang, Guangce Liu, Yiwen Chen, Yikai Wang, and Jun Zhu. Deepmesh: Auto-regressive artist-mesh creation with reinforcement learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 10612–10623, 2025

  26. [26]

    Hunyuan3d studio: End-to-end ai pipeline for game-ready 3d asset generation

    Biwen Lei, Yang Li, Xinhai Liu, Shuhui Yang, Lixin Xu, Jingwei Huang, Ruining Tang, Haohan Weng, Jian Liu, Jing Xu, et al. Hunyuan3d studio: End-to-end ai pipeline for game-ready 3d asset generation. arXiv preprint arXiv:2509.12815, 2025

  27. [27]

    Structured 3d latents for scalable and versatile 3d generation

    Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang, Dong Chen, Xin Tong, and Jiaolong Yang. Structured 3d latents for scalable and versatile 3d generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 21469–21480, 2025

  28. [28]

    Quadgpt: Native quadrilateral mesh generation with autoregressive models

    Jian Liu, Chunshi Wang, Song Guo, Haohan Weng, Zhen Zhou, Zhiqi Li, Jiaao Yu, Yiling Zhu, Jing Xu, Biwen Lei, et al. Quadgpt: Native quadrilateral mesh generation with autoregressive models. arXiv preprint arXiv:2509.21420, 2025

  29. [29]

    Auto-regressive surface cutting

    Yang Li, Victor Cheung, Xinhai Liu, Yuguang Chen, Zhongjin Luo, Biwen Lei, Haohan Weng, Zibo Zhao, Jingwei Huang, Zhuo Chen, et al. Auto-regressive surface cutting. arXiv preprint arXiv:2506.18017, 2025

  30. [30]

    Texgen: A generative diffusion model for mesh textures

    Xin Yu, Ze Yuan, Yuan-Chen Guo, Ying-Tian Liu, Jianhui Liu, Yangguang Li, Yan-Pei Cao, Ding Liang, and Xiaojuan Qi. Texgen: A generative diffusion model for mesh textures. ACM Trans. Graph., 43(6):213:1–213:14, 2024

  31. [31]

    Materialmvp: Illumination-invariant material generation via multi-view pbr diffusion

    Zebin He, Mingxin Yang, Shuhui Yang, Yixuan Tang, Tao Wang, Kaihao Zhang, Guanying Chen, Yuhong Liu, Jie Jiang, Chunchao Guo, et al. Materialmvp: Illumination-invariant material generation via multi-view pbr diffusion. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 26294–26305, 2025

  32. [32]

    Rignet: Neural rigging for articulated characters

    Zhan Xu, Yang Zhou, Evangelos Kalogerakis, Chris Landreth, and Karan Singh. Rignet: Neural rigging for articulated characters. ACM Trans. Graph., 39(4):58, 2020

  33. [33]

    Skinningnet: Two-stream graph convolutional neural network for skinning prediction of synthetic characters

    Albert Mosella-Montoro and Javier Ruiz-Hidalgo. Skinningnet: Two-stream graph convolutional neural network for skinning prediction of synthetic characters. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 18593–18602, 2022

  34. [34]

    Native and compact structured latents for 3d generation

    Jianfeng Xiang, Xiaoxue Chen, Sicheng Xu, Ruicheng Wang, Zelong Lv, Yu Deng, Hongyuan Zhu, Yue Dong, Hao Zhao, Nicholas Jing Yuan, and Jiaolong Yang. Native and compact structured latents for 3d generation.CoRR, abs/2512.14692, 2025

  35. [35]

    Physcene: Physically interactable 3d scene synthesis for embodied ai

    Yandan Yang, Baoxiong Jia, Peiyuan Zhi, and Siyuan Huang. Physcene: Physically interactable 3d scene synthesis for embodied ai. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16262–16272, 2024

  36. [36]

    Pose2room: understanding 3d scenes from human activities

    Yinyu Nie, Angela Dai, Xiaoguang Han, and Matthias Nießner. Pose2room: understanding 3d scenes from human activities. In European Conference on Computer Vision (ECCV), pages 425–443, 2022

  37. [37]

    Mime: Human-aware 3d scene generation

    Hongwei Yi, Chun-Hao P Huang, Shashank Tripathi, Lea Hering, Justus Thies, and Michael J Black. Mime: Human-aware 3d scene generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12965–12976, 2023

  38. [38]

    Recent advances in 3d object and scene generation: A survey

    Xiang Tang, Ruotong Li, and Xiaopeng Fan. Recent advances in 3d object and scene generation: A survey. arXiv preprint arXiv:2504.11734, 2025

  39. [39]

    Intelligent generation of graphical game assets: A conceptual framework and systematic review of the state of the art

    Kaisei Fukaya, Damon Daylamani-Zad, and Harry W. Agius. Intelligent generation of graphical game assets: A conceptual framework and systematic review of the state of the art. ACM Comput. Surv., 57(5):118:1–118:38, 2025

  40. [40]

    Ai-generated content (AIGC) for various data modalities: A survey

    Lin Geng Foo, Hossein Rahmani, and Jun Liu. Ai-generated content (AIGC) for various data modalities: A survey. ACM Comput. Surv., 57(9):243:1–243:66, 2025

  41. [41]

    3d scene generation: A survey

    Beichen Wen, Haozhe Xie, Zhaoxi Chen, Fangzhou Hong, and Ziwei Liu. 3d scene generation: A survey. arXiv preprint arXiv:2505.05474, 2025

  42. [42]

    ShapeNet: An Information-Rich 3D Model Repository

    Angel X Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, et al. Shapenet: An information-rich 3d model repository. arXiv preprint arXiv:1512.03012, 2015

  43. [43]

    Objaverse: A universe of annotated 3d objects

    Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Objaverse: A universe of annotated 3d objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13142–13153, 2023

  44. [44]

    Objaverse-xl: A universe of 10m+ 3d objects

    Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, et al. Objaverse-xl: A universe of 10m+ 3d objects. Advances in Neural Information Processing Systems, 36:35799–35813, 2023

  45. [45]

    Smpl: A skinned multi-person linear model

    Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J Black. Smpl: A skinned multi-person linear model. ACM Trans. Graph., 34(6), 2015

  46. [46]

    3d-front: 3d furnished rooms with layouts and semantics

    Huan Fu, Bowen Cai, Lin Gao, Ling-Xiao Zhang, Jiaming Wang, Cao Li, Qixun Zeng, Chengyue Sun, Rongfei Jia, Binqiang Zhao, et al. 3d-front: 3d furnished rooms with layouts and semantics. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 10933–10942, 2021

  47. [47]

    ATISS: autoregressive transformers for indoor scene synthesis

    Despoina Paschalidou, Amlan Kar, Maria Shugrina, Karsten Kreis, Andreas Geiger, and Sanja Fidler. ATISS: autoregressive transformers for indoor scene synthesis. In Advances in Neural Information Processing Systems (NeurIPS), pages 12013–12026, 2021

  48. [48]

    Physgen3d: Crafting a miniature interactive world from a single image

    Boyuan Chen, Hanxiao Jiang, Shaowei Liu, Saurabh Gupta, Yunzhu Li, Hao Zhao, and Shenlong Wang. Physgen3d: Crafting a miniature interactive world from a single image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6178–6189, 2025

  49. [49]

    Citycraft: A real crafter for 3d city generation

    Jie Deng, Wenhao Chai, Junsheng Huang, Zhonghan Zhao, Qixuan Huang, Mingyan Gao, Jianshu Guo, Shengyu Hao, Wenhao Hu, Jenq-Neng Hwang, et al. Citycraft: A real crafter for 3d city generation. arXiv preprint arXiv:2406.04983, 2024

  50. [50]

    Unrealllm: Towards highly controllable and interactable 3d scene generation by llm-powered procedural content generation

    Song Tang, Kaiyong Zhao, Lei Wang, Yuliang Li, Xuebo Liu, Junyi Zou, Qiang Wang, and Xiaowen Chu. Unrealllm: Towards highly controllable and interactable 3d scene generation by llm-powered procedural content generation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 19417–19435, 2025. doi: 10.18653/v1/2025.findings-acl.994

  51. [51]

    Deep learning for 3d point clouds: A survey

    Yulan Guo, Hanyun Wang, Qingyong Hu, Hao Liu, Li Liu, and Mohammed Bennamoun. Deep learning for 3d point clouds: A survey. IEEE Trans. Pattern Anal. Mach. Intell., 43(12):4338–4364, 2020

  52. [52]

    Pointnet: Deep learning on point sets for 3d classification and segmentation

    Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 652–660, 2017

  53. [53]

    3d shapenets: A deep representation for volumetric shapes

    Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3d shapenets: A deep representation for volumetric shapes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1912–1920, 2015

  54. [54]

    Voxnet: A 3d convolutional neural network for real-time object recognition

    Daniel Maturana and Sebastian Scherer. Voxnet: A 3d convolutional neural network for real-time object recognition. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 922–928, 2015

  55. [55]

    Octree generating networks: Efficient convolutional architectures for high-resolution 3d outputs

    Maxim Tatarchenko, Alexey Dosovitskiy, and Thomas Brox. Octree generating networks: Efficient convolutional architectures for high-resolution 3d outputs. In Proceedings of the IEEE international conference on computer vision, pages 2088–2096, 2017

  56. [56]

    Octnet: Learning deep 3d representations at high resolutions

    Gernot Riegler, Ali Osman Ulusoy, and Andreas Geiger. Octnet: Learning deep 3d representations at high resolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3577–3586, 2017

  57. [57]

    Instant neural graphics primitives with a multiresolution hash encoding

    Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph., 41(4):1–15, 2022

  58. [58]

    Xcube: Large-scale 3d generative modeling using sparse voxel hierarchies

    Xuanchi Ren, Jiahui Huang, Xiaohui Zeng, Ken Museth, Sanja Fidler, and Francis Williams. Xcube: Large-scale 3d generative modeling using sparse voxel hierarchies. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4209–4219, 2024

  59. [59]

    A survey on deep learning advances on different 3d data representations

    Eman Ahmed, Alexandre Saint, Abd El Rahman Shabayek, Kseniya Cherenkova, Rig Das, Gleb Gusev, Djamila Aouada, and Bjorn Ottersten. A survey on deep learning advances on different 3d data representations. arXiv preprint arXiv:1808.01462, 2018

  60. [60]

    Geometric deep learning on graphs and manifolds using mixture model cnns

    Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodola, Jan Svoboda, and Michael M Bronstein. Geometric deep learning on graphs and manifolds using mixture model cnns. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5115–5124, 2017

  61. [61]

    Nerf: Representing scenes as neural radiance fields for view synthesis

    Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. In European Conference on Computer Vision (ECCV), pages 405–421, 2020

  62. [62]

    Mip-nerf 360: Unbounded anti-aliased neural radiance fields

    Jonathan T Barron, Ben Mildenhall, Dor Verbin, Pratul P Srinivasan, and Peter Hedman. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5470–5479, 2022

  63. [63]

    Zip-nerf: Anti-aliased grid-based neural radiance fields

    Jonathan T Barron, Ben Mildenhall, Dor Verbin, Pratul P Srinivasan, and Peter Hedman. Zip-nerf: Anti-aliased grid-based neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 19697–19705, 2023

  64. [64]

    Deepsdf: Learning continuous signed distance functions for shape representation

    Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 165–174, 2019

  65. [65]

    Occupancy networks: Learning 3d reconstruction in function space

    Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, and Andreas Geiger. Occupancy networks: Learning 3d reconstruction in function space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4460–4470, 2019

  66. [66]

    Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction

    Peng Wang, Lingjie Liu, Yuan Liu, Christian Theobalt, Taku Komura, and Wenping Wang. Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. In Advances in Neural Information Processing Systems (NeurIPS), pages 27171–27183, 2021

  67. [67]

    Volume rendering of neural implicit surfaces

    Lior Yariv, Jiatao Gu, Yoni Kasten, and Yaron Lipman. Volume rendering of neural implicit surfaces. Advances in Neural Information Processing Systems, 34:4805–4815, 2021

  68. [68]

    Marching cubes: A high resolution 3d surface construction algorithm

    William E. Lorensen and Harvey E. Cline. Marching cubes: A high resolution 3d surface construction algorithm. In ACM SIGGRAPH Conference Proceedings, pages 163–169, 1987

  69. [69]

    Splattingavatar: Realistic real-time human avatars with mesh-embedded gaussian splatting

    Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, and Zeyu Wang. Splattingavatar: Realistic real-time human avatars with mesh-embedded gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1606–1616, 2024

  70. [70]

    Mega: Hybrid mesh-gaussian head avatar for high-fidelity rendering and head editing

    Cong Wang, Di Kang, Heyi Sun, Shenhan Qian, Zixuan Wang, Linchao Bao, and Song-Hai Zhang. Mega: Hybrid mesh-gaussian head avatar for high-fidelity rendering and head editing. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 26274–26284, 2025

  71. [71]

    Deep marching tetrahedra: a hybrid representation for high-resolution 3d shape synthesis

    Tianchang Shen, Jun Gao, Kangxue Yin, Ming-Yu Liu, and Sanja Fidler. Deep marching tetrahedra: a hybrid representation for high-resolution 3d shape synthesis. Advances in Neural Information Processing Systems, 34:6087–6101, 2021

  72. [72]

    Flexible isosurface extraction for gradient-based mesh optimization

    Tianchang Shen, Jacob Munkberg, Jon Hasselgren, Kangxue Yin, Zian Wang, Wenzheng Chen, Zan Gojcic, Sanja Fidler, Nicholas Sharp, and Jun Gao. Flexible isosurface extraction for gradient-based mesh optimization. ACM Trans. Graph., 42(4), 2023. doi: 10.1145/3592430

  73. [73]

    Get3d: A generative model of high quality 3d textured shapes learned from images

    Jun Gao, Tianchang Shen, Zian Wang, Wenzheng Chen, Kangxue Yin, Daiqing Li, Or Litany, Zan Gojcic, and Sanja Fidler. Get3d: A generative model of high quality 3d textured shapes learned from images. In Advances in Neural Information Processing Systems (NeurIPS), 2022

  74. [74]

    3d gaussian splatting for real-time radiance field rendering

    Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, George Drettakis, et al. 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph., 42(4):139–1, 2023

  75. [75]

    Efficient geometry-aware 3d generative adversarial networks

    Eric R Chan, Connor Z Lin, Matthew A Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas J Guibas, Jonathan Tremblay, Sameh Khamis, et al. Efficient geometry-aware 3d generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16123–16133, 2022

  76. [76]

    Auto-encoding variational bayes

    Diederik P. Kingma and Max Welling. Auto-encoding variational bayes. In International Conference on Learning Representations (ICLR), 2014

  77. [77]

    Generative adversarial nets

    Ian J Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. Advances in Neural Information Processing Systems, 27, 2014

  78. [78]

    Denoising diffusion probabilistic models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020

  79. [79]

    A point set generation network for 3d object reconstruction from a single image

    Haoqiang Fan, Hao Su, and Leonidas J Guibas. A point set generation network for 3d object reconstruction from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 605–613, 2017

  80. [80]

    Pixel2mesh: Generating 3d mesh models from single rgb images

    Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, and Yu-Gang Jiang. Pixel2mesh: Generating 3d mesh models from single rgb images. In European Conference on Computer Vision (ECCV), pages 52–67, 2018

Showing first 80 references.