pith. machine review for the scientific record.

arxiv: 2604.23629 · v2 · submitted 2026-04-26 · 💻 cs.GR

Recognition: no theorem link

From Visual Synthesis to Interactive Worlds: Toward Production-Ready 3D Asset Generation


Pith reviewed 2026-05-12 01:50 UTC · model grok-4.3

classification 💻 cs.GR
keywords 3D asset generation · production pipeline · interactive worlds · generative modeling · topology optimization · UV parameterization · PBR materials · scene assembly

The pith

Organizing 3D generative methods by asset tiers and production stages reveals which outputs meet engine-ready standards for interactive use.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey examines three-dimensional content generation by moving beyond visual appearance alone to the full requirements of interactive applications. It structures the literature along two axes: asset tiers of general objects, characters, and scenes, and the production lifecycle running from data foundations and geometry synthesis through topology optimization, UV unwrapping, PBR materials, rigging, and scene assembly. The goal is to evaluate not only what models can create but whether the results can be loaded directly into real-time engines and simulators without manual fixes. A reader would care because game development, embodied AI, world simulation, digital twins, and spatial computing all depend on assets that satisfy engine constraints on topology, parameterization, materials, rigging, and physical layout. The work also gathers evaluation metrics across geometric fidelity, appearance, usability, and scene-level physical plausibility while listing open problems in data, controllability, and end-to-end assetization.

Core claim

The survey establishes that, despite rapid progress in generative modeling, a persistent gap remains between current outputs and the production-ready standard required by interactive applications, and that a two-dimensional taxonomy organized around asset tiers and the full production lifecycle provides the clearest way to assess which methods produce assets directly usable in downstream engines and simulation platforms.

What carries the argument

The two-dimensional taxonomy that crosses three asset tiers (general objects, characters, scenes) with the vertical production lifecycle stages from data and geometry synthesis through topology, UV, PBR appearance, rigging, and scene assembly.
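The two-axis structure is concrete enough to sketch as a data structure. A minimal illustration in Python, with the tier and stage names taken from the survey and the method placements purely hypothetical:

```python
# Sketch of the survey's two-dimensional taxonomy as a lookup grid.
# Tier and stage names follow the paper; method placements below are
# illustrative only, not the survey's actual classification.
TIERS = ("general_objects", "characters", "scenes")
STAGES = (
    "data_foundations", "geometry_synthesis", "topology_optimization",
    "uv_unwrapping", "pbr_appearance", "rigging", "scene_assembly",
)

# taxonomy[(tier, stage)] -> list of methods covering that cell
taxonomy = {(t, s): [] for t in TIERS for s in STAGES}

def place(method, tier, stage):
    """File a method under one (tier, stage) cell, validating both axes."""
    if tier not in TIERS or stage not in STAGES:
        raise ValueError(f"unknown cell: ({tier}, {stage})")
    taxonomy[(tier, stage)].append(method)

# Hypothetical placements for illustration:
place("MeshGPT", "general_objects", "topology_optimization")
place("Paint3D", "general_objects", "pbr_appearance")
place("RigNet", "characters", "rigging")

# A column scan answers "which lifecycle stages does tier X cover?"
covered = [s for s in STAGES if taxonomy[("general_objects", s)]]
print(covered)
```

The point of the grid view is exactly the survey's: empty cells in a tier's column mark the stages where no method yet delivers engine-ready output.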

Load-bearing premise

That organizing the literature around asset tiers and production lifecycle stages accurately identifies which methods produce assets that satisfy engine-level constraints without further processing.

What would settle it

A broad empirical check showing that outputs from a majority of recent generative methods already satisfy topology, UV parameterization, PBR materials, skeletal rigging, and physics-aware scene constraints when imported directly into standard real-time engines.
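Parts of such a check are mechanical. As a hedged sketch of what "valid UV parameterization" means operationally (the per-face UV input format here is a simplification for illustration, not any engine's real import API), out-of-range coordinates and degenerate UV triangles can be flagged directly:

```python
# Minimal UV audit sketch: flag out-of-range coordinates and degenerate
# (zero-area) UV triangles, two defects that force manual fixes on import.
# Input format (a list of three (u, v) pairs per face) is an assumption
# made for this example.

def uv_issues(faces_uv):
    """Return a list of (face_index, problem) pairs."""
    issues = []
    for i, tri in enumerate(faces_uv):
        (u0, v0), (u1, v1), (u2, v2) = tri
        if not all(0.0 <= c <= 1.0 for c in (u0, v0, u1, v1, u2, v2)):
            issues.append((i, "uv out of [0,1]"))
        # Twice the signed area of the UV triangle; zero means the face
        # is mapped to a line or point and cannot carry texture detail.
        area2 = (u1 - u0) * (v2 - v0) - (u2 - u0) * (v1 - v0)
        if area2 == 0.0:
            issues.append((i, "degenerate uv triangle"))
    return issues

good = [((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))]
bad = [((0.0, 0.0), (2.0, 0.0), (0.0, 1.0)),   # coordinate out of range
       ((0.0, 0.0), (0.5, 0.5), (1.0, 1.0))]   # collinear UVs, zero area
print(uv_issues(good))  # []
print(uv_issues(bad))
```

Running a battery of such audits over many methods' raw outputs is what the settling experiment would amount to, scaled up and extended to topology, materials, and rigging.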

Figures

Figures reproduced from arXiv: 2604.23629 by Chunchao Guo, Dazhao Du, Jiafeng Wu, Jian Liu, Song Guo, Zhuofan Lou.

Figure 1: Pipeline-centric taxonomy bridging 3D synthesis and production-ready assets.
Figure 2: Method taxonomy tree spanning five survey branches: 3D representations, data foundations, …
Figure 3: Taxonomy of 3D shape representations and corresponding generative modeling paradigms.
Figure 4: Single-image 3D reconstruction results from open-source and closed-source models on the same …
Figure 5: Character and avatar generation methods (2024–2025). Full-body synthesis: TADA! [202], …
Original abstract

Three-dimensional content generation has progressed from producing isolated, visually plausible shapes to constructing structured assets that can be deployed in real-time interactive environments. This trajectory is driven by converging demands from game development, embodied AI, world simulation, digital twins, and spatial computing, all of which require 3D content that goes beyond surface appearance to satisfy engine-level constraints on topology, UV parameterization, physically based materials, skeletal rigging, and physics-aware scene layout. Despite rapid advances in generative modeling, a persistent gap separates the outputs of current methods from the production-ready standard expected by interactive applications. This survey addresses that gap by organizing the literature around the asset production pipeline rather than algorithmic families. Along the horizontal axis we distinguish three asset tiers, namely general objects, characters, and scenes, while the vertical axis traces each tier through the full production lifecycle from data foundations and geometry synthesis through topology optimization, UV unwrapping, PBR appearance, rigging, and scene assembly. Through this two-dimensional taxonomy we assess not only what current methods can generate but whether their outputs are directly usable in downstream engines and simulation platforms. We further consolidate evaluation metrics and protocols that span geometric fidelity, appearance quality, asset usability, and scene-level physical plausibility. The survey concludes by identifying open challenges in data quality, generation controllability, end-to-end assetization, and physically grounded generation, and by situating production-ready 3D content as foundational infrastructure for emerging interactive world models and embodied intelligent systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript is a literature survey on 3D asset generation. It organizes prior work via a two-dimensional taxonomy with horizontal axis of asset tiers (general objects, characters, scenes) and vertical axis of the production lifecycle (data foundations, geometry synthesis, topology optimization, UV unwrapping, PBR appearance, rigging, scene assembly). The central claim is that rapid advances in generative modeling have not closed a persistent gap to production-ready assets satisfying engine constraints on topology, UVs, materials, rigging, and physics layout; the survey assesses current methods' direct usability in downstream engines, consolidates metrics spanning geometric fidelity to scene-level physical plausibility, and identifies open challenges in data quality, controllability, end-to-end assetization, and physically grounded generation.

Significance. If the taxonomy accurately reflects usability, the survey supplies a useful organizing framework that shifts emphasis from algorithmic families to pipeline requirements. This can guide research toward assets deployable in games, simulation, digital twins, and embodied AI, while the consolidated metrics and challenge list provide concrete directions for closing the identified production gap.

major comments (1)
  1. [Taxonomy description and usability assessment (abstract and main taxonomy sections)] The assessment that current methods fail to produce directly usable assets (central to the gap claim and taxonomy evaluation) rests on qualitative categorization of the cited literature rather than direct empirical validation. No engine-level tests (e.g., importing outputs into Unity/Unreal to measure failure rates from non-manifold edges, invalid UV seams, or missing skeletal hierarchies) are described, which risks overstating or understating the gap by missing constraints not explicit in paper abstracts or results sections.
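The topology half of the engine-level test the referee describes reduces to edge bookkeeping. A toy sketch of the non-manifold check, standing in for what an importer validates; a production check would use a real mesh library rather than this:

```python
from collections import Counter

# Non-manifold edge sketch: in a closed manifold triangle mesh, every
# undirected edge is shared by exactly two faces. Edges on one face
# (open boundary) or three or more faces (non-manifold) are what trip
# up engine importers.

def non_manifold_edges(faces):
    """faces: list of (i, j, k) vertex-index triangles."""
    counts = Counter()
    for a, b, c in faces:
        for e in ((a, b), (b, c), (c, a)):
            counts[tuple(sorted(e))] += 1
    return sorted(e for e, n in counts.items() if n != 2)

# A tetrahedron is closed and manifold: no offending edges.
tet = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
print(non_manifold_edges(tet))  # []

# Gluing a fifth face onto edge (0, 1) makes that edge 3-valent, and the
# new face's other two edges become boundary edges.
broken = tet + [(0, 1, 4)]
print(non_manifold_edges(broken))  # [(0, 1), (0, 4), (1, 4)]
```

The referee's point is that nothing this simple is reported: the survey's usability labels come from reading papers, not from running checks like this on their outputs.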

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and for recognizing the survey's potential to guide research toward production-ready 3D assets. We address the major comment below, providing an honest account of our methodology as a literature survey and describing the revisions we will make.

Point-by-point responses
  1. Referee: The assessment that current methods fail to produce directly usable assets (central to the gap claim and taxonomy evaluation) rests on qualitative categorization of the cited literature rather than direct empirical validation. No engine-level tests (e.g., importing outputs into Unity/Unreal to measure failure rates from non-manifold edges, invalid UV seams, or missing skeletal hierarchies) are described, which risks overstating or understating the gap by missing constraints not explicit in paper abstracts or results sections.

    Authors: We agree that the usability assessment in the taxonomy is derived from qualitative synthesis of the capabilities, limitations, and output characteristics reported across the cited papers, rather than from new direct empirical tests such as importing assets into Unity or Unreal to measure specific failure rates. As this is a survey, performing comprehensive engine-level validation on dozens of methods would require re-implementation, standardized testing protocols, and resources that fall outside the scope of a literature review. We believe the central gap claim remains supported by the consistent absence of production features (e.g., guaranteed manifold topology, valid UV parameterization, complete skeletal hierarchies) in the documented outputs. To improve transparency, we will revise the taxonomy sections and add a brief limitations discussion explicitly noting the reliance on published results and the possibility that some engine-specific constraints may not be fully captured in the literature. This will clarify the methodology without altering the survey's conclusions or taxonomy structure. revision: yes
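The "complete skeletal hierarchies" feature named in the response is likewise testable without an engine import. A sketch under an assumed {joint: parent} encoding (illustrative, not any real rig file's schema):

```python
# Skeleton completeness sketch: verify a joint hierarchy forms a single
# rooted tree -- exactly one root, every named parent exists, no cycles.
# The {joint: parent} dict encoding is an assumption for this example.

def skeleton_problems(parents):
    """parents: dict mapping joint name -> parent name (None for root)."""
    problems = []
    roots = [j for j, p in parents.items() if p is None]
    if len(roots) != 1:
        problems.append(f"expected 1 root, found {len(roots)}")
    for joint, parent in parents.items():
        if parent is not None and parent not in parents:
            problems.append(f"{joint}: missing parent {parent!r}")
    for joint in parents:
        seen, cur = set(), joint
        while cur is not None:
            if cur in seen:
                problems.append(f"cycle through {joint!r}")
                break
            seen.add(cur)
            cur = parents.get(cur)
    return problems

rig = {"hips": None, "spine": "hips", "head": "spine",
       "l_arm": "spine", "r_arm": "spine"}
print(skeleton_problems(rig))  # []
print(skeleton_problems({"hips": None, "spine": "pelvis"}))
```

A limitations section of the kind the authors promise could cite exactly which of these structural properties each surveyed method documents, and which are left unstated.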

Circularity Check

0 steps flagged

No circularity: survey taxonomy synthesizes external literature without self-referential derivations

Full rationale

The paper is a literature survey that proposes a 2D taxonomy (asset tiers × production lifecycle stages) to organize prior work on 3D generation. It makes no new quantitative predictions, fits no parameters to data, and advances no equations or uniqueness theorems. The central claim of a 'persistent gap' to production-ready assets is supported by qualitative mapping of cited external papers rather than any reduction to the authors' own inputs or self-citations. No load-bearing step reduces by construction to a fitted value or prior self-citation; the taxonomy is an organizational framework whose validity rests on the accuracy of the cited literature, not on internal self-definition.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a literature survey with no new mathematical derivations, fitted parameters, axioms, or postulated entities introduced by the authors.

pith-pipeline@v0.9.0 · 5579 in / 1100 out tokens · 58373 ms · 2026-05-12T01:50:46.216175+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

278 extracted references · 278 canonical work pages · 5 internal anchors

  1. [1]

    Dreamfusion: Text-to-3d using 2d diffusion

    Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion. In International Conference on Learning Representations (ICLR), 2023

  2. [2]

    Magic3d: High-resolution text-to-3d content creation

    Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. Magic3d: High-resolution text-to-3d content creation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 300–309, 2023

  3. [3]

    Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation

    Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, and Jun Zhu. Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation. In Advances in Neural Information Processing Systems (NeurIPS), 2023

  4. [4]

    Michelangelo: Conditional 3d shape generation based on shape-image-text aligned latent representation

    Zibo Zhao, Wen Liu, Xin Chen, Xianfang Zeng, Rui Wang, Pei Cheng, Bin Fu, Tao Chen, Gang Yu, and Shenghua Gao. Michelangelo: Conditional 3d shape generation based on shape-image-text aligned latent representation. In Advances in Neural Information Processing Systems (NeurIPS), 2023

  5. [5]

    Clay: A controllable large-scale generative model for creating high-quality 3d assets

    Longwen Zhang, Ziyu Wang, Qixuan Zhang, Qiwei Qiu, Anqi Pang, Haoran Jiang, Wei Yang, Lan Xu, and Jingyi Yu. Clay: A controllable large-scale generative model for creating high-quality 3d assets. ACM Trans. Graph., 43(4):1–20, 2024

  6. [6]

    Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

    Zibo Zhao, Zeqiang Lai, Qingxiang Lin, Yunfei Zhao, Haolin Liu, Shuhui Yang, Yifei Feng, Mingxin Yang, Sheng Zhang, Xianghui Yang, et al. Hunyuan3d 2.0: Scaling diffusion models for high resolution textured 3d assets generation. arXiv preprint arXiv:2501.12202, 2025

  7. [7]

    LRM: Large reconstruction model for single image to 3d

    Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. LRM: Large reconstruction model for single image to 3d. In International Conference on Learning Representations (ICLR), 2024

  8. [8]

    Gs-lrm: Large reconstruction model for 3d gaussian splatting

    Kai Zhang, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, and Zexiang Xu. Gs-lrm: Large reconstruction model for 3d gaussian splatting. In European Conference on Computer Vision (ECCV), pages 1–19, 2024. doi: 10.1007/978-3-031-72670-5_1

  9. [9]

    Triposg: High-fidelity 3d shape synthesis using large-scale rectified flow models

    Yangguang Li, Zi-Xin Zou, Zexiang Liu, Dehu Wang, Yuan Liang, Zhipeng Yu, Xingchao Liu, Yuan-Chen Guo, Ding Liang, Wanli Ouyang, and Yan-Pei Cao. Triposg: High-fidelity 3d shape synthesis using large-scale rectified flow models, 2025. URL https://arxiv.org/abs/2502.06608

  10. [10]

    Genie: Generative interactive environments

    Jake Bruce, Michael D. Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, Yusuf Aytar, Sarah Bechtle, Feryal M. P. Behbahani, Stephanie C. Y. Chan, Nicolas Heess, Lucy Gonzalez, Simon Osindero, Sherjil Ozair, Scott E. Reed, Jingwei Zhang, Konrad Zolna, Jeff Clune, Nando de Fr...

  11. [11]

    Diffusion models are real-time game engines

    Dani Valevski, Yaniv Leviathan, Moab Arar, and Shlomi Fruchter. Diffusion models are real-time game engines. In International Conference on Learning Representations (ICLR), 2025

  12. [12]

    Cosmos World Foundation Model Platform for Physical AI

    NVIDIA. Cosmos world foundation model platform for physical ai. arXiv preprint arXiv:2501.03575, 2025

  13. [13]

    Infinite photorealistic worlds using procedural generation

    Alexander Raistrick, Lahav Lipson, Zeyu Ma, Lingjie Mei, Mingzhe Wang, Yiming Zuo, Karhan Kayan, Hongyu Wen, Beining Han, Yihan Wang, et al. Infinite photorealistic worlds using procedural generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12630–12641, 2023

  14. [14]

    Procthor: Large-scale embodied ai using procedural generation

    Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Kiana Ehsani, Jordi Salvador, Winson Han, Eric Kolve, Aniruddha Kembhavi, and Roozbeh Mottaghi. Procthor: Large-scale embodied ai using procedural generation. In Advances in Neural Information Processing Systems (NeurIPS), volume 35, pages 5982–5994, 2022

  15. [15]

    Holodeck: Language guided generation of 3d embodied AI environments

    Yue Yang, Fan-Yun Sun, Luca Weihs, Eli VanderBilt, Alvaro Herrasti, Winson Han, Jiajun Wu, Nick Haber, Ranjay Krishna, Lingjie Liu, Chris Callison-Burch, Mark Yatskar, Aniruddha Kembhavi, and Christopher Clark. Holodeck: Language guided generation of 3d embodied AI environments. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recog...

  16. [16]

    3d-gpt: Procedural 3d modeling with large language models

    Chunyi Sun, Junlin Han, Weijian Deng, Xinlong Wang, Zishan Qin, and Stephen Gould. 3d-gpt: Procedural 3d modeling with large language models. In Proceedings of the International Conference on 3D Vision (3DV), pages 1253–1263, 2025

  17. [17]

    Scenecraft: An LLM agent for synthesizing 3d scenes as blender code

    Ziniu Hu, Ahmet Iscen, Aashi Jain, Thomas Kipf, Yisong Yue, David A. Ross, Cordelia Schmid, and Alireza Fathi. Scenecraft: An LLM agent for synthesizing 3d scenes as blender code. In International Conference on Machine Learning (ICML), pages 19252–19282, 2024

  18. [18]

    Habitat: A platform for embodied AI research

    Manolis Savva, Jitendra Malik, Devi Parikh, Dhruv Batra, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, and Vladlen Koltun. Habitat: A platform for embodied AI research. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 9338–9346, 2019

  19. [19]

    Isaac gym: High performance gpu-based physics simulation for robot learning

    Viktor Makoviychuk, Lukasz Wawrzyniak, Yunrong Guo, Michelle Lu, Kier Storey, Miles Macklin, David Hoeller, Nikita Rudin, Arthur Allshire, Ankur Handa, and Gavriel State. Isaac gym: High performance gpu-based physics simulation for robot learning. In Advances in Neural Information Processing Systems (NeurIPS), 2021

  20. [20]

    Infinigen indoors: Photorealistic indoor scenes using procedural generation

    Alexander Raistrick, Lingjie Mei, Karhan Kayan, David Yan, Yiming Zuo, Beining Han, Hongyu Wen, Meenal Parakh, Stamatis Alexandropoulos, Lahav Lipson, Zeyu Ma, and Jia Deng. Infinigen indoors: Photorealistic indoor scenes using procedural generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 21783–21...

  21. [21]

    Paint3d: Paint anything 3d with lighting-less texture diffusion models

    Xianfang Zeng, Xin Chen, Zhongqi Qi, Wen Liu, Zibo Zhao, Zhibin Wang, Bin Fu, Yong Liu, and Gang Yu. Paint3d: Paint anything 3d with lighting-less texture diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4252–4262, 2024

  22. [22]

    Triposr: Fast 3d object reconstruction from a single image

    Dmitry Tochilkin, David Pankratz, Zexiang Liu, Zixuan Huang, Adam Letts, Yangguang Li, Ding Liang, Christian Laforte, Varun Jampani, and Yan-Pei Cao. Triposr: Fast 3d object reconstruction from a single image. arXiv preprint arXiv:2403.02151, 2024

  23. [23]

    Meshgpt: Generating triangle meshes with decoder-only transformers

    Yawar Siddiqui, Antonio Alliegro, Alexey Artemov, Tatiana Tommasi, Daniele Sirigatti, Vladislav Rosov, Angela Dai, and Matthias Nießner. Meshgpt: Generating triangle meshes with decoder-only transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 19615–19625, 2024

  24. [24]

    Meshanything: Artist-created mesh generation with autoregressive transformers

    Yiwen Chen, Tong He, Di Huang, Weicai Ye, Sijin Chen, Jiaxiang Tang, Zhongang Cai, Lei Yang, Gang Yu, Guosheng Lin, and Chi Zhang. Meshanything: Artist-created mesh generation with autoregressive transformers. In International Conference on Learning Representations (ICLR), 2025

  25. [25]

    Deepmesh: Auto-regressive artist-mesh creation with reinforcement learning

    Ruowen Zhao, Junliang Ye, Zhengyi Wang, Guangce Liu, Yiwen Chen, Yikai Wang, and Jun Zhu. Deepmesh: Auto-regressive artist-mesh creation with reinforcement learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 10612–10623, 2025

  26. [26]

    Hunyuan3d studio: End-to-end ai pipeline for game-ready 3d asset generation

    Biwen Lei, Yang Li, Xinhai Liu, Shuhui Yang, Lixin Xu, Jingwei Huang, Ruining Tang, Haohan Weng, Jian Liu, Jing Xu, et al. Hunyuan3d studio: End-to-end ai pipeline for game-ready 3d asset generation. arXiv preprint arXiv:2509.12815, 2025

  27. [27]

    Structured 3d latents for scalable and versatile 3d generation

    Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang, Dong Chen, Xin Tong, and Jiaolong Yang. Structured 3d latents for scalable and versatile 3d generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 21469–21480, 2025

  28. [28]

    Quadgpt: Native quadrilateral mesh generation with autoregressive models

    Jian Liu, Chunshi Wang, Song Guo, Haohan Weng, Zhen Zhou, Zhiqi Li, Jiaao Yu, Yiling Zhu, Jing Xu, Biwen Lei, et al. Quadgpt: Native quadrilateral mesh generation with autoregressive models. arXiv preprint arXiv:2509.21420, 2025

  29. [29]

    Auto-regressive surface cutting

    Yang Li, Victor Cheung, Xinhai Liu, Yuguang Chen, Zhongjin Luo, Biwen Lei, Haohan Weng, Zibo Zhao, Jingwei Huang, Zhuo Chen, et al. Auto-regressive surface cutting. arXiv preprint arXiv:2506.18017, 2025

  30. [30]

    Texgen: A generative diffusion model for mesh textures

    Xin Yu, Ze Yuan, Yuan-Chen Guo, Ying-Tian Liu, Jianhui Liu, Yangguang Li, Yan-Pei Cao, Ding Liang, and Xiaojuan Qi. Texgen: A generative diffusion model for mesh textures. ACM Trans. Graph., 43(6):213:1–213:14, 2024

  31. [31]

    Materialmvp: Illumination-invariant material generation via multi-view pbr diffusion

    Zebin He, Mingxin Yang, Shuhui Yang, Yixuan Tang, Tao Wang, Kaihao Zhang, Guanying Chen, Yuhong Liu, Jie Jiang, Chunchao Guo, et al. Materialmvp: Illumination-invariant material generation via multi-view pbr diffusion. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 26294–26305, 2025

  32. [32]

    Rignet: Neural rigging for articulated characters

    Zhan Xu, Yang Zhou, Evangelos Kalogerakis, Chris Landreth, and Karan Singh. Rignet: Neural rigging for articulated characters. ACM Trans. Graph., 39(4):58, 2020

  33. [33]

    Skinningnet: Two-stream graph convolutional neural network for skinning prediction of synthetic characters

    Albert Mosella-Montoro and Javier Ruiz-Hidalgo. Skinningnet: Two-stream graph convolutional neural network for skinning prediction of synthetic characters. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 18593–18602, 2022

  34. [34]

    Native and compact structured latents for 3d generation

    Jianfeng Xiang, Xiaoxue Chen, Sicheng Xu, Ruicheng Wang, Zelong Lv, Yu Deng, Hongyuan Zhu, Yue Dong, Hao Zhao, Nicholas Jing Yuan, and Jiaolong Yang. Native and compact structured latents for 3d generation.CoRR, abs/2512.14692, 2025

  35. [35]

    Physcene: Physically interactable 3d scene synthesis for embodied ai

    Yandan Yang, Baoxiong Jia, Peiyuan Zhi, and Siyuan Huang. Physcene: Physically interactable 3d scene synthesis for embodied ai. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16262–16272, 2024

  36. [36]

    Pose2room: understanding 3d scenes from human activities

    Yinyu Nie, Angela Dai, Xiaoguang Han, and Matthias Nießner. Pose2room: understanding 3d scenes from human activities. In European Conference on Computer Vision (ECCV), pages 425–443, 2022

  37. [37]

    Mime: Human-aware 3d scene generation

    Hongwei Yi, Chun-Hao P Huang, Shashank Tripathi, Lea Hering, Justus Thies, and Michael J Black. Mime: Human-aware 3d scene generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12965–12976, 2023

  38. [38]

    Recent advances in 3d object and scene generation: A survey

    Xiang Tang, Ruotong Li, and Xiaopeng Fan. Recent advances in 3d object and scene generation: A survey. arXiv preprint arXiv:2504.11734, 2025

  39. [39]

    Intelligent generation of graphical game assets: A conceptual framework and systematic review of the state of the art

    Kaisei Fukaya, Damon Daylamani-Zad, and Harry W. Agius. Intelligent generation of graphical game assets: A conceptual framework and systematic review of the state of the art. ACM Comput. Surv., 57(5):118:1–118:38, 2025

  40. [40]

    Ai-generated content (AIGC) for various data modalities: A survey

    Lin Geng Foo, Hossein Rahmani, and Jun Liu. Ai-generated content (AIGC) for various data modalities: A survey. ACM Comput. Surv., 57(9):243:1–243:66, 2025

  41. [41]

    3d scene generation: A survey

    Beichen Wen, Haozhe Xie, Zhaoxi Chen, Fangzhou Hong, and Ziwei Liu. 3d scene generation: A survey. arXiv preprint arXiv:2505.05474, 2025

  42. [42]

    ShapeNet: An Information-Rich 3D Model Repository

    Angel X Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, et al. Shapenet: An information-rich 3d model repository. arXiv preprint arXiv:1512.03012, 2015

  43. [43]

    Objaverse: A universe of annotated 3d objects

    Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Objaverse: A universe of annotated 3d objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13142–13153, 2023

  44. [44]

    Objaverse-xl: A universe of 10m+ 3d objects

    Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, et al. Objaverse-xl: A universe of 10m+ 3d objects. Advances in Neural Information Processing Systems, 36:35799–35813, 2023

  45. [45]

    Smpl: A skinned multi-person linear model

    Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J Black. Smpl: A skinned multi-person linear model. ACM Trans. Graph., 34(6), 2015

  46. [46]

    3d-front: 3d furnished rooms with layouts and semantics

    Huan Fu, Bowen Cai, Lin Gao, Ling-Xiao Zhang, Jiaming Wang, Cao Li, Qixun Zeng, Chengyue Sun, Rongfei Jia, Binqiang Zhao, et al. 3d-front: 3d furnished rooms with layouts and semantics. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 10933–10942, 2021

  47. [47]

    ATISS: autoregressive transformers for indoor scene synthesis

    Despoina Paschalidou, Amlan Kar, Maria Shugrina, Karsten Kreis, Andreas Geiger, and Sanja Fidler. ATISS: autoregressive transformers for indoor scene synthesis. In Advances in Neural Information Processing Systems (NeurIPS), pages 12013–12026, 2021

  48. [48]

    Physgen3d: Crafting a miniature interactive world from a single image

    Boyuan Chen, Hanxiao Jiang, Shaowei Liu, Saurabh Gupta, Yunzhu Li, Hao Zhao, and Shenlong Wang. Physgen3d: Crafting a miniature interactive world from a single image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6178–6189, 2025

  49. [49]

    Citycraft: A real crafter for 3d city generation

    Jie Deng, Wenhao Chai, Junsheng Huang, Zhonghan Zhao, Qixuan Huang, Mingyan Gao, Jianshu Guo, Shengyu Hao, Wenhao Hu, Jenq-Neng Hwang, et al. Citycraft: A real crafter for 3d city generation. arXiv preprint arXiv:2406.04983, 2024

  50. [50]

    Unrealllm: Towards highly controllable and interactable 3d scene generation by llm-powered procedural content generation

    Song Tang, Kaiyong Zhao, Lei Wang, Yuliang Li, Xuebo Liu, Junyi Zou, Qiang Wang, and Xiaowen Chu. Unrealllm: Towards highly controllable and interactable 3d scene generation by llm-powered procedural content generation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 19417–19435, 2025. doi: 10.18653/v1/2025.findings-acl.994

  51. [51]

    Deep learning for 3d point clouds: A survey

    Yulan Guo, Hanyun Wang, Qingyong Hu, Hao Liu, Li Liu, and Mohammed Bennamoun. Deep learning for 3d point clouds: A survey. IEEE Trans. Pattern Anal. Mach. Intell., 43(12):4338–4364, 2020

  52. [52]

    Pointnet: Deep learning on point sets for 3d classification and segmentation

    Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 652–660, 2017

  53. [53]

    3d shapenets: A deep representation for volumetric shapes

    Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3d shapenets: A deep representation for volumetric shapes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1912–1920, 2015

  54. [54]

    Voxnet: A 3d convolutional neural network for real-time object recognition

    Daniel Maturana and Sebastian Scherer. Voxnet: A 3d convolutional neural network for real-time object recognition. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 922–928, 2015

  55. [55]

    Octree generating networks: Efficient convolutional architectures for high-resolution 3d outputs

    Maxim Tatarchenko, Alexey Dosovitskiy, and Thomas Brox. Octree generating networks: Efficient convolutional architectures for high-resolution 3d outputs. In Proceedings of the IEEE international conference on computer vision, pages 2088–2096, 2017

  56. [56]

    Octnet: Learning deep 3d representations at high resolutions

    Gernot Riegler, Ali Osman Ulusoy, and Andreas Geiger. Octnet: Learning deep 3d representations at high resolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3577–3586, 2017

  57. [57]

    Instant neural graphics primitives with a multiresolution hash encoding

    Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph., 41(4):1–15, 2022

  58. [58]

    Xcube: Large-scale 3d generative modeling using sparse voxel hierarchies

    Xuanchi Ren, Jiahui Huang, Xiaohui Zeng, Ken Museth, Sanja Fidler, and Francis Williams. Xcube: Large-scale 3d generative modeling using sparse voxel hierarchies. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4209–4219, 2024

  59. [59]

    A survey on deep learning advances on different 3d data representations

    Eman Ahmed, Alexandre Saint, Abd El Rahman Shabayek, Kseniya Cherenkova, Rig Das, Gleb Gusev, Djamila Aouada, and Bjorn Ottersten. A survey on deep learning advances on different 3d data representations. arXiv preprint arXiv:1808.01462, 2018

  60. [60]

    Geometric deep learning on graphs and manifolds using mixture model cnns

    Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodola, Jan Svoboda, and Michael M Bronstein. Geometric deep learning on graphs and manifolds using mixture model cnns. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5115–5124, 2017

  61. [61]

    Nerf: Representing scenes as neural radiance fields for view synthesis

    Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. In European Conference on Computer Vision (ECCV), pages 405–421, 2020

  62. [62]

    Mip-nerf 360: Unbounded anti-aliased neural radiance fields

    Jonathan T Barron, Ben Mildenhall, Dor Verbin, Pratul P Srinivasan, and Peter Hedman. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5470–5479, 2022

  63. [63]

    Zip-nerf: Anti-aliased grid-based neural radiance fields

    Jonathan T Barron, Ben Mildenhall, Dor Verbin, Pratul P Srinivasan, and Peter Hedman. Zip-nerf: Anti-aliased grid-based neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 19697–19705, 2023

  64. [64]

    Deepsdf: Learning continuous signed distance functions for shape representation

    Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 165–174, 2019

  65. [65]

    Occupancy networks: Learning 3d reconstruction in function space

    Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, and Andreas Geiger. Occupancy networks: Learning 3d reconstruction in function space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4460–4470, 2019

  66. [66]

    Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction

    Peng Wang, Lingjie Liu, Yuan Liu, Christian Theobalt, Taku Komura, and Wenping Wang. Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. In Advances in Neural Information Processing Systems (NeurIPS), pages 27171–27183, 2021

  67. [67]

    Volume rendering of neural implicit surfaces

    Lior Yariv, Jiatao Gu, Yoni Kasten, and Yaron Lipman. Volume rendering of neural implicit surfaces. Advances in Neural Information Processing Systems, 34:4805–4815, 2021

  68. [68]

    Marching cubes: A high resolution 3d surface construction algorithm

    William E. Lorensen and Harvey E. Cline. Marching cubes: A high resolution 3d surface construction algorithm. In ACM SIGGRAPH Conference Proceedings, pages 163–169, 1987

  69. [69]

    Splattingavatar: Realistic real-time human avatars with mesh-embedded gaussian splatting

    Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, and Zeyu Wang. Splattingavatar: Realistic real-time human avatars with mesh-embedded gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1606–1616, 2024

  70. [70]

    Mega: Hybrid mesh-gaussian head avatar for high-fidelity rendering and head editing

    Cong Wang, Di Kang, Heyi Sun, Shenhan Qian, Zixuan Wang, Linchao Bao, and Song-Hai Zhang. Mega: Hybrid mesh-gaussian head avatar for high-fidelity rendering and head editing. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 26274–26284, 2025

  71. [71]

    Deep marching tetrahedra: a hybrid representation for high-resolution 3d shape synthesis

    Tianchang Shen, Jun Gao, Kangxue Yin, Ming-Yu Liu, and Sanja Fidler. Deep marching tetrahedra: a hybrid representation for high-resolution 3d shape synthesis. Advances in Neural Information Processing Systems, 34:6087–6101, 2021

  72. [72]

    Flexible isosurface extraction for gradient-based mesh optimization

    Tianchang Shen, Jacob Munkberg, Jon Hasselgren, Kangxue Yin, Zian Wang, Wenzheng Chen, Zan Gojcic, Sanja Fidler, Nicholas Sharp, and Jun Gao. Flexible isosurface extraction for gradient-based mesh optimization. ACM Trans. Graph., 42(4), 2023. doi: 10.1145/3592430

  73. [73]

    Get3d: A generative model of high quality 3d textured shapes learned from images

    Jun Gao, Tianchang Shen, Zian Wang, Wenzheng Chen, Kangxue Yin, Daiqing Li, Or Litany, Zan Gojcic, and Sanja Fidler. Get3d: A generative model of high quality 3d textured shapes learned from images. In Advances in Neural Information Processing Systems (NeurIPS), 2022

  74. [74]

    3d gaussian splatting for real-time radiance field rendering

    Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, George Drettakis, et al. 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph., 42(4):139–1, 2023

  75. [75]

    Efficient geometry-aware 3d generative adversarial networks

    Eric R Chan, Connor Z Lin, Matthew A Chan, Koki Nagano, Boxiao Pan, Shalini De Mello, Orazio Gallo, Leonidas J Guibas, Jonathan Tremblay, Sameh Khamis, et al. Efficient geometry-aware 3d generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16123–16133, 2022

  76. [76]

    Auto-encoding variational bayes

    Diederik P. Kingma and Max Welling. Auto-encoding variational bayes. In International Conference on Learning Representations (ICLR), 2014

  77. [77]

    Generative adversarial nets

    Ian J Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. Advances in Neural Information Processing Systems, 27, 2014

  78. [78]

    Denoising diffusion probabilistic models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020

  79. [79]

    A point set generation network for 3d object reconstruction from a single image

    Haoqiang Fan, Hao Su, and Leonidas J Guibas. A point set generation network for 3d object reconstruction from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 605–613, 2017

  80. [80]

    Pixel2mesh: Generating 3d mesh models from single rgb images

    Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, and Yu-Gang Jiang. Pixel2mesh: Generating 3d mesh models from single rgb images. In European Conference on Computer Vision (ECCV), pages 52–67, 2018

Showing first 80 references.