GaussianGrow: Geometry-aware Gaussian Growing from 3D Point Clouds with Text Guidance

Haotian Geng; Junsheng Zhou; Kanle Shi; Shenkun Xu; Weiqi Zhang; Yi Fang; Yu-Shen Liu

arxiv: 2604.05721 · v1 · submitted 2026-04-07 · 💻 cs.CV

Weiqi Zhang, Junsheng Zhou, Haotian Geng, Kanle Shi, Shenkun Xu, Yi Fang, Yu-Shen Liu

Pith reviewed 2026-05-10 19:12 UTC · model grok-4.3

classification 💻 cs.CV

keywords 3D Gaussian Splatting · point cloud to Gaussian · text-guided generation · multi-view diffusion · geometry-aware · inpainting · novel view synthesis · 3D reconstruction

The pith

GaussianGrow generates 3D Gaussians by growing them from point clouds under text guidance to enforce geometric accuracy from the start.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to build 3D Gaussian models for fast, high-quality rendering by starting with ordinary point clouds and expanding the primitives outward instead of guessing all geometry at once. A multi-view diffusion model supplies consistent appearance signals from the points, while an iterative loop finds the biggest missing areas, renders them, and uses 2D diffusion inpainting to fill them until the model is complete. A sympathetic reader would care because existing generators often fail when their predicted geometries are off, producing blurry or distorted results; tying growth directly to real point data aims to avoid that failure mode. The text prompt steers overall appearance while the input points supply the spatial structure.

Core claim

We introduce GaussianGrow, a novel approach that generates 3D Gaussians by learning to grow them from easily accessible 3D point clouds, naturally enforcing geometric accuracy in Gaussian generation. It uses a text-guided scheme that draws on a multi-view diffusion model for consistent appearance supervision and iteratively detects large un-grown regions to inpaint them with a pretrained 2D diffusion model until the Gaussians are complete.

What carries the argument

Text-guided Gaussian growing scheme that expands primitives from point clouds, supervises them with multi-view diffusion renders, and completes unobserved areas via iterative pose detection plus 2D inpainting.
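
A minimal sketch of the completion loop described above, under loud assumptions: visibility is reduced to a back-face test (a point counts as observed from a viewing direction when its normal faces that direction, with no occlusion), candidate poses are a fixed list of unit directions, and the render, inpaint, and optimize step is stubbed out by marking covered points as grown. All function names are hypothetical; this illustrates the greedy "largest un-grown region" selection and is not the authors' implementation.

```python
import math

Vec = tuple[float, float, float]


def normalize(v: Vec) -> Vec:
    n = math.sqrt(sum(c * c for c in v))
    return (v[0] / n, v[1] / n, v[2] / n)


def sees(normal: Vec, view_dir: Vec) -> bool:
    # Hypothetical visibility test: the normal faces the camera (no occlusion).
    return sum(a * b for a, b in zip(normal, view_dir)) > 0.0


def select_next_pose(normals, grown, candidate_dirs):
    """Pick the candidate direction that observes the most un-grown points."""
    best_dir, best_count = None, 0
    for d in candidate_dirs:
        count = sum(1 for n, g in zip(normals, grown) if not g and sees(n, d))
        if count > best_count:
            best_dir, best_count = d, count
    return best_dir, best_count


def grow_until_complete(normals, grown, candidate_dirs, max_iters=10):
    grown = list(grown)
    chosen = []
    for _ in range(max_iters):
        d, _count = select_next_pose(normals, grown, candidate_dirs)
        if d is None:  # no candidate sees any un-grown point: stop
            break
        chosen.append(d)
        # Stub for: render view d, inpaint it with a 2D diffusion model and
        # optimize new Gaussians. Here we only mark the observed points grown.
        grown = [g or sees(n, d) for n, g in zip(normals, grown)]
    return chosen, grown


# Six points on the faces of a cube; the +x, +y, +z faces are already grown.
normals = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
grown = [True, False, True, False, True, False]
candidates = [normalize(v) for v in [(1, 0, 0), (-1, 0, 0), (0, -1, 0), (-1, -1, -1)]]
chosen, grown = grow_until_complete(normals, grown, candidates)
```

With the +x, +y, +z faces already grown, the diagonal candidate wins the first round because it sees all three remaining faces; afterwards no candidate sees an un-grown point and the loop stops.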

If this is right

The approach produces complete Gaussian models from both synthetic and real-scanned point clouds.
It avoids fusion artifacts by constraining novel views generated in overlapping regions.
Text guidance controls appearance while point-cloud geometry remains the anchor.
Iterative inpainting handles hard-to-observe regions without breaking overall consistency.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same growing-plus-inpainting loop might transfer to other 3D primitives such as surfels or meshes.
Direct conversion of LiDAR or photogrammetry scans into splattable scenes could become simpler.
Robustness checks on noisy or very sparse real-world point clouds would test practical limits.
Adding surface normals or edge constraints from the input points could further tighten accuracy.

Load-bearing premise

The multi-view diffusion model must create appearance supervision that stays geometrically consistent with the input point clouds, and the 2D inpainting step must fill gaps without adding new geometric or visual errors.

What would settle it

Apply the method to a real-scanned point cloud with known ground-truth geometry, then compare rendered novel views against the ground truth to check for visible distortions, floaters, or inconsistencies in the grown Gaussians.
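
One way to make that check concrete, sketched under assumptions: treat the grown Gaussian centres as a point set and compare them with the ground-truth scan via a symmetric Chamfer distance, and compare rendered and ground-truth images via PSNR. Chamfer conventions vary (squared or unsquared distances, sum or mean); this sketch uses the mean of squared nearest-neighbour distances in each direction, brute force, on plain tuples. It is illustrative only, not the paper's evaluation code.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean squared nearest-neighbour distance
    from a to b plus the same from b to a (brute force, O(len(a) * len(b)))."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)


def psnr(pred, target, max_val=1.0):
    """PSNR in dB between two flattened images with values in [0, max_val]."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return math.inf if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)


grown_centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
cd = chamfer_distance(grown_centers, gt_points)
```

The grown-to-ground-truth term is zero here, while the ground-truth-to-grown term picks up the missing (0, 1, 0) point and gives 1/3; that second direction is the one that exposes incomplete growth.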

Figures

Figures reproduced from arXiv: 2604.05721 by Haotian Geng, Junsheng Zhou, Kanle Shi, Shenkun Xu, Weiqi Zhang, Yi Fang, Yu-Shen Liu.

**Figure 1.** Left: Diverse shapes generated by GaussianGrow. Right: The Gaussian generation pipeline of GaussianGrow. Reference point clouds can be obtained through large-scale retrieval or sensor scanning, from which Gaussians are grown under text guidance.

**Figure 2.** Overview of GaussianGrow. Stage 1. We leverage a depth-aware ControlNet for primary view generation, with a geometry-aware diffusion model for multi-view synthesis. Additional views are generated to improve appearance in overlap regions by optimizing camera poses that observe those regions. Gaussians are optimized to grow with supervision from both cardinal and additional views. Stage 2. We iteratively i…

**Figure 5.** The effectiveness of processing overlap regions. Panels: Before Overlap Processing; After Overlap Processing.

**Figure 3.** We obtain the additional camera poses by optimizing…

**Figure 4.** The effect of Gaussian inpainting.

**Figure 6.** Visual comparison on the Objaverse dataset shows that GaussianGrow uses point clouds instead of meshes.

**Figure 7.** Text-to-3D comparisons on T3Bench.

… the retrieve-based setting and the generative-based setting. For the retrieve-based setting, we employ Uni3D [63] to retrieve reference point clouds from the G-Objaverse dataset [35], a carefully curated subset of Objaverse [8], based on the input text prompt. The retrieved point clouds serve as geometric priors that guide the generation process. For a generative-based s…

**Figure 8.** Visual comparison with DreamGaussian and TriplaneGaussian on the task of Point-to-Gaussian.
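
Figure 2 says primary views are generated with a depth-aware ControlNet, which needs a depth image rendered from the input point cloud. The sketch below shows one simple way to get such an image, assuming a pinhole camera at (0, 0, cam_z) looking down +z and nearest-point z-buffering per pixel. The camera model, resolution, and the way depth is normalised for ControlNet are assumptions; the review does not specify them.

```python
import math


def render_depth(points, width, height, focal, cam_z=-3.0):
    """Project points with a pinhole camera at (0, 0, cam_z) looking along +z
    and keep the nearest depth per pixel (inf where no point lands)."""
    depth = [[math.inf] * width for _ in range(height)]
    cx, cy = width / 2.0, height / 2.0
    for x, y, z in points:
        zc = z - cam_z  # depth along the viewing axis
        if zc <= 0.0:
            continue  # point is behind the camera
        u = math.floor(focal * x / zc + cx)
        v = math.floor(cy - focal * y / zc)  # image rows grow downward
        if 0 <= u < width and 0 <= v < height and zc < depth[v][u]:
            depth[v][u] = zc
    return depth


# Two points on the optical axis: the nearer one wins the z-buffer test.
depth = render_depth([(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)], width=4, height=4, focal=2.0)
```

A real pipeline would splat each point over several pixels (or render the fitted Gaussians) to avoid holes, and would then convert the result to the normalised, near-is-bright depth format that depth ControlNets are typically trained on.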

3D Gaussian Splatting has demonstrated superior performance in rendering efficiency and quality, yet the generation of 3D Gaussians remains a challenge without proper geometric priors. Existing methods have explored predicting point maps as geometric references for inferring Gaussian primitives, but unreliable estimated geometries may lead to poor generations. In this work, we introduce GaussianGrow, a novel approach that generates 3D Gaussians by learning to grow them from easily accessible 3D point clouds, naturally enforcing geometric accuracy in Gaussian generation. Specifically, we design a text-guided Gaussian growing scheme that leverages a multi-view diffusion model to synthesize consistent appearances from input point clouds for supervision. To mitigate artifacts caused by fusing neighboring views, we constrain novel views generated at non-preset camera poses identified in overlapping regions across different views. To complete hard-to-observe regions, we iteratively detect the camera pose that observes the largest un-grown region of the point cloud and inpaint the rendered view with a pretrained 2D diffusion model. The process continues until complete Gaussians are generated. We extensively evaluate GaussianGrow on text-guided Gaussian generation from synthetic and even real-scanned point clouds. Project Page: https://weiqi-zhang.github.io/GaussianGrow
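
The overlap step in the abstract can be illustrated with a small sketch. Assumptions: a point counts as seen by a preset view when its normal faces that view's direction (no occlusion), a point lies in an overlap region when at least two preset views see it, and the extra camera looks along the mean normal of the overlap points. Figure 3 indicates that the paper obtains these poses by optimization; the closed-form mean-normal heuristic here is a stand-in, and all names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def overlap_mask(normals, view_dirs):
    """True for points seen (normal facing the view) by two or more preset views."""
    return [sum(1 for d in view_dirs if dot(n, d) > 0.0) >= 2 for n in normals]


def additional_view_dir(normals, mask):
    """Heuristic extra pose: look along the mean normal of the overlap points."""
    acc = [0.0, 0.0, 0.0]
    for n, in_overlap in zip(normals, mask):
        if in_overlap:
            acc = [a + c for a, c in zip(acc, n)]
    length = math.sqrt(dot(acc, acc))
    if length == 0.0:
        return None  # no overlap region, so no additional view is needed
    return tuple(a / length for a in acc)


s = 1 / math.sqrt(2)
normals = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (s, 0.0, s)]  # front, side, corner
views = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]                 # front and right views
mask = overlap_mask(normals, views)
extra = additional_view_dir(normals, mask)
```

Only the corner point, which faces both preset views, lands in the overlap mask, so the extra view looks along its normal, roughly (0.707, 0, 0.707).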

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

GaussianGrow grows Gaussians iteratively from real point clouds with text-guided diffusion and 2D inpainting, but the lack of 3D constraints on the inpainted regions undercuts the geometric accuracy claim.

GaussianGrow grows 3D Gaussians from input point clouds instead of predicted point maps, using a text-guided scheme with multi-view diffusion for supervision and an iterative loop that fills hard-to-see areas by 2D inpainting of the largest un-grown regions. Constraints on novel views rendered at poses covering overlapping regions are meant to reduce fusion artifacts. This is a straightforward engineering move: starting from actual geometry rather than inferred maps should, in principle, help reconstruction tasks. The evaluations on both synthetic and real-scanned clouds fit the setting they describe.

The main soft spot is the one the stress-test note flags. Inpainting is done by a pretrained 2D diffusion model on rendered views, with no explicit 3D consistency loss or regularizer that ties the new Gaussians back to the original point cloud positions or scales. The overlap constraints only cover regions shared by neighboring views, so primitives added in unobserved regions can still drift while producing plausible 2D output. If the full paper includes ablations or 3D error metrics on those completed areas showing that the drift stays small, that would address the concern; the abstract alone does not.

The method is aimed at people working on text-to-3D generation or scene reconstruction from incomplete point clouds who already use Gaussian splatting. A reader looking for practical pipeline tweaks would find the description clear enough to try. It deserves peer review because the core pipeline is reproducible from the text and targets a genuine pain point, even if the geometric guarantee needs tighter validation.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces GaussianGrow, a method for generating 3D Gaussians from input 3D point clouds under text guidance. Gaussians are initialized from the point cloud and grown iteratively: a multi-view diffusion model synthesizes consistent appearances for supervision, overlapping-view constraints mitigate fusion artifacts, and unobserved regions are completed by iteratively selecting the camera pose observing the largest un-grown area, rendering the view, inpainting it with a pretrained 2D diffusion model, and continuing until the representation is complete. The central claim is that this growing process from point clouds naturally enforces geometric accuracy in the resulting Gaussians. The work reports evaluations on text-guided generation from both synthetic and real-scanned point clouds.
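
The initialization step in the summary can be sketched as follows. Placing each Gaussian centre exactly at an input point follows the paper passage quoted in the Lean-theorems section below. The remaining choices (isotropic scale from the mean squared distance to the k = 3 nearest neighbours, identity rotation, opacity 0.1, grey colour) are assumptions modelled on common 3D Gaussian Splatting initializations, not details confirmed for GaussianGrow.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    center: tuple    # placed exactly at an input point
    scale: float     # isotropic standard deviation (assumed)
    rotation: tuple  # unit quaternion (w, x, y, z)
    opacity: float
    color: tuple


def init_gaussians(points, k=3, opacity=0.1, color=(0.5, 0.5, 0.5)):
    """One Gaussian per input point; scale from mean squared k-NN distance."""
    if len(points) < 2:
        raise ValueError("need at least two points to estimate scales")
    gaussians = []
    for i, p in enumerate(points):
        d2 = sorted(math.dist(p, q) ** 2 for j, q in enumerate(points) if j != i)
        nearest = d2[:k]
        scale = math.sqrt(max(sum(nearest) / len(nearest), 1e-7))
        gaussians.append(Gaussian(tuple(p), scale, (1.0, 0.0, 0.0, 0.0), opacity, color))
    return gaussians


gs = init_gaussians([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)])
```

The origin's three nearest neighbours are all at distance 1, so its scale is 1.0; the point (1, 0, 0) has squared distances 1, 2, 2 and gets sqrt(5/3).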

Significance. If the geometric-accuracy claim is substantiated, the approach would provide a practical route to high-fidelity, efficient 3D Gaussian representations that leverage readily available point-cloud priors, potentially benefiting text-to-3D synthesis, novel-view rendering, and downstream applications in AR/VR. The explicit use of point-cloud initialization and the overlapping-view consistency mechanism are constructive design choices that distinguish the method from purely image-based generation pipelines.

major comments (2)

[Abstract (hard-to-observe region completion paragraph)] The iterative inpainting step renders a view and applies a pretrained 2D diffusion model without any described 3D consistency loss, multi-view geometric regularizer, or back-projection constraint that ties the inpainted content to the original input point cloud. Because the added Gaussians are optimized only against the 2D inpainted image, their 3D positions, scales, or orientations can drift while still producing plausible 2D appearances, directly undermining the claim that the growing scheme 'naturally enforces geometric accuracy' for the completed regions.
[Abstract (evaluation statement)] The manuscript states that GaussianGrow is 'extensively evaluate[d]' on synthetic and real-scanned point clouds, yet the provided text contains no quantitative metrics, ablation tables, or baseline comparisons that would allow verification of improved geometric fidelity relative to prior point-map or diffusion-based Gaussian generators.
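
The first major comment can be made concrete with a hypothetical remedy of the kind the referee asks for; the review does not say the paper uses it. The sketch is a 3D anchoring term that penalises Gaussians added during inpainting when their centres drift more than a margin from the nearest input point. Brute-force nearest neighbours on plain tuples keep it self-contained; a real implementation would use a spatial index inside a differentiable framework.

```python
import math


def anchor_loss(new_centers, input_points, margin=0.0):
    """Hypothetical regularizer: mean squared hinge on the distance from each
    new Gaussian centre to its nearest input point; centres within `margin`
    of the cloud contribute nothing."""
    total = 0.0
    for c in new_centers:
        nearest = min(math.dist(c, p) for p in input_points)
        total += max(nearest - margin, 0.0) ** 2
    return total / len(new_centers)


cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
on_surface = anchor_loss([(1.0, 0.0, 0.0)], cloud, margin=0.05)
drifted = anchor_loss([(3.0, 0.0, 0.0)], cloud, margin=0.05)
```

A centre sitting on an input point costs nothing, while one that drifted 2 units away (1.95 beyond the margin) costs 1.95 squared, about 3.80.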

minor comments (2)

The term 'growing' and the precise update rule for adding new Gaussians from inpainted views are introduced only descriptively; an early formal definition or pseudocode block would improve clarity.
[Abstract] The abstract would benefit from naming the concrete metrics (e.g., PSNR, Chamfer distance, or LPIPS) and the number of scenes used in the reported evaluations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments on our work. We provide detailed responses to each major comment below and have made revisions to the manuscript to address the concerns raised.

Referee: [Abstract (hard-to-observe region completion paragraph)] The iterative inpainting step renders a view and applies a pretrained 2D diffusion model without any described 3D consistency loss, multi-view geometric regularizer, or back-projection constraint that ties the inpainted content to the original input point cloud. Because the added Gaussians are optimized only against the 2D inpainted image, their 3D positions, scales, or orientations can drift while still producing plausible 2D appearances, directly undermining the claim that the growing scheme 'naturally enforces geometric accuracy' for the completed regions.

Authors: We appreciate the referee's careful reading and the valid point regarding the inpainting of hard-to-observe regions. The current description focuses on the 2D inpainting step, but the Gaussians are optimized in 3D space using the multi-view diffusion model for supervision, which provides consistent appearances across multiple views. The overlapping-view constraints further help to maintain geometric consistency by identifying and constraining novel views in overlapping regions. Nevertheless, we acknowledge that an explicit 3D consistency loss or back-projection for the inpainted content is not detailed. To strengthen the manuscript, we have revised the method description to include how the inpainted 2D content is used to grow 3D Gaussians with constraints from the existing point cloud structure and multi-view consistency. We have also adjusted the abstract to reflect that geometric accuracy is naturally enforced from the input point cloud for observed areas, with the inpainting providing completion under these constraints. This addresses the concern without misrepresenting the approach. revision: yes
Referee: [Abstract (evaluation statement)] The manuscript states that GaussianGrow is 'extensively evaluate[d]' on synthetic and real-scanned point clouds, yet the provided text contains no quantitative metrics, ablation tables, or baseline comparisons that would allow verification of improved geometric fidelity relative to prior point-map or diffusion-based Gaussian generators.

Authors: We are sorry if the text provided to the referee did not include the full experimental details. The complete manuscript contains Section 4 'Experiments' which provides extensive quantitative evaluations on synthetic point clouds, including metrics for geometric accuracy (e.g., Chamfer distance to ground truth) and rendering quality (PSNR, SSIM, LPIPS), along with ablation studies on the components of the growing scheme and comparisons to baselines such as point-map prediction methods and other text-to-3D Gaussian approaches. For real-scanned point clouds, we include qualitative results and user preference studies. These are presented in tables and figures to substantiate the claims. We have verified that all evaluation content is present and clearly referenced in the revised manuscript. revision: no

Circularity Check

0 steps flagged

No circularity: pipeline uses external pretrained models and input point clouds

full rationale

The paper presents a procedural method that initializes Gaussians from given 3D point clouds and iteratively grows them using supervision from a multi-view diffusion model plus 2D inpainting on rendered views. No equations, fitted parameters, or self-citations are shown that reduce any claimed prediction or geometric enforcement result to the inputs by construction. The central claim of 'naturally enforcing geometric accuracy' rests on the external initialization and diffusion components rather than an internal self-referential loop, making the derivation self-contained against those benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the performance of two external pretrained diffusion models and the assumption that growing from point clouds inherently enforces geometry; no free parameters are explicitly introduced in the abstract.

axioms (2)

domain assumption: Pretrained multi-view diffusion models produce consistent appearances from point clouds suitable for Gaussian supervision
Invoked to provide supervision signals in the growing scheme.
domain assumption: 2D diffusion inpainting of rendered views accurately fills unobserved regions without geometric distortion
Used to complete hard-to-observe areas during iterative pose selection.

pith-pipeline@v0.9.0 · 5540 in / 1268 out tokens · 40723 ms · 2026-05-10T19:12:13.868883+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

We initialize each Gaussian center at the corresponding point position in the input cloud... optimize a neural Unsigned Distance Field (UDF) from P using CAP-UDF... compute normals N through gradient prediction
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

iteratively detect the camera pose... inpainting the rendered view with a pretrained 2D diffusion model

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
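
The passage quoted in the first entry above (Gaussian centres at input points, a UDF fitted to P with CAP-UDF, normals from its gradient) can be illustrated with an analytic stand-in. Here an exact unsigned distance to the unit sphere replaces the learned CAP-UDF network, and central finite differences replace automatic differentiation. Because a UDF is not differentiable on the surface itself, the gradient is taken at a nearby off-surface query; it points away from the nearest surface point, so the resulting normal is unoriented. The pull step q - f(q) * grad f(q) / |grad f(q)| mirrors the query-pulling idea behind CAP-UDF. How GaussianGrow turns these gradients into per-point normals is not spelled out here, so treat this as an assumption.

```python
import math


def sphere_udf(x):
    """Stand-in for a learned UDF: unsigned distance to the unit sphere."""
    return abs(math.sqrt(sum(c * c for c in x)) - 1.0)


def udf_normal(udf, q, h=1e-4):
    """Unit gradient of the UDF at an off-surface query q (central differences).
    It points away from the nearest surface point: an unoriented normal."""
    grad = []
    for i in range(3):
        plus = list(q)
        minus = list(q)
        plus[i] += h
        minus[i] -= h
        grad.append((udf(plus) - udf(minus)) / (2.0 * h))
    norm = math.sqrt(sum(g * g for g in grad))
    return tuple(g / norm for g in grad)


def pull_to_surface(udf, q):
    """Move q against the normalised gradient by its distance value."""
    n = udf_normal(udf, q)
    d = udf(q)
    return tuple(qi - d * ni for qi, ni in zip(q, n))


outside = udf_normal(sphere_udf, (1.1, 0.0, 0.0))         # approx (1, 0, 0)
inside = udf_normal(sphere_udf, (0.0, 0.9, 0.0))          # approx (0, -1, 0)
projected = pull_to_surface(sphere_udf, (1.1, 0.0, 0.0))  # approx (1, 0, 0)
```

The two queries on opposite sides of the surface show the sign ambiguity, and pulling (1.1, 0, 0) lands on (1, 0, 0), where the UDF is approximately zero.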

Reference graph

Works this paper leans on

68 extracted references · 68 canonical work pages

[1] Raphael Bensadoun, Yanir Kleiman, Idan Azuri, Omri Harosh, Andrea Vedaldi, Natalia Neverova, and Oran Gafni. Meta 3D TextureGen: Fast and consistent texture generation for 3D objects. arXiv preprint arXiv:2407.02430, 2024.
[2] Fausto Bernardini, Joshua Mittleman, Holly Rushmeier, Cláudio Silva, and Gabriel Taubin. The Ball-Pivoting Algorithm for Surface Reconstruction. IEEE Transactions on Visualization and Computer Graphics, 5(4):349–359, 1999.
[3] Mikołaj Bińkowski, Danica J Sutherland, Michael Arbel, and Arthur Gretton. Demystifying MMD GANs. In International Conference on Learning Representations (ICLR), 2018.
[4] Tianshi Cao, Karsten Kreis, Sanja Fidler, Nicholas Sharp, and Kangxue Yin. Texfusion: Synthesizing 3D textures with text-guided image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4169–4181, 2023.
[5] Dave Zhenyu Chen, Yawar Siddiqui, Hsin-Ying Lee, Sergey Tulyakov, and Matthias Nießner. Text2Tex: Text-driven Texture Synthesis via Diffusion Models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 18558–18568, 2023.
[6] Wei Cheng, Juncheng Mu, Xianfang Zeng, Xin Chen, Anqi Pang, Chi Zhang, Zhibin Wang, Bin Fu, Gang Yu, Ziwei Liu, et al. MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 585–594,
[7] Yen-Chi Cheng, Hsin-Ying Lee, Sergey Tulyakov, Alexander G Schwing, and Liang-Yan Gui. SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4456–4465, 2023.
[8] Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Objaverse: A Universe of Annotated 3D Objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13142–13153, 2023.
[9] Juntong Fang, Zequn Chen, Weiqi Zhang, Donglin Di, Xuancheng Zhang, Chengmin Yang, and Yu-Shen Liu. MoRe: Motion-aware Feed-forward 4D Reconstruction Transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2026.
[10] Xianglong He, Junyi Chen, Sida Peng, Di Huang, Yangguang Li, Xiaoshui Huang, Chun Yuan, Wanli Ouyang, and Tong He. GVGEN: Text-to-3D Generation with Volumetric Representation. In European Conference on Computer Vision, 2024.
[11] Yuze He, Yushi Bai, Matthieu Lin, Wang Zhao, Yubin Hu, Jenny Sheng, Ran Yi, Juanzi Li, and Yong-Jin Liu. T3bench: Benchmarking current progress in text-to-3d generation, 2023.
[12] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
[13] Fangzhou Hong, Jiaxiang Tang, Ziang Cao, Min Shi, Tong Wu, Zhaoxi Chen, Shuai Yang, Tengfei Wang, Liang Pan, Dahua Lin, et al. 3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors. arXiv preprint arXiv:2403.02234, 2024.
[14] Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. LRM: Large Reconstruction Model for Single Image to 3D. In International Conference on Learning Representations (ICLR), 2024.
[15] Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2D Gaussian Splatting for Geometrically Accurate Radiance Fields. In SIGGRAPH 2024 Conference Papers. Association for Computing Machinery, 2024.
[16] Dong Huo, Zixin Guo, Xinxin Zuo, Zhihao Shi, Juwei Lu, Peng Dai, Songcen Xu, Li Cheng, and Yee-Hong Yang. TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling. In European Conference on Computer Vision, pages 352–368. Springer, 2024.
[17] DaDong Jiang, Xianghui Yang, Zibo Zhao, Sheng Zhang, Jiaao Yu, Zeqiang Lai, Shaoxiong Yang, Chunchao Guo, Xiaobo Zhou, and Zhihui Ke. FlexiTex: Enhancing Texture Generation via Visual Guidance. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 3967–3975, 2025.
[18] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Transactions on Graphics, 42(4):1–14, 2023.
[19] Tuomas Kynkäänniemi, Tero Karras, Miika Aittala, Timo Aila, and Jaakko Lehtinen. The role of imagenet classes in Fréchet inception distance. In International Conference on Learning Representations, 2023.
[20] Yushi Lan, Fangzhou Hong, Shuai Yang, Shangchen Zhou, Xuyi Meng, Bo Dai, Xingang Pan, and Chen Change Loy. LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation. In European Conference on Computer Vision, pages 112–130. Springer, 2024.
[21] Chenguo Lin, Panwang Pan, Bangbang Yang, Zeming Li, and Yadong Mu. DiffSplat: Repurposing Image Diffusion Models for Scalable 3D Gaussian Splat Generation. In International Conference on Learning Representations (ICLR),
[22] Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. Magic3D: High-Resolution Text-to-3D Content Creation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 300–309, 2023.
[23] Jialun Liu, Chenming Wu, Xinqi Liu, Xing Liu, Jinbo Wu, Haotian Peng, Chen Zhao, Haocheng Feng, Jingtuo Liu, and Errui Ding. TexOct: Generating Textures of 3D Models with Octree-based Diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4284–4293, 2024.
[24] Qihao Liu, Yi Zhang, Song Bai, Adam Kortylewski, and Alan Yuille. DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6881–6891, 2024.
[25] Yuxin Liu, Minshan Xie, Hanyuan Liu, and Tien-Tsin Wong. Text-Guided Texturing by Synchronized Multi-View Diffusion. In SIGGRAPH Asia 2024 Conference Papers, pages 1–11, 2024.
[26] Longfei Lu, Huachen Gao, Tao Dai, Yaohua Zha, Zhi Hou, Junta Wu, and Shu-Tao Xia. Large Point-to-Gaussian Model for Image-to-3D Generation. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 10843–10852, 2024.
[27] Baorui Ma, Haoge Deng, Junsheng Zhou, Yu-Shen Liu, Tiejun Huang, and Xinlong Wang. GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation. arXiv preprint arXiv:2311.17971, 2023.
[28] Gal Metzer, Elad Richardson, Or Patashnik, Raja Giryes, and Daniel Cohen-Or. Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12663–12673, 2023.
[29] Norman Müller, Yawar Siddiqui, Lorenzo Porzi, Samuel Rota Bulo, Peter Kontschieder, and Matthias Nießner. DiffRF: Rendering-Guided 3D Radiance Field Diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4328–4338, 2023.
[30] Alexander Quinn Nichol and Prafulla Dhariwal. Improved Denoising Diffusion Probabilistic Models. In International Conference on Machine Learning, pages 8162–8171. PMLR,
[31] Takeshi Noda, Chao Chen, Weiqi Zhang, Xinhai Liu, Yu-Shen Liu, and Zhizhong Han. MultiPull: Detailing Signed Distance Functions by Pulling Multi-Level Queries at Multi-Step. In Advances in Neural Information Processing Systems, pages 13404–13429. Curran Associates, Inc., 2024.
[32] Takeshi Noda, Chao Chen, Junsheng Zhou, Weiqi Zhang, Yu-Shen Liu, and Zhizhong Han. Learning Bijective Surface Parameterization for Inferring Signed Distance Functions from Sparse Point Clouds with Grid Deformation. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 22139–22149, 2025.
[33] Michael Oechsle, Songyou Peng, and Andreas Geiger. UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction. In International Conference on Computer Vision (ICCV), 2021.
[34] Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. DreamFusion: Text-to-3D using 2D Diffusion. In International Conference on Learning Representations, 2023.
[35] Lingteng Qiu, Guanying Chen, Xiaodong Gu, Qi Zuo, Mutian Xu, Yushuang Wu, Weihao Yuan, Zilong Dong, Liefeng Bo, and Xiaoguang Han. Richdreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9914–9925,
[36] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
[37] Amit Raj, Srinivas Kaza, Ben Poole, Michael Niemeyer, Nataniel Ruiz, Ben Mildenhall, Shiran Zada, Kfir Aberman, Michael Rubinstein, Jonathan Barron, et al. DreamBooth3D: Subject-Driven Text-to-3D Generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2349–2359, 2023.
[38] Elad Richardson, Gal Metzer, Yuval Alaluf, Raja Giryes, and Daniel Cohen-Or. TEXTure: Text-Guided Texturing of 3D Shapes. In ACM SIGGRAPH 2023 Conference Proceedings, pages 1–11, 2023.
[39]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022. 5, 6

[40] J Ryan Shue, Eric Ryan Chan, Ryan Po, Zachary Ankner, Jiajun Wu, and Gordon Wetzstein. 3D Neural Field Generation using Triplane Diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20875–20886, 2023. 2

[41] Jingxiang Sun, Bo Zhang, Ruizhi Shao, Lizhen Wang, Wen Liu, Zhenda Xie, and Yebin Liu. DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior. In International Conference on Learning Representations (ICLR),

[42] Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, and Ziwei Liu. LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation. In European Conference on Computer Vision, pages 1–18. Springer, 2024. 7

[43] Jiaxiang Tang, Ruijie Lu, Xiaokang Chen, Xiang Wen, Gang Zeng, and Ziwei Liu. InTeX: Interactive text-to-texture synthesis via unified depth-aware inpainting. arXiv preprint arXiv:2403.11878, 2024. 3

[44] Jiaxiang Tang, Jiawei Ren, Hang Zhou, Ziwei Liu, and Gang Zeng. DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation. In International Conference on Learning Representations, 2024. 8

[45] Zhicong Tang, Shuyang Gu, Chunyu Wang, Ting Zhang, Jianmin Bao, Dong Chen, and Baining Guo. VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder. arXiv preprint arXiv:2312.11459, 2023. 2

[46] Tencent Hunyuan3D Team. Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation, 2025. 3

[47] Tengfei Wang, Bo Zhang, Ting Zhang, Shuyang Gu, Jianmin Bao, Tadas Baltrusaitis, Jingjing Shen, Dong Chen, Fang Wen, Qifeng Chen, et al. Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4563–4573, 2023. 2

[48] Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, and Jun Zhu. ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation. Advances in Neural Information Processing Systems, 36, 2024. 2

[49] Xiaoyu Xiang, Liat Sless Gorelik, Omri Armstrong, Yuchen Fan, Forrest Iandola, Yilei Li, Ita Lifshitz, and Rakesh Ranjan. Make-A-Texture: Fast Shape-Aware Texture Generation in 3 Seconds. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision,

[50] Bojun Xiong, Jialun Liu, Jiakui Hu, Chenming Wu, Jinbo Wu, Xing Liu, Chen Zhao, Errui Ding, and Zhouhui Lian. TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), pages 551–561, 2025. 3

[51] Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation. Advances in Neural Information Processing Systems, 36:15903–15935, 2023. 7

[52] Jiale Xu, Xintao Wang, Weihao Cheng, Yan-Pei Cao, Ying Shan, Xiaohu Qie, and Shenghua Gao. Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20908–20918, 2023. 2

[53] Yinghao Xu, Zifan Shi, Wang Yifan, Hansheng Chen, Ceyuan Yang, Sida Peng, Yujun Shen, and Gordon Wetzstein. GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation. In European Conference on Computer Vision. Springer, 2024. 3, 7

[54] Jonathan Young. xatlas: A Library for Mesh Parameterization. GitHub repository, 2018. 7

[55] Xin Yu, Peng Dai, Wenbo Li, Lan Ma, Zhengzhe Liu, and Xiaojuan Qi. Texture Generation on 3D Meshes with Point-UV Diffusion. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4206–4216,

[56] Xin Yu, Ze Yuan, Yuan-Chen Guo, Ying-Tian Liu, Jianhui Liu, Yangguang Li, Yan-Pei Cao, Ding Liang, and Xiaojuan Qi. TEXGen: a Generative Diffusion Model for Mesh Textures. ACM Transactions on Graphics (TOG), 43(6):1–14,

[57] Xianfang Zeng, Xin Chen, Zhongqi Qi, Wen Liu, Zibo Zhao, Zhibin Wang, Bin Fu, Yong Liu, and Gang Yu. Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4252–4262, 2024. 2, 3, 6

[58] Bowen Zhang, Yiji Cheng, Jiaolong Yang, Chunyu Wang, Feng Zhao, Yansong Tang, Dong Chen, and Baining Guo. GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling. In Advances in Neural Information Processing Systems (NeurIPS), 2024. 1, 3

[59] Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding Conditional Control to Text-to-Image Diffusion Models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3836–3847, 2023. 3, 5, 6

[60] Wenyuan Zhang, Jimin Tang, Weiqi Zhang, Yi Fang, Yu-Shen Liu, and Zhizhong Han. MaterialRefGS: Reflective gaussian splatting with multi-view consistent material inference. In Advances in Neural Information Processing Systems, 2025. 3

[61] Weiqi Zhang, Junsheng Zhou, Haotian Geng, Wenyuan Zhang, and Yu-Shen Liu. GAP: Gaussianize Any Point Clouds with Text Guidance. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025. 6

[62] Junsheng Zhou, Baorui Ma, Yu-Shen Liu, Yi Fang, and Zhizhong Han. Learning Consistency-Aware Unsigned Distance Functions Progressively from Raw Point Clouds. In Advances in Neural Information Processing Systems, pages 16481–16494. Curran Associates, Inc., 2022. 3, 7

[63] Junsheng Zhou, Jinsheng Wang, Baorui Ma, Yu-Shen Liu, Tiejun Huang, and Xinlong Wang. Uni3D: Exploring Unified 3D Representation at Scale. In International Conference on Learning Representations, pages 46766–46782, 2024. 2, 7

[64] Junsheng Zhou, Weiqi Zhang, and Yu-Shen Liu. DiffGS: Functional Gaussian Splatting Diffusion. In Advances in Neural Information Processing Systems (NeurIPS), 2024. 1, 3

[65] Junsheng Zhou, Weiqi Zhang, Baorui Ma, Kanle Shi, Yu-Shen Liu, and Zhizhong Han. UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21496–21506, 2024. 2

[66] Jingqiu Zhou, Lue Fan, Xuesong Chen, Linjiang Huang, Si Liu, and Hongsheng Li. GaussianPainter: Painting Point Cloud into 3D Gaussians with Normal Guidance. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 10788–10796, 2025. 3

[67] Junsheng Zhou, Weiqi Zhang, Baorui Ma, Kanle Shi, Yu-Shen Liu, and Zhizhong Han. UDFStudio: A Unified Framework of Datasets, Benchmarks and Generative Models for Unsigned Distance Functions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2026. 3

[68] Zi-Xin Zou, Zhipeng Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Yan-Pei Cao, and Song-Hai Zhang. Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10324–10335, 2024. 2, 3, 8

work page 2024

[1] [1]

Meta 3D TextureGen: Fast and consistent texture generation for 3D objects,

Raphael Bensadoun, Yanir Kleiman, Idan Azuri, Omri Harosh, Andrea Vedaldi, Natalia Neverova, and Oran Gafni. Meta 3D TextureGen: Fast and consistent texture generation for 3d objects.arXiv preprint arXiv:2407.02430, 2024. 3

work page arXiv 2024

[2] [2]

The Ball-Pivoting Al- gorithm for Surface Reconstruction.IEEE Transactions on Visualization and Computer Graphics, 5(4):349–359, 1999

Fausto Bernardini, Joshua Mittleman, Holly Rushmeier, Cl´audio Silva, and Gabriel Taubin. The Ball-Pivoting Al- gorithm for Surface Reconstruction.IEEE Transactions on Visualization and Computer Graphics, 5(4):349–359, 1999. 7

work page 1999

[3] [3]

Demystifying mmd gans

Mikołaj Bi´nkowski, Danica J Sutherland, Michael Arbel, and Arthur Gretton. Demystifying mmd gans. InInternational Conference on Learning Representations (ICLR), 2018. 6

work page 2018

[4] [4]

Texfusion: Synthesizing 3D textures with text-guided image diffusion models

Tianshi Cao, Karsten Kreis, Sanja Fidler, Nicholas Sharp, and Kangxue Yin. Texfusion: Synthesizing 3D textures with text-guided image diffusion models. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 4169–4181, 2023. 3

work page 2023

[5] [5]

Text2Tex: Text-driven Tex- ture Synthesis via Diffusion Models

Dave Zhenyu Chen, Yawar Siddiqui, Hsin-Ying Lee, Sergey Tulyakov, and Matthias Nießner. Text2Tex: Text-driven Tex- ture Synthesis via Diffusion Models. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 18558–18568, 2023. 2, 3, 6

work page 2023

[6] [6]

MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D

Wei Cheng, Juncheng Mu, Xianfang Zeng, Xin Chen, Anqi Pang, Chi Zhang, Zhibin Wang, Bin Fu, Gang Yu, Ziwei Liu, et al. MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 585–594,

work page

[7] [7]

SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation

Yen-Chi Cheng, Hsin-Ying Lee, Sergey Tulyakov, Alexan- der G Schwing, and Liang-Yan Gui. SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 4456–4465, 2023. 2

work page 2023

[8] [8]

Objaverse: A Universe of Annotated 3D Objects

Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Objaverse: A Universe of Annotated 3D Objects. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 13142–13153, 2023. 6, 7

work page 2023

[9] [9]

MoRe: Motion-aware Feed-forward 4D Reconstruction Transformer

Juntong Fang, Zequn Chen, Weiqi Zhang, Donglin Di, Xuancheng Zhang, Chengmin Yang, and Yu-Shen Liu. MoRe: Motion-aware Feed-forward 4D Reconstruction Transformer. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2026. 3

work page 2026

[10] [10]

GVGEN: Text-to-3D Generation with V olumet- ric Representation

Xianglong He, Junyi Chen, Sida Peng, Di Huang, Yang- guang Li, Xiaoshui Huang, Chun Yuan, Wanli Ouyang, and Tong He. GVGEN: Text-to-3D Generation with V olumet- ric Representation. InEuropean Conference on Computer Vision, 2024. 1, 3, 7

work page 2024

[11] [11]

T3bench: Benchmarking current progress in text-to-3d gen- eration, 2023

Yuze He, Yushi Bai, Matthieu Lin, Wang Zhao, Yubin Hu, Jenny Sheng, Ran Yi, Juanzi Li, and Yong-Jin Liu. T3bench: Benchmarking current progress in text-to-3d gen- eration, 2023. 7

work page 2023

[12] [12]

Denoising diffu- sion probabilistic models.Advances in Neural Information Processing Systems, 33:6840–6851, 2020

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffu- sion probabilistic models.Advances in Neural Information Processing Systems, 33:6840–6851, 2020. 2, 3

work page 2020

[13] [13]

3dtopia: Large text-to-3d generation model with hybrid diffusion priors

Fangzhou Hong, Jiaxiang Tang, Ziang Cao, Min Shi, Tong Wu, Zhaoxi Chen, Shuai Yang, Tengfei Wang, Liang Pan, Dahua Lin, et al. 3DTopia: Large Text-to-3D Genera- tion Model with Hybrid Diffusion Priors.arXiv preprint arXiv:2403.02234, 2024. 7

work page arXiv 2024

[14] [14]

LRM: Large Reconstruction Model for Single Image to 3D

Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. LRM: Large Reconstruction Model for Single Image to 3D. InInternational Conference on Learning Representa- tions (ICLR), 2024. 3

work page 2024

[15] [15]

2D Gaussian Splatting for Geometrically Ac- curate Radiance Fields

Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2D Gaussian Splatting for Geometrically Ac- curate Radiance Fields. InSIGGRAPH 2024 Conference Pa- pers. Association for Computing Machinery, 2024. 3

work page 2024

[16] [16]

TexGen: Text-Guided 3D Texture Generation with Multi- view Sampling and Resampling

Dong Huo, Zixin Guo, Xinxin Zuo, Zhihao Shi, Juwei Lu, Peng Dai, Songcen Xu, Li Cheng, and Yee-Hong Yang. TexGen: Text-Guided 3D Texture Generation with Multi- view Sampling and Resampling. InEuropean Conference on Computer Vision, pages 352–368. Springer, 2024. 3

work page 2024

[17] [17]

FlexiTex: Enhancing Tex- ture Generation via Visual Guidance

DaDong Jiang, Xianghui Yang, Zibo Zhao, Sheng Zhang, Jiaao Yu, Zeqiang Lai, Shaoxiong Yang, Chunchao Guo, Xiaobo Zhou, and Zhihui Ke. FlexiTex: Enhancing Tex- ture Generation via Visual Guidance. InProceedings of the AAAI Conference on Artificial Intelligence, pages 3967– 3975, 2025. 3

work page 2025

[18] [18]

3D Gaussian Splatting for Real-Time Radiance Field Rendering .ACM Transactions on Graphics, 42(4):1–14, 2023

Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3D Gaussian Splatting for Real-Time Radiance Field Rendering .ACM Transactions on Graphics, 42(4):1–14, 2023. 1, 2, 3

work page 2023

[19] [19]

The role of imagenet classes in fr´echet inception distance

Tuomas Kynk ¨a¨anniemi, Tero Karras, Miika Aittala, Timo Aila, and Jaakko Lehtinen. The role of imagenet classes in fr´echet inception distance. InInternational Conference on Learning Representations, 2023. 6

work page 2023

[20] [20]

LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation

Yushi Lan, Fangzhou Hong, Shuai Yang, Shangchen Zhou, Xuyi Meng, Bo Dai, Xingang Pan, and Chen Change Loy. LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation. InEuropean Conference on Com- puter Vision, pages 112–130. Springer, 2024. 7

work page 2024

[21] [21]

DiffSplat: Repurposing Image Diffusion Models for Scalable 3D Gaussian Splat Generation

Chenguo Lin, Panwang Pan, Bangbang Yang, Zeming Li, and Yadong Mu. DiffSplat: Repurposing Image Diffusion Models for Scalable 3D Gaussian Splat Generation. InIn- ternational Conference on Learning Representations (ICLR),

work page

[22] [22]

Magic3D: High- Resolution Text-to-3D Content Creation

Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fi- dler, Ming-Yu Liu, and Tsung-Yi Lin. Magic3D: High- Resolution Text-to-3D Content Creation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 300–309, 2023. 2

work page 2023

[23] [23]

TexOct: Generating Textures of 3D Models with Octree-based Diffusion

Jialun Liu, Chenming Wu, Xinqi Liu, Xing Liu, Jinbo Wu, Haotian Peng, Chen Zhao, Haocheng Feng, Jingtuo Liu, and Errui Ding. TexOct: Generating Textures of 3D Models with Octree-based Diffusion. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4284–4293, 2024. 3

work page 2024

[24] [24]

DIRECT-3D: Learning Direct Text-to-3D Gen- eration on Massive Noisy 3D Data

Qihao Liu, Yi Zhang, Song Bai, Adam Kortylewski, and Alan Yuille. DIRECT-3D: Learning Direct Text-to-3D Gen- eration on Massive Noisy 3D Data. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6881–6891, 2024. 7

work page 2024

[25] [25]

Text-Guided Texturing by Synchronized Multi-View Diffu- sion

Yuxin Liu, Minshan Xie, Hanyuan Liu, and Tien-Tsin Wong. Text-Guided Texturing by Synchronized Multi-View Diffu- sion. InSIGGRAPH Asia 2024 Conference Papers, pages 1–11, 2024. 3, 6

work page 2024

[26] [26]

Large Point-to-Gaussian Model for Image-to-3D Generation

Longfei Lu, Huachen Gao, Tao Dai, Yaohua Zha, Zhi Hou, Junta Wu, and Shu-Tao Xia. Large Point-to-Gaussian Model for Image-to-3D Generation. InProceedings of the 32nd ACM International Conference on Multimedia, pages 10843–10852, 2024. 2, 3

work page 2024

[27] [27]

GeoDream: Disentan- gling 2D and Geometric Priors for High-Fidelity and Consis- tent 3D Generation.arXiv preprint arXiv:2311.17971, 2023

Baorui Ma, Haoge Deng, Junsheng Zhou, Yu-Shen Liu, Tiejun Huang, and Xinlong Wang. GeoDream: Disentan- gling 2D and Geometric Priors for High-Fidelity and Consis- tent 3D Generation.arXiv preprint arXiv:2311.17971, 2023. 2

work page arXiv 2023

[28] [28]

Latent-NeRF for Shape-Guided Gen- eration of 3D Shapes and Textures

Gal Metzer, Elad Richardson, Or Patashnik, Raja Giryes, and Daniel Cohen-Or. Latent-NeRF for Shape-Guided Gen- eration of 3D Shapes and Textures. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12663–12673, 2023. 2

work page 2023

[29] [29]

DiffRF: Rendering-Guided 3D Radiance Field Diffusion

Norman M ¨uller, Yawar Siddiqui, Lorenzo Porzi, Samuel Rota Bulo, Peter Kontschieder, and Matthias Nießner. DiffRF: Rendering-Guided 3D Radiance Field Diffusion. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4328–4338, 2023. 2

work page 2023

[30] [30]

Improved Denoising Diffusion Probabilistic Models

Alexander Quinn Nichol and Prafulla Dhariwal. Improved Denoising Diffusion Probabilistic Models. InInternational Conference on Machine Learning, pages 8162–8171. PMLR,

work page

[31] [31]

MultiPull: Detailing Signed Distance Functions by Pulling Multi-Level Queries at Multi- Step

Takeshi Noda, Chao Chen, Weiqi Zhang, Xinhai Liu, Yu- Shen Liu, and Zhizhong Han. MultiPull: Detailing Signed Distance Functions by Pulling Multi-Level Queries at Multi- Step. InAdvances in Neural Information Processing Sys- tems, pages 13404–13429. Curran Associates, Inc., 2024. 3

work page 2024

[32] [32]

Learning Bijective Sur- face Parameterization for Inferring Signed Distance Func- tions from Sparse Point Clouds with Grid Deformation

Takeshi Noda, Chao Chen, Junsheng Zhou, Weiqi Zhang, Yu-Shen Liu, and Zhizhong Han. Learning Bijective Sur- face Parameterization for Inferring Signed Distance Func- tions from Sparse Point Clouds with Grid Deformation. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 22139–22149, 2025. 3

work page 2025

[33] [33]

UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction

Michael Oechsle, Songyou Peng, and Andreas Geiger. UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction. InInternational Con- ference on Computer Vision (ICCV), 2021. 3

work page 2021

[34] [34]

DreamFusion: Text-to-3D using 2D Diffusion

Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Milden- hall. DreamFusion: Text-to-3D using 2D Diffusion. InIn- ternational Conference on Learning Representations, 2023. 2, 8

work page 2023

[35] [35]

Richdreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text- to-3D

Lingteng Qiu, Guanying Chen, Xiaodong Gu, Qi Zuo, Mu- tian Xu, Yushuang Wu, Weihao Yuan, Zilong Dong, Liefeng Bo, and Xiaoguang Han. Richdreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text- to-3D. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9914–9925,

work page

[36] [36]

Learn- ing transferable visual models from natural language super- vision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learn- ing transferable visual models from natural language super- vision. InInternational Conference on Machine Learning, pages 8748–8763. PMLR, 2021. 6, 7

work page 2021

[37] [37]

DreamBooth3D: Subject-Driven Text-to-3D Generation

Amit Raj, Srinivas Kaza, Ben Poole, Michael Niemeyer, Nataniel Ruiz, Ben Mildenhall, Shiran Zada, Kfir Aberman, Michael Rubinstein, Jonathan Barron, et al. DreamBooth3D: Subject-Driven Text-to-3D Generation. InProceedings of the IEEE/CVF International Conference on Computer Vi- sion, pages 2349–2359, 2023. 2

work page 2023

[38] [38]

TEXTure: Text-Guided Texturing of 3D Shapes

Elad Richardson, Gal Metzer, Yuval Alaluf, Raja Giryes, and Daniel Cohen-Or. TEXTure: Text-Guided Texturing of 3D Shapes. InACM SIGGRAPH 2023 Conference Proceedings, pages 1–11, 2023. 3, 6

work page 2023

[39] [39]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022. 5, 6

work page 2022

[40] [40]

3D Neural Field Generation using Triplane Diffusion

J Ryan Shue, Eric Ryan Chan, Ryan Po, Zachary Ankner, Ji- ajun Wu, and Gordon Wetzstein. 3D Neural Field Generation using Triplane Diffusion. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20875–20886, 2023. 2

work page 2023

[41] [41]

DreamCraft3D: Hierarchi- cal 3D Generation with Bootstrapped Diffusion Prior

Jingxiang Sun, Bo Zhang, Ruizhi Shao, Lizhen Wang, Wen Liu, Zhenda Xie, and Yebin Liu. DreamCraft3D: Hierarchi- cal 3D Generation with Bootstrapped Diffusion Prior . InIn- ternational Conference on Learning Representations (ICLR),

work page

[42] [42]

Lgm: Large Multi-View Gaus- sian Model for High-Resolution 3D Content Creation

Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, and Ziwei Liu. Lgm: Large Multi-View Gaus- sian Model for High-Resolution 3D Content Creation. In European Conference on Computer Vision, pages 1–18. Springer, 2024. 7

work page 2024

[43] [43]

InTeX: Interactive text-to-texture synthesis via unified depth-aware inpainting,

Jiaxiang Tang, Ruijie Lu, Xiaokang Chen, Xiang Wen, Gang Zeng, and Ziwei Liu. Intex: Interactive text-to-texture syn- thesis via unified depth-aware inpainting.arXiv preprint arXiv:2403.11878, 2024. 3

work page arXiv 2024

[44] [44]

DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation

Jiaxiang Tang, Jiawei Ren, Hang Zhou, Ziwei Liu, and Gang Zeng. DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation. InInternational Conference on Learning Representations, 2024. 8

work page 2024

[45] [45]

V olumediffu- sion: Flexible text-to-3d generation with efficient volumetric encoder.arXiv preprint arXiv:2312.11459, 2023

Zhicong Tang, Shuyang Gu, Chunyu Wang, Ting Zhang, Jianmin Bao, Dong Chen, and Baining Guo. V olumeDif- fusion: Flexible Text-to-3D Generation with Efficient V olu- metric Encoder.arXiv preprint arXiv:2312.11459, 2023. 2

work page arXiv 2023

[46] [46]

Hunyuan3D 2.0: Scaling Diffu- sion Models for High Resolution Textured 3D Assets Gener- ation, 2025

Tencent Hunyuan3D Team. Hunyuan3D 2.0: Scaling Diffu- sion Models for High Resolution Textured 3D Assets Gener- ation, 2025. 3

work page 2025

[47] [47]

Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion

Tengfei Wang, Bo Zhang, Ting Zhang, Shuyang Gu, Jianmin Bao, Tadas Baltrusaitis, Jingjing Shen, Dong Chen, Fang Wen, Qifeng Chen, et al. Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4563–4573, 2023. 2

work page 2023

[48] [48]

ProlificDreamer: High- Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation.Advances in Neural Information Process- ing Systems, 36, 2024

Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongx- uan Li, Hang Su, and Jun Zhu. ProlificDreamer: High- Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation.Advances in Neural Information Process- ing Systems, 36, 2024. 2

work page 2024

[49] [49]

Make-A-Texture: Fast Shape-Aware Texture Generation in 3 Seconds

Xiaoyu Xiang, Liat Sless Gorelik, Omri Armstrong Yuchen Fan, Forrest Iandola, Yilei Li, Ita Lifshitz, and Rakesh Ranjan. Make-A-Texture: Fast Shape-Aware Texture Generation in 3 Seconds. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision,

work page

[50] [50]

TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting

Bojun Xiong, Jialun Liu, Jiakui Hu, Chenming Wu, Jinbo Wu, Xing Liu, Chen Zhao, Errui Ding, and Zhouhui Lian. TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting. InProceedings of the Computer Vision and Pattern Recognition Conference (CVPR), pages 551–561, 2025. 3

work page 2025

[51] [51]

ImageReward: Learning and Evaluating Human Preferences for Text-to- Image Generation.Advances in Neural Information Process- ing Systems, 36:15903–15935, 2023

Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. ImageReward: Learning and Evaluating Human Preferences for Text-to- Image Generation.Advances in Neural Information Process- ing Systems, 36:15903–15935, 2023. 7

work page 2023

[52] [52]

Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to- Image Diffusion Models

Jiale Xu, Xintao Wang, Weihao Cheng, Yan-Pei Cao, Ying Shan, Xiaohu Qie, and Shenghua Gao. Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to- Image Diffusion Models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20908–20918, 2023. 2

work page 2023

[53] [53]

GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation

Yinghao Xu, Zifan Shi, Wang Yifan, Hansheng Chen, Ceyuan Yang, Sida Peng, Yujun Shen, and Gordon Wet- zstein. GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation . InEuropean Conference on Computer Vision. Springer, 2024. 3, 7

work page 2024

[54] [54]

xatlas: A Library for Mesh Parameteriza- tion

Jonathan Young. xatlas: A Library for Mesh Parameteriza- tion. GitHub repository, 2018. 7

work page 2018

[55] [55]

Texture Generation on 3D Meshes with Point- UV Diffusion

Xin Yu, Peng Dai, Wenbo Li, Lan Ma, Zhengzhe Liu, and Xiaojuan Qi. Texture Generation on 3D Meshes with Point- UV Diffusion. InProceedings of the IEEE/CVF Interna- tional Conference on Computer Vision, pages 4206–4216,

work page

[56] [56]

TEXGen: a Generative Diffusion Model for Mesh Tex- tures.ACM Transactions on Graphics (TOG), 43(6):1–14,

Xin Yu, Ze Yuan, Yuan-Chen Guo, Ying-Tian Liu, Jianhui Liu, Yangguang Li, Yan-Pei Cao, Ding Liang, and Xiaojuan Qi. TEXGen: a Generative Diffusion Model for Mesh Tex- tures.ACM Transactions on Graphics (TOG), 43(6):1–14,

work page

[57] [57]

Paint3D: Paint Anything 3D with Lighting-Less Texture Dif- fusion Models

Xianfang Zeng, Xin Chen, Zhongqi Qi, Wen Liu, Zibo Zhao, Zhibin Wang, Bin Fu, Yong Liu, and Gang Yu. Paint3D: Paint Anything 3D with Lighting-Less Texture Dif- fusion Models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4252– 4262, 2024. 2, 3, 6

work page 2024

[58] [58]

GaussianCube: Structuring Gaussian Splatting using Opti- mal Transport for 3D Generative Modeling

Bowen Zhang, Yiji Cheng, Jiaolong Yang, Chunyu Wang, Feng Zhao, Yansong Tang, Dong Chen, and Baining Guo. GaussianCube: Structuring Gaussian Splatting using Opti- mal Transport for 3D Generative Modeling. InAdvances in Neural Information Processing Systems (NeurIPS), 2024. 1, 3

work page 2024

[59] [59]

Adding Conditional Control to Text-to-Image Diffusion Models

Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding Conditional Control to Text-to-Image Diffusion Models . In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3836–3847, 2023. 3, 5, 6

work page 2023

[60] [60]

MaterialRefGS: Reflective gaussian splatting with multi-view consistent material infer- ence

Wenyuan Zhang, Jimin Tang, Weiqi Zhang, Yi Fang, Yu- Shen Liu, and Zhizhong Han. MaterialRefGS: Reflective gaussian splatting with multi-view consistent material infer- ence. InAdvances in Neural Information Processing Sys- tems, 2025. 3

work page 2025

[61] [61]

GAP: Gaussianize Any Point Clouds with Text Guidance

Weiqi Zhang, Junsheng Zhou, Haotian Geng, Wenyuan Zhang, and Yu-Shen Liu. GAP: Gaussianize Any Point Clouds with Text Guidance. InProceedings of the IEEE/CVF International Conference on Computer Vision, 2025. 6

work page 2025

[62] [62]

Learning Consistency-Aware Unsigned Dis- tance Functions Progressively from Raw Point Clouds

Junsheng Zhou, Baorui Ma, Yu-Shen Liu, Yi Fang, and Zhizhong Han. Learning Consistency-Aware Unsigned Dis- tance Functions Progressively from Raw Point Clouds. In Advances in Neural Information Processing Systems, pages 16481–16494. Curran Associates, Inc., 2022. 3, 7

[63] Junsheng Zhou, Jinsheng Wang, Baorui Ma, Yu-Shen Liu, Tiejun Huang, and Xinlong Wang. Uni3D: Exploring Unified 3D Representation at Scale. In International Conference on Learning Representations, pages 46766–46782, 2024. 2, 7

[64] Junsheng Zhou, Weiqi Zhang, and Yu-Shen Liu. DiffGS: Functional Gaussian Splatting Diffusion. In Advances in Neural Information Processing Systems (NeurIPS), 2024. 1, 3

[65] Junsheng Zhou, Weiqi Zhang, Baorui Ma, Kanle Shi, Yu-Shen Liu, and Zhizhong Han. UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21496–21506, 2024. 2

[66] Jingqiu Zhou, Lue Fan, Xuesong Chen, Linjiang Huang, Si Liu, and Hongsheng Li. GaussianPainter: Painting Point Cloud into 3D Gaussians with Normal Guidance. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 10788–10796, 2025. 3

[67] Junsheng Zhou, Weiqi Zhang, Baorui Ma, Kanle Shi, Yu-Shen Liu, and Zhizhong Han. UDFStudio: A Unified Framework of Datasets, Benchmarks and Generative Models for Unsigned Distance Functions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2026. 3

[68] Zi-Xin Zou, Zhipeng Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Yan-Pei Cao, and Song-Hai Zhang. Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10324–10335, 2024. 2, 3, 8