pith.

arxiv: 2604.05721 · v1 · submitted 2026-04-07 · 💻 cs.CV

GaussianGrow: Geometry-aware Gaussian Growing from 3D Point Clouds with Text Guidance

Pith reviewed 2026-05-10 19:12 UTC · model grok-4.3

classification 💻 cs.CV
keywords: 3D Gaussian Splatting, point cloud to Gaussian, text-guided generation, multi-view diffusion, geometry-aware, inpainting, novel view synthesis, 3D reconstruction

The pith

GaussianGrow generates 3D Gaussians by growing them from point clouds under text guidance to enforce geometric accuracy from the start.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to build 3D Gaussian models for fast, high-quality rendering by starting with ordinary point clouds and expanding the primitives outward instead of guessing all geometry at once. A multi-view diffusion model supplies consistent appearance signals from the points, while an iterative loop finds the biggest missing areas, renders them, and uses 2D diffusion inpainting to fill them until the model is complete. A sympathetic reader would care because existing generators often fail when their predicted geometries are off, producing blurry or distorted results; tying growth directly to real point data aims to avoid that failure mode. The text prompt steers overall appearance while the input points supply the spatial structure.

Core claim

We introduce GaussianGrow, a novel approach that generates 3D Gaussians by learning to grow them from easily accessible 3D point clouds, naturally enforcing geometric accuracy in Gaussian generation. It uses a text-guided scheme that draws on a multi-view diffusion model for consistent appearance supervision and iteratively detects large un-grown regions to inpaint them with a pretrained 2D diffusion model until the Gaussians are complete.

What carries the argument

Text-guided Gaussian growing scheme that expands primitives from point clouds, supervises them with multi-view diffusion renders, and completes unobserved areas via iterative pose detection plus 2D inpainting.
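The completion loop in this scheme can be read as a greedy next-best-view search: pick the camera that sees the most un-grown points, complete that view, and repeat. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. Candidate cameras sit on a sphere around an object centered at the origin, visibility is approximated with a coarse point-level z-buffer, and all helper names (`fibonacci_sphere`, `look_at`, `visible_mask`, `grow_until_complete`) are hypothetical. The rendering, 2D diffusion inpainting, and Gaussian optimization that the paper performs for each selected view are reduced to a comment; here the loop only marks newly observed points as grown.

```python
import numpy as np


def fibonacci_sphere(n, radius=1.0):
    """n roughly uniform points on a sphere (toy object and candidate cameras)."""
    i = np.arange(n) + 0.5
    phi = np.arccos(1.0 - 2.0 * i / n)            # polar angle
    theta = np.pi * (1.0 + 5.0 ** 0.5) * i        # golden-angle azimuth
    return radius * np.stack(
        [np.cos(theta) * np.sin(phi), np.sin(theta) * np.sin(phi), np.cos(phi)], axis=1
    )


def look_at(cam_pos, target=np.zeros(3)):
    """World-to-camera rotation R (rows: right, down, forward) and translation t."""
    forward = (target - cam_pos) / np.linalg.norm(target - cam_pos)
    up = np.array([0.0, 0.0, 1.0])
    if abs(forward @ up) > 0.99:                  # avoid a degenerate cross product near the poles
        up = np.array([0.0, 1.0, 0.0])
    right = np.cross(forward, up)
    right /= np.linalg.norm(right)
    down = np.cross(forward, right)
    R = np.stack([right, down, forward])
    return R, -R @ cam_pos


def visible_mask(points, cam_pos, res=64, fov_deg=60.0, tol=0.05):
    """Points that survive a coarse z-buffer test from cam_pos (assumed pinhole camera)."""
    R, t = look_at(cam_pos)
    pc = points @ R.T + t
    z = pc[:, 2]
    f = 0.5 * res / np.tan(np.radians(fov_deg) / 2.0)
    idx = np.flatnonzero(z > 1e-6)                # keep points in front of the camera
    u = np.floor(f * pc[idx, 0] / z[idx] + res / 2).astype(int)
    v = np.floor(f * pc[idx, 1] / z[idx] + res / 2).astype(int)
    inside = (u >= 0) & (u < res) & (v >= 0) & (v < res)
    idx, pix = idx[inside], v[inside] * res + u[inside]
    depth = np.full(res * res, np.inf)
    np.minimum.at(depth, pix, z[idx])             # nearest depth per pixel
    vis = np.zeros(len(points), dtype=bool)
    vis[idx] = z[idx] <= depth[pix] + tol         # within tol of the front surface
    return vis


def grow_until_complete(points, grown, candidates, min_new=1, max_iters=20):
    """Greedily pick the camera that sees the most un-grown points, then 'grow' them."""
    grown = grown.copy()
    order = []
    for _ in range(max_iters):
        new_counts = [np.count_nonzero(visible_mask(points, c) & ~grown) for c in candidates]
        best = int(np.argmax(new_counts))
        if new_counts[best] < min_new:
            break                                 # no un-grown region left worth a view
        # Per the abstract, the paper renders this view, inpaints it with a pretrained
        # 2D diffusion model, and optimizes new Gaussians. Here we only mark points grown.
        grown |= visible_mask(points, candidates[best])
        order.append(best)
    return order, grown


# Toy usage: a unit sphere whose +x half is already grown by earlier views.
obj = fibonacci_sphere(2000)
cams = fibonacci_sphere(64, radius=3.0)
order, grown = grow_until_complete(obj, obj[:, 0] > 0, cams)
```

Because the +x half starts out grown, the first selected camera should lie on the -x side. A real system would replace the point-level z-buffer with the rendered Gaussian alpha mask, which is where "un-grown" holes actually show up.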

If this is right

  • The approach produces complete Gaussian models from both synthetic and real-scanned point clouds.
  • It mitigates artifacts from fusing neighboring views by constraining novel views generated at extra camera poses placed over overlap regions.
  • Text guidance controls appearance while point-cloud geometry remains the anchor.
  • Iterative inpainting handles hard-to-observe regions without breaking overall consistency.
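The overlap handling in the second point can be illustrated by finding surface points seen by two neighbouring cardinal views and aiming an extra camera straight at them. This is a simplified sketch, not the paper's procedure: Figure 2 says the extra poses are found by optimization, whereas here each one is placed in closed form along the mean surface normal of an overlap region. It assumes per-point normals are available, uses back-face culling without any occlusion test, and all names (`facing_mask`, `overlap_camera_poses`) are hypothetical.

```python
import numpy as np


def facing_mask(points, normals, cam_pos, min_cos=0.0):
    """Back-face visibility only (no occlusion test): the surface faces the camera."""
    to_cam = cam_pos - points
    to_cam = to_cam / np.linalg.norm(to_cam, axis=1, keepdims=True)
    return np.einsum("ij,ij->i", normals, to_cam) > min_cos


def overlap_camera_poses(points, normals, cardinal, distance=3.0):
    """One extra camera per adjacent pair of cardinal views, looking straight at their overlap."""
    seen = [facing_mask(points, normals, c) for c in cardinal]
    extra = []
    for i in range(len(cardinal)):
        both = seen[i] & seen[(i + 1) % len(cardinal)]   # seen by two neighbouring views
        if not both.any():
            continue
        center = points[both].mean(axis=0)
        n = normals[both].mean(axis=0)
        extra.append(center + distance * n / np.linalg.norm(n))
    return np.array(extra)


# Toy usage: unit sphere (normals = positions) and four cardinal views on the equator.
rng = np.random.default_rng(0)
pts = rng.normal(size=(4000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
cardinal = 3.0 * np.array([[1, 0, 0], [0, 1, 0], [-1, 0, 0], [0, -1, 0]], dtype=float)
extra = overlap_camera_poses(pts, pts, cardinal)
azimuths = np.degrees(np.arctan2(extra[:, 1], extra[:, 0]))
```

For a sphere, the four extra cameras land near azimuths of 45°, 135°, -135°, and -45°, halfway between each pair of cardinal views, which is where seams from fusing neighbouring views would otherwise appear.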

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same growing-plus-inpainting loop might transfer to other 3D primitives such as surfels or meshes.
  • Direct conversion of LiDAR or photogrammetry scans into splattable scenes could become simpler.
  • Robustness checks on noisy or very sparse real-world point clouds would test practical limits.
  • Adding surface normals or edge constraints from the input points could further tighten accuracy.
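A robustness check of the kind suggested in the third point could start from a grid of degraded inputs. The sketch below only generates those inputs; running GaussianGrow on each one is not shown. The noise levels and keep ratios are arbitrary illustrative choices, noise is expressed as a fraction of the bounding-box diagonal (an assumption made so the settings transfer across object scales), and the function names are hypothetical.

```python
import numpy as np


def perturb_point_cloud(points, noise_frac=0.0, keep_ratio=1.0, seed=0):
    """Subsample, then jitter a point cloud; noise std is a fraction of the bbox diagonal."""
    rng = np.random.default_rng(seed)
    diag = np.linalg.norm(points.max(axis=0) - points.min(axis=0))
    n_keep = max(1, int(round(keep_ratio * len(points))))
    idx = rng.choice(len(points), size=n_keep, replace=False)
    return points[idx] + rng.normal(scale=noise_frac * diag, size=(n_keep, points.shape[1]))


def robustness_grid(points, noise_fracs=(0.0, 0.005, 0.01, 0.02), keep_ratios=(1.0, 0.5, 0.1)):
    """Yield (noise_frac, keep_ratio, perturbed_points) for every combination."""
    for s in noise_fracs:
        for r in keep_ratios:
            yield s, r, perturb_point_cloud(points, s, r)


cloud = np.random.default_rng(1).uniform(-1, 1, size=(1000, 3))
settings = [(s, r, p.shape) for s, r, p in robustness_grid(cloud)]
```

Fixing the seed per setting keeps the degraded inputs reproducible, so differences in the grown Gaussians can be attributed to the noise level and density rather than to resampling.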

Load-bearing premise

The multi-view diffusion model must create appearance supervision that stays geometrically consistent with the input point clouds, and the 2D inpainting step must fill gaps without adding new geometric or visual errors.

What would settle it

Apply the method to a real-scanned point cloud with known ground-truth geometry, then compare rendered novel views against the ground truth to check for visible distortions, floaters, or inconsistencies in the grown Gaussians.
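For the image side of that comparison, a common starting point is PSNR between each rendered novel view and the ground-truth image at the same pose. The sketch below assumes images stored as float arrays in [0, 1] with matched poses; the function names are hypothetical. PSNR alone will not isolate floaters or geometric distortion, so it would sit alongside a geometric metric and perceptual metrics such as SSIM or LPIPS, which need additional libraries not shown here.

```python
import numpy as np


def psnr(rendered, reference, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    rendered = np.asarray(rendered, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    mse = np.mean((rendered - reference) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))


def mean_view_psnr(rendered_views, reference_views, max_val=1.0):
    """Average PSNR over matched novel views (same camera poses in both lists)."""
    if len(rendered_views) != len(reference_views):
        raise ValueError("need one reference image per rendered view")
    return float(np.mean([psnr(r, g, max_val) for r, g in zip(rendered_views, reference_views)]))


gt = np.zeros((4, 4, 3))
score = psnr(np.full((4, 4, 3), 0.1), gt)   # MSE = 0.01, so PSNR is about 20 dB
```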

Figures

Figures reproduced from arXiv: 2604.05721 by Haotian Geng, Junsheng Zhou, Kanle Shi, Shenkun Xu, Weiqi Zhang, Yi Fang, Yu-Shen Liu.

Figure 1: Left: Diverse shapes generated by GaussianGrow. Right: The Gaussian generation pipeline of GaussianGrow. Reference point clouds can be obtained through large-scale retrieval or sensor scanning, from which Gaussians are grown under text guidance.
Figure 2: Overview of GaussianGrow. Stage 1. We leverage depth-aware ControlNet for primary view generation, with a geometry-aware diffusion model for multi-view synthesis. Additional views are generated for improving appearances in overlap regions by optimizing camera poses to observe overlap regions. Gaussians are optimized to grow with supervision from both cardinal and additional views. Stage 2. We iteratively i…
Figure 3: We obtain the additional camera poses by optimizing …
Figure 4: The effect of Gaussian inpainting.
Figure 5: The effectiveness of processing overlap regions (panels: Before Overlap Processing, After Overlap Processing).
Figure 6: Visual comparison on the Objaverse dataset. Note that GaussianGrow takes point clouds, rather than meshes, as input.
Figure 7: Text-to-3D comparisons on T3Bench.
…the retrieve-based setting and the generative-based setting. For the retrieve-based setting, we employ Uni3D [63] to retrieve reference point clouds from the G-Objaverse dataset [35], a carefully curated subset of Objaverse [8], based on the input text prompt. The retrieved point clouds serve as geometric priors that guide the generation process. For a generative-based s…
Figure 8: Visual comparison with DreamGaussian and TriplaneGaussian on the task of Point-to-Gaussian.
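The retrieve-based setting described with Figure 7 matches a text prompt against a bank of point clouds in a shared embedding space (Uni3D in the paper). Assuming the retrieval ranks shapes by cosine similarity, which is the usual choice for CLIP-style aligned embeddings but is not spelled out here, the lookup step reduces to the sketch below. The embeddings are random stand-ins rather than Uni3D outputs, and `retrieve_top_k` is a hypothetical name.

```python
import numpy as np


def retrieve_top_k(text_emb, shape_embs, k=1):
    """Rank shapes by cosine similarity to a text embedding in a shared space."""
    t = text_emb / np.linalg.norm(text_emb)
    s = shape_embs / np.linalg.norm(shape_embs, axis=1, keepdims=True)
    sims = s @ t
    top = np.argsort(-sims)[:k]
    return top, sims[top]


# Stand-in embeddings: the "text" vector is a slightly perturbed copy of shape 42.
rng = np.random.default_rng(0)
bank = rng.normal(size=(100, 512))
query = bank[42] + 0.01 * rng.normal(size=512)
top, sims = retrieve_top_k(query, bank, k=3)
```

Normalizing both sides first means a single matrix-vector product gives all similarities; for a large shape bank, an approximate nearest-neighbour index would replace the full `argsort`.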
Original abstract

3D Gaussian Splatting has demonstrated superior performance in rendering efficiency and quality, yet the generation of 3D Gaussians still remains a challenge without proper geometric priors. Existing methods have explored predicting point maps as geometric references for inferring Gaussian primitives, while the unreliable estimated geometries may lead to poor generations. In this work, we introduce GaussianGrow, a novel approach that generates 3D Gaussians by learning to grow them from easily accessible 3D point clouds, naturally enforcing geometric accuracy in Gaussian generation. Specifically, we design a text-guided Gaussian growing scheme that leverages a multi-view diffusion model to synthesize consistent appearances from input point clouds for supervision. To mitigate artifacts caused by fusing neighboring views, we constrain novel views generated at non-preset camera poses identified in overlapping regions across different views. For completing the hard-to-observe regions, we propose to iteratively detect the camera pose by observing the largest un-grown regions in point clouds and filling them by inpainting the rendered view with a pretrained 2D diffusion model. The process continues until complete Gaussians are generated. We extensively evaluate GaussianGrow on text-guided Gaussian generation from synthetic and even real-scanned point clouds. Project Page: https://weiqi-zhang.github.io/GaussianGrow

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces GaussianGrow, a method for generating 3D Gaussians from input 3D point clouds under text guidance. Gaussians are initialized from the point cloud and grown iteratively: a multi-view diffusion model synthesizes consistent appearances for supervision, overlapping-view constraints mitigate fusion artifacts, and unobserved regions are completed by iteratively selecting the camera pose observing the largest un-grown area, rendering the view, inpainting it with a pretrained 2D diffusion model, and continuing until the representation is complete. The central claim is that this growing process from point clouds naturally enforces geometric accuracy in the resulting Gaussians. The work reports evaluations on text-guided generation from both synthetic and real-scanned point clouds.

Significance. If the geometric-accuracy claim is substantiated, the approach would provide a practical route to high-fidelity, efficient 3D Gaussian representations that leverage readily available point-cloud priors, potentially benefiting text-to-3D synthesis, novel-view rendering, and downstream applications in AR/VR. The explicit use of point-cloud initialization and the overlapping-view consistency mechanism are constructive design choices that distinguish the method from purely image-based generation pipelines.

major comments (2)
  1. [Abstract (hard-to-observe region completion paragraph)] The iterative inpainting step renders a view and applies a pretrained 2D diffusion model without any described 3D consistency loss, multi-view geometric regularizer, or back-projection constraint that ties the inpainted content to the original input point cloud. Because the added Gaussians are optimized only against the 2D inpainted image, their 3D positions, scales, or orientations can drift while still producing plausible 2D appearances, directly undermining the claim that the growing scheme 'naturally enforces geometric accuracy' for the completed regions.
  2. [Abstract (evaluation statement)] The manuscript states that GaussianGrow is 'extensively evaluate[d]' on synthetic and real-scanned point clouds, yet the provided text contains no quantitative metrics, ablation tables, or baseline comparisons that would allow verification of improved geometric fidelity relative to prior point-map or diffusion-based Gaussian generators.
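The first major comment asks for a constraint that ties inpainted content back to the input point cloud. One possible form, offered purely as a hypothetical illustration and not as anything the manuscript describes, is a hinge penalty on the distance from each newly added Gaussian center to its nearest input point. The name `anchor_loss` and the `margin` parameter are assumptions; the brute-force distance matrix is only suitable for small clouds, and a KD-tree would be the usual replacement.

```python
import numpy as np


def anchor_loss(centers, points, margin=0.0):
    """Mean squared hinge on each Gaussian center's distance to its nearest input point."""
    d2 = ((centers[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)  # (M, N) squared dists
    nearest = np.sqrt(d2.min(axis=1))
    return float(np.mean(np.maximum(nearest - margin, 0.0) ** 2))


cloud = np.zeros((1, 3))                                   # a single input point at the origin
on_surface = anchor_loss(cloud, cloud)                     # centers on the input points: 0.0
drifted = anchor_loss(np.array([[3.0, 4.0, 0.0]]), cloud)  # distance 5, so loss 25.0
```

The margin lets inpainted Gaussians sit slightly off the sampled points (for example, between sparse samples) without penalty, while still discouraging the drift the referee describes.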
minor comments (2)
  1. The term 'growing' and the precise update rule for adding new Gaussians from inpainted views are introduced only descriptively; an early formal definition or pseudocode block would improve clarity.
  2. [Abstract] The abstract would benefit from naming the concrete metrics (e.g., PSNR, Chamfer distance, or LPIPS) and the number of scenes used in the reported evaluations.
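The first minor comment asks for a formal definition of "growing". The text here does not give GaussianGrow's update rule, so the sketch below only shows one plausible starting state: the per-point seeding used in the original 3D Gaussian Splatting code (log-scale from the mean squared distance to the 3 nearest neighbours, identity rotation, opacity 0.1, color stored as the zeroth spherical-harmonic coefficient). Whether GaussianGrow seeds its Gaussians this way is an assumption, and `seed_gaussians` is a hypothetical name.

```python
import numpy as np

SH_C0 = 0.28209479177387814  # zeroth-order spherical-harmonic constant, 1 / (2 * sqrt(pi))


def seed_gaussians(points, colors=None, k=3, init_opacity=0.1):
    """One isotropic Gaussian per input point, 3DGS-style (brute-force kNN, small clouds only)."""
    n = len(points)
    if n <= k:
        raise ValueError("need more than k points")
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                     # ignore each point's distance to itself
    mean_d2 = np.maximum(np.sort(d2, axis=1)[:, :k].mean(axis=1), 1e-7)
    if colors is None:
        colors = np.full((n, 3), 0.5)                # neutral gray
    return {
        "xyz": points.copy(),
        "log_scale": np.repeat(np.log(np.sqrt(mean_d2))[:, None], 3, axis=1),
        "rotation": np.tile([1.0, 0.0, 0.0, 0.0], (n, 1)),  # identity quaternion (w, x, y, z)
        "opacity_logit": np.full((n, 1), np.log(init_opacity / (1.0 - init_opacity))),
        "sh_dc": (colors - 0.5) / SH_C0,
    }


grid = np.stack(np.meshgrid(*[np.arange(3.0)] * 3, indexing="ij"), axis=-1).reshape(-1, 3)
g = seed_gaussians(grid)
```

On a unit-spacing grid every point has at least three neighbours at distance 1, so every log-scale is 0. Storing scale and opacity in log and logit space keeps them positive and in (0, 1) under unconstrained gradient updates.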

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments on our work. We provide detailed responses to each major comment below and have made revisions to the manuscript to address the concerns raised.

Point-by-point responses
  1. Referee: [Abstract (hard-to-observe region completion paragraph)] The iterative inpainting step renders a view and applies a pretrained 2D diffusion model without any described 3D consistency loss, multi-view geometric regularizer, or back-projection constraint that ties the inpainted content to the original input point cloud. Because the added Gaussians are optimized only against the 2D inpainted image, their 3D positions, scales, or orientations can drift while still producing plausible 2D appearances, directly undermining the claim that the growing scheme 'naturally enforces geometric accuracy' for the completed regions.

    Authors: We appreciate the referee's careful reading and the valid point regarding the inpainting of hard-to-observe regions. The current description focuses on the 2D inpainting step, but the Gaussians are optimized in 3D space using the multi-view diffusion model for supervision, which provides consistent appearances across multiple views. The overlapping-view constraints further help to maintain geometric consistency by identifying and constraining novel views in overlapping regions. Nevertheless, we acknowledge that an explicit 3D consistency loss or back-projection for the inpainted content is not detailed. To strengthen the manuscript, we have revised the method description to include how the inpainted 2D content is used to grow 3D Gaussians with constraints from the existing point cloud structure and multi-view consistency. We have also adjusted the abstract to reflect that geometric accuracy is naturally enforced from the input point cloud for observed areas, with the inpainting providing completion under these constraints. This addresses the concern without misrepresenting the approach. revision: yes

  2. Referee: [Abstract (evaluation statement)] The manuscript states that GaussianGrow is 'extensively evaluate[d]' on synthetic and real-scanned point clouds, yet the provided text contains no quantitative metrics, ablation tables, or baseline comparisons that would allow verification of improved geometric fidelity relative to prior point-map or diffusion-based Gaussian generators.

    Authors: We are sorry if the text provided to the referee did not include the full experimental details. The complete manuscript contains Section 4 'Experiments' which provides extensive quantitative evaluations on synthetic point clouds, including metrics for geometric accuracy (e.g., Chamfer distance to ground truth) and rendering quality (PSNR, SSIM, LPIPS), along with ablation studies on the components of the growing scheme and comparisons to baselines such as point-map prediction methods and other text-to-3D Gaussian approaches. For real-scanned point clouds, we include qualitative results and user preference studies. These are presented in tables and figures to substantiate the claims. We have verified that all evaluation content is present and clearly referenced in the revised manuscript. revision: no
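The simulated rebuttal cites Chamfer distance as its geometric-accuracy metric without fixing a convention. Definitions vary (squared or plain Euclidean distances, sum or average of the two directions, sometimes an extra factor of 1/2), so the sketch below implements one common variant: the mean squared nearest-neighbour distance in each direction, summed. Using Gaussian centers as the predicted point set is an assumption (sampling the rendered surface is an alternative), and the brute-force distance matrix suits only small clouds.

```python
import numpy as np


def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean squared NN distance a->b plus b->a (brute force)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())


# Predicted points (e.g. Gaussian centers) vs. ground-truth surface samples.
pred = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
gt = np.array([[0.0, 0.0, 0.0]])
cd = chamfer_distance(pred, gt)   # pred->gt: mean(0, 4) = 2; gt->pred: 0; total 2.0
```

Reported Chamfer numbers are only comparable across papers when the convention and the point-sampling density match, which is worth stating next to any table.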

Circularity Check

0 steps flagged

No circularity: pipeline uses external pretrained models and input point clouds

full rationale

The paper presents a procedural method that initializes Gaussians from given 3D point clouds and iteratively grows them using supervision from a multi-view diffusion model plus 2D inpainting on rendered views. No equations, fitted parameters, or self-citations are shown that reduce any claimed prediction or geometric enforcement result to the inputs by construction. The central claim of 'naturally enforcing geometric accuracy' rests on the external initialization and diffusion components rather than an internal self-referential loop, making the derivation self-contained against those benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the performance of two external pretrained diffusion models and the assumption that growing from point clouds inherently enforces geometry; no free parameters are explicitly introduced in the abstract.

axioms (2)
  • domain assumption: Pretrained multi-view diffusion models produce consistent appearances from point clouds suitable for Gaussian supervision.
    Invoked to provide supervision signals in the growing scheme.
  • domain assumption: 2D diffusion inpainting of rendered views accurately fills unobserved regions without geometric distortion.
    Used to complete hard-to-observe areas during iterative pose selection.

pith-pipeline@v0.9.0 · 5540 in / 1268 out tokens · 40723 ms · 2026-05-10T19:12:13.868883+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

68 extracted references · 68 canonical work pages

  1. [1]

    Meta 3D TextureGen: Fast and consistent texture generation for 3D objects

    Raphael Bensadoun, Yanir Kleiman, Idan Azuri, Omri Harosh, Andrea Vedaldi, Natalia Neverova, and Oran Gafni. Meta 3D TextureGen: Fast and consistent texture generation for 3D objects. arXiv preprint arXiv:2407.02430, 2024. 3

  2. [2]

    The Ball-Pivoting Algorithm for Surface Reconstruction

    Fausto Bernardini, Joshua Mittleman, Holly Rushmeier, Cláudio Silva, and Gabriel Taubin. The Ball-Pivoting Algorithm for Surface Reconstruction. IEEE Transactions on Visualization and Computer Graphics, 5(4):349–359, 1999. 7

  3. [3]

    Demystifying MMD GANs

    Mikołaj Bińkowski, Danica J Sutherland, Michael Arbel, and Arthur Gretton. Demystifying MMD GANs. In International Conference on Learning Representations (ICLR), 2018. 6

  4. [4]

    Texfusion: Synthesizing 3D textures with text-guided image diffusion models

    Tianshi Cao, Karsten Kreis, Sanja Fidler, Nicholas Sharp, and Kangxue Yin. Texfusion: Synthesizing 3D textures with text-guided image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4169–4181, 2023. 3

  5. [5]

    Text2Tex: Text-driven Texture Synthesis via Diffusion Models

    Dave Zhenyu Chen, Yawar Siddiqui, Hsin-Ying Lee, Sergey Tulyakov, and Matthias Nießner. Text2Tex: Text-driven Texture Synthesis via Diffusion Models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 18558–18568, 2023. 2, 3, 6

  6. [6]

    MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D

    Wei Cheng, Juncheng Mu, Xianfang Zeng, Xin Chen, Anqi Pang, Chi Zhang, Zhibin Wang, Bin Fu, Gang Yu, Ziwei Liu, et al. MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 585–594,

  7. [7]

    SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation

    Yen-Chi Cheng, Hsin-Ying Lee, Sergey Tulyakov, Alexander G Schwing, and Liang-Yan Gui. SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4456–4465, 2023. 2

  8. [8]

    Objaverse: A Universe of Annotated 3D Objects

    Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Objaverse: A Universe of Annotated 3D Objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13142–13153, 2023. 6, 7

  9. [9]

    MoRe: Motion-aware Feed-forward 4D Reconstruction Transformer

    Juntong Fang, Zequn Chen, Weiqi Zhang, Donglin Di, Xuancheng Zhang, Chengmin Yang, and Yu-Shen Liu. MoRe: Motion-aware Feed-forward 4D Reconstruction Transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2026. 3

  10. [10]

    GVGEN: Text-to-3D Generation with Volumetric Representation

    Xianglong He, Junyi Chen, Sida Peng, Di Huang, Yangguang Li, Xiaoshui Huang, Chun Yuan, Wanli Ouyang, and Tong He. GVGEN: Text-to-3D Generation with Volumetric Representation. In European Conference on Computer Vision, 2024. 1, 3, 7

  11. [11]

    T3Bench: Benchmarking current progress in text-to-3D generation

    Yuze He, Yushi Bai, Matthieu Lin, Wang Zhao, Yubin Hu, Jenny Sheng, Ran Yi, Juanzi Li, and Yong-Jin Liu. T3Bench: Benchmarking current progress in text-to-3D generation, 2023. 7

  12. [12]

    Denoising diffusion probabilistic models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020. 2, 3

  13. [13]

    3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors

    Fangzhou Hong, Jiaxiang Tang, Ziang Cao, Min Shi, Tong Wu, Zhaoxi Chen, Shuai Yang, Tengfei Wang, Liang Pan, Dahua Lin, et al. 3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors. arXiv preprint arXiv:2403.02234, 2024. 7

  14. [14]

    LRM: Large Reconstruction Model for Single Image to 3D

    Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. LRM: Large Reconstruction Model for Single Image to 3D. In International Conference on Learning Representations (ICLR), 2024. 3

  15. [15]

    2D Gaussian Splatting for Geometrically Accurate Radiance Fields

    Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2D Gaussian Splatting for Geometrically Accurate Radiance Fields. In SIGGRAPH 2024 Conference Papers. Association for Computing Machinery, 2024. 3

  16. [16]

    TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling

    Dong Huo, Zixin Guo, Xinxin Zuo, Zhihao Shi, Juwei Lu, Peng Dai, Songcen Xu, Li Cheng, and Yee-Hong Yang. TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling. In European Conference on Computer Vision, pages 352–368. Springer, 2024. 3

  17. [17]

    FlexiTex: Enhancing Texture Generation via Visual Guidance

    DaDong Jiang, Xianghui Yang, Zibo Zhao, Sheng Zhang, Jiaao Yu, Zeqiang Lai, Shaoxiong Yang, Chunchao Guo, Xiaobo Zhou, and Zhihui Ke. FlexiTex: Enhancing Texture Generation via Visual Guidance. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 3967–3975, 2025. 3

  18. [18]

    3D Gaussian Splatting for Real-Time Radiance Field Rendering

    Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Transactions on Graphics, 42(4):1–14, 2023. 1, 2, 3

  19. [19]

    The Role of ImageNet Classes in Fréchet Inception Distance

    Tuomas Kynkäänniemi, Tero Karras, Miika Aittala, Timo Aila, and Jaakko Lehtinen. The Role of ImageNet Classes in Fréchet Inception Distance. In International Conference on Learning Representations, 2023. 6

  20. [20]

    LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation

    Yushi Lan, Fangzhou Hong, Shuai Yang, Shangchen Zhou, Xuyi Meng, Bo Dai, Xingang Pan, and Chen Change Loy. LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation. In European Conference on Computer Vision, pages 112–130. Springer, 2024. 7

  21. [21]

    DiffSplat: Repurposing Image Diffusion Models for Scalable 3D Gaussian Splat Generation

    Chenguo Lin, Panwang Pan, Bangbang Yang, Zeming Li, and Yadong Mu. DiffSplat: Repurposing Image Diffusion Models for Scalable 3D Gaussian Splat Generation. In International Conference on Learning Representations (ICLR),

  22. [22]

    Magic3D: High-Resolution Text-to-3D Content Creation

    Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. Magic3D: High-Resolution Text-to-3D Content Creation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 300–309, 2023. 2

  23. [23]

    TexOct: Generating Textures of 3D Models with Octree-based Diffusion

    Jialun Liu, Chenming Wu, Xinqi Liu, Xing Liu, Jinbo Wu, Haotian Peng, Chen Zhao, Haocheng Feng, Jingtuo Liu, and Errui Ding. TexOct: Generating Textures of 3D Models with Octree-based Diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4284–4293, 2024. 3

  24. [24]

    DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data

    Qihao Liu, Yi Zhang, Song Bai, Adam Kortylewski, and Alan Yuille. DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6881–6891, 2024. 7

  25. [25]

    Text-Guided Texturing by Synchronized Multi-View Diffusion

    Yuxin Liu, Minshan Xie, Hanyuan Liu, and Tien-Tsin Wong. Text-Guided Texturing by Synchronized Multi-View Diffusion. In SIGGRAPH Asia 2024 Conference Papers, pages 1–11, 2024. 3, 6

  26. [26]

    Large Point-to-Gaussian Model for Image-to-3D Generation

    Longfei Lu, Huachen Gao, Tao Dai, Yaohua Zha, Zhi Hou, Junta Wu, and Shu-Tao Xia. Large Point-to-Gaussian Model for Image-to-3D Generation. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 10843–10852, 2024. 2, 3

  27. [27]

    GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation

    Baorui Ma, Haoge Deng, Junsheng Zhou, Yu-Shen Liu, Tiejun Huang, and Xinlong Wang. GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation. arXiv preprint arXiv:2311.17971, 2023. 2

  28. [28]

    Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures

    Gal Metzer, Elad Richardson, Or Patashnik, Raja Giryes, and Daniel Cohen-Or. Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12663–12673, 2023. 2

  29. [29]

    DiffRF: Rendering-Guided 3D Radiance Field Diffusion

    Norman Müller, Yawar Siddiqui, Lorenzo Porzi, Samuel Rota Bulo, Peter Kontschieder, and Matthias Nießner. DiffRF: Rendering-Guided 3D Radiance Field Diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4328–4338, 2023. 2

  30. [30]

    Improved Denoising Diffusion Probabilistic Models

    Alexander Quinn Nichol and Prafulla Dhariwal. Improved Denoising Diffusion Probabilistic Models. In International Conference on Machine Learning, pages 8162–8171. PMLR,

  31. [31]

    MultiPull: Detailing Signed Distance Functions by Pulling Multi-Level Queries at Multi-Step

    Takeshi Noda, Chao Chen, Weiqi Zhang, Xinhai Liu, Yu-Shen Liu, and Zhizhong Han. MultiPull: Detailing Signed Distance Functions by Pulling Multi-Level Queries at Multi-Step. In Advances in Neural Information Processing Systems, pages 13404–13429. Curran Associates, Inc., 2024. 3

  32. [32]

    Learning Bijective Surface Parameterization for Inferring Signed Distance Functions from Sparse Point Clouds with Grid Deformation

    Takeshi Noda, Chao Chen, Junsheng Zhou, Weiqi Zhang, Yu-Shen Liu, and Zhizhong Han. Learning Bijective Surface Parameterization for Inferring Signed Distance Functions from Sparse Point Clouds with Grid Deformation. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 22139–22149, 2025. 3

  33. [33]

    UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction

    Michael Oechsle, Songyou Peng, and Andreas Geiger. UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction. In International Conference on Computer Vision (ICCV), 2021. 3

  34. [34]

    DreamFusion: Text-to-3D using 2D Diffusion

    Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. DreamFusion: Text-to-3D using 2D Diffusion. In International Conference on Learning Representations, 2023. 2, 8

  35. [35]

    RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D

    Lingteng Qiu, Guanying Chen, Xiaodong Gu, Qi Zuo, Mutian Xu, Yushuang Wu, Weihao Yuan, Zilong Dong, Liefeng Bo, and Xiaoguang Han. RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9914–9925,

  36. [36]

    Learning transferable visual models from natural language supervision

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021. 6, 7

  37. [37]

    DreamBooth3D: Subject-Driven Text-to-3D Generation

    Amit Raj, Srinivas Kaza, Ben Poole, Michael Niemeyer, Nataniel Ruiz, Ben Mildenhall, Shiran Zada, Kfir Aberman, Michael Rubinstein, Jonathan Barron, et al. DreamBooth3D: Subject-Driven Text-to-3D Generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2349–2359, 2023. 2

  38. [38]

    TEXTure: Text-Guided Texturing of 3D Shapes

    Elad Richardson, Gal Metzer, Yuval Alaluf, Raja Giryes, and Daniel Cohen-Or. TEXTure: Text-Guided Texturing of 3D Shapes. In ACM SIGGRAPH 2023 Conference Proceedings, pages 1–11, 2023. 3, 6

  39. [39]

    High-resolution image synthesis with latent diffusion models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022. 5, 6

  40. [40]

    3D Neural Field Generation using Triplane Diffusion

    J Ryan Shue, Eric Ryan Chan, Ryan Po, Zachary Ankner, Jiajun Wu, and Gordon Wetzstein. 3D Neural Field Generation using Triplane Diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20875–20886, 2023. 2

  41. [41]

    DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior

    Jingxiang Sun, Bo Zhang, Ruizhi Shao, Lizhen Wang, Wen Liu, Zhenda Xie, and Yebin Liu. DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior. In International Conference on Learning Representations (ICLR),

  42. [42]

    LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

    Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, and Ziwei Liu. LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation. In European Conference on Computer Vision, pages 1–18. Springer, 2024. 7

  43. [43]

    InTeX: Interactive text-to-texture synthesis via unified depth-aware inpainting

    Jiaxiang Tang, Ruijie Lu, Xiaokang Chen, Xiang Wen, Gang Zeng, and Ziwei Liu. InTeX: Interactive text-to-texture synthesis via unified depth-aware inpainting. arXiv preprint arXiv:2403.11878, 2024. 3

  44. [44]

    DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation

    Jiaxiang Tang, Jiawei Ren, Hang Zhou, Ziwei Liu, and Gang Zeng. DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation. In International Conference on Learning Representations, 2024. 8

  45. [45]

    VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder

    Zhicong Tang, Shuyang Gu, Chunyu Wang, Ting Zhang, Jianmin Bao, Dong Chen, and Baining Guo. VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder. arXiv preprint arXiv:2312.11459, 2023. 2

  46. [46]

    Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

    Tencent Hunyuan3D Team. Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation, 2025. 3

  47. [47]

    Tengfei Wang, Bo Zhang, Ting Zhang, Shuyang Gu, Jianmin Bao, Tadas Baltrusaitis, Jingjing Shen, Dong Chen, Fang Wen, Qifeng Chen, et al. Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4563–4573, 2023. 2

  48. [48]

    Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, and Jun Zhu. ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation. Advances in Neural Information Processing Systems, 36, 2024. 2

  49. [49]

    Xiaoyu Xiang, Liat Sless Gorelik, Omri Armstrong, Yuchen Fan, Forrest Iandola, Yilei Li, Ita Lifshitz, and Rakesh Ranjan. Make-A-Texture: Fast Shape-Aware Texture Generation in 3 Seconds. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision,

  50. [50]

    Bojun Xiong, Jialun Liu, Jiakui Hu, Chenming Wu, Jinbo Wu, Xing Liu, Chen Zhao, Errui Ding, and Zhouhui Lian. TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), pages 551–561, 2025. 3

  51. [51]

    Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation. Advances in Neural Information Processing Systems, 36:15903–15935, 2023. 7

  52. [52]

    Jiale Xu, Xintao Wang, Weihao Cheng, Yan-Pei Cao, Ying Shan, Xiaohu Qie, and Shenghua Gao. Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20908–20918, 2023. 2

  53. [53]

    Yinghao Xu, Zifan Shi, Wang Yifan, Hansheng Chen, Ceyuan Yang, Sida Peng, Yujun Shen, and Gordon Wetzstein. GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation. In European Conference on Computer Vision. Springer, 2024. 3, 7

  54. [54]

    Jonathan Young. xatlas: A Library for Mesh Parameterization. GitHub repository, 2018. 7

  55. [55]

    Xin Yu, Peng Dai, Wenbo Li, Lan Ma, Zhengzhe Liu, and Xiaojuan Qi. Texture Generation on 3D Meshes with Point-UV Diffusion. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4206–4216,

  56. [56]

    Xin Yu, Ze Yuan, Yuan-Chen Guo, Ying-Tian Liu, Jianhui Liu, Yangguang Li, Yan-Pei Cao, Ding Liang, and Xiaojuan Qi. TEXGen: a Generative Diffusion Model for Mesh Textures. ACM Transactions on Graphics (TOG), 43(6):1–14,

  57. [57]

    Xianfang Zeng, Xin Chen, Zhongqi Qi, Wen Liu, Zibo Zhao, Zhibin Wang, Bin Fu, Yong Liu, and Gang Yu. Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4252–4262, 2024. 2, 3, 6

  58. [58]

    Bowen Zhang, Yiji Cheng, Jiaolong Yang, Chunyu Wang, Feng Zhao, Yansong Tang, Dong Chen, and Baining Guo. GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling. In Advances in Neural Information Processing Systems (NeurIPS), 2024. 1, 3

  59. [59]

    Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding Conditional Control to Text-to-Image Diffusion Models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3836–3847, 2023. 3, 5, 6

  60. [60]

    Wenyuan Zhang, Jimin Tang, Weiqi Zhang, Yi Fang, Yu-Shen Liu, and Zhizhong Han. MaterialRefGS: Reflective Gaussian splatting with multi-view consistent material inference. In Advances in Neural Information Processing Systems, 2025. 3

  61. [61]

    Weiqi Zhang, Junsheng Zhou, Haotian Geng, Wenyuan Zhang, and Yu-Shen Liu. GAP: Gaussianize Any Point Clouds with Text Guidance. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025. 6

  62. [62]

    Junsheng Zhou, Baorui Ma, Yu-Shen Liu, Yi Fang, and Zhizhong Han. Learning Consistency-Aware Unsigned Distance Functions Progressively from Raw Point Clouds. In Advances in Neural Information Processing Systems, pages 16481–16494. Curran Associates, Inc., 2022. 3, 7

  63. [63]

    Junsheng Zhou, Jinsheng Wang, Baorui Ma, Yu-Shen Liu, Tiejun Huang, and Xinlong Wang. Uni3D: Exploring Unified 3D Representation at Scale. In International Conference on Learning Representations, pages 46766–46782, 2024. 2, 7

  64. [64]

    Junsheng Zhou, Weiqi Zhang, and Yu-Shen Liu. DiffGS: Functional Gaussian Splatting Diffusion. In Advances in Neural Information Processing Systems (NeurIPS), 2024. 1, 3

  65. [65]

    Junsheng Zhou, Weiqi Zhang, Baorui Ma, Kanle Shi, Yu-Shen Liu, and Zhizhong Han. UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21496–21506, 2024. 2

  66. [66]

    Jingqiu Zhou, Lue Fan, Xuesong Chen, Linjiang Huang, Si Liu, and Hongsheng Li. GaussianPainter: Painting Point Cloud into 3D Gaussians with Normal Guidance. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 10788–10796, 2025. 3

  67. [67]

    Junsheng Zhou, Weiqi Zhang, Baorui Ma, Kanle Shi, Yu-Shen Liu, and Zhizhong Han. UDFStudio: A Unified Framework of Datasets, Benchmarks and Generative Models for Unsigned Distance Functions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2026. 3

  68. [68]

    Zi-Xin Zou, Zhipeng Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Yan-Pei Cao, and Song-Hai Zhang. Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10324–10335, 2024. 2, 3, 8