DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation
Pith reviewed 2026-05-18 18:26 UTC · model grok-4.3
The pith
LGAA reuses layers from multi-view diffusion models to generate PBR-ready 3D assets from only 69k instances.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LGAA unifies geometry and PBR material modeling by exploiting multi-view diffusion priors through a modular design: the LGAA Wrapper reuses and adapts network layers from MV diffusion models to preserve 2D priors for better convergence, the LGAA Switcher aligns multiple wrapper layers that encapsulate different knowledge, and the LGAA Decoder, a tamed variational autoencoder, predicts 2D Gaussian Splatting with PBR channels. A dedicated post-processing procedure then extracts high-quality, relightable mesh assets from the resulting 2DGS. Experiments demonstrate superior performance with both text- and image-conditioned MV diffusion models and data-efficient finetuning with merely 69k multi-v
What carries the argument
Lightweight Gaussian Asset Adapter (LGAA), a plug-in module whose Wrapper reuses MV diffusion layers, Switcher aligns multiple priors, and Decoder predicts 2DGS with PBR channels, thereby lifting pre-trained models for unified 3D asset generation.
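The layer-reuse idea behind the Wrapper can be illustrated with a toy example. This is a minimal sketch, not the paper's implementation: the `Linear` and `WrappedLayer` classes are hypothetical, and the zero-initialised gate (in the style of ControlNet-like adapters) is an assumption the text above does not confirm. It only shows how a trainable copy of a pre-trained layer can start out without disturbing the frozen prior.

```python
import copy


class Linear:
    """Tiny dense layer y = W x + b, standing in for a pre-trained MV diffusion layer."""

    def __init__(self, weight, bias):
        self.weight = [row[:] for row in weight]
        self.bias = bias[:]

    def __call__(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]


class WrappedLayer:
    """Hypothetical wrapper: a frozen pre-trained layer plus a trainable copy.

    The copy's contribution is scaled by a gate that starts at zero
    (an assumption made for this sketch, not a detail taken from the paper).
    """

    def __init__(self, pretrained):
        self.frozen = pretrained                  # kept fixed during finetuning
        self.adapter = copy.deepcopy(pretrained)  # starts from the same weights
        self.gate = 0.0                           # zero-init: no change at step 0

    def __call__(self, x):
        base = self.frozen(x)
        delta = self.adapter(x)
        return [b + self.gate * d for b, d in zip(base, delta)]


pretrained = Linear([[1.0, 2.0], [0.5, -1.0]], [0.0, 1.0])
wrapped = WrappedLayer(pretrained)
x = [3.0, 4.0]
# At initialisation the wrapped layer reproduces the pre-trained output.
assert wrapped(x) == pretrained(x)
```

Under this design choice, training only moves the output away from the pre-trained behaviour as the gate and adapter weights are updated, which is one plausible reading of "preserving 2D priors".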
Load-bearing premise
Reusing and adapting network layers from pre-trained MV diffusion models preserves 2D priors sufficiently to enable better convergence and superior 3D PBR performance when finetuned on only 69k multi-view instances.
What would settle it
A model trained from scratch on the same 69k multi-view instances that matches or exceeds LGAA in convergence speed and final 3D PBR asset quality would show that layer reuse is not necessary for the claimed data efficiency.
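The falsification test above can be phrased as a simple decision rule over two training runs. The following is a rough sketch under stated assumptions: the loss curves, the threshold, and the function names `steps_to_reach` and `scratch_matches_reuse` are all hypothetical, and "quality" is reduced to a single lower-is-better number for illustration.

```python
def steps_to_reach(curve, threshold):
    """Return the first step index at which the loss is <= threshold, or None."""
    for step, loss in enumerate(curve):
        if loss <= threshold:
            return step
    return None


def scratch_matches_reuse(reuse_curve, scratch_curve, threshold, tolerance=0.0):
    """True if the from-scratch run converges at least as fast and ends at
    least as well as the layer-reuse run (lower loss is better)."""
    reuse_step = steps_to_reach(reuse_curve, threshold)
    scratch_step = steps_to_reach(scratch_curve, threshold)
    if scratch_step is None:
        return False  # from-scratch never reaches the target quality
    as_fast = reuse_step is None or scratch_step <= reuse_step
    as_good = scratch_curve[-1] <= reuse_curve[-1] + tolerance
    return as_fast and as_good


# Placeholder curves, not measurements from the paper.
reuse = [1.0, 0.5, 0.2, 0.1]
scratch = [1.0, 0.9, 0.6, 0.4]
print(scratch_matches_reuse(reuse, scratch, threshold=0.5))
```

If such a check returned `True` on real runs trained on the same 69k instances, the claim that layer reuse is needed for data efficiency would be undermined.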
Original abstract
The labor- and experience-intensive creation of 3D assets with physically based rendering (PBR) materials demands an autonomous 3D asset creation pipeline. However, most existing 3D generation methods focus on geometry modeling, either baking textures into simple vertex colors or leaving texture synthesis to post-processing with image diffusion models. To achieve end-to-end PBR-ready 3D asset generation, we present Lightweight Gaussian Asset Adapter (LGAA), a novel framework that unifies the modeling of geometry and PBR materials by exploiting multi-view (MV) diffusion priors from a novel perspective. The LGAA features a modular design with three components. Specifically, the LGAA Wrapper reuses and adapts network layers from MV diffusion models, which encapsulate knowledge acquired from billions of images, enabling better convergence in a data-efficient manner. To incorporate multiple diffusion priors for geometry and PBR synthesis, the LGAA Switcher aligns multiple LGAA Wrapper layers encapsulating different knowledge. Then, a tamed variational autoencoder (VAE), termed LGAA Decoder, is designed to predict 2D Gaussian Splatting (2DGS) with PBR channels. Finally, we introduce a dedicated post-processing procedure to effectively extract high-quality, relightable mesh assets from the resulting 2DGS. Extensive quantitative and qualitative experiments demonstrate the superior performance of LGAA with both text- and image-conditioned MV diffusion models. Additionally, the modular design enables flexible incorporation of multiple diffusion priors, and the knowledge-preserving scheme effectively preseves the 2D priors learned on massive image dataset, which leads to data efficient finetuning to lift the MV diffuison models for 3D generation with merely 69k multi-view instances.
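To make "2DGS with PBR channels" concrete, here is a minimal sketch of what one such primitive might look like as a data structure. The geometric fields follow the usual 2D Gaussian Splatting parameterisation (a centre, two tangent vectors, two scales, and an opacity). The specific PBR channels (albedo, roughness, metallic) and the class name `PBRSplat2D` are assumptions for illustration; the abstract does not list which channels the LGAA Decoder predicts.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


def cross(a: Vec3, b: Vec3) -> Vec3:
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v: Vec3) -> Vec3:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return (v[0] / norm, v[1] / norm, v[2] / norm)


@dataclass
class PBRSplat2D:
    """One planar Gaussian disk with (assumed) PBR material channels."""
    center: Vec3
    tangent_u: Vec3
    tangent_v: Vec3
    scale_u: float
    scale_v: float
    opacity: float
    albedo: Vec3      # assumed channel
    roughness: float  # assumed channel
    metallic: float   # assumed channel

    def normal(self) -> Vec3:
        """Disk normal as the cross product of the two tangent directions."""
        return normalize(cross(self.tangent_u, self.tangent_v))


splat = PBRSplat2D(center=(0.0, 0.0, 0.0),
                   tangent_u=(1.0, 0.0, 0.0), tangent_v=(0.0, 1.0, 0.0),
                   scale_u=0.1, scale_v=0.05, opacity=0.9,
                   albedo=(0.8, 0.2, 0.2), roughness=0.5, metallic=0.0)
print(splat.normal())
```

Because each disk carries an explicit normal, a representation of this kind lends itself to the mesh extraction and relighting steps the abstract describes.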
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Lightweight Gaussian Asset Adapter (LGAA), a modular plug-in framework to lift pre-trained multi-view (MV) diffusion models for end-to-end 3D asset generation with geometry and PBR materials. LGAA Wrapper reuses and adapts layers from MV diffusion models to preserve 2D priors for data-efficient finetuning; LGAA Switcher aligns multiple such wrappers for geometry and PBR priors; LGAA Decoder (a tamed VAE) predicts 2D Gaussian Splatting with PBR channels; a post-processing step extracts relightable meshes. The authors claim superior performance over existing methods via extensive quantitative and qualitative experiments on both text- and image-conditioned MV models, achieved through data-efficient finetuning on only 69k multi-view instances.
Significance. If the empirical claims hold, the modular reuse of billion-image 2D priors for 3D PBR generation would represent a practical advance in data-efficient 3D asset pipelines, reducing reliance on large-scale 3D datasets while enabling flexible combination of geometry and material priors. The explicit post-processing to relightable meshes and the 2DGS output format are also potentially useful for downstream applications.
major comments (2)
- Abstract: the central claim that LGAA 'effectively preserves the 2D priors learned on massive image dataset' and thereby enables 'data efficient finetuning ... with merely 69k multi-view instances' is unsupported by any reported quantitative check (feature similarity, retained 2D generation quality after adaptation, or from-scratch ablation). Without such evidence the attribution of gains to prior preservation rather than to the new architecture or training protocol remains unverified.
- Abstract and Experiments section: no numerical metrics, ablation tables, error bars, dataset construction details, or evaluation protocol (e.g., metrics for geometry, PBR, or relighting quality) are supplied to substantiate the repeated assertion of 'superior performance.' This absence prevents assessment of whether the reported gains are statistically meaningful or merely qualitative.
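One of the quantitative checks named in the first major comment, feature similarity between original and adapted layers, could be sketched as follows. This is only an illustration: the feature vectors are placeholders, the helper names are hypothetical, and cosine similarity is one of several reasonable choices (the report does not prescribe a metric).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mean_feature_similarity(original_feats, adapted_feats):
    """Average per-sample similarity between features of the original layer
    and features of the same layer after adaptation."""
    if len(original_feats) != len(adapted_feats) or not original_feats:
        raise ValueError("need two non-empty feature lists of equal length")
    sims = [cosine_similarity(o, a) for o, a in zip(original_feats, adapted_feats)]
    return sum(sims) / len(sims)


# Placeholder features, not values from the paper.
original = [[1.0, 0.0], [3.0, 4.0]]
adapted = [[1.0, 0.0], [4.0, 3.0]]
print(round(mean_feature_similarity(original, adapted), 3))
```

A score near 1 on held-out 2D inputs would be consistent with prior preservation; it would not by itself show that the preserved priors cause the reported gains.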
minor comments (2)
- Abstract: typographical errors ('preseves', 'diffuison') should be corrected.
- Abstract: the sentence 'the modular design enables flexible incorporation of multiple diffusion priors, and the knowledge-preserving scheme effectively preseves...' is run-on and should be split for clarity.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive review. The two major comments identify important gaps in evidentiary support for our central claims. We agree that additional quantitative checks and detailed reporting are needed to strengthen the manuscript and will revise accordingly.
Point-by-point responses
- Referee: Abstract: the central claim that LGAA 'effectively preserves the 2D priors learned on massive image dataset' and thereby enables 'data efficient finetuning ... with merely 69k multi-view instances' is unsupported by any reported quantitative check (feature similarity, retained 2D generation quality after adaptation, or from-scratch ablation). Without such evidence the attribution of gains to prior preservation rather than to the new architecture or training protocol remains unverified.
  Authors: We agree that the current version does not contain direct quantitative verification of prior preservation, such as feature-space similarity between adapted and original MV diffusion layers or an explicit from-scratch training ablation. The manuscript relies on indirect evidence through overall performance gains and the modular reuse design. We will add a dedicated ablation subsection that reports (i) performance when the LGAA Wrapper is trained from random initialization versus initialized from MV diffusion weights and (ii) any feasible retained 2D generation quality metrics after adaptation. This revision will allow readers to assess the contribution of the preserved priors more rigorously.
  Revision: yes
- Referee: Abstract and Experiments section: no numerical metrics, ablation tables, error bars, dataset construction details, or evaluation protocol (e.g., metrics for geometry, PBR, or relighting quality) are supplied to substantiate the repeated assertion of 'superior performance.' This absence prevents assessment of whether the reported gains are statistically meaningful or merely qualitative.
  Authors: The experiments section does contain quantitative comparisons, yet we acknowledge that error bars, complete dataset construction details for the 69k instances, and explicit per-category metrics for geometry, PBR material quality, and relighting fidelity are insufficiently reported. We will expand the experiments section with (i) full ablation tables including numerical values and standard deviations, (ii) a detailed description of the multi-view dataset curation and splits, and (iii) additional evaluation protocols and metrics specifically for geometry accuracy, PBR channel fidelity, and relighting quality under novel lighting. These additions will make the superiority claims quantitatively verifiable.
  Revision: yes
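The promised "numerical values and standard deviations" amount to summarising each metric over repeated runs. A minimal sketch of that reporting step follows, assuming each method is evaluated over several seeds. The metric name and scores are made-up placeholders, and `summarize` and `format_row` are hypothetical helpers.

```python
import statistics


def summarize(scores):
    """Return (mean, sample standard deviation) for a list of per-seed scores."""
    if not scores:
        raise ValueError("need at least one score")
    mean = statistics.mean(scores)
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std


def format_row(name, scores, digits=3):
    """Format one table row as 'name: mean +/- std (n=runs)'."""
    mean, std = summarize(scores)
    return f"{name}: {mean:.{digits}f} +/- {std:.{digits}f} (n={len(scores)})"


# Placeholder scores for illustration only, not results from the paper.
print(format_row("example_metric", [1.0, 2.0, 3.0]))
```

Reporting the number of runs next to the spread lets readers judge whether a gap between two methods exceeds seed-to-seed variation.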
Circularity Check
No circularity: modular reuse of external pre-trained priors with independent experimental validation
The paper describes an engineering framework (LGAA Wrapper reusing MV diffusion layers, LGAA Switcher for alignment, LGAA Decoder for 2DGS+PBR prediction) that builds on external pre-trained models trained on billions of images. The central data-efficiency claim (finetuning on 69k instances) is presented as an empirical outcome of prior preservation rather than a quantity defined by or fitted to the target 3D results themselves. No equations, self-definitional loops, fitted-input predictions, or load-bearing self-citations appear in the provided text; the derivation chain consists of architectural choices justified by external diffusion priors and validated through quantitative/qualitative experiments.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Multi-view diffusion models encapsulate knowledge acquired from billions of images that can be reused for better convergence in 3D tasks.
invented entities (3)
- LGAA Wrapper: no independent evidence
- LGAA Switcher: no independent evidence
- LGAA Decoder: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/CostJ_uniquely_calibrated_via_higher_derivative
  Tag: unclear
  Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "trained on only 69k multi-view instances... modular design enables flexible incorporation of multiple diffusion priors"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.