DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation

Jian Yang; Jiaxiong Qiu; Jin Xie; Liu Liu; Wei Sui; Xinjie Wang; Ze-Xin Yin; Zhizhong Su

arxiv: 2509.07435 · v2 · submitted 2025-09-09 · 💻 cs.CV


Ze-Xin Yin, Jiaxiong Qiu, Liu Liu, Xinjie Wang, Wei Sui, Zhizhong Su, Jian Yang, Jin Xie

Pith reviewed 2026-05-18 18:26 UTC · model grok-4.3

classification 💻 cs.CV

keywords 3D asset generation · multi-view diffusion · Gaussian splatting · PBR materials · plug-in adapter · data-efficient finetuning · relightable meshes · diffusion priors


The pith

LGAA reuses layers from multi-view diffusion models to generate PBR-ready 3D assets from only 69k instances.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Lightweight Gaussian Asset Adapter (LGAA), a modular plug-in that attaches to existing multi-view diffusion models to create 3D assets with both geometry and physically based rendering materials. It reuses and adapts network layers trained on billions of images so that fine-tuning on a small set of 69,000 multi-view examples still converges well and preserves useful 2D knowledge. The design includes a wrapper for layer reuse, a switcher to combine multiple priors, and a decoder that outputs 2D Gaussian splats with PBR channels, followed by post-processing to extract relightable meshes. A sympathetic reader would care because most 3D generation methods either ignore materials or need far larger datasets, and this method offers an end-to-end, data-efficient route to production-ready assets.

Core claim

LGAA unifies geometry and PBR material modeling by exploiting multi-view diffusion priors through a modular design: the LGAA Wrapper reuses and adapts network layers from MV diffusion models to preserve 2D priors for better convergence, the LGAA Switcher aligns multiple wrapper layers that encapsulate different knowledge, and the LGAA Decoder, a tamed variational autoencoder, predicts 2D Gaussian Splatting with PBR channels. A dedicated post-processing procedure then extracts high-quality, relightable mesh assets from the resulting 2DGS. Experiments demonstrate superior performance with both text- and image-conditioned MV diffusion models and data-efficient finetuning with merely 69k multi-v

What carries the argument

Lightweight Gaussian Asset Adapter (LGAA), a plug-in module whose Wrapper reuses MV diffusion layers, Switcher aligns multiple priors, and Decoder predicts 2DGS with PBR channels, thereby lifting pre-trained models for unified 3D asset generation.
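
To make the Decoder's output concrete, here is a minimal sketch of how one flat per-splat prediction vector could be split into 2DGS geometry attributes and PBR channels. The channel order and widths (`LAYOUT`), the `Splat` type, and `split_channels` are all hypothetical names and choices made for illustration; the summary above does not state the actual layout the authors use.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Hypothetical channel layout as (name, width). Not taken from the paper.
LAYOUT: Tuple[Tuple[str, int], ...] = (
    ("position", 3),   # splat center
    ("scale", 2),      # 2DGS splats are planar disks, so two scale axes
    ("rotation", 4),   # orientation stored as a quaternion (assumption)
    ("opacity", 1),
    ("albedo", 3),     # PBR base color (RGB)
    ("metallic", 1),
    ("roughness", 1),
)
NUM_CHANNELS = sum(width for _, width in LAYOUT)  # 15 under this layout


@dataclass
class Splat:
    position: Tuple[float, ...]
    scale: Tuple[float, ...]
    rotation: Tuple[float, ...]
    opacity: Tuple[float, ...]
    albedo: Tuple[float, ...]
    metallic: Tuple[float, ...]
    roughness: Tuple[float, ...]


def split_channels(vec: Sequence[float]) -> Splat:
    """Slice one flat prediction vector into named splat attributes."""
    if len(vec) != NUM_CHANNELS:
        raise ValueError(f"expected {NUM_CHANNELS} channels, got {len(vec)}")
    fields, start = {}, 0
    for name, width in LAYOUT:
        fields[name] = tuple(vec[start:start + width])
        start += width
    return Splat(**fields)
```

Keeping geometry and material attributes in one vector is what lets a single decoder head emit both; a different layout would only change `LAYOUT`.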

Load-bearing premise

Reusing and adapting network layers from pre-trained MV diffusion models preserves 2D priors sufficiently to enable better convergence and superior 3D PBR performance when finetuned on only 69k multi-view instances.

What would settle it

A model trained from scratch on the same 69k multi-view instances that matches or exceeds LGAA in convergence speed and final 3D PBR asset quality would show that layer reuse is not necessary for the claimed data efficiency.
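
The test above can be phrased as a simple comparison of two training runs. The sketch below assumes each run is summarized by a list of validation losses, one per evaluation step, and uses the final loss as a stand-in for final asset quality; both are simplifying assumptions, and the function names are hypothetical.

```python
from typing import Optional, Sequence


def steps_to_threshold(losses: Sequence[float], threshold: float) -> Optional[int]:
    """Return the first step whose loss is at or below threshold, else None."""
    for step, loss in enumerate(losses):
        if loss <= threshold:
            return step
    return None


def reuse_is_unnecessary(
    scratch: Sequence[float], reused: Sequence[float], threshold: float
) -> bool:
    """True if the from-scratch run converges at least as fast as the
    layer-reuse run AND ends at a loss that is at least as low."""
    s = steps_to_threshold(scratch, threshold)
    r = steps_to_threshold(reused, threshold)
    if s is None:
        return False  # from-scratch never reached the threshold
    as_fast = r is None or s <= r
    as_good = scratch[-1] <= reused[-1]
    return as_fast and as_good


# Illustrative curves only; these are not results from the paper.
scratch_run = [1.0, 0.8, 0.6, 0.5, 0.45]
reused_run = [1.0, 0.5, 0.3, 0.25, 0.2]
print(reuse_is_unnecessary(scratch_run, reused_run, threshold=0.5))
```

A real comparison would also need matched compute, identical data splits, and task metrics for geometry and PBR quality rather than a single loss value.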

Figures

Figures reproduced from arXiv: 2509.07435 by Jian Yang, Jiaxiong Qiu, Jin Xie, Liu Liu, Wei Sui, Xinjie Wang, Ze-Xin Yin, Zhizhong Su.

**Figure 1.** Our pipeline possesses the capability of generating diverse, PBR-ready 3D assets from either text prompts or image …

**Figure 2.** Overall of the 3D asset generation pipeline. We propose the …

**Figure 3.** Visual comparisons of text-conditioned 3D asset generation methods. For LGM and LaRa, we use MVDream 2.1 to …

**Figure 4.** Visual comparisons of image-conditioned 3D asset generation methods. For LGM and LaRa, we use ImageDream to …

**Figure 5.** Relighting results under different HDRI maps. The synthesized diffuse materials exhibit base color changes under …

**Figure 6.** We provide detailed visualizations of geometry and PBR materials from the generated 3D assets, along with the input …

Original abstract

The labor- and experience-intensive creation of 3D assets with physically based rendering (PBR) materials demands an autonomous 3D asset creation pipeline. However, most existing 3D generation methods focus on geometry modeling, either baking textures into simple vertex colors or leaving texture synthesis to post-processing with image diffusion models. To achieve end-to-end PBR-ready 3D asset generation, we present Lightweight Gaussian Asset Adapter (LGAA), a novel framework that unifies the modeling of geometry and PBR materials by exploiting multi-view (MV) diffusion priors from a novel perspective. The LGAA features a modular design with three components. Specifically, the LGAA Wrapper reuses and adapts network layers from MV diffusion models, which encapsulate knowledge acquired from billions of images, enabling better convergence in a data-efficient manner. To incorporate multiple diffusion priors for geometry and PBR synthesis, the LGAA Switcher aligns multiple LGAA Wrapper layers encapsulating different knowledge. Then, a tamed variational autoencoder (VAE), termed LGAA Decoder, is designed to predict 2D Gaussian Splatting (2DGS) with PBR channels. Finally, we introduce a dedicated post-processing procedure to effectively extract high-quality, relightable mesh assets from the resulting 2DGS. Extensive quantitative and qualitative experiments demonstrate the superior performance of LGAA with both text- and image-conditioned MV diffusion models. Additionally, the modular design enables flexible incorporation of multiple diffusion priors, and the knowledge-preserving scheme effectively preseves the 2D priors learned on massive image dataset, which leads to data efficient finetuning to lift the MV diffuison models for 3D generation with merely 69k multi-view instances.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

LGAA is a practical modular adapter for lifting MV diffusion models to 2DGS with PBR, but the data-efficiency story hinges on unshown checks that the 2D priors survive the 69k-instance fine-tune.


The main point to know is that this paper presents a modular plug-in called LGAA to adapt multi-view diffusion models for generating 3D assets that include physically based rendering materials, using 2D Gaussian Splatting as the core representation. The new part is the specific three-component architecture. The LGAA Wrapper reuses and fine-tunes layers from existing MV diffusion models to carry over their 2D knowledge into the 3D task. The LGAA Switcher then combines wrappers trained on different priors, one for geometry and one for PBR. Finally, the LGAA Decoder, a tamed VAE, predicts the 2DGS with added PBR channels. The authors follow this with a post-processing step to extract high-quality relightable meshes. The claim is that this leads to better performance and allows fine-tuning with only 69k multi-view instances because the 2D priors are preserved.

This approach has merit in its engineering practicality. Reusing pre-trained models makes sense for efficiency, and the modularity could let people swap in various diffusion backbones without retraining everything. If the experiments in the full paper show clear gains over baselines with proper controls, that would be a useful advance for graphics pipelines.

The soft spots center on the evidence for the data-efficiency story. The abstract asserts superior results and preservation of priors but gives no quantitative metrics, ablations, or comparisons such as feature similarity before and after adaptation or performance against a from-scratch model. The concern that fine-tuning on 69k instances might overwrite rather than preserve the relevant knowledge is reasonable until those checks appear. Minor issues include missing detail on dataset construction and evaluation protocols.

This paper is for people working on diffusion-based 3D generation who need a way to incorporate PBR without heavy new training. A practitioner or researcher focused on practical implementations would get the most out of it.

I recommend sending it for peer review. The framework is clearly described and the goal is well-motivated, so referees can assess the experimental support and suggest improvements where needed.

Referee Report

2 major / 2 minor

Summary. The paper proposes Lightweight Gaussian Asset Adapter (LGAA), a modular plug-in framework to lift pre-trained multi-view (MV) diffusion models for end-to-end 3D asset generation with geometry and PBR materials. LGAA Wrapper reuses and adapts layers from MV diffusion models to preserve 2D priors for data-efficient finetuning; LGAA Switcher aligns multiple such wrappers for geometry and PBR priors; LGAA Decoder (a tamed VAE) predicts 2D Gaussian Splatting with PBR channels; a post-processing step extracts relightable meshes. The authors claim superior performance over existing methods via extensive quantitative and qualitative experiments on both text- and image-conditioned MV models, achieved through data-efficient finetuning on only 69k multi-view instances.

Significance. If the empirical claims hold, the modular reuse of billion-image 2D priors for 3D PBR generation would represent a practical advance in data-efficient 3D asset pipelines, reducing reliance on large-scale 3D datasets while enabling flexible combination of geometry and material priors. The explicit post-processing to relightable meshes and the 2DGS output format are also potentially useful for downstream applications.

major comments (2)

Abstract: the central claim that LGAA 'effectively preserves the 2D priors learned on massive image dataset' and thereby enables 'data efficient finetuning ... with merely 69k multi-view instances' is unsupported by any reported quantitative check (feature similarity, retained 2D generation quality after adaptation, or from-scratch ablation). Without such evidence the attribution of gains to prior preservation rather than to the new architecture or training protocol remains unverified.
Abstract and Experiments section: no numerical metrics, ablation tables, error bars, dataset construction details, or evaluation protocol (e.g., metrics for geometry, PBR, or relighting quality) are supplied to substantiate the repeated assertion of 'superior performance.' This absence prevents assessment of whether the reported gains are statistically meaningful or merely qualitative.
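
The feature-similarity check named in the first major comment could look roughly like the sketch below: feed the same inputs through a layer before and after adaptation, then average the cosine similarity of the paired activation vectors. How activations are extracted from the model is left out, and the function names are hypothetical; this is one possible check, not a protocol the paper describes.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length activation vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mean_feature_similarity(
    before: Sequence[Sequence[float]], after: Sequence[Sequence[float]]
) -> float:
    """Average cosine similarity over paired activations, where before[i]
    and after[i] come from the same input sample."""
    if len(before) != len(after) or not before:
        raise ValueError("need equally sized, non-empty activation sets")
    total = sum(cosine_similarity(b, a) for b, a in zip(before, after))
    return total / len(before)
```

A value near 1 would suggest the adapted layer still responds much as the original did; a low value would support the worry that fine-tuning overwrote the prior. Either reading needs a baseline, for example the similarity obtained after fine-tuning from random initialization.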

minor comments (2)

Abstract: typographical errors ('preseves', 'diffuison') should be corrected.
Abstract: the sentence 'the modular design enables flexible incorporation of multiple diffusion priors, and the knowledge-preserving scheme effectively preseves...' is run-on and should be split for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. The two major comments identify important gaps in evidentiary support for our central claims. We agree that additional quantitative checks and detailed reporting are needed to strengthen the manuscript and will revise accordingly.


Referee: Abstract: the central claim that LGAA 'effectively preserves the 2D priors learned on massive image dataset' and thereby enables 'data efficient finetuning ... with merely 69k multi-view instances' is unsupported by any reported quantitative check (feature similarity, retained 2D generation quality after adaptation, or from-scratch ablation). Without such evidence the attribution of gains to prior preservation rather than to the new architecture or training protocol remains unverified.

Authors: We agree that the current version does not contain direct quantitative verification of prior preservation, such as feature-space similarity between adapted and original MV diffusion layers or an explicit from-scratch training ablation. The manuscript relies on indirect evidence through overall performance gains and the modular reuse design. We will add a dedicated ablation subsection that reports (i) performance when the LGAA Wrapper is trained from random initialization versus initialized from MV diffusion weights and (ii) any feasible retained 2D generation quality metrics after adaptation. This revision will allow readers to assess the contribution of the preserved priors more rigorously.
revision: yes
Referee: Abstract and Experiments section: no numerical metrics, ablation tables, error bars, dataset construction details, or evaluation protocol (e.g., metrics for geometry, PBR, or relighting quality) are supplied to substantiate the repeated assertion of 'superior performance.' This absence prevents assessment of whether the reported gains are statistically meaningful or merely qualitative.

Authors: The experiments section does contain quantitative comparisons, yet we acknowledge that error bars, complete dataset construction details for the 69k instances, and explicit per-category metrics for geometry, PBR material quality, and relighting fidelity are insufficiently reported. We will expand the experiments section with (i) full ablation tables including numerical values and standard deviations, (ii) a detailed description of the multi-view dataset curation and splits, and (iii) additional evaluation protocols and metrics specifically for geometry accuracy, PBR channel fidelity, and relighting quality under novel lighting. These additions will make the superiority claims quantitatively verifiable.
revision: yes
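
As a small illustration of the "numerical values and standard deviations" promised above, the sketch below aggregates per-seed scores into a mean and sample standard deviation for each metric. The metric names and numbers in the usage lines are placeholders, not values from the paper, and `summarize` is a hypothetical helper.

```python
import statistics
from typing import Dict, Sequence, Tuple


def summarize(runs: Dict[str, Sequence[float]]) -> Dict[str, Tuple[float, float]]:
    """Map each metric name to (mean, sample standard deviation) across seeds."""
    summary = {}
    for name, values in runs.items():
        if len(values) < 2:
            raise ValueError(f"{name}: need at least two seeds for a std dev")
        summary[name] = (statistics.mean(values), statistics.stdev(values))
    return summary


# Placeholder metric names and scores, one entry per random seed.
example = {"geometry_score": [1.0, 2.0, 3.0], "pbr_score": [0.5, 0.5, 0.5]}
for metric, (mean, std) in summarize(example).items():
    print(f"{metric}: {mean:.3f} ± {std:.3f}")
```

Reporting the spread alongside the mean is what would let a reader judge whether a gap between two methods exceeds seed-to-seed noise.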

Circularity Check

0 steps flagged

No circularity: modular reuse of external pre-trained priors with independent experimental validation


The paper describes an engineering framework (LGAA Wrapper reusing MV diffusion layers, LGAA Switcher for alignment, LGAA Decoder for 2DGS+PBR prediction) that builds on external pre-trained models trained on billions of images. The central data-efficiency claim (finetuning on 69k instances) is presented as an empirical outcome of prior preservation rather than a quantity defined by or fitted to the target 3D results themselves. No equations, self-definitional loops, fitted-input predictions, or load-bearing self-citations appear in the provided text; the derivation chain consists of architectural choices justified by external diffusion priors and validated through quantitative/qualitative experiments.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 3 invented entities

The central claim rests on the effectiveness of the modular adapter design in preserving pre-trained 2D diffusion knowledge while adapting to 3D PBR output; this depends on domain assumptions about diffusion model capabilities and introduces several new invented components without independent falsifiable evidence beyond the claimed experiments.

axioms (1)

domain assumption · Multi-view diffusion models encapsulate knowledge acquired from billions of images that can be reused for better convergence in 3D tasks.
Invoked in the abstract to justify the Wrapper component and data-efficient finetuning.

invented entities (3)

LGAA Wrapper · no independent evidence
purpose: Reuses and adapts network layers from MV diffusion models
New modular component introduced to encapsulate and adapt 2D priors.

LGAA Switcher · no independent evidence
purpose: Aligns multiple LGAA Wrapper layers encapsulating different knowledge for geometry and PBR
New component to incorporate multiple diffusion priors.

LGAA Decoder · no independent evidence
purpose: Tamed variational autoencoder to predict 2D Gaussian Splatting with PBR channels
New decoder design for the 3D output representation.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost J_uniquely_calibrated_via_higher_derivative unclear


trained on only 69k multi-view instances... modular design enables flexible incorporation of multiple diffusion priors

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

96 extracted references · 96 canonical work pages · 11 internal anchors

[1]

Hunyuan3d 1.0: A unified frame- work for text-to-3d and image-to-3d generation,

X. Yang, H. Shi, B. Zhang, F. Yang, J. Wang, H. Zhao, X. Liu, X. Wang, Q. Lin, J. Yu et al. , “Hunyuan3d 1.0: A unified frame- work for text-to-3d and image-to-3d generation,” arXiv preprint arXiv:2411.02293, 2024. 1

work page arXiv 2024
[2]

Clay: A controllable large-scale generative model for creating high-quality 3d assets,

L. Zhang, Z. Wang, Q. Zhang, Q. Qiu, A. Pang, H. Jiang, W. Yang, L. Xu, and J. Yu, “Clay: A controllable large-scale generative model for creating high-quality 3d assets,” ACM T ransactions on Graphics (TOG), vol. 43, no. 4, pp. 1–20, 2024. 1

work page 2024
[3]

Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

Z. Zhao, Z. Lai, Q. Lin, Y. Zhao, H. Liu, S. Yang, Y. Feng, M. Yang, S. Zhang, X. Yang et al., “Hunyuan3d 2.0: Scaling diffusion models for high resolution textured 3d assets generation,” arXiv preprint arXiv:2501.12202, 2025. 1, 3

work page internal anchor Pith review Pith/arXiv arXiv 2025
[4]

arXiv2504.07943(2025) 8

Y. Yang, Y.-C. Guo, Y. Huang, Z.-X. Zou, Z. Yu, Y. Li, Y.-P . Cao, and X. Liu, “Holopart: Generative 3d part amodal segmentation,” arXiv preprint arXiv:2504.07943 , 2025. 1

work page arXiv 2025
[5]

Sparseflex: High-resolution and arbitrary-topology 3d shape modeling,

X. He, Z.-X. Zou, C.-H. Chen, Y.-C. Guo, D. Liang, C. Yuan, W. Ouyang, Y.-P . Cao, and Y. Li, “Sparseflex: High-resolution and arbitrary-topology 3d shape modeling,” arXiv preprint arXiv:2503.21732, 2025. 1

work page arXiv 2025
[6]

TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models

Y. Li, Z.-X. Zou, Z. Liu, D. Wang, Y. Liang, Z. Yu, X. Liu, Y.-C. Guo, D. Liang, W. Ouyang et al., “Triposg: High-fidelity 3d shape synthesis using large-scale rectified flow models,” arXiv preprint arXiv:2502.06608, 2025. 1

work page internal anchor Pith review Pith/arXiv arXiv 2025
[7]

Direct3d-s2: Gigascale 3d generation made easy with spatial sparse attention.arXiv preprint arXiv:2505.17412, 2025

S. Wu, Y. Lin, F. Zhang, Y. Zeng, Y. Yang, Y. Bao, J. Qian, S. Zhu, P . Torr, X. Cao, and Y. Yao, “Direct3d-s2: Gigascale 3d generation made easy with spatial sparse attention,” arXiv preprint arXiv:2505.17412, 2025. 1, 3

work page arXiv 2025
[8]

arXiv2505.14521(2025) 6, 8, 10, 11

Z. Li, Y. Wang, H. Zheng, Y. Luo, and B. Wen, “Sparc3d: Sparse representation and construction for high-resolution 3d shapes modeling,” arXiv preprint arXiv:2505.14521 , 2025. 1, 3

work page arXiv 2025
[9]

DreamFusion: Text-to-3D using 2D Diffusion

B. Poole, A. Jain, J. T. Barron, and B. Mildenhall, “Dreamfusion: Text-to-3d using 2d diffusion,” arXiv preprint arXiv:2209.14988 ,

work page internal anchor Pith review Pith/arXiv arXiv
[10]

Score jacobian chaining: Lifting pretrained 2d diffusion models for 3d generation,

H. Wang, X. Du, J. Li, R. A. Yeh, and G. Shakhnarovich, “Score jacobian chaining: Lifting pretrained 2d diffusion models for 3d generation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , 2023, pp. 12 619–12 629. 1

work page 2023
[11]

Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content cre- ation,

R. Chen, Y. Chen, N. Jiao, and K. Jia, “Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content cre- ation,” in Proceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 22 246–22 256. 1, 3

work page 2023
[12]

Magic3d: High- resolution text-to-3d content creation,

C.-H. Lin, J. Gao, L. Tang, T. Takikawa, X. Zeng, X. Huang, 12 K. Kreis, S. Fidler, M.-Y. Liu, and T.-Y. Lin, “Magic3d: High- resolution text-to-3d content creation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , 2023, pp. 300–309. 1, 3

work page 2023
[13]

Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation,

Z. Wang, C. Lu, Y. Wang, F. Bao, C. Li, H. Su, and J. Zhu, “Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation,” Advances in Neural Information Processing Systems, vol. 36, 2024. 1, 3

work page 2024
[14]

Text-to-3d using gaussian splatting,

Z. Chen, F. Wang, Y. Wang, and H. Liu, “Text-to-3d using gaussian splatting,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , 2024, pp. 21 401–21 412. 1, 3

work page 2024
[15]

DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation

J. Tang, J. Ren, H. Zhou, Z. Liu, and G. Zeng, “Dreamgaussian: Generative gaussian splatting for efficient 3d content creation,” arXiv preprint arXiv:2309.16653 , 2023. 1, 3

work page internal anchor Pith review Pith/arXiv arXiv 2023
[16]

Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model

R. Shi, H. Chen, Z. Zhang, M. Liu, C. Xu, X. Wei, L. Chen, C. Zeng, and H. Su, “Zero123++: a single image to consistent multi-view diffusion base model,” arXiv preprint arXiv:2310.15110 , 2023. 1, 3

work page internal anchor Pith review Pith/arXiv arXiv 2023
[17]

Wonder3d: Single image to 3d using cross-domain diffusion,

X. Long, Y.-C. Guo, C. Lin, Y. Liu, Z. Dou, L. Liu, Y. Ma, S.- H. Zhang, M. Habermann, C. Theobalt et al. , “Wonder3d: Single image to 3d using cross-domain diffusion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , 2024, pp. 9970–9980. 1, 3

work page 2024
[18]

SyncDreamer: Generating Multiview-consistent Images from a Single-view Image

Y. Liu, C. Lin, Z. Zeng, X. Long, L. Liu, T. Komura, and W. Wang, “Syncdreamer: Generating multiview-consistent images from a single-view image,” arXiv preprint arXiv:2309.03453 , 2023. 1, 3

work page internal anchor Pith review Pith/arXiv arXiv 2023
[19]

MVDream: Multi-view Diffusion for 3D Generation

Y. Shi, P . Wang, J. Ye, M. Long, K. Li, and X. Yang, “Mv- dream: Multi-view diffusion for 3d generation,” arXiv preprint arXiv:2308.16512, 2023. 1, 3, 4, 5, 8

work page internal anchor Pith review Pith/arXiv arXiv 2023
[20]

Instant3d: Fast text-to-3d with sparse-view generation and large reconstruction model.arXiv preprint arXiv:2311.06214, 2023

J. Li, H. Tan, K. Zhang, Z. Xu, F. Luan, Y. Xu, Y. Hong, K. Sunkavalli, G. Shakhnarovich, and S. Bi, “Instant3d: Fast text-to-3d with sparse-view generation and large reconstruction model,” arXiv preprint arXiv:2311.06214 , 2023. 1, 3

work page arXiv 2023
[21]

LRM: Large Reconstruction Model for Single Image to 3D

Y. Hong, K. Zhang, J. Gu, S. Bi, Y. Zhou, D. Liu, F. Liu, K. Sunkavalli, T. Bui, and H. Tan, “Lrm: Large reconstruction model for single image to 3d,” arXiv preprint arXiv:2311.04400 ,

work page internal anchor Pith review Pith/arXiv arXiv
[22]

Dmv3d: Denoising multi-view diffu- sion using 3d large reconstruction model.arXiv preprint arXiv:2311.09217, 2023

Y. Xu, H. Tan, F. Luan, S. Bi, P . Wang, J. Li, Z. Shi, K. Sunkavalli, G. Wetzstein, Z. Xu et al. , “Dmv3d: Denoising multi-view dif- fusion using 3d large reconstruction model,” arXiv preprint arXiv:2311.09217, 2023. 1

work page arXiv 2023
[23]

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models

J. Xu, W. Cheng, Y. Gao, X. Wang, S. Gao, and Y. Shan, “In- stantmesh: Efficient 3d mesh generation from a single image with sparse-view large reconstruction models,” arXiv preprint arXiv:2404.07191, 2024. 1, 3

work page internal anchor Pith review Pith/arXiv arXiv 2024
[24]

arXiv2402.05054(2024) 11

J. Tang, Z. Chen, X. Chen, T. Wang, G. Zeng, and Z. Liu, “Lgm: Large multi-view gaussian model for high-resolution 3d content creation,” arXiv preprint arXiv:2402.05054 , 2024. 1, 3, 6, 8

work page arXiv 2024
[25]

Grm: Large gaussian reconstruction model for efficient 3d reconstruction and generation

Y. Xu, Z. Shi, W. Yifan, H. Chen, C. Yang, S. Peng, Y. Shen, and G. Wetzstein, “Grm: Large gaussian reconstruction model for efficient 3d reconstruction and generation,” arXiv preprint arXiv:2403.14621, 2024. 1, 3

work page arXiv 2024
[26]

3dtopia-xl: Scaling high-quality 3d asset gen- eration via primitive diffusion,

Z. Chen, J. Tang, Y. Dong, Z. Cao, F. Hong, Y. Lan, T. Wang, H. Xie, T. Wu, S. Saitoet al., “3dtopia-xl: Scaling high-quality 3d asset gen- eration via primitive diffusion,” arXiv preprint arXiv:2409.12957 ,

work page arXiv
[27]

Huang, Y

Z. Huang, Y.-C. Guo, H. Wang, R. Yi, L. Ma, Y.-P . Cao, and L. Sheng, “Mv-adapter: Multi-view consistent image generation made easy,” arXiv preprint arXiv:2412.03632 , 2024. 1

work page arXiv 2024
[28]

NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction

P . Wang, L. Liu, Y. Liu, C. Theobalt, T. Komura, and W. Wang, “Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction,” arXiv preprint arXiv:2106.10689 , 2021. 1, 3

work page internal anchor Pith review Pith/arXiv arXiv 2021
[29]

Nerf: Representing scenes as neural radiance fields for view synthesis,

B. Mildenhall, P . P . Srinivasan, M. Tancik, J. T. Barron, R. Ra- mamoorthi, and R. Ng, “Nerf: Representing scenes as neural radiance fields for view synthesis,” Communications of the ACM , vol. 65, no. 1, pp. 99–106, 2021. 1, 3

work page 2021
[30]

Richdreamer: A generalizable normal-depth diffusion model for detail richness in text-to-3d,

L. Qiu, G. Chen, X. Gu, Q. Zuo, M. Xu, Y. Wu, W. Yuan, Z. Dong, L. Bo, and X. Han, “Richdreamer: A generalizable normal-depth diffusion model for detail richness in text-to-3d,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , 2024, pp. 9914–9925. 1, 2, 3, 7

work page 2024
[31]

arXiv preprint arXiv:2412.12083 (2024)

Z. Li, T. Wu, J. Tan, M. Zhang, J. Wang, and D. Lin, “Idarb: Intrinsic decomposition for arbitrary number of input views and illuminations,” arXiv preprint arXiv:2412.12083 , 2024. 1, 3, 4, 5, 8

work page arXiv 2024
[32]

arXiv preprint arXiv:2501.18590 (2025)

R. Liang, Z. Gojcic, H. Ling, J. Munkberg, J. Hasselgren, Z.-H. Lin, J. Gao, A. Keller, N. Vijaykumar, S. Fidler et al. , “Diffusionren- derer: Neural inverse and forward rendering with video diffusion models,” arXiv preprint arXiv:2501.18590 , 2025. 1

work page arXiv 2025
[33]

Depth anything at any condition.arXiv preprint arXiv:2507.01634, 2025

B. Sun, M. Jin, B. Yin, and Q. Hou, “Depth anything at any condition,” arXiv preprint arXiv:2507.01634 , 2025. 1

work page arXiv 2025
[34]

3d gaussian splatting for real-time radiance field rendering,

B. Kerbl, G. Kopanas, T. Leimk ¨uhler, and G. Drettakis, “3d gaussian splatting for real-time radiance field rendering,” ACM T ransactions on Graphics, vol. 42, no. 4, July 2023. [Online]. Avail- able: https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/ 2, 3, 4, 7

work page 2023
[35]

2d gaussian splatting for geometrically accurate radiance fields,

B. Huang, Z. Yu, A. Chen, A. Geiger, and S. Gao, “2d gaussian splatting for geometrically accurate radiance fields,” in ACM SIGGRAPH 2024 Conference Papers , 2024, pp. 1–11. 2, 4, 7

work page 2024
[36]

Splatter image: Ultra-fast single-view 3d reconstruction,

S. Szymanowicz, C. Rupprecht, and A. Vedaldi, “Splatter image: Ultra-fast single-view 3d reconstruction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , 2024, pp. 10 208–10 217. 2, 4

work page 2024
[37]

Objaverse: A universe of annotated 3d objects,

M. Deitke, D. Schwenk, J. Salvador, L. Weihs, O. Michel, E. Van- derBilt, L. Schmidt, K. Ehsani, A. Kembhavi, and A. Farhadi, “Objaverse: A universe of annotated 3d objects,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , 2023, pp. 13 142–13 153. 2, 3, 7

work page 2023
[38]

Denoising diffusion probabilistic models,

J. Ho, A. Jain, and P . Abbeel, “Denoising diffusion probabilistic models,” Advances in neural information processing systems , vol. 33, pp. 6840–6851, 2020. 3

work page 2020
[39]

Flexible isosurface extraction for gradient-based mesh optimization,

T. Shen, J. Munkberg, J. Hasselgren, K. Yin, Z. Wang, W. Chen, Z. Gojcic, S. Fidler, N. Sharp, and J. Gao, “Flexible isosurface extraction for gradient-based mesh optimization,” ACM T rans. Graph. , vol. 42, no. 4, jul 2023. [Online]. Available: https://doi.org/10.1145/3592430 3

work page doi:10.1145/3592430 2023
[40]

Deep march- ing tetrahedra: a hybrid representation for high-resolution 3d shape synthesis,

T. Shen, J. Gao, K. Yin, M.-Y. Liu, and S. Fidler, “Deep march- ing tetrahedra: a hybrid representation for high-resolution 3d shape synthesis,” Advances in Neural Information Processing Systems, vol. 34, pp. 6087–6101, 2021. 3

work page 2021
[41]

Occupancy networks: Learning 3d reconstruction in function space,

L. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin, and A. Geiger, “Occupancy networks: Learning 3d reconstruction in function space,” in Proceedings of the IEEE/CVF conference on com- puter vision and pattern recognition , 2019, pp. 4460–4470. 3

work page 2019
[42]

arXiv preprint arXiv:2310.19415 , year=

X. Yu, Y.-C. Guo, Y. Li, D. Liang, S.-H. Zhang, and X. Qi, “Text-to-3d with classifier score distillation,” arXiv preprint arXiv:2310.19415, 2023. 3

work page arXiv 2023
[43]

Lucid- dreamer: Towards high-fidelity text-to-3d generation via interval score matching,

Y. Liang, X. Yang, J. Lin, H. Li, X. Xu, and Y. Chen, “Lucid- dreamer: Towards high-fidelity text-to-3d generation via interval score matching,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , 2024, pp. 6517–6526. 3

work page 2024
[44]

Gaussiandreamer: Fast generation from text to 3d gaussian splatting with point cloud priors

T. Yi, J. Fang, G. Wu, L. Xie, X. Zhang, W. Liu, Q. Tian, and X. Wang, “Gaussiandreamer: Fast generation from text to 3d gaussian splatting with point cloud priors,” arXiv preprint arXiv:2310.08529, 2023. 3

work page arXiv 2023
[45]

arXiv2404.19702(2024) 11

K. Zhang, S. Bi, H. Tan, Y. Xiangli, N. Zhao, K. Sunkavalli, and Z. Xu, “Gs-lrm: Large reconstruction model for 3d gaussian splatting,” arXiv preprint arXiv:2404.19702 , 2024. 3

work page arXiv 2024
[46]

Ctrl-room: Controllable text-to-3d room meshes generation with layout constraints,

C. Fang, X. Hu, K. Luo, and P. Tan, “Ctrl-room: Controllable text-to-3d room meshes generation with layout constraints,” arXiv preprint arXiv:2310.03602, 2023. 3

work page arXiv 2023
[47]

Dreamgaussian4d: Generative 4d gaussian splatting,

J. Ren, L. Pan, J. Tang, C. Zhang, A. Cao, G. Zeng, and Z. Liu, “Dreamgaussian4d: Generative 4d gaussian splatting,” arXiv preprint arXiv:2312.17142, 2023. 3

work page arXiv 2023
[48]

Text-to-4d dynamic scene generation,

U. Singer, S. Sheynin, A. Polyak, O. Ashual, I. Makarov, F. Kokkinos, N. Goyal, A. Vedaldi, D. Parikh, J. Johnson et al., “Text-to-4d dynamic scene generation,” arXiv preprint arXiv:2301.11280, 2023. 3

work page arXiv 2023
[49]

Align your gaussians: Text-to-4d with dynamic 3d gaussians and composed diffusion models,

H. Ling, S. W. Kim, A. Torralba, S. Fidler, and K. Kreis, “Align your gaussians: Text-to-4d with dynamic 3d gaussians and composed diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 8576–8588. 3

work page 2024
[50]

Objaverse-xl: A universe of 10m+ 3d objects,

M. Deitke, R. Liu, M. Wallingford, H. Ngo, O. Michel, A. Kusupati, A. Fan, C. Laforte, V. Voleti, S. Y. Gadre et al., “Objaverse-xl: A universe of 10m+ 3d objects,” Advances in Neural Information Processing Systems, vol. 36, 2024. 3, 7

work page 2024
[51]

Ar-1-to-3: Single image to consistent 3d object generation via next-view prediction,

X. Zhang, Y. Zhou, K. Wang, Y. Wang, Z. Li, S. Jiao, D. Zhou, Q. Hou, and M.-M. Cheng, “Ar-1-to-3: Single image to consistent 3d object generation via next-view prediction,” arXiv preprint arXiv:2503.12929, 2025. 3

work page arXiv 2025
[52]

Sweetdreamer: Aligning geometric priors in 2d diffusion for consistent text-to-3d,

W. Li, R. Chen, X. Chen, and P. Tan, “Sweetdreamer: Aligning geometric priors in 2d diffusion for consistent text-to-3d,” arXiv preprint arXiv:2310.02596, 2023. 3

work page arXiv 2023
[53]

Dreamview: Injecting view-specific text guidance into text-to-3d generation,

J. Yan, Y. Gao, Q. Yang, X. Wei, X. Xie, A. Wu, and W.-S. Zheng, “Dreamview: Injecting view-specific text guidance into text-to-3d generation,” in European Conference on Computer Vision. Springer, 2024, pp. 358–374. 3, 4, 5, 8

work page 2024
[54]

Crm: Single image to 3d textured mesh with convolutional reconstruction model,

Z. Wang, Y. Wang, Y. Chen, C. Xiang, S. Chen, D. Yu, C. Li, H. Su, and J. Zhu, “Crm: Single image to 3d textured mesh with convolutional reconstruction model,” arXiv preprint arXiv:2403.05034,

work page arXiv
[55]

Lara: Efficient large-baseline radiance fields,

A. Chen, H. Xu, S. Esposito, S. Tang, and A. Geiger, “Lara: Efficient large-baseline radiance fields,” arXiv preprint arXiv:2407.04699,

work page arXiv
[56]

Turbo3d: Ultra-fast text-to-3d generation,

H. Hu, T. Yin, F. Luan, Y. Hu, H. Tan, Z. Xu, S. Bi, S. Tulsiani, and K. Zhang, “Turbo3d: Ultra-fast text-to-3d generation,” arXiv preprint arXiv:2412.04470, 2024. 3

work page arXiv 2024
[57]

Diffsplat: Repurposing image diffusion models for scalable gaussian splat generation,

C. Lin, P. Pan, B. Yang, Z. Li, and Y. Mu, “Diffsplat: Repurposing image diffusion models for scalable gaussian splat generation,” arXiv preprint arXiv:2501.16764, 2025. 3, 8

work page arXiv 2025
[58]

Structured 3d latents for scalable and versatile 3d generation,

J. Xiang, Z. Lv, S. Xu, Y. Deng, R. Wang, B. Zhang, D. Chen, X. Tong, and J. Yang, “Structured 3d latents for scalable and versatile 3d generation,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 21 469–21 480. 3, 8

work page 2025
[59]

3dshape2vecset: A 3d shape representation for neural fields and generative diffusion models

B. Zhang, J. Tang, M. Nießner, and P. Wonka, “3dshape2vecset: A 3d shape representation for neural fields and generative diffusion models,” ACM Trans. Graph., vol. 42, no. 4, Jul. 2023. [Online]. Available: https://doi.org/10.1145/3592442 3

work page doi:10.1145/3592442 2023
[60]

Craftsman3d: High-fidelity mesh generation with 3d native generation and interactive geometry refiner,

W. Li, J. Liu, H. Yan, R. Chen, Y. Liang, X. Chen, P. Tan, and X. Long, “Craftsman3d: High-fidelity mesh generation with 3d native generation and interactive geometry refiner,” arXiv preprint arXiv:2405.14979, 2024. 3

work page arXiv 2024
[61]

Dora: Sampling and benchmarking for 3d shape variational auto-encoders,

R. Chen, J. Zhang, Y. Liang, G. Luo, W. Li, J. Liu, X. Li, X. Long, J. Feng, and P. Tan, “Dora: Sampling and benchmarking for 3d shape variational auto-encoders,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 16 251–16 261. 3

work page 2025
[62]

Nero: Neural geometry and brdf reconstruction of reflective objects from multiview images,

Y. Liu, P. Wang, C. Lin, X. Long, J. Wang, L. Liu, T. Komura, and W. Wang, “Nero: Neural geometry and brdf reconstruction of reflective objects from multiview images,” ACM Transactions on Graphics (TOG), vol. 42, no. 4, pp. 1–22, 2023. 3

work page 2023
[63]

Gs-ror: 3d gaussian splatting for reflective object relighting via sdf priors,

Z.-L. Zhu, B. Wang, and J. Yang, “Gs-ror: 3d gaussian splatting for reflective object relighting via sdf priors,” arXiv preprint arXiv:2406.18544, 2024. 3

work page arXiv 2024
[64]

Nerfactor: Neural factorization of shape and reflectance under an unknown illumination,

X. Zhang, P. P. Srinivasan, B. Deng, P. Debevec, W. T. Freeman, and J. T. Barron, “Nerfactor: Neural factorization of shape and reflectance under an unknown illumination,” ACM Transactions on Graphics (ToG), vol. 40, no. 6, pp. 1–18, 2021. 3

work page 2021
[65]

Tensosdf: Roughness-aware tensorial representation for robust geometry and material reconstruction,

J. Li, L. Wang, L. Zhang, and B. Wang, “Tensosdf: Roughness-aware tensorial representation for robust geometry and material reconstruction,” ACM Transactions on Graphics (TOG), vol. 43, no. 4, pp. 1–13, 2024. 3

work page 2024
[66]

Gaussian splatting with discretized sdf for relightable assets,

Z.-L. Zhu, J. Yang, and B. Wang, “Gaussian splatting with discretized sdf for relightable assets,” in Proceedings of IEEE International Conference on Computer Vision (ICCV), 2025. 3

work page 2025
[67]

Unidream: Unifying diffusion priors for relightable text-to-3d generation,

Z. Liu, Y. Li, Y. Lin, X. Yu, S. Peng, Y.-P. Cao, X. Qi, X. Huang, D. Liang, and W. Ouyang, “Unidream: Unifying diffusion priors for relightable text-to-3d generation,” arXiv preprint arXiv:2312.08754, 2023. 3

work page arXiv 2023
[68]

Matlaber: Material-aware text-to-3d via latent brdf auto-encoder,

X. Xu, Z. Lyu, X. Pan, and B. Dai, “Matlaber: Material-aware text-to-3d via latent brdf auto-encoder,” arXiv preprint arXiv:2308.09278, 2023. 3

work page arXiv 2023
[69]

Meta 3d assetgen: Text-to-mesh generation with high-quality geometry, texture, and pbr materials,

Y. Siddiqui, T. Monnier, F. Kokkinos, M. Kariya, Y. Kleiman, E. Garreau, O. Gafni, N. Neverova, A. Vedaldi, R. Shapovalov et al., “Meta 3d assetgen: Text-to-mesh generation with high-quality geometry, texture, and pbr materials,” arXiv preprint arXiv:2407.02445, 2024. 3

work page arXiv 2024
[70]

Arm: Appearance reconstruction model for relightable 3d generation,

X. Feng, C. Yu, Z. Bi, Y. Shang, F. Gao, H. Wu, K. Zhou, C. Jiang, and Y. Yang, “Arm: Appearance reconstruction model for relightable 3d generation,” arXiv preprint arXiv:2411.10825, 2024. 3

work page arXiv 2024
[71]

Texgaussian: Generating high-quality pbr material via octree-based 3d gaussian splatting,

B. Xiong, J. Liu, J. Hu, C. Wu, J. Wu, X. Liu, C. Zhao, E. Ding, and Z. Lian, “Texgaussian: Generating high-quality pbr material via octree-based 3d gaussian splatting,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 551–561. 3

work page 2025
[72]

Texgen: a generative diffusion model for mesh textures,

X. Yu, Z. Yuan, Y.-C. Guo, Y.-T. Liu, J. Liu, Y. Li, Y.-P. Cao, D. Liang, and X. Qi, “Texgen: a generative diffusion model for mesh textures,” ACM Transactions on Graphics (TOG), vol. 43, no. 6, pp. 1–14, 2024. 3

work page 2024
[73]

Objects with lighting: A real-world dataset for evaluating reconstruction and rendering for object relighting,

B. Ummenhofer, S. Agrawal, R. Sepúlveda, Y. Lao, K. Zhang, T. Cheng, S. R. Richter, S. Wang, and G. Ros, “Objects with lighting: A real-world dataset for evaluating reconstruction and rendering for object relighting,” in 3DV. IEEE, 2024. 3, 6

work page 2024
[74]

Digital twin catalog: A large-scale photorealistic 3d object digital twin dataset,

Z. Dong, K. Chen, Z. Lv, H.-X. Yu, Y. Zhang, C. Zhang, Y. Zhu, S. Tian, Z. Li, G. Moffatt et al., “Digital twin catalog: A large-scale photorealistic 3d object digital twin dataset,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 753–

work page 2025
[75]

Mage: Single image to material-aware 3d via the multi-view g-buffer estimation model,

H. Wang, Z. Wang, X. Long, C. Lin, G. Hancke, and R. W. Lau, “Mage: Single image to material-aware 3d via the multi-view g-buffer estimation model,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 10 985–10 995. 3

work page 2025
[76]

High-resolution image synthesis with latent diffusion models,

R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 10 684–10 695. 4

work page 2022
[77]

Imagedream: Image-prompt multi-view diffusion for 3d generation,

P. Wang and Y. Shi, “Imagedream: Image-prompt multi-view diffusion for 3d generation,” arXiv preprint arXiv:2312.02201, 2023. 4, 5, 7, 8

work page arXiv 2023
[78]

Adding conditional control to text-to-image diffusion models,

L. Zhang, A. Rao, and M. Agrawala, “Adding conditional control to text-to-image diffusion models,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 3836–3847. 5

work page 2023
[79]

Extracting triangular 3d models, materials, and lighting from images,

J. Munkberg, J. Hasselgren, T. Shen, J. Gao, W. Chen, A. Evans, T. Müller, and S. Fidler, “Extracting triangular 3d models, materials, and lighting from images,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 8280–8290. 5

work page 2022
[80]

Gs-ir: 3d gaussian splatting for inverse rendering,

Z. Liang, Q. Zhang, Y. Feng, Y. Shan, and K. Jia, “Gs-ir: 3d gaussian splatting for inverse rendering,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 21 644–21 653. 5

work page 2024

Showing first 80 references.

[1]

Hunyuan3d 1.0: A unified framework for text-to-3d and image-to-3d generation,

X. Yang, H. Shi, B. Zhang, F. Yang, J. Wang, H. Zhao, X. Liu, X. Wang, Q. Lin, J. Yu et al., “Hunyuan3d 1.0: A unified framework for text-to-3d and image-to-3d generation,” arXiv preprint arXiv:2411.02293, 2024. 1

work page arXiv 2024

[2]

Clay: A controllable large-scale generative model for creating high-quality 3d assets,

L. Zhang, Z. Wang, Q. Zhang, Q. Qiu, A. Pang, H. Jiang, W. Yang, L. Xu, and J. Yu, “Clay: A controllable large-scale generative model for creating high-quality 3d assets,” ACM Transactions on Graphics (TOG), vol. 43, no. 4, pp. 1–20, 2024. 1

work page 2024

[3]

Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

Z. Zhao, Z. Lai, Q. Lin, Y. Zhao, H. Liu, S. Yang, Y. Feng, M. Yang, S. Zhang, X. Yang et al., “Hunyuan3d 2.0: Scaling diffusion models for high resolution textured 3d assets generation,” arXiv preprint arXiv:2501.12202, 2025. 1, 3

work page internal anchor Pith review Pith/arXiv arXiv 2025

[4]

Holopart: Generative 3d part amodal segmentation,

Y. Yang, Y.-C. Guo, Y. Huang, Z.-X. Zou, Z. Yu, Y. Li, Y.-P. Cao, and X. Liu, “Holopart: Generative 3d part amodal segmentation,” arXiv preprint arXiv:2504.07943, 2025. 1

work page arXiv 2025

[5]

Sparseflex: High-resolution and arbitrary-topology 3d shape modeling,

X. He, Z.-X. Zou, C.-H. Chen, Y.-C. Guo, D. Liang, C. Yuan, W. Ouyang, Y.-P. Cao, and Y. Li, “Sparseflex: High-resolution and arbitrary-topology 3d shape modeling,” arXiv preprint arXiv:2503.21732, 2025. 1

work page arXiv 2025

[6]

TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models

Y. Li, Z.-X. Zou, Z. Liu, D. Wang, Y. Liang, Z. Yu, X. Liu, Y.-C. Guo, D. Liang, W. Ouyang et al., “Triposg: High-fidelity 3d shape synthesis using large-scale rectified flow models,” arXiv preprint arXiv:2502.06608, 2025. 1

work page internal anchor Pith review Pith/arXiv arXiv 2025

[7]

Direct3d-s2: Gigascale 3d generation made easy with spatial sparse attention,

S. Wu, Y. Lin, F. Zhang, Y. Zeng, Y. Yang, Y. Bao, J. Qian, S. Zhu, P. Torr, X. Cao, and Y. Yao, “Direct3d-s2: Gigascale 3d generation made easy with spatial sparse attention,” arXiv preprint arXiv:2505.17412, 2025. 1, 3

work page arXiv 2025

[8]

Sparc3d: Sparse representation and construction for high-resolution 3d shapes modeling,

Z. Li, Y. Wang, H. Zheng, Y. Luo, and B. Wen, “Sparc3d: Sparse representation and construction for high-resolution 3d shapes modeling,” arXiv preprint arXiv:2505.14521, 2025. 1, 3

work page arXiv 2025

[9]

DreamFusion: Text-to-3D using 2D Diffusion

B. Poole, A. Jain, J. T. Barron, and B. Mildenhall, “Dreamfusion: Text-to-3d using 2d diffusion,” arXiv preprint arXiv:2209.14988,

work page internal anchor Pith review Pith/arXiv arXiv

[10]

Score jacobian chaining: Lifting pretrained 2d diffusion models for 3d generation,

H. Wang, X. Du, J. Li, R. A. Yeh, and G. Shakhnarovich, “Score jacobian chaining: Lifting pretrained 2d diffusion models for 3d generation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 12 619–12 629. 1

work page 2023

[11]

Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content creation,

R. Chen, Y. Chen, N. Jiao, and K. Jia, “Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content creation,” in Proceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 22 246–22 256. 1, 3

work page 2023

[12]

Magic3d: High-resolution text-to-3d content creation,

C.-H. Lin, J. Gao, L. Tang, T. Takikawa, X. Zeng, X. Huang, K. Kreis, S. Fidler, M.-Y. Liu, and T.-Y. Lin, “Magic3d: High-resolution text-to-3d content creation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 300–309. 1, 3

work page 2023

[13]

Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation,

Z. Wang, C. Lu, Y. Wang, F. Bao, C. Li, H. Su, and J. Zhu, “Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation,” Advances in Neural Information Processing Systems, vol. 36, 2024. 1, 3

work page 2024

[14]

Text-to-3d using gaussian splatting,

Z. Chen, F. Wang, Y. Wang, and H. Liu, “Text-to-3d using gaussian splatting,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 21 401–21 412. 1, 3

work page 2024

[15]

DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation

J. Tang, J. Ren, H. Zhou, Z. Liu, and G. Zeng, “Dreamgaussian: Generative gaussian splatting for efficient 3d content creation,” arXiv preprint arXiv:2309.16653, 2023. 1, 3

work page internal anchor Pith review Pith/arXiv arXiv 2023

[16]

Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model

R. Shi, H. Chen, Z. Zhang, M. Liu, C. Xu, X. Wei, L. Chen, C. Zeng, and H. Su, “Zero123++: a single image to consistent multi-view diffusion base model,” arXiv preprint arXiv:2310.15110, 2023. 1, 3

work page internal anchor Pith review Pith/arXiv arXiv 2023

[17]

Wonder3d: Single image to 3d using cross-domain diffusion,

X. Long, Y.-C. Guo, C. Lin, Y. Liu, Z. Dou, L. Liu, Y. Ma, S.-H. Zhang, M. Habermann, C. Theobalt et al., “Wonder3d: Single image to 3d using cross-domain diffusion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 9970–9980. 1, 3

work page 2024

[18]

SyncDreamer: Generating Multiview-consistent Images from a Single-view Image

Y. Liu, C. Lin, Z. Zeng, X. Long, L. Liu, T. Komura, and W. Wang, “Syncdreamer: Generating multiview-consistent images from a single-view image,” arXiv preprint arXiv:2309.03453, 2023. 1, 3

work page internal anchor Pith review Pith/arXiv arXiv 2023

[19]

MVDream: Multi-view Diffusion for 3D Generation

Y. Shi, P. Wang, J. Ye, M. Long, K. Li, and X. Yang, “Mvdream: Multi-view diffusion for 3d generation,” arXiv preprint arXiv:2308.16512, 2023. 1, 3, 4, 5, 8

work page internal anchor Pith review Pith/arXiv arXiv 2023

[20]

Instant3d: Fast text-to-3d with sparse-view generation and large reconstruction model,

J. Li, H. Tan, K. Zhang, Z. Xu, F. Luan, Y. Xu, Y. Hong, K. Sunkavalli, G. Shakhnarovich, and S. Bi, “Instant3d: Fast text-to-3d with sparse-view generation and large reconstruction model,” arXiv preprint arXiv:2311.06214, 2023. 1, 3

work page arXiv 2023

[21]

LRM: Large Reconstruction Model for Single Image to 3D

Y. Hong, K. Zhang, J. Gu, S. Bi, Y. Zhou, D. Liu, F. Liu, K. Sunkavalli, T. Bui, and H. Tan, “Lrm: Large reconstruction model for single image to 3d,” arXiv preprint arXiv:2311.04400,

work page internal anchor Pith review Pith/arXiv arXiv

[22]

Dmv3d: Denoising multi-view diffusion using 3d large reconstruction model,

Y. Xu, H. Tan, F. Luan, S. Bi, P. Wang, J. Li, Z. Shi, K. Sunkavalli, G. Wetzstein, Z. Xu et al., “Dmv3d: Denoising multi-view diffusion using 3d large reconstruction model,” arXiv preprint arXiv:2311.09217, 2023. 1

work page arXiv 2023

[23]

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models

J. Xu, W. Cheng, Y. Gao, X. Wang, S. Gao, and Y. Shan, “Instantmesh: Efficient 3d mesh generation from a single image with sparse-view large reconstruction models,” arXiv preprint arXiv:2404.07191, 2024. 1, 3

work page internal anchor Pith review Pith/arXiv arXiv 2024

[24]

Lgm: Large multi-view gaussian model for high-resolution 3d content creation,

J. Tang, Z. Chen, X. Chen, T. Wang, G. Zeng, and Z. Liu, “Lgm: Large multi-view gaussian model for high-resolution 3d content creation,” arXiv preprint arXiv:2402.05054, 2024. 1, 3, 6, 8

work page arXiv 2024

[25]

Grm: Large gaussian reconstruction model for efficient 3d reconstruction and generation

Y. Xu, Z. Shi, W. Yifan, H. Chen, C. Yang, S. Peng, Y. Shen, and G. Wetzstein, “Grm: Large gaussian reconstruction model for efficient 3d reconstruction and generation,” arXiv preprint arXiv:2403.14621, 2024. 1, 3

work page arXiv 2024

[26]

3dtopia-xl: Scaling high-quality 3d asset generation via primitive diffusion,

Z. Chen, J. Tang, Y. Dong, Z. Cao, F. Hong, Y. Lan, T. Wang, H. Xie, T. Wu, S. Saito et al., “3dtopia-xl: Scaling high-quality 3d asset generation via primitive diffusion,” arXiv preprint arXiv:2409.12957,

work page arXiv

[27]

Mv-adapter: Multi-view consistent image generation made easy,

Z. Huang, Y.-C. Guo, H. Wang, R. Yi, L. Ma, Y.-P. Cao, and L. Sheng, “Mv-adapter: Multi-view consistent image generation made easy,” arXiv preprint arXiv:2412.03632, 2024. 1

work page arXiv 2024

[28]

NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction

P. Wang, L. Liu, Y. Liu, C. Theobalt, T. Komura, and W. Wang, “Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction,” arXiv preprint arXiv:2106.10689, 2021. 1, 3

work page internal anchor Pith review Pith/arXiv arXiv 2021

[29]

Nerf: Representing scenes as neural radiance fields for view synthesis,

B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “Nerf: Representing scenes as neural radiance fields for view synthesis,” Communications of the ACM, vol. 65, no. 1, pp. 99–106, 2021. 1, 3

work page 2021

[30]

Richdreamer: A generalizable normal-depth diffusion model for detail richness in text-to-3d,

L. Qiu, G. Chen, X. Gu, Q. Zuo, M. Xu, Y. Wu, W. Yuan, Z. Dong, L. Bo, and X. Han, “Richdreamer: A generalizable normal-depth diffusion model for detail richness in text-to-3d,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 9914–9925. 1, 2, 3, 7

work page 2024

[31]

Idarb: Intrinsic decomposition for arbitrary number of input views and illuminations,

Z. Li, T. Wu, J. Tan, M. Zhang, J. Wang, and D. Lin, “Idarb: Intrinsic decomposition for arbitrary number of input views and illuminations,” arXiv preprint arXiv:2412.12083, 2024. 1, 3, 4, 5, 8

work page arXiv 2024

[32]

Diffusionrenderer: Neural inverse and forward rendering with video diffusion models,

R. Liang, Z. Gojcic, H. Ling, J. Munkberg, J. Hasselgren, Z.-H. Lin, J. Gao, A. Keller, N. Vijaykumar, S. Fidler et al., “Diffusionrenderer: Neural inverse and forward rendering with video diffusion models,” arXiv preprint arXiv:2501.18590, 2025. 1

work page arXiv 2025

[33]

Depth anything at any condition,

B. Sun, M. Jin, B. Yin, and Q. Hou, “Depth anything at any condition,” arXiv preprint arXiv:2507.01634, 2025. 1

work page arXiv 2025

[34]

3d gaussian splatting for real-time radiance field rendering,

B. Kerbl, G. Kopanas, T. Leimkühler, and G. Drettakis, “3d gaussian splatting for real-time radiance field rendering,” ACM Transactions on Graphics, vol. 42, no. 4, July 2023. [Online]. Available: https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/ 2, 3, 4, 7

work page 2023

[35]

2d gaussian splatting for geometrically accurate radiance fields,

B. Huang, Z. Yu, A. Chen, A. Geiger, and S. Gao, “2d gaussian splatting for geometrically accurate radiance fields,” in ACM SIGGRAPH 2024 Conference Papers, 2024, pp. 1–11. 2, 4, 7

work page 2024

[36]

Splatter image: Ultra-fast single-view 3d reconstruction,

S. Szymanowicz, C. Rupprecht, and A. Vedaldi, “Splatter image: Ultra-fast single-view 3d reconstruction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 10 208–10 217. 2, 4

work page 2024

[37]

Objaverse: A universe of annotated 3d objects,

M. Deitke, D. Schwenk, J. Salvador, L. Weihs, O. Michel, E. VanderBilt, L. Schmidt, K. Ehsani, A. Kembhavi, and A. Farhadi, “Objaverse: A universe of annotated 3d objects,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 13 142–13 153. 2, 3, 7

work page 2023

[38]

Denoising diffusion probabilistic models,

J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in neural information processing systems, vol. 33, pp. 6840–6851, 2020. 3

work page 2020

[39] [39]

Flexible isosurface extraction for gradient-based mesh optimization,

T. Shen, J. Munkberg, J. Hasselgren, K. Yin, Z. Wang, W. Chen, Z. Gojcic, S. Fidler, N. Sharp, and J. Gao, “Flexible isosurface extraction for gradient-based mesh optimization,” ACM T rans. Graph. , vol. 42, no. 4, jul 2023. [Online]. Available: https://doi.org/10.1145/3592430 3

work page doi:10.1145/3592430 2023

[40] [40]

Deep march- ing tetrahedra: a hybrid representation for high-resolution 3d shape synthesis,

T. Shen, J. Gao, K. Yin, M.-Y. Liu, and S. Fidler, “Deep march- ing tetrahedra: a hybrid representation for high-resolution 3d shape synthesis,” Advances in Neural Information Processing Systems, vol. 34, pp. 6087–6101, 2021. 3

work page 2021

[41] [41]

Occupancy networks: Learning 3d reconstruction in function space,

L. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin, and A. Geiger, “Occupancy networks: Learning 3d reconstruction in function space,” in Proceedings of the IEEE/CVF conference on com- puter vision and pattern recognition , 2019, pp. 4460–4470. 3

work page 2019

[42] [42]

arXiv preprint arXiv:2310.19415 , year=

X. Yu, Y.-C. Guo, Y. Li, D. Liang, S.-H. Zhang, and X. Qi, “Text-to-3d with classifier score distillation,” arXiv preprint arXiv:2310.19415, 2023. 3

work page arXiv 2023

[43] [43]

Lucid- dreamer: Towards high-fidelity text-to-3d generation via interval score matching,

Y. Liang, X. Yang, J. Lin, H. Li, X. Xu, and Y. Chen, “Lucid- dreamer: Towards high-fidelity text-to-3d generation via interval score matching,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , 2024, pp. 6517–6526. 3

work page 2024

[44] [44]

Gaussiandreamer: Fast generation from text to 3d gaussian splatting with point cloud priors

T. Yi, J. Fang, G. Wu, L. Xie, X. Zhang, W. Liu, Q. Tian, and X. Wang, “Gaussiandreamer: Fast generation from text to 3d gaussian splatting with point cloud priors,” arXiv preprint arXiv:2310.08529, 2023. 3

work page arXiv 2023

[45] [45]

arXiv2404.19702(2024) 11

K. Zhang, S. Bi, H. Tan, Y. Xiangli, N. Zhao, K. Sunkavalli, and Z. Xu, “Gs-lrm: Large reconstruction model for 3d gaussian splatting,” arXiv preprint arXiv:2404.19702 , 2024. 3

work page arXiv 2024

[46] [46]

Ctrl-room: Controllable text- to-3d room meshes generation with layout constraints,

C. Fang, X. Hu, K. Luo, and P . Tan, “Ctrl-room: Controllable text-to-3d room meshes generation with layout constraints,” arXiv preprint arXiv:2310.03602, 2023. 3

work page arXiv 2023

[47] [47]

arXiv preprint arXiv:2312.17142 , year=

J. Ren, L. Pan, J. Tang, C. Zhang, A. Cao, G. Zeng, and Z. Liu, “Dreamgaussian4d: Generative 4d gaussian splatting,” arXiv preprint arXiv:2312.17142 , 2023. 3

work page arXiv 2023

[48] [48]

arXiv preprint arXiv:2301.11280 , year=

U. Singer, S. Sheynin, A. Polyak, O. Ashual, I. Makarov, F. Kokki- nos, N. Goyal, A. Vedaldi, D. Parikh, J. Johnson et al., “Text-to-4d dynamic scene generation,” arXiv preprint arXiv:2301.11280 , 2023. 3

work page arXiv 2023

[49] [49]

Align your gaussians: Text-to-4d with dynamic 3d gaussians and composed diffusion models,

H. Ling, S. W. Kim, A. Torralba, S. Fidler, and K. Kreis, “Align your gaussians: Text-to-4d with dynamic 3d gaussians and composed diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , 2024, pp. 8576–8588. 3

work page 2024

[50] [50]

Objaverse-xl: A universe of 10m+ 3d objects,

M. Deitke, R. Liu, M. Wallingford, H. Ngo, O. Michel, A. Kusupati, A. Fan, C. Laforte, V . Voleti, S. Y. Gadre et al. , “Objaverse-xl: A universe of 10m+ 3d objects,” Advances in Neural Information Processing Systems, vol. 36, 2024. 3, 7

work page 2024

[51] [51]

Ar-1-to-3: Single image to consistent 3d object generation via next-view prediction,

X. Zhang, Y. Zhou, K. Wang, Y. Wang, Z. Li, S. Jiao, D. Zhou, Q. Hou, and M.-M. Cheng, “Ar-1-to-3: Single image to consistent 3d object generation via next-view prediction,” arXiv preprint arXiv:2503.12929, 2025. 3

work page arXiv 2025

[52] [52]

Sweetdreamer: Aligning geometric priors in 2d diffusion for consistent text-to-3d,

W. Li, R. Chen, X. Chen, and P . Tan, “Sweetdreamer: Aligning geometric priors in 2d diffusion for consistent text-to-3d,” arXiv preprint arXiv:2310.02596, 2023. 3 13

work page arXiv 2023

[53] [53]

Dreamview: Injecting view-specific text guidance into text-to-3d generation,

J. Yan, Y. Gao, Q. Yang, X. Wei, X. Xie, A. Wu, and W.-S. Zheng, “Dreamview: Injecting view-specific text guidance into text-to-3d generation,” in European Conference on Computer Vision . Springer, 2024, pp. 358–374. 3, 4, 5, 8

work page 2024

[54] [54]

Crm: Single image to 3d textured mesh with convolutional reconstruction model,

Z. Wang, Y. Wang, Y. Chen, C. Xiang, S. Chen, D. Yu, C. Li, H. Su, and J. Zhu, “Crm: Single image to 3d textured mesh with con- volutional reconstruction model,” arXiv preprint arXiv:2403.05034 ,

work page arXiv

[55] [55]

Lara: Efficient large-baseline radiance fields,

A. Chen, H. Xu, S. Esposito, S. Tang, and A. Geiger, “Lara: Efficient large-baseline radiance fields,” arXiv preprint arXiv:2407.04699 ,

work page arXiv

[56] [56]

Turbo3d: Ultra-fast text-to-3d generation,

H. Hu, T. Yin, F. Luan, Y. Hu, H. Tan, Z. Xu, S. Bi, S. Tulsiani, and K. Zhang, “Turbo3d: Ultra-fast text-to-3d generation,” arXiv preprint arXiv:2412.04470, 2024. 3

work page arXiv 2024

[57] [57]

Diffsplat: Repurposing image diffusion models for scalable gaussian splat generation,

C. Lin, P . Pan, B. Yang, Z. Li, and Y. Mu, “Diffsplat: Repurposing image diffusion models for scalable gaussian splat generation,” arXiv preprint arXiv:2501.16764 , 2025. 3, 8

work page arXiv 2025

[58] [58]

Structured 3d latents for scalable and versatile 3d generation,

J. Xiang, Z. Lv, S. Xu, Y. Deng, R. Wang, B. Zhang, D. Chen, X. Tong, and J. Yang, “Structured 3d latents for scalable and versatile 3d generation,” in Proceedings of the Computer Vision and Pattern Recognition Conference , 2025, pp. 21 469–21 480. 3, 8

work page 2025

[59] [59]

3dshape2vecset: A 3d shape representation for neural fields and generative diffusion models

B. Zhang, J. Tang, M. Nießner, and P . Wonka, “3dshape2vecset: A 3d shape representation for neural fields and generative diffusion models,” ACM T rans. Graph. , vol. 42, no. 4, jul 2023. [Online]. Available: https://doi.org/10.1145/3592442 3

work page doi:10.1145/3592442 2023

[60] [60]

arXiv preprint arXiv:2405.14979 (2024) 11

W. Li, J. Liu, H. Yan, R. Chen, Y. Liang, X. Chen, P . Tan, and X. Long, “Craftsman3d: High-fidelity mesh generation with 3d native generation and interactive geometry refiner,” arXiv preprint arXiv:2405.14979, 2024. 3

work page arXiv 2024

[61] [61]

Dora: Sampling and benchmarking for 3d shape variational auto-encoders,

R. Chen, J. Zhang, Y. Liang, G. Luo, W. Li, J. Liu, X. Li, X. Long, J. Feng, and P . Tan, “Dora: Sampling and benchmarking for 3d shape variational auto-encoders,” in Proceedings of the Computer Vision and Pattern Recognition Conference , 2025, pp. 16 251–16 261. 3

work page 2025

[62] [62]

Nero: Neural geometry and brdf reconstruction of reflective objects from multiview images,

Y. Liu, P . Wang, C. Lin, X. Long, J. Wang, L. Liu, T. Komura, and W. Wang, “Nero: Neural geometry and brdf reconstruction of reflective objects from multiview images,” ACM T ransactions on Graphics (TOG), vol. 42, no. 4, pp. 1–22, 2023. 3

work page 2023

[63] [63]

Gs-ror: 3d gaussian splat- ting for reflective object relighting via sdf priors,

Z.-L. Zhu, B. Wang, and J. Yang, “Gs-ror: 3d gaussian splat- ting for reflective object relighting via sdf priors,” arXiv preprint arXiv:2406.18544, 2024. 3

work page arXiv 2024

[64] [64]

Nerfactor: Neural factorization of shape and reflectance under an unknown illumination,

X. Zhang, P . P . Srinivasan, B. Deng, P . Debevec, W. T. Freeman, and J. T. Barron, “Nerfactor: Neural factorization of shape and reflectance under an unknown illumination,” ACM T ransactions on Graphics (T oG), vol. 40, no. 6, pp. 1–18, 2021. 3

work page 2021

[65] [65]

Tensosdf: Roughness- aware tensorial representation for robust geometry and material reconstruction,

J. Li, L. Wang, L. Zhang, and B. Wang, “Tensosdf: Roughness- aware tensorial representation for robust geometry and material reconstruction,” ACM T ransactions on Graphics (TOG), vol. 43, no. 4, pp. 1–13, 2024. 3

work page 2024

[66] [66]

Gaussian splatting with dis- cretized sdf for relightable assets,

Z.-L. Zhu, J. Yang, and B. Wang, “Gaussian splatting with dis- cretized sdf for relightable assets,” in Proceedings of IEEE Interna- tional Conference on Computer Vision (ICCV) , 2025. 3

work page 2025

[67] [67]

Unidream: Unifying diffusion priors for relightable text-to-3d generation,

Z. Liu, Y. Li, Y. Lin, X. Yu, S. Peng, Y.-P. Cao, X. Qi, X. Huang, D. Liang, and W. Ouyang, “Unidream: Unifying diffusion priors for relightable text-to-3d generation,” arXiv preprint arXiv:2312.08754, 2023. 3

[68]

Matlaber: Material-aware text-to-3d via latent brdf auto-encoder,

X. Xu, Z. Lyu, X. Pan, and B. Dai, “Matlaber: Material-aware text-to-3d via latent brdf auto-encoder,” arXiv preprint arXiv:2308.09278, 2023. 3

[69]

Meta 3d assetgen: Text-to-mesh generation with high-quality geometry, texture, and pbr materials,

Y. Siddiqui, T. Monnier, F. Kokkinos, M. Kariya, Y. Kleiman, E. Garreau, O. Gafni, N. Neverova, A. Vedaldi, R. Shapovalov et al., “Meta 3d assetgen: Text-to-mesh generation with high-quality geometry, texture, and pbr materials,” arXiv preprint arXiv:2407.02445, 2024. 3

[70]

Arm: Appearance reconstruction model for relightable 3d generation,

X. Feng, C. Yu, Z. Bi, Y. Shang, F. Gao, H. Wu, K. Zhou, C. Jiang, and Y. Yang, “Arm: Appearance reconstruction model for relightable 3d generation,” arXiv preprint arXiv:2411.10825, 2024. 3

[71]

Texgaussian: Generating high-quality pbr material via octree-based 3d gaussian splatting,

B. Xiong, J. Liu, J. Hu, C. Wu, J. Wu, X. Liu, C. Zhao, E. Ding, and Z. Lian, “Texgaussian: Generating high-quality pbr material via octree-based 3d gaussian splatting,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 551–561. 3

[72]

Texgen: a generative diffusion model for mesh textures,

X. Yu, Z. Yuan, Y.-C. Guo, Y.-T. Liu, J. Liu, Y. Li, Y.-P. Cao, D. Liang, and X. Qi, “Texgen: a generative diffusion model for mesh textures,” ACM Transactions on Graphics (TOG), vol. 43, no. 6, pp. 1–14, 2024. 3

[73]

Objects with lighting: A real-world dataset for evaluating reconstruction and rendering for object relighting,

B. Ummenhofer, S. Agrawal, R. Sepúlveda, Y. Lao, K. Zhang, T. Cheng, S. R. Richter, S. Wang, and G. Ros, “Objects with lighting: A real-world dataset for evaluating reconstruction and rendering for object relighting,” in 3DV. IEEE, 2024. 3, 6

[74]

Digital twin catalog: A large-scale photorealistic 3d object digital twin dataset,

Z. Dong, K. Chen, Z. Lv, H.-X. Yu, Y. Zhang, C. Zhang, Y. Zhu, S. Tian, Z. Li, G. Moffatt et al., “Digital twin catalog: A large-scale photorealistic 3d object digital twin dataset,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 753–

[75]

Mage: Single image to material-aware 3d via the multi-view g-buffer estimation model,

H. Wang, Z. Wang, X. Long, C. Lin, G. Hancke, and R. W. Lau, “Mage: Single image to material-aware 3d via the multi-view g-buffer estimation model,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 10 985–10 995. 3

[76]

High-resolution image synthesis with latent diffusion models,

R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 10 684–10 695. 4

[77]

Imagedream: Image-prompt multi-view diffusion for 3d generation,

P. Wang and Y. Shi, “Imagedream: Image-prompt multi-view diffusion for 3d generation,” arXiv preprint arXiv:2312.02201, 2023. 4, 5, 7, 8

[78]

Adding conditional control to text-to-image diffusion models,

L. Zhang, A. Rao, and M. Agrawala, “Adding conditional control to text-to-image diffusion models,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 3836–3847. 5

[79]

Extracting triangular 3d models, materials, and lighting from images,

J. Munkberg, J. Hasselgren, T. Shen, J. Gao, W. Chen, A. Evans, T. Müller, and S. Fidler, “Extracting triangular 3d models, materials, and lighting from images,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 8280–8290. 5

[80]

Gs-ir: 3d gaussian splatting for inverse rendering,

Z. Liang, Q. Zhang, Y. Feng, Y. Shan, and K. Jia, “Gs-ir: 3d gaussian splatting for inverse rendering,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 21 644–21 653. 5
