Arbor: Explicit Geometric Conditioning for Controllable 3D Asset Generation

Andreas Engelhardt; Hendrik P.A. Lensch; Jan-Niklas Dihlmann; Mark Boss; Simon Donne

arxiv: 2606.23514 · v1 · pith:W3CL47ACnew · submitted 2026-06-22 · 💻 cs.CV · cs.GR

Pith reviewed 2026-06-26 08:46 UTC · model grok-4.3

keywords: 3D asset generation · geometric conditioning · constraint meshes · latent diffusion · controllable generation · text-to-3D · hull avoidance touch

The pith

Arbor adds explicit geometric control to text-to-3D models by routing constraint mesh tokens into a frozen denoiser.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Arbor to give text-conditioned 3D generators a direct spatial control interface using constraint meshes. These meshes define hull regions where geometry must exist, avoidance regions that must stay empty, and touch regions that must make contact. The meshes are turned into tokens and attached through routing inside the frozen denoiser so each part of the latent space receives only the relevant local constraint. This produces higher obedience to the constraints while keeping object quality and output variation, all without adding compliance losses or retraining the base model.

Core claim

Arbor is a trainable attachment for text-conditioned latent 3D generation that treats constraint meshes as a native 3D control interface. The interface supplies hull regions where geometry should exist, avoidance regions that should remain empty, and touch regions the object should contact. Constraint meshes are converted into tokens and integrated via a routed attachment inside a frozen denoiser, so each latent region receives only the portion of the constraint that applies to its spatial location. Even without dedicated compliance losses, this improves constraint obedience while preserving object quality and variation under fixed constraints.

What carries the argument

The routed attachment that converts constraint meshes into tokens and injects them locally into the frozen denoiser.
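The routing idea can be illustrated with a toy sketch. The paper's actual tokenization and routing architecture are not described in the text available on this page, so everything below is an assumption made for illustration: the `ConstraintToken` type, the `route_tokens` helper, and the fixed-radius neighbourhood rule are hypothetical stand-ins, not Arbor's implementation. The sketch only shows what it means for each latent query group to receive just the constraint tokens near its own location.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class ConstraintToken:
    """Hypothetical token: a position plus a constraint type."""
    position: Point  # token centre in normalized object space (assumed)
    kind: str        # "hull", "avoid", or "touch"


def route_tokens(group_centers: List[Point],
                 tokens: List[ConstraintToken],
                 radius: float) -> List[List[int]]:
    """For each latent query group, list the indices of tokens within `radius`.

    A hard radius test is an assumption; a learned router could replace it.
    """
    routes = []
    for center in group_centers:
        nearby = [i for i, tok in enumerate(tokens)
                  if dist(center, tok.position) <= radius]
        routes.append(nearby)
    return routes


# Toy data, invented for this example.
tokens = [
    ConstraintToken((0.1, 0.1, 0.1), "hull"),
    ConstraintToken((0.9, 0.9, 0.9), "avoid"),
    ConstraintToken((0.2, 0.1, 0.0), "touch"),
]
group_centers = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.5, 0.5, 0.5)]
routes = route_tokens(group_centers, tokens, radius=0.3)
print(routes)
```

In this toy setup the group at the origin sees the hull and touch tokens, the group at the far corner sees only the avoidance token, and the central group sees none.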

If this is right

Text-to-3D models can respect explicit spatial requirements such as fitting inside envelopes or leaving clearance for motion.
Output variation remains high even when the same constraint meshes are applied.
Constraint obedience rises on both automatic and artist-curated benchmarks for hull, avoidance, and touch constraints.
Metric gains align with human preference ratings for the controlled outputs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same token-routing attachment could be tested on other latent generative backbones such as video or image models.
Artists could iterate on 3D assets by successively adding or editing constraint meshes without restarting generation.
The approach might reduce reliance on post-processing or manual cleanup in production asset pipelines.

Load-bearing premise

Constraint meshes can be converted into tokens and integrated via a routed attachment inside a frozen denoiser to locally influence generation without degrading base model performance or requiring additional compliance signals.

What would settle it

On the automatic and artist-curated control benchmarks, constraint obedience metrics show no improvement when the routed attachment is added compared with the unmodified base denoiser.
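One way to make such a comparison concrete is sketched below on voxel sets. The metric names echo the Hull Hit and Avoid Viol. columns that appear alongside Figure 5, but the definitions used here are assumptions, since the benchmark definitions are not part of the text available on this page. The voxel sets are made-up toy data, not results from the paper, and the names `base_output` and `attached_output` are hypothetical.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def hull_hit(occupied: Set[Voxel], hull: Set[Voxel]) -> float:
    """Assumed definition: fraction of hull voxels that contain geometry."""
    return len(occupied & hull) / len(hull) if hull else 1.0


def avoid_violation(occupied: Set[Voxel], avoid: Set[Voxel]) -> float:
    """Assumed definition: fraction of avoidance voxels that contain geometry."""
    return len(occupied & avoid) / len(avoid) if avoid else 0.0


# Toy constraint regions: a row of hull voxels and a row of avoidance voxels.
hull = {(x, 0, 0) for x in range(4)}
avoid = {(x, 1, 0) for x in range(4)}

# Invented occupancy sets standing in for two generators' outputs.
base_output = {(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)}
attached_output = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 1, 0)}

for name, occupied in [("base", base_output), ("attached", attached_output)]:
    print(name, hull_hit(occupied, hull), avoid_violation(occupied, avoid))
```

Under these assumed definitions, the claim would be undermined if the attached model showed no higher hull hit rate and no lower avoidance violation rate than the base model across the benchmark.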

Figures

Figures reproduced from arXiv: 2606.23514 by Andreas Engelhardt, Hendrik P.A. Lensch, Jan-Niklas Dihlmann, Mark Boss, Simon Donne.

**Figure 1.** Arbor overview. Arbor turns simple 3D control objects into an explicit constraint signal for text-conditioned 3D generation. Hull regions mark where generated geometry should exist, touch regions mark contact patches, and avoidance regions mark free space that should remain empty. This enables artists to co-author the generation process, making asset generation more reliable and therefore more likely to be …

**Figure 2.** Constraint conditioning pipeline. Arbor converts a typed constraint object into TRELLIS.2 OVoxels, encodes geometry and signal attributes with frozen encoders (Sec. 3.2), aligns the resulting latents into geometry tokens, and routes those tokens into the TRELLIS sparse structure denoiser (Sec. 3.3). Local routing gives each query group the nearby constraint evidence it needs, while learned global summaries…

**Figure 3.** Controlled generation comparison. Each column shows one prompt and constraint object. The constraint is rendered as normal shaded geometry with signal regions colored hull, touch, and avoidance. Rows compare predictions and their constraint following. Here, green indicates a hull match and blue indicates missing hull. Arbor keeps readable objects while following local roles. Models. Arbor is the model from…

**Figure 4.** Constraint sweeps. The prompt is fixed and the constraint region is continuously moved, scaled, or rotated. Arbor follows the deformation without snapping to a small set of canonical layouts.

**Figure 5.** Variation under a fixed constraint. Each block keeps the hull fixed and varies the seed; image-conditioned baselines also fix the input image. Arbor changes details and proportions across seeds while still satisfying the constraint, whereas the image-anchored methods stay close to their input.

| Method | Var.↑ | Ctrl. Scr.↑ | Hull Hit↑ | Avoid Viol.↓ | Vol. Match↑ | MV-CLIP↑ |
|---|---|---|---|---|---|---|
| Arbor | 0.740 | 0.361 ± 0.010 | 0.707 ± 0.044 | 0.016 ±… | … | … |

**Figure 6.** Automatic constraint families used by Arbor. The figure shows the concrete generators that make up Arbor’s typed constraint program. Green columns are positive hull families, yellow columns are touch/contact families, and red columns are avoidance families. Each column lists the family intent at the top and example outputs on several objects below. These families are sampled online during training, while b…

**Figure 7.** Additional Arbor results on selected Toys4K constraints. Showing manual and automatic benchmark cases. In practice, this variant was not the best solution. It does improve direct constraint pressure, but it also moves the model toward a failure mode where following the constraint becomes easier than generating a plausible object from the prompt. This is close to the behavior seen in SpaceControl, where the…

**Figure 8.** Extended constraint sweeps. Additional sweep selections using the same rendering language as …

Abstract

Text- and image-conditioned 3D models now generate convincing assets, but they still offer little direct control over the space an object should occupy or avoid. In authoring, this spatial intent is often known before generation starts. A chair should fit a seating envelope, a prop should leave clearance for motion, or a part should expose a contact surface. Prompts and image views are poor carriers for such constraints, which creates the need for an explicit control interface. We present Arbor, a trainable attachment for text-conditioned latent 3D generation. Arbor introduces constraint meshes as a native 3D control interface. The interface uses hull regions where geometry should exist, avoidance regions that should remain empty, and touch regions the object should contact. Unlike completion or whole-object scaffold control, these meshes are not target evidence. They are local typed requirements and can include regions where no surface should appear. Arbor keeps this signal as geometry by converting constraint meshes into tokens and learning a routed attachment inside a frozen denoiser. Each latent region can therefore receive the part of the constraint that matters for its spatial location. We evaluate Arbor on automatic and artist-curated control benchmarks with hull, avoidance, and touch constraints, and compare the metric trends to a user preference study. Even without dedicated compliance losses, Arbor improves constraint obedience while preserving object quality and variation under fixed constraints.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Arbor tokenizes typed constraint meshes and routes them locally into a frozen 3D denoiser, which is a clean interface idea, but the abstract gives no numbers or implementation details to check whether it actually works.

The new piece is the constraint mesh interface with three explicit types—hull regions that must contain geometry, avoidance regions that must stay empty, and touch regions for contact—treated as local requirements rather than full targets. These get converted to tokens and fed through a learned routed attachment inside a frozen latent denoiser so the signal stays geometry-based and location-specific.

That framing is distinct from scaffold or completion baselines because the meshes can specify empty space and do not dictate the entire output shape. The problem statement also lands: prompts and images are weak at carrying spatial authoring intent like clearance or fit constraints.

The paper reports that this attachment improves constraint obedience on automatic and artist-curated benchmarks plus a user study, all without dedicated compliance losses, while holding object quality and variation steady.

The main limitation is that none of the supporting evidence is visible. There are no metric values, baseline tables, ablation results, tokenization procedure, routing architecture, or error bars in the provided text. The central claim therefore cannot be evaluated from what is here, which leaves the soundness of the mechanism unverified.

This is aimed at people building controllable 3D generators who need spatial knobs beyond text or images. A reader working on diffusion attachments or 3D interfaces would find the setup worth examining if the full experiments are solid.

I would send it to peer review so the implementation details and quantitative results can be checked properly.

Referee Report

1 major / 0 minor

Summary. The paper presents Arbor, a trainable attachment for text-conditioned latent 3D generation. Arbor converts constraint meshes (hull regions where geometry should exist, avoidance regions that should remain empty, and touch regions the object should contact) into tokens and integrates them via a routed attachment inside a frozen denoiser, allowing each latent region to receive the relevant part of the constraint. The central claim is that, even without dedicated compliance losses, this yields improved constraint obedience on hull/avoidance/touch benchmarks while preserving object quality and variation, with supporting evidence from automatic/artist-curated benchmarks and a user preference study.

Significance. If the experimental claims hold, the work would supply a practical, geometry-native control interface that addresses a clear limitation in current 3D generative models, where prompts and images are poor carriers of spatial intent. The additive, frozen-denoiser design is a notable strength if it demonstrably avoids the need for compliance losses while maintaining base-model performance.

major comments (1)

[Abstract] Abstract and provided manuscript text: the central claim of improved constraint obedience rests on metric trends from automatic and artist-curated benchmarks plus a user study, yet the text supplies no method equations, tokenization procedure, routing architecture, benchmark definitions, quantitative tables, error bars, or ablation results. Without these, the support for the claim that the routed attachment produces measurable gains cannot be verified.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review. The concern raised is that the abstract and provided text lack supporting technical details and results. The full manuscript contains dedicated sections addressing these elements; we address the point below and note that no changes to the core claims or experiments are required.

Referee: [Abstract] Abstract and provided manuscript text: the central claim of improved constraint obedience rests on metric trends from automatic and artist-curated benchmarks plus a user study, yet the text supplies no method equations, tokenization procedure, routing architecture, benchmark definitions, quantitative tables, error bars, or ablation results. Without these, the support for the claim that the routed attachment produces measurable gains cannot be verified.

Authors: The full manuscript (beyond the abstract) includes: (1) Section 3 with the tokenization procedure for constraint meshes, the routed attachment architecture, and all relevant equations for the conditioning mechanism inside the frozen denoiser; (2) Section 4 with explicit definitions of the hull, avoidance, and touch benchmarks (both automatic and artist-curated); (3) Section 5 with quantitative tables reporting metric trends, error bars from multiple runs, and the user preference study results; and (4) Section 5.3 with ablation studies isolating the contribution of the routed attachment. These sections directly support the central claim. The abstract is intentionally concise and summarizes rather than reproduces the full technical content. We can add explicit section references to the abstract in a revision if the referee finds that helpful for navigation.

revision: partial

Circularity Check

0 steps flagged

No circularity: method is an additive trainable attachment evaluated on external benchmarks

The paper presents Arbor as a new trainable routed attachment that converts constraint meshes to tokens and integrates them locally inside a frozen denoiser. The central claim of improved constraint obedience is supported by evaluation on automatic and artist-curated benchmarks with hull/avoidance/touch constraints, plus a user preference study. No equations, predictions, or first-principles results are shown that reduce by construction to fitted parameters or self-citations; the approach is additive and externally benchmarked rather than self-referential.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Based solely on the abstract, the central claim rests on the assumption that a frozen denoiser can incorporate the attachment and that constraint meshes function as local typed requirements rather than targets. No free parameters are explicitly named. One invented entity is the constraint mesh interface.

axioms (1)

domain assumption: A text-conditioned latent 3D denoiser can remain frozen while a trainable routed attachment processes constraint tokens.
Stated in the description of Arbor as a trainable attachment inside a frozen denoiser.
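
The assumption can be pictured with a deliberately tiny scalar model. This is a sketch under stated assumptions, not the paper's architecture: `FrozenBase`, `Attachment`, and `train_step` are hypothetical names, the additive combination is an assumption, and the zero initialisation of the attachment is a convention borrowed from adapter-style designs rather than something this page confirms for Arbor. It only demonstrates that training can update the attachment while leaving the base weights untouched.

```python
class FrozenBase:
    """Stand-in for the frozen denoiser: its weight is never updated."""

    def __init__(self, weight: float) -> None:
        self.weight = weight

    def __call__(self, x: float) -> float:
        return self.weight * x


class Attachment:
    """Trainable additive branch; zero-initialised so it starts as a no-op."""

    def __init__(self) -> None:
        self.gain = 0.0

    def __call__(self, constraint: float) -> float:
        return self.gain * constraint


def predict(base, attachment, x, constraint):
    # Base output plus the constraint-driven correction (assumed additive).
    return base(x) + attachment(constraint)


def train_step(base, attachment, x, constraint, target, lr=0.1):
    """One gradient step on 0.5 * error**2, applied to the attachment only."""
    error = predict(base, attachment, x, constraint) - target
    attachment.gain -= lr * error * constraint  # d(loss)/d(gain)
    return error


base = FrozenBase(weight=2.0)
attachment = Attachment()
before = predict(base, attachment, x=1.0, constraint=1.0)
train_step(base, attachment, x=1.0, constraint=1.0, target=3.0)
after = predict(base, attachment, x=1.0, constraint=1.0)
print(before, after, base.weight)
```

Before training, the combined model reproduces the base output exactly; after one step the prediction moves toward the target while `base.weight` is unchanged.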

invented entities (1)

constraint meshes (hull, avoidance, touch regions) · no independent evidence
purpose: Provide explicit local 3D control signals that are not target geometry.
Introduced as the core native interface; no independent evidence outside the paper is mentioned.

[1]

Amir Barda, Matheus Gadelha, Vladimir G. Kim, Noam Aigerman, Amit H. Bermano, and Thibault Groueix. Instant3dit: Multiview inpainting for fast editing of 3D objects. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 16273–16282, 2025. URL https://arxiv.org/abs/2412.00518

[2]

Mark Boss, Zixuan Huang, Aaryaman Vasishta, and Varun Jampani. SF3D: Stable fast 3D mesh reconstruction with UV-unwrapping and illumination disentanglement. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 16240–16250, 2025. URL https://arxiv.org/abs/2408.00653

[3]

Ruihang Chu, Enze Xie, Shentong Mo, Zhenguo Li, Matthias Nießner, Chi-Wing Fu, and Jiaya Jia. DiffComplete: Diffusion-based generative 3D shape completion. In Advances in Neural Information Processing Systems (NeurIPS), 2023. URL https://arxiv.org/abs/2306.16329

[4]

Jasmine Collins, Shubham Goel, Kenan Deng, Achleshwar Luthra, Leon Xu, Erhan Gundogdu, Xi Zhang, Tomas F. Yago Vicente, Thomas Dideriksen, Himanshu Arora, Matthieu Guillaumin, and Jitendra Malik. ABO: Dataset and benchmarks for real-world 3D object understanding. In Conference on Computer Vision and Pattern Recognition (CVPR), 2022. URL https://arxiv.org/...

[5]

Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, Eli VanderBilt, Aniruddha Kembhavi, Carl Vondrick, Georgia Gkioxari, Kiana Ehsani, Ludwig Schmidt, and Ali Farhadi. Objaverse-XL: A universe of 10M+ 3D objects. In Advances in Neural Information Processi...

[6]

Jan-Niklas Dihlmann, Andreas Engelhardt, and Hendrik Lensch. SIGNeRF: Scene integrated generation for neural radiance fields. In Conference on Computer Vision and Pattern Recognition (CVPR), 2024. doi: 10.1109/CVPR52733.2024.00638. URL https://doi.org/10.1109/CVPR52733.2024.00638

[7]

Jan-Niklas Dihlmann, Mark Boss, Simon Donne, Andreas Engelhardt, Hendrik P. A. Lensch, and Varun Jampani. ReLi3D: Relightable multi-view 3D reconstruction with disentangled illumination. In International Conference on Learning Representations (ICLR), 2026. URL https://openreview.net/forum?id=BlSKgQb3Vd

[8]

Wenqi Dong, Bangbang Yang, Lin Ma, Xiao Liu, Liyuan Cui, Hujun Bao, Yuewen Ma, and Zhaopeng Cui. Coin3D: Controllable and interactive 3D assets generation with proxy-guided conditioning. In ACM SIGGRAPH, 2024. doi: 10.1145/3641519.3657425. URL https://doi.org/10.1145/3641519.3657425

[9]

Elisabetta Fedele, Francis Engelmann, Ian Huang, Or Litany, Marc Pollefeys, and Leonidas Guibas. SpaceControl: Introducing test-time spatial control to 3D generative modeling. In International Conference on Learning Representations (ICLR), 2026. URL https://openreview.net/forum?id=mEqsCVI5sN

[10]

Haitang Feng, Jie Liu, Jie Tang, Gangshan Wu, Beiqi Chen, Jianhuang Lai, and Guangcong Wang. ObjFiller-3D: Consistent multi-view 3D inpainting via video diffusion models. arXiv preprint, 2025. URL https://arxiv.org/abs/2508.18271

[11]

Zexin He and Tengfei Wang. OpenLRM: Open-source large reconstruction models. https://github.com/3DTopia/OpenLRM, 2023. URL https://github.com/3DTopia/OpenLRM. GitHub repository; open-source implementation of LRM, not a primary paper.

[12]

Amir Hertz, Or Perel, Raja Giryes, Olga Sorkine-Hornung, and Daniel Cohen-Or. SPAGHETTI. ACM Transactions on Graphics (TOG), 2022. doi: 10.1145/3528223.3530084. URL https://doi.org/10.1145/3528223.3530084

[13]

Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. LRM: Large reconstruction model for single image to 3D. In International Conference on Learning Representations (ICLR), 2024. URL https://openreview.net/forum?id=sllU8vvsFF

[14]

Shimin Hu, Yuanyi Wei, Fei Zha, Yudong Guo, and Juyong Zhang. Easy3E: Feed-forward 3D asset editing via rectified voxel flow. arXiv preprint, 2026. URL https://arxiv.org/abs/2602.21499. CVPR 2026

[15]

Zixuan Huang, Mark Boss, Aaryaman Vasishta, James Matthew Rehg, and Varun Jampani. SPAR3D: Stable point-aware reconstruction of 3D objects from single images. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 16860–16870, 2025. URL https://arxiv.org/abs/2501.04689

[16]

Ajay Jain, Ben Mildenhall, Jonathan T. Barron, Pieter Abbeel, and Ben Poole. Zero-shot text-guided object generation with dream fields. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 857–866, 2022. doi: 10.1109/CVPR52688.2022.00094. URL https://doi.org/10.1109/CVPR52688.2022.00094

[17]

Mukul Khanna, Yongsen Mao, Hanxiao Jiang, Sanjay Haresh, Brennan Schacklett, Dhruv Batra, Alexander Clegg, Eric Undersander, Angel X. Chang, and Manolis Savva. Habitat synthetic scenes dataset (HSSD-200): An analysis of 3D scene scale and realism tradeoffs for ObjectGoal navigation. In IEEE International Conference on Robotics and Automation (ICRA), 2024. ...

[18]

Juil Koo, Seungwoo Yoo, Minh Hieu Nguyen, and Minhyuk Sung. SALAD: Part-level latent diffusion for 3D shape generation and manipulation. In International Conference on Computer Vision (ICCV), 2023. URL https://arxiv.org/abs/2303.12236

[19]

Juil Koo, Wei-Tung Lin, Chanho Park, Chanhyeok Park, and Minhyuk Sung. BoxSplitGen: A generative model for 3D part bounding boxes in varying granularity. In Winter Conference on Applications of Computer Vision (WACV), pages 1777–1787, 2026. URL https://arxiv.org/abs/2602.20666

[20]

Mingi Lee, Dongsu Zhang, Clément Jambon, and Young Min Kim. BrepDiff: Single-stage b-rep diffusion model. In Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers, SIGGRAPH Conference Papers ’25, New York, NY, USA, 2025. Association for Computing Machinery. ISBN 9798400715402. doi: 10.1145/3...

[21]

Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. Magic3D: High-resolution text-to-3D content creation. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 300–309, 2023. doi: 10.1109/CVPR52729.2023.00037. URL https://doi.org/10.1109/CVPR52729.2023.00037

[22]

Chong Mou, Xintao Wang, Liangbin Xie, Yanze Wu, Jian Zhang, Zhongang Qi, and Ying Shan. T2I-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models. In AAAI Conference on Artificial Intelligence (AAAI), pages 4296–4304, 2024. doi: 10.1609/AAAI.V38I5.28226. URL https://doi.org/10.1609/AAAI.V38I5.28226

[23]

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jégou, Julien Mairal, Patric...

[24]

Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. DreamFusion: Text-to-3D using 2D diffusion. In International Conference on Learning Representations (ICLR), 2023. URL https://openreview.net/forum?id=FjNys5c7VyY.

[25]

Etai Sella, Gal Fiebelman, Noam Atia, and Hadar Averbuch-Elor. Spice·e: Structural priors in 3D diffusion using cross-entity attention. In ACM SIGGRAPH, pages 1–11, 2024. URL https://arxiv.org/abs/2311.17834

[26]

Stefan Stojanov, Anh Thai, and James M. Rehg. Using shape to categorize: Low-shot learning with an iterative categorization-discrimination loop. In Conference on Computer Vision and Pattern Recognition (CVPR), 2021. URL https://arxiv.org/abs/2104.07371

[27]

Jingxiang Sun, Bo Zhang, Ruizhi Shao, Lizhen Wang, Wen Liu, Zhenda Xie, and Yebin Liu. DreamCraft3D: Hierarchical 3D generation with bootstrapped diffusion prior. In International Conference on Learning Representations (ICLR), 2024. URL https://openreview.net/forum?id=DDX1u29Gqr

[28]

Jiaxiang Tang, Jiawei Ren, Hang Zhou, Ziwei Liu, and Gang Zeng. DreamGaussian: Generative gaussian splatting for efficient 3D content creation. In International Conference on Learning Representations (ICLR), 2024. URL https://openreview.net/forum?id=UyNXMqnN3c

[29]

Team Hunyuan3D. Hunyuan3D-omni: A unified framework for controllable generation of 3D assets. arXiv preprint, 2025. URL https://arxiv.org/abs/2509.21245

[30]

Dmitry Tochilkin, David Pankratz, Zexiang Liu, Zixuan Huang, Adam Letts, Yangguang Li, Ding Liang, Christian Laforte, Varun Jampani, and Yan-Pei Cao. TripoSR: Fast 3D object reconstruction from a single image. arXiv preprint, 2024. URL https://arxiv.org/abs/2403.02151

[31]

Anbang Wang, Yuzhuo Ao, Shangzhe Wu, and Chi-Keung Tang. SK-adapter: Skeleton-based structural control for native 3D generation. arXiv preprint, 2026. URL https://arxiv.org/abs/2603.14152

[32]

Zhenwei Wang, Tengfei Wang, Zexin He, Gerhard Hancke, Ziwei Liu, and Rynson W. H. Lau. Phidias: A generative model for creating 3D content from text, image, and 3D conditions with reference-augmented diffusion. In International Conference on Learning Representations (ICLR), 2025. URL https://proceedings.iclr.cc/paper_files/paper/2025/hash/50ca96a1a9ebe0b5...

[33]

Shuang Wu, Youtian Lin, Feihu Zhang, Yifei Zeng, Jingxi Xu, Philip Torr, Xun Cao, and Yao Yao. Direct3D: Scalable image-to-3D generation via 3D latent diffusion transformer. In Advances in Neural Information Processing Systems (NeurIPS), volume 37, pages 121859–121881, 2024. doi: 10.52202/079017-3873. URL https://proceedings.neurips.cc/paper_files/paper/20...

[34]

Jiatong Xia, Zicheng Duan, Anton van den Hengel, and Lingqiao Liu. Points-to-3D: Structure-aware 3D generation with point cloud priors. In Conference on Computer Vision and Pattern Recognition (CVPR), 2026. URL https://jiatongxia.github.io/points2-3D/

[35]

Jianfeng Xiang, Xiaoxue Chen, Sicheng Xu, Ruicheng Wang, Zelong Lv, Yu Deng, Hongyuan Zhu, Yue Dong, Hao Zhao, Nicholas Jing Yuan, and Jiaolong Yang. Native and compact structured latents for 3D generation. arXiv preprint, 2025. URL https://arxiv.org/abs/2512.14692

[36]

Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang, Dong Chen, Xin Tong, and Jiaolong Yang. Structured 3D latents for scalable and versatile 3D generation. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 21469–21480, 2025. URL https://arxiv.org/abs/2412.01506

[37]

Jiale Xu, Weihao Cheng, Yiming Gao, Xintao Wang, Shenghua Gao, and Ying Shan. InstantMesh: Efficient 3D mesh generation from a single image with sparse-view large reconstruction models. arXiv preprint, 2024. URL https://arxiv.org/abs/2404.07191

[38]

Xiang Xu, Joseph G. Lambourne, Pradeep Kumar Jayaraman, Zhengqing Wang, Karl D.D. Willis, and Yasutaka Furukawa. BrepGen: A b-rep generative diffusion model with structured latent geometry. ACM Transactions on Graphics (TOG), 43(4):1–14, 2024. doi: 10.1145/3658129. URL https://brepgen.github.io/
