PartDiffuser: Part-wise 3D Mesh Generation via Discrete Diffusion

Baochang Zhang; Guojun Lei; Haodong Zhu; Hong Li; Linin Yang; Sheng Xu; Yichen Yang

arxiv: 2511.18801 · v3 · pith:KN2B6XG2new · submitted 2025-11-24 · 💻 cs.CV


Pith reviewed 2026-05-21 18:32 UTC · model grok-4.3

classification 💻 cs.CV

keywords 3D mesh generation, discrete diffusion, part-wise generation, semi-autoregressive, point cloud to mesh, DiT architecture, semantic segmentation, cross-attention


The pith

PartDiffuser generates 3D meshes from point clouds by autoregressing across semantic parts for global structure while diffusing in parallel inside each part for local details.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes PartDiffuser as a semi-autoregressive diffusion method to fix the balance problem in existing autoregressive 3D mesh generators, where global consistency often comes at the cost of lost fine details and accumulated errors. It first segments the input into semantic parts, then sequences the parts autoregressively to lock in overall topology and runs discrete diffusion steps simultaneously within each part to recover high-frequency geometry. A DiT backbone with part-aware cross-attention takes hierarchical point-cloud conditioning to guide both levels. If the separation works, the resulting meshes should show richer surface detail than prior models while keeping coherent large-scale shape. This matters for applications that need production-ready 3D assets from raw scans or sketches.

Core claim

PartDiffuser performs semantic segmentation on the input mesh or point cloud, then uses autoregression between parts to maintain global topology while running a parallel discrete diffusion process inside each semantic part to reconstruct high-frequency geometric features, all inside a DiT architecture equipped with a part-aware cross-attention layer that conditions on hierarchical point-cloud geometry to decouple the global and local tasks.
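The two nested loops in this claim can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: `MASK`, `toy_denoiser`, the even unmasking schedule, and every parameter name are hypothetical, and the denoiser draws random tokens where a trained DiT would predict them from the conditioning.

```python
import random

MASK = -1  # hypothetical id for the absorbing "masked" token


def toy_denoiser(context, part_tokens, part_id, vocab_size, rng):
    """Stand-in for the DiT: propose a token for every masked position.

    A real model would use `context` (finished parts) and `part_id`
    (to pick conditioning); this toy version ignores both.
    """
    return [rng.randrange(vocab_size) if t == MASK else t for t in part_tokens]


def generate_partwise(part_lengths, vocab_size=8, steps=4, seed=0):
    """Autoregress over parts; run masked discrete diffusion inside each part."""
    rng = random.Random(seed)
    context = []  # tokens of parts that are already finished
    parts = []
    for part_id, length in enumerate(part_lengths):  # outer loop: one part at a time
        tokens = [MASK] * length  # each part starts fully masked
        for step in range(steps):  # inner loop: parallel denoising steps
            proposal = toy_denoiser(context, tokens, part_id, vocab_size, rng)
            masked = [i for i, t in enumerate(tokens) if t == MASK]
            remaining_steps = steps - step
            # Reveal an even share of the still-masked positions (ceil division),
            # so that the final step reveals everything that is left.
            n_reveal = -(-len(masked) // remaining_steps)
            for i in rng.sample(masked, n_reveal):
                tokens[i] = proposal[i]
        parts.append(tokens)
        context.extend(tokens)  # later parts are generated given earlier ones
    return parts


parts = generate_partwise([5, 3, 7])
```

The outer loop is sequential, so its cost grows with the number of parts, while every position inside a part is proposed in a single denoiser call per step.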

What carries the argument

The part-aware cross-attention mechanism inside the DiT backbone that uses hierarchical point-cloud conditioning to dynamically steer generation and separate global topology control from local detail reconstruction.
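A rough sketch of what "part-aware" conditioning could mean: each mesh token attends to a global point-cloud feature set plus the features of its own part only. The sketch assumes that reading; the function names, the absence of learned projections, and the plain-list data layout are all simplifications introduced here, not details taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def part_aware_cross_attention(token_queries, token_part_ids, global_feats, part_feats):
    """Condition each token on global features plus its own part's features.

    Assumption: keys and values are the raw features (no learned projections).
    """
    outputs = []
    for query, part_id in zip(token_queries, token_part_ids):
        conditioning = global_feats + part_feats[part_id]  # other parts are excluded
        outputs.append(cross_attend(query, conditioning, conditioning))
    return outputs


global_feats = [[1.0, 0.0]]
part_feats = {0: [[0.0, 1.0]], 1: [[0.0, -1.0]]}
out = part_aware_cross_attention([[0.5, 0.5], [0.5, 0.5]], [0, 1], global_feats, part_feats)
```

Under this reading, two tokens with identical queries receive different conditioning when they belong to different parts, which is the kind of decoupling the claim depends on.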

If this is right

Global structural consistency is achieved through autoregressive ordering of semantic parts rather than full-sequence autoregression.
High-frequency local details are recovered by parallel discrete diffusion performed independently inside each part.
Error accumulation across the entire object is limited because diffusion steps remain local to each semantic region.
Meshes exhibit richer surface detail than current state-of-the-art point-cloud-to-mesh generators while preserving overall topology.
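The first two consequences can be pictured as an attention pattern over the token sequence. One plausible form, assumed here rather than confirmed by the text on this page, is a block-causal mask in the style of block diffusion: tokens see every token in their own part and in earlier parts, and nothing in later parts. The helper name `block_causal_mask` is hypothetical.

```python
def block_causal_mask(part_ids):
    """mask[i][j] is True when token i may attend to token j.

    Assumption: attention is bidirectional inside a part and causal across
    parts, so token i sees token j only if j's part is not later than i's.
    """
    return [[pj <= pi for pj in part_ids] for pi in part_ids]


def render(mask):
    """Draw the mask as text: '#' for allowed, '.' for blocked."""
    return "\n".join("".join("#" if ok else "." for ok in row) for row in mask)


mask = block_causal_mask([0, 0, 1, 1, 2])
picture = render(mask)
```

Printing `picture` shows square blocks along the diagonal (intra-part attention) with everything below them filled in (attention to earlier parts).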

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The part-wise split could be tested on other conditional generation tasks such as texture synthesis or scene layout where global coherence and local fidelity must both be maintained.
If segmentation can be made lightweight and online, the framework might support interactive 3D modeling tools that accept partial point clouds.
The same conditioning hierarchy might improve consistency when extending the model to generate textured meshes or animated sequences.

Load-bearing premise

The method assumes that accurate semantic segmentation of the input can be obtained in advance and that the part-aware cross-attention will successfully prevent boundary artifacts when merging the autoregressive inter-part sequence with the parallel intra-part diffusion.
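One simple way to probe the segmentation half of this premise is to corrupt the part labels at a controlled rate and watch how output quality degrades. The sketch below only produces the corrupted labels; `corrupt_labels` and its parameters are hypothetical names, and uniform random relabeling is an assumed noise model (real segmentation errors tend to cluster near part boundaries).

```python
import random


def corrupt_labels(labels, noise_rate, num_parts, seed=0):
    """Reassign a fixed fraction of per-point part labels to a different part.

    Assumes num_parts >= 2 and labels drawn from range(num_parts).
    """
    rng = random.Random(seed)
    n_flip = round(noise_rate * len(labels))
    noisy = list(labels)
    for i in rng.sample(range(len(labels)), n_flip):
        alternatives = [p for p in range(num_parts) if p != labels[i]]
        noisy[i] = rng.choice(alternatives)  # always differs from the original
    return noisy


clean = [i % 3 for i in range(100)]
noisy = corrupt_labels(clean, noise_rate=0.2, num_parts=3)
```

Sweeping `noise_rate` and re-running generation on the corrupted labels would give a degradation curve of the kind a robustness ablation would report.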

What would settle it

Quantitative results on a held-out test set of complex objects where the method shows no improvement over prior models on detail-sensitive metrics such as normal consistency or edge sharpness, or visual inspection revealing visible seams or loss of geometry at part boundaries in the generated meshes.
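For reference, normal consistency is often computed by matching each sampled surface point to its nearest neighbour on the other surface and averaging the absolute cosine between their unit normals. The sketch below follows that common definition; it is an assumption that the paper would use the same variant, and the brute-force nearest-neighbour search is for illustration only.

```python
def normal_consistency(points_a, normals_a, points_b, normals_b):
    """Mean |cos| between each normal in A and the normal of its nearest point in B.

    Inputs are lists of 3-tuples; normals are assumed to be unit length.
    """
    total = 0.0
    for point, normal in zip(points_a, normals_a):
        # Brute-force nearest neighbour by squared Euclidean distance.
        nearest = min(
            range(len(points_b)),
            key=lambda k: sum((x - y) ** 2 for x, y in zip(point, points_b[k])),
        )
        cosine = sum(x * y for x, y in zip(normal, normals_b[nearest]))
        total += abs(cosine)  # abs() makes the score orientation-agnostic
    return total / len(points_a)


pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
up = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
side = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
same = normal_consistency(pts, up, pts, up)
perpendicular = normal_consistency(pts, up, pts, side)
```

Restricting `points_a` to samples near part interfaces would turn the same function into a boundary-specific check for seams.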

Figures

Figures reproduced from arXiv: 2511.18801 by Baochang Zhang, Guojun Lei, Haodong Zhu, Hong Li, Linin Yang, Sheng Xu, Yichen Yang.

**Figure 1.** Gallery of our mesh generation results.

**Figure 2.** An overview of our PartDiffuser framework. The process begins with semantic segmentation of the input point-cloud using …

**Figure 3.** Visualization of the composite attention mask during the …

**Figure 4.** Visual comparison of PartDiffuser with Baselines.

**Figure 5.** An example of the ablation study. Adjacent ablation text (fragment): "… shows a degradation in performance. This indicates that while the global feature is crucial for capturing the holistic shape, it lacks the fine-grained guidance necessary for high-fidelity local detail. Without the specific {C_part_i} features, the model struggles to precisely reconstruct the geometry of individual parts, resulting in lower overall accuracy. …"

**Figure 6.** Mesh with different faces in the dataset.

**Figure 7.** Visualization of the part-wise sampling process. The process evolves from left to right: (1) The first part being denoised, (2) …

**Figure 8.** Visual comparison of varying k.

read the original abstract

Existing autoregressive (AR) methods for generating artist-designed meshes struggle to balance global structural consistency with high-fidelity local details, and are susceptible to error accumulation. To address this, we propose PartDiffuser, a novel semi-autoregressive diffusion framework for point-cloud-to-mesh generation. The method first performs semantic segmentation on the mesh and then operates in a "part-wise" manner: it employs autoregression between parts to ensure global topology, while utilizing a parallel discrete diffusion process within each semantic part to precisely reconstruct high-frequency geometric features. PartDiffuser is based on the DiT architecture and introduces a part-aware cross-attention mechanism, using point clouds as hierarchical geometric conditioning to dynamically control the generation process, thereby effectively decoupling the global and local generation tasks. Experiments demonstrate that this method significantly outperforms state-of-the-art (SOTA) models in generating 3D meshes with rich detail, exhibiting exceptional detail representation suitable for real-world applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

PartDiffuser splits 3D mesh generation into inter-part autoregression for topology and intra-part parallel diffusion for details, but the SOTA outperformance claim cannot be checked from the abstract and rests on untested segmentation and boundary assumptions.


The core idea is to take existing autoregressive mesh generators, which accumulate errors when building global structure, and split the work: run autoregression only across semantic parts to keep overall topology consistent, then run discrete diffusion in parallel inside each part to capture fine geometry. The model is a DiT variant that adds part-aware cross-attention and feeds hierarchical point-cloud features as conditioning. That combination is the actual novelty; it is not just another diffusion variant but a deliberate decoupling of scale levels through the part structure and the attention mechanism. The description of how the conditioning is meant to control global versus local behavior is clear and shows the authors thought through the failure modes of pure AR approaches. Credit for that.

The experiments are described only at the level of the abstract, which claims clear wins on detail fidelity and real-world suitability. No tables, no specific metrics, and no listed baselines appear in the provided text, so the size of the improvement stays unknown.

The method also requires an upfront semantic segmentation step whose accuracy is taken as given. If the parts do not line up with actual geometric features, or if the cross-attention leaves visible seams at the interfaces, the reported advantage would shrink or vanish. The stress-test note correctly flags both issues, and nothing in the abstract supplies boundary-error numbers or segmentation-robustness checks to address them.

This work is aimed at people already building point-cloud-to-mesh pipelines in graphics and computer vision. A reader who needs a practical way to trade off global coherence against local detail might pick up the part-wise pattern even if the current numbers need verification. The paper is coherent enough on its own terms to deserve a serious referee who can examine the full results, ablations, and any boundary metrics that were actually run.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes PartDiffuser, a semi-autoregressive discrete diffusion framework for point-cloud-to-mesh generation. It first performs semantic segmentation on the input, then applies autoregressive generation across parts to maintain global topology while running parallel discrete diffusion within each part to recover high-frequency details. The DiT architecture is extended with a part-aware cross-attention mechanism that conditions generation on hierarchical point clouds, thereby decoupling global and local tasks. The abstract states that experiments show significant outperformance over SOTA models in detail fidelity for real-world applications.

Significance. If the empirical claims are substantiated, the part-wise decomposition could offer a practical way to reconcile global consistency with local geometric fidelity in conditional mesh generation. The combination of autoregressive inter-part modeling and intra-part discrete diffusion, together with cross-attention conditioning, represents an architectural pattern that may influence subsequent work on structured 3D synthesis.

major comments (2)

[Abstract] Abstract: the central claim that the method 'significantly outperforms state-of-the-art (SOTA) models in generating 3D meshes with rich detail' is unsupported by any quantitative metrics, baseline comparisons, error measures, or experimental protocol; without these the outperformance assertion cannot be evaluated and is load-bearing for the paper's contribution.
[Method] Method description (abstract and implied §3): the framework presupposes accurate upfront semantic segmentation and artifact-free boundary handling via part-aware cross-attention, yet no robustness analysis, ablation on segmentation noise, or boundary-specific metrics (e.g., normal consistency or edge error across part interfaces) are reported; these assumptions directly determine whether the global-local decoupling succeeds.

minor comments (2)

[Abstract] Abstract: the term 'semi-autoregressive' is introduced without a concise definition of how the autoregressive inter-part schedule interacts with the parallel intra-part diffusion steps.
[Abstract] Abstract: 'hierarchical point clouds' are mentioned as conditioning input but the construction of the hierarchy (number of levels, sampling strategy) is not specified.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review of our manuscript. We address each of the major comments below and have made revisions to the manuscript where appropriate to strengthen the presentation of our results and analysis.


Referee: [Abstract] Abstract: the central claim that the method 'significantly outperforms state-of-the-art (SOTA) models in generating 3D meshes with rich detail' is unsupported by any quantitative metrics, baseline comparisons, error measures, or experimental protocol; without these the outperformance assertion cannot be evaluated and is load-bearing for the paper's contribution.

Authors: The abstract is intended as a concise overview, and the supporting quantitative evidence—including specific metrics, baseline comparisons, and the experimental setup—is provided in detail in Section 4 of the full manuscript. Nevertheless, we agree that incorporating key quantitative results into the abstract would make the claim more immediately verifiable. We have revised the abstract to include brief references to the performance gains observed in our experiments. revision: yes
Referee: [Method] Method description (abstract and implied §3): the framework presupposes accurate upfront semantic segmentation and artifact-free boundary handling via part-aware cross-attention, yet no robustness analysis, ablation on segmentation noise, or boundary-specific metrics (e.g., normal consistency or edge error across part interfaces) are reported; these assumptions directly determine whether the global-local decoupling succeeds.

Authors: We recognize the importance of validating the robustness of the part-wise approach to segmentation inaccuracies. The current work assumes high-quality semantic segmentation as input, consistent with many part-based 3D generation methods. To directly address this concern, we have conducted additional experiments and included an ablation study on segmentation noise levels along with boundary-specific metrics in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity: architectural combination with independent experimental claims


The paper introduces PartDiffuser as a new semi-autoregressive framework that combines upfront semantic segmentation, autoregressive inter-part generation for global topology, parallel discrete diffusion within parts for local details, and part-aware cross-attention on hierarchical point clouds. No equations, fitted parameters, or derivation steps appear that reduce any claimed prediction or result to the inputs by construction. The abstract and description frame the approach as an original architectural synthesis rather than a self-referential fit or renamed prior result. Central performance claims rest on experimental outperformance rather than load-bearing self-citations or uniqueness theorems imported from the authors' prior work. This is the common case of a self-contained engineering contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are specified in the abstract; the approach builds on standard DiT and discrete diffusion components from prior literature without introducing new postulated entities.

pith-pipeline@v0.9.0 · 5709 in / 1174 out tokens · 45793 ms · 2026-05-21T18:32:37.904977+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Each entry gives the Lean source file, the theorem name, and a tag for the relation between the paper passage and the cited Recognition theorem.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "employs autoregression between parts to ensure global topology, while utilizing a parallel discrete diffusion process within each semantic part"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "part-aware cross-attention mechanism, using point clouds as hierarchical geometric conditioning"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 7 internal anchors

[1] Marianne Arriola, Aaron Gokaslan, Justin T Chiu, Zhihan Yang, Zhixuan Qi, Jiaqi Han, Subham Sekhar Sahoo, and Volodymyr Kuleshov. Block diffusion: Interpolating between autoregressive and diffusion language models. arXiv preprint arXiv:2503.09573, 2025.
[2] Jacob Austin, Daniel D Johnson, Jonathan Ho, Daniel Tarlow, and Rianne Van Den Berg. Structured denoising diffusion models in discrete state-spaces. Advances in neural information processing systems, 34:17981–17993, 2021.
[3] Minghao Chen, Roman Shapovalov, Iro Laina, Tom Monnier, Jianyuan Wang, David Novotny, and Andrea Vedaldi. Partgen: Part-level 3d generation and reconstruction with multi-view diffusion models. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 5881–5892, 2025.
[4] Minghao Chen, Jianyuan Wang, Roman Shapovalov, Tom Monnier, Hyunyoung Jung, Dilin Wang, Rakesh Ranjan, Iro Laina, and Andrea Vedaldi. Autopartgen: Autoregressive 3d part generation and discovery. arXiv preprint arXiv:2507.13346, 2025.
[5] Sijin Chen, Xin Chen, Anqi Pang, Xianfang Zeng, Wei Cheng, Yijun Fu, Fukun Yin, Billzb Wang, Jingyi Yu, Gang Yu, et al. Meshxl: Neural coordinate field for generative 3d foundation models. Advances in Neural Information Processing Systems, 37:97141–97166, 2024.
[6] Yiwen Chen, Tong He, Di Huang, Weicai Ye, Sijin Chen, Jiaxiang Tang, Xin Chen, Zhongang Cai, Lei Yang, Gang Yu, et al. Meshanything: Artist-created mesh generation with autoregressive transformers. arXiv preprint arXiv:2406.10163, 2024.
[7] Yiwen Chen, Yikai Wang, Yihao Luo, Zhengyi Wang, Zilong Chen, Jun Zhu, Chi Zhang, and Guosheng Lin. Meshanything v2: Artist-created mesh generation with adjacent mesh tokenization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 13922–13931, 2025.
[8] Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Objaverse: A universe of annotated 3d objects. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 13142–13153, 2023.
[9] Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis. Advances in neural information processing systems, 34:8780–8794, 2021.
[10] Huan Fu, Bowen Cai, Lin Gao, Ling-Xiao Zhang, Jiaming Wang, Cao Li, Qixun Zeng, Chengyue Sun, Rongfei Jia, Binqiang Zhao, et al. 3d-front: 3d furnished rooms with layouts and semantics. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10933–10942,
[11] Shrey Goel, Vishrut Thoutam, Edgar Mariano Marroquin, Aaron Gokaslan, Arash Firouzbakht, Sophia Vincoff, Volodymyr Kuleshov, Huong T Kratochvil, and Pranam Chatterjee. Memdlm: De novo membrane protein design with masked discrete diffusion protein language models. arXiv preprint arXiv:2410.16735, 2024.
[12] Shansan Gong, Shivam Agarwal, Yizhe Zhang, Jiacheng Ye, Lin Zheng, Mukai Li, Chenxin An, Peilin Zhao, Wei Bi, Jiawei Han, et al. Scaling diffusion language models via adaptation from autoregressive models. arXiv preprint arXiv:2410.17891, 2024.
[13] Shansan Gong, Ruixiang Zhang, Huangjie Zheng, Jiatao Gu, Navdeep Jaitly, Lingpeng Kong, and Yizhe Zhang. Diffucoder: Understanding and improving masked diffusion models for code generation. arXiv preprint arXiv:2506.20639,
[14] Zekun Hao, David W Romero, Tsung-Yi Lin, and Ming-Yu Liu. Meshtron: High-fidelity, artist-like 3d mesh generation at scale. arXiv preprint arXiv:2412.09548, 2024.
[15] Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022.
[16] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020.
[17] Mukul Khanna, Yongsen Mao, Hanxiao Jiang, Sanjay Haresh, Brennan Shacklett, Dhruv Batra, Alexander Clegg, Eric Undersander, Angel X Chang, and Manolis Savva. Habitat synthetic scenes dataset (hssd-200): An analysis of 3d scene scale and realism tradeoffs for objectgoal navigation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern...
[18] Samar Khanna, Siddhant Kharbanda, Shufan Li, Harshit Varma, Eric Wang, Sawyer Birnbaum, Ziyang Luo, Yanis Miraoui, Akash Palrecha, Stefano Ermon, et al. Mercury: Ultra-fast language models based on diffusion. arXiv preprint arXiv:2506.17298, 2025.
[19] Xiang Li, John Thickstun, Ishaan Gulrajani, Percy S Liang, and Tatsunori B Hashimoto. Diffusion-lm improves controllable text generation. Advances in neural information processing systems, 35:4328–4343, 2022.
[20] Yuchen Lin, Chenguo Lin, Panwang Pan, Honglei Yan, Yiqiang Feng, Yadong Mu, and Katerina Fragkiadaki. Partcrafter: Structured 3d mesh generation via compositional latent diffusion transformers. arXiv preprint arXiv:2506.05573, 2025.
[21] Stefan Lionar, Jiabin Liang, and Gim Hee Lee. Treemeshgpt: Artistic mesh generation with autoregressive tree sequencing. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 26608–26617, 2025.
[22] Anran Liu, Cheng Lin, Yuan Liu, Xiaoxiao Long, Zhiyang Dou, Hao-Xiang Guo, Ping Luo, and Wenping Wang. Part123: part-aware 3d reconstruction from a single-view image. In ACM SIGGRAPH 2024 Conference Papers, pages 1–12, 2024.
[23] Minghua Liu, Mikaela Angelina Uy, Donglai Xiang, Hao Su, Sanja Fidler, Nicholas Sharp, and Jun Gao. Partfield: Learning 3d feature fields for part segmentation and beyond. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9704–9715, 2025.
[24] Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin, Yuan Liu, Zhiyang Dou, Lingjie Liu, Yuexin Ma, Song-Hai Zhang, Marc Habermann, Christian Theobalt, et al. Wonder3d: Single image to 3d using cross-domain diffusion. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9970–9980, 2024.
[25] William E Lorensen and Harvey E Cline. Marching cubes: A high resolution 3d surface construction algorithm. In Seminal graphics: pioneering efforts that shaped the field, pages 347–353. 1998.
[26] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
[27] Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, and Chongxuan Li. Large language diffusion models. arXiv preprint arXiv:2502.09992, 2025.
[28] Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 165–174, 2019.
[29] William Peebles and Saining Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF international conference on computer vision, pages 4195–4205,
[30] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022.
[31] Yawar Siddiqui, Antonio Alliegro, Alexey Artemov, Tatiana Tommasi, Daniele Sirigatti, Vladislav Rosov, Angela Dai, and Matthias Nießner. Meshgpt: Generating triangle meshes with decoder-only transformers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 19615–19625, 2024.
[32] Kaiyu Song, Hanjiang Lai, Yaqing Zhang, Chuangjian Cai, Yan Pan Kun Yue, and Jian Yin. Topology sculptor, shape refiner: Discrete diffusion model for high-fidelity 3d meshes generation. arXiv preprint arXiv:2510.21264, 2025.
[33] Jiaxiang Tang, Ruijie Lu, Zhaoshuo Li, Zekun Hao, Xuan Li, Fangyin Wei, Shuran Song, Gang Zeng, Ming-Yu Liu, and Tsung-Yi Lin. Efficient part-level 3d object generation via dual volume packing. arXiv preprint arXiv:2506.09980,
[34] Xinyou Wang, Zaixiang Zheng, Fei Ye, Dongyu Xue, Shujian Huang, and Quanquan Gu. Dplm-2: A multimodal diffusion protein language model. arXiv preprint arXiv:2410.13782, 2024.
[35] Zhengyi Wang, Jonathan Lorraine, Yikai Wang, Hang Su, Jun Zhu, Sanja Fidler, and Xiaohui Zeng. Llama-mesh: Unifying 3d mesh generation with language models. arXiv preprint arXiv:2411.09595, 2024.
[36] Haohan Weng, Zibo Zhao, Biwen Lei, Xianghui Yang, Jian Liu, Zeqiang Lai, Zhuo Chen, Yuhong Liu, Jie Jiang, Chunchao Guo, et al. Scaling mesh generation via compressive tokenization. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 11093–11103, 2025.
[37] Chengyue Wu, Hao Zhang, Shuchen Xue, Zhijian Liu, Shizhe Diao, Ligeng Zhu, Ping Luo, Song Han, and Enze Xie. Fast-dllm: Training-free acceleration of diffusion llm by enabling kv cache and parallel decoding. arXiv preprint arXiv:2505.22618, 2025.
[38] Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang, Dong Chen, Xin Tong, and Jiaolong Yang. Structured 3d latents for scalable and versatile 3d generation. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 21469–21480, 2025.
[39] Han Yan, Yang Li, Zhennan Wu, Shenzhou Chen, Weixuan Sun, Taizhang Shang, Weizhe Liu, Tian Chen, Xiaqiang Dai, Chao Ma, et al. Frankenstein: Generating semantic-compositional 3d scenes in one tri-plane. In SIGGRAPH Asia 2024 Conference Papers, pages 1–11, 2024.
[40] Han Yan, Mingrui Zhang, Yang Li, Chao Ma, and Pan Ji. Phycage: Physically plausible compositional 3d asset generation from a single image. arXiv preprint arXiv:2411.18548,
[41]

Holopart: Generative 3d part amodal segmentation.arXiv preprint arXiv:2504.07943, 2025

Yunhan Yang, Yuan-Chen Guo, Yukun Huang, Zi-Xin Zou, Zhipeng Yu, Yangguang Li, Yan-Pei Cao, and Xihui Liu. Holopart: Generative 3d part amodal segmentation.arXiv preprint arXiv:2504.07943, 2025. 3

work page arXiv 2025
[42]

Jiacheng Ye, Zhihui Xie, Lin Zheng, Jiahui Gao, Zirui Wu, Xin Jiang, Zhenguo Li, and Lingpeng Kong. Dream 7B: Diffusion large language models. arXiv preprint arXiv:2508.15487, 2025.
[43]

Ruowen Zhao, Junliang Ye, Zhengyi Wang, Guangce Liu, Yiwen Chen, Yikai Wang, and Jun Zhu. Deepmesh: Auto-regressive artist-mesh creation with reinforcement learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10612–10623, 2025.
[44]

Zibo Zhao, Wen Liu, Xin Chen, Xianfang Zeng, Rui Wang, Pei Cheng, Bin Fu, Tao Chen, Gang Yu, and Shenghua Gao. Michelangelo: Conditional 3d shape generation based on shape-image-text aligned latent representation. Advances in neural information processing systems, 36:73969–73982,
Supplementary Material

A. Dataset Construction

As a supplement to the dataset introduction in the main text, we provide a detailed description of the dataset construction process. We utilize Objaverse [8] and 3D-Front [10] as our primary data sources. The data preprocessing pipeline co...
