PolyFlow: Continuous Topology Embedding Flow Matching for Artist-style Mesh Generation

Biwen Lei; Chunchao Guo; Chunshi Wang; Haohan Weng; Junliang Ye; Kaiyi Zhang; Yang Li; Yawei Luo; Yunhan Yang; Zeqiang Lai

arxiv: 2606.30673 · v1 · pith:S47PNX3Wnew · submitted 2026-06-25 · 💻 cs.GR · cs.CV

PolyFlow: Continuous Topology Embedding Flow Matching for Artist-style Mesh Generation

Chunshi Wang , Haohan Weng , Junliang Ye , Biwen Lei , Yang Li , Zibo Zhao , Zeqiang Lai , Kaiyi Zhang

show 4 more authors

Yunhan Yang Zhuo Chen Chunchao Guo Yawei Luo

This is my paper

Pith reviewed 2026-07-01 06:51 UTC · model grok-4.3

classification 💻 cs.GR cs.CV

keywords mesh generationflow matchingtopology embeddingcontinuous representationparallel synthesisvertex denoisingartist meshesToys4K

0 comments

The pith

A compact topology embedder projects discrete meshes into continuous per-vertex states recoverable by distance thresholding, allowing flow matching to generate artist meshes in parallel.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that a pretrained embedder can map vertex positions and normals from discrete meshes into continuous embeddings where original adjacency information recovers through spacetime distance thresholding. With the embedder frozen, any mesh converts to a unified continuous state space that combines position, normal, and implicit topology attributes. A Transformer-based flow-matching model then performs parallel vertex denoising conditioned on point-cloud features, completing generation through an ODE solver while letting users specify target vertex count for resolution control. Evaluations on Toys4K show this approach yields lower Chamfer and Hausdorff distances than autoregressive transformer baselines. A reader would care because the method removes the sequential bottleneck of high-quality mesh generation without sacrificing the topological precision required for artist output.

Core claim

By pretraining and freezing a compact topology embedder that projects discrete vertex positions and normals into continuous per-vertex embeddings, the original adjacency can be recovered via spacetime distance thresholding; any mesh thereby becomes a continuous state space that unifies position, normal, and implicit topology, enabling a flow-matching Transformer to perform fully parallel vertex-state denoising conditioned on point-cloud features and to synthesize meshes via ODE integration with direct control over output vertex count.

What carries the argument

The compact topology embedder that maps discrete mesh vertex positions and normals into continuous per-vertex embeddings from which adjacency recovers by spacetime distance thresholding.

If this is right

Mesh generation completes in parallel through an ODE solver rather than sequential autoregressive decoding.
Output resolution is set directly by choosing the target number of vertices at inference time.
The model surpasses state-of-the-art autoregressive baselines in both Chamfer Distance and Hausdorff Distance on the Toys4K benchmark.
Generation supports conditioning on extracted point-cloud features for guided synthesis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same embedding step could be tested on other discrete geometric structures such as molecular graphs to see whether flow matching extends beyond surface meshes.
Because the state space is continuous, interpolation between two generated embeddings might produce intermediate meshes whose recovered topology remains consistent.
Pretraining the embedder on larger and more varied mesh collections could raise the fidelity ceiling for complex artist topologies without changing the flow-matching stage.

Load-bearing premise

The spacetime distance thresholding on the continuous per-vertex embeddings recovers the original discrete adjacency without loss of topological fidelity required for artist-quality meshes.

What would settle it

Generate meshes from the flow model on Toys4K, recover connectivity by applying the spacetime distance threshold to the output embeddings, and measure whether the resulting adjacency graphs match ground-truth topologies or introduce invalid faces or connectivity errors.

Figures

Figures reproduced from arXiv: 2606.30673 by Biwen Lei, Chunchao Guo, Chunshi Wang, Haohan Weng, Junliang Ye, Kaiyi Zhang, Yang Li, Yawei Luo, Yunhan Yang, Zeqiang Lai, Zhuo Chen, Zibo Zhao.

**Figure 2.** Figure 2: Overview of the PolyFlow pipeline. Left—Training: Given a 3D mesh, we sample a point cloud and encode it into condition features via a frozen condition encoder. Vertex positions (x, y, z), surface normals, and topology embeddings produced by a frozen topology embedder are concatenated to form the joint flow state z = [xyz, normals, emb] of shape (B, V, D). A Flow Transformer is trained to denoise z from Ga… view at source ↗

**Figure 3.** Figure 3: Visualization of the denoising process. Top: vertex positions at selected ODE steps [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Visual ablation on topology embedding dimension. From left to right: input mesh, [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Qualitative comparison of mesh generation results on Toys4K. Each row shows one [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗

**Figure 6.** Figure 6: Qualitative evaluation of PolyFlow’s vertex generation. [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗

read the original abstract

Autoregressive Transformers dominate high-quality mesh generation by producing artist-worthy topologies, yet their inherent sequential decoding induces substantial computational overhead, falling orders of magnitude slower than parallel generative models. On the other hand, while continuous diffusion and flow-matching methods support efficient parallel synthesis across a variety of domains, they cannot be directly applied to meshes: mesh connectivity is inherently discrete and incompatible with standard continuous noise injection and denoising operations. To resolve this fundamental incompatibility, we introduce a compact topology embedder that projects discrete mesh vertex positions and normals into continuous per-vertex embeddings, where the original discrete adjacency information can be faithfully recovered via spacetime distance thresholding. After pretraining and freezing this embedder, any raw mesh can be fully converted into a continuous per-vertex state space unifying position, normal, and implicit topological attributes. Built upon this novel continuous mesh representation, we present PolyFlow, a Transformer-based flow-matching framework that achieves fully parallel vertex state denoising conditioned on extracted point-cloud features. During inference, our model completes generation rapidly via an ODE solver, and supports explicit, precise control over output mesh resolution by directly specifying the target vertex count. Extensive evaluations on the Toys4K benchmark demonstrate that PolyFlow surpasses state-of-the-art autoregressive baselines in both Chamfer Distance and Hausdorff Distance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The topology embedder lets flow matching run on meshes in parallel, but the paper still needs to prove that thresholding recovers correct adjacency on generated outputs rather than just reporting point distances.

read the letter

The main advance is the compact topology embedder that turns discrete vertex positions, normals, and adjacency into a continuous per-vertex state. Once frozen, this lets a standard flow-matching Transformer denoise the whole mesh at once instead of token by token. Inference is an ODE solve with direct control over vertex count, which is a practical win over autoregressive baselines.

The construction itself is straightforward: pretrain the embedder so that spacetime distance thresholding can pull back the discrete edges. Then condition the flow model on point-cloud features. On Toys4K the method reports better Chamfer and Hausdorff numbers than the autoregressive competitors. That part is clear and the parallel speed claim follows directly from the architecture.

The soft spot is exactly the one the stress-test flags. Chamfer and Hausdorff only check geometry; they say nothing about whether the recovered edges match the intended topology or produce manifold meshes. The abstract asserts faithful recovery, but without topology-specific metrics, edge-error counts, or qualitative examples of generated connectivity, it is hard to know if the outputs are actually artist-usable meshes or just point clouds that happen to be close. If the thresholding step introduces extra or missing edges on out-of-distribution samples, the geometric scores become misleading.

No ablations on embedder size, threshold choice, or alternative recovery methods appear in the provided summary, and the comparison stays limited to autoregressive models. A reader would want to see at least one continuous baseline adapted to meshes and some check that the generated topologies are valid.

This is for people working on 3D generative models in graphics who care about speed and topology. The core idea is worth testing, but the current evidence is thin on the topology claim. I would send it to peer review so the authors can add the missing validation.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces a compact topology embedder that maps discrete mesh vertex positions and normals to continuous per-vertex embeddings, with the claim that original discrete adjacency can be recovered via spacetime distance thresholding. After freezing this embedder, the authors present PolyFlow, a Transformer-based flow-matching model that performs parallel denoising of the continuous vertex states conditioned on point-cloud features. The central empirical claim is that PolyFlow outperforms state-of-the-art autoregressive baselines on the Toys4K benchmark in both Chamfer Distance and Hausdorff Distance while enabling faster parallel generation and explicit control over vertex count.

Significance. If the topology recovery mechanism proves reliable, the work would provide a practical route to parallel, high-quality mesh generation that avoids the sequential decoding overhead of autoregressive Transformers, potentially benefiting applications requiring rapid synthesis of artist-style meshes.

major comments (2)

[Abstract] Abstract (paragraph describing the embedder): the claim that discrete adjacency information can be 'faithfully recovered via spacetime distance thresholding' is load-bearing for the validity of the continuous representation and the downstream claim of artist-quality meshes, yet the reported evaluations supply only geometric metrics (Chamfer Distance and Hausdorff Distance) with no topology-specific validation such as edge-recovery accuracy, adjacency-matrix agreement, or checks for non-manifold artifacts on generated samples.
[Abstract] Abstract: the statement that 'extensive evaluations on the Toys4K benchmark demonstrate that PolyFlow surpasses state-of-the-art autoregressive baselines' is presented without any description of the baselines, experimental protocol, number of samples, or error bars, making it impossible to assess whether the central performance claim is supported by the data.

minor comments (1)

[Abstract] The abstract asserts that autoregressive methods are 'orders of magnitude slower' without citing specific timing numbers or hardware; adding a quantitative comparison would strengthen the motivation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We address the two major comments point by point below.

read point-by-point responses

Referee: [Abstract] Abstract (paragraph describing the embedder): the claim that discrete adjacency information can be 'faithfully recovered via spacetime distance thresholding' is load-bearing for the validity of the continuous representation and the downstream claim of artist-quality meshes, yet the reported evaluations supply only geometric metrics (Chamfer Distance and Hausdorff Distance) with no topology-specific validation such as edge-recovery accuracy, adjacency-matrix agreement, or checks for non-manifold artifacts on generated samples.

Authors: We agree that topology-specific validation would strengthen the load-bearing claim. The full manuscript describes the spacetime thresholding mechanism in Section 3 and provides qualitative recovery examples, but does not report quantitative topology metrics such as edge-recovery accuracy. We will add these metrics (edge recovery rate and adjacency matrix F1) on generated samples to the experiments, along with checks for non-manifold artifacts, and revise the abstract to reference this validation. This constitutes a substantive addition. revision: yes
Referee: [Abstract] Abstract: the statement that 'extensive evaluations on the Toys4K benchmark demonstrate that PolyFlow surpasses state-of-the-art autoregressive baselines' is presented without any description of the baselines, experimental protocol, number of samples, or error bars, making it impossible to assess whether the central performance claim is supported by the data.

Authors: The abstract is written for brevity, while the full manuscript specifies the baselines (MeshGPT and PolyGen), protocol (identical Toys4K train/test split, evaluation on 500 held-out shapes), sample count, and reports mean ± std in Table 2. We will revise the abstract to include a concise clause naming the baselines and noting that results are averaged with standard deviations. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical claims rest on independent benchmark evaluation

full rationale

The paper introduces a pretrained-and-frozen topology embedder followed by a flow-matching denoiser, with performance asserted via direct comparison of Chamfer and Hausdorff distances against autoregressive baselines on the external Toys4K benchmark. No derivation step equates a claimed prediction to a fitted parameter or self-citation by construction; the spacetime-distance recovery rule is presented as an explicit design choice rather than a derived necessity, and the reported metrics are computed on generated outputs without reducing to the embedder's own training objective. The chain is therefore self-contained against external data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Review performed on abstract only; full paper would be needed to enumerate free parameters, axioms, and invented entities with precision.

axioms (1)

domain assumption The topology embedder can project discrete mesh vertex positions and normals into continuous embeddings from which original adjacency can be recovered via spacetime distance thresholding.
Central premise stated in the abstract for creating the continuous mesh representation.

invented entities (1)

Compact topology embedder no independent evidence
purpose: To convert discrete meshes into a continuous per-vertex state space unifying position, normal, and implicit topology
New component introduced to resolve the incompatibility between discrete mesh connectivity and continuous flow-matching operations.

pith-pipeline@v0.9.1-grok · 5792 in / 1165 out tokens · 45548 ms · 2026-07-01T06:51:22.476851+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

53 extracted references · 25 canonical work pages · 9 internal anchors

[1]

D. Chen, Y. Qu, X. Li, M. Li, and S. Zhang. Xspecmesh: Quality-preserving auto- regressive mesh generation acceleration via multi-head speculative decoding.arXiv preprint arXiv:2507.23777, 2025

work page arXiv 2025
[2]

R. Chen, Y. Chen, N. Jiao, and K. Jia. Fantasia3d: Disentangling geometry and ap- pearance for high-quality text-to-3d content creation. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 22246–22256, 2023

2023
[3]

S. Chen, X. Chen, A. Pang, X. Zeng, W. Cheng, Y. Fu, F. Yin, Z. Wang, J. Yu, G. Yu, et al. Meshxl: Neural coordinate field for generative 3d foundation models.Advances in Neural Information Processing Systems, 37:97141–97166, 2024

2024
[4]

Y. Chen, T. He, D. Huang, W. Ye, S. Chen, J. Tang, Z. Cai, L. Yang, G. Yu, G. Lin, et al. Meshanything: Artist-created mesh generation with autoregressive transformers. In International Conference on Learning Representations, volume 2025, pages 51369–51389, 2025

2025
[5]

Y. Chen, Z. Li, Y. Wang, H. Zhang, Q. Li, C. Zhang, and G. Lin. Ultra3d: Efficient and high-fidelity 3d generation with part attention.arXiv preprint arXiv:2507.17745, 2025

work page arXiv 2025
[6]

Y. Chen, Y. Wang, Y. Luo, Z. Wang, Z. Chen, J. Zhu, C. Zhang, and G. Lin. Meshanything v2: Artist-created mesh generation with adjacent mesh tokenization. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 13922–13931, 2025

2025
[7]

Esser, S

P . Esser, S. Kulal, A. Blattmann, R. Entezari, J. M ¨uller, H. Saini, Y. Levi, D. Lorenz, A. Sauer, F. Boesel, et al. Scaling rectified flow transformers for high-resolution image synthesis. InForty-first international conference on machine learning, 2024

2024
[8]

Z. Hao, D. W. Romero, T.-Y. Lin, and M.-Y. Liu. Meshtron: High-fidelity, artist-like 3d mesh generation at scale.arXiv preprint arXiv:2412.09548, 2024

work page arXiv 2024
[9]

He, Z.-X

X. He, Z.-X. Zou, C.-H. Chen, Y.-C. Guo, D. Liang, C. Yuan, W. Ouyang, Y.-P . Cao, and Y. Li. Sparseflex: High-resolution and arbitrary-topology 3d shape modeling. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 14822– 14833, 2025

2025
[11]

J. Kim, Y. Lan, A. Fortes, Y. Chen, and X. Pan. Fastmesh: Efficient artistic mesh genera- tion via component decoupling.arXiv preprint arXiv:2508.19188, 2025

work page arXiv 2025
[12]

Z. Lai, Y. Zhao, Z. Zhao, H. Liu, Q. Lin, J. Huang, C. Guo, and X. Yue. Lattice: Democ- ratize high-fidelity 3d generation at scale.arXiv preprint arXiv:2512.03052, 2025

work page arXiv 2025
[13]

W. Li, J. Liu, H. Yan, R. Chen, Y. Liang, X. Chen, P . Tan, and X. Long. Craftsman3d: High-fidelity mesh generation with 3d native generation and interactive geometry refiner.arXiv preprint arXiv:2405.14979, 2024

work page arXiv 2024
[14]

TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models

Y. Li, Z.-X. Zou, Z. Liu, D. Wang, Y. Liang, Z. Yu, X. Liu, Y.-C. Guo, D. Liang, W. Ouyang, et al. Triposg: High-fidelity 3d shape synthesis using large-scale rectified flow models. arXiv preprint arXiv:2502.06608, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[15]

Z. Li, Y. Wang, H. Zheng, Y. Luo, and B. Wen. Sparc3d: Sparse representation and construction for high-resolution 3d shapes modeling.arXiv preprint arXiv:2505.14521, 2025

work page arXiv 2025
[16]

C.-H. Lin, J. Gao, L. Tang, T. Takikawa, X. Zeng, X. Huang, K. Kreis, S. Fidler, M.-Y. Liu, and T.-Y. Lin. Magic3d: High-resolution text-to-3d content creation. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 300–309, 2023

2023
[17]

Flow Matching for Generative Modeling

Y. Lipman, R. T. Chen, H. Ben-Hamu, M. Nickel, and M. Le. Flow matching for generative modeling.arXiv preprint arXiv:2210.02747, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[18]

F. Liu, J. Ye, Y. Wang, H. Wang, Z. Wang, J. Zhu, and Y. Duan. Dreamreward-x: Boosting high-quality 3d generation with human preference alignment.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

2025
[19]

J. Liu, C. Wang, S. Guo, H. Weng, Z. Zhou, Z. Li, J. Yu, Y. Zhu, J. Xu, B. Lei, Z. Chen, and C. Guo. Quadgpt: Native quadrilateral mesh generation with autoregressive models,
[20]

URLhttps://arxiv.org/abs/2509.21420

work page arXiv
[21]

X. Liu, C. Gong, and Q. Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow.arXiv preprint arXiv:2209.03003, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[22]

C. Ma, Y. Li, X. Yan, J. Xu, Y. Yang, C. Wang, Z. Zhao, Y. Guo, Z. Chen, and C. Guo. P3-sam: Native 3d part segmentation, 2025. URL https://arxiv.org/abs/2509.06784. 13 Tencent Hunyuan

work page arXiv 2025
[23]

N. Ma, M. Goldstein, M. S. Albergo, N. M. Boffi, E. Vanden-Eijnden, and S. Xie. Sit: Exploring flow and diffusion-based generative models with scalable interpolant trans- formers. InEuropean Conference on Computer Vision, pages 23–40. Springer, 2024

2024
[24]

C. Nash, Y. Ganin, S. A. Eslami, and P . Battaglia. Polygen: An autoregressive generative model of 3d meshes. InInternational conference on machine learning, pages 7220–7229. PMLR, 2020

2020
[25]

Point-E: A System for Generating 3D Point Clouds from Complex Prompts

A. Nichol, H. Jun, P . Dhariwal, P . Mishkin, and M. Chen. Point-e: A system for generating 3d point clouds from complex prompts.arXiv preprint arXiv:2212.08751, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[26]

Peebles and S

W. Peebles and S. Xie. Scalable diffusion models with transformers. InProceedings of the IEEE/CVF international conference on computer vision, pages 4195–4205, 2023

2023
[27]

DreamFusion: Text-to-3D using 2D Diffusion

B. Poole, A. Jain, J. T. Barron, and B. Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion.arXiv preprint arXiv:2209.14988, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[28]

L. Qiu, G. Chen, X. Gu, Q. Zuo, M. Xu, Y. Wu, W. Yuan, Z. Dong, L. Bo, and X. Han. Richdreamer: A generalizable normal-depth diffusion model for detail richness in text- to-3d. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9914–9925, 2024

2024
[29]

Sharp and M

N. Sharp and M. Ovsjanikov. Pointtrinet: Learned triangulation of 3d point sets. In European conference on computer vision, pages 762–778. Springer, 2020

2020
[30]

T. Shen, Z. Li, M. Law, M. Atzmon, S. Fidler, J. Lucas, J. Gao, and N. Sharp. Spacemesh: A continuous representation for learning manifold surface meshes. InSIGGRAPH Asia 2024 Conference Papers, pages 1–11, 2024

2024
[31]

Y. Shi, P . Wang, J. Ye, M. Long, K. Li, and X. Yang. Mvdream: Multi-view diffusion for 3d generation.arXiv preprint arXiv:2308.16512, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[32]

Siddiqui, A

Y. Siddiqui, A. Alliegro, A. Artemov, T. Tommasi, D. Sirigatti, V . Rosov, A. Dai, and M. Nießner. Meshgpt: Generating triangle meshes with decoder-only transformers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 19615–19625, 2024

2024
[33]

S. Son, M. Gadelha, Y. Zhou, Z. Xu, M. C. Lin, and Y. Zhou. Dmesh: A differentiable mesh representation.arXiv preprint arXiv:2404.13445, 2024

work page arXiv 2024
[34]

Stojanov, A

S. Stojanov, A. Thai, and J. M. Rehg. Using shape to categorize: Low-shot learning with an explicit shape bias. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1798–1808, 2021

2021
[35]

J. Tang, J. Ren, H. Zhou, Z. Liu, and G. Zeng. Dreamgaussian: Generative gaussian splatting for efficient 3d content creation.arXiv preprint arXiv:2309.16653, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[36]

J. Tang, M. Li, Z. Hao, X. Liu, G. Zeng, M.-Y. Liu, and Q. Zhang. Edgerunner: Auto- regressive auto-encoder for artistic mesh generation. InInternational Conference on Learning Representations, volume 2025, pages 35913–35934, 2025

2025
[37]

T. H. Team. Hunyuan3d-omni: A unified framework for controllable generation of 3d assets, 2025. URLhttps://arxiv.org/abs/2509.21245

work page arXiv 2025
[38]

C. Wang, J. Ye, Y. Yang, Y. Li, Z. Lin, J. Zhu, Z. Chen, Y. Luo, and C. Guo. Part-x-mllm: Part-aware 3d multimodal large language model, 2025. URL https://arxiv.org/abs/ 2511.13647

work page arXiv 2025
[39]

Z. Wang, C. Lu, Y. Wang, F. Bao, C. Li, H. Su, and J. Zhu. Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation.Advances in Neural Information Processing Systems, 36:8406–8441, 2023

2023
[40]

H. Weng, Y. Wang, T. Zhang, C. Chen, and J. Zhu. Pivotmesh: Generic 3d mesh generation via pivot vertices guidance.arXiv preprint arXiv:2405.16890, 2024

work page arXiv 2024
[41]

H. Weng, Z. Zhao, B. Lei, X. Yang, J. Liu, Z. Lai, Z. Chen, Y. Liu, J. Jiang, C. Guo, et al. Scaling mesh generation via compressive tokenization. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 11093–11103, 2025

2025
[42]

Z. Wu, P . Zhou, X. Yi, X. Yuan, and H. Zhang. Consistent3d: Towards consistent high- fidelity text-to-3d generation with deterministic sampling prior. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9892–9902, 2024

2024
[43]

Xiang, Z

J. Xiang, Z. Lv, S. Xu, Y. Deng, R. Wang, B. Zhang, D. Chen, X. Tong, and J. Yang. Structured 3d latents for scalable and versatile 3d generation. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 21469–21480, 2025

2025
[44]

J. Xu, W. Cheng, Y. Gao, X. Wang, S. Gao, and Y. Shan. Instantmesh: Efficient 3d mesh generation from a single image with sparse-view large reconstruction models.arXiv preprint arXiv:2404.07191, 2024. 14 Tencent Hunyuan

work page internal anchor Pith review Pith/arXiv arXiv 2024
[45]

X. Yan, J. Xu, Y. Li, C. Ma, Y. Yang, C. Wang, Z. Zhao, Z. Lai, Y. Zhao, Z. Chen, and C. Guo. X-part: high fidelity and structure coherent shape decomposition, 2025. URL https://arxiv.org/abs/2509.08643

work page arXiv 2025
[46]

Y. Yang, C. Wang, J. Ye, Y. Li, Z. Chen, Z. Huang, Y. Mu, Z. Chen, C. Guo, and X. Liu. Physforge: Generating physics-grounded 3d assets for interactive virtual world.arXiv preprint arXiv:2605.05163, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[47]

J. Ye, F. Liu, Q. Li, Z. Wang, Y. Wang, X. Wang, Y. Duan, and J. Zhu. Dreamreward: Text-to-3d generation with human preference. InEuropean Conference on Computer Vision, pages 259–276. Springer, 2024

2024
[48]

J. Ye, Z. Wang, R. Zhao, S. Xie, and J. Zhu. Shapellm-omni: A native multimodal llm for 3d generation and understanding.arXiv preprint arXiv:2506.01853, 2025

work page arXiv 2025
[49]

J. Ye, S. Xie, R. Zhao, Z. Wang, H. Yan, W. Zu, L. Ma, and J. Zhu. Nano3d: A training-free approach for efficient 3d editing without masks.arXiv preprint arXiv:2510.15019, 2025

work page arXiv 2025
[50]

J. Ye, Z. Huang, Y. Qu, C. Wang, Y. Yang, Y. Li, Y. Luo, Z. Chen, S. Lu, J. Zhu, et al. Universe3d: Emerging properties of unified multimodal models in 3d understanding and generation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 613–623, 2026

2026
[51]

T. Yi, J. Fang, J. Wang, G. Wu, L. Xie, X. Zhang, W. Liu, Q. Tian, and X. Wang. Gaussian- dreamer: Fast generation from text to 3d gaussians by bridging 2d and 3d diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recogni- tion, pages 6796–6807, 2024

2024
[52]

Zhang, J

B. Zhang, J. Tang, M. Niessner, and P . Wonka. 3dshape2vecset: A 3d shape representa- tion for neural fields and generative diffusion models.ACM Transactions On Graphics (TOG), 42(4):1–16, 2023

2023
[53]

R. Zhao, J. Ye, Z. Wang, G. Liu, Y. Chen, Y. Wang, and J. Zhu. Deepmesh: Auto- regressive artist-mesh creation with reinforcement learning. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 10612–10623, 2025

2025
[54]

Z. Zhao, W. Liu, X. Chen, X. Zeng, R. Wang, P . Cheng, B. Fu, T. Chen, G. Yu, and S. Gao. Michelangelo: Conditional 3d shape generation based on shape-image-text aligned latent representation.Advances in neural information processing systems, 36:73969–73982, 2023. 15

2023

[1] [1]

D. Chen, Y. Qu, X. Li, M. Li, and S. Zhang. Xspecmesh: Quality-preserving auto- regressive mesh generation acceleration via multi-head speculative decoding.arXiv preprint arXiv:2507.23777, 2025

work page arXiv 2025

[2] [2]

R. Chen, Y. Chen, N. Jiao, and K. Jia. Fantasia3d: Disentangling geometry and ap- pearance for high-quality text-to-3d content creation. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 22246–22256, 2023

2023

[3] [3]

S. Chen, X. Chen, A. Pang, X. Zeng, W. Cheng, Y. Fu, F. Yin, Z. Wang, J. Yu, G. Yu, et al. Meshxl: Neural coordinate field for generative 3d foundation models.Advances in Neural Information Processing Systems, 37:97141–97166, 2024

2024

[4] [4]

Y. Chen, T. He, D. Huang, W. Ye, S. Chen, J. Tang, Z. Cai, L. Yang, G. Yu, G. Lin, et al. Meshanything: Artist-created mesh generation with autoregressive transformers. In International Conference on Learning Representations, volume 2025, pages 51369–51389, 2025

2025

[5] [5]

Y. Chen, Z. Li, Y. Wang, H. Zhang, Q. Li, C. Zhang, and G. Lin. Ultra3d: Efficient and high-fidelity 3d generation with part attention.arXiv preprint arXiv:2507.17745, 2025

work page arXiv 2025

[6] [6]

Y. Chen, Y. Wang, Y. Luo, Z. Wang, Z. Chen, J. Zhu, C. Zhang, and G. Lin. Meshanything v2: Artist-created mesh generation with adjacent mesh tokenization. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 13922–13931, 2025

2025

[7] [7]

Esser, S

P . Esser, S. Kulal, A. Blattmann, R. Entezari, J. M ¨uller, H. Saini, Y. Levi, D. Lorenz, A. Sauer, F. Boesel, et al. Scaling rectified flow transformers for high-resolution image synthesis. InForty-first international conference on machine learning, 2024

2024

[8] [8]

Z. Hao, D. W. Romero, T.-Y. Lin, and M.-Y. Liu. Meshtron: High-fidelity, artist-like 3d mesh generation at scale.arXiv preprint arXiv:2412.09548, 2024

work page arXiv 2024

[9] [9]

He, Z.-X

X. He, Z.-X. Zou, C.-H. Chen, Y.-C. Guo, D. Liang, C. Yuan, W. Ouyang, Y.-P . Cao, and Y. Li. Sparseflex: High-resolution and arbitrary-topology 3d shape modeling. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 14822– 14833, 2025

2025

[10] [11]

J. Kim, Y. Lan, A. Fortes, Y. Chen, and X. Pan. Fastmesh: Efficient artistic mesh genera- tion via component decoupling.arXiv preprint arXiv:2508.19188, 2025

work page arXiv 2025

[11] [12]

Z. Lai, Y. Zhao, Z. Zhao, H. Liu, Q. Lin, J. Huang, C. Guo, and X. Yue. Lattice: Democ- ratize high-fidelity 3d generation at scale.arXiv preprint arXiv:2512.03052, 2025

work page arXiv 2025

[12] [13]

W. Li, J. Liu, H. Yan, R. Chen, Y. Liang, X. Chen, P . Tan, and X. Long. Craftsman3d: High-fidelity mesh generation with 3d native generation and interactive geometry refiner.arXiv preprint arXiv:2405.14979, 2024

work page arXiv 2024

[13] [14]

TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models

Y. Li, Z.-X. Zou, Z. Liu, D. Wang, Y. Liang, Z. Yu, X. Liu, Y.-C. Guo, D. Liang, W. Ouyang, et al. Triposg: High-fidelity 3d shape synthesis using large-scale rectified flow models. arXiv preprint arXiv:2502.06608, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[14] [15]

Z. Li, Y. Wang, H. Zheng, Y. Luo, and B. Wen. Sparc3d: Sparse representation and construction for high-resolution 3d shapes modeling.arXiv preprint arXiv:2505.14521, 2025

work page arXiv 2025

[15] [16]

C.-H. Lin, J. Gao, L. Tang, T. Takikawa, X. Zeng, X. Huang, K. Kreis, S. Fidler, M.-Y. Liu, and T.-Y. Lin. Magic3d: High-resolution text-to-3d content creation. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 300–309, 2023

2023

[16] [17]

Flow Matching for Generative Modeling

Y. Lipman, R. T. Chen, H. Ben-Hamu, M. Nickel, and M. Le. Flow matching for generative modeling.arXiv preprint arXiv:2210.02747, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[17] [18]

F. Liu, J. Ye, Y. Wang, H. Wang, Z. Wang, J. Zhu, and Y. Duan. Dreamreward-x: Boosting high-quality 3d generation with human preference alignment.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

2025

[18] [19]

J. Liu, C. Wang, S. Guo, H. Weng, Z. Zhou, Z. Li, J. Yu, Y. Zhu, J. Xu, B. Lei, Z. Chen, and C. Guo. Quadgpt: Native quadrilateral mesh generation with autoregressive models,

[19] [20]

URLhttps://arxiv.org/abs/2509.21420

work page arXiv

[20] [21]

X. Liu, C. Gong, and Q. Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow.arXiv preprint arXiv:2209.03003, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[21] [22]

C. Ma, Y. Li, X. Yan, J. Xu, Y. Yang, C. Wang, Z. Zhao, Y. Guo, Z. Chen, and C. Guo. P3-sam: Native 3d part segmentation, 2025. URL https://arxiv.org/abs/2509.06784. 13 Tencent Hunyuan

work page arXiv 2025

[22] [23]

N. Ma, M. Goldstein, M. S. Albergo, N. M. Boffi, E. Vanden-Eijnden, and S. Xie. Sit: Exploring flow and diffusion-based generative models with scalable interpolant trans- formers. InEuropean Conference on Computer Vision, pages 23–40. Springer, 2024

2024

[23] [24]

C. Nash, Y. Ganin, S. A. Eslami, and P . Battaglia. Polygen: An autoregressive generative model of 3d meshes. InInternational conference on machine learning, pages 7220–7229. PMLR, 2020

2020

[24] [25]

Point-E: A System for Generating 3D Point Clouds from Complex Prompts

A. Nichol, H. Jun, P . Dhariwal, P . Mishkin, and M. Chen. Point-e: A system for generating 3d point clouds from complex prompts.arXiv preprint arXiv:2212.08751, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[25] [26]

Peebles and S

W. Peebles and S. Xie. Scalable diffusion models with transformers. InProceedings of the IEEE/CVF international conference on computer vision, pages 4195–4205, 2023

2023

[26] [27]

DreamFusion: Text-to-3D using 2D Diffusion

B. Poole, A. Jain, J. T. Barron, and B. Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion.arXiv preprint arXiv:2209.14988, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[27] [28]

L. Qiu, G. Chen, X. Gu, Q. Zuo, M. Xu, Y. Wu, W. Yuan, Z. Dong, L. Bo, and X. Han. Richdreamer: A generalizable normal-depth diffusion model for detail richness in text- to-3d. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9914–9925, 2024

2024

[28] [29]

Sharp and M

N. Sharp and M. Ovsjanikov. Pointtrinet: Learned triangulation of 3d point sets. In European conference on computer vision, pages 762–778. Springer, 2020

2020

[29] [30]

T. Shen, Z. Li, M. Law, M. Atzmon, S. Fidler, J. Lucas, J. Gao, and N. Sharp. Spacemesh: A continuous representation for learning manifold surface meshes. InSIGGRAPH Asia 2024 Conference Papers, pages 1–11, 2024

2024

[30] [31]

Y. Shi, P . Wang, J. Ye, M. Long, K. Li, and X. Yang. Mvdream: Multi-view diffusion for 3d generation.arXiv preprint arXiv:2308.16512, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[31] [32]

Siddiqui, A

Y. Siddiqui, A. Alliegro, A. Artemov, T. Tommasi, D. Sirigatti, V . Rosov, A. Dai, and M. Nießner. Meshgpt: Generating triangle meshes with decoder-only transformers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 19615–19625, 2024

2024

[32] [33]

S. Son, M. Gadelha, Y. Zhou, Z. Xu, M. C. Lin, and Y. Zhou. Dmesh: A differentiable mesh representation.arXiv preprint arXiv:2404.13445, 2024

work page arXiv 2024

[33] [34]

Stojanov, A

S. Stojanov, A. Thai, and J. M. Rehg. Using shape to categorize: Low-shot learning with an explicit shape bias. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1798–1808, 2021

2021

[34] [35]

J. Tang, J. Ren, H. Zhou, Z. Liu, and G. Zeng. Dreamgaussian: Generative gaussian splatting for efficient 3d content creation.arXiv preprint arXiv:2309.16653, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[35] [36]

J. Tang, M. Li, Z. Hao, X. Liu, G. Zeng, M.-Y. Liu, and Q. Zhang. Edgerunner: Auto- regressive auto-encoder for artistic mesh generation. InInternational Conference on Learning Representations, volume 2025, pages 35913–35934, 2025

2025

[36] [37]

T. H. Team. Hunyuan3d-omni: A unified framework for controllable generation of 3d assets, 2025. URLhttps://arxiv.org/abs/2509.21245

work page arXiv 2025

[37] [38]

C. Wang, J. Ye, Y. Yang, Y. Li, Z. Lin, J. Zhu, Z. Chen, Y. Luo, and C. Guo. Part-x-mllm: Part-aware 3d multimodal large language model, 2025. URL https://arxiv.org/abs/ 2511.13647

work page arXiv 2025

[38] [39]

Z. Wang, C. Lu, Y. Wang, F. Bao, C. Li, H. Su, and J. Zhu. Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation.Advances in Neural Information Processing Systems, 36:8406–8441, 2023

2023

[39] [40]

H. Weng, Y. Wang, T. Zhang, C. Chen, and J. Zhu. Pivotmesh: Generic 3d mesh generation via pivot vertices guidance.arXiv preprint arXiv:2405.16890, 2024

work page arXiv 2024

[40] [41]

H. Weng, Z. Zhao, B. Lei, X. Yang, J. Liu, Z. Lai, Z. Chen, Y. Liu, J. Jiang, C. Guo, et al. Scaling mesh generation via compressive tokenization. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 11093–11103, 2025

2025

[41] [42]

Z. Wu, P . Zhou, X. Yi, X. Yuan, and H. Zhang. Consistent3d: Towards consistent high- fidelity text-to-3d generation with deterministic sampling prior. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9892–9902, 2024

2024

[42] [43]

Xiang, Z

J. Xiang, Z. Lv, S. Xu, Y. Deng, R. Wang, B. Zhang, D. Chen, X. Tong, and J. Yang. Structured 3d latents for scalable and versatile 3d generation. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 21469–21480, 2025

2025

[43] [44]

J. Xu, W. Cheng, Y. Gao, X. Wang, S. Gao, and Y. Shan. Instantmesh: Efficient 3d mesh generation from a single image with sparse-view large reconstruction models.arXiv preprint arXiv:2404.07191, 2024. 14 Tencent Hunyuan

work page internal anchor Pith review Pith/arXiv arXiv 2024

[44] [45]

X. Yan, J. Xu, Y. Li, C. Ma, Y. Yang, C. Wang, Z. Zhao, Z. Lai, Y. Zhao, Z. Chen, and C. Guo. X-part: high fidelity and structure coherent shape decomposition, 2025. URL https://arxiv.org/abs/2509.08643

work page arXiv 2025

[45] [46]

Y. Yang, C. Wang, J. Ye, Y. Li, Z. Chen, Z. Huang, Y. Mu, Z. Chen, C. Guo, and X. Liu. Physforge: Generating physics-grounded 3d assets for interactive virtual world.arXiv preprint arXiv:2605.05163, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[46] [47]

J. Ye, F. Liu, Q. Li, Z. Wang, Y. Wang, X. Wang, Y. Duan, and J. Zhu. Dreamreward: Text-to-3d generation with human preference. InEuropean Conference on Computer Vision, pages 259–276. Springer, 2024

2024

[47] [48]

J. Ye, Z. Wang, R. Zhao, S. Xie, and J. Zhu. Shapellm-omni: A native multimodal llm for 3d generation and understanding.arXiv preprint arXiv:2506.01853, 2025

work page arXiv 2025

[48] [49]

J. Ye, S. Xie, R. Zhao, Z. Wang, H. Yan, W. Zu, L. Ma, and J. Zhu. Nano3d: A training-free approach for efficient 3d editing without masks.arXiv preprint arXiv:2510.15019, 2025

work page arXiv 2025

[49] [50]

J. Ye, Z. Huang, Y. Qu, C. Wang, Y. Yang, Y. Li, Y. Luo, Z. Chen, S. Lu, J. Zhu, et al. Universe3d: Emerging properties of unified multimodal models in 3d understanding and generation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 613–623, 2026

2026

[50] [51]

T. Yi, J. Fang, J. Wang, G. Wu, L. Xie, X. Zhang, W. Liu, Q. Tian, and X. Wang. Gaussian- dreamer: Fast generation from text to 3d gaussians by bridging 2d and 3d diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recogni- tion, pages 6796–6807, 2024

2024

[51] [52]

Zhang, J

B. Zhang, J. Tang, M. Niessner, and P . Wonka. 3dshape2vecset: A 3d shape representa- tion for neural fields and generative diffusion models.ACM Transactions On Graphics (TOG), 42(4):1–16, 2023

2023

[52] [53]

R. Zhao, J. Ye, Z. Wang, G. Liu, Y. Chen, Y. Wang, and J. Zhu. Deepmesh: Auto- regressive artist-mesh creation with reinforcement learning. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 10612–10623, 2025

2025

[53] [54]

Z. Zhao, W. Liu, X. Chen, X. Zeng, R. Wang, P . Cheng, B. Fu, T. Chen, G. Yu, and S. Gao. Michelangelo: Conditional 3d shape generation based on shape-image-text aligned latent representation.Advances in neural information processing systems, 36:73969–73982, 2023. 15

2023