PolyFlow: Continuous Topology Embedding Flow Matching for Artist-style Mesh Generation
Pith reviewed 2026-07-01 06:51 UTC · model grok-4.3
The pith
A compact topology embedder projects discrete meshes into continuous per-vertex states recoverable by distance thresholding, allowing flow matching to generate artist meshes in parallel.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By pretraining and freezing a compact topology embedder that projects discrete vertex positions and normals into continuous per-vertex embeddings, the original adjacency can be recovered via spacetime distance thresholding; any mesh thereby becomes a continuous state space that unifies position, normal, and implicit topology, enabling a flow-matching Transformer to perform fully parallel vertex-state denoising conditioned on point-cloud features and to synthesize meshes via ODE integration with direct control over output vertex count.
What carries the argument
The compact topology embedder that maps discrete mesh vertex positions and normals into continuous per-vertex embeddings from which adjacency recovers by spacetime distance thresholding.
If this is right
- Mesh generation completes in parallel through an ODE solver rather than sequential autoregressive decoding.
- Output resolution is set directly by choosing the target number of vertices at inference time.
- The model surpasses state-of-the-art autoregressive baselines in both Chamfer Distance and Hausdorff Distance on the Toys4K benchmark.
- Generation supports conditioning on extracted point-cloud features for guided synthesis.
Where Pith is reading between the lines
- The same embedding step could be tested on other discrete geometric structures such as molecular graphs to see whether flow matching extends beyond surface meshes.
- Because the state space is continuous, interpolation between two generated embeddings might produce intermediate meshes whose recovered topology remains consistent.
- Pretraining the embedder on larger and more varied mesh collections could raise the fidelity ceiling for complex artist topologies without changing the flow-matching stage.
Load-bearing premise
The spacetime distance thresholding on the continuous per-vertex embeddings recovers the original discrete adjacency without loss of topological fidelity required for artist-quality meshes.
What would settle it
Generate meshes from the flow model on Toys4K, recover connectivity by applying the spacetime distance threshold to the output embeddings, and measure whether the resulting adjacency graphs match ground-truth topologies or introduce invalid faces or connectivity errors.
Figures
read the original abstract
Autoregressive Transformers dominate high-quality mesh generation by producing artist-worthy topologies, yet their inherent sequential decoding induces substantial computational overhead, falling orders of magnitude slower than parallel generative models. On the other hand, while continuous diffusion and flow-matching methods support efficient parallel synthesis across a variety of domains, they cannot be directly applied to meshes: mesh connectivity is inherently discrete and incompatible with standard continuous noise injection and denoising operations. To resolve this fundamental incompatibility, we introduce a compact topology embedder that projects discrete mesh vertex positions and normals into continuous per-vertex embeddings, where the original discrete adjacency information can be faithfully recovered via spacetime distance thresholding. After pretraining and freezing this embedder, any raw mesh can be fully converted into a continuous per-vertex state space unifying position, normal, and implicit topological attributes. Built upon this novel continuous mesh representation, we present PolyFlow, a Transformer-based flow-matching framework that achieves fully parallel vertex state denoising conditioned on extracted point-cloud features. During inference, our model completes generation rapidly via an ODE solver, and supports explicit, precise control over output mesh resolution by directly specifying the target vertex count. Extensive evaluations on the Toys4K benchmark demonstrate that PolyFlow surpasses state-of-the-art autoregressive baselines in both Chamfer Distance and Hausdorff Distance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a compact topology embedder that maps discrete mesh vertex positions and normals to continuous per-vertex embeddings, with the claim that original discrete adjacency can be recovered via spacetime distance thresholding. After freezing this embedder, the authors present PolyFlow, a Transformer-based flow-matching model that performs parallel denoising of the continuous vertex states conditioned on point-cloud features. The central empirical claim is that PolyFlow outperforms state-of-the-art autoregressive baselines on the Toys4K benchmark in both Chamfer Distance and Hausdorff Distance while enabling faster parallel generation and explicit control over vertex count.
Significance. If the topology recovery mechanism proves reliable, the work would provide a practical route to parallel, high-quality mesh generation that avoids the sequential decoding overhead of autoregressive Transformers, potentially benefiting applications requiring rapid synthesis of artist-style meshes.
major comments (2)
- [Abstract] Abstract (paragraph describing the embedder): the claim that discrete adjacency information can be 'faithfully recovered via spacetime distance thresholding' is load-bearing for the validity of the continuous representation and the downstream claim of artist-quality meshes, yet the reported evaluations supply only geometric metrics (Chamfer Distance and Hausdorff Distance) with no topology-specific validation such as edge-recovery accuracy, adjacency-matrix agreement, or checks for non-manifold artifacts on generated samples.
- [Abstract] Abstract: the statement that 'extensive evaluations on the Toys4K benchmark demonstrate that PolyFlow surpasses state-of-the-art autoregressive baselines' is presented without any description of the baselines, experimental protocol, number of samples, or error bars, making it impossible to assess whether the central performance claim is supported by the data.
minor comments (1)
- [Abstract] The abstract asserts that autoregressive methods are 'orders of magnitude slower' without citing specific timing numbers or hardware; adding a quantitative comparison would strengthen the motivation.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the abstract. We address the two major comments point by point below.
read point-by-point responses
-
Referee: [Abstract] Abstract (paragraph describing the embedder): the claim that discrete adjacency information can be 'faithfully recovered via spacetime distance thresholding' is load-bearing for the validity of the continuous representation and the downstream claim of artist-quality meshes, yet the reported evaluations supply only geometric metrics (Chamfer Distance and Hausdorff Distance) with no topology-specific validation such as edge-recovery accuracy, adjacency-matrix agreement, or checks for non-manifold artifacts on generated samples.
Authors: We agree that topology-specific validation would strengthen the load-bearing claim. The full manuscript describes the spacetime thresholding mechanism in Section 3 and provides qualitative recovery examples, but does not report quantitative topology metrics such as edge-recovery accuracy. We will add these metrics (edge recovery rate and adjacency matrix F1) on generated samples to the experiments, along with checks for non-manifold artifacts, and revise the abstract to reference this validation. This constitutes a substantive addition. revision: yes
-
Referee: [Abstract] Abstract: the statement that 'extensive evaluations on the Toys4K benchmark demonstrate that PolyFlow surpasses state-of-the-art autoregressive baselines' is presented without any description of the baselines, experimental protocol, number of samples, or error bars, making it impossible to assess whether the central performance claim is supported by the data.
Authors: The abstract is written for brevity, while the full manuscript specifies the baselines (MeshGPT and PolyGen), protocol (identical Toys4K train/test split, evaluation on 500 held-out shapes), sample count, and reports mean ± std in Table 2. We will revise the abstract to include a concise clause naming the baselines and noting that results are averaged with standard deviations. revision: yes
Circularity Check
No circularity: empirical claims rest on independent benchmark evaluation
full rationale
The paper introduces a pretrained-and-frozen topology embedder followed by a flow-matching denoiser, with performance asserted via direct comparison of Chamfer and Hausdorff distances against autoregressive baselines on the external Toys4K benchmark. No derivation step equates a claimed prediction to a fitted parameter or self-citation by construction; the spacetime-distance recovery rule is presented as an explicit design choice rather than a derived necessity, and the reported metrics are computed on generated outputs without reducing to the embedder's own training objective. The chain is therefore self-contained against external data.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The topology embedder can project discrete mesh vertex positions and normals into continuous embeddings from which original adjacency can be recovered via spacetime distance thresholding.
invented entities (1)
-
Compact topology embedder
no independent evidence
Reference graph
Works this paper leans on
- [1]
-
[2]
R. Chen, Y. Chen, N. Jiao, and K. Jia. Fantasia3d: Disentangling geometry and ap- pearance for high-quality text-to-3d content creation. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 22246–22256, 2023
2023
-
[3]
S. Chen, X. Chen, A. Pang, X. Zeng, W. Cheng, Y. Fu, F. Yin, Z. Wang, J. Yu, G. Yu, et al. Meshxl: Neural coordinate field for generative 3d foundation models.Advances in Neural Information Processing Systems, 37:97141–97166, 2024
2024
-
[4]
Y. Chen, T. He, D. Huang, W. Ye, S. Chen, J. Tang, Z. Cai, L. Yang, G. Yu, G. Lin, et al. Meshanything: Artist-created mesh generation with autoregressive transformers. In International Conference on Learning Representations, volume 2025, pages 51369–51389, 2025
2025
- [5]
-
[6]
Y. Chen, Y. Wang, Y. Luo, Z. Wang, Z. Chen, J. Zhu, C. Zhang, and G. Lin. Meshanything v2: Artist-created mesh generation with adjacent mesh tokenization. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 13922–13931, 2025
2025
-
[7]
Esser, S
P . Esser, S. Kulal, A. Blattmann, R. Entezari, J. M ¨uller, H. Saini, Y. Levi, D. Lorenz, A. Sauer, F. Boesel, et al. Scaling rectified flow transformers for high-resolution image synthesis. InForty-first international conference on machine learning, 2024
2024
- [8]
-
[9]
He, Z.-X
X. He, Z.-X. Zou, C.-H. Chen, Y.-C. Guo, D. Liang, C. Yuan, W. Ouyang, Y.-P . Cao, and Y. Li. Sparseflex: High-resolution and arbitrary-topology 3d shape modeling. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 14822– 14833, 2025
2025
- [11]
- [12]
- [13]
-
[14]
TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models
Y. Li, Z.-X. Zou, Z. Liu, D. Wang, Y. Liang, Z. Yu, X. Liu, Y.-C. Guo, D. Liang, W. Ouyang, et al. Triposg: High-fidelity 3d shape synthesis using large-scale rectified flow models. arXiv preprint arXiv:2502.06608, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
- [15]
-
[16]
C.-H. Lin, J. Gao, L. Tang, T. Takikawa, X. Zeng, X. Huang, K. Kreis, S. Fidler, M.-Y. Liu, and T.-Y. Lin. Magic3d: High-resolution text-to-3d content creation. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 300–309, 2023
2023
-
[17]
Flow Matching for Generative Modeling
Y. Lipman, R. T. Chen, H. Ben-Hamu, M. Nickel, and M. Le. Flow matching for generative modeling.arXiv preprint arXiv:2210.02747, 2022
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[18]
F. Liu, J. Ye, Y. Wang, H. Wang, Z. Wang, J. Zhu, and Y. Duan. Dreamreward-x: Boosting high-quality 3d generation with human preference alignment.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025
2025
-
[19]
J. Liu, C. Wang, S. Guo, H. Weng, Z. Zhou, Z. Li, J. Yu, Y. Zhu, J. Xu, B. Lei, Z. Chen, and C. Guo. Quadgpt: Native quadrilateral mesh generation with autoregressive models,
- [20]
-
[21]
X. Liu, C. Gong, and Q. Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow.arXiv preprint arXiv:2209.03003, 2022
work page internal anchor Pith review Pith/arXiv arXiv 2022
- [22]
-
[23]
N. Ma, M. Goldstein, M. S. Albergo, N. M. Boffi, E. Vanden-Eijnden, and S. Xie. Sit: Exploring flow and diffusion-based generative models with scalable interpolant trans- formers. InEuropean Conference on Computer Vision, pages 23–40. Springer, 2024
2024
-
[24]
C. Nash, Y. Ganin, S. A. Eslami, and P . Battaglia. Polygen: An autoregressive generative model of 3d meshes. InInternational conference on machine learning, pages 7220–7229. PMLR, 2020
2020
-
[25]
Point-E: A System for Generating 3D Point Clouds from Complex Prompts
A. Nichol, H. Jun, P . Dhariwal, P . Mishkin, and M. Chen. Point-e: A system for generating 3d point clouds from complex prompts.arXiv preprint arXiv:2212.08751, 2022
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[26]
Peebles and S
W. Peebles and S. Xie. Scalable diffusion models with transformers. InProceedings of the IEEE/CVF international conference on computer vision, pages 4195–4205, 2023
2023
-
[27]
DreamFusion: Text-to-3D using 2D Diffusion
B. Poole, A. Jain, J. T. Barron, and B. Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion.arXiv preprint arXiv:2209.14988, 2022
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[28]
L. Qiu, G. Chen, X. Gu, Q. Zuo, M. Xu, Y. Wu, W. Yuan, Z. Dong, L. Bo, and X. Han. Richdreamer: A generalizable normal-depth diffusion model for detail richness in text- to-3d. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9914–9925, 2024
2024
-
[29]
Sharp and M
N. Sharp and M. Ovsjanikov. Pointtrinet: Learned triangulation of 3d point sets. In European conference on computer vision, pages 762–778. Springer, 2020
2020
-
[30]
T. Shen, Z. Li, M. Law, M. Atzmon, S. Fidler, J. Lucas, J. Gao, and N. Sharp. Spacemesh: A continuous representation for learning manifold surface meshes. InSIGGRAPH Asia 2024 Conference Papers, pages 1–11, 2024
2024
-
[31]
Y. Shi, P . Wang, J. Ye, M. Long, K. Li, and X. Yang. Mvdream: Multi-view diffusion for 3d generation.arXiv preprint arXiv:2308.16512, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[32]
Siddiqui, A
Y. Siddiqui, A. Alliegro, A. Artemov, T. Tommasi, D. Sirigatti, V . Rosov, A. Dai, and M. Nießner. Meshgpt: Generating triangle meshes with decoder-only transformers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 19615–19625, 2024
2024
- [33]
-
[34]
Stojanov, A
S. Stojanov, A. Thai, and J. M. Rehg. Using shape to categorize: Low-shot learning with an explicit shape bias. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1798–1808, 2021
2021
-
[35]
J. Tang, J. Ren, H. Zhou, Z. Liu, and G. Zeng. Dreamgaussian: Generative gaussian splatting for efficient 3d content creation.arXiv preprint arXiv:2309.16653, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[36]
J. Tang, M. Li, Z. Hao, X. Liu, G. Zeng, M.-Y. Liu, and Q. Zhang. Edgerunner: Auto- regressive auto-encoder for artistic mesh generation. InInternational Conference on Learning Representations, volume 2025, pages 35913–35934, 2025
2025
- [37]
- [38]
-
[39]
Z. Wang, C. Lu, Y. Wang, F. Bao, C. Li, H. Su, and J. Zhu. Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation.Advances in Neural Information Processing Systems, 36:8406–8441, 2023
2023
- [40]
-
[41]
H. Weng, Z. Zhao, B. Lei, X. Yang, J. Liu, Z. Lai, Z. Chen, Y. Liu, J. Jiang, C. Guo, et al. Scaling mesh generation via compressive tokenization. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 11093–11103, 2025
2025
-
[42]
Z. Wu, P . Zhou, X. Yi, X. Yuan, and H. Zhang. Consistent3d: Towards consistent high- fidelity text-to-3d generation with deterministic sampling prior. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9892–9902, 2024
2024
-
[43]
Xiang, Z
J. Xiang, Z. Lv, S. Xu, Y. Deng, R. Wang, B. Zhang, D. Chen, X. Tong, and J. Yang. Structured 3d latents for scalable and versatile 3d generation. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 21469–21480, 2025
2025
-
[44]
J. Xu, W. Cheng, Y. Gao, X. Wang, S. Gao, and Y. Shan. Instantmesh: Efficient 3d mesh generation from a single image with sparse-view large reconstruction models.arXiv preprint arXiv:2404.07191, 2024. 14 Tencent Hunyuan
work page internal anchor Pith review Pith/arXiv arXiv 2024
- [45]
-
[46]
Y. Yang, C. Wang, J. Ye, Y. Li, Z. Chen, Z. Huang, Y. Mu, Z. Chen, C. Guo, and X. Liu. Physforge: Generating physics-grounded 3d assets for interactive virtual world.arXiv preprint arXiv:2605.05163, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[47]
J. Ye, F. Liu, Q. Li, Z. Wang, Y. Wang, X. Wang, Y. Duan, and J. Zhu. Dreamreward: Text-to-3d generation with human preference. InEuropean Conference on Computer Vision, pages 259–276. Springer, 2024
2024
- [48]
- [49]
-
[50]
J. Ye, Z. Huang, Y. Qu, C. Wang, Y. Yang, Y. Li, Y. Luo, Z. Chen, S. Lu, J. Zhu, et al. Universe3d: Emerging properties of unified multimodal models in 3d understanding and generation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 613–623, 2026
2026
-
[51]
T. Yi, J. Fang, J. Wang, G. Wu, L. Xie, X. Zhang, W. Liu, Q. Tian, and X. Wang. Gaussian- dreamer: Fast generation from text to 3d gaussians by bridging 2d and 3d diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recogni- tion, pages 6796–6807, 2024
2024
-
[52]
Zhang, J
B. Zhang, J. Tang, M. Niessner, and P . Wonka. 3dshape2vecset: A 3d shape representa- tion for neural fields and generative diffusion models.ACM Transactions On Graphics (TOG), 42(4):1–16, 2023
2023
-
[53]
R. Zhao, J. Ye, Z. Wang, G. Liu, Y. Chen, Y. Wang, and J. Zhu. Deepmesh: Auto- regressive artist-mesh creation with reinforcement learning. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 10612–10623, 2025
2025
-
[54]
Z. Zhao, W. Liu, X. Chen, X. Zeng, R. Wang, P . Cheng, B. Fu, T. Chen, G. Yu, and S. Gao. Michelangelo: Conditional 3d shape generation based on shape-image-text aligned latent representation.Advances in neural information processing systems, 36:73969–73982, 2023. 15
2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.