Learning to Build Shapes by Extrusion
Pith reviewed 2026-05-16 09:44 UTC · model grok-4.3
The pith
Extrusion sequences let language models generate and edit manifold 3D meshes with any number of faces.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By decomposing a library of quadrilateral meshes into non-self-intersecting face loops and fine-tuning a language model to reconstruct each mesh as a sequence of extrusions, the method produces meshes that are manifold by construction, supports arbitrary face counts, reconstructs given shapes, synthesizes new ones, and adds new features to existing meshes.
What carries the argument
Text Encoded Extrusions (TEE), a sequence representation of mesh construction as ordered face extrusions.
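As a rough illustration of the kind of operation TEE sequences are built from, the sketch below extrudes one quad face of a closed mesh. It is a generic single-face extrusion of the sort found in modelling tools, not the paper's text encoding or its face-loop operation; `unit_cube`, `extrude_face`, and the tuple-based mesh layout are assumptions made for this example.

```python
def unit_cube():
    """Closed quad mesh with outward-facing, consistently wound faces."""
    verts = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
             (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]
    faces = [(0, 3, 2, 1), (4, 5, 6, 7), (0, 1, 5, 4),
             (1, 2, 6, 5), (2, 3, 7, 6), (3, 0, 4, 7)]
    return verts, faces


def extrude_face(verts, faces, face_index, offset):
    """Replace one quad by four side quads and a cap moved by `offset`.

    Hypothetical helper: returns new lists and leaves the inputs untouched.
    """
    old = faces[face_index]
    base = len(verts)
    # Duplicate the four corners of the face and translate the copies.
    moved = [tuple(c + d for c, d in zip(verts[i], offset)) for i in old]
    new_ids = tuple(range(base, base + 4))
    # Each side quad reuses the old edge in its original direction, so the
    # winding stays consistent with the neighbouring faces.
    sides = [(old[k], old[(k + 1) % 4], new_ids[(k + 1) % 4], new_ids[k])
             for k in range(4)]
    new_faces = faces[:face_index] + faces[face_index + 1:] + sides + [new_ids]
    return verts + moved, new_faces


verts, faces = unit_cube()
verts2, faces2 = extrude_face(verts, faces, 1, (0.0, 0.0, 1.0))
print(len(verts2), len(faces2))  # 12 10
```

Each call removes one face and adds five, so the face count grows by four per step, and every edge still borders exactly two faces provided that was true of the input. This covers only the combinatorial side; a badly chosen offset can still make the surface intersect itself geometrically.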
Load-bearing premise
That extrusion sequences learned from existing quadrilateral meshes will produce valid manifold meshes when applied to novel or edited shapes without further constraints or post-processing.
What would settle it
A single generated or edited mesh containing a non-manifold vertex, edge, or self-intersecting face after the extrusion sequence is applied.
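A check of this kind could be run over generated meshes along the following lines. The sketch is a minimal, purely combinatorial test, assuming faces are given as tuples of vertex indices with consistent winding; `is_closed_manifold` and `CUBE_FACES` are names invented here, and geometric self-intersection would need a separate test that is not shown.

```python
from collections import defaultdict

CUBE_FACES = [(0, 3, 2, 1), (4, 5, 6, 7), (0, 1, 5, 4),
              (1, 2, 6, 5), (2, 3, 7, 6), (3, 0, 4, 7)]


def is_closed_manifold(faces):
    """True if every edge and every vertex of a closed mesh is manifold."""
    directed = defaultdict(int)
    for face in faces:
        n = len(face)
        for k in range(n):
            a, b = face[k], face[(k + 1) % n]
            if a == b:
                return False  # degenerate edge
            directed[(a, b)] += 1
    # Edge test: each directed edge occurs once and is matched by exactly one
    # opposite edge, i.e. two faces per edge with consistent orientation.
    for (a, b), count in directed.items():
        if count != 1 or directed.get((b, a), 0) != 1:
            return False
    # Vertex test: walking face to face around a vertex must visit all of
    # its incident faces in a single cycle (no "bow-tie" vertices).
    step = defaultdict(dict)
    for face in faces:
        n = len(face)
        for k in range(n):
            step[face[k]][face[k - 1]] = face[(k + 1) % n]
    for ring in step.values():
        start = next(iter(ring))
        cur, seen = ring[start], 1
        while cur != start:
            cur, seen = ring[cur], seen + 1
        if seen != len(ring):
            return False
    return True


print(is_closed_manifold(CUBE_FACES))       # True
print(is_closed_manifold(CUBE_FACES[:-1]))  # False: one face missing
```

Two cubes that touch at a single shared vertex pass the edge test but fail the vertex test, which is why both passes are needed.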
Original abstract
We introduce Text Encoded Extrusions (TEE), a text-based representation that expresses mesh construction as sequences of face extrusions rather than polygon lists, and a method for generating 3D meshes from TEE using a large language model (LLM). By learning extrusion sequences that assemble a mesh, similar to the way artists create meshes, our approach naturally supports arbitrary output face counts and produces manifold meshes by design, in contrast to recent transformer-based mesh generative models. The learnt extrusion sequences can also be applied to existing meshes, enabling editing in addition to generation. To train our model, we decompose a library of quadrilateral meshes with non-self-intersecting face loops into constituent loops, which can be viewed as their building blocks, and fine-tune an LLM on the steps for reassembling the quadrilateral meshes by performing a sequence of extrusions. We demonstrate that our representation enables reconstruction, novel shape synthesis, and the addition of new features to existing meshes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Text Encoded Extrusions (TEE), a text-based representation that encodes quadrilateral mesh construction as ordered sequences of face-loop extrusions. An LLM is fine-tuned on sequences obtained by decomposing a library of quad meshes into non-self-intersecting loops and then reassembling them via extrusion steps. The central claims are that this representation supports arbitrary output face counts, produces manifold meshes by design (in contrast to mesh generative transformers), and enables both novel synthesis and editing of existing meshes by applying learned extrusion sequences.
Significance. If the manifoldness and generalization claims hold, the approach would provide a controllable, topologically consistent alternative to direct mesh generation methods by mimicking artist-like sequential construction. The explicit decomposition into extrusion steps and the ability to apply sequences to existing meshes are potentially useful strengths for editing tasks.
major comments (2)
- [Abstract and §3] Abstract and §3 (method): the claim that the method 'produces manifold meshes by design' is load-bearing for the central contribution, yet the manuscript provides no validity filter, constrained decoding, or rejection sampling at inference. Because the model is a fine-tuned LLM using next-token sampling, generated sequences can violate the topological constraints (non-self-intersecting loops, consistent winding, closed manifold boundaries) used in the training decomposition, reducing the guarantee to an unverified distributional hope.
- [Abstract and §4] Abstract and §4 (experiments): no quantitative results, baselines, error metrics, or ablation studies are reported. Without measured reconstruction accuracy, generalization error on novel prompts, or failure rates on invalid sequence generation, it is impossible to evaluate whether the learned distribution stays inside the valid set or whether the editing capability works reliably.
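The validity filter and failure-rate measurement raised in the two major comments could take the form of a rejection-sampling wrapper around the generator. The sketch below is a minimal illustration under stated assumptions: `sample_fn` stands in for whatever produces a candidate mesh from the model, `is_valid` for whatever validity test is chosen, and the tetrahedron data is a toy input; none of these come from the manuscript.

```python
from collections import Counter


def rejection_sample(sample_fn, is_valid, max_tries=16):
    """Draw candidates until one passes `is_valid` or the budget runs out.

    Returns (candidate_or_None, number_of_rejected_candidates).
    """
    rejected = 0
    for _ in range(max_tries):
        candidate = sample_fn()
        if is_valid(candidate):
            return candidate, rejected
        rejected += 1
    return None, rejected


def every_edge_shared_twice(faces):
    """Toy validity test: each undirected edge borders exactly two faces."""
    edges = Counter()
    for face in faces:
        for k in range(len(face)):
            edges[frozenset((face[k], face[(k + 1) % len(face)]))] += 1
    return all(count == 2 for count in edges.values())


# Toy stand-in for a generator: first an open mesh, then a closed one.
closed = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
candidates = iter([closed[:-1], closed])
mesh, rejected = rejection_sample(lambda: next(candidates),
                                  every_edge_shared_twice)
print(mesh == closed, rejected)  # True 1
```

The rejected count, accumulated over many prompts, gives the empirical invalid-sequence rate the report asks for. Rejection sampling filters outputs after the fact; it does not turn the distributional tendency into a guarantee by construction, which would require constrained decoding.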
minor comments (2)
- [§2] Notation for TEE sequences should be formalized with a precise grammar or BNF in §2 to clarify how vertex identifications and loop closures are encoded in text.
- [§3.1] The decomposition algorithm in §3.1 should specify the exact criteria used to select non-self-intersecting face loops and how ties are broken when multiple valid decompositions exist.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We agree that the manifoldness claim requires clarification and that quantitative evaluations are needed to substantiate the claims. We will revise the manuscript accordingly, as detailed in the point-by-point responses below.
- Referee: [Abstract and §3] Abstract and §3 (method): the claim that the method 'produces manifold meshes by design' is load-bearing for the central contribution, yet the manuscript provides no validity filter, constrained decoding, or rejection sampling at inference. Because the model is a fine-tuned LLM using next-token sampling, generated sequences can violate the topological constraints (non-self-intersecting loops, consistent winding, closed manifold boundaries) used in the training decomposition, reducing the guarantee to an unverified distributional hope.
  Authors: We acknowledge that the manuscript does not include constrained decoding, validity filters, or rejection sampling during inference. The 'by design' phrasing in the abstract and §3 refers to the fact that every training sequence is derived from valid, non-self-intersecting face-loop decompositions of manifold quad meshes, and that a correct extrusion step on a manifold base preserves manifoldness. However, we agree that next-token sampling from the fine-tuned LLM provides only a distributional tendency toward validity rather than a hard guarantee. We will revise the abstract and method section to remove the unqualified 'by design' claim, explicitly state the probabilistic nature of the guarantee, and add a new subsection reporting the empirical rate of valid (manifold, non-self-intersecting) sequences generated on held-out prompts.
  Revision: partial
- Referee: [Abstract and §4] Abstract and §4 (experiments): no quantitative results, baselines, error metrics, or ablation studies are reported. Without measured reconstruction accuracy, generalization error on novel prompts, or failure rates on invalid sequence generation, it is impossible to evaluate whether the learned distribution stays inside the valid set or whether the editing capability works reliably.
  Authors: The current manuscript presents only qualitative demonstrations of reconstruction, novel synthesis, and editing. We agree that this is insufficient to evaluate the claims. In the revised version we will add a quantitative evaluation section that reports: (i) reconstruction accuracy via Chamfer distance and normal consistency on a held-out test set of quad meshes, (ii) the percentage of generated sequences that produce manifold, non-self-intersecting meshes, (iii) success rates for the editing task when applying learned sequences to new base meshes, and (iv) a simple baseline comparison against a direct mesh-generation transformer trained on the same data. We will also include an ablation on training-set size to show how validity rates scale.
  Revision: yes
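For reference, the Chamfer distance named in item (i) can be sketched as below. Several variants are in use (squared or unsquared distances, sums or means); this version, the symmetric mean of squared nearest-neighbour distances between two point sets, is one common choice and not necessarily the one the authors would adopt. The function name and the brute-force search are assumptions made for brevity.

```python
def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two non-empty point sets.

    Uses mean squared nearest-neighbour distance in each direction.
    Brute force, O(len(a) * len(b)); fine for small illustrative inputs.
    """
    def one_way(src, dst):
        total = 0.0
        for p in src:
            # Squared distance from p to its nearest neighbour in dst.
            total += min(sum((x - y) ** 2 for x, y in zip(p, q)) for q in dst)
        return total / len(src)

    return one_way(points_a, points_b) + one_way(points_b, points_a)


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0)]
print(chamfer_distance(a, a))  # 0.0
print(chamfer_distance(a, b))  # 0.5
```

In a mesh evaluation the two point sets would typically be samples drawn from the surfaces of the reconstructed and ground-truth meshes, so the score depends on the sampling density as well as on the shapes.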
Circularity Check
No circularity: manifold claim follows from representation and training data, not self-definition or fitted reduction
The paper defines TEE as sequences of face-loop extrusions derived from explicit decomposition of existing quad meshes, then fine-tunes an LLM to reproduce those sequences. The 'manifold by design' statement is a direct consequence of the representation: any sequence obeying the same topological rules used in decomposition yields a manifold mesh. No equations, parameters, or self-citations reduce the output to the input by construction; the central claim rests on observable training data and the inductive bias of next-token prediction on valid examples. The absence of an explicit validity filter at inference is a correctness or robustness concern, not a circularity in the derivation chain.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: quadrilateral meshes with non-self-intersecting face loops can be decomposed into constituent loops that serve as building blocks for extrusion sequences.
invented entities (1)
- Text Encoded Extrusions (TEE): no independent evidence