pith. machine review for the scientific record.

arxiv: 2603.04337 · v2 · submitted 2026-03-04 · 💻 cs.CV · cs.CL

Pointer-CAD: Unifying B-Rep and Command Sequences via Pointer-based Edges & Faces Selection

Pith reviewed 2026-05-15 16:23 UTC · model grok-4.3

keywords: CAD generation, LLM-based modeling, B-rep, pointer selection, command sequences, quantization error, geometric entities, CAD dataset

The pith

Pointer-CAD unifies command sequences with B-rep by letting LLMs predict pointers to select specific edges and faces for operations like chamfer and fillet.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Pointer-CAD addresses the inability of pure command sequence representations to handle entity selections needed for complex operations such as chamfer or fillet, along with topological errors from discretizing continuous variables during sketch and extrude steps. It conditions each generation step on both the textual description and the B-rep structure built from prior steps, with the LLM predicting a pointer that picks the most feature-consistent geometric entity from the available set. This explicit pointer mechanism incorporates B-rep geometric information directly into the sequential modeling process and thereby reduces quantization error. The method is trained on a newly constructed dataset of approximately 575K CAD models equipped with expert-level natural language descriptions. Experiments indicate that the approach supports complex geometric structures while driving segmentation error to an extremely low level, delivering clear gains over earlier command-sequence baselines.

Core claim

Pointer-CAD decomposes CAD model generation into sequential steps, with each command conditioned on the textual description and on the B-rep generated by the previous steps. Whenever an operation requires selecting a geometric entity, the LLM predicts a pointer that identifies the most feature-consistent candidate among the available B-rep edges or faces. This unifies boundary-representation geometry with command sequences and mitigates the topological inaccuracies introduced by quantization.

What carries the argument

Pointer-based edges and faces selection, which identifies the most feature-consistent candidate from the current B-rep entity set for each required selection operation.
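
A minimal sketch of what such a selection could look like. The Figure 12 excerpt further down this page mentions matching "via cosine similarity" against 128-dimensional embeddings of candidate faces and edges. The sketch assumes the highest-similarity candidate is the one selected, which the truncated excerpt does not confirm. The function names, entity names, and the toy 3-dimensional vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_entity(pointer_vec, candidates):
    """Return (entity_id, score) for the candidate most similar to the pointer.

    `candidates` maps a hypothetical entity id to its embedding.
    Assumption: the pointer picks the arg-max of cosine similarity.
    """
    if not candidates:
        raise ValueError("no candidate entities to select from")
    best = max(candidates, key=lambda k: cosine(pointer_vec, candidates[k]))
    return best, cosine(pointer_vec, candidates[best])


# Toy 3-dimensional embeddings; the paper excerpt mentions 128 dimensions.
candidates = {
    "edge_0": [1.0, 0.0, 0.0],
    "edge_1": [0.0, 1.0, 0.0],
    "face_0": [0.7, 0.7, 0.0],
}
pointer = [0.9, 0.1, 0.0]
name, score = select_entity(pointer, candidates)
print(name, round(score, 3))
```

Because the candidate set is rebuilt from the current B-rep at every step, a selection of this kind always resolves to an entity that actually exists in the model.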

If this is right

  • Enables complex editing operations such as chamfer and fillet inside LLM-driven command sequence generation.
  • Reduces segmentation error to an extremely low level relative to prior command-sequence methods.
  • Mitigates topological inaccuracies that arise from quantization of continuous variables in sketch and extrude steps.
  • Supports reliable generation of complex geometric structures in CAD models.
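
The quantization point in the list above can be illustrated with a toy example. This is a sketch under stated assumptions, not the paper's setup: the 8-bit uniform grid on [-1, 1], the `quantize` helper, and the sample coordinate are all hypothetical. It shows that re-emitting an existing coordinate through a quantized value token can leave a small gap, while referring to the existing entity reuses its exact value.

```python
def quantize(x, bits=8, lo=-1.0, hi=1.0):
    """Snap x to the nearest of 2**bits uniformly spaced levels on [lo, hi]."""
    levels = (1 << bits) - 1              # number of intervals between levels
    step = (hi - lo) / levels
    idx = max(0, min(levels, round((x - lo) / step)))
    return lo + idx * step


# Coordinate of a vertex that already exists in the B-rep (illustrative value).
existing_vertex_x = 0.123456

# Command-sequence style: the coordinate is re-emitted as a quantized token.
requantized_x = quantize(existing_vertex_x)
gap_with_quantized_coordinate = abs(requantized_x - existing_vertex_x)

# Pointer style: the step refers to the existing entity, so the value is reused.
pointed_x = existing_vertex_x
gap_with_pointer = abs(pointed_x - existing_vertex_x)

print(gap_with_quantized_coordinate, gap_with_pointer)
```

A gap of this kind is what can turn two edges that should meet into a sliver or an open profile, which is the topological failure the pointer is meant to avoid.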

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The pointer mechanism could be adapted to other sequential 3D modeling pipelines that need precise entity referencing.
  • The data annotation pipeline offers a route to scale annotated CAD datasets beyond the current 575K examples.
  • Iterative refinement loops in AI CAD tools may become more stable once selection errors are decoupled from quantization.
  • Pointer accuracy on deliberately ambiguous feature sets would expose remaining limits in current LLM selection.

Load-bearing premise

The LLM can reliably predict pointers to the intended geometric entities based on feature consistency without introducing selection ambiguities or new error modes.

What would settle it

Measure whether predicted pointers match ground-truth entity selections on B-rep models containing multiple geometrically similar edges or faces when performing fillet or chamfer operations.
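
A minimal sketch of how such a measurement could be tabulated, assuming each selection is logged with the predicted entity, the ground-truth entity, and a flag marking whether geometrically similar alternatives were present. The record layout and the sample records are hypothetical and are not results from the paper.

```python
from collections import defaultdict


def pointer_accuracy(records):
    """Accuracy of pointer predictions, split by whether the case is ambiguous.

    `records` is an iterable of (predicted_id, true_id, is_ambiguous) tuples.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for predicted_id, true_id, is_ambiguous in records:
        bucket = "ambiguous" if is_ambiguous else "distinct"
        totals[bucket] += 1
        hits[bucket] += int(predicted_id == true_id)
    return {bucket: hits[bucket] / totals[bucket] for bucket in totals}


# Hypothetical log of four fillet/chamfer selections.
records = [
    ("edge_3", "edge_3", False),
    ("face_1", "face_1", False),
    ("edge_7", "edge_7", True),
    ("edge_8", "edge_9", True),   # wrong pick among similar edges
]
accuracy = pointer_accuracy(records)
print(accuracy)
```

Reporting the two buckets separately is the point of the design: an aggregate accuracy could hide a weak score on exactly the ambiguous cases the test is meant to probe.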

Figures

Figures reproduced from arXiv: 2603.04337 by Chenyu Wang, Dacheng Qi, Jingwei Xu, Shenghua Gao, Tianzhe Chu, Wen Liu, Wenrui Ding, Yi Ma, Zibo Zhao.

Figure 1
Illustration of the strength of our proposed pointer-based command sequence compared to the previous command sequence-based CAD representation. Command sequences suffer from the inability to refer to specific edges or faces, and discretization-induced quantization errors. In contrast, Pointer-CAD leverages edge pointers to directly refer to B-rep entities, enabling precise operations such as sketch snappin…
Figure 2
Pointer-CAD Pipeline. At each generation step, the full user prompt is tokenized, while the B-rep is updated with all geometry generated so far. A multimodal fusion module combines the textual prompt with the evolving B-rep, which is further encoded via a graph neural network over its faces and edges. The fused representation is then processed by a large language model to predict the vector for the current…
Figure 3
Dataset construction pipeline. Raw JSONs are converted into a minimal format containing only annotation-relevant elements. Sketch planes and models are rendered, and Qwen2.5-VL generates textual descriptions for integration into the JSON. Finally, Qwen2.5 produces step-by-step instructions, with dimension parameters wrapped in special tags for future data augmentation.
Figure 4
Qualitative performance comparison on Recap-DeepCAD dataset. Our method consistently produces accurate and faithful geometry aligned with the ground truth, while competing methods often miss details or collapse entirely. Notably, Pointer-CAD achieves superior results among LLM-based methods despite a significantly smaller size than CADmium. … we use Qwen2.5-0.5B [56] as the backbone LLM for Pointer-CAD.
Figure 6
Showcase of complex CAD model generation. … 5.4. Visualization of Complex Cases: To demonstrate the capabilities and functional boundaries of our method, we visualize a set of generated complex CAD cases in …
Figure 5
Qualitative performance comparison on Recap-OmniCAD+ dataset. Our method accurately recovers detailed structures that closely match the ground truth for complex CAD models involving chamfer or fillet operations. Conversely, competing methods often miss fine-grained features or fail entirely. … 5.3. Ablation on the GNN component: To verify the efficiency of the GNN component, we conduct a comparison in …
Figure 7
Prompt comparison. Recap-DeepCAD dataset includes dimensional values with explicit units, whereas Text2CAD dataset uses normalized, unit-free geometric parameters. … 9.6. Application of Click Interaction Editing: Since our proposed pointer-based command sequence allows entity selection at each step, we extend the model with token concatenation to incorporate user-interactive selections alongside t…
Figure 8
Quantization Error. We directly measure quantization error by computing the median Chamfer Distance between each representation before and after quantization, where Pointer-CAD exhibits substantially smaller error than Text2CAD. Extracted panel text: "Add a cylinder on the selected face"; "Apply a fillet to the selected edges."; "Cut a cylinder from the selected face".
Figure 9
Illustration of our interactive editing functionality. Users can directly click on a face or edge of the CAD model and provide a text prompt to specify the desired operation. … 10.1. Specific Vector Translation Rules: Each token is classified as one of three types: Label Token, Value Token, or Pointer. To simplify the model architecture, we assign non-overlapping integer ranges to lab…
Figure 10
… Figure 10c, the final sketch coordinate system UVW is obtained by applying a counterclockwise in-plane rotation to U′V′W′ about the W-axis. An optional scaling factor may also be applied to mitigate quantization errors. Panels: (a) Face selection; (b) Origin definition; (c) Rotation definition.
Figure 11
A non-manifold topology leads to multiple valid inter…
Figure 12
Representative samples from the Recap-OmniCAD+ dataset. The figure displays a range of models with varying complexity, from simpler parts with basic features to intricate components incorporating numerous fillets, chamfers, and complex sketches. … (via cosine similarity) against the 128-dimensional embeddings of all candidate geometric entities (faces and edges) generated by the B-rep encoder. The entity…
Figure 13
Distribution of modeling operations across datasets. The figure illustrates the total count of each modeling operation type for the DeepCAD, OmniCAD, and Recap-OmniCAD+ datasets. Extracted plot text: axis "Solid Modeling Operation" with bins 1 to ≥11; axis "Count" from 0.0 to 1.8×10^5; series DeepCAD, OmniCAD, Recap-OmniCAD+.
Figure 14
Distribution of modeling steps per model. The figure compares the number of solid modeling operations required per model across the datasets.
Figure 15
Prompt for visual description. This prompt is used with the Qwen2.5-vl-72B model to generate a description of the CAD model's visual appearance. View labels: Right, Left, Back, Front, Top, Bottom. Prompt excerpt: "You are given six orthographic views of the same 3D object in the following fixed order (each image also has a label at the bottom-right corner indicating its view): 1. Right view 2. Left view 3. Back view 4. Front view 5. Top vie…"
Figure 16
Prompt for sketch plane description. This prompt guides the model to describe the relative position of the sketch plane, with placeholders for the normal vector and facing direction being dynamically replaced.
Figure 17
Examples of the minimal JSON structure. This figure illustrates two structured 'minimal JSONs' format, which integrates visual annotations and key modeling parameters for the language model.
Figure 18
Prompt for generating the final natural language description. This prompt is used with the 'minimal JSON' to generate the final natural language description of the modeling process.
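
The Figure 10 excerpt above describes the final sketch frame UVW as a counterclockwise in-plane rotation of U′V′W′ about the W-axis. A minimal sketch of that rotation follows, assuming a right-handed frame in which W = U′ × V′ and an angle θ in radians. The function name is hypothetical, and the optional scaling factor mentioned in the excerpt is not modeled.

```python
import math


def rotate_in_plane(u_prime, v_prime, theta):
    """Rotate the in-plane axes counterclockwise by theta about W.

    Assumes (u_prime, v_prime, W) is right-handed with W = u_prime x v_prime.
    W itself is unchanged, so only the two in-plane axes are returned.
    """
    c, s = math.cos(theta), math.sin(theta)
    u = tuple(c * a + s * b for a, b in zip(u_prime, v_prime))
    v = tuple(-s * a + c * b for a, b in zip(u_prime, v_prime))
    return u, v


# Start from the world X and Y axes and rotate by 90 degrees about Z.
u, v = rotate_in_plane((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), math.pi / 2)
print(u, v)
```

With these inputs the rotated U lands on the original V′ direction and the rotated V on the negative U′ direction, up to floating-point rounding.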
Original abstract

Constructing computer-aided design (CAD) models is labor-intensive but essential for engineering and manufacturing. Recent advances in Large Language Models (LLMs) have inspired the LLM-based CAD generation by representing CAD as command sequences. But these methods struggle in practical scenarios because command sequence representation does not support entity selection (e.g. faces or edges), limiting its ability to support complex editing operations such as chamfer or fillet. Further, the discretization of a continuous variable during sketch and extrude operations may result in topological errors. To address these limitations, we present Pointer-CAD, a novel LLM-based CAD generation framework that leverages a pointer-based command sequence representation to explicitly incorporate the geometric information of B-rep models into sequential modeling. In particular, Pointer-CAD decomposes CAD model generation into steps, conditioning the generation of each subsequent step on both the textual description and the B-rep generated from previous steps. Whenever an operation requires the selection of a specific geometric entity, the LLM predicts a Pointer that selects the most feature-consistent candidate from the available set. Such a selection operation also reduces the quantization error in the command sequence-based representation. To support the training of Pointer-CAD, we develop a data annotation pipeline that produces expert-level natural language descriptions and apply it to build a dataset of approximately 575K CAD models. Extensive experimental results demonstrate that Pointer-CAD effectively supports the generation of complex geometric structures and reduces segmentation error to an extremely low level, achieving a significant improvement over prior command sequence methods, thereby significantly mitigating the topological inaccuracies introduced by quantization error.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces Pointer-CAD, an LLM-based CAD generation framework that unifies B-Rep geometry with command sequences through pointer-based selection of edges and faces. It decomposes generation into steps conditioned on accumulating B-Rep and textual descriptions, using LLM-predicted pointers to select feature-consistent entities for operations like chamfer or fillet. This is claimed to support complex structures while reducing segmentation error and mitigating quantization-induced topological inaccuracies. A data annotation pipeline yields a dataset of ~575K models with expert natural language descriptions, and experiments are said to demonstrate significant improvements over prior command-sequence methods.

Significance. If the pointer mechanism proves reliable, the work could meaningfully advance LLM-driven CAD by enabling entity-aware editing and lowering topological errors from discretization, addressing key practical limitations in sequential representations.

major comments (2)
  1. [Abstract] The central claim that Pointer-CAD 'reduces segmentation error to an extremely low level' and achieves 'significant improvement' over prior methods is unsupported by any numerical metrics, baseline comparisons, or ablation results; without these, the asserted mitigation of quantization-induced topological inaccuracies cannot be evaluated.
  2. [Abstract / Method description] The pointer prediction step assumes reliable selection of the 'most feature-consistent candidate' from available B-Rep entities, yet no pointer-level accuracy metrics, analysis of ambiguous cases (geometrically similar edges/faces), or isolation of selection errors from command errors are provided; a single mis-predicted pointer would propagate invalid topology through the accumulating B-Rep conditioning, directly undermining the headline improvement.
minor comments (1)
  1. [Abstract] The dataset size is stated as 'approximately 575K'; an exact count and split details would aid reproducibility.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and detailed feedback. We address the major comments point by point below, indicating where revisions will be made to strengthen the manuscript.

  1. Referee: [Abstract] The central claim that Pointer-CAD 'reduces segmentation error to an extremely low level' and achieves 'significant improvement' over prior methods is unsupported by any numerical metrics, baseline comparisons, or ablation results; without these, the asserted mitigation of quantization-induced topological inaccuracies cannot be evaluated.

    Authors: The body of the manuscript (Section 4) contains quantitative experimental results, including baseline comparisons against prior command-sequence methods and ablations that demonstrate reductions in segmentation error and mitigation of topological inaccuracies due to quantization. We agree that the abstract should make these results explicit rather than qualitative. We will revise the abstract to incorporate key numerical metrics, such as the reported segmentation error rates and relative improvements over baselines. revision: yes

  2. Referee: [Abstract / Method description] The pointer prediction step assumes reliable selection of the 'most feature-consistent candidate' from available B-Rep entities, yet no pointer-level accuracy metrics, analysis of ambiguous cases (geometrically similar edges/faces), or isolation of selection errors from command errors are provided; a single mis-predicted pointer would propagate invalid topology through the accumulating B-Rep conditioning, directly undermining the headline improvement.

    Authors: We acknowledge the concern regarding error propagation from pointer mispredictions. Our current evaluation reports end-to-end model quality, which incorporates the effects of both command generation and pointer selection. The original manuscript does not include isolated pointer-level accuracy metrics or a dedicated analysis of ambiguous cases. In revision we will add a discussion of potential propagation effects and error isolation where possible using existing data, but a full quantitative breakdown of pointer accuracy on ambiguous entities would require new experiments. revision: partial

standing simulated objections not resolved
  • Isolated pointer-level accuracy metrics and dedicated analysis of ambiguous entity selection cases (geometrically similar edges/faces), as these were not computed in the original experiments.
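
The propagation concern raised in the exchange above can be made concrete with a back-of-the-envelope calculation. This sketch assumes, as a simplification that neither the paper nor the referee states, that every pointer prediction in a model is independently correct with the same probability p. Under that assumption a model with k selections has all of them right with probability p^k. The function name and the sample values are hypothetical.

```python
def all_pointers_correct(p, k):
    """Probability that k independent pointer predictions are all correct.

    Assumption: each prediction is correct with probability p, independently.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    return p ** k


# Illustrative values only: 99% per-pointer accuracy, growing model complexity.
for k in (1, 5, 10, 20):
    print(k, all_pointers_correct(0.99, k))
```

Even a high per-pointer accuracy compounds downward as the number of selections grows, which is why isolated pointer-level metrics would be informative alongside end-to-end quality.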

Circularity Check

0 steps flagged

No significant circularity; Pointer-CAD framework is self-contained

The paper introduces a pointer-based command sequence for LLM-driven CAD generation that conditions each step on prior B-rep output plus text. No equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations appear in the derivation. Dataset construction via annotation pipeline and reported error reductions are presented as independent empirical outcomes rather than tautological re-statements of inputs. The central claims rest on the proposed pointer selection mechanism and external evaluation, without reduction to prior results by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the unverified assumption that pointer prediction will be accurate and that the annotation pipeline yields reliable training data; no free parameters or invented physical entities are stated.

axioms (1)
  • domain assumption: LLMs can be trained to predict pointers that correctly select intended geometric entities from B-rep models given text and prior geometry.
    This assumption underpins the entire pointer selection step and the claimed error reduction.
invented entities (1)
  • Pointer-based command sequence representation (no independent evidence)
    purpose: To allow explicit selection of edges and faces inside sequential CAD command generation.
    New representational device introduced to bridge B-rep geometry with command sequences.

pith-pipeline@v0.9.0 · 5611 in / 1286 out tokens · 78710 ms · 2026-05-15T16:23:01.196689+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Computer-Aided Design Generation by Cascaded Discrete Diffusion Model

    cs.CV 2026-05 unverdicted novelty 7.0

    Cascaded discrete diffusion generates CAD command sequences with absorbing transitions and parameters with Gaussian, scale-invariant, and prior-preserving kernels, outperforming autoregressive and continuous diffusion...

Reference graph

Works this paper leans on

78 extracted references · 78 canonical work pages · cited by 1 Pith paper · 7 internal anchors

  1. [1]

    GPT-4 Technical Report

    Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ah- mad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report.arXiv preprint arXiv:2303.08774,

  2. [2]

    Gencad: Image- conditioned computer-aided design generation with transformer-based contrastive representation and diffusion priors.arXiv preprint arXiv:2409.16294, 2024

    Md Ferdous Alam and Faez Ahmed. Gencad: Image- conditioned computer-aided design generation with transformer-based contrastive representation and diffusion priors.arXiv preprint arXiv:2409.16294, 2024. 1

  3. [3]

    Gen- erating cad code with vision-language models for 3d designs

    Kamel Alrashedy, Pradyumna Tambwekar, Zulfiqar Zaidi, Megan Langwasser, Wei Xu, and Matthew Gombolay. Gen- erating cad code with vision-language models for 3d designs. arXiv preprint arXiv:2410.05340, 2024. 1

  4. [4]

    Ge- ometric modeling of solid objects by using a face adjacency graph representation.ACM SIGGRAPH Computer Graphics, 19(3):131–139, 1985

    Silvia Ansaldi, Leila De Floriani, and Bianca Falcidieno. Ge- ometric modeling of solid objects by using a face adjacency graph representation.ACM SIGGRAPH Computer Graphics, 19(3):131–139, 1985. 3

  5. [5]

    Claude opus 4 system card, 2025

    Anthropic. Claude opus 4 system card, 2025. 7

  6. [6]

    Qwen2.5-VL Technical Report

    Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, et al. Qwen2.5-vl technical report.arXiv preprint arXiv:2502.13923, 2025. 2, 6

  7. [7]

    CadQuery, 2025

    CadQuery Contributors. CadQuery, 2025. 1, 3, 12

  8. [8]

    Computer aided detection (cad): an overview.Cancer Imaging, 5(1):17, 2005

    Ronald A Castellino. Computer aided detection (cad): an overview.Cancer Imaging, 5(1):17, 2005. 1

  9. [9]

    Img2cad: Conditioned 3-d cad model generation from single image with structured visual geometry.IEEE Trans- actions on Industrial Informatics, 2025

    Tianrun Chen, Chunan Yu, Yuanqi Hu, Jing Li, Tao Xu, Run- long Cao, Lanyun Zhu, Ying Zang, Yong Zhang, Zejian Li, et al. Img2cad: Conditioned 3-d cad model generation from single image with structured visual geometry.IEEE Trans- actions on Industrial Informatics, 2025. 3

  10. [10]

    Inversecsg: Automatic conversion of 3d models to csg trees.ACM Transactions on Graphics (TOG), 37(6):1–16, 2018

    Tao Du, Jeevana Priya Inala, Yewen Pu, Andrew Spielberg, Adriana Schulz, Daniela Rus, Armando Solar-Lezama, and Wojciech Matusik. Inversecsg: Automatic conversion of 3d models to csg trees.ACM Transactions on Graphics (TOG), 37(6):1–16, 2018. 3

  11. [11]

    Transcad: A hi- erarchical transformer for cad sequence inference from point clouds

    Elona Dupont, Kseniya Cherenkova, Dimitrios Mallis, Gleb Gusev, Anis Kacem, and Djamila Aouada. Transcad: A hi- erarchical transformer for cad sequence inference from point clouds. InEuropean Conference on Computer Vision, pages 19–36. Springer, 2024. 3

  12. [12]

    A parametric and feature-based cad dataset to support human-computer interaction for advanced 3d shape learning.Integrated Computer-Aided Engineering, 32(1):75–96, 2025

    Rubin Fan, Fazhi He, Yuxin Liu, Yupeng Song, Linkun Fan, and Xiaohu Yan. A parametric and feature-based cad dataset to support human-computer interaction for advanced 3d shape learning.Integrated Computer-Aided Engineering, 32(1):75–96, 2025. 3

  13. [13]

    Addison-Wesley Professional, 1996

    James D Foley.Computer graphics: principles and practice. Addison-Wesley Professional, 1996. 3

  14. [14]

    FreeCAD, 2024

    FreeCAD Community. FreeCAD, 2024. 3

  15. [15]

    Gemini 2.5 pro, 2025

    Google DeepMind. Gemini 2.5 pro, 2025. 7

  16. [16]

    Cadmium: Fine-tuning code language models for text-driven sequential cad design.arXiv preprint arXiv:2507.09792, 2025

    Prashant Govindarajan, Davide Baldelli, Jay Pathak, Quentin Fournier, and Sarath Chandar. Cadmium: Fine-tuning code language models for text-driven sequential cad design.arXiv preprint arXiv:2507.09792, 2025. 2, 3, 4, 7, 12

  17. [17]

    CAD-Coder: Text-to-CAD Generation with Chain-of-Thought and Geometric Reward

    Yandong Guan, Xilin Wang, Xingxi Ming, Jing Zhang, Dong Xu, and Qian Yu. Cad-coder: Text-to-cad generation with chain-of-thought and geometric reward.arXiv preprint arXiv:2505.19713, 2025. 3

  18. [18]

    Complexgen: Cad reconstruction by b-rep chain complex generation.ACM Transactions on Graphics (TOG), 41(4):1–18, 2022

    Haoxiang Guo, Shilin Liu, Hao Pan, Yang Liu, Xin Tong, and Baining Guo. Complexgen: Cad reconstruction by b-rep chain complex generation.ACM Transactions on Graphics (TOG), 41(4):1–18, 2022. 3

  19. [19]

    Lora: Low-rank adaptation of large language models.ICLR, 1(2):3, 2022

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen- Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models.ICLR, 1(2):3, 2022. 5

  20. [20]

    Opencoder: The open cook- book for top-tier code large language models

    Siming Huang, Tianhao Cheng, Jason Klein Liu, Weidi Xu, Jiaran Hao, Liuyihan Song, Yang Xu, Jian Yang, Jiaheng Liu, Chenchen Zhang, et al. Opencoder: The open cook- book for top-tier code large language models. InACL, pages 33167–33193, 2025. 3

  21. [21]

    Uv-net: Learning from boundary rep- resentations

    Pradeep Kumar Jayaraman, Aditya Sanghi, Joseph G Lam- bourne, Karl DD Willis, Thomas Davies, Hooman Shayani, and Nigel Morris. Uv-net: Learning from boundary rep- resentations. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11703– 11712, 2021. 4

  22. [22]

    Solidgen: An autoregressive model for direct b-rep synthe- sis.arXiv preprint arXiv:2203.13944, 2022

    Pradeep Kumar Jayaraman, Joseph G Lambourne, Nishkrit Desai, Karl DD Willis, Aditya Sanghi, and Nigel JW Morris. Solidgen: An autoregressive model for direct b-rep synthe- sis.arXiv preprint arXiv:2203.13944, 2022. 3

  23. [23]

    Ucsg-net-unsupervised discovering of constructive solid ge- ometry tree.Advances in neural information processing sys- tems, 33:8776–8786, 2020

    Kacper Kania, Maciej Zieba, and Tomasz Kajdanowicz. Ucsg-net-unsupervised discovering of constructive solid ge- ometry tree.Advances in neural information processing sys- tems, 33:8776–8786, 2020. 3

  24. [24]

    Cad-signet: Cad language inference from point clouds using layer-wise sketch instance guided attention

    Mohammad Sadil Khan, Elona Dupont, Sk Aziz Ali, Kseniya Cherenkova, Anis Kacem, and Djamila Aouada. Cad-signet: Cad language inference from point clouds using layer-wise sketch instance guided attention. InProceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition, pages 4713–4722, 2024. 1, 3

  25. [25]

    Text2cad: Generating sequential cad designs from beginner- to-expert level text prompts.Advances in Neural Information Processing Systems, 37:7552–7579, 2024

    Mohammad Sadil Khan, Sankalp Sinha, Talha Uddin, Di- dier Stricker, Sk Aziz Ali, and Muhammad Zeshan Afzal. Text2cad: Generating sequential cad designs from beginner- to-expert level text prompts.Advances in Neural Information Processing Systems, 37:7552–7579, 2024. 1, 2, 3, 4, 6, 7, 12

  26. [26]

    Abc: A big cad model dataset for geometric deep learning

    Sebastian Koch, Albert Matveev, Zhongshi Jiang, Francis Williams, Alexey Artemov, Evgeny Burnaev, Marc Alexa, Denis Zorin, and Daniele Panozzo. Abc: A big cad model dataset for geometric deep learning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9601–9611, 2019. 3

  27. [27]

    cadrille: Multi-modal cad reconstruc- tion with online reinforcement learning.arXiv preprint arXiv:2505.22914, 2025

    Maksim Kolodiazhnyi, Denis Tarasov, Dmitrii Zhem- chuzhnikov, Alexander Nikulin, Ilya Zisman, Anna V orontsova, Anton Konushin, Vladislav Kurenkov, and Danila Rukhovich. cadrille: Multi-modal cad reconstruc- tion with online reinforcement learning.arXiv preprint arXiv:2505.22914, 2025. 1, 3

  28. [28]

    Brepnet: A topological message passing system for solid models

    Joseph G Lambourne, Karl DD Willis, Pradeep Kumar Jayaraman, Aditya Sanghi, Peter Meltzer, and Hooman Shayani. Brepnet: A topological message passing system for solid models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12773– 12782, 2021. 1

  29. [29]

    Cad-llama: leveraging large language models for computer-aided design parametric 3d model generation

    Jiahao Li, Weijian Ma, Xueyang Li, Yunzhong Lou, Guichun Zhou, and Xiangdong Zhou. Cad-llama: leveraging large language models for computer-aided design parametric 3d model generation. InCVPR, pages 18563–18573, 2025. 1, 3

  30. [30]

    Hola: B-rep genera- tion using a holistic latent representation.ACM Transactions on Graphics (TOG), 44(4):1–25, 2025

    Yilin Liu, Duoteng Xu, Xingyao Yu, Xiang Xu, Daniel Cohen-Or, Hao Zhang, and Hui Huang. Hola: B-rep genera- tion using a holistic latent representation.ACM Transactions on Graphics (TOG), 44(4):1–25, 2025. 3

  31. [31]

    Decoupled Weight Decay Regularization

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization.arXiv preprint arXiv:1711.05101, 2017. 18

  32. [32]

    Draw step by step: Reconstructing cad construction sequences from point clouds via multimodal diffusion

    Weijian Ma, Shuaiqi Chen, Yunzhong Lou, Xueyang Li, and Xiangdong Zhou. Draw step by step: Reconstructing cad construction sequences from point clouds via multimodal diffusion. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 27154– 27163, 2024. 3

  33. [33]

    Polygen: An autoregressive generative model of 3d meshes

    Charlie Nash, Yaroslav Ganin, SM Ali Eslami, and Peter Battaglia. Polygen: An autoregressive generative model of 3d meshes. InInternational conference on machine learning, pages 7220–7229. PMLR, 2020. 3

  34. [34]

    Creft-cad: Boosting orthographic projection reasoning for cad via re- inforcement fine-tuning.arXiv preprint arXiv:2506.00568,

    Ke Niu, Zhuofan Chen, Haiyang Yu, Yuwen Chen, Teng Fu, Mengyang Zhao, Bin Li, and Xiangyang Xue. Creft-cad: Boosting orthographic projection reasoning for cad via re- inforcement fine-tuning.arXiv preprint arXiv:2506.00568,

  35. [35]

    Introducing gpt 5.2, 2025

    OpenAI. Introducing gpt 5.2, 2025. 7

  36. [36]

    Learning transferable visual models from natural language supervi- sion

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervi- sion. InInternational conference on machine learning, pages 8748–8763. PmLR, 2021. 18

  37. [37]

    Mlcad: A survey of research in machine learning for cad keynote paper.IEEE Transactions on Computer-Aided Design of Integrated Cir- cuits and Systems, 41(10):3162–3181, 2021

    Martin Rapp, Hussam Amrouch, Yibo Lin, Bei Yu, David Z Pan, Marilyn Wolf, and J ¨org Henkel. Mlcad: A survey of research in machine learning for cad keynote paper.IEEE Transactions on Computer-Aided Design of Integrated Cir- cuits and Systems, 41(10):3162–3181, 2021. 1

  38. [38]

    Csg-stump: A learn- ing friendly csg-like representation for interpretable shape parsing

    Daxuan Ren, Jianmin Zheng, Jianfei Cai, Jiatong Li, Haiyong Jiang, Zhongang Cai, Junzhe Zhang, Liang Pan, Mingyuan Zhang, Haiyu Zhao, et al. Csg-stump: A learn- ing friendly csg-like representation for interpretable shape parsing. InProceedings of the IEEE/CVF international con- ference on computer vision, pages 12478–12487, 2021. 3

  39. [39]

    Cad-recode: Reverse engineering cad code from point clouds

    Danila Rukhovich, Elona Dupont, Dimitrios Mallis, Kseniya Cherenkova, Anis Kacem, and Djamila Aouada. Cad-recode: Reverse engineering cad code from point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9801–9811, 2025. 3

  40. [40]

    The graph neural network model

    Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE transactions on neural networks, 20(1):61–80, 2008. 2, 4

  41. [41]

    Csgnet: Neural shape parser for constructive solid geometry

    Gopal Sharma, Rishabh Goyal, Difan Liu, Evangelos Kalogerakis, and Subhransu Maji. Csgnet: Neural shape parser for constructive solid geometry. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5515–5523, 2018. 3

  42. [42]

    Qwen2 Technical Report

    Qwen Team. Qwen2 technical report. arXiv preprint arXiv:2407.10671, 2024. 5

  43. [43]

    Attention is all you need

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017. 5

  44. [44]

    Pointer networks

    Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. Pointer networks. Advances in neural information processing systems, 28, 2015. 2, 3

  45. [45]

    Text-to-cad generation through infusing visual feedback in large language models

    Ruiyu Wang, Yu Yuan, Shizhao Sun, and Jiang Bian. Text-to-cad generation through infusing visual feedback in large language models. In ICML, 2025. 3, 12

  46. [46]

    Cad-gpt: Synthesising cad construction sequence with spatial reasoning-enhanced multimodal llms

    Siyu Wang, Cailian Chen, Xinyi Le, Qimin Xu, Lei Xu, Yanzhou Zhang, and Jie Yang. Cad-gpt: Synthesising cad construction sequence with spatial reasoning-enhanced multimodal llms. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 7880–7888, 2025. 1, 3

  47. [47]

    Fusion 360 gallery: A dataset and environment for programmatic cad construction from human design sequences

    Karl DD Willis, Yewen Pu, Jieliang Luo, Hang Chu, Tao Du, Joseph G Lambourne, Armando Solar-Lezama, and Wojciech Matusik. Fusion 360 gallery: A dataset and environment for programmatic cad construction from human design sequences. ACM Transactions on Graphics (TOG), 40(4):1–24, 2021. 3

  48. [48]

    Cmt: A cascade mar with topology predictor for multimodal conditional cad generation

    Jianyu Wu, Yizhou Wang, Xiangyu Yue, Xinzhu Ma, Jingyang Guo, Dongzhan Zhou, Wanli Ouyang, and Shixiang Tang. Cmt: A cascade mar with topology predictor for multimodal conditional cad generation. arXiv preprint arXiv:2504.20830, 2025. 1, 3

  49. [49]

    Deepcad: A deep generative network for computer-aided design models

    Rundi Wu, Chang Xiao, and Changxi Zheng. Deepcad: A deep generative network for computer-aided design models. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 6772–6782, 2021. 1, 2, 3, 4, 7

  50. [50]

    Unsupervised feature learning via non-parametric instance discrimination

    Zhirong Wu, Yuanjun Xiong, Stella X Yu, and Dahua Lin. Unsupervised feature learning via non-parametric instance discrimination. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3733–3742,

  51. [51]

    Text-to-cadquery: A new paradigm for cad generation with scalable large model capabilities

    Haoyang Xie and Feng Ju. Text-to-cadquery: A new paradigm for cad generation with scalable large model capabilities. arXiv preprint arXiv:2505.06507, 2025. 1, 12

  52. [52]

    Cad-mllm: Unifying multimodality-conditioned cad generation with mllm

    Jingwei Xu, Chenyu Wang, Zibo Zhao, Wen Liu, Yi Ma, and Shenghua Gao. Cad-mllm: Unifying multimodality-conditioned cad generation with mllm. arXiv preprint arXiv:2411.04954, 2024. 1, 2, 3, 6, 7, 12

  53. [53]

    Skexgen: Autoregressive generation of cad construction sequences with disentangled codebooks

    Xiang Xu, Karl DD Willis, Joseph G Lambourne, Chin-Yi Cheng, Pradeep Kumar Jayaraman, and Yasutaka Furukawa. Skexgen: Autoregressive generation of cad construction sequences with disentangled codebooks. arXiv preprint arXiv:2207.04632, 2022. 1, 3, 12

  54. [54]

    Brepgen: A b-rep generative diffusion model with structured latent geometry

    Xiang Xu, Joseph Lambourne, Pradeep Jayaraman, Zhengqing Wang, Karl Willis, and Yasutaka Furukawa. Brepgen: A b-rep generative diffusion model with structured latent geometry. ACM Transactions on Graphics (TOG), 43(4):1–14, 2024. 1, 3

  55. [55]

    Qwen3 Technical Report

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025. 1, 7

  56. [56]

    Qwen2.5 Technical Report

    Qwen An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxin Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin,...

  57. [57]

    Rl-cad: Reinforcement learning training gym for revolution involved cad command sequence generation

    Xiaolong Yin, Xingyu Lu, Jiahang Shen, Jingzhe Ni, Hailong Li, Ruofeng Tong, Min Tang, and Peng Du. Rl-cad: Reinforcement learning training gym for revolution involved cad command sequence generation. arXiv preprint arXiv:2503.18549, 2025. 4

  58. [58]

    Img2cad: Reverse engineering 3d cad models from images through vlm-assisted conditional factorization

    Yang You, Mikaela Angelina Uy, Jiaqi Han, Rahul Thomas, Haotong Zhang, Suya You, and Leonidas Guibas. Img2cad: Reverse engineering 3d cad models from images through vlm-assisted conditional factorization. arXiv preprint arXiv:2408.01437, 2024. 1

  59. [59]

    Capri-net: Learning compact cad shapes with adaptive primitive assembly

    Fenggen Yu, Zhiqin Chen, Manyi Li, Aditya Sanghi, Hooman Shayani, Ali Mahdavi-Amiri, and Hao Zhang. Capri-net: Learning compact cad shapes with adaptive primitive assembly. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11768–11778, 2022. 3

  60. [60]

    D²csg: Unsupervised learning of compact csg trees with dual complements and dropouts

    Fenggen Yu, Qimin Chen, Maham Tanveer, Ali Mahdavi Amiri, and Hao Zhang. D²csg: Unsupervised learning of compact csg trees with dual complements and dropouts. Advances in Neural Information Processing Systems, 36:22807–22819, 2023. 3

  61. [61]

    Gencad-three-dimensional: Computer-aided design program generation using multimodal latent space alignment and synthetic dataset balancing

    Nomi Yu, Md Ferdous Alam, A John Hart, and Faez Ahmed. Gencad-three-dimensional: Computer-aided design program generation using multimodal latent space alignment and synthetic dataset balancing. JMD, 148(3):031703, 2026. 3

  62. [62]

    Openecad: An efficient visual language model for editable 3d-cad design

    Zhe Yuan, Jianqi Shi, and Yanhong Huang. Openecad: An efficient visual language model for editable 3d-cad design. Computers & Graphics, 124:104048, 2024. 3

  63. [63]

    Diffusion-cad: Controllable diffusion model for generating computer-aided design models

    Aijia Zhang, Weiqiang Jia, Qiang Zou, Yixiong Feng, Xiaoxiang Wei, and Ye Zhang. Diffusion-cad: Controllable diffusion model for generating computer-aided design models. IEEE Transactions on Visualization and Computer Graphics, 2025. 3

  64. [64]

    Flexcad: Unified and versatile controllable cad generation with fine-tuned large language models

    Zhanwei Zhang, Shizhao Sun, Wenxiao Wang, Deng Cai, and Jiang Bian. Flexcad: Unified and versatile controllable cad generation with fine-tuned large language models. In ICLR, 2025. 3

  65. [65]

    Codegeex: A pre-trained model for code generation with multilingual benchmarking on humaneval-x

    Qinkai Zheng, Xiao Xia, Xu Zou, Yuxiao Dong, Shan Wang, Yufei Xue, Zihan Wang, Lei Shen, Andi Wang, Yang Li, Teng Su, Zhilin Yang, and Jie Tang. Codegeex: A pre-trained model for code generation with multilingual benchmarking on humaneval-x. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 5673–5684, 2023. 3...

  66. [66]

    Additional Evaluations and Analytical Discussions. 9.1. Comprehensive Metrics for Text-to-CAD Evaluation. To provide a more complete assessment of model performance, we further report the Invalidity Ratio (IR), Dangling Edge Length (DangEL), and Self-Intersection Ratio (SIR) following previous works [16, 52]. IR measures generation robustness, while...

  67. [67]

    Details of the Pointer-based Representation. This section elaborates on the implementation logic of the pointer-based representation and the methodology for Table 11. Ablation results on the Recap-DeepCAD-Norm dataset. All baseline methods show a substantial drop in IR, indicating that they depend on memorizing dataset-specific dimensional patterns rathe...

  68. [68]

    Details of the training framework. 11.1. B-rep encoder. For each B-rep edge, we uniformly sample 32 points along its parametric curve in 3D space and extract four quantities at each location: point coordinates, tangent and its reverse vector, and first-order derivatives. Each is represented as a 3D vector, and their concatenation yields a 12-dimensional fe...
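The edge sampling described in the B-rep encoder paragraph above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `curve` callable, the uniform-in-parameter sampling, the finite-difference derivative, the reading of "tangent" as the unit-normalized derivative, and the concatenation order are all assumptions made for the example. A real pipeline would more likely query a CAD kernel for exact derivatives.

```python
import math

def edge_features(curve, n_samples=32, h=1e-5):
    """Sample a parametric curve on [0, 1] and build 12-D per-point features.

    Assumed layout per point (hypothetical ordering):
    [point (3), unit tangent (3), reversed tangent (3), first derivative (3)].
    """
    feats = []
    for i in range(n_samples):
        t = i / (n_samples - 1)
        p = curve(t)
        # Finite-difference derivative, clamped so we never leave [0, 1].
        t0, t1 = max(0.0, t - h), min(1.0, t + h)
        p0, p1 = curve(t0), curve(t1)
        d = [(b - a) / (t1 - t0) for a, b in zip(p0, p1)]
        norm = math.sqrt(sum(c * c for c in d)) or 1.0
        tangent = [c / norm for c in d]
        reverse = [-c for c in tangent]
        feats.append(list(p) + tangent + reverse + d)
    return feats

def quarter_circle(t):
    """Illustrative edge: a unit quarter circle in the XY plane."""
    a = t * math.pi / 2
    return (math.cos(a), math.sin(a), 0.0)

features = edge_features(quarter_circle)
print(len(features), len(features[0]))
```

Storing both the tangent and its reverse would make the per-point feature insensitive to which end of the edge is treated as the start, which is one plausible reason for including both.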

  69. [69]

    Dataset Visualization Figure 12 presents several representative samples from the Recap-OmniCAD+ dataset, showcasing a wide spectrum of model complexity and diversity

    Details of the Dataset. 12.1. Dataset Visualization. Figure 12 presents several representative samples from the Recap-OmniCAD+ dataset, showcasing a wide spectrum of model complexity and diversity. As illustrated, our dataset contains a rich variety of models that not only feature complex geometric details such as fillets and chamfers but also exhibit div...

  70. [70]

    We use the AdamW optimizer [31] with a learning rate of $1 \times 10^{-4}$ and a linear decay schedule

    Implementation Details. For the default 0.5B model setting, the entire training process requires approximately 23 hours on 16 NVIDIA H800 GPUs. We use the AdamW optimizer [31] with a learning rate of $1 \times 10^{-4}$ and a linear decay schedule. For LoRA, the dropout rate is set to 0.1. We use a micro-batch size of 9 with 2 gradient accumulation steps per GPU. The ...

  71. [71]

    Generate a one-word name for the object, enclosed in <name></name>

  72. [72]

    Focus on geometric form, symmetry, major extrusions or cutouts, and distinctive elements

    Write a clear and concise one-sentence caption for the object, enclosed in <caption></caption>, summarizing its overall shape and key structural features. Focus on geometric form, symmetry, major extrusions or cutouts, and distinctive elements. Avoid interpretation or unnecessary detail. Figure 15. Prompt for visual description. This prompt is used with the...

  73. [73]

    parts": {

    Bottom view A red planar surface on a light blue object is highlighted. Red planar surface normal vector (numeric): (nx, ny, nz) = (REPLACE_NX, REPLACE_NY, REPLACE_NZ) Authoritative facing-direction hints (precomputed; use exactly as given): - current X direction: REPLACE_X_DIR (one of: right / left / none) - current Y direction: REPLACE_Y_DIR (one of: ba...

  74. [74]

    One or more 2D sketches. Each sketch may contain lines, arcs, and circles: (i) A line is defined by a start point and an end point; (ii) An arc is defined by a center point, a start point, a sweep angle, and a direction; (iii) A circle is defined by a center point and a radius

  75. [75]

    A coordinate system that positions the sketch in 3D space using: (i) A sketch plane (Top, Right, or Front), which defines the basis for the coordinate system; (ii) Rotation angles following Z-Y-X order (first rotate around Z, then Y, then X); (iii) Translation along the x, y, and z directions; (iv) Optionally, a description field may be present, giving a ...

  76. [76]

    It includes: (i) extrude operation type, which defines how the extrusion modifies geometry

    An extrusion operation applied to the sketch or sketches. It includes: (i) extrude operation type, which defines how the extrusion modifies geometry. This may involve adding material, cutting, intersecting, or creating new bodies or components; (ii) extrude extent mode, which defines how far and in which direction the sketch is extruded. Interpret and exp...

  77. [77]

    A fillet operation, defined by: (i) fillet radius, which specifies the radius of the fillet; (ii) fillet tangent chain, which indicates whether the fillet continues smoothly along tangent edges; (iii) fillet edges, which specifies the edges to which the fillet is applied

  78. [78]

    Instructions must follow these rules: 1

    A chamfer operation, defined by: (i) chamfer distance, which specifies the distance of the chamfer; (ii) chamfer tangent chain, which indicates whether the chamfer continues smoothly along tangent edges; (iii) chamfer edges, which specifies the edges to which the chamfer is applied. Instructions must follow these rules: 1. Explicitly mention each sketch, ...
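The sketch and coordinate-system conventions listed in the prompt above (an arc given by center, start point, sweep angle, and direction; a placement given by Z-Y-X rotation followed by translation) can be illustrated with a short sketch. Several details here are assumptions rather than facts from the paper: angles are taken in degrees, rotations are about fixed world axes, a counterclockwise sweep is positive, and the function names `arc_end` and `place_point` are hypothetical. The Top/Right/Front sketch-plane basis is not modeled.

```python
import math

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(M, v):
    return [sum(M[i][k] * v[k] for k in range(3)) for i in range(3)]

def place_point(p_local, angles_deg_zyx, translation):
    """Rotate about Z, then Y, then X (fixed axes, assumed), then translate."""
    az, ay, ax = (math.radians(a) for a in angles_deg_zyx)
    # Z is applied first, so it is the rightmost factor.
    R = matmul(rot_x(ax), matmul(rot_y(ay), rot_z(az)))
    v = apply(R, p_local)
    return tuple(v[i] + translation[i] for i in range(3))

def arc_end(center, start, sweep_deg, ccw=True):
    """End point of a 2D arc: rotate the start point about the center."""
    a = math.radians(sweep_deg) * (1 if ccw else -1)
    dx, dy = start[0] - center[0], start[1] - center[1]
    return (center[0] + dx * math.cos(a) - dy * math.sin(a),
            center[1] + dx * math.sin(a) + dy * math.cos(a))

print(place_point((1.0, 0.0, 0.0), (90.0, 0.0, 90.0), (1.0, 2.0, 3.0)))
print(arc_end((0.0, 0.0), (1.0, 0.0), 90.0))
```

If the intended convention were intrinsic (rotating about the moving axes) rather than fixed-axis, the matrix product order would be reversed, so the convention should be checked against the dataset before reuse.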