arxiv: 2603.04337 · v2 · submitted 2026-03-04 · 💻 cs.CV · cs.CL


Pointer-CAD: Unifying B-Rep and Command Sequences via Pointer-based Edges & Faces Selection

Dacheng Qi, Chenyu Wang, Jingwei Xu, Tianzhe Chu, Zibo Zhao, Wen Liu, Wenrui Ding, Yi Ma, Shenghua Gao


Pith reviewed 2026-05-15 16:23 UTC · model grok-4.3

classification 💻 cs.CV cs.CL

keywords CAD generation · LLM-based modeling · B-rep · pointer selection · command sequences · quantization error · geometric entities · CAD dataset


The pith

Pointer-CAD unifies command sequences with B-rep by letting LLMs predict pointers to select specific edges and faces for operations like chamfer and fillet.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Pointer-CAD addresses the inability of pure command sequence representations to handle entity selections needed for complex operations such as chamfer or fillet, along with topological errors from discretizing continuous variables during sketch and extrude steps. It conditions each generation step on both the textual description and the B-rep structure built from prior steps, with the LLM predicting a pointer that picks the most feature-consistent geometric entity from the available set. This explicit pointer mechanism incorporates B-rep geometric information directly into the sequential modeling process and thereby reduces quantization error. The method is trained on a newly constructed dataset of approximately 575K CAD models equipped with expert-level natural language descriptions. Experiments indicate that the approach supports complex geometric structures while driving segmentation error to an extremely low level, delivering clear gains over earlier command-sequence baselines.

Core claim

Pointer-CAD decomposes CAD model generation into sequential steps where each command is conditioned on the textual description and the B-rep generated from previous steps; whenever an operation requires selection of a geometric entity, the LLM predicts a pointer that identifies the most feature-consistent candidate among the available B-rep edges or faces, thereby unifying boundary representation geometry with command sequences and mitigating topological inaccuracies introduced by quantization.

What carries the argument

Pointer-based edges and faces selection, which identifies the most feature-consistent candidate from the current B-rep entity set for each required selection operation.
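The selection rule can be illustrated with a small sketch. The paper text quoted alongside Figure 12 mentions matching via cosine similarity against 128-dimensional entity embeddings from the B-rep encoder; everything else here (the function name `select_entity`, the random stand-in embeddings, the noise level) is a hypothetical illustration, not the authors' implementation.

```python
import numpy as np

def select_entity(pointer_vec, entity_embs):
    """Pick the candidate whose embedding is most cosine-similar to the pointer.

    pointer_vec: (d,) vector predicted for the pointer slot (assumed shape).
    entity_embs: (n, d) embeddings of the candidate faces/edges (assumed shape).
    Returns the winning index and the full score vector.
    """
    pointer = np.asarray(pointer_vec, dtype=float)
    cands = np.asarray(entity_embs, dtype=float)
    # Normalise both sides so the dot product equals cosine similarity.
    pointer_unit = pointer / (np.linalg.norm(pointer) + 1e-12)
    cand_units = cands / (np.linalg.norm(cands, axis=1, keepdims=True) + 1e-12)
    scores = cand_units @ pointer_unit
    return int(np.argmax(scores)), scores

# Stand-in data: 5 candidate entities with random 128-dimensional embeddings.
rng = np.random.default_rng(0)
candidates = rng.normal(size=(5, 128))
# A pointer that is a slightly noisy copy of candidate 3.
query = candidates[3] + 0.05 * rng.normal(size=128)
best, scores = select_entity(query, candidates)
print(best)
```

Because the choice is an argmax over existing entities, the output is always a valid member of the current B-rep, which is the property the pointer representation relies on.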

If this is right

Enables complex editing operations such as chamfer and fillet inside LLM-driven command sequence generation.
Reduces segmentation error to an extremely low level relative to prior command-sequence methods.
Mitigates topological inaccuracies that arise from quantization of continuous variables in sketch and extrude steps.
Supports reliable generation of complex geometric structures in CAD models.
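To make the quantization point concrete, the following sketch snaps sampled points to a uniform grid and measures the Chamfer distance between the original and quantized sets, in the spirit of the measurement described in the Figure 8 caption. The uniform quantizer, the bit depths, the coordinate range, and the random point cloud are all assumptions chosen for illustration; the paper's actual quantization scheme is not reproduced here.

```python
import numpy as np

def quantize(points, n_bits=8, lo=-1.0, hi=1.0):
    """Snap coordinates to a uniform grid with 2**n_bits levels on [lo, hi]."""
    levels = 2 ** n_bits - 1
    pts = np.clip(np.asarray(points, dtype=float), lo, hi)
    idx = np.round((pts - lo) / (hi - lo) * levels)
    return idx / levels * (hi - lo) + lo

def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean nearest-neighbour distance in both directions."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

# Hypothetical point samples standing in for CAD geometry.
rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, size=(200, 3))
err_6 = chamfer_distance(pts, quantize(pts, n_bits=6))
err_8 = chamfer_distance(pts, quantize(pts, n_bits=8))
print(err_6, err_8)
```

A coarser grid gives a larger distance. Selecting an existing entity by pointer avoids re-encoding its coordinates through such a grid at all.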

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The pointer mechanism could be adapted to other sequential 3D modeling pipelines that need precise entity referencing.
The data annotation pipeline offers a route to scale annotated CAD datasets beyond the current 575K examples.
Iterative refinement loops in AI CAD tools may become more stable once selection errors are decoupled from quantization.
Pointer accuracy on deliberately ambiguous feature sets would expose remaining limits in current LLM selection.

Load-bearing premise

The LLM can reliably predict pointers to the intended geometric entities based on feature consistency without introducing selection ambiguities or new error modes.

What would settle it

Measure whether predicted pointers match ground-truth entity selections on B-rep models containing multiple geometrically similar edges or faces when performing fillet or chamfer operations.
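One way such a measurement could be organised is sketched below: compute pointer accuracy over all selections, then again over only those where the target has several geometrically similar candidates. The `Selection` record, the `n_similar` annotation, and the toy data are hypothetical; the paper does not report this breakdown.

```python
from dataclasses import dataclass

@dataclass
class Selection:
    predicted: int   # index of the entity the model pointed to
    target: int      # index of the ground-truth entity
    n_similar: int   # hypothetical count of look-alike candidates for the target

def pointer_accuracy(records, min_similar=0):
    """Fraction of correct pointers among records with at least min_similar look-alikes."""
    subset = [r for r in records if r.n_similar >= min_similar]
    if not subset:
        return None  # no records in this ambiguity bucket
    return sum(r.predicted == r.target for r in subset) / len(subset)

# Toy evaluation log (made-up values for illustration only).
records = [
    Selection(predicted=2, target=2, n_similar=0),
    Selection(predicted=5, target=5, n_similar=3),
    Selection(predicted=1, target=4, n_similar=3),
    Selection(predicted=7, target=7, n_similar=1),
]
overall = pointer_accuracy(records)
ambiguous = pointer_accuracy(records, min_similar=2)
print(overall, ambiguous)
```

A gap between the two numbers would indicate that selection quality degrades specifically when candidates look alike.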

Figures

Figures reproduced from arXiv: 2603.04337 by Chenyu Wang, Dacheng Qi, Jingwei Xu, Shenghua Gao, Tianzhe Chu, Wen Liu, Wenrui Ding, Yi Ma, Zibo Zhao.

**Figure 1.** Illustration of the strength of our proposed pointer-based command sequence compared to the previous command sequence-based CAD representation. Command sequences suffer from the inability to refer to specific edges or faces, and from discretization-induced quantization errors. In contrast, Pointer-CAD leverages edge pointers to directly refer to B-rep entities, enabling precise operations such as sketch snappin…

**Figure 2.** Pointer-CAD Pipeline. At each generation step, the full user prompt is tokenized, while the B-rep is updated with all geometry generated so far. A multimodal fusion module combines the textual prompt with the evolving B-rep, which is further encoded via a graph neural network over its faces and edges. The fused representation is then processed by a large language model to predict the vector for the current…

**Figure 3.** Dataset construction pipeline. Raw JSONs are converted into a minimal format containing only annotation-relevant elements. Sketch planes and models are rendered, and Qwen2.5-VL generates textual descriptions for integration into the JSON. Finally, Qwen2.5 produces step-by-step instructions, with dimension parameters wrapped in special tags for future data augmentation.

**Figure 4.** Qualitative performance comparison on Recap-DeepCAD dataset. Our method consistently produces accurate and faithful geometry aligned with the ground truth, while competing methods often miss details or collapse entirely. Notably, Pointer-CAD achieves superior results among LLM-based methods despite a significantly smaller size than CADmium. Adjacent body text: We use Qwen2.5-0.5B [56] as the backbone LLM for Pointer-CAD.

**Figure 6.** Showcase of complex CAD model generation. Adjacent body text (Section 5.4, Visualization of Complex Cases): To demonstrate the capabilities and functional boundaries of our method, we visualize a set of generated complex CAD cases in…

**Figure 5.** Qualitative performance comparison on Recap-OmniCAD+ dataset. Our method accurately recovers detailed structures that closely match the ground truth for complex CAD models involving chamfer or fillet operations. Conversely, competing methods often miss fine-grained features or fail entirely. Adjacent body text (Section 5.3, Ablation on the GNN component): To verify the efficiency of the GNN component, we conduct a comparison in…

**Figure 7.** Prompt comparison. Recap-DeepCAD dataset includes dimensional values with explicit units, whereas Text2CAD dataset uses normalized, unit-free geometric parameters. Adjacent body text (Section 9.6, Application of Click Interaction Editing): Since our proposed pointer-based command sequence allows entity selection at each step, we extend the model with token concatenation to incorporate user-interactive selections alongside t…

**Figure 8.** Quantization Error. We directly measure quantization error by computing the median Chamfer Distance between each representation before and after quantization, where Pointer-CAD exhibits substantially smaller error than Text2CAD. Example prompts extracted alongside this figure: "Add a cylinder on the selected face"; "Apply a fillet to the selected edges"; "Cut a cylinder from the selected face".

**Figure 9.** Illustration of our interactive editing functionality. Users can directly click on a face or edge of the CAD model and provide a text prompt to specify the desired operation. Adjacent body text (Section 10.1, Specific Vector Translation Rules): Each token is classified as one of three types: Label Token, Value Token, or Pointer. To simplify the model architecture, we assign non-overlapping integer ranges to lab…

**Figure 10.** Panels: (a) Face selection; (b) Origin definition; (c) Rotation definition. In panel (c), the final sketch coordinate system UVW is obtained by applying a counterclockwise in-plane rotation to U′V′W′ about the W-axis. An optional scaling factor may also be applied to mitigate quantization errors.

**Figure 11.** A non-manifold topology leads to multiple valid inter…

**Figure 12.** Representative samples from the Recap-OmniCAD+ dataset. The figure displays a range of models with varying complexity, from simpler parts with basic features to intricate components incorporating numerous fillets, chamfers, and complex sketches. Adjacent body text: … (via cosine similarity) against the 128-dimensional embeddings of all candidate geometric entities (faces and edges) generated by the B-rep encoder. The entity…

**Figure 13.** Distribution of modeling operations across datasets. The figure illustrates the total count of each modeling operation type for the DeepCAD, OmniCAD, and Recap-OmniCAD+ datasets. Axis labels: Count; Solid Modeling Operation. Legend: DeepCAD, OmniCAD, Recap-OmniCAD+.

**Figure 14.** Distribution of modeling steps per model. The figure compares the number of solid modeling operations required per model across the datasets.

**Figure 15.** Prompt for visual description. This prompt is used with the Qwen2.5-VL-72B model to generate a description of the CAD model's visual appearance. View labels: Right, Left, Back, Front, Top, Bottom. Prompt text: "You are given six orthographic views of the same 3D object in the following fixed order (each image also has a label at the bottom-right corner indicating its view): 1. Right view 2. Left view 3. Back view 4. Front view 5. Top vie…"

**Figure 16.** Prompt for sketch plane description. This prompt guides the model to describe the relative position of the sketch plane, with placeholders for the normal vector and facing direction being dynamically replaced.

**Figure 17.** Examples of the minimal JSON structure. This figure illustrates two examples of the structured 'minimal JSON' format, which integrates visual annotations and key modeling parameters for the language model.

**Figure 18.** Prompt for generating the final natural language description. This prompt is used with the 'minimal JSON' to generate the final natural language description of the modeling process.

Original abstract

Constructing computer-aided design (CAD) models is labor-intensive but essential for engineering and manufacturing. Recent advances in Large Language Models (LLMs) have inspired the LLM-based CAD generation by representing CAD as command sequences. But these methods struggle in practical scenarios because command sequence representation does not support entity selection (e.g. faces or edges), limiting its ability to support complex editing operations such as chamfer or fillet. Further, the discretization of a continuous variable during sketch and extrude operations may result in topological errors. To address these limitations, we present Pointer-CAD, a novel LLM-based CAD generation framework that leverages a pointer-based command sequence representation to explicitly incorporate the geometric information of B-rep models into sequential modeling. In particular, Pointer-CAD decomposes CAD model generation into steps, conditioning the generation of each subsequent step on both the textual description and the B-rep generated from previous steps. Whenever an operation requires the selection of a specific geometric entity, the LLM predicts a Pointer that selects the most feature-consistent candidate from the available set. Such a selection operation also reduces the quantization error in the command sequence-based representation. To support the training of Pointer-CAD, we develop a data annotation pipeline that produces expert-level natural language descriptions and apply it to build a dataset of approximately 575K CAD models. Extensive experimental results demonstrate that Pointer-CAD effectively supports the generation of complex geometric structures and reduces segmentation error to an extremely low level, achieving a significant improvement over prior command sequence methods, thereby significantly mitigating the topological inaccuracies introduced by quantization error.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Pointer-CAD adds a pointer mechanism to let LLMs pick B-rep edges and faces inside command sequences, but the abstract gives no numbers to show it actually works.


Pointer-CAD tries to solve two problems in LLM-based CAD: command sequences can't select specific faces or edges for edits like fillets, and discretizing sketch parameters creates topological mistakes. The fix is a pointer that, at each step, picks the most feature-consistent entity from the current B-rep while conditioning on both the text description and the geometry built so far. That unification is the concrete new piece.

They also ran a data pipeline to add natural-language descriptions to roughly 575K models, which is useful infrastructure for anyone training these systems. The conditioning loop and the pointer selection rule are described clearly enough that the approach feels workable on paper. The dataset size and the explicit handling of entity selection are the parts that look like real progress over pure sequence baselines.

The weak part is the evidence. The abstract states that the method supports complex structures, drops segmentation error to an extremely low level, and fixes quantization-induced topology problems, yet it supplies no metrics, no ablation tables, no pointer accuracy numbers, and no analysis of ambiguous cases where several edges look similar. If pointer prediction is off even once, the accumulating B-rep carries the error forward, so the lack of those checks matters. The stress-test note is right on this point.

This paper is for groups already working on LLM CAD who need entity-level edits for manufacturing tasks. A reader who wants to see how pointers can be added to sequence models will get something from it. It deserves a serious referee because the idea is direct and the data scale is decent, but the current version would need the actual results and error analysis before it could be judged properly.
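The worry about a single wrong pointer carrying forward can be put in rough numbers. As a back-of-the-envelope sketch, assume each of the k pointer selections in a model is correct independently with probability p (both the independence and the values of p and k below are illustrative assumptions, not figures from the paper):

```latex
P(\text{all } k \text{ selections correct}) = p^{k},
\qquad
p = 0.99,\; k = 10 \;\Rightarrow\; 0.99^{10} \approx 0.904
```

Under these assumptions, a 1% per-step selection error would leave roughly one model in ten with at least one wrong entity, which is why isolated pointer-level accuracy matters for long sequences.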

Referee Report

2 major / 1 minor

Summary. The paper introduces Pointer-CAD, an LLM-based CAD generation framework that unifies B-Rep geometry with command sequences through pointer-based selection of edges and faces. It decomposes generation into steps conditioned on accumulating B-Rep and textual descriptions, using LLM-predicted pointers to select feature-consistent entities for operations like chamfer or fillet. This is claimed to support complex structures while reducing segmentation error and mitigating quantization-induced topological inaccuracies. A data annotation pipeline yields a dataset of ~575K models with expert natural language descriptions, and experiments are said to demonstrate significant improvements over prior command-sequence methods.

Significance. If the pointer mechanism proves reliable, the work could meaningfully advance LLM-driven CAD by enabling entity-aware editing and lowering topological errors from discretization, addressing key practical limitations in sequential representations.

major comments (2)

[Abstract] The central claim that Pointer-CAD 'reduces segmentation error to an extremely low level' and achieves 'significant improvement' over prior methods is unsupported by any numerical metrics, baseline comparisons, or ablation results; without these, the asserted mitigation of quantization-induced topological inaccuracies cannot be evaluated.
[Abstract / Method description] The pointer prediction step assumes reliable selection of the 'most feature-consistent candidate' from available B-Rep entities, yet no pointer-level accuracy metrics, analysis of ambiguous cases (geometrically similar edges/faces), or isolation of selection errors from command errors are provided; a single mis-predicted pointer would propagate invalid topology through the accumulating B-Rep conditioning, directly undermining the headline improvement.

minor comments (1)

[Abstract] The dataset size is stated as 'approximately 575K'; an exact count and split details would aid reproducibility.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and detailed feedback. We address the major comments point by point below, indicating where revisions will be made to strengthen the manuscript.


Referee: [Abstract] The central claim that Pointer-CAD 'reduces segmentation error to an extremely low level' and achieves 'significant improvement' over prior methods is unsupported by any numerical metrics, baseline comparisons, or ablation results; without these, the asserted mitigation of quantization-induced topological inaccuracies cannot be evaluated.

Authors: The body of the manuscript (Section 4) contains quantitative experimental results, including baseline comparisons against prior command-sequence methods and ablations that demonstrate reductions in segmentation error and mitigation of topological inaccuracies due to quantization. We agree that the abstract should make these results explicit rather than qualitative. We will revise the abstract to incorporate key numerical metrics, such as the reported segmentation error rates and relative improvements over baselines. revision: yes
Referee: [Abstract / Method description] The pointer prediction step assumes reliable selection of the 'most feature-consistent candidate' from available B-Rep entities, yet no pointer-level accuracy metrics, analysis of ambiguous cases (geometrically similar edges/faces), or isolation of selection errors from command errors are provided; a single mis-predicted pointer would propagate invalid topology through the accumulating B-Rep conditioning, directly undermining the headline improvement.

Authors: We acknowledge the concern regarding error propagation from pointer mispredictions. Our current evaluation reports end-to-end model quality, which incorporates the effects of both command generation and pointer selection. The original manuscript does not include isolated pointer-level accuracy metrics or a dedicated analysis of ambiguous cases. In revision we will add a discussion of potential propagation effects and error isolation where possible using existing data, but a full quantitative breakdown of pointer accuracy on ambiguous entities would require new experiments. revision: partial

standing simulated objections not resolved

Isolated pointer-level accuracy metrics and dedicated analysis of ambiguous entity selection cases (geometrically similar edges/faces), as these were not computed in the original experiments.

Circularity Check

0 steps flagged

No significant circularity; Pointer-CAD framework is self-contained


The paper introduces a pointer-based command sequence for LLM-driven CAD generation that conditions each step on prior B-rep output plus text. No equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations appear in the derivation. Dataset construction via annotation pipeline and reported error reductions are presented as independent empirical outcomes rather than tautological re-statements of inputs. The central claims rest on the proposed pointer selection mechanism and external evaluation, without reduction to prior results by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the unverified assumption that pointer prediction will be accurate and that the annotation pipeline yields reliable training data; no free parameters or invented physical entities are stated.

axioms (1)

domain assumption: LLMs can be trained to predict pointers that correctly select intended geometric entities from B-rep models given text and prior geometry
This assumption underpins the entire pointer selection step and the claimed error reduction.

invented entities (1)

Pointer-based command sequence representation · no independent evidence
purpose: To allow explicit selection of edges and faces inside sequential CAD command generation
New representational device introduced to bridge B-rep geometry with command sequences.

pith-pipeline@v0.9.0 · 5611 in / 1286 out tokens · 78710 ms · 2026-05-15T16:23:01.196689+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Computer-Aided Design Generation by Cascaded Discrete Diffusion Model
cs.CV 2026-05 unverdicted novelty 7.0

Cascaded discrete diffusion generates CAD command sequences with absorbing transitions and parameters with Gaussian, scale-invariant, and prior-preserving kernels, outperforming autoregressive and continuous diffusion...

Reference graph

Works this paper leans on

78 extracted references · 78 canonical work pages · cited by 1 Pith paper · 7 internal anchors

[1]

GPT-4 Technical Report

Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ah- mad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report.arXiv preprint arXiv:2303.08774,

work page internal anchor Pith review Pith/arXiv arXiv
[2]

Gencad: Image- conditioned computer-aided design generation with transformer-based contrastive representation and diffusion priors.arXiv preprint arXiv:2409.16294, 2024

Md Ferdous Alam and Faez Ahmed. Gencad: Image- conditioned computer-aided design generation with transformer-based contrastive representation and diffusion priors.arXiv preprint arXiv:2409.16294, 2024. 1

work page arXiv 2024
[3]

Gen- erating cad code with vision-language models for 3d designs

Kamel Alrashedy, Pradyumna Tambwekar, Zulfiqar Zaidi, Megan Langwasser, Wei Xu, and Matthew Gombolay. Gen- erating cad code with vision-language models for 3d designs. arXiv preprint arXiv:2410.05340, 2024. 1

work page arXiv 2024
[4]

Ge- ometric modeling of solid objects by using a face adjacency graph representation.ACM SIGGRAPH Computer Graphics, 19(3):131–139, 1985

Silvia Ansaldi, Leila De Floriani, and Bianca Falcidieno. Ge- ometric modeling of solid objects by using a face adjacency graph representation.ACM SIGGRAPH Computer Graphics, 19(3):131–139, 1985. 3

work page 1985
[5]

Claude opus 4 system card, 2025

Anthropic. Claude opus 4 system card, 2025. 7

work page 2025
[6]

Qwen2.5-VL Technical Report

Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, et al. Qwen2.5-vl technical report.arXiv preprint arXiv:2502.13923, 2025. 2, 6

work page internal anchor Pith review Pith/arXiv arXiv 2025
[7]

CadQuery, 2025

CadQuery Contributors. CadQuery, 2025. 1, 3, 12

work page 2025
[8]

Computer aided detection (cad): an overview.Cancer Imaging, 5(1):17, 2005

Ronald A Castellino. Computer aided detection (cad): an overview.Cancer Imaging, 5(1):17, 2005. 1

work page 2005
[9]

Img2cad: Conditioned 3-d cad model generation from single image with structured visual geometry.IEEE Trans- actions on Industrial Informatics, 2025

Tianrun Chen, Chunan Yu, Yuanqi Hu, Jing Li, Tao Xu, Run- long Cao, Lanyun Zhu, Ying Zang, Yong Zhang, Zejian Li, et al. Img2cad: Conditioned 3-d cad model generation from single image with structured visual geometry.IEEE Trans- actions on Industrial Informatics, 2025. 3

work page 2025
[10]

Inversecsg: Automatic conversion of 3d models to csg trees.ACM Transactions on Graphics (TOG), 37(6):1–16, 2018

Tao Du, Jeevana Priya Inala, Yewen Pu, Andrew Spielberg, Adriana Schulz, Daniela Rus, Armando Solar-Lezama, and Wojciech Matusik. Inversecsg: Automatic conversion of 3d models to csg trees.ACM Transactions on Graphics (TOG), 37(6):1–16, 2018. 3

work page 2018
[11]

Transcad: A hi- erarchical transformer for cad sequence inference from point clouds

Elona Dupont, Kseniya Cherenkova, Dimitrios Mallis, Gleb Gusev, Anis Kacem, and Djamila Aouada. Transcad: A hi- erarchical transformer for cad sequence inference from point clouds. InEuropean Conference on Computer Vision, pages 19–36. Springer, 2024. 3

work page 2024
[12]

A parametric and feature-based cad dataset to support human-computer interaction for advanced 3d shape learning.Integrated Computer-Aided Engineering, 32(1):75–96, 2025

Rubin Fan, Fazhi He, Yuxin Liu, Yupeng Song, Linkun Fan, and Xiaohu Yan. A parametric and feature-based cad dataset to support human-computer interaction for advanced 3d shape learning.Integrated Computer-Aided Engineering, 32(1):75–96, 2025. 3

work page 2025
[13]

Addison-Wesley Professional, 1996

James D Foley.Computer graphics: principles and practice. Addison-Wesley Professional, 1996. 3

work page 1996
[14]

FreeCAD, 2024

FreeCAD Community. FreeCAD, 2024. 3

work page 2024
[15]

Gemini 2.5 pro, 2025

Google DeepMind. Gemini 2.5 pro, 2025. 7

work page 2025
[16]

Cadmium: Fine-tuning code language models for text-driven sequential cad design.arXiv preprint arXiv:2507.09792, 2025

Prashant Govindarajan, Davide Baldelli, Jay Pathak, Quentin Fournier, and Sarath Chandar. Cadmium: Fine-tuning code language models for text-driven sequential cad design.arXiv preprint arXiv:2507.09792, 2025. 2, 3, 4, 7, 12

work page arXiv 2025
[17]

CAD-Coder: Text-to-CAD Generation with Chain-of-Thought and Geometric Reward

Yandong Guan, Xilin Wang, Xingxi Ming, Jing Zhang, Dong Xu, and Qian Yu. Cad-coder: Text-to-cad generation with chain-of-thought and geometric reward.arXiv preprint arXiv:2505.19713, 2025. 3

work page internal anchor Pith review Pith/arXiv arXiv 2025
[18]

Complexgen: Cad reconstruction by b-rep chain complex generation.ACM Transactions on Graphics (TOG), 41(4):1–18, 2022

Haoxiang Guo, Shilin Liu, Hao Pan, Yang Liu, Xin Tong, and Baining Guo. Complexgen: Cad reconstruction by b-rep chain complex generation.ACM Transactions on Graphics (TOG), 41(4):1–18, 2022. 3

work page 2022
[19]

Lora: Low-rank adaptation of large language models.ICLR, 1(2):3, 2022

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen- Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models.ICLR, 1(2):3, 2022. 5

work page 2022
[20]

Opencoder: The open cook- book for top-tier code large language models

Siming Huang, Tianhao Cheng, Jason Klein Liu, Weidi Xu, Jiaran Hao, Liuyihan Song, Yang Xu, Jian Yang, Jiaheng Liu, Chenchen Zhang, et al. Opencoder: The open cook- book for top-tier code large language models. InACL, pages 33167–33193, 2025. 3

work page 2025
[21]

Uv-net: Learning from boundary rep- resentations

Pradeep Kumar Jayaraman, Aditya Sanghi, Joseph G Lam- bourne, Karl DD Willis, Thomas Davies, Hooman Shayani, and Nigel Morris. Uv-net: Learning from boundary rep- resentations. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11703– 11712, 2021. 4

work page 2021
[22]

Solidgen: An autoregressive model for direct b-rep synthe- sis.arXiv preprint arXiv:2203.13944, 2022

Pradeep Kumar Jayaraman, Joseph G Lambourne, Nishkrit Desai, Karl DD Willis, Aditya Sanghi, and Nigel JW Morris. Solidgen: An autoregressive model for direct b-rep synthe- sis.arXiv preprint arXiv:2203.13944, 2022. 3

work page arXiv 2022
[23]

Ucsg-net-unsupervised discovering of constructive solid ge- ometry tree.Advances in neural information processing sys- tems, 33:8776–8786, 2020

Kacper Kania, Maciej Zieba, and Tomasz Kajdanowicz. Ucsg-net-unsupervised discovering of constructive solid ge- ometry tree.Advances in neural information processing sys- tems, 33:8776–8786, 2020. 3

work page 2020
[24]

Cad-signet: Cad language inference from point clouds using layer-wise sketch instance guided attention

Mohammad Sadil Khan, Elona Dupont, Sk Aziz Ali, Kseniya Cherenkova, Anis Kacem, and Djamila Aouada. Cad-signet: Cad language inference from point clouds using layer-wise sketch instance guided attention. InProceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition, pages 4713–4722, 2024. 1, 3

work page 2024
[25]

Text2cad: Generating sequential cad designs from beginner- to-expert level text prompts.Advances in Neural Information Processing Systems, 37:7552–7579, 2024

Mohammad Sadil Khan, Sankalp Sinha, Talha Uddin, Di- dier Stricker, Sk Aziz Ali, and Muhammad Zeshan Afzal. Text2cad: Generating sequential cad designs from beginner- to-expert level text prompts.Advances in Neural Information Processing Systems, 37:7552–7579, 2024. 1, 2, 3, 4, 6, 7, 12

work page 2024
[26]

Abc: A big cad model dataset for geometric deep learning

Sebastian Koch, Albert Matveev, Zhongshi Jiang, Francis Williams, Alexey Artemov, Evgeny Burnaev, Marc Alexa, Denis Zorin, and Daniele Panozzo. Abc: A big cad model dataset for geometric deep learning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9601–9611, 2019. 3

work page 2019
[27]

cadrille: Multi-modal cad reconstruc- tion with online reinforcement learning.arXiv preprint arXiv:2505.22914, 2025

Maksim Kolodiazhnyi, Denis Tarasov, Dmitrii Zhem- chuzhnikov, Alexander Nikulin, Ilya Zisman, Anna V orontsova, Anton Konushin, Vladislav Kurenkov, and Danila Rukhovich. cadrille: Multi-modal cad reconstruc- tion with online reinforcement learning.arXiv preprint arXiv:2505.22914, 2025. 1, 3

work page arXiv 2025
[28]

Brepnet: A topological message passing system for solid models

Joseph G Lambourne, Karl DD Willis, Pradeep Kumar Jayaraman, Aditya Sanghi, Peter Meltzer, and Hooman Shayani. Brepnet: A topological message passing system for solid models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12773– 12782, 2021. 1

work page 2021
[29]

Cad-llama: leveraging large language models for computer-aided design parametric 3d model generation

Jiahao Li, Weijian Ma, Xueyang Li, Yunzhong Lou, Guichun Zhou, and Xiangdong Zhou. Cad-llama: leveraging large language models for computer-aided design parametric 3d model generation. InCVPR, pages 18563–18573, 2025. 1, 3

work page 2025
[30]

Hola: B-rep genera- tion using a holistic latent representation.ACM Transactions on Graphics (TOG), 44(4):1–25, 2025

Yilin Liu, Duoteng Xu, Xingyao Yu, Xiang Xu, Daniel Cohen-Or, Hao Zhang, and Hui Huang. Hola: B-rep genera- tion using a holistic latent representation.ACM Transactions on Graphics (TOG), 44(4):1–25, 2025. 3

work page 2025
[31]

Decoupled Weight Decay Regularization

Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization.arXiv preprint arXiv:1711.05101, 2017. 18

work page internal anchor Pith review Pith/arXiv arXiv 2017
[32]

Weijian Ma, Shuaiqi Chen, Yunzhong Lou, Xueyang Li, and Xiangdong Zhou. Draw step by step: Reconstructing cad construction sequences from point clouds via multimodal diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 27154–27163, 2024. 3
[33]

Charlie Nash, Yaroslav Ganin, SM Ali Eslami, and Peter Battaglia. Polygen: An autoregressive generative model of 3d meshes. In International conference on machine learning, pages 7220–7229. PMLR, 2020. 3
[34]

Ke Niu, Zhuofan Chen, Haiyang Yu, Yuwen Chen, Teng Fu, Mengyang Zhao, Bin Li, and Xiangyang Xue. Creft-cad: Boosting orthographic projection reasoning for cad via reinforcement fine-tuning. arXiv preprint arXiv:2506.00568,
[35]

OpenAI. Introducing gpt 5.2, 2025. 7
[36]

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021. 18
[37]

Martin Rapp, Hussam Amrouch, Yibo Lin, Bei Yu, David Z Pan, Marilyn Wolf, and Jörg Henkel. Mlcad: A survey of research in machine learning for cad keynote paper. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 41(10):3162–3181, 2021. 1
[38]

Daxuan Ren, Jianmin Zheng, Jianfei Cai, Jiatong Li, Haiyong Jiang, Zhongang Cai, Junzhe Zhang, Liang Pan, Mingyuan Zhang, Haiyu Zhao, et al. Csg-stump: A learning friendly csg-like representation for interpretable shape parsing. In Proceedings of the IEEE/CVF international conference on computer vision, pages 12478–12487, 2021. 3
[39]

Danila Rukhovich, Elona Dupont, Dimitrios Mallis, Kseniya Cherenkova, Anis Kacem, and Djamila Aouada. Cad-recode: Reverse engineering cad code from point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9801–9811, 2025. 3
[40]

Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE transactions on neural networks, 20(1):61–80, 2008. 2, 4
[41]

Gopal Sharma, Rishabh Goyal, Difan Liu, Evangelos Kalogerakis, and Subhransu Maji. Csgnet: Neural shape parser for constructive solid geometry. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5515–5523, 2018. 3
[42]

Qwen Team. Qwen2 technical report. arXiv preprint arXiv:2407.10671, 2024. 5
[43]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017. 5
[44]

Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. Pointer networks. Advances in neural information processing systems, 28, 2015. 2, 3
[45]

Ruiyu Wang, Yu Yuan, Shizhao Sun, and Jiang Bian. Text-to-cad generation through infusing visual feedback in large language models. In ICML, 2025. 3, 12
[46]

Siyu Wang, Cailian Chen, Xinyi Le, Qimin Xu, Lei Xu, Yanzhou Zhang, and Jie Yang. Cad-gpt: Synthesising cad construction sequence with spatial reasoning-enhanced multimodal llms. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 7880–7888, 2025. 1, 3
[47]

Karl DD Willis, Yewen Pu, Jieliang Luo, Hang Chu, Tao Du, Joseph G Lambourne, Armando Solar-Lezama, and Wojciech Matusik. Fusion 360 gallery: A dataset and environment for programmatic cad construction from human design sequences. ACM Transactions on Graphics (TOG), 40(4):1–24, 2021. 3
[48]

Jianyu Wu, Yizhou Wang, Xiangyu Yue, Xinzhu Ma, Jingyang Guo, Dongzhan Zhou, Wanli Ouyang, and Shixiang Tang. Cmt: A cascade mar with topology predictor for multimodal conditional cad generation. arXiv preprint arXiv:2504.20830, 2025. 1, 3
[49]

Rundi Wu, Chang Xiao, and Changxi Zheng. Deepcad: A deep generative network for computer-aided design models. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 6772–6782, 2021. 1, 2, 3, 4, 7
[50]

Zhirong Wu, Yuanjun Xiong, Stella X Yu, and Dahua Lin. Unsupervised feature learning via non-parametric instance discrimination. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3733–3742,
[51]

Haoyang Xie and Feng Ju. Text-to-cadquery: A new paradigm for cad generation with scalable large model capabilities. arXiv preprint arXiv:2505.06507, 2025. 1, 12
[52]

Jingwei Xu, Chenyu Wang, Zibo Zhao, Wen Liu, Yi Ma, and Shenghua Gao. Cad-mllm: Unifying multimodality-conditioned cad generation with mllm. arXiv preprint arXiv:2411.04954, 2024. 1, 2, 3, 6, 7, 12
[53]

Xiang Xu, Karl DD Willis, Joseph G Lambourne, Chin-Yi Cheng, Pradeep Kumar Jayaraman, and Yasutaka Furukawa. Skexgen: Autoregressive generation of cad construction sequences with disentangled codebooks. arXiv preprint arXiv:2207.04632, 2022. 1, 3, 12
[54]

Xiang Xu, Joseph Lambourne, Pradeep Jayaraman, Zhengqing Wang, Karl Willis, and Yasutaka Furukawa. Brepgen: A b-rep generative diffusion model with structured latent geometry. ACM Transactions on Graphics (TOG), 43(4):1–14, 2024. 1, 3
[55]

An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025. 1, 7
[56]

Qwen2.5 Technical Report

Qwen An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxin Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin,...

[57]

Xiaolong Yin, Xingyu Lu, Jiahang Shen, Jingzhe Ni, Hailong Li, Ruofeng Tong, Min Tang, and Peng Du. Rl-cad: Reinforcement learning training gym for revolution involved cad command sequence generation. arXiv preprint arXiv:2503.18549, 2025. 4
[58]

Yang You, Mikaela Angelina Uy, Jiaqi Han, Rahul Thomas, Haotong Zhang, Suya You, and Leonidas Guibas. Img2cad: Reverse engineering 3d cad models from images through vlm-assisted conditional factorization. arXiv preprint arXiv:2408.01437, 2024. 1
[59]

Fenggen Yu, Zhiqin Chen, Manyi Li, Aditya Sanghi, Hooman Shayani, Ali Mahdavi-Amiri, and Hao Zhang. Capri-net: Learning compact cad shapes with adaptive primitive assembly. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11768–11778, 2022. 3
[60]

Fenggen Yu, Qimin Chen, Maham Tanveer, Ali Mahdavi Amiri, and Hao Zhang. D²csg: Unsupervised learning of compact csg trees with dual complements and dropouts. Advances in Neural Information Processing Systems, 36:22807–22819, 2023. 3
[61]

Nomi Yu, Md Ferdous Alam, A John Hart, and Faez Ahmed. Gencad-three-dimensional: Computer-aided design program generation using multimodal latent space alignment and synthetic dataset balancing. JMD, 148(3):031703, 2026. 3
[62]

Zhe Yuan, Jianqi Shi, and Yanhong Huang. Openecad: An efficient visual language model for editable 3d-cad design. Computers & Graphics, 124:104048, 2024. 3
[63]

Aijia Zhang, Weiqiang Jia, Qiang Zou, Yixiong Feng, Xiaoxiang Wei, and Ye Zhang. Diffusion-cad: Controllable diffusion model for generating computer-aided design models. IEEE Transactions on Visualization and Computer Graphics, 2025. 3
[64]

Zhanwei Zhang, Shizhao Sun, Wenxiao Wang, Deng Cai, and Jiang Bian. Flexcad: Unified and versatile controllable cad generation with fine-tuned large language models. In ICLR, 2025. 3
[65]

Qinkai Zheng, Xiao Xia, Xu Zou, Yuxiao Dong, Shan Wang, Yufei Xue, Zihan Wang, Lei Shen, Andi Wang, Yang Li, Teng Su, Zhilin Yang, and Jie Tang. Codegeex: A pre-trained model for code generation with multilingual benchmarking on humaneval-x. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 5673–5684, 2023. 3...
[66]

Additional Evaluations and Analytical Discussions

9.1. Comprehensive Metrics for Text-to-CAD Evaluation

To provide a more complete assessment of model performance, we further report the Invalidity Ratio (IR), Dangling Edge Length (DangEL), and Self-Intersection Ratio (SIR) following previous works [16, 52]. IR measures generation robustness, while...
[67]

Details of the Pointer-based Representation

This section elaborates on the implementation logic of the pointer-based representation and the methodology for

Table 11. Ablation results on the Recap-DeepCAD-Norm dataset. All baseline methods show a substantial drop in IR, indicating that they depend on memorizing dataset-specific dimensional patterns rathe...
[68]

Details of the training framework.

11.1. B-rep encoder

For each B-rep edge, we uniformly sample 32 points along its parametric curve in 3D space and extract four quantities at each location: point coordinates, tangent and its reverse vector, and first-order derivatives. Each is represented as a 3D vector, and their concatenation yields a 12-dimensional fe...
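The per-edge sampling described above can be sketched as follows. This is a minimal illustration, not the paper's encoder: it assumes the curve is parameterized over [0, 1], that "tangent" means the unit-normalized derivative, and that central finite differences are an acceptable stand-in for the analytic derivatives a CAD kernel would provide. The names `edge_features` and `quarter_circle` are hypothetical.

```python
import numpy as np


def edge_features(curve, n_samples=32, eps=1e-4):
    """Return an (n_samples, 12) array of per-point features for one edge.

    `curve` maps an array of parameters t (assumed range [0, 1]) to 3D
    points of shape (len(t), 3).
    """
    t = np.linspace(0.0, 1.0, n_samples)
    points = curve(t)
    # Central finite difference approximating the first-order derivative dC/dt
    # (assumption: the curve can be evaluated slightly outside [0, 1]).
    deriv = (curve(t + eps) - curve(t - eps)) / (2.0 * eps)
    # Unit tangent, guarded against zero-length derivatives.
    norm = np.linalg.norm(deriv, axis=1, keepdims=True)
    tangent = deriv / np.maximum(norm, 1e-12)
    reverse_tangent = -tangent
    # Four 3D quantities concatenated into a 12-dimensional feature per point.
    return np.concatenate([points, tangent, reverse_tangent, deriv], axis=1)


def quarter_circle(t):
    """Hypothetical example edge: quarter circle of radius 2 in the XY plane."""
    angle = 0.5 * np.pi * np.asarray(t, dtype=float)
    return np.stack(
        [2.0 * np.cos(angle), 2.0 * np.sin(angle), np.zeros_like(angle)], axis=1
    )


features = edge_features(quarter_circle)
```

Including both the tangent and its reverse presumably lets the feature describe the edge independently of traversal direction, although the excerpt does not state the motivation.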
[69]

Details of the Dataset

12.1. Dataset Visualization

Figure 12 presents several representative samples from the Recap-OmniCAD+ dataset, showcasing a wide spectrum of model complexity and diversity. As illustrated, our dataset contains a rich variety of models that not only feature complex geometric details such as fillets and chamfers but also exhibit div...
[70]

Implementation Details

For the default 0.5B model setting, the entire training process requires approximately 23 hours on 16 NVIDIA H800 GPUs. We use the AdamW optimizer [31] with a learning rate of $1 \times 10^{-4}$ and a linear decay schedule. For LoRA, the dropout rate is set to 0.1. We use a micro-batch size of 9 with 2 gradient accumulation steps per GPU. The ...
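The batch arithmetic and schedule implied by these settings can be sketched as below. The sketch assumes that all 16 GPUs act as data-parallel replicas, that there is no warmup phase, and that the learning rate decays linearly to zero. The excerpt confirms none of these, and `total_steps` is a hypothetical value chosen only for illustration.

```python
def linear_decay_lr(step, total_steps, base_lr=1e-4):
    """Learning rate decayed linearly from base_lr (step 0) to 0 (total_steps)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    step = min(max(step, 0), total_steps)  # clamp to the valid range
    return base_lr * (1.0 - step / total_steps)


def effective_batch_size(micro_batch, grad_accum_steps, num_gpus):
    """Samples contributing to one optimizer update under pure data parallelism."""
    return micro_batch * grad_accum_steps * num_gpus


# Values stated in the text: micro-batch 9, 2 accumulation steps, 16 GPUs.
global_batch = effective_batch_size(micro_batch=9, grad_accum_steps=2, num_gpus=16)

# Hypothetical schedule length, used only to show the shape of the decay.
total_steps = 1000
lr_start = linear_decay_lr(0, total_steps)
lr_mid = linear_decay_lr(500, total_steps)
lr_end = linear_decay_lr(total_steps, total_steps)
```

Under these assumptions each optimizer step sees 9 × 2 × 16 = 288 samples.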
[71]

Generate a one-word name for the object, enclosed in <name></name>

[72]

Write a clear and concise one-sentence caption for the object, enclosed in <caption></caption>, summarizing its overall shape and key structural features. Focus on geometric form, symmetry, major extrusions or cutouts, and distinctive elements. Avoid interpretation or unnecessary detail.

Figure 15. Prompt for visual description. This prompt is used with the...
[73]

parts": {

Bottom view A red planar surface on a light blue object is highlighted. Red planar surface normal vector (numeric): (nx, ny, nz) = (REPLACE_NX, REPLACE_NY, REPLACE_NZ) Authoritative facing-direction hints (precomputed; use exactly as given): - current X direction: REPLACE_X_DIR (one of: right / left / none) - current Y direction: REPLACE_Y_DIR (one of: ba...

[74]

One or more 2D sketches. Each sketch may contain lines, arcs, and circles: (i) A line is defined by a start point and an end point; (ii) An arc is defined by a center point, a start point, a sweep angle, and a direction; (iii) A circle is defined by a center point and a radius

[75]

A coordinate system that positions the sketch in 3D space using: (i) A sketch plane (Top, Right, or Front), which defines the basis for the coordinate system; (ii) Rotation angles following Z-Y-X order (first rotate around Z, then Y, then X); (iii) Translation along the x, y, and z directions; (iv) Optionally, a description field may be present, giving a ...

[76]

An extrusion operation applied to the sketch or sketches. It includes: (i) extrude operation type, which defines how the extrusion modifies geometry. This may involve adding material, cutting, intersecting, or creating new bodies or components; (ii) extrude extent mode, which defines how far and in which direction the sketch is extruded. Interpret and exp...
[77]

A fillet operation, defined by: (i) fillet radius, which specifies the radius of the fillet; (ii) fillet tangent chain, which indicates whether the fillet continues smoothly along tangent edges; (iii) fillet edges, which specifies the edges to which the fillet is applied

[78]


A chamfer operation, defined by: (i) chamfer distance, which specifies the distance of the chamfer; (ii) chamfer tangent chain, which indicates whether the chamfer continues smoothly along tangent edges; (iii) chamfer edges, which specifies the edges to which the chamfer is applied. Instructions must follow these rules: 1. Explicitly mention each sketch, ...
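The fields listed in the prompt above (sketch primitives, coordinate system, fillet and chamfer parameters) can be pictured as a small data schema. The sketch below is illustrative only: every class and field name is hypothetical, angles are assumed to be in degrees, the arc "direction" is assumed to be a clockwise flag, edges are assumed to be integer indices, and the Z-Y-X order is read as rotations about fixed axes (Z applied first, then Y, then X). The excerpt does not settle any of these conventions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

import numpy as np

Point2D = Tuple[float, float]


@dataclass
class Line:
    start: Point2D
    end: Point2D


@dataclass
class Arc:
    center: Point2D
    start: Point2D
    sweep_angle_deg: float  # assumed unit: degrees
    clockwise: bool         # assumed encoding of "direction"


@dataclass
class Circle:
    center: Point2D
    radius: float


@dataclass
class CoordinateSystem:
    plane: str                                    # "Top", "Right", or "Front"
    rotation_zyx_deg: Tuple[float, float, float]  # angles about Z, Y, X
    translation: Tuple[float, float, float]

    def rotation_matrix(self) -> np.ndarray:
        """Compose rotations about fixed axes: Z first, then Y, then X."""
        az, ay, ax = np.radians(self.rotation_zyx_deg)
        rz = np.array([[np.cos(az), -np.sin(az), 0.0],
                       [np.sin(az), np.cos(az), 0.0],
                       [0.0, 0.0, 1.0]])
        ry = np.array([[np.cos(ay), 0.0, np.sin(ay)],
                       [0.0, 1.0, 0.0],
                       [-np.sin(ay), 0.0, np.cos(ay)]])
        rx = np.array([[1.0, 0.0, 0.0],
                       [0.0, np.cos(ax), -np.sin(ax)],
                       [0.0, np.sin(ax), np.cos(ax)]])
        # For column vectors, the first rotation applied is the rightmost factor.
        return rx @ ry @ rz


@dataclass
class EdgeOperation:
    kind: str            # "fillet" or "chamfer"
    size: float          # fillet radius or chamfer distance
    tangent_chain: bool  # whether the operation propagates along tangent edges
    edges: List[int] = field(default_factory=list)  # hypothetical edge indices


frame = CoordinateSystem("Top", (90.0, 0.0, 90.0), (0.0, 0.0, 0.0))
x_axis_rotated = frame.rotation_matrix() @ np.array([1.0, 0.0, 0.0])
fillet = EdgeOperation(kind="fillet", size=0.5, tangent_chain=True, edges=[3, 7])
```

With a 90° rotation about Z followed by 90° about X, the X axis maps to the Z axis under the fixed-axis reading; an intrinsic (moving-axis) reading of the same order would reverse the matrix product.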
