Text2CAD-Bench: A Benchmark for LLM-based Text-to-Parametric CAD Generation

Heng Meng; Jin Liu; Liang Wang; Litao Chen; Pingyi Zhou; Yongqiang Tang; Zekai Xiang

arxiv: 2605.18430 · v1 · pith:CNOEDABSnew · submitted 2026-05-18 · 💻 cs.LG

Liang Wang, Heng Meng, Zekai Xiang, Jin Liu, Pingyi Zhou, Litao Chen, Yongqiang Tang

Pith reviewed 2026-05-20 12:44 UTC · model grok-4.3

classification 💻 cs.LG

keywords: text-to-CAD · parametric CAD · LLM benchmark · geometric complexity · CAD generation · natural language modeling · topology evaluation

The pith

Text2CAD-Bench shows current LLMs handle basic CAD geometry but degrade sharply on complex topology and advanced features.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Text2CAD-Bench, a new evaluation resource for turning natural language descriptions into parametric CAD models with large language models. It claims prior benchmarks stayed limited to simple primitives and mechanical parts, leaving out the harder cases that matter for actual design work. The new set holds 600 human-curated examples arranged in four levels of rising difficulty, each supplied with both casual geometric wording and expert-style procedural instructions. Tests on mainstream and specialized models confirm reasonable results at the basic levels yet clear drops once topology grows complex or features become advanced. This setup matters because reliable text-to-CAD conversion could shorten prototyping cycles and open modeling to users without CAD expertise.

Core claim

Text2CAD-Bench is the first benchmark that systematically tests text-to-parametric CAD generation across geometric complexity and application domains. It supplies 600 examples divided into L1-L2 for standard features, L3 for complex topology and freeform surfaces, and L4 for real-world uses outside mechanical parts, each paired with dual-style prompts. Evaluation of general and domain-specific LLMs finds solid performance on basic geometry that falls substantially when models must manage advanced topology or non-standard domains.

What carries the argument

A four-level hierarchy of 600 examples, each carrying both geometric and procedural prompts, carries the argument: it measures how model accuracy changes as topological and domain complexity rise.
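One way to picture this structure is the minimal Python sketch below: each example carries a level tag and two prompts, and scores are averaged per (level, prompt style) pair so that any degradation across levels becomes visible. The class names, field names, and the single scalar `score` are assumptions made for illustration; the released data format is not described on this page.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class BenchmarkExample:
    """Hypothetical record for one benchmark item (field names are assumed)."""
    example_id: str
    level: str             # one of "L1", "L2", "L3", "L4"
    geometric_prompt: str  # casual, non-expert wording
    sequence_prompt: str   # expert-style procedural wording


@dataclass(frozen=True)
class Result:
    """Hypothetical per-run outcome for one example under one prompt style."""
    example_id: str
    prompt_style: str      # "geometric" or "sequence"
    score: float           # any scalar metric, e.g. IoU in [0, 1]


def mean_score_by_level(examples, results):
    """Average the score for every (level, prompt_style) pair."""
    level_of = {ex.example_id: ex.level for ex in examples}
    buckets = defaultdict(list)
    for r in results:
        buckets[(level_of[r.example_id], r.prompt_style)].append(r.score)
    return {key: mean(values) for key, values in buckets.items()}
```

Keeping the prompt style in the grouping key is what lets the two prompt styles be compared separately at each level.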

If this is right

Targeted model improvements are required for complex topology and freeform surface handling before text-to-CAD can support realistic design tasks.
Expansion into L4-style non-mechanical domains becomes feasible only after the observed performance gaps close.
Dual prompt styles enable separate measurement of how well models interpret casual user language versus precise procedural instructions.
Public release of the benchmark supplies a shared testbed that can accelerate comparison and progress across text-to-CAD methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If models close the gap on L3 and L4, text-driven CAD could shorten iteration loops in product development by letting engineers describe changes in words rather than redraw sketches.
The benchmark structure could be reused to create parallel tests for text-to-3D or text-to-manufacturing pipelines that face similar topology challenges.
Fine-tuning on procedural prompt sequences might produce measurable gains on freeform cases, offering a concrete next experiment.

Load-bearing premise

The 600 human-curated examples and their four-level division accurately represent the distribution of challenges encountered in practical text-to-parametric CAD workflows.

What would settle it

A new collection of 200 industry-sourced CAD files, drawn independently of the original curation process, on which the same models show no performance drop when complexity increases would falsify the claim that the benchmark captures representative practical difficulty.

Figures

Figures reproduced from arXiv: 2605.18430 by Heng Meng, Jin Liu, Liang Wang, Litao Chen, Pingyi Zhou, Yongqiang Tang, Zekai Xiang.

**Figure 1.** Overview of the Text2CAD-Bench construction pipeline. Our human-in-the-loop process comprises four stages: (1) Design Specification defines target geometry and complexity levels; (2) Collaborative Authoring combines human expertise with AI assistance to develop validated CadQuery code; (3) Description Generation creates dual-style prompts (geometric and sequence) for each model; (4) Quality Assurance ensur…

**Figure 2.** Chamfer Distance (bars, left axis) and IoU (connected markers, right axis) across benchmark levels for general-purpose LLMs under (a) geometric and (b) sequence prompts. Within each model, the IoU line connects L1, L2, and L3 results, visualizing the degradation trend. All models exhibit rising CD and declining IoU from L1 to L3, with Qwen3-max showing the steepest degradation under geometric prompts.

**Figure 3.** Qualitative comparison of generated CAD models on L1–L2 examples under geometric (top) and sequence (bottom) prompts. Red crosses indicate execution failures.

**Figure 4.** Qualitative results on L3 under geometric prompts.

**Figure 5.** Qualitative results on L4 (Real-world) examples across diverse application domains. Ground-truth models are not shown as L4 evaluation relies on VLM-based scoring rather than geometric comparison.
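Figure 2 reports Chamfer Distance and IoU. The sketch below shows one common way to compute both, under stated assumptions: Chamfer Distance as the sum of the two directed mean squared nearest-neighbour distances between sampled point sets, and IoU over sets of occupied voxel indices. The paper's exact variants (squared or not, sampling density, voxel resolution) are not given on this page, so the function names and conventions here are illustrative only.

```python
def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer Distance between two non-empty point sets.

    Assumed convention: mean squared nearest-neighbour distance,
    summed over both directions. Brute force, O(len(a) * len(b)).
    """
    def squared_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_way(src, dst):
        return sum(min(squared_dist(s, d) for d in dst) for s in src) / len(src)

    return one_way(points_a, points_b) + one_way(points_b, points_a)


def voxel_iou(occupied_a, occupied_b):
    """Intersection over Union of two shapes given as occupied voxel indices."""
    a, b = set(occupied_a), set(occupied_b)
    union = a | b
    if not union:
        # Assumed convention: two empty shapes count as identical.
        return 1.0
    return len(a & b) / len(union)
```

Lower Chamfer Distance and higher IoU both indicate a closer match, which is why the figure reads degradation as rising CD together with falling IoU.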

Original abstract

Text-to-CAD generation aims to create parametric CAD models from natural language, enabling rapid prototyping and intuitive design workflows. However, existing benchmarks focus on basic primitives and simple sketch-extrude sequences, lacking advanced features essential for real-world applications and covering only traditional mechanical parts. We introduce Text2CAD-Bench, the first benchmark systematically evaluating text-to-CAD across geometric complexity and application diversity. Our benchmark comprises 600 human-curated examples spanning four levels: L1-L2 cover fundamental geometry with standard features, L3 introduces complex topology and freeform surfaces, and L4 extends to real-world domains beyond mechanical parts. Each example pairs dual-style prompts -- geometric descriptions mimicking non-expert users, and procedural sequences aligned with expert-level conventions. Evaluating mainstream general LLMs and domain-specific models, we find that current models perform reasonably on basic geometry but degrade substantially on complex topology and advanced features. We release our benchmark to drive progress in text-to-CAD research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper adds a benchmark with harder CAD cases and dual prompts, but the performance drop on complex topology needs checking against prompt ambiguity.

The main point is that Text2CAD-Bench gives the first systematic set of 600 examples that go beyond basic primitives to include complex topology, freeform surfaces, and non-mechanical parts, with both geometric and procedural prompt styles for each. That setup lets them show current models handle L1-L2 reasonably but drop on L3-L4 features.

The dual prompts are a practical addition that tries to match real user types. They evaluate a range of general and domain LLMs, which produces concrete numbers on where the gaps appear. The release of the benchmark itself is the clearest contribution here. It could give researchers a shared starting point for testing text-to-parametric generation. The curation covers four levels with human input, which is more than prior work referenced in the abstract.

On the soft side, the description leaves out how the examples were selected, how many people reviewed them, or any measure of agreement. Without that, it's hard to judge whether the 600 cases truly reflect practical distributions or if some prompts simply leave too many valid CAD sequences open. The stress-test concern lands: for freeform and non-manifold cases, a natural-language description often under-constrains the exact extrude order or surface parameters, so part of the observed degradation could trace to that ambiguity rather than model limits alone.

This paper is for people already working on LLM-driven CAD tools or generative design benchmarks. Someone building or evaluating models in that niche would find the new coverage and prompt variants useful to try. It deserves a serious referee because the artifact is new and the evaluation direction is relevant, even if the methods section will need more detail on construction and controls for prompt specificity. I would send it to review rather than desk reject.

Referee Report

3 major / 1 minor

Summary. The paper introduces Text2CAD-Bench, the first benchmark for evaluating LLM-based text-to-parametric CAD generation. It consists of 600 human-curated examples spanning four complexity levels (L1-L2: basic geometry and standard features; L3: complex topology and freeform surfaces; L4: real-world domains beyond mechanical parts), each with dual-style prompts (geometric descriptions for non-experts and procedural sequences for experts). Evaluations of general and domain-specific LLMs show reasonable performance on basic geometry but substantial degradation on complex topology and advanced features.

Significance. If the curation and evaluation methodology prove robust, the benchmark would fill a clear gap in existing CAD evaluation resources by incorporating advanced features and application diversity. The open release of the dataset would support reproducible progress in text-to-CAD research and help identify specific model limitations for complex parametric sequences.

major comments (3)

[Benchmark Construction] Benchmark Construction section: the curation process for the 600 examples provides no details on selection criteria, inter-rater reliability among human curators, or validation steps, which directly affects the validity of the four-level division and the claim that L3-L4 accurately capture practical challenges.
[Evaluation] Evaluation section: no statistical significance tests, confidence intervals, or variance measures are reported for the performance differences across levels, leaving the central claim of 'substantial degradation' on complex topology difficult to assess quantitatively.
[Prompt Design] Prompt Design and L3-L4 examples: geometric natural-language prompts for freeform surfaces and non-manifold topology frequently under-constrain the target parametric sequence (multiple extrude orders or surface parameterizations can satisfy the same description), which risks confounding prompt ambiguity with intrinsic model limitations in the reported degradation.
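The under-constraint raised in the third comment can be made concrete with a toy sketch. Below, two different operation sequences (one add-then-cut, one add-then-add) realize the same L-shaped solid on a unit voxel grid, so a single description of the final shape is compatible with both. The `box` and `run` helpers, the `"add"`/`"cut"` operation names, and the dimensions are all hypothetical and stand in for real sketch-and-extrude operations.

```python
from itertools import product


def box(x0, x1, y0, y1, z0, z1):
    """Unit voxels filling the half-open box [x0,x1) x [y0,y1) x [z0,z1)."""
    return set(product(range(x0, x1), range(y0, y1), range(z0, z1)))


def run(sequence):
    """Apply a list of ("add" | "cut", voxels) operations in order."""
    solid = set()
    for op, voxels in sequence:
        if op == "add":
            solid |= voxels
        elif op == "cut":
            solid -= voxels
        else:
            raise ValueError(f"unknown operation: {op}")
    return solid


# Sequence A: extrude a 4x4 plate, then cut away a 2x2 corner.
seq_a = [("add", box(0, 4, 0, 4, 0, 1)), ("cut", box(2, 4, 2, 4, 0, 1))]

# Sequence B: extrude a 4x2 bar, then add a 2x2 block beside it.
seq_b = [("add", box(0, 4, 0, 2, 0, 1)), ("add", box(0, 2, 2, 4, 0, 1))]

same_geometry = run(seq_a) == run(seq_b)  # the two solids coincide
same_sequence = seq_a == seq_b            # the procedures do not
```

A geometry-level comparison cannot distinguish the two procedures, while a sequence-level comparison would treat one of them as wrong; which of these applies depends on how outputs are scored.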

minor comments (1)

[Abstract] The abstract would benefit from including at least one concrete metric (e.g., success rate or average edit distance) to quantify the 'reasonable' vs. 'substantial degradation' findings.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments. We address each major comment point by point below, indicating where revisions have been made to the manuscript.

Referee: [Benchmark Construction] Benchmark Construction section: the curation process for the 600 examples provides no details on selection criteria, inter-rater reliability among human curators, or validation steps, which directly affects the validity of the four-level division and the claim that L3-L4 accurately capture practical challenges.

Authors: We agree that additional details on the curation process would improve transparency and strengthen claims about the benchmark's validity. In the revised manuscript we have expanded the Benchmark Construction section with explicit selection criteria (coverage of geometric primitives, topological complexity metrics, and domain diversity), a description of the multi-expert curation workflow, and validation steps including expert cross-review for level assignment. Formal inter-rater reliability metrics were not computed during the original curation; we therefore describe the consensus process used instead of reporting statistics that were not collected.

revision: partial

Referee: [Evaluation] Evaluation section: no statistical significance tests, confidence intervals, or variance measures are reported for the performance differences across levels, leaving the central claim of 'substantial degradation' on complex topology difficult to assess quantitatively.

Authors: We concur that quantitative statistical support would make the degradation claim more robust. The revised manuscript now includes bootstrap-derived 95% confidence intervals on success rates per level and paired statistical tests (McNemar’s test) comparing performance across complexity levels. These results are reported in the Evaluation section and supporting figures.

revision: yes

Referee: [Prompt Design] Prompt Design and L3-L4 examples: geometric natural-language prompts for freeform surfaces and non-manifold topology frequently under-constrain the target parametric sequence (multiple extrude orders or surface parameterizations can satisfy the same description), which risks confounding prompt ambiguity with intrinsic model limitations in the reported degradation.

Authors: We recognize that natural-language prompts for complex topology can admit multiple valid parametric realizations. The dual-prompt design (geometric description paired with procedural sequence) was intended to reduce this ambiguity, and the observed performance drop is consistent across both prompt styles. In the revision we have added a dedicated discussion of prompt ambiguity, its potential confounding role, and the steps taken during curation to constrain prompts. We also include additional constrained prompt examples in the appendix.

revision: partial
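The statistics named in the second response can be sketched in a few lines of standard-library Python. The block below computes a percentile bootstrap confidence interval for a success rate and an exact two-sided McNemar test on paired binary outcomes. Function names, the resample count, and the fixed seed are assumptions. McNemar's test needs paired observations, so the sketch pairs outcomes by example (for instance, the same examples under two prompt styles or two models); how the rebuttal pairs results across levels is not stated.

```python
import random
from math import comb


def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the success rate of 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample with replacement and record the success rate each time.
    rates = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lower = rates[int((alpha / 2) * n_resamples)]
    upper = rates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def mcnemar_exact(outcomes_a, outcomes_b):
    """Exact two-sided McNemar p-value for paired 0/1 outcomes."""
    if len(outcomes_a) != len(outcomes_b):
        raise ValueError("paired outcomes must have equal length")
    # Only discordant pairs carry information about a difference.
    b = sum(1 for x, y in zip(outcomes_a, outcomes_b) if x and not y)
    c = sum(1 for x, y in zip(outcomes_a, outcomes_b) if y and not x)
    n = b + c
    if n == 0:
        return 1.0
    # Binomial(n, 0.5) tail at the smaller discordant count, doubled.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

The exact binomial form is used here because discordant counts on a 600-example benchmark split into levels can be small, where the chi-square approximation is less reliable.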

Circularity Check

0 steps flagged

No circularity: benchmark introduces independent evaluation data

The paper introduces Text2CAD-Bench as a new human-curated dataset of 600 examples across four complexity levels and evaluates LLMs directly on it. No equations, fitted parameters, or derivations are present that reduce reported performance findings to self-referential inputs or prior self-citations by construction. The central claims rest on external benchmark creation and model testing rather than any closed loop of prediction equaling input.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Benchmark construction rests on the assumption that human curation yields representative and high-quality examples; no free parameters or invented entities are introduced.

axioms (1)

domain assumption: Human-curated examples across four defined levels capture the essential challenges of real-world text-to-CAD tasks.
The performance claims depend on the representativeness of the 600 examples and the validity of the L1-L4 progression.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "Our benchmark comprises 600 human-curated examples spanning four levels: L1-L2 cover fundamental geometry with standard features, L3 introduces complex topology and freeform surfaces, and L4 extends to real-world domains beyond mechanical parts."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "We adopt three complementary metrics... Chamfer Distance (CD)... Invalidity Rate (IR)... Intersection over Union (IoU)."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

Hierarchical Neural Coding for Controllable

Xu, Xiang and Jayaraman, Pradeep Kumar and Lambourne, Joseph G and Willis, Karl. Hierarchical Neural Coding for Controllable. International Conference on Machine Learning (

work page

[2] [2]

Wu, Rundi and Xiao, Chang and Zheng, Changxi , urldate =. 2021. doi:10.1109/ICCV48922.2021.00670 , shorttitle =

work page doi:10.1109/iccv48922.2021.00670 2021

[3] [3]

Xu, Xiang and Willis, Karl D. D. and Lambourne, Joseph G. and Cheng, Chin-Yi and Jayaraman, Pradeep Kumar and Furukawa, Yasutaka , urldate =. doi:10.48550/ARXIV.2207.04632 , shorttitle =

work page doi:10.48550/arxiv.2207.04632

[4] [4]

and Desai, Nishkrit and Willis, Karl D

Jayaraman, Pradeep Kumar and Lambourne, Joseph G. and Desai, Nishkrit and Willis, Karl D. D. and Sanghi, Aditya and Morris, Nigel J. W. , urldate =. doi:10.48550/ARXIV.2203.13944 , shorttitle =

work page doi:10.48550/arxiv.2203.13944

[5] [5]

Text2CAD: Generating Sequential

Khan, Mohammad Sadil and Sinha, Sankalp and Sheikh, Talha Uddin and Stricker, Didier and Ali, Sk Aziz and Afzal, Muhammad Zeshan , urldate =. Text2CAD: Generating Sequential. doi:10.48550/ARXIV.2409.17106 , shorttitle =

work page doi:10.48550/arxiv.2409.17106

[6] [6]

Text-to-CadQuery: A New Paradigm for CADgenerationwithscalablelargemodelcapabilities

Xie, Haoyang and Ju, Feng , urldate =. Text-to-. doi:10.48550/ARXIV.2505.06507 , shorttitle =

work page doi:10.48550/arxiv.2505.06507

[7] [7]

CAD-Recode: Reverse engineering CAD code from point clouds.arXiv preprint arXiv:2412.14042, 2024

Rukhovich, Danila and Dupont, Elona and Mallis, Dimitrios and Cherenkova, Kseniya and Kacem, Anis and Aouada, Djamila , urldate =. doi:10.48550/ARXIV.2412.14042 , shorttitle =

work page doi:10.48550/arxiv.2412.14042

[8] [8]

Koch, Sebastian and Matveev, Albert and Jiang, Zhongshi and Williams, Francis and Artemov, Alexey and Burnaev, Evgeny and Alexa, Marc and Zorin, Denis and Panozzo, Daniele , urldate =. 2019. doi:10.1109/CVPR.2019.00983 , shorttitle =

work page doi:10.1109/cvpr.2019.00983 2019

[9] [9]

Willis, Karl D. D. and Pu, Yewen and Luo, Jieliang and Chu, Hang and Du, Tao and Lambourne, Joseph G. and Solar-Lezama, Armando and Matusik, Wojciech , urldate =. Fusion 360 gallery: a dataset and environment for programmatic. doi:10.1145/3450626.3459818 , shorttitle =

work page doi:10.1145/3450626.3459818

[10] [10]

2026 , url =

Claude , version =. 2026 , url =

work page 2026

[11] [11]

doi:10.48550/ARXIV.2106.02711 , shorttitle =

Para, Wamiq Reyaz and Bhat, Shariq Farooq and Guerrero, Paul and Kelly, Tom and Mitra, Niloy and Guibas, Leonidas and Wonka, Peter , urldate =. doi:10.48550/ARXIV.2106.02711 , shorttitle =

work page doi:10.48550/arxiv.2106.02711

[12] [12]

Proceedings of the 40th International Conference on Machine Learning , pages=

Hierarchical neural coding for controllable CAD model generation , author=. Proceedings of the 40th International Conference on Machine Learning , pages=

work page

[13] [13]

International Design Engineering Technical Conferences and Computers and Information in Engineering Conference , volume=

Cad-coder: An open-source vision-language model for computer-aided design code generation , author=. International Design Engineering Technical Conferences and Computers and Information in Engineering Conference , volume=. 2025 , organization=

work page 2025

[14] [14]

International Design Engineering Technical Conferences and Computers and Information in Engineering Conference , publisher =

Li, Xingang and Sun, Yuewan and Sha, Zhenghui , date =. International Design Engineering Technical Conferences and Computers and Information in Engineering Conference , publisher =

work page

[15] [15]

Brep2Seq: a dataset and hierarchical deep learning network for reconstruction and generation of computer-aided design models , volume =

Zhang, Shuming and Guan, Zhidong and Jiang, Hao and Ning, Tao and Wang, Xiaodong and Tan, Pingan , date =. Brep2Seq: a dataset and hierarchical deep learning network for reconstruction and generation of computer-aided design models , volume =

work page

[16] [16]

Chen, Mark and Tworek, Jerry and Jun, Heewoo and Yuan, Qiming and Pinto, Henrique Ponde de Oliveira and Kaplan, Jared and Edwards, Harri and Burda, Yuri and Joseph, Nicholas and Brockman, Greg and Ray, Alex and Puri, Raul and Krueger, Gretchen and Petrov, Michael and Khlaaf, Heidy and Sastry, Girish and Mishkin, Pamela and Chan, Brooke and Gray, Scott and...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2107.03374

[17] [17]

Qwen3 Technical Report

Qwen3 technical report , author=. arXiv preprint arXiv:2505.09388 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[18] [18]

arXiv preprint arXiv:2501.19054 , year=

Text-to-cad generation through infusing visual feedback in large language models , author=. arXiv preprint arXiv:2501.19054 , year=

work page arXiv

[19] OpenAI and Achiam, Josh and Adler, Steven and Agarwal, Sandhini and Ahmad, Lama and Akkaya, Ilge and Aleman, Florencia Leoni and Almeida, Diogo and Altenschmidt, Janko and Altman, Sam and Anadkat, Shyamal and Avila, Red and Babuschkin, Igor and Balaji, Suchir and Balcom, Valerie and Baltescu, Paul and Bao, Haiming and Bavarian, Mohammad and Belgum, Jeff a... GPT-4 Technical Report. doi:10.48550/arxiv.2303.08774

[20] Gemini: A Family of Highly Capable Multimodal Models. arXiv preprint arXiv:2312.11805.

[21] doi:10.48550/arxiv.2405.04434

[22] Hui, Binyuan and Yang, Jian and Cui, Zeyu and Yang, Jiaxi and Liu, Dayiheng and Zhang, Lei and Liu, Tianyu and Zhang, Jiajun and Yu, Bowen and Lu, Keming and Dang, Kai and Fan, Yang and Zhang, Yichang and Yang, An and Men, Rui and Huang, Fei and Zheng, Bo and Miao, Yibo and Quan, Shanghaoran and Feng, Yunlong and Ren, Xingzhang and Ren, Xuancheng and Zhou... Qwen2.5-Coder Technical Report. doi:10.48550/arxiv.2409.12186

[23] Kwon, Woosuk and Li, Zhuohan and Zhuang, Siyuan and Sheng, Ying and Zheng, Lianmin and Yu, Cody Hao and Gonzalez, Joseph and Zhang, Hao and Stoica, Ion. Efficient Memory Management for Large Language Model Serving with PagedAttention. Proceedings of the 29th Symposium on Operating Systems Principles. doi:10.1145/3600006.3613165

[24] Evaluating Large Language Models Trained on Code. arXiv preprint arXiv:2107.03374.

[25] Measuring Mathematical Problem Solving With the MATH Dataset.

[26] GAIA: a benchmark for general AI assistants.

[27] Principles of CAD/CAM/CAE systems. 1999.

[28] StarCoder: may the source be with you! arXiv preprint arXiv:2305.06161.

[29] Vitruvion: A Generative Model of Parametric CAD Sketches. The International Conference on Learning Representations (ICLR).

[30] ExtrudeNet: Unsupervised inverse sketch-and-extrude for shape parsing. European Conference on Computer Vision, 2022.

[31] BrepGen: A B-rep generative diffusion model with structured latent geometry. ACM Transactions on Graphics (TOG), 2024.

[32] DreamFusion: Text-to-3D using 2D Diffusion. arXiv preprint arXiv:2209.14988.

[33] Shap-E: Generating Conditional 3D Implicit Functions. arXiv preprint arXiv:2305.02463.

[34] A point set generation network for 3D object reconstruction from a single image. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

[35] Convolutional occupancy networks. European Conference on Computer Vision, 2020.

[36] ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools. arXiv preprint arXiv:2406.12793.

[37] MiniMax-01: Scaling Foundation Models with Lightning Attention. arXiv preprint arXiv:2501.08313.