CAD-Coder: Text-to-CAD Generation with Chain-of-Thought and Geometric Reward
Pith reviewed 2026-05-19 14:06 UTC · model grok-4.3
The pith
CAD-Coder lets language models generate valid complex CAD models from text by producing optimized CadQuery scripts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By casting text-to-CAD as the generation of parametric CadQuery scripts and training with a two-stage pipeline of supervised fine-tuning followed by reinforcement learning under a geometric reward that combines Chamfer Distance with format compliance, together with chain-of-thought planning, the approach enables large language models to produce diverse, valid, and complex CAD models directly from natural language.
What carries the argument
A CAD-specific reward that combines Chamfer Distance (for geometric fidelity) with a format reward (for script correctness), applied inside Group Relative Policy Optimization (GRPO, which the abstract expands as "Group Reward Policy Optimization") after supervised fine-tuning.
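To make the reward concrete, here is a minimal pure-Python sketch of a symmetric Chamfer Distance and one way it could be combined with a format reward. The function names, the weights `w_geo` and `w_fmt`, the `scale` parameter, and the exponential mapping from distance to reward are all assumptions for illustration; the summary above does not give the paper's exact formula or coefficients.

```python
import math


def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def chamfer_distance(a, b):
    """Symmetric Chamfer Distance between two non-empty point lists.

    Uses the mean squared nearest-neighbour distance in both directions,
    which is one common convention (others use unsquared distances).
    """
    def one_way(src, dst):
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


def cad_reward(pred_points, target_points, format_ok,
               w_geo=1.0, w_fmt=0.5, scale=1.0):
    """Hypothetical composite reward: weighted geometric term plus format term.

    The weights and the exp(-CD / scale) mapping are illustrative assumptions.
    """
    r_fmt = 1.0 if format_ok else 0.0
    if not format_ok or not pred_points:
        # No usable geometry: only the format term can contribute.
        return w_fmt * r_fmt
    cd = chamfer_distance(pred_points, target_points)
    r_geo = math.exp(-cd / scale)  # equals 1.0 for a perfect match, decays toward 0
    return w_geo * r_geo + w_fmt * r_fmt
```

The brute-force nearest-neighbour search is quadratic in the number of points; a practical pipeline would likely use a spatial index or a GPU implementation.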
If this is right
- Language models can output executable code that directly produces geometrically correct CAD parts.
- The resulting models integrate immediately with standard CAD tools for further editing or validation.
- Chain-of-thought reasoning enables the model to handle more intricate design sequences than direct generation methods.
- The same training recipe can scale to larger datasets for continued gains in model complexity.
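As an illustration of what "executable code that produces a CAD part" can look like, the sketch below holds a small CadQuery script as a string and runs a static format check on it with Python's `ast` module. The script uses standard CadQuery calls (`Workplane`, `box`, `faces`, `workplane`, `hole`); the check itself, including the requirement that the script assign a variable named `result`, is a hypothetical criterion and not the paper's format reward.

```python
import ast

# A small CadQuery script of the kind a model might emit (shown as data, not executed here).
EXAMPLE_SCRIPT = '''
import cadquery as cq

# 40 x 20 x 5 plate with a diameter-6 through-hole centred on the top face
result = (
    cq.Workplane("XY")
    .box(40, 20, 5)
    .faces(">Z")
    .workplane()
    .hole(6)
)
'''


def passes_static_format_check(script, result_name="result"):
    """Return True if the script parses, imports cadquery, and assigns `result_name`.

    Hypothetical criteria for illustration; nothing is executed.
    """
    try:
        tree = ast.parse(script)
    except SyntaxError:
        return False

    imports_cadquery = False
    assigns_result = False
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imports_cadquery |= any(alias.name == "cadquery" for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            imports_cadquery |= node.module == "cadquery"
        elif isinstance(node, ast.Assign):
            assigns_result |= any(
                isinstance(target, ast.Name) and target.id == result_name
                for target in node.targets
            )
    return imports_cadquery and assigns_result
```

A static check like this is cheap but weak: a script can pass it and still fail inside the CAD kernel, so it would complement rather than replace execution.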
Where Pith is reading between the lines
- Designers could describe a mechanical part in plain words and receive a production-ready CAD file without drawing it themselves.
- The method could be combined with existing CAD libraries to support assemblies or parametric families of parts.
- Reward-driven code generation of this form might transfer to other engineering domains that rely on scripted geometry.
Load-bearing premise
A reward signal built from Chamfer Distance and code format compliance is enough to guarantee that the generated models are both geometrically accurate and ready for practical use.
What would settle it
Running the generated CAD models through manufacturing simulation or expert review and finding frequent cases of non-manufacturable topology or hidden errors despite low Chamfer Distance scores would show the reward does not suffice.
Original abstract
In this work, we introduce CAD-Coder, a novel framework that reformulates text-to-CAD as the generation of CadQuery scripts - a Python-based, parametric CAD language. This representation enables direct geometric validation, a richer modeling vocabulary, and seamless integration with existing LLMs. To further enhance code validity and geometric fidelity, we propose a two-stage learning pipeline: (1) supervised fine-tuning on paired text-CadQuery data, and (2) reinforcement learning with Group Reward Policy Optimization (GRPO), guided by a CAD-specific reward comprising both a geometric reward (Chamfer Distance) and a format reward. We also introduce a chain-of-thought (CoT) planning process to improve model reasoning, and construct a large-scale, high-quality dataset of 110K text-CadQuery-3D model triplets and 1.5K CoT samples via an automated pipeline. Extensive experiments demonstrate that CAD-Coder enables LLMs to generate diverse, valid, and complex CAD models directly from natural language, advancing the state of the art of text-to-CAD generation and geometric reasoning.
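The abstract does not spell out how GRPO consumes the reward. In the formulation commonly described for GRPO, several completions are sampled for the same prompt and each completion's reward is normalised against the mean and standard deviation of its group, which removes the need for a learned value model. The sketch below shows only that normalisation step; the function name, the use of the population standard deviation, and the `eps` term are assumptions, and the policy-gradient update itself is omitted.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalise the rewards of one prompt's sampled completions within their group.

    Each advantage is (reward - group mean) / (group std + eps).
    `eps` (an assumed safeguard) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four sampled scripts for one prompt, scored by some CAD reward.
advantages = group_relative_advantages([1.5, 0.5, 0.0, 1.0])
```

Completions that score above their group's average receive positive advantages and are reinforced; when every completion in a group scores the same, the advantages are all zero and that group contributes no learning signal.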
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces CAD-Coder, a framework reformulating text-to-CAD generation as the production of CadQuery Python scripts. It uses a two-stage pipeline consisting of supervised fine-tuning on a constructed 110K text-CadQuery-3D model dataset followed by Group Relative Policy Optimization (GRPO) reinforcement learning. The RL stage is guided by a composite reward that includes a geometric component based on Chamfer Distance (computed after point sampling) and a format reward, augmented by chain-of-thought planning. The central claim is that this approach enables LLMs to generate diverse, valid, and complex CAD models directly from natural language, advancing the state of the art in text-to-CAD and geometric reasoning.
Significance. If the empirical claims hold, the work would be significant for the graphics and CAD communities by demonstrating a scalable way to leverage LLMs for parametric CAD script generation, potentially lowering barriers to complex 3D modeling. The automated construction of a large paired dataset and the explicit use of an external geometric metric (Chamfer Distance) rather than self-referential signals are positive features. However, the practical impact hinges on whether the chosen reward produces models that are not only geometrically close but also topologically valid and manufacturable.
major comments (2)
- [Abstract and Section 4] Abstract and reward definition (Section 4): The geometric reward relies on Chamfer Distance after point sampling from the generated CadQuery output. Chamfer Distance quantifies surface proximity but is insensitive to self-intersections, non-manifold edges, invalid B-rep topology, or parametric constraints that would cause the model to fail as a solid in a CAD kernel. Because the paper positions this reward (together with the format reward) as the mechanism delivering 'valid' and 'practically usable' models without further human or domain-specific validation, this choice is load-bearing for the central claim; additional topological or manufacturing-validity metrics would be required to substantiate the claim.
- [Section 5] Section 5 (Experiments): The abstract asserts 'extensive experiments' showing superiority in diversity, validity, and complexity, yet the provided description contains no quantitative tables, baseline comparisons, ablation results on the GRPO coefficients, or error analysis of failure modes (e.g., rate of topologically invalid outputs). Without these details it is impossible to evaluate whether the reported gains are robust or whether the Chamfer-based reward actually correlates with downstream usability.
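The first major comment separates surface proximity from topological validity. As a rough illustration of the kind of check Chamfer Distance cannot provide, the sketch below tests whether a triangle mesh is closed and edge-manifold by counting how many triangles share each edge. This is an assumption-laden stand-in: it works on a tessellation rather than a B-rep, it is not the paper's method, and it does not detect self-intersections or non-manifold vertices, for which a CAD kernel's own validity checks would be needed.

```python
from collections import Counter


def edge_use_counts(triangles):
    """Count how many triangles use each undirected edge.

    `triangles` is an iterable of (i, j, k) vertex-index triples.
    """
    counts = Counter()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1
    return counts


def is_closed_edge_manifold(triangles):
    """True if every edge is shared by exactly two triangles.

    One use means a boundary (open surface); three or more means a non-manifold edge.
    """
    counts = edge_use_counts(triangles)
    return bool(counts) and all(n == 2 for n in counts.values())


# A tetrahedron is closed; adding a fin on edge (0, 1) makes that edge non-manifold.
TETRAHEDRON = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
WITH_FIN = TETRAHEDRON + [(0, 1, 4)]
```

A thin fin like the one in `WITH_FIN` adds very little surface area, so points sampled from it barely change a Chamfer score even though the shape is no longer a valid solid boundary.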
minor comments (2)
- [Section 3] The construction pipeline for the 110K dataset and the 1.5K CoT samples is mentioned but lacks sufficient detail on filtering criteria, quality assurance, or diversity metrics; a dedicated subsection or appendix would improve reproducibility.
- [Section 4] Notation for the GRPO objective and the precise weighting between geometric and format rewards should be formalized with an equation rather than prose description.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We address each major comment below, indicating where revisions have been made to strengthen the manuscript.
- Referee: [Abstract and Section 4] Abstract and reward definition (Section 4): The geometric reward relies on Chamfer Distance after point sampling from the generated CadQuery output. Chamfer Distance quantifies surface proximity but is insensitive to self-intersections, non-manifold edges, invalid B-rep topology, or parametric constraints that would cause the model to fail as a solid in a CAD kernel. Because the paper positions this reward (together with the format reward) as the mechanism delivering 'valid' and 'practically usable' models without further human or domain-specific validation, this choice is load-bearing for the central claim; additional topological or manufacturing-validity metrics would be required to substantiate the claim.
  Authors: We thank the referee for this important observation. Chamfer Distance indeed measures surface proximity and is insensitive to topological defects such as self-intersections or non-manifold geometry. In our pipeline, the format reward requires the CadQuery script to execute successfully before point sampling occurs, providing a basic validity filter. Nevertheless, we agree this does not fully address manufacturability or complex topological validity. In the revised manuscript we have added an explicit limitations paragraph in Section 4, clarified the proxy nature of the current reward, and included supplementary topological validation rates (e.g., successful B-rep solid checks) in the experimental results.
  Revision: partial
- Referee: [Section 5] Section 5 (Experiments): The abstract asserts 'extensive experiments' showing superiority in diversity, validity, and complexity, yet the provided description contains no quantitative tables, baseline comparisons, ablation results on the GRPO coefficients, or error analysis of failure modes (e.g., rate of topologically invalid outputs). Without these details it is impossible to evaluate whether the reported gains are robust or whether the Chamfer-based reward actually correlates with downstream usability.
  Authors: We agree that the experimental presentation must be more comprehensive to support the claims. The full manuscript contains baseline comparisons and quantitative metrics, but we have substantially expanded Section 5 in the revision: new Table 1 reports validity, diversity, and complexity scores against baselines; Table 2 presents ablations on GRPO reward coefficients; and a dedicated error-analysis subsection now quantifies failure modes including the rate of topologically invalid outputs. These additions allow direct assessment of robustness and the relationship between the reward signal and practical usability.
  Revision: yes
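The first response describes the format reward as requiring the script to execute before any points are sampled. A minimal sketch of such an execution gate follows, assuming the script exposes its output in a variable named `result` and that the caller supplies a `geometric_reward` callable; both names are hypothetical. Running model-generated code with `exec` in the training process is unsafe, so a real pipeline would presumably isolate it in a sandboxed subprocess with a timeout.

```python
def execution_gated_reward(script, geometric_reward, result_name="result"):
    """Return 0.0 unless the script runs and defines `result_name`.

    `geometric_reward` is a caller-supplied function that scores the produced
    object (for example, by sampling points and comparing with a reference).
    WARNING: exec() on untrusted code is for illustration only.
    """
    namespace = {}
    try:
        exec(compile(script, "<generated>", "exec"), namespace)
    except Exception:
        # Syntax errors and runtime failures both yield zero reward.
        return 0.0
    if result_name not in namespace:
        return 0.0
    return float(geometric_reward(namespace[result_name]))
```

Because the gate only asks whether the script ran, it filters out crashes but says nothing about whether the resulting shape is a valid solid, which is the gap the referee points to.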
Circularity Check
No circularity: derivation relies on external Chamfer Distance metric and independent dataset construction
The paper describes a standard two-stage pipeline of supervised fine-tuning on an externally constructed 110K text-CadQuery dataset followed by GRPO reinforcement learning. The geometric reward is defined using Chamfer Distance computed against ground-truth point samples plus a separate format reward; neither quantity is defined in terms of the model's own outputs or predictions. No self-citations, uniqueness theorems, or ansatzes from prior author work are invoked as load-bearing steps in the provided abstract and description. The central claim that the resulting models are valid and complex is an empirical outcome of optimizing the external proxy rather than a definitional tautology. The derivation chain therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- GRPO reward coefficients
axioms (1)
- domain assumption: CadQuery scripts provide a sufficiently expressive and directly executable representation for complex parametric CAD models
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "reinforcement learning with Group Reward Policy Optimization (GRPO), guided by a CAD-specific reward comprising both a geometric reward (Chamfer Distance) and a format reward"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "two-stage learning pipeline: (1) supervised fine-tuning ... (2) reinforcement learning with GRPO"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 3 Pith papers
- ArtiCAD: Articulated CAD Assembly Design via Multi-Agent Code Generation
  ArtiCAD presents the first training-free multi-agent framework that generates articulated, editable CAD assemblies from text or images by predicting assembly relationships early and using validation with rollback.
- InCoder-32B-Thinking: Industrial Code World Model for Thinking
  InCoder-32B-Thinking uses error-feedback synthesized thinking traces and a code world model to reach top open-source scores on general and industrial code benchmarks including 81.3% on LiveCodeBench and 84.0% on CAD-Coder.
- Pointer-CAD: Unifying B-Rep and Command Sequences via Pointer-based Edges & Faces Selection
  Pointer-CAD unifies B-Rep geometry with command sequences via pointer-based entity selection, allowing LLMs to perform complex CAD edits while cutting topological errors from quantization.