arXiv: 2605.07807 · v1 · submitted 2026-05-08 · 💻 cs.CV · cs.AI · cs.LG · cs.RO

Recognition: 2 theorem links


Text-to-CAD Evaluation with CADTests

Dimitrios Mallis, Marco Wang, Ahmet Serdar Karadeniz, Elisa Ricci, Anis Kacem, Djamila Aouada


Pith reviewed 2026-05-11 02:12 UTC · model grok-4.3


keywords: Text-to-CAD · CAD model generation · automated testing · benchmark · geometric verification · topological constraints · generative design


The pith

CADTestBench uses executable tests to evaluate and guide Text-to-CAD model generation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tackles the difficulty of measuring how well CAD models generated from text descriptions match their prompts. It introduces CADTests as automated checks that confirm whether a model meets the exact geometric and topological conditions described in the text. With these tests the authors build CADTestBench to compare existing Text-to-CAD systems and then show that the same tests can steer generation directly, producing simple baselines that outperform current methods. This matters because reliable, objective evaluation can speed up the development of automated design tools by giving clear signals about model correctness.

Core claim

CADTestBench is the first test-based benchmark for Text-to-CAD, built on CADTests that execute checks for geometric and topological compliance with the input prompt; the same tests can also be used to guide model generation and yield baselines that surpass recent methods.

What carries the argument

CADTests: executable software tests that verify whether a generated CAD model satisfies the geometric and topological requirements implied by the text prompt.
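As a rough illustration of the idea, the sketch below models a CADTest as a plain Python predicate over a summary of the generated shape. The `ShapeSummary` class, its fields, and the two example tests are hypothetical stand-ins for what a CAD kernel would report about a B-rep (solid and shell counts, bounding box, volume). The paper's tests run on the real geometry, and nothing here reproduces its implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class ShapeSummary:
    """Hypothetical stand-in for properties read off a generated B-rep."""
    n_solids: int
    n_shells: int
    bbox: Tuple[float, float, float]  # extents along x, y, z
    volume: float


# A CADTest is modelled here as a predicate over the shape summary.
CADTest = Callable[[ShapeSummary], bool]


def is_single_solid(shape: ShapeSummary) -> bool:
    # Topological requirement: one solid bounded by one closed shell.
    return shape.n_solids == 1 and shape.n_shells == 1


def has_length(expected: float, tol: float = 1e-3) -> CADTest:
    # Geometric requirement: the largest bounding-box extent matches the prompt.
    def check(shape: ShapeSummary) -> bool:
        return abs(max(shape.bbox) - expected) <= tol
    return check


def run_tests(shape: ShapeSummary, tests: Dict[str, CADTest]) -> Dict[str, bool]:
    """Run every test and report pass/fail per test name."""
    return {name: test(shape) for name, test in tests.items()}


tests = {"single_solid": is_single_solid, "length_1.5": has_length(1.5)}
good = ShapeSummary(n_solids=1, n_shells=1, bbox=(1.5, 0.1875, 0.075), volume=0.0211)
split = ShapeSummary(n_solids=2, n_shells=2, bbox=(1.5, 0.1875, 0.075), volume=0.0211)
results_good = run_tests(good, tests)
results_split = run_tests(split, tests)
```

The two shapes share the same bounding box, so a purely dimensional check cannot tell them apart; only the topological test flags the split model. That is the kind of structural error a test suite can catch and a visual similarity score may miss.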

If this is right

Existing Text-to-CAD methods can be ranked by the fraction of CADTests they satisfy on a fixed prompt set.
Generation pipelines can incorporate CADTest feedback at inference time to improve output quality.
Training or fine-tuning objectives can be defined directly from test pass rates rather than indirect similarity scores.
New methods can be developed by searching for models that maximize the number of satisfied CADTests.
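The first implication can be sketched in a few lines. The example below pools test outcomes over a fixed prompt set and sorts methods by the fraction of tests they satisfy. The method names and outcome lists are invented for illustration, and pooling over prompts is an assumption; the paper may aggregate per sample instead.

```python
from typing import Dict, List

# Hypothetical per-method outcomes: one list of test results per prompt.
Outcomes = Dict[str, List[List[bool]]]


def satisfied_fraction(per_prompt: List[List[bool]]) -> float:
    """Fraction of all tests passed, pooled over prompts."""
    total = sum(len(results) for results in per_prompt)
    passed = sum(sum(results) for results in per_prompt)
    return passed / total if total else 0.0


def rank_methods(outcomes: Outcomes) -> List[str]:
    """Method names sorted from highest to lowest satisfied fraction."""
    return sorted(
        outcomes,
        key=lambda name: satisfied_fraction(outcomes[name]),
        reverse=True,
    )


outcomes = {
    "method_a": [[True, True, False], [True, False]],    # 3 of 5 tests
    "method_b": [[True, True, True], [True, True]],      # 5 of 5 tests
    "method_c": [[False, True, False], [False, False]],  # 1 of 5 tests
}
ranking = rank_methods(outcomes)
```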

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The testing approach could extend to other structured generation tasks such as text-to-3D or parametric design where precise constraint satisfaction is required.
Integrating CADTests into a reinforcement-learning loop might produce models that satisfy constraints more reliably than current supervised approaches.
The benchmark data could be used to identify systematic failure modes in current generators, such as topology errors that visual metrics overlook.

Load-bearing premise

CADTests accurately and completely capture every geometric and topological requirement that an arbitrary text prompt implies, without false passes or false failures.

What would settle it

A text prompt and generated model pair where the model passes all applicable CADTests yet fails to match the prompt's intended shape or topology, or where a matching model fails the tests.

Figures

Figures reproduced from arXiv: 2605.07807 by Ahmet Serdar Karadeniz, Anis Kacem, Dimitrios Mallis, Djamila Aouada, Elisa Ricci, Marco Wang.

**Figure 1.** CADTESTS are executable programs that verify prompt specifications directly on the generated geometry. CADTESTS can account for ambiguous design prompts (left) and identify subtle structural errors (right). The figure omits CADTEST implementations for clarity. SPoC [23] and MBPP [4] evaluate generated code by executing test suites, and a sample is correct only if it passes all tests. In this work we introd…

**Figure 2.** Illustration of the proposed CADTest synthesis pipeline. (Top) A design prompt and reference CAD program are provided to an LLM planner to generate CAD mutants. (Bottom) The planner generates a test suite that is iteratively refined using execution feedback from the passing and mutation sets. In this example, a mutation that is not killed in R = 1 is detected in R = N after adding a test comparing the volu…

**Figure 3.** Category-level evaluation of Text-to-CAD baselines. Results are reported in terms percent…

**Figure 4.** Example generations from the CADTest-Claude baseline evaluated via…

**Figure 5.** Distribution of (left) CADTESTS counts and (right) discovered prompt requirement groups per benchmark sample for abstract and detailed prompts. Figure text: Group Description: The result should be one single 3D clamping bracket object (one solid / one shell). Group Name: single_solid_output. CADTests: Verifies the model is a single solid; Verifies the model has exactly 1 shell (single closed surface envelope). Group Descri…

**Figure 6.** Examples of prompt requirement groups for a…

**Figure 7.** (left) Counts of generated CADTests as well as prompt requirement groups across the detailed and abstract partitions of CADTESTBENCH. (right) Share of CADTESTS by category for the detailed and abstract prompts. In…

**Figure 8.** Frequency of different CADQuery API calls across the generated CADTESTS. An example including CADTest code snippets for the generated test suite is shown in…

**Figure 9.** Example CADTEST descriptions from multiple CADTESTBENCH samples, grouped according to the six CADTEST categories.

**Figure 10.** Example implementations of generated CADTESTS for a CADTESTBENCH sample using the CADQuery API.

**Figure 11.** Depiction of CAD mutants. Our mutation generation pipeline takes a design prompt…

**Figure 12.** import cadquery as cq def create_cad() -> cq.Workplane: cylinder_diameter:float = 1.31858 cylinder_height:float = 0.468766 cylinder:cq.Workplane = ( cq.Workplane("XY") .cylinder(cylinder_height, cylinder_diameter/2) ) board_length:float = 1.5 board_width:float = 0.1875 board_height:float = 0.075 board:cq.Workplane = ( cq.Workplane("XY") .box(board_length, board_width, board_height) ) part:cq.Workplane = (…

**Figure 13.** Generation trajectory from the CADTest baseline. The model produces both the CAD program and a set of tests used to verify the generated geometry. Execution logs are returned to the planner as feedback, revealing that the boolean union resulted in two separate solids. The planner then adjusts the design so the parts slightly overlap…

**Figure 14.** ROC curves for the human study evaluation. Each curve shows true positive rate vs. …

**Figure 15.** Qualitative evaluation results for the Skilled, ReAct, and CADTest baselines using GPT-5.2 and Claude 4.6 Sonnet planners. The figure shows the generated BRep for each baseline together with the score assigned by the CADTEST suite. We report the pass rate (PR), which equals 1 when all tests are passed, and the requirement score (RS), defined as the fraction of prompt requirements satisfied for each sample…
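The two scores named in the Figure 15 caption can be sketched as follows. The caption is truncated, so the rule used here for when a requirement counts as satisfied (every test in its group passes) is an assumption, as are all group names other than `single_solid_output`, which appears in the Figure 5 text.

```python
from typing import Dict, List

# Test outcomes grouped by prompt requirement,
# e.g. {"single_solid_output": [True, True]}.
GroupedResults = Dict[str, List[bool]]


def pass_rate(groups: GroupedResults) -> int:
    """PR: 1 if every test in every group passes, else 0."""
    return int(bool(groups) and all(all(results) for results in groups.values()))


def requirement_score(groups: GroupedResults) -> float:
    """RS: fraction of requirement groups whose tests all pass (assumed rule)."""
    if not groups:
        return 0.0
    satisfied = sum(1 for results in groups.values() if all(results))
    return satisfied / len(groups)


sample = {
    "single_solid_output": [True, True],
    "overall_dimensions": [True, False],  # hypothetical group, one failing test
    "hole_count": [True],                 # hypothetical group
    "symmetry": [True],                   # hypothetical group
}
pr = pass_rate(sample)
rs = requirement_score(sample)
```

Under these assumptions the sample scores PR = 0 and RS = 0.75: a single failing test zeroes the pass rate, while the requirement score still credits the three groups that were met.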

Original abstract

Text-to-CAD has recently emerged as an important task with the potential to substantially accelerate design workflows. Despite its significance, there has been surprisingly little work on Text-to-CAD evaluation, and assessing CAD model generation performance remains a considerable challenge. In this work, we introduce a new evaluation perspective for Text-to-CAD based on automated testing. We propose CADTestBench, the first test-based benchmark for Text-to-CAD, based on CADTests, executable software tests that verify whether a generated CAD model satisfies the geometric and topological requirements of the input prompt. Using CADTestBench, we conduct comprehensive benchmarking of recent Text-to-CAD methods and further demonstrate that CADTests can also guide CAD model generation, yielding simple baselines that surpass performance of current methods. CADTestBench code and data are available at GitHub and Hugging Face dataset.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CADTestBench shifts Text-to-CAD eval to executable tests and shows they can guide generation, but the approach stands or falls on unshown test accuracy.


The paper's core move is to replace similarity-based scores with CADTests, which are executable checks that a generated CAD model meets the geometric and topological specs in a text prompt. They package this into CADTestBench, run it on recent methods, and report that feeding the tests back into generation produces simple baselines that beat prior work. Releasing the code and dataset on GitHub and Hugging Face is the most immediately useful part; it lets others inspect and extend the tests directly.

Referee Report

3 major / 1 minor

Summary. The paper introduces CADTestBench, the first test-based benchmark for Text-to-CAD, based on CADTests—executable software tests that verify whether a generated CAD model satisfies the geometric and topological requirements of the input text prompt. Using this benchmark, the authors conduct comprehensive evaluation of recent Text-to-CAD methods and demonstrate that CADTests can also guide CAD model generation, producing simple baselines that surpass the performance of current methods. The code and data are released publicly via GitHub and Hugging Face.

Significance. If the CADTests are shown to be valid and complete, the work provides an objective, automated evaluation framework for an emerging task where assessment has been difficult, potentially standardizing comparisons and enabling better generation via test guidance. The public release of resources is a clear strength that could accelerate progress in the field.

major comments (3)

[Abstract] The claims of 'comprehensive benchmarking' of recent methods and that 'guided baselines surpass performance of current methods' are asserted without any quantitative results, metrics, baseline details, or controls provided. This leaves the central empirical claims unverifiable and unsupported in the available text.
[CADTest construction (likely §3)] No details are given on the automated test generation process from arbitrary text prompts, nor is there any independent validation (e.g., human evaluation, inter-annotator agreement, or held-out test set) to confirm that CADTests faithfully capture all implied geometric and topological constraints without false positives or negatives. This is load-bearing, as both the benchmarking results and the guidance superiority claim rest on test correctness.
[Experiments and guidance sections (likely §4-5)] The assertion that CADTests can guide generation to outperform prior methods requires explicit description of the guidance mechanism, the 'simple baselines,' the evaluation metrics, and controls for test validity. Without these, and given the risk of incomplete tests for non-trivial prompts, the superiority cannot be established.

minor comments (1)

[Abstract] The statement that 'CADTestBench code and data are available at GitHub and Hugging Face dataset' lacks specific repository names, links, or DOIs.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough and constructive review. We appreciate the acknowledgment of CADTestBench's potential to standardize evaluation in Text-to-CAD and the value of the public resource release. Below we respond point-by-point to the major comments and commit to revisions that will strengthen the clarity, verifiability, and completeness of the manuscript.


Referee: [Abstract] The claims of 'comprehensive benchmarking' of recent methods and that 'guided baselines surpass performance of current methods' are asserted without any quantitative results, metrics, baseline details, or controls provided. This leaves the central empirical claims unverifiable and unsupported in the available text.

Authors: The abstract is deliberately concise and therefore omits specific numerical results. The full manuscript reports quantitative benchmarking results, pass-rate metrics, baseline specifications, and experimental controls in Sections 4 and 5. To address the concern, we will revise the abstract to include a brief summary of the key quantitative findings (e.g., overall pass rates and relative improvements of the guided baselines). Revision: yes.

Referee: [CADTest construction (likely §3)] No details are given on the automated test generation process from arbitrary text prompts, nor is there any independent validation (e.g., human evaluation, inter-annotator agreement, or held-out test set) to confirm that CADTests faithfully capture all implied geometric and topological constraints without false positives or negatives. This is load-bearing, as both the benchmarking results and the guidance superiority claim rest on test correctness.

Authors: Section 3 describes the automated CADTest generation pipeline, which parses text prompts to extract geometric and topological constraints and emits executable test code. We agree that additional explicit detail and independent validation would strengthen the work. We will expand the section with a precise algorithmic description of the generation process and will add a new validation subsection reporting human evaluation results on a held-out sample of prompts, including inter-annotator agreement statistics and an analysis of false-positive/negative rates. Revision: yes.

Referee: [Experiments and guidance sections (likely §4-5)] The assertion that CADTests can guide generation to outperform prior methods requires explicit description of the guidance mechanism, the 'simple baselines,' the evaluation metrics, and controls for test validity. Without these, and given the risk of incomplete tests for non-trivial prompts, the superiority cannot be established.

Authors: Sections 4 and 5 present the experimental protocol, the guidance procedure (CADTests used for iterative verification and selection), the simple baselines (lightweight LLM-based generators augmented by test-driven filtering), and the primary metric (CADTest pass rate). We acknowledge that these descriptions can be made more explicit. We will add pseudocode for the guidance loop, full baseline configurations, ablation studies that isolate the contribution of test guidance, and a limitations paragraph that directly discusses the risk of incomplete tests for complex prompts together with mitigation strategies. Revision: yes.
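For readers who want a concrete picture of what "iterative verification and selection" could mean, here is a minimal sketch of a test-guided loop. It is not the authors' procedure: the `Proposer` interface, the toy generator, and the stopping rule are all assumptions, and the toy generator only imitates the behaviour described for Figure 13 (a union that splits into two solids until the parts are made to overlap).

```python
from typing import Callable, Dict, List, Optional, Tuple

Shape = Dict[str, float]
Test = Callable[[Shape], bool]
# Hypothetical generator: takes the names of failed tests, returns a new candidate.
Proposer = Callable[[List[str]], Shape]


def guided_generation(
    propose: Proposer, tests: Dict[str, Test], max_rounds: int = 5
) -> Tuple[Optional[Shape], int]:
    """Propose candidates until all tests pass; keep the best one seen."""
    best: Optional[Shape] = None
    best_passed = -1
    feedback: List[str] = []
    for _ in range(max_rounds):
        candidate = propose(feedback)
        failed = [name for name, test in tests.items() if not test(candidate)]
        passed = len(tests) - len(failed)
        if passed > best_passed:
            best, best_passed = candidate, passed
        if not failed:
            break
        feedback = failed  # handed back to the generator as execution feedback
    return best, best_passed


def toy_propose(feedback: List[str]) -> Shape:
    # Toy stand-in for an LLM planner: it overlaps the parts only after
    # being told that the single-solid test failed.
    fixed = "single_solid" in feedback
    return {"n_solids": 1.0 if fixed else 2.0, "length": 1.5}


tests = {
    "single_solid": lambda s: s["n_solids"] == 1,
    "length": lambda s: abs(s["length"] - 1.5) <= 1e-3,
}
best, n_passed = guided_generation(toy_propose, tests)
```

Keeping the best candidate seen so far means the loop still returns something useful when the round budget runs out before every test passes.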

Circularity Check

0 steps flagged

No circularity in CADTestBench derivation or test-guided generation claims


The paper introduces CADTestBench as a novel test-based benchmark using executable CADTests to verify geometric/topological requirements implied by text prompts. Benchmarking of prior Text-to-CAD methods and the demonstration that CADTests can guide generation to produce superior simple baselines are presented as independent empirical applications of the new tests. No load-bearing steps reduce by construction to self-definition, fitted inputs renamed as predictions, or self-citation chains; the central claims rest on the external validity of the proposed tests rather than tautological equivalence to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central contribution rests on the assumption that executable tests can reliably encode prompt requirements; no free parameters or invented physical entities are mentioned, but the new benchmark and test framework are introduced without upstream evidence.

axioms (1)

domain assumption: Executable software tests can verify geometric and topological properties of CAD models against text prompt requirements.
Foundational premise for CADTests and the benchmark; invoked throughout the abstract.

invented entities (2)

CADTests (no independent evidence)
purpose: Executable tests that check if a CAD model satisfies prompt requirements
New concept introduced to enable the benchmark and guidance method.

CADTestBench (no independent evidence)
purpose: Test-based benchmark for Text-to-CAD evaluation and guidance
New benchmark constructed from CADTests.

pith-pipeline@v0.9.0 · 5461 in / 1268 out tokens · 50378 ms · 2026-05-11T02:12:50.784901+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

CADTESTS are executable software tests verifying whether a generated CAD model meets the design specifications of the input prompt... implemented as a Python code snippet executed on the B-rep m... using selectors along with B-Rep inspection primitives, including topology counts, bounding box dimensions, areas and volumes...

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

We leverage mutation analysis both to measure the effectiveness of generated CADTESTS and to guide the design of more discriminative tests... mutation score is defined as the fraction of killed mutants
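The quoted definition, mutation score as the fraction of killed mutants, can be sketched as below. Shapes are reduced to small dictionaries and the tests to lambdas, all hypothetical. The refinement step mirrors the Figure 2 example in which a surviving mutant is caught only after a volume comparison is added.

```python
from typing import Callable, List, Sequence

Test = Callable[[dict], bool]  # a test takes a (hypothetical) shape record


def passes_all(shape: dict, tests: Sequence[Test]) -> bool:
    return all(test(shape) for test in tests)


def mutation_score(mutants: Sequence[dict], tests: Sequence[Test]) -> float:
    """Fraction of mutants killed, i.e. failing at least one test."""
    if not mutants:
        return 0.0
    killed = sum(1 for mutant in mutants if not passes_all(mutant, tests))
    return killed / len(mutants)


reference = {"n_solids": 1, "volume": 1.00}
mutants = [
    {"n_solids": 2, "volume": 1.00},  # split into two solids
    {"n_solids": 1, "volume": 0.80},  # material removed
    {"n_solids": 1, "volume": 1.00},  # indistinguishable on these properties
]

# Round 1: a suite with a single topological test.
suite: List[Test] = [lambda s: s["n_solids"] == 1]
score_round_1 = mutation_score(mutants, suite)

# Refinement: add a volume comparison against the reference model.
suite.append(lambda s: abs(s["volume"] - reference["volume"]) <= 0.01)
score_round_n = mutation_score(mutants, suite)
```

In this toy data the third mutant matches the reference on every property the tests inspect, so no suite over those properties can kill it. A score below 1 therefore does not always mean the tests are weak.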

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 2 internal anchors

[1] Kamel Alrashedy, Pradyumna Tambwekar, Zulfiqar Zaidi, Megan Langwasser, Wei Xu, and Matthew Gombolay. Generating cad code with vision-language models for 3d designs. ICLR, 2025.
[2] Anthropic. The claude model spec. 2025. URL https://docs.anthropic.com/en/docs/about-claude/claude-model-spec
[3] Thomas Arts, John Hughes, Joakim Johansson, and Ulf Wiger. Testing telecoms software with quviq quickcheck. In Proceedings of the 2006 ACM SIGPLAN Workshop on Erlang, 2006.
[4] Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie J. Cai, Michael Terry, Quoc V. Le, and Charles Sutton. Program synthesis with large language models. ArXiv, 2021.
[5] Autodesk Inc. Fusion 360. https://www.autodesk.com/products/fusion-360, 2026. Integrated CAD, CAM, and CAE platform.
[6] Akshay Badagabettu, Sai Sravan Yarlagadda, and Amir Barati Farimani. Query2cad: Generating cad models using natural language queries. ArXiv, 2024.
[7] CadQuery Contributors. Cadquery. https://cadquery.readthedocs.io/, 2024. Python-based parametric CAD scripting library.
[8] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pondé, Jared Kaplan, Harrison Edwards, Yura Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mo Bavarian, Clemens Winter, Phi... (2021)
[9] FreeCAD Community. Freecad, 2024. URL https://www.freecad.org
[10] Dassault Systèmes. Solidworks. https://www.solidworks.com, 2026. Professional CAD software for solid modeling and mechanical design.
[11] Richard A DeMillo, Richard J Lipton, and Frederick G Sayward. Program mutation: A new approach to program testing. Infotech State of the Art Report, Software Testing, 1979.
[12] Elona Dupont, Kseniya Cherenkova, Dimitrios Mallis, Gleb Gusev, Anis Kacem, and Djamila Aouada. Transcad: A hierarchical transformer for cad sequence inference from point clouds. In ECCV, 2024.
[13] Yandong Guan, Xilin Wang, Xingxi Ming, Jing Zhang, Dong Xu, and Qian Yu. Cad-coder: Text-to-cad generation with chain-of-thought and geometric reward. ArXiv, 2025.
[14] Dan Hendrycks, Steven Basart, Saurav Kadavath, Mantas Mazeika, Akul Arora, Ethan Guo, Collin Burns, Samir Puranik, Horace He, Dawn Xiaodong Song, and Jacob Steinhardt. Measuring coding challenge competence with apps. ArXiv, abs/2105.09938, 2021.
[15] John Hughes, Benjamin C. Pierce, Thomas Arts, and Ulf Norell. Mysteries of dropbox: Property-based testing of a distributed synchronization service. In 2016 IEEE International Conference on Software Testing, Verification and Validation (ICST), 2016.
[16] Yue Jia and Mark Harman. An analysis and survey of the development of mutation testing. IEEE Transactions on Software Engineering, 2011.
[17] Li Jiahao, Ma Weijian, Li Xueyang, Lou Yunzhong, Zhou Guichun, and Zhou Xiangdong. Cad-llama: Leveraging large language models for computer-aided design parametric 3d model generation. In CVPR, 2025.
[18] Ahmet Serdar Karadeniz, Dimitrios Mallis, Nesryne Mejri, Kseniya Cherenkova, Anis Kacem, and Djamila Aouada. Davinci: A single-stage architecture for constrained cad sketch inference. In BMVC, 2024.
[19] Ahmet Serdar Karadeniz, Dimitrios Mallis, Nesryne Mejri, Kseniya Cherenkova, Anis Kacem, and Djamila Aouada. Picasso: A feed-forward framework for parametric inference of cad sketches via rendering self-supervision. In WACV, 2025.
[20] Ahmet Serdar Karadeniz, Dimitrios Mallis, Danila Rukhovich, Kseniya Cherenkova, Anis Kacem, and Djamila Aouada. Micadangelo: Fine-grained reconstruction of constrained cad models from 3d scans. NeurIPS, 2025.
[21] Mohammad Sadil Khan, Sankalp Sinha, Talha Uddin, Didier Stricker, Sk Aziz Ali, and Muhammad Zeshan Afzal. Text2cad: Generating sequential cad designs from beginner-to-expert level text prompts. NeurIPS, 2024.
[22] Maksim Kolodiazhnyi, Denis Tarasov, Dmitrii Zhemchuzhnikov, Alexander Nikulin, Ilya Zisman, Anna Vorontsova, Anton Konushin, Vladislav Kurenkov, and Danila Rukhovich. cadrille: Multi-modal cad reconstruction with online reinforcement learning. ArXiv, 2025.
[23] Sumith Kulal, Panupong Pasupat, Kartik Chandra, Mina Lee, Oded Padon, Alexander Aiken, and Percy Liang. Spoc: Search-based pseudocode to code. NIPS, 2019.
[24] Marie-Anne Lachaux, Baptiste Rozière, Lowik Chanussot, and Guillaume Lample. Unsupervised translation of programming languages. NIPS, 2020.
[25] J. Lambourne, Karl D. D. Willis, Pradeep Kumar Jayaraman, Aditya Sanghi, Peter Meltzer, and Hooman Shayani. Brepnet: A topological message passing system for solid models. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[26] Xueyang Li, Yu Song, Yunzhong Lou, and Xiangdong Zhou. Cad translator: An effective drive for text to 3d parametric computer-aided design generative modeling. In ACM Multimedia, 2024.
[27] Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, Rémi Leblond, Tom Eccles, James Keeling, Felix Gimeno, Agustin Dal Lago, Thomas Hubert, Peter Choy, Cyprien de Masson d’Autume, Igor Babuschkin, Xinyun Chen, Po-Sen Huang, Johannes Welbl, Sven Gowal, Alexey Cherepanov, James Molloy, Daniel Jaymin Mankowitz, Esme Sutherland Robs... (2022)
[28] David R. MacIver, Zac Hatfield-Dodds, and Many Other Contributors. Hypothesis: A new approach to property-based testing. Journal of Open Source Software, 4(43):1891, 2019. doi: 10.21105/joss.01891. URL https://doi.org/10.21105/joss.01891
[29] Dimitrios Mallis, Ali Sk Aziz, Elona Dupont, Kseniya Cherenkova, Ahmet Serdar Karadeniz, Mohammad Sadil Khan, Anis Kacem, Gleb Gusev, and Djamila Aouada. Sharp challenge 2023: Solving cad history and parameters recovery from point clouds and 3d scans. overview, datasets, metrics, and baselines. In CVPRW, 2023.
[30] Dimitrios Mallis, Ahmet Serdar Karadeniz, Sebastian Cavada, Danila Rukhovich, Niki Foteinopoulou, Kseniya Cherenkova, Anis Kacem, and Djamila Aouada. Cad-assistant: Tool-augmented vllms as generic cad task solvers. ICCV, 2025.
[31] Felix Ocker, Stefan Menzel, Ahmed Sadik, and Thiago Rios. From idea to cad: A language model-driven multi-agent system for collaborative design. In ArXiv, 2025.
[32] OpenAI. Gpt-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
[33] Mike Papadakis, Marinos Kintis, Jie M. Zhang, Yue Jia, Yves Le Traon, and Mark Harman. Chapter six - mutation testing advances: An analysis and survey. Adv. Comput., 2019.
[34] Danila Rukhovich, Elona Dupont, Dimitrios Mallis, Kseniya Cherenkova, Anis Kacem, and Djamila Aouada. Cad-recode: Reverse engineering cad code from point clouds. ICCV, 2025.
[35] Ari Seff, Wenda Zhou, Nick Richardson, and Ryan P Adams. Vitruvion: A generative model of parametric cad sketches. In ICLR, 2022.
[36] Mikaela Angelina Uy, Yen-Yu Chang, Minhyuk Sung, Purvi Goel, J. Lambourne, Tolga Birdal, and Leonidas J. Guibas. Point2cyl: Reverse engineering 3d objects from point clouds to extrusion cylinders. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[37] Vasudev Vikram, Caroline Lemieux, and Rohan Padhye. Can large language models write good property-based tests? ArXiv, 2024.
[38] Ruiyu Wang, Yu Yuan, Shizhao Sun, and Jiang Bian. Text-to-cad generation through infusing visual feedback in large language models. ICLR, 2025.
[39] Rundi Wu, Chang Xiao, and Changxi Zheng. Deepcad: A deep generative network for computer-aided design models. In CVPR, 2021.
[40] Tong Wu, Guandao Yang, Zhibing Li, Kai Zhang, Ziwei Liu, Leonidas J. Guibas, Dahua Lin, and Gordon Wetzstein. Gpt-4v(ision) is a human-aligned evaluator for text-to-3d generation. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[41] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. ArXiv, 2022.
[42] Mohsen Yavartanoo, Sangmin Hong, Reyhaneh Neshatavar, and Kyoung Mu Lee. Text2cad: Text to 3d cad generation via technical drawings. ArXiv, abs/2411.06206, 2024.
[43] Yu Yuan, Shizhao Sun, Qi Liu, and Jiang Bian. Cad-editor: A locate-then-infill framework with automated training data synthesis for text-based cad editing. ICML, 2025.
[44] Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, Renliang Sun, Ming Yin, Boyuan Zheng, Zhenzhu Yang, Yibo Liu, Wenhao Huang, Huan Sun, Yu Su, and Wenhu Chen. Mmmu: A massive multi-discipline multimodal understanding and reasoning benchmark for exp... (2023)
[45] Zhanwei Zhang, Shizhao Sun, Wenxiao Wang, Deng Cai, and Jiang Bian. Flexcad: Unified and versatile controllable cad generation with fine-tuned large language models. In ICLR, 2025.

Text-to-CAD Evaluation with CADTESTS: Supplementary Material

This supplementary material includes additional details and illustrations that were not included in the main paper...

Verifies the shape has exactly one shell

In total, 1,275 CAD mutants were generated. More examples of mutated CAD BReps are shown in Fig. 12.