Self-Improving CAD Generation Agents with Finite Element Analysis as Feedback

Guijin Son; Jehyun Park; Seyeon Park; Sunghee Ahn; Youngjae Yu

arXiv: 2605.17448 · v1 · pith:JMGKQMSG · submitted 2026-05-17 · 💻 cs.GR · cs.CL


Pith reviewed 2026-05-19 22:36 UTC · model grok-4.3

classification 💻 cs.GR cs.CL

keywords CAD generation · finite element analysis · STEP files · LLM agents · self-improving loops · geometric reconstruction · engineering validation


The pith

CAD agents improve designs when finite element analysis and blueprint feedback close the loop between generation and engineering checks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current LLM agents for CAD do not reliably produce fully assembled multi-part files that pass strict validation: in the paper's main first-attempt sweep, no agent produced a single strict-passing artifact. The paper reframes the task as generating complete STEP files from engineering briefs and then using finite element analysis to score them against physical requirements. It adds a text blueprint schema and a 21-view image renderer as extra signals so the agent can inspect and revise its own output. These additions raise geometric accuracy on standard reconstruction benchmarks. The result matters because it moves AI CAD tools from visual plausibility toward artifacts that are checked against structural constraints.

Core claim

The paper claims that finite element analysis on generated STEP files, paired with a novel text-only blueprint schema and a 21-view image renderer, supplies usable feedback that lets Codex and Claude Code agents self-improve. For the GPT-5.5/xhigh configuration, geometric reconstruction rises from 0.444 to 0.592 Box-IoU on S2O and from 0.397 to 0.505 on Fusion360, while outputs move toward higher rates of meeting typed engineering requirements.
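To put those lifts in proportion, the short Python snippet below converts the four reported Box-IoU values into absolute and relative gains. It uses only the numbers quoted above; the helper name `gains` is ours.

```python
# Box-IoU values quoted above for GPT-5.5/xhigh: (baseline, with feedback tools).
REPORTED = {
    "S2O": (0.444, 0.592),
    "Fusion360": (0.397, 0.505),
}


def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain, gain relative to the baseline)."""
    absolute = after - before
    return absolute, absolute / before


for name, (before, after) in REPORTED.items():
    absolute, relative = gains(before, after)
    print(f"{name}: +{absolute:.3f} Box-IoU ({relative:.1%} relative)")
```

This prints a gain of +0.148 (33.3% relative) on S2O and +0.108 (27.2% relative) on Fusion360.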

What carries the argument

The closed-loop agent that feeds finite element analysis results plus blueprint and multi-view image signals back into the next generation step to produce assembled multi-part STEP files.
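A minimal Python sketch of that loop follows. It assumes the agent and the deterministic controller can each be modeled as a single callable. The names `closed_loop`, `generate`, `evaluate` and `Feedback` are hypothetical and are not the paper's API. The division of labor follows the Figure 2 caption, in which the controller owns execution, measurement and validation, and the agent owns design decisions and CAD-code repair.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Feedback:
    """Signals returned by the controller for one candidate artifact (hypothetical fields)."""
    requirement_results: dict[str, bool]  # typed requirement id -> pass/fail from FEA
    notes: list[str] = field(default_factory=list)  # e.g. blueprint or rich-view remarks

    @property
    def strict_pass(self) -> bool:
        """Strict pass: at least one requirement, and every requirement met."""
        return bool(self.requirement_results) and all(self.requirement_results.values())


def closed_loop(
    brief: str,
    generate: Callable[[str, list[str]], str],  # agent: brief + feedback history -> STEP path
    evaluate: Callable[[str], Feedback],  # controller: STEP path -> FEA and render feedback
    max_rounds: int = 3,
) -> tuple[str, Feedback, int]:
    """Generate, validate, and revise until a strict pass or the round budget runs out."""
    history: list[str] = []
    artifact, feedback = "", Feedback({})
    for round_idx in range(1, max_rounds + 1):
        artifact = generate(brief, history)
        feedback = evaluate(artifact)
        if feedback.strict_pass:
            return artifact, feedback, round_idx
        failed = [rid for rid, ok in feedback.requirement_results.items() if not ok]
        history.append(f"round {round_idx}: failed {failed}; notes: {feedback.notes}")
    return artifact, feedback, max_rounds


# Toy stand-ins: each revision writes a new file name, and R2 passes after one repair.
def toy_generate(brief: str, history: list[str]) -> str:
    return f"{brief}_v{len(history)}.step"


def toy_evaluate(path: str) -> Feedback:
    return Feedback({"R1": True, "R2": path.endswith("_v1.step")})
```

Feeding back only the failed requirement ids and free-text notes is a simplification. The paper's retries also draw on blueprint and rich-view signals, and their exact format is not given here.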

If this is right

No first-attempt agent run meets all strict requirements, but the added signals measurably raise the fraction of satisfied constraints.
Geometric reconstruction improves on both S2O and Fusion360 without changing the base model.
CAD generation becomes an iterative process checked against physical and structural criteria rather than reference proximity alone.
The same feedback loop can be applied to any agent that outputs STEP files for engineering review.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The method could be tested on additional simulation domains such as thermal or fluid analysis to see if the same loop generalizes.
Combining the blueprint and image signals with constraint solvers might further reduce the gap between generated files and production-ready parts.
Similar self-correction patterns may appear in other generative tasks that currently lack quantitative physical feedback.

Load-bearing premise

Finite element analysis performed on the generated STEP files gives a reliable enough signal of real engineering fitness.

What would settle it

Compare FEA-passing designs against either physical prototypes or higher-fidelity simulations to see whether the reported compliance gains disappear.

Figures

Figures reproduced from arXiv: 2605.17448 by Guijin Son, Jehyun Park, Seyeon Park, Sunghee Ahn, Youngjae Yu.

**Figure 2.** Overview of the CAD-agent pipeline. A free-form engineering brief is converted into an optional schema-v4 blueprint, decomposed into construction units, assembled into a STEP artifact by a deterministic controller, and revised using rich-view inspection and FEA feedback. The controller owns execution, measurement, composition, and validation, while the agent owns design decisions and CAD-code repair. …

**Figure 3.** Grouped nine-view sample for a generated wheel hub drawn from the 21-view rich-view set. The full set combines 12 axis-aligned and isometric views for exterior coverage, six close-ups for small features, and three alpha-blended x-ray views for internal mating and clearance. The strip contrasts conventional six-view coverage with selected additional views. The left close-up makes the bolt circle, concentric…

**Figure 4.** Representative S2O target items used to synthesize natural-language prompts. From Appendix C (Sample S2O and Fusion 360 evaluation prompts): For the geometric benchmarks, each evaluation prompt is generated from the target rendering and structured metadata rather than written directly by the authors. Figures 4 and 5 show representative target items, and the boxes below give the full corresponding natural-language prompts s…

**Figure 5.** Representative Fusion 360 target items used to synthesize natural-language prompts. From Appendix C.2 (Fusion 360 prompt samples), full generated prompt for the Fusion 360 robotic chassis: This is a fabricated steel electromechanical chassis consisting of numerous sheet-like and machined bodies assembled into a rigid open frame. The primary load-bearing members are two thick side plates with generous internal cutouts to reduce mas…

**Figure 6.** First-attempt quality versus one-step FEA repair gain on Hephaestus-CCX. Each point is a…

**Figure 7.** Strict-pass retries where the agent changes the physical load-bearing structure. The steel column becomes a braced four-chord box column, the HPVC roll-protection system sheds excess surrogate mass while preserving stiffness, and the UGV tool arm becomes a hollow box beam with cleaner root and tip selector faces. Structural retuning. The AISC 360-22 steel column is the clearest load-path change and corres…

**Figure 8.** Strict-pass retries where the decisive change is simplification or hidden mass-property repair. The launcher removes fragile over-detailed geometry and routes load through a simpler body. The rollcage simplifies an unstable dense cage surrogate into an FIA-compatible tube layout with compliance metadata. The spacecraft panel looks similar in surface view, but the field map reveals the density correction th…

**Figure 9.** Strict-pass retries dominated by checker-contract repair. These artifacts already satisfy much of the underlying physics, but fail strict grading until the generated metadata exposes the required metric names, mass fields, selector bindings, or mesh-derived mass aliases. … not the shape, but the bridge between the generated artifact and the evaluator’s typed engineering contract.

**Figure 10.** Per-item engineering-domain distribution across the 466-case candidate pool, with the raw domain field grouped into thirteen broad buckets. Each bar is split into single-part and multi-part segments; the right-of-bar number is the total count and percentage of the pool. Aerospace and ground-vehicle cases together account for over half the pool, but every bucket is represented in both subsets, which is wha…

**Figure 11.** Candidate-pool brief distribution by catalog, single-part vs multi-part. The intercollegiate catalog (i) and the foundational A-series (a) account for the largest share of the pool; engineering standards (s) and patents/datasheets (pt) provide the bulk of the strict-spec briefs.

**Figure 12.** Distribution of requirement type across all pass/fail criteria in the 466-case pool. Structural-analysis criteria dominate; buckling, vibration, thermal, dimensional, geometric, and material-compliance checks each contribute a meaningful share. The two smallest types (fluid, radiation) sit outside CalculiX’s scope and are tracked as future-work analyses.

**Figure 13.** Catalog coverage of the curated 50-case benchmark against the full 466-case candidate pool. Bars are the pool count; red overlay is the count selected for Hephaestus-CCX. The selection over-samples engineering-standards (s) and aerospace (sa) catalogs because those briefs exercise the strictest pass/fail rubrics, and samples the I-series and A-series lightly relative to their pool share to keep the curate…
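Figure 12 groups requirements by type and notes that fluid and radiation criteria fall outside CalculiX's scope. The Python sketch below shows one hypothetical way a harness could dispatch requirement types. Several parts of it are our own assumptions and do not describe Hephaestus-CCX:

- the assignment of types to the CalculiX step keywords `*STATIC`, `*BUCKLE`, `*FREQUENCY` and `*HEAT TRANSFER`;
- the treatment of dimensional, geometric and material-compliance checks as solver-free;
- the bucket spellings.

```python
# Hypothetical dispatch of requirement types (Figure 12 buckets) to CalculiX analysis steps.
CCX_STEP = {
    "structural analysis": "*STATIC",
    "buckling analysis": "*BUCKLE",
    "vibration analysis": "*FREQUENCY",
    "thermal analysis": "*HEAT TRANSFER",
}
# Assumed to be checked from geometry or metadata without an FE solve.
NO_SOLVE = {"dimensional", "geometric", "material compliance"}
# Tracked but not evaluated, per the Figure 12 caption.
DEFERRED = {"fluid", "radiation"}


def route(req_type: str) -> str:
    """Return the CalculiX step keyword, or a non-solver category, for a requirement type."""
    key = req_type.strip().lower()
    if key in CCX_STEP:
        return CCX_STEP[key]
    if key in NO_SOLVE:
        return "no-solve"
    if key in DEFERRED:
        return "deferred"
    return "unknown"


for t in ["Structural analysis", "vibration analysis", "dimensional", "fluid", "acoustics"]:
    print(f"{t} -> {route(t)}")
```

Deferred types are kept explicit rather than silently skipped. This preserves the caption's distinction between criteria that are tracked and criteria that are actually evaluated.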

Original abstract

Computer-aided design (CAD) is the backbone of modern industrial design, yet learned CAD generators still fall short of real engineering pipelines: they neither iterate like engineers nor evaluate what engineering requires. Prior work has treated CAD generation as two disjoint steps, part synthesis and assembly, where the former is graded by proximity to a gold reference and the latter, when handled at all, is reduced to a separate constraint solving step. In this work, we introduce a more industry-native task formulation that requires a model to produce a fully assembled multi-part STEP file from a free-form engineering brief, which is then validated via finite element analysis (FEA). FEA validation reveals that Codex (GPT-5.5) and Claude Code (Opus-4.7) agents do not produce a single strict-passing artifact in the main first-attempt sweep, with the best configuration meeting only about 20% of typed requirements on average. Moreover, we introduce two additional supervision signals, a novel text-only blueprint schema and a 21-view image renderer that aids the agent's visual inspection, that better align the generation loop with how engineers iterate in practice. On S2O and Fusion360, the same feedback tools improve geometric reconstruction, with GPT-5.5/xhigh rising from 0.444 to 0.592 Box-IoU on S2O and from 0.397 to 0.505 on Fusion360. Together these signals move CAD programs toward artifacts that are not only visually plausible but also checked against physical and structural requirements.
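The abstract does not define Box-IoU. One common reading is the intersection-over-union of the axis-aligned bounding boxes of the generated and reference shapes. The Python sketch below implements that reading only, as an assumption. The paper's metric may differ, for example in how multi-part assemblies are matched, aligned or normalized.

```python
from math import prod

# Assumption: a box is ((xmin, ymin, zmin), (xmax, ymax, zmax)).
Point = tuple[float, float, float]
Box = tuple[Point, Point]


def bounding_box(points: list[Point]) -> Box:
    """Axis-aligned bounding box of a non-empty point set (e.g. mesh vertices)."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def box_iou(a: Box, b: Box) -> float:
    """Volume of the boxes' intersection divided by the volume of their union."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    overlap = [
        max(0.0, min(ah, bh) - max(al, bl))
        for al, ah, bl, bh in zip(a_lo, a_hi, b_lo, b_hi)
    ]
    inter = prod(overlap)
    vol_a = prod(hi - lo for lo, hi in zip(a_lo, a_hi))
    vol_b = prod(hi - lo for lo, hi in zip(b_lo, b_hi))
    union = vol_a + vol_b - inter
    return inter / union if union > 0 else 0.0


unit = ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
shifted = ((0.5, 0.0, 0.0), (1.5, 1.0, 1.0))
print(box_iou(unit, shifted))  # half-overlapping unit cubes: 0.5 / 1.5, about 0.333
```

Under this reading, a move from 0.444 to 0.592 means the generated and reference boxes overlap substantially more, relative to their combined extent. Box IoU alone says nothing about internal features or mating.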

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper sets up a CAD generation task with FEA validation and adds blueprint plus multi-view feedback, but only reports geometric IoU gains without showing those translate to better FEA outcomes.


The main point is that this work pushes CAD agents toward producing assembled STEP files that can be checked for structural validity with finite element analysis, yet the quantitative results stay on geometry and leave the physical improvement claim untested. They define an end-to-end task from engineering brief to full multi-part model, run FEA on the output, and show that leading agents produce zero strict passes while meeting only about 20 percent of requirements on average. They then introduce a text blueprint schema and a 21-view renderer as extra signals in the loop, which lift Box-IoU on S2O and Fusion360 for the GPT-5.5 setup. That combination of task framing and specific feedback tools is the clearest new element.

It does a decent job highlighting how current generators ignore engineering constraints and sketching a more iterative, visually grounded process that matches how designers actually work. The baseline failure numbers are also straightforward and worth having in the literature.

The soft spot is the gap between the geometric lifts and any FEA or constraint-passing numbers. The abstract and results give before-and-after IoU but no corresponding before-and-after FEA scores, violation counts, or pass rates, so the claim that the new signals move designs toward real engineering requirements rests on an unshown correlation rather than direct evidence. Minor details like dataset construction or exact agent prompting would also help, but the missing FEA link is the load-bearing one.

This is aimed at groups working on generative design, agent loops, and simulation-in-the-loop training. Readers who want concrete ideas for blueprint representations or multi-view inspection could pick up useful pieces, while anyone focused on verified structural performance would need the extra metrics. It is worth sending to peer review because the task is timely and the failure case is clearly documented, even if the positive results need tightening.

Referee Report

2 major / 1 minor

Summary. The paper formulates CAD generation as producing fully assembled multi-part STEP files from free-form engineering briefs, with validation via finite element analysis (FEA). It reports that Codex (GPT-5.5) and Claude Code (Opus-4.7) agents produce no strict-passing artifacts in a first-attempt sweep, satisfying only ~20% of typed requirements on average. The authors introduce a text-only blueprint schema and 21-view image renderer as additional feedback signals; these yield Box-IoU gains from 0.444 to 0.592 on S2O and from 0.397 to 0.505 on Fusion360 for the GPT-5.5/xhigh configuration. The central thesis is that these signals, combined with FEA feedback, move outputs toward artifacts that satisfy real engineering requirements.

Significance. If the core premise holds, the work could meaningfully advance self-improving CAD agents by closing the gap between geometric plausibility and physical/structural validity. The task reformulation and explicit use of FEA as a feedback loop represent a concrete step beyond reference-based metrics; the reported agent failure rates and the two new supervision signals are useful empirical anchors for the field.

major comments (2)

[Abstract / Results] The claim that the blueprint schema and 21-view renderer improve engineering fidelity rests on an untested correlation. Geometric Box-IoU lifts are quantified, yet no before/after FEA scores, constraint-violation counts, or change in the fraction of artifacts meeting typed requirements are reported; without these, the causal link between the new signals and satisfaction of physical requirements cannot be assessed.
[Evaluation] Evaluation protocol: the manuscript states that FEA validation reveals zero strict-passing artifacts and ~20% average requirement satisfaction, but provides no table or section detailing how FEA outputs are mapped to the typed requirements or how the feedback loop uses FEA scores to drive self-improvement iterations.
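The two headline numbers measure different things. A strict pass needs every typed requirement on an artifact to hold, while the ~20% figure averages the fraction of requirements met. The Python sketch below makes that distinction concrete under an assumed schema: a named FEA metric, a comparator and a threshold. The `Requirement` fields and metric names are hypothetical. Counting a missing metric as a failure is also our assumption, though it matches the checker-contract failures described in the Figure 9 caption.

```python
import operator
from dataclasses import dataclass

COMPARATORS = {"<=": operator.le, ">=": operator.ge}


@dataclass(frozen=True)
class Requirement:
    """One typed requirement: an FEA-derived metric compared with a numeric threshold."""
    rid: str
    metric: str  # hypothetical metric name, e.g. "max_von_mises_MPa"
    op: str  # "<=" for upper limits, ">=" for lower limits
    threshold: float


def check(req: Requirement, metrics: dict[str, float]) -> bool:
    """A requirement whose metric was never reported cannot bind, so it fails."""
    if req.metric not in metrics:
        return False
    return COMPARATORS[req.op](metrics[req.metric], req.threshold)


def score(reqs: list[Requirement], metrics: dict[str, float]) -> tuple[float, bool]:
    """Return (fraction of requirements met, strict pass meaning all of them are met)."""
    results = [check(r, metrics) for r in reqs]
    fraction = sum(results) / len(results) if results else 0.0
    return fraction, bool(results) and all(results)


reqs = [
    Requirement("R1", "max_von_mises_MPa", "<=", 250.0),
    Requirement("R2", "max_displacement_mm", "<=", 2.0),
    Requirement("R3", "first_mode_Hz", ">=", 30.0),
    Requirement("R4", "mass_kg", "<=", 12.0),
]
metrics = {"max_von_mises_MPa": 180.0, "max_displacement_mm": 3.1, "first_mode_Hz": 42.0}
print(score(reqs, metrics))  # (0.5, False): R2 exceeds its limit, R4 has no mass reported
```

Averaging the first element over a benchmark gives a mean requirement-satisfaction rate, and averaging the second gives a strict-pass rate. This is how an artifact can meet a sizable share of its requirements and still fail strict grading.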

minor comments (1)

[Abstract] The abstract would benefit from a concise definition or example of the 'typed requirements' used in the 20% figure.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback. The points raised highlight opportunities to strengthen the empirical support for our claims and to clarify the evaluation protocol. We address each major comment below and commit to revisions that directly respond to the concerns.


Referee: [Abstract / Results] Abstract and results: the claim that the blueprint schema and 21-view renderer improve engineering fidelity rests on an untested correlation. Geometric Box-IoU lifts are quantified, yet no before/after FEA scores, constraint-violation counts, or change in the fraction of artifacts meeting typed requirements are reported; without these, the causal link between the new signals and satisfaction of physical requirements cannot be assessed.

Authors: We agree that the manuscript would benefit from direct before-and-after metrics on FEA outcomes and requirement satisfaction to substantiate the link to physical validity. The reported Box-IoU gains demonstrate improved geometric fidelity, which we view as a prerequisite for engineering requirements, but we did not quantify the corresponding changes in FEA pass rates or typed-requirement compliance for the blueprint and multi-view configurations. In the revised version we will re-evaluate the GPT-5.5/xhigh and Claude configurations with and without the new signals, reporting delta values for FEA scores, constraint-violation counts, and the fraction of artifacts meeting typed requirements. These additions will make the causal contribution of the supervision signals explicit. revision: yes
Referee: [Evaluation] Evaluation protocol: the manuscript states that FEA validation reveals zero strict-passing artifacts and ~20% average requirement satisfaction, but provides no table or section detailing how FEA outputs are mapped to the typed requirements or how the feedback loop uses FEA scores to drive self-improvement iterations.

Authors: We acknowledge that the current text describes the FEA integration at a high level without a dedicated mapping table or explicit iteration diagram. The manuscript does define the typed requirements and states that FEA is used for validation and feedback, yet the precise translation from FEA quantities (e.g., von Mises stress thresholds, displacement limits) to requirement satisfaction and the prompt-update mechanism for self-improvement are not tabulated. In revision we will insert a new subsection (with accompanying table and pseudocode) that (1) lists the FEA-derived criteria for each typed requirement and (2) details how the scalar FEA scores are injected into the agent’s next-turn prompt to close the self-improvement loop. revision: yes
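The rebuttal names von Mises stress thresholds and displacement limits as example FEA-derived criteria. The Python sketch below illustrates both halves of the promised subsection. It computes the standard von Mises equivalent stress from principal stresses. It then turns measured-versus-limit pairs into plain-text lines that could be appended to the agent's next-turn prompt. The metric names, the upper-limit-only checks and the line format are illustrative assumptions. The paper's actual prompt-update mechanism is not described in this text.

```python
from math import sqrt


def von_mises(s1: float, s2: float, s3: float) -> float:
    """Von Mises equivalent stress from the three principal stresses."""
    return sqrt(((s1 - s2) ** 2 + (s2 - s3) ** 2 + (s3 - s1) ** 2) / 2.0)


def feedback_lines(results: dict[str, tuple[float, float, str]]) -> list[str]:
    """Format (measured, non-zero upper limit, unit) triples as pass/fail prompt lines."""
    lines = []
    for name, (value, limit, unit) in sorted(results.items()):
        status = "PASS" if value <= limit else "FAIL"
        margin = (limit - value) / limit  # positive means headroom below the limit
        lines.append(
            f"{name}: {value:.3g} {unit} vs limit {limit:.3g} {unit} -> {status} (margin {margin:+.0%})"
        )
    return lines


fea = {
    "max_von_mises_MPa": (von_mises(180.0, 0.0, 0.0), 250.0, "MPa"),
    "max_displacement_mm": (3.1, 2.0, "mm"),
}
print("\n".join(feedback_lines(fea)))
```

With these inputs the sketch reports the displacement as a failure with a -55% margin and the stress as a pass with +28% headroom. A signed margin gives the agent a direction and a rough magnitude for its next repair, not just a verdict.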

Circularity Check

0 steps flagged

No circularity: empirical IoU gains reported from added feedback signals without any derivation or fit reducing to inputs.

Full rationale

The paper describes an empirical task formulation for CAD generation from engineering briefs, followed by FEA validation and introduction of blueprint and 21-view image feedback. Reported results consist of direct measurements: zero strict-passing artifacts in baseline sweeps, ~20% requirement compliance, and specific Box-IoU lifts (0.444 to 0.592 on S2O; 0.397 to 0.505 on Fusion360) when the new signals are added. No equations, parameter fittings, self-definitional loops, or load-bearing self-citations appear in the provided text that would make any claimed improvement equivalent to its own inputs by construction. The evaluation chain relies on external geometric and FEA metrics that remain independent of the generation process.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract was available, so the ledger is necessarily incomplete; no explicit free parameters, axioms, or invented entities are stated in the provided text.

pith-pipeline@v0.9.0 · 5820 in / 1293 out tokens · 46611 ms · 2026-05-19T22:36:46.130529+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: "FEA validation reveals that Codex ... do not produce a single strict-passing artifact ... 21-view image renderer ... improve geometric reconstruction, with GPT-5.5/xhigh rising from 0.444 to 0.592 Box-IoU"

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Paper passage: "rich-view image judge renders the STEP from 21 calibrated views ... finite-element feedback from CalculiX"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 3 internal anchors

[1] Sandhini Agarwal, Lama Ahmad, Jason Ai, Sam Altman, Andy Applebaum, Edwin Arbus, Rahul K Arora, Yu Bai, Bowen Baker, Haiming Bao, et al. gpt-oss-120b & gpt-oss-20b model card. arXiv preprint arXiv:2508.10925, 2025.
[2] Kamel Alrashedy, Pradyumna Tambwekar, Zulfiqar Zaidi, Megan Langwasser, Wei Xu, and Matthew Gombolay. Generating cad code with vision-language models for 3d designs. arXiv preprint arXiv:2410.05340, 2024.
[3] Anthropic. Developing a computer use model. https://www.anthropic.com/news/3-5-models-and-computer-use, 2024. Introduces Claude 3.5 Sonnet computer use capability.
[4] Jesse Barkley, Rumi Loghmani, and Amir Barati Farimani. Cadsmith: Multi-agent cad generation with programmatic geometric validation. arXiv preprint arXiv:2603.26512, 2026.
[5] Geoffrey Boothroyd, Peter Dewhurst, and Winston A Knight. Product design for manufacture and assembly. CRC press, 2010.
[6] Joya Chen, Zhaoyang Lv, Shiwei Wu, Kevin Qinghong Lin, Chenan Song, Difei Gao, Jia-Wei Liu, Ziteng Gao, Dongxing Mao, and Mike Zheng Shou. Videollm-online: Online video large language model for streaming video. In CVPR, 2024.
[7] Guido Dhondt and Klaus Wittig. CalculiX: A free software three-dimensional structural finite element program. http://www.calculix.de/, 2024. Version 2.22.
[8] Anna C Doris, Ferdous Alam, Amin Heyrani Nobari, and Faez Ahmed. Cad-coder: An open-source vision-language model for computer-aided design code generation. Journal of Mechanical Design, 148(7):071702, 2026.
[9] Fengxiao Fan, Jingzhe Ni, Xiaolong Yin, Sirui Wang, Xingyu Lu, Qiang Zou, Ruofeng Tong, Min Tang, and Peng Du. Caddesigner: Conceptual design of cad models based on general-purpose agent. arXiv preprint arXiv:2508.01031, 2025.
[10] Haoqiang Fan, Hao Su, and Leonidas J Guibas. A point set generation network for 3d object reconstruction from a single image. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 605–613, 2017.
[11] Google DeepMind. Gemini 3 pro image (Nano Banana Pro). https://deepmind.google/models/gemini-image/pro/, 2025. Image generation and editing model built on Gemini 3 Pro.
[12] Prashant Govindarajan, Davide Baldelli, Jay Pathak, Quentin Fournier, and Sarath Chandar. Cadmium: Fine-tuning code language models for text-driven sequential cad design. arXiv preprint arXiv:2507.09792, 2025.
[13] Seokhee Hong, Sunkyoung Kim, Guijin Son, Soyeon Kim, Yeonjung Hong, and Jinsik Lee. From kmmlu-redux to kmmlu-pro: A professional korean benchmark suite for llm evaluation. arXiv preprint arXiv:2507.08924, 2025.
[14] Denys Iliash, Hanxiao Jiang, Yiming Zhang, Manolis Savva, and Angel X Chang. S2o: Static to openable enhancement for articulated 3d objects. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 6785–6795, 2026.
[15] Benjamin Jones, Dalton Hildreth, Duowen Chen, Ilya Baran, Vladimir G Kim, and Adriana Schulz. Automate: A dataset and learning approach for automatic mating of cad assemblies. ACM Transactions on Graphics (TOG), 40(6):1–18, 2021.
[16] Mohammad S Khan, Sankalp Sinha, Talha U Sheikh, Didier Stricker, Sk A Ali, and Muhammad Z Afzal. Text2cad: Generating sequential cad designs from beginner-to-expert level text prompts. Advances in Neural Information Processing Systems, 37:7552–7579, 2024.
[17] Mohammad Sadil Khan, Elona Dupont, Sk Aziz Ali, Kseniya Cherenkova, Anis Kacem, and Djamila Aouada. Cad-signet: Cad language inference from point clouds using layer-wise sketch instance guided attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4713–4722, 2024.
[18] Maksim Kolodiazhnyi, Denis Tarasov, Dmitrii Zhemchuzhnikov, Alexander Nikulin, Ilya Zisman, Anna Vorontsova, Anton Konushin, Vladislav Kurenkov, and Danila Rukhovich. cadrille: Multi-modal cad reconstruction with reinforcement learning. In The Fourteenth International Conference on Learning Representations, 2025.
[19] Jiahao Li, Weijian Ma, Xueyang Li, Yunzhong Lou, Guichun Zhou, and Xiangdong Zhou. Cad-llama: leveraging large language models for computer-aided design parametric 3d model generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18563–18573, 2025.
[20] Xueyang Li, Jiahao Li, Yu Song, Yunzhong Lou, and Xiangdong Zhou. Seek-cad: A self-refined generative modeling for 3d parametric cad using local inference via deepseek. arXiv preprint arXiv:2505.17702, 2025.
[21] Dimitrios Mallis, Ahmet Serda Karadeniz, Sebastian Cavada, Danila Rukhovich, Niki Foteinopoulou, Kseniya Cherenkova, Anis Kacem, and Djamila Aouada. Cad-assistant: tool-augmented vllms as generic cad task solvers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7284–7294, 2025.
[22] Tobias Preintner, Weixuan Yuan, Adrian König, Thomas Bäck, Elena Raponi, and Niki Van Stein. Evocad: Evolutionary cad code generation with vision language models. In 2025 IEEE 37th International Conference on Tools with Artificial Intelligence (ICTAI), pages 504–511. IEEE, 2025.
[23] Feiwei Qin, Shichao Lu, Junhao Hou, Changmiao Wang, Meie Fang, and Ligang Liu. Drawing2cad: Sequence-to-sequence learning for cad generation from vector drawings. In Proceedings of the 33rd ACM International Conference on Multimedia, pages 10573–10582, 2025.
[24] Danila Rukhovich, Elona Dupont, Dimitrios Mallis, Kseniya Cherenkova, Anis Kacem, and Djamila Aouada. Cad-recode: Reverse engineering cad code from point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9801–9811, 2025.
[25] Aaditya Singh, Adam Fry, Adam Perelman, Adam Tart, Adi Ganesh, Ahmed El-Kishky, Aidan McLaughlin, Aiden Low, AJ Ostrow, Akhila Ananthram, et al. Openai gpt-5 system card. arXiv preprint arXiv:2601.03267, 2025.
[26] Chunyi Sun, Junlin Han, Weijian Deng, Xinlong Wang, Zishan Qin, and Stephen Gould. 3d-gpt: Procedural 3d modeling with large language models. In 2025 International Conference on 3D Vision (3DV), pages 1253–1263. IEEE, 2025.
[27] Maxim Tatarchenko, Stephan R Richter, René Ranftl, Zhuwen Li, Vladlen Koltun, and Thomas Brox. What do single-view 3d reconstruction networks learn? In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3405–3414, 2019.
[28] Daniel E Whitney. Mechanical assemblies: their design, manufacture, and role in product development, volume 1. Oxford university press New York, 2004.
[29] Karl DD Willis, Yewen Pu, Jieliang Luo, Hang Chu, Tao Du, Joseph G Lambourne, Armando Solar-Lezama, and Wojciech Matusik. Fusion 360 gallery: A dataset and environment for programmatic cad construction from human design sequences. ACM Transactions on Graphics (TOG), 40(4):1–24, 2021.
[30] Karl DD Willis, Pradeep Kumar Jayaraman, Hang Chu, Yunsheng Tian, Yifei Li, Daniele Grandi, Aditya Sanghi, Linh Tran, Joseph G Lambourne, Armando Solar-Lezama, et al. Joinable: Learning bottom-up assembly of parametric cad joints. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 15849–15860, 2022.
[31] Rundi Wu, Chang Xiao, and Changxi Zheng. Deepcad: A deep generative network for computer-aided design models. In Proceedings of the IEEE/CVF international conference on computer vision, pages 6772–6782, 2021.
[32]

Text-to-CadQuery: A New Paradigm for CADgenerationwithscalablelargemodelcapabilities

Haoyang Xie and Feng Ju. Text-to-cadquery: A new paradigm for cad generation with scalable large model capabilities.arXiv preprint arXiv:2505.06507, 2025. 12 A Additional model baseline results Table 5:Additional OpenCode baseline results on Hephaestus-CCX. These rows use no FEA-retry round and are reported here to keep Table 2 focused on the main Codex a...

work page arXiv 2025
[33]

Catalog identification.A new catalog (e.g., a standards body, a supplier datasheet family, or a regional industrial archive) is added to the candidate pool, and each prospective case is assigned a difficulty tier (A+, A, B, C) reflecting how completely the underlying source already specifies the FEA pass/fail criteria

work page
[34]

Brief drafting.The narrative engineering brief is written from the source material, with all numeric limits (load magnitudes, material strengths, dimensional tolerances, safety factors) resolved inline at authoring time so the brief carries no dangling references to external documents

Requirement expansion. The brief’s pass/fail criteria are decomposed into an explicitly typed list of requirements (R1, …, Rn), each tied to a load case and a numeric threshold.
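The typed requirement list can be pictured as one small record per requirement. The sketch below is a hypothetical illustration, not the benchmark's actual schema: the names `Kind`, `Requirement`, and `parse_requirement`, the three requirement kinds, and the one-line text format (`R1 max_stress LC1 250.0`) are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    # Hypothetical requirement types; the benchmark's actual type list is not given here.
    MAX_STRESS = "max_stress"                # peak von Mises stress, MPa
    MAX_DISPLACEMENT = "max_displacement"    # peak displacement, mm
    MIN_SAFETY_FACTOR = "min_safety_factor"  # dimensionless


@dataclass(frozen=True)
class Requirement:
    rid: str          # requirement id, e.g. "R1"
    kind: Kind        # typed quantity being checked
    load_case: str    # load case the requirement is evaluated under
    threshold: float  # numeric pass/fail limit


# Hypothetical one-line format: "<id> <kind> <load_case> <threshold>"
_LINE = re.compile(r"^(R\d+)\s+(\w+)\s+(\w+)\s+([-+]?\d+(?:\.\d+)?)$")


def parse_requirement(line: str) -> Requirement:
    """Parse one typed requirement; raise ValueError if it does not parse."""
    m = _LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unparseable requirement: {line!r}")
    rid, kind, load_case, threshold = m.groups()
    # Kind(kind) raises ValueError for an unknown requirement type.
    return Requirement(rid, Kind(kind), load_case, float(threshold))
```

Making the type an enum means a misspelled or unsupported requirement kind fails at parse time rather than being silently ignored later.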

Checker validation. Every candidate brief is run through the CalculiX harness against a hand-built reference geometry. The brief is kept only if every requirement parses, evaluates, and binds without skipping.
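The keep/discard rule can be sketched as follows. This assumes requirements have already been parsed and that the CalculiX outputs have been reduced to a dictionary keyed by (load case, quantity). The `Requirement` fields, the `upper_bound` flag, and the "skipped" status are hypothetical. Under this reading, the gate asks whether every requirement is *evaluable* on the reference geometry, not whether it passes.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    rid: str
    quantity: str             # hypothetical quantity name, e.g. "max_stress"
    load_case: str            # load case the requirement is bound to
    threshold: float
    upper_bound: bool = True  # True: value must be <= threshold; False: >=


def check_requirement(req: Requirement, results: dict) -> str:
    """Return "pass", "fail", or "skipped" for one already-parsed requirement.

    `results` maps (load_case, quantity) -> float and stands in for values
    extracted from a CalculiX run on the reference geometry (assumed format).
    """
    value = results.get((req.load_case, req.quantity))
    if value is None or math.isnan(value):
        # No matching result (unbound) or a failed evaluation: the check is skipped.
        return "skipped"
    ok = value <= req.threshold if req.upper_bound else value >= req.threshold
    return "pass" if ok else "fail"


def keep_brief(requirements: list, results: dict) -> bool:
    """Keep a brief only if it has requirements and none of them is skipped.

    Pass and fail both count as evaluated. Rejecting an empty requirement
    list is an extra assumption of this sketch.
    """
    return bool(requirements) and all(
        check_requirement(r, results) != "skipped" for r in requirements
    )
```

For example, a brief whose stress requirement fails on the reference geometry is still kept. A brief with a requirement whose load case never appears in the results is discarded.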

Scrub pass before release. A final pass strips identifying metadata and external dependencies; details follow. The scrub enforced two invariants. (i) No source identity: we removed every metadata field that could attribute a brief to a specific organization, competition, or human author, and rewrote the prompt text of 49 files whose narratives named speci...
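The first scrub invariant (no source identity) can be sketched as two checks. The first drops attributing metadata fields. The second flags prompt text that still names a source, so it can be rewritten. The key names in `IDENTIFYING_KEYS` and the example source names are hypothetical. The sketch only flags prompts and does not perform the rewrite itself.

```python
import re

# Hypothetical metadata keys that could attribute a brief to its source.
IDENTIFYING_KEYS = {"organization", "competition", "author", "source_url"}


def scrub_metadata(meta: dict) -> dict:
    """Return a copy of `meta` without identifying fields (key match is case-insensitive)."""
    return {k: v for k, v in meta.items() if k.lower() not in IDENTIFYING_KEYS}


def names_source(prompt: str, source_names: list[str]) -> bool:
    """Flag prompt text that still mentions a known source name (case-insensitive, whole word)."""
    return any(
        re.search(rf"\b{re.escape(name)}\b", prompt, flags=re.IGNORECASE)
        for name in source_names
    )
```

Keeping detection separate from rewriting reflects the description above. Metadata fields can be removed mechanically, but a narrative that names its source has to be rewritten.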

[1]

Sandhini Agarwal, Lama Ahmad, Jason Ai, Sam Altman, Andy Applebaum, Edwin Arbus, Rahul K Arora, Yu Bai, Bowen Baker, Haiming Bao, et al. gpt-oss-120b & gpt-oss-20b model card. arXiv preprint arXiv:2508.10925, 2025.

[2]

Kamel Alrashedy, Pradyumna Tambwekar, Zulfiqar Zaidi, Megan Langwasser, Wei Xu, and Matthew Gombolay. Generating cad code with vision-language models for 3d designs. arXiv preprint arXiv:2410.05340, 2024.

[3]

Anthropic. Developing a computer use model. https://www.anthropic.com/news/3-5-models-and-computer-use, 2024. Introduces Claude 3.5 Sonnet computer use capability.

[4]

Jesse Barkley, Rumi Loghmani, and Amir Barati Farimani. Cadsmith: Multi-agent cad generation with programmatic geometric validation. arXiv preprint arXiv:2603.26512, 2026.

[5]

Geoffrey Boothroyd, Peter Dewhurst, and Winston A Knight. Product design for manufacture and assembly. CRC Press, 2010.

[6]

Joya Chen, Zhaoyang Lv, Shiwei Wu, Kevin Qinghong Lin, Chenan Song, Difei Gao, Jia-Wei Liu, Ziteng Gao, Dongxing Mao, and Mike Zheng Shou. Videollm-online: Online video large language model for streaming video. In CVPR, 2024.

[7]

Guido Dhondt and Klaus Wittig. CalculiX: A free software three-dimensional structural finite element program. http://www.calculix.de/, 2024. Version 2.22.

[8]

Anna C Doris, Ferdous Alam, Amin Heyrani Nobari, and Faez Ahmed. Cad-coder: An open-source vision-language model for computer-aided design code generation. Journal of Mechanical Design, 148(7):071702, 2026.

[9]

Fengxiao Fan, Jingzhe Ni, Xiaolong Yin, Sirui Wang, Xingyu Lu, Qiang Zou, Ruofeng Tong, Min Tang, and Peng Du. Caddesigner: Conceptual design of cad models based on general-purpose agent. arXiv preprint arXiv:2508.01031, 2025.

[10]

Haoqiang Fan, Hao Su, and Leonidas J Guibas. A point set generation network for 3d object reconstruction from a single image. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 605–613, 2017.

[11]

Google DeepMind. Gemini 3 pro image (Nano Banana Pro). https://deepmind.google/models/gemini-image/pro/, 2025. Image generation and editing model built on Gemini 3 Pro.

[12]

Prashant Govindarajan, Davide Baldelli, Jay Pathak, Quentin Fournier, and Sarath Chandar. Cadmium: Fine-tuning code language models for text-driven sequential cad design. arXiv preprint arXiv:2507.09792, 2025.

[13]

Seokhee Hong, Sunkyoung Kim, Guijin Son, Soyeon Kim, Yeonjung Hong, and Jinsik Lee. From kmmlu-redux to kmmlu-pro: A professional korean benchmark suite for llm evaluation. arXiv preprint arXiv:2507.08924, 2025.

[14]

Denys Iliash, Hanxiao Jiang, Yiming Zhang, Manolis Savva, and Angel X Chang. S2o: Static to openable enhancement for articulated 3d objects. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 6785–6795, 2026.

[15]

Benjamin Jones, Dalton Hildreth, Duowen Chen, Ilya Baran, Vladimir G Kim, and Adriana Schulz. Automate: A dataset and learning approach for automatic mating of cad assemblies. ACM Transactions on Graphics (TOG), 40(6):1–18, 2021.

[16]

Mohammad S Khan, Sankalp Sinha, Talha U Sheikh, Didier Stricker, Sk A Ali, and Muhammad Z Afzal. Text2cad: Generating sequential cad designs from beginner-to-expert level text prompts. Advances in Neural Information Processing Systems, 37:7552–7579, 2024.

[17]

Mohammad Sadil Khan, Elona Dupont, Sk Aziz Ali, Kseniya Cherenkova, Anis Kacem, and Djamila Aouada. Cad-signet: Cad language inference from point clouds using layer-wise sketch instance guided attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4713–4722, 2024.

[18]

Maksim Kolodiazhnyi, Denis Tarasov, Dmitrii Zhemchuzhnikov, Alexander Nikulin, Ilya Zisman, Anna Vorontsova, Anton Konushin, Vladislav Kurenkov, and Danila Rukhovich. cadrille: Multi-modal cad reconstruction with reinforcement learning. In The Fourteenth International Conference on Learning Representations, 2025.

[19]

Jiahao Li, Weijian Ma, Xueyang Li, Yunzhong Lou, Guichun Zhou, and Xiangdong Zhou. Cad-llama: leveraging large language models for computer-aided design parametric 3d model generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18563–18573, 2025.

[20]

Xueyang Li, Jiahao Li, Yu Song, Yunzhong Lou, and Xiangdong Zhou. Seek-cad: A self-refined generative modeling for 3d parametric cad using local inference via deepseek. arXiv preprint arXiv:2505.17702, 2025.

[21]

Dimitrios Mallis, Ahmet Serda Karadeniz, Sebastian Cavada, Danila Rukhovich, Niki Foteinopoulou, Kseniya Cherenkova, Anis Kacem, and Djamila Aouada. Cad-assistant: tool-augmented vllms as generic cad task solvers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7284–7294, 2025.

[22]

Tobias Preintner, Weixuan Yuan, Adrian König, Thomas Bäck, Elena Raponi, and Niki Van Stein. Evocad: Evolutionary cad code generation with vision language models. In 2025 IEEE 37th International Conference on Tools with Artificial Intelligence (ICTAI), pages 504–511. IEEE, 2025.

[23]

Feiwei Qin, Shichao Lu, Junhao Hou, Changmiao Wang, Meie Fang, and Ligang Liu. Drawing2cad: Sequence-to-sequence learning for cad generation from vector drawings. In Proceedings of the 33rd ACM International Conference on Multimedia, pages 10573–10582, 2025.

[24]

Danila Rukhovich, Elona Dupont, Dimitrios Mallis, Kseniya Cherenkova, Anis Kacem, and Djamila Aouada. Cad-recode: Reverse engineering cad code from point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9801–9811, 2025.

[25]

Aaditya Singh, Adam Fry, Adam Perelman, Adam Tart, Adi Ganesh, Ahmed El-Kishky, Aidan McLaughlin, Aiden Low, AJ Ostrow, Akhila Ananthram, et al. Openai gpt-5 system card. arXiv preprint arXiv:2601.03267, 2025.

[26]

Chunyi Sun, Junlin Han, Weijian Deng, Xinlong Wang, Zishan Qin, and Stephen Gould. 3d-gpt: Procedural 3d modeling with large language models. In 2025 International Conference on 3D Vision (3DV), pages 1253–1263. IEEE, 2025.

[27]

Maxim Tatarchenko, Stephan R Richter, René Ranftl, Zhuwen Li, Vladlen Koltun, and Thomas Brox. What do single-view 3d reconstruction networks learn? In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3405–3414, 2019.

[28]

Daniel E Whitney. Mechanical assemblies: their design, manufacture, and role in product development, volume 1. Oxford University Press, New York, 2004.
