pith. sign in

arxiv: 2605.19748 · v1 · pith:WTRO3VGUnew · submitted 2026-05-19 · 💻 cs.AI · cs.MA

Memory-Augmented Reinforcement Learning Agent for CAD Generation

Pith reviewed 2026-05-20 05:13 UTC · model grok-4.3

classification 💻 cs.AI cs.MA
keywords CAD generationreinforcement learningmemory augmentationself-correctiongeometric consistencyCAD modelsRL agentdual-track memory
0
0 comments X

The pith

A memory-augmented reinforcement learning agent for CAD generation enables self-correction and continual improvement without additional data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework that uses reinforcement learning to augment memory in CAD generation agents. It structures the geometric kernel as a callable toolchain and creates a closed loop for intent understanding, planning, execution, and verification. A dual-track memory system with case and skill libraries, combined with RL-based retrieval, helps the agent avoid unsuitable examples that are similar in meaning but not in geometry. This setup supports online fixes and evolution, leading to better performance on difficult models.

Core claim

The memory-augmented reinforcement learning framework encapsulates the geometric kernel into a structured toolchain and builds a closed-loop mechanism of design intent understanding, global planning, execution, and multi-dimensional verification. It designs a dual-track memory module consisting of a case library and a skill library with a dynamic utility retrieval algorithm. By introducing reinforcement learning into retrieval and policy optimization, the agent avoids retrieval traps of semantically similar but geometrically infeasible examples, enabling online self-correction and continual evolution without additional large-scale annotated data.

What carries the argument

Dual-track memory module with case library and skill library, using dynamic utility retrieval algorithm driven by reinforcement learning for policy optimization.

If this is right

  • The agent achieves higher success rates on complex CAD tasks with long operation sequences and diverse types.
  • Geometric consistency improves through multi-dimensional verification in the closed loop.
  • Self-correction happens online during generation without needing new annotated data.
  • Continual evolution of the agent's capabilities occurs through the RL optimization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could generalize to other domains requiring precise geometric or structural constraints, such as 3D modeling in architecture.
  • Integrating RL for retrieval might reduce errors in other memory-augmented AI systems facing similar semantic-geometric mismatches.
  • Testing on a wider range of CAD complexities could reveal limits of the current memory size or retrieval speed.

Load-bearing premise

The assumption that encapsulating the geometric kernel into a structured toolchain combined with dual-track memory and RL-driven retrieval will reliably avoid semantically similar but geometrically infeasible retrieval traps.

What would settle it

Running the agent on a set of complex CAD models known to have geometrically invalid but semantically close examples in the memory, and checking if the success rate and consistency still improve significantly or if failures persist.

Figures

Figures reproduced from arXiv: 2605.19748 by Fan Fengxiao, Liu Yu, Lu Xingyu, Ni Jingzhe, Sang Fan, Shen Jiahang, Yin Xiaolong.

Figure 1
Figure 1. Figure 1: System architecture. next operation intent based on the current state and the global blueprint, then generates executable FreeCAD Python code and submits it to the ge￾ometric kernel through the MCP interface. The toolchain covers three types of functions: session management, geometric modeling, and geometric verification. They are used respectively to create, save, and roll back models; generate structures… view at source ↗
Figure 2
Figure 2. Figure 2: Example comparison of generated results from [PITH_FULL_IMAGE:figures/full_fig_p011_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Example CAD models generated from text and image inputs. [PITH_FULL_IMAGE:figures/full_fig_p012_3.png] view at source ↗
read the original abstract

Automatic generation of computer-aided design (CAD) models is a core technology for enabling intelligence in advanced manufacturing. Existing generation methods based on large language models (LLMs) often fall short when handling complex CAD models characterized by long operation sequences, diverse operation types, and strong geometric constraints, primarily because reasoning chains break and effective error-correction mechanisms are lacking. To address this problem, this paper proposes a memory-augmented reinforcement learning framework for CAD generation agents. The framework encapsulates the underlying geometric kernel into a structured toolchain callable by the agent and builds a closed-loop mechanism of design intent understanding, global planning, execution, and multi-dimensional verification. It also designs a dual-track memory module consisting of a case library and a skill library, and proposes a dynamic utility retrieval algorithm. By introducing reinforcement learning into retrieval and policy optimization, the agent can effectively avoid retrieval traps in which examples are semantically similar but geometrically infeasible, enabling online self-correction and continual evolution without additional large-scale annotated data. Experiments show that the proposed method significantly improves both the success rate and geometric consistency on complex CAD model generation tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a memory-augmented reinforcement learning framework for automatic generation of complex CAD models. It encapsulates the geometric kernel as a structured toolchain, establishes a closed-loop mechanism involving design intent understanding, global planning, execution, and multi-dimensional verification, and introduces a dual-track memory module with case and skill libraries along with a dynamic utility retrieval algorithm optimized by RL. This enables the agent to avoid retrieval traps of semantically similar but geometrically infeasible examples, facilitating online self-correction and continual evolution. Experiments indicate significant improvements in success rate and geometric consistency on complex CAD tasks.

Significance. If the integration of the geometric kernel's verification feedback into the RL reward and policy is rigorously implemented and validated, this work could represent a meaningful advance in applying RL to structured generation tasks with hard constraints, such as CAD in manufacturing. The avoidance of large annotated data requirements and the emphasis on continual evolution are notable strengths. The closed-loop design with explicit kernel encapsulation addresses a clear limitation of pure LLM-based approaches.

major comments (2)
  1. [§3.3] §3.3 (RL objective and reward formulation): The central claim that RL-driven retrieval plus dual-track memory reliably avoids semantically similar but geometrically infeasible cases depends on the reward incorporating multi-dimensional verification feedback from the encapsulated geometric kernel. The manuscript provides no explicit equation or formulation showing how kernel-reported constraint violations or geometric consistency metrics enter the RL objective (e.g., as additive penalty terms or as part of the utility score). If the reward remains dominated by final task success or semantic similarity, the policy can still fall into the described traps, undermining the load-bearing mechanism for online self-correction.
  2. [§5] §5 (Experiments and ablations): The reported gains in success rate and geometric consistency are presented without ablation studies that isolate the dynamic utility retrieval algorithm or the RL component from the closed-loop mechanism and dual-track memory. This makes it impossible to attribute improvements specifically to avoidance of retrieval traps rather than other framework elements, weakening the causal link to the proposed contributions.
minor comments (2)
  1. [Abstract] Abstract and §2: The phrase 'dynamic utility retrieval algorithm' is used without a one-sentence definition or forward reference to its formal description, reducing immediate clarity for readers unfamiliar with the subfield.
  2. [Figure 4] Figure 4 (example CAD outputs): The geometric consistency metrics should be annotated directly on the rendered models to make the before/after self-correction effect visually verifiable without requiring cross-reference to tables.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and detailed comments on our manuscript. These have helped us identify areas where additional clarity and evidence are needed. We address each major comment point by point below and indicate the revisions we will make.

read point-by-point responses
  1. Referee: [§3.3] §3.3 (RL objective and reward formulation): The central claim that RL-driven retrieval plus dual-track memory reliably avoids semantically similar but geometrically infeasible cases depends on the reward incorporating multi-dimensional verification feedback from the encapsulated geometric kernel. The manuscript provides no explicit equation or formulation showing how kernel-reported constraint violations or geometric consistency metrics enter the RL objective (e.g., as additive penalty terms or as part of the utility score). If the reward remains dominated by final task success or semantic similarity, the policy can still fall into the described traps, undermining the load-bearing mechanism for online self-correction.

    Authors: We agree that an explicit mathematical formulation is required to make the mechanism fully rigorous and to substantiate how the geometric kernel feedback prevents retrieval traps. The current manuscript describes the closed-loop integration and the role of verification in Section 3.3 but does not provide the equation. In the revised version we will insert a new Equation (3) that defines the composite reward as R_t = R_success + λ · G_consistency − μ · Σ V_constraints, where V_constraints are the multi-dimensional violation metrics returned by the encapsulated kernel and G_consistency is the geometric consistency score. We will also show how this reward is used both for policy gradient updates and for the dynamic utility score in retrieval. This change directly addresses the concern. revision: yes

  2. Referee: [§5] §5 (Experiments and ablations): The reported gains in success rate and geometric consistency are presented without ablation studies that isolate the dynamic utility retrieval algorithm or the RL component from the closed-loop mechanism and dual-track memory. This makes it impossible to attribute improvements specifically to avoidance of retrieval traps rather than other framework elements, weakening the causal link to the proposed contributions.

    Authors: We acknowledge that the existing experimental section compares the full system against baselines but does not contain targeted ablations that remove only the RL-optimized retrieval or only the dual-track memory while keeping the closed-loop mechanism fixed. We will add two new ablation tables in Section 5: (i) full framework versus framework with static (non-RL) retrieval, and (ii) full framework versus framework without the skill library. These will report success rate, geometric consistency, and retrieval-trap frequency, allowing readers to isolate the contribution of the RL-driven utility retrieval. The new experiments will be run on the same benchmark set. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The abstract frames the core improvement as an empirical outcome of the proposed memory-augmented RL architecture, including kernel encapsulation, dual-track memory, dynamic utility retrieval, and RL-driven policy optimization that enables avoidance of retrieval traps. No equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations appear in the provided text. The derivation chain is presented as a constructive proposal whose validity rests on experimental results rather than reducing by construction to its own inputs or prior author work. This is the typical self-contained case for an applied systems paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only view yields no explicit free parameters, axioms, or invented entities; the framework description implies unstated assumptions about tool encapsulation and retrieval effectiveness but provides no details for ledger entry.

pith-pipeline@v0.9.0 · 5734 in / 1165 out tokens · 50257 ms · 2026-05-20T05:13:40.002603+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 3 internal anchors

  1. [42]

    Memento-skills: Let agents de- sign agents

    Huichi Zhou, Siyuan Guo, Anji Liu, Zhongwei Yu, Ziqin Gong, Bowen Zhao, Zhixun Chen, Menglong Zhang, Yihang Chen, Jinsong Li, Runyu Yang, Qiang Liu, Xinle Yu, Jianming Zhou, Na Wang, Chunyang Sun, and Jun Wang. Memento-skills: Let agents de- sign agents. 2026. ࿐༝ CAD ളӮᇆି ุ ᅋေğčCAD؊ۚ Ⴟնე࿽ଆ ྘ čLLMҠቔো྘a ఼ ᄖ CADಌ Վ໙ีđ Ч ࿐༝ CAD ളӮᇆିุ ֥ iᆳ σ২९ ෬ෘ ӱđ ᇆିุ ෬ཊፗđ ࣉ...

  2. [43]

    带中⼼孔和凸台的六孔法 兰,并增加凸起环

    ႄ࿽ čComputer-Aided Design đCADĎೈ ܱ֥ ם࠹ ᇶੀ CAD ௜෻ čೂCATIAaUG NXaPTC CreoaOnShapea AutoCADaSolidWorksaል஥ჿඏა ଆੀ ۽ ᆜ ଆҦ੻ ིੱაᆩ് ᇗေၹ෍b ׮ ֥CADႵ CAD ඔऌ ࡼ CADҠቔ྽ਙđ ଆđ ൌགྷ၂ ളӮb ෛሢնე࿽ଆ྘ čLarge Language ModelsđLLMsࣉ֥ ᅚ [4, 26, 38]Ⴟሱಖე࿽ളӮ CADࡶ ֥ ॹ෎ b ֥CAD؟ ൔğ ଆ྘ສສ၂Ցྟൻԛປ ᆩა ᄖ CAD఼ ၂ᇁྟഈ૫ਢ೘ᇗ ളӮ଴ၛཁൔ ঌ ᄎෘാϧaൌุႄႨ /࿊ đ ৳ാིĠఃՑ ၇ঠ ࡱ ໭ིσ২đ ๝ൈಌ ࠏ֥ ު ପ ିđ ֮ ၩ ྟb ၫᄹ఼[41, 42]࿐ ༝ CAD...

  3. [44]

    ൌဒ 5.1. ൌဒഡᇂ ٳ߃ ᄖCAD ၂ ᇁ ྟđ Ч ໓ ࿊ ౼ Text2CAD [ 7]໓Ч૭ඍaѓ ႋ CADჰ൓ဢ ࠇ ֤֞1200࿊ဢЧĠෛ ၂҄ೱ࿊đቋᇔ ݣ1000࠺ ྐ༏྄੐đൌဒඔऌοႨ๯ ğ •Ԥᆩ്९č400Ď ğႨႿᆜ৘ FreeCADଆე aӈႨ APIԚ൓ ෬ᄹ఼ളӮᆩ്९Ġ •č400Ď ğႨႿളӮѩೱ࿊σ২ ෬Ҧ ྍĠ •č200Ď ğбđ ྍb ൌဒੀӱѓద ྐ༏ቔູൻೆđളӮ FreeCADႨ྽ ဒ ཌྷ๝ ༅ бൌဒᄵᄝ๝၂ ཁ ᆷѓb ᆷѓ Ч໓Ֆ؇ބ ྟ૫ ֥ࡗ ӱ൞ ྟିđ ၹՎ ᆞಒྟb (1)ᆷ ѓğള Ӯ ଆ ྘ ა ᆇ ᆴ ଆ ྘ čGround Truthఊა ᆷѓğ • IoUčIntersection over UnionĎ ğ ᇗ Ġ • C...

  4. [45]

    Generating CAD code with vision-language models for 3d designs.arXiv preprint arXiv:2410.05340, 2024

    K. Alrashedy, P. Tambwekar, Z. Zaidi, M. Langwasser, W. Xu, and M. Gombolay. Generating cad code with vision-language models for 3d designs. arXiv preprint arXiv:2410.05340, 2024

  5. [46]

    Query2cad: Generating cad models using natural language queries

    Akshay Badagabettu, Sai Sravan Yarlagadda, and Amir Barati Farimani. Query2cad: Generating cad models using natural language queries. arXiv preprint arXiv:2406.00144, 2024

  6. [47]

    Cadcrafter: Generating computer-aided design mod- els from unconstrained images

    Cheng Chen, Jiacheng Wei, Tianrun Chen, Chi Zhang, Xiaofeng Yang, Shangzhan Zhang, Bingchen Yang, Chuan-Sheng Foo, Guosheng Lin, Qixing Huang, et al. Cadcrafter: Generating computer-aided design mod- els from unconstrained images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition, pages 11073–11082, 2025

  7. [48]

    Evaluating Large Language Models Trained on Code

    Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde De Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374 , 2021

  8. [49]

    CADDesigner: Conceptual CAD Model Generation with a General-Purpose Agent

    Fengxiao Fan, Jingzhe Ni, Xiaolong Yin, Sirui Wang, Xingyu Lu, Qiang Zou, Ruofeng Tong, Min Tang, and Peng Du. Caddesigner: Conceptual design of cad mod- els based on general-purpose agent. arXiv preprint arXiv:2508.01031, 2025

  9. [50]

    and Desai, Nishkrit and Willis, Karl D

    Pradeep Kumar Jayaraman, Joseph G Lambourne, Nishkrit Desai, Karl DD Willis, Aditya Sanghi, and Nigel JW Morris. Solidgen: An autoregressive model for direct b-rep synthesis. arXiv preprint arXiv:2203.13944, 2022

  10. [51]

    Text2cad: Generating sequen- tial cad designs from beginner-to-expert level text prompts

    Mohammad Sadil Khan, Sankalp Sinha, Talha Ud- din Sheikh, Didier Stricker, Sk Aziz Ali, and Muham- mad Zeshan Afzal. Text2cad: Generating sequen- tial cad designs from beginner-to-expert level text prompts. In Advances in Neural Information Process- ing Systems, pages 7552–7579. Curran Associates, Inc., 2024

  11. [52]

    Abc: A big cad model dataset for geometric deep learning

    Sebastian Koch, Albert Matveev, Zhongshi Jiang, Francis Williams, Alexey Artemov, Evgeny Burnaev, Marc Alexa, Denis Zorin, and Daniele Panozzo. Abc: A big cad model dataset for geometric deep learning. In The IEEE Conference on Computer Vision and Pat- tern Recognition (CVPR) , 2019

  12. [53]

    cadrille: Multi-modal CADreconstructionwithonlinereinforcementlearning

    M. Kolodiazhnyi, D. Tarasov, D. Zhemchuzhnikov, A. Nikulin, I. Zisman, A. Vorontsova, A. Konushin, V. Kurenkov, and D. Rukhovich. cadrille: Multi-modal cad reconstruction with online reinforcement learning. arXiv preprint arXiv:2505.22914 , 2025

  13. [54]

    Brepdiff: Single-stage b-rep diffusion model

    Mingi Lee, Dongsu Zhang, Clément Jambon, and Young Min Kim. Brepdiff: Single-stage b-rep diffusion model. In Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Con- ference Conference Papers, pages 1–11, 2025

  14. [55]

    Free2cad: Parsing freehand drawings into cad commands

    Changjian Li, Hao Pan, Adrien Bousseau, and Niloy J Mitra. Free2cad: Parsing freehand drawings into cad commands. ACM Transactions on Graphics (TOG) , 41(4):1–16, 2022

  15. [56]

    Dtgbrep- gen: A novel b-rep generative model through decou- pling topology and geometry

    Jing Li, Yihang Fu, and Falai Chen. Dtgbrep- gen: A novel b-rep generative model through decou- pling topology and geometry. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition, pages 21438–21447, 2025

  16. [57]

    J. Li, Y. Luo, Y. Lou, and X. Zhou. Recad: Reinforce- ment learning enhanced parametric cad model gen- eration with vision-language models. arXiv preprint arXiv:2512.06328, 2025

  17. [58]

    J. Li, W. Ma, X. Li, Y. Lou, G. Zhou, and X. Zhou. Cad-llama: Leveraging large language models for computer-aided design parametric 3d model gen- eration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages 18563–18573, 2025

  18. [59]

    Stitch-a-shape: Bottom-up learning for b-rep generation

    Pu Li, Wenhao Zhang, Jinglu Chen, and Dongming Yan. Stitch-a-shape: Bottom-up learning for b-rep generation. In Proceedings of the Special Interest Group on Computer Graphics and Interactive Tech- niques Conference Conference Papers , pages 1–12, 2025

  19. [60]

    Llm4cad: Multi-modal large language models for 3d computer- aided design generation

    Xingang Li, Yuewan Sun, and Zhenghui Sha. Llm4cad: Multi-modal large language models for 3d computer- aided design generation. In International Design Engineering Technical Conferences and Computers and Information in Engineering Conference , page V006T06A015. American Society of Mechanical En- gineers, 2024

  20. [61]

    X. Li, J. Li, Y. Song, Y. Lou, and X. Zhou. Seek-cad: A self-refined generative modeling for 3d parametric cad using local inference via deepseek. arXiv preprint arXiv:2505.17702, 2025

  21. [62]

    Mamba-cad: State space model for 3d computer-aided design generative modeling

    Xueyang Li, Yunzhong Lou, Yu Song, and Xiang- dong Zhou. Mamba-cad: State space model for 3d computer-aided design generative modeling. In Pro- ceedings of the AAAI Conference on Artificial Intelli- gence, pages 5013–5021, 2025

  22. [63]

    Mallis, A

    D. Mallis, A. S. Karadeniz, S. Cavada, D. Rukhovich, N. Foteinopoulou, K. Cherenkova, A. Kacem, and D. Aouada. Cad-assistant: Tool-augmented vllms as generic cad task solvers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) , pages 7284–7294, 2025

  23. [64]

    Videocad: A dataset and model for learning long-horizon 3d cad ui interactions from video

    Brandon Man, Ghadi Nehme, Md Ferdous Alam, and Faez Ahmed. Videocad: A dataset and model for learning long-horizon 3d cad ui interactions from video. In The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track

  24. [65]

    freecad-mcp

    neka-nat. freecad-mcp. https://github.com/neka- nat/freecad-mcp, 2026. Accessed: 2026-05-15

  25. [66]

    From idea to cad: a language model-driven multi-agent system for collaborative design

    Felix Ocker, Stefan Menzel, Ahmed Sadik, and Thi- ago Rios. From idea to cad: a language model-driven multi-agent system for collaborative design. arXiv preprint arXiv:2503.04417 , 2025

  26. [67]

    Pointer-cad: Unifying b-rep and com- mand sequences via pointer-based edges & faces selec- tion

    Dacheng Qi, Chenyu Wang, Jingwei Xu, Tianzhe Chu, Zibo Zhao, Wen Liu, Wenrui Ding, Yi Ma, and Shenghua Gao. Pointer-cad: Unifying b-rep and com- mand sequences via pointer-based edges & faces selec- tion. In CVPR, 2026

  27. [68]

    Drawing2cad: Sequence-to-sequence learning for cad generation from vector drawings

    Feiwei Qin, Shichao Lu, Junhao Hou, Changmiao Wang, Meie Fang, and Ligang Liu. Drawing2cad: Sequence-to-sequence learning for cad generation from vector drawings. In Proceedings of the 33rd ACM In- ternational Conference on Multimedia , pages 10573– 10582, 2025

  28. [69]

    Rukhovich, E

    D. Rukhovich, E. Dupont, D. Mallis, K. Cherenkova, A. Kacem, and D. Aouada. Cad-recode: Reverse engi- neering cad code from point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) , pages 9801–9811, 2025

  29. [70]

    Toolformer: Language models can teach themselves to use tools

    Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. Advances in neural information processing systems, 36:68539–68551, 2023

  30. [71]

    Vq-cad: Computer-aided design model generation with vector quantized diffu- sion

    Hanxiao Wang, Mingyang Zhao, Yiqun Wang, Weize Quan, and Dong-Ming Yan. Vq-cad: Computer-aided design model generation with vector quantized diffu- sion. Computer Aided Geometric Design , 111:102327, 2024

  31. [72]

    R. Wang, Y. Yuan, S. Sun, and J. Bian. Text-to-cad generation through infusing visual feedback in large language models. arXiv preprint arXiv:2501.19054 , 2025

  32. [73]

    Cad-gpt: Synthesis- ing cad construction sequence with spatial reasoning- enhanced multimodal llms

    Siyu Wang, Cailian Chen, Xinyi Le, Qimin Xu, Lei Xu, Yanzhou Zhang, and Jie Yang. Cad-gpt: Synthesis- ing cad construction sequence with spatial reasoning- enhanced multimodal llms. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 7880–7888, 2025

  33. [74]

    K. D. Willis, Y. Pu, J. Luo, H. Chu, T. Du, J. G. Lam- bourne, A. Solar-Lezama, and W. Matusik. Fusion 360 gallery: A dataset and environment for programmatic cad construction from human design sequences. ACM Transactions on Graphics , 40(4):1–24, 2021

  34. [75]

    R. Wu, C. Xiao, and C. Zheng. Deepcad: A deep gen- erative network for computer-aided design models. In Proceedings of the IEEE/CVF International Confer- ence on Computer Vision (ICCV) , pages 6772–6782, 2021

  35. [76]

    S. Wu, A. Khasahmadi, M. Katz, P. K. Jayaraman, Y. Pu, K. Willis, and B. Liu. Cad-llm: Large language model for cad generation. In Advances in Neural In- formation Processing Systems (NeurIPS) , 2023

  36. [77]

    S. Wu, A. H. Khasahmadi, M. Katz, P. K. Jayaraman, Y. Pu, K. Willis, and B. Liu. Cadvlm: Bridging lan- guage and vision in the generation of parametric cad sketches. In European Conference on Computer Vision (ECCV), pages 368–384. Springer, 2024

  37. [78]

    J. Xu, C. Wang, Z. Zhao, W. Liu, Y. Ma, and S. Gao. Cad-mllm: Unifying multimodality- conditioned cad generation with mllm. arXiv preprint arXiv:2411.04954, 2024

  38. [79]

    X. Xu, K. D. Willis, J. G. Lambourne, C. Y. Cheng, P. K. Jayaraman, and Y. Furukawa. Skex- gen: Autoregressive generation of cad construction se- quences with disentangled codebooks. arXiv preprint arXiv:2207.04632, 2022

  39. [80]

    X. Xu, P. K. Jayaraman, J. G. Lambourne, K. D. Willis, and Y. Furukawa. Hierarchical neural coding for controllable cad model generation. arXiv preprint arXiv:2307.00149, 2023

  40. [81]

    X. Xu, J. Lambourne, P. Jayaraman, Z. Wang, K. Willis, and Y. Furukawa. Brepgen: A b-rep generative diffusion model with structured latent geometry. ACM Transactions on Graphics , 43(4):1–14, 2024

  41. [82]

    ReAct: Synergizing Reasoning and Acting in Language Models

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629 , 2022

  42. [83]

    X. Yin, X. Lu, J. Shen, J. Ni, H. Li, R. Tong, M. Tang, and P. Du. Rlcad: Reinforcement learning training gym for revolution involved cad command sequence generation. Computer-Aided Design , page 104027, 2026

  43. [84]

    Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M

    Zeqing Yuan, Haoxuan Lan, Qiang Zou, and Junbo Zhao. 3d-premise: Can large language models generate 3D shapes with sharp features and parametric control? arXiv preprint arXiv:2401.06437 , 2024

  44. [85]

    Memento: Fine-tuning LLM Agents without Fine-tuning LLMs

    Huichi Zhou, Yihang Chen, Siyuan Guo, Xu Yan, Kin-Hei Lee, Zihan Wang, Ka Yiu Lee, Guchun Zhang, Kun Shao, Linyi Yang, and Jun Wang. Me- mento: Fine-tuning llm agents without fine-tuning llms. ArXiv, abs/2508.16153, 2025

  45. [86]

    Memento-skills: Let agents de- sign agents

    Huichi Zhou, Siyuan Guo, Anji Liu, Zhongwei Yu, Ziqin Gong, Bowen Zhao, Zhixun Chen, Menglong Zhang, Yihang Chen, Jinsong Li, Runyu Yang, Qiang Liu, Xinle Yu, Jianming Zhou, Na Wang, Chunyang Sun, and Jun Wang. Memento-skills: Let agents de- sign agents. 2026