
arxiv: 2605.20190 · v1 · pith:WYIFL4NMnew · submitted 2026-04-01 · 💻 cs.AI · cs.GR

Tool-Augmented Agent for Closed-loop Optimization,Simulation,and Modeling Orchestration

Pith reviewed 2026-05-21 10:27 UTC · model grok-4.3

classification 💻 cs.AI cs.GR
keywords tool-augmented agents · reinforcement learning · CAD-CAE optimization · closed-loop design · LLM orchestration · constraint-driven design · multi-constraint rewards · industrial simulation

The pith

COSMO-Agent trains small open-source LLMs to close the CAD-CAE loop by orchestrating tools and revising geometries under coupled constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a reinforcement learning framework that equips language models with the ability to generate CAD models, run simulations, interpret results, and adjust designs iteratively until all constraints are met. This tackles the persistent difficulty in industrial workflows where simulation outcomes do not automatically translate into valid geometric modifications across multiple interacting requirements. A multi-constraint reward system and a dataset covering 25 component categories guide the training toward feasible outputs, robust tool use, and valid structures. If the approach holds, smaller models gain the capacity to perform tasks that currently demand larger systems or human oversight.

Core claim

COSMO-Agent casts CAD generation, CAE solving, result parsing, and geometry revision as an interactive RL environment in which an LLM learns to orchestrate external tools and revise parametric geometries until constraints are satisfied, using a multi-constraint reward that jointly encourages feasibility, toolchain robustness, and structured output validity, supported by an industry-aligned dataset of 25 component categories.
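The closed loop described in the core claim can be pictured with a toy sketch. Everything here is a hypothetical stand-in, not the paper's environment: the single `thickness_mm` parameter, the stubbed `generate_cad`, `run_cae`, and `parse_results` tools, the numeric limits, and the hand-written `revise` rule, which takes the place of the learned LLM policy.

```python
from dataclasses import dataclass


@dataclass
class Design:
    thickness_mm: float  # hypothetical single design parameter


def generate_cad(design):
    # Stand-in for a CAD tool call: returns a "geometry" record.
    return {"thickness_mm": design.thickness_mm}


def run_cae(geometry, load_n=1000.0):
    # Toy physics (assumption): stress falls and mass rises with thickness.
    t = geometry["thickness_mm"]
    return {"max_stress_mpa": load_n / t, "mass_kg": 0.1 * t}


def parse_results(raw, stress_limit=200.0, mass_limit=1.0):
    # Turn raw solver output into per-constraint pass/fail flags.
    return {
        "stress_ok": raw["max_stress_mpa"] <= stress_limit,
        "mass_ok": raw["mass_kg"] <= mass_limit,
    }


def revise(design, report):
    # Hand-written rule standing in for the learned revision policy.
    if not report["stress_ok"]:
        return Design(design.thickness_mm + 1.0)
    if not report["mass_ok"]:
        return Design(design.thickness_mm - 0.5)
    return design


def closed_loop(design, max_steps=20):
    # Generate -> simulate -> parse -> revise until all constraints hold.
    for step in range(1, max_steps + 1):
        report = parse_results(run_cae(generate_cad(design)))
        if all(report.values()):
            return design, step, True
        design = revise(design, report)
    return design, max_steps, False
```

Calling `closed_loop(Design(2.0))` thickens the part one millimetre per iteration until the toy stress limit is met. In the paper's setting the revision step is the part the policy has to learn, and the tools are real CAD and CAE calls rather than closed-form stubs.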

What carries the argument

COSMO-Agent, a tool-augmented reinforcement learning framework that teaches LLMs to complete the closed-loop CAD-CAE process through iterative tool orchestration and geometry revision.

If this is right

  • Small open-source LLMs reach higher feasibility, efficiency, and stability than larger open-source and closed-source models on constraint-driven design tasks.
  • The closed-loop process reduces reliance on manual translation between simulation feedback and geometric edits.
  • Training produces structured outputs and robust tool chaining that remain consistent across the 25 component categories.
  • The multi-constraint reward enables handling of diverse, interacting requirements without separate optimization stages.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same agent structure could extend to other engineering domains that require repeated modeling-analysis cycles, such as structural or thermal optimization.
  • Integration into commercial CAD platforms might shorten overall design cycles by automating revision steps that currently need expert intervention.
  • Further tests on real manufacturing data outside the 25 categories would reveal how far the learned behavior transfers to production settings.

Load-bearing premise

The multi-constraint reward function and the dataset of 25 component categories are sufficient to produce stable, industrially usable learning that generalizes across diverse coupled constraints.
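To make the premise concrete, here is one way a reward could combine the three named terms (feasibility, toolchain robustness, structured output validity). This is a minimal sketch under assumptions: the weighted-sum form, the weights, the fraction-based scoring, and the use of JSON as the structured format are all illustrative choices, and the paper's actual reward may differ.

```python
import json


def multi_constraint_reward(constraint_ok, tool_calls_ok, output_text,
                            weights=(0.6, 0.2, 0.2)):
    """Hypothetical weighted sum of three reward terms, each in [0, 1]."""
    w_feas, w_tool, w_fmt = weights

    # Feasibility: fraction of design constraints satisfied.
    feasibility = sum(constraint_ok) / len(constraint_ok) if constraint_ok else 0.0

    # Toolchain robustness: fraction of tool calls that completed without error.
    robustness = sum(tool_calls_ok) / len(tool_calls_ok) if tool_calls_ok else 0.0

    # Structured output validity: 1 if the final answer parses as a JSON object.
    try:
        validity = 1.0 if isinstance(json.loads(output_text), dict) else 0.0
    except json.JSONDecodeError:
        validity = 0.0

    return w_feas * feasibility + w_tool * robustness + w_fmt * validity
```

A single scalar like this lets one policy-gradient update trade off all three terms at once. Whether the premise holds depends on how sensitive training is to the weights, which the abstract does not report.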

What would settle it

Train a small open-source LLM with COSMO-Agent on the provided dataset, then evaluate feasibility, efficiency, and stability on a fresh collection of CAD-CAE tasks that introduce new combinations of coupled constraints; failure to exceed large open-source and strong closed-source baselines on these metrics would falsify the central claim.
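The comparison proposed above could be scored along these lines. The metric definitions are assumptions made for illustration, since the page does not state how the paper measures them: feasibility as the mean per-seed success rate, efficiency as the mean number of loop iterations, and stability as the standard deviation of the success rate across seeds.

```python
from statistics import mean, pstdev


def summarize(runs):
    """runs: one list per seed, each holding (feasible, steps) per task."""
    per_seed_rate = [mean(1.0 if ok else 0.0 for ok, _ in seed) for seed in runs]
    all_steps = [steps for seed in runs for _, steps in seed]
    return {
        "feasibility": mean(per_seed_rate),      # higher is better
        "mean_steps": mean(all_steps),           # lower is better
        "stability_std": pstdev(per_seed_rate),  # lower is better
    }


def exceeds(candidate, baseline):
    # The central claim survives only if the candidate wins on all three metrics.
    return (
        candidate["feasibility"] > baseline["feasibility"]
        and candidate["mean_steps"] < baseline["mean_steps"]
        and candidate["stability_std"] < baseline["stability_std"]
    )
```

Requiring a win on every metric is the strictest reading of the falsification test. A looser protocol could check each metric separately and report confidence intervals across seeds.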

Figures

Figures reproduced from arXiv: 2605.20190 by Huaxi Huang, Linyang Li, Liyuan Deng, Shujian Deng, Xiao Sun, Yilei Shi, Yongkang Chen, Yongkang Dai, Zhihang Zhong.

Figure 1: COSMO-Agent performs closed-loop CAD–CAE optimization by iteratively generating parametric geometry, running CAE
Figure 2: COSMO-Agent: (a) overall closed-loop framework, (b) MCP tool set for CAD–CAE optimization, and (c) training reward
Figure 3: Visualized inference cases of COSMO-Agent.
Abstract

Iterative industrial design-simulation optimization is bottlenecked by the CAD-CAE semantic gap: translating simulation feedback into valid geometric edits under diverse, coupled constraints. To fill this gap, we propose COSMO-Agent (Closed-loop Optimization, Simulation, and Modeling Orchestration), a tool-augmented reinforcement learning (RL) framework that teaches LLMs to complete the closed-loop CAD-CAE process. Specifically, we cast CAD generation, CAE solving, result parsing, and geometry revision as an interactive RL environment, where an LLM learns to orchestrate external tools and revise parametric geometries until constraints are satisfied. To make this learning stable and industrially usable, we design a multi-constraint reward that jointly encourages feasibility, toolchain robustness, and structured output validity. In addition, we contribute an industry-aligned dataset that covers 25 component categories with executable CAD-CAE tasks to support realistic training and evaluation. Experiments show that COSMO-Agent training substantially improves small open-source LLMs for constraint-driven design, exceeding large open-source and strong closed-source models in feasibility, efficiency, and stability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces COSMO-Agent, a tool-augmented reinforcement learning framework for training LLMs to orchestrate CAD generation, CAE solving, result parsing, and geometry revision in a closed-loop interactive environment. It designs a multi-constraint reward encouraging feasibility, toolchain robustness, and output validity, contributes an industry-aligned dataset of 25 component categories with executable CAD-CAE tasks, and reports that this training substantially improves small open-source LLMs to exceed large open-source and strong closed-source models in feasibility, efficiency, and stability for constraint-driven design.

Significance. If the central empirical claims hold under rigorous evaluation, the work would be significant for demonstrating how RL-based tool augmentation can bridge the CAD-CAE semantic gap in iterative industrial design. The multi-constraint reward and contributed dataset represent concrete engineering contributions that could support reproducible progress in automated optimization; the approach of casting the full toolchain as an RL environment is a clear methodological strength.

major comments (1)
  1. [Experiments] The central generalization claim—that COSMO-Agent training yields stable, industrially usable policies across diverse coupled constraints—rests on evaluation within the 25-category dataset. No explicit out-of-distribution tests for novel constraint couplings or unseen component topologies are described, which is load-bearing because in-distribution performance gains could arise from dataset matching rather than learned closed-loop orchestration skill.
minor comments (2)
  1. [Title] Title contains a formatting error: 'Optimization,Simulation,and' should read 'Optimization, Simulation, and'.
  2. [Abstract] The abstract states that the dataset supports 'realistic training and evaluation' but does not specify how task executability or constraint satisfaction is verified in the RL loop.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address the major comment below and clarify our position on generalization while committing to revisions that strengthen the empirical support.

Point-by-point responses
  1. Referee: The central generalization claim—that COSMO-Agent training yields stable, industrially usable policies across diverse coupled constraints—rests on evaluation within the 25-category dataset. No explicit out-of-distribution tests for novel constraint couplings or unseen component topologies are described, which is load-bearing because in-distribution performance gains could arise from dataset matching rather than learned closed-loop orchestration skill.

    Authors: We agree that dedicated out-of-distribution (OOD) evaluation on entirely novel constraint couplings and unseen topologies would provide stronger evidence for the learned orchestration skill. The 25-category dataset was deliberately curated from industry sources to span diverse component topologies and coupled constraints typical of real design tasks; the consistent gains across categories (especially small models outperforming larger baselines) suggest the multi-constraint reward and RL loop encourage general tool-use policies rather than category-specific memorization. Nevertheless, the absence of explicit held-out OOD splits is a valid limitation of the current experiments. In the revision we will add a new subsection with (i) cross-category generalization analysis and (ii) preliminary results on a held-out set of constraint combinations and topologies not used in training, thereby directly addressing the concern.

    Revision: yes
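A held-out split of the kind the rebuttal promises could be built by reserving whole constraint combinations for testing. The sketch below is hypothetical: the task schema with `category` and `constraints` fields and the example constraint names are assumptions, not the dataset's actual format.

```python
def split_by_constraint_combo(tasks, held_out_combos):
    """Send every task whose constraint set is held out to the test split."""
    held = {frozenset(combo) for combo in held_out_combos}
    train, test = [], []
    for task in tasks:
        bucket = test if frozenset(task["constraints"]) in held else train
        bucket.append(task)
    return train, test


def leaked_combos(train, test):
    # Constraint sets present in both splits; empty means no leakage.
    seen = {frozenset(t["constraints"]) for t in train}
    return seen & {frozenset(t["constraints"]) for t in test}


# Hypothetical tasks, for illustration only.
tasks = [
    {"category": "bracket", "constraints": ["stress", "mass"]},
    {"category": "flange", "constraints": ["stress"]},
    {"category": "bracket", "constraints": ["mass", "stress", "deflection"]},
    {"category": "housing", "constraints": ["mass", "stress"]},
]
train, test = split_by_constraint_combo(tasks, [["stress", "mass"]])
```

Using `frozenset` makes the match order-independent, so `["mass", "stress"]` and `["stress", "mass"]` count as the same coupling. Splitting by unseen topology would need an analogous grouping over `category` or a geometry descriptor.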

Circularity Check

0 steps flagged

No circularity in empirical RL training and evaluation

Full rationale

The paper proposes COSMO-Agent as a tool-augmented RL framework, defines a multi-constraint reward for stability, contributes a 25-category dataset, and reports experimental gains in feasibility and stability for trained LLMs. These are empirical outcomes from training and evaluation rather than any derivation, prediction, or claim that reduces by construction to fitted parameters, self-referential definitions, or load-bearing self-citations. The central results rest on observed performance metrics outside any tautological loop.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the unverified assumption that the RL environment faithfully captures real industrial constraints and that the reward function produces stable policy improvement without post-hoc tuning. No free parameters, axioms, or invented entities are explicitly described in the abstract.

pith-pipeline@v0.9.0 · 5748 in / 1155 out tokens · 45530 ms · 2026-05-21T10:27:08.080927+00:00 · methodology

