Tool-Augmented Agent for Closed-loop Optimization, Simulation, and Modeling Orchestration
Pith reviewed 2026-05-21 10:27 UTC · model grok-4.3
The pith
COSMO-Agent trains small open-source LLMs to close the CAD-CAE loop by orchestrating tools and revising geometries under coupled constraints.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
COSMO-Agent casts CAD generation, CAE solving, result parsing, and geometry revision as an interactive RL environment in which an LLM learns to orchestrate external tools and revise parametric geometries until constraints are satisfied, using a multi-constraint reward that jointly encourages feasibility, toolchain robustness, and structured output validity, supported by an industry-aligned dataset of 25 component categories.
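The closed loop described above (generate geometry, solve, parse results, revise) can be illustrated with a minimal sketch. It is not COSMO-Agent's implementation. The CAD step is collapsed into a parameter record, closed-form cantilever formulas stand in for a real CAE solver, and `revise_geometry` is a hypothetical rule-based placeholder for the LLM policy. All names, units (N, mm, MPa), the steel-like modulus, and the limits are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class BeamDesign:
    """Hypothetical parametric geometry: rectangular cantilever, dimensions in mm."""
    length: float
    width: float
    height: float


def run_cae(design: BeamDesign, load_n: float, e_mpa: float) -> dict:
    """Stand-in for CAE solve + result parsing: closed-form Euler-Bernoulli
    cantilever with a point load at the free end (N, mm, MPa)."""
    i = design.width * design.height ** 3 / 12.0                  # second moment of area, mm^4
    stress = load_n * design.length * (design.height / 2.0) / i  # max bending stress at the root, MPa
    deflection = load_n * design.length ** 3 / (3.0 * e_mpa * i)  # tip deflection, mm
    return {"max_stress": stress, "max_deflection": deflection}


def revise_geometry(design: BeamDesign, results: dict, limits: dict,
                    growth: float = 1.1) -> BeamDesign:
    """Hypothetical rule-based placeholder for the LLM's revision step:
    thicken the section by 10% whenever any upper-bound limit is violated."""
    if any(results[name] > limit for name, limit in limits.items()):
        return BeamDesign(design.length, design.width, design.height * growth)
    return design


def closed_loop(design: BeamDesign, limits: dict, load_n: float = 1000.0,
                e_mpa: float = 210_000.0, max_iters: int = 20):
    """Alternate solve -> check -> revise until all limits hold or the budget runs out.
    Returns (final design, number of revisions made, feasible flag)."""
    for step in range(max_iters):
        results = run_cae(design, load_n, e_mpa)
        if all(results[name] <= limit for name, limit in limits.items()):
            return design, step, True
        design = revise_geometry(design, results, limits)
    return design, max_iters, False


if __name__ == "__main__":
    final, revisions, feasible = closed_loop(
        BeamDesign(length=500.0, width=20.0, height=20.0),
        limits={"max_stress": 250.0, "max_deflection": 5.0},
    )
    print(f"feasible={feasible}, revisions={revisions}, height={final.height:.2f} mm")
```

In this sketch only `revise_geometry` would be replaced by a model; the solve-check-revise loop itself stays fixed.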
What carries the argument
COSMO-Agent, a tool-augmented reinforcement learning framework that teaches LLMs to complete the closed-loop CAD-CAE process through iterative tool orchestration and geometry revision.
If this is right
- Small open-source LLMs reach higher feasibility, efficiency, and stability than larger open-source and closed-source models on constraint-driven design tasks.
- The closed-loop process reduces reliance on manual translation between simulation feedback and geometric edits.
- Training produces structured outputs and robust tool chaining that remain consistent across the 25 component categories.
- The multi-constraint reward enables handling of diverse, interacting requirements without separate optimization stages.
Where Pith is reading between the lines
- The same agent structure could extend to other engineering domains that require repeated modeling-analysis cycles, such as structural or thermal optimization.
- Integration into commercial CAD platforms might shorten overall design cycles by automating revision steps that currently need expert intervention.
- Further tests on real manufacturing data outside the 25 categories would reveal how far the learned behavior transfers to production settings.
Load-bearing premise
The multi-constraint reward function and the dataset of 25 component categories are sufficient to produce stable, industrially usable learning that generalizes across diverse coupled constraints.
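To make the premise concrete, here is one hedged sketch of how a reward could combine the three terms the abstract names: feasibility, toolchain robustness, and structured output validity. The weights, the required JSON keys (`parameters`, `tool_calls`), and the fractional scoring are assumptions for illustration; the paper's actual reward formulation is not reproduced here.

```python
import json

# Hypothetical weights and output schema; not taken from the paper.
WEIGHTS = {"feasibility": 0.6, "toolchain": 0.2, "format": 0.2}
REQUIRED_KEYS = {"parameters", "tool_calls"}


def format_score(raw_output: str) -> float:
    """1.0 if the agent's final answer is a JSON object containing the required keys."""
    try:
        obj = json.loads(raw_output)
    except json.JSONDecodeError:
        return 0.0
    return 1.0 if isinstance(obj, dict) and REQUIRED_KEYS.issubset(obj) else 0.0


def toolchain_score(tool_statuses: list) -> float:
    """Fraction of tool calls in the episode (e.g. CAD build, solve, parse) that ran without error."""
    return sum(tool_statuses) / len(tool_statuses) if tool_statuses else 0.0


def feasibility_score(results: dict, limits: dict) -> float:
    """Fraction of upper-bound constraints satisfied by the parsed CAE results."""
    if not limits:
        return 1.0
    met = sum(1 for name, limit in limits.items() if name in results and results[name] <= limit)
    return met / len(limits)


def multi_constraint_reward(raw_output: str, tool_statuses: list, results: dict,
                            limits: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of the three terms; each term lies in [0, 1]."""
    return (weights["feasibility"] * feasibility_score(results, limits)
            + weights["toolchain"] * toolchain_score(tool_statuses)
            + weights["format"] * format_score(raw_output))
```

Scoring feasibility as the fraction of satisfied limits gives partial credit on the way to a fully feasible design; an all-or-nothing feasibility term is the stricter alternative.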
What would settle it
Train a small open-source LLM with COSMO-Agent on the provided dataset, then evaluate feasibility, efficiency, and stability on a fresh collection of CAD-CAE tasks that introduce new combinations of coupled constraints; failure to exceed large open-source and strong closed-source baselines on these metrics would falsify the central claim.
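The falsification test above can be phrased as a small evaluation harness. This is a sketch under explicit assumptions: feasibility is the fraction of episodes that end with all constraints met, efficiency is the mean number of loop iterations on successful episodes, and stability is read as a low spread of feasibility across evaluation seeds. The paper may define these metrics differently, and `EpisodeLog` is a hypothetical record type.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class EpisodeLog:
    feasible: bool   # all constraints satisfied at termination
    iterations: int  # CAD-CAE loop iterations used


def summarize(episodes_per_seed: list) -> dict:
    """episodes_per_seed: one non-empty list of EpisodeLog per evaluation seed."""
    rates = [sum(e.feasible for e in eps) / len(eps) for eps in episodes_per_seed]
    iters = [e.iterations for eps in episodes_per_seed for e in eps if e.feasible]
    return {
        "feasibility": mean(rates),                            # higher is better
        "efficiency": mean(iters) if iters else float("inf"),  # iterations to success, lower is better
        "instability": pstdev(rates),                          # spread across seeds, lower is better
    }


def claim_survives(agent: dict, baselines: dict) -> bool:
    """True only if the agent beats every baseline on all three metrics."""
    return all(
        agent["feasibility"] > b["feasibility"]
        and agent["efficiency"] < b["efficiency"]
        and agent["instability"] < b["instability"]
        for b in baselines.values()
    )
```

Requiring a win on every metric against every baseline mirrors the strict reading of the claim; relaxing it to a single metric would test a weaker statement.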
Original abstract
Iterative industrial design-simulation optimization is bottlenecked by the CAD-CAE semantic gap: translating simulation feedback into valid geometric edits under diverse, coupled constraints. To fill this gap, we propose COSMO-Agent (Closed-loop Optimization, Simulation, and Modeling Orchestration), a tool-augmented reinforcement learning (RL) framework that teaches LLMs to complete the closed-loop CAD-CAE process. Specifically, we cast CAD generation, CAE solving, result parsing, and geometry revision as an interactive RL environment, where an LLM learns to orchestrate external tools and revise parametric geometries until constraints are satisfied. To make this learning stable and industrially usable, we design a multi-constraint reward that jointly encourages feasibility, toolchain robustness, and structured output validity. In addition, we contribute an industry-aligned dataset that covers 25 component categories with executable CAD-CAE tasks to support realistic training and evaluation. Experiments show that COSMO-Agent training substantially improves small open-source LLMs for constraint-driven design, exceeding large open-source and strong closed-source models in feasibility, efficiency, and stability.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces COSMO-Agent, a tool-augmented reinforcement learning framework for training LLMs to orchestrate CAD generation, CAE solving, result parsing, and geometry revision in a closed-loop interactive environment. It designs a multi-constraint reward encouraging feasibility, toolchain robustness, and output validity, contributes an industry-aligned dataset of 25 component categories with executable CAD-CAE tasks, and reports that this training substantially improves small open-source LLMs to exceed large open-source and strong closed-source models in feasibility, efficiency, and stability for constraint-driven design.
Significance. If the central empirical claims hold under rigorous evaluation, the work would be significant for demonstrating how RL-based tool augmentation can bridge the CAD-CAE semantic gap in iterative industrial design. The multi-constraint reward and contributed dataset represent concrete engineering contributions that could support reproducible progress in automated optimization; the approach of casting the full toolchain as an RL environment is a clear methodological strength.
major comments (1)
- [Experiments] The central generalization claim—that COSMO-Agent training yields stable, industrially usable policies across diverse coupled constraints—rests on evaluation within the 25-category dataset. No explicit out-of-distribution tests for novel constraint couplings or unseen component topologies are described, which is load-bearing because in-distribution performance gains could arise from dataset matching rather than learned closed-loop orchestration skill.
minor comments (2)
- [Title] Title contains a formatting error: 'Optimization,Simulation,and' should read 'Optimization, Simulation, and'.
- [Abstract] The abstract states that the dataset supports 'realistic training and evaluation' but does not specify how task executability or constraint satisfaction is verified in the RL loop.
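One plausible way to verify executability, sketched here only as an assumption about what such a check could look like, is to run each generated script in a fresh interpreter with a timeout and treat a non-zero exit code or a timeout as a toolchain failure. The sketch assumes the agent emits Python scripts, and `check_executable` is a hypothetical helper. It says nothing about whether the resulting geometry meets the engineering constraints; that needs a separate check on the parsed CAE results.

```python
import subprocess
import sys


def check_executable(script: str, timeout_s: float = 60.0) -> tuple:
    """Run a generated script in a separate interpreter.
    Returns (succeeded, stderr or timeout message)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", script],  # isolate the script from the training process
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s} s"
    return proc.returncode == 0, proc.stderr
```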
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address the major comment below and clarify our position on generalization while committing to revisions that strengthen the empirical support.
Point-by-point responses
Referee: The central generalization claim—that COSMO-Agent training yields stable, industrially usable policies across diverse coupled constraints—rests on evaluation within the 25-category dataset. No explicit out-of-distribution tests for novel constraint couplings or unseen component topologies are described, which is load-bearing because in-distribution performance gains could arise from dataset matching rather than learned closed-loop orchestration skill.
Authors: We agree that dedicated out-of-distribution (OOD) evaluation on entirely novel constraint couplings and unseen topologies would provide stronger evidence for the learned orchestration skill. The 25-category dataset was deliberately curated from industry sources to span diverse component topologies and coupled constraints typical of real design tasks; the consistent gains across categories (especially small models outperforming larger baselines) suggest the multi-constraint reward and RL loop encourage general tool-use policies rather than category-specific memorization. Nevertheless, the absence of explicit held-out OOD splits is a valid limitation of the current experiments. In the revision we will add a new subsection with (i) cross-category generalization analysis and (ii) preliminary results on a held-out set of constraint combinations and topologies not used in training, thereby directly addressing the concern.
Revision: yes
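A held-out split of the kind promised here can be built by grouping tasks and holding out whole groups, so that no test group is ever seen during training. The sketch assumes each task is a dict with hypothetical `constraints` and `category` fields, and `group_holdout_split` is a hypothetical helper; the same function covers held-out constraint combinations and cross-category splits by changing the grouping key.

```python
import random
from collections import defaultdict


def group_holdout_split(tasks: list, key, holdout_fraction: float = 0.2, seed: int = 0):
    """Split tasks so that every group (as defined by `key`) lands entirely in train or test.
    `key` must return a sortable, hashable value (e.g. a string or a tuple of strings)."""
    groups = defaultdict(list)
    for task in tasks:
        groups[key(task)].append(task)
    keys = sorted(groups)              # canonical order so the split is reproducible
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(holdout_fraction * len(keys)))
    held_out = set(keys[:n_test])
    train = [t for k in keys[n_test:] for t in groups[k]]
    test = [t for k in keys[:n_test] for t in groups[k]]
    return train, test, held_out


# Held-out constraint combinations: order-insensitive tuple of constraint names.
def combo_key(task):
    return tuple(sorted(task["constraints"]))


# Cross-category split: hold out whole component categories.
def category_key(task):
    return task["category"]
```

Grouping by a sorted tuple rather than a `frozenset` keeps the key sortable, so the shuffle starts from the same order on every run.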
Circularity Check
No circularity in empirical RL training and evaluation
The paper proposes COSMO-Agent as a tool-augmented RL framework, defines a multi-constraint reward for stability, contributes a 25-category dataset, and reports experimental gains in feasibility and stability for trained LLMs. These are empirical outcomes from training and evaluation rather than any derivation, prediction, or claim that reduces by construction to fitted parameters, self-referential definitions, or load-bearing self-citations. The central results rest on observed performance metrics outside any tautological loop.