Tool-Augmented Agent for Closed-loop Optimization,Simulation,and Modeling Orchestration

Huaxi Huang; Linyang Li; Liyuan Deng; Shujian Deng; Xiao Sun; Yilei Shi; Yongkang Chen; Yongkang Dai; Zhihang Zhong

arxiv: 2605.20190 · v1 · pith:WYIFL4NM · submitted 2026-04-01 · 💻 cs.AI · cs.GR

Pith reviewed 2026-05-21 10:27 UTC · model grok-4.3

keywords: tool-augmented agents, reinforcement learning, CAD-CAE optimization, closed-loop design, LLM orchestration, constraint-driven design, multi-constraint rewards, industrial simulation

The pith

COSMO-Agent trains small open-source LLMs to close the CAD-CAE loop by orchestrating tools and revising geometries under coupled constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a reinforcement learning framework that equips language models with the ability to generate CAD models, run simulations, interpret results, and adjust designs iteratively until all constraints are met. This tackles the persistent difficulty in industrial workflows where simulation outcomes do not automatically translate into valid geometric modifications across multiple interacting requirements. A multi-constraint reward system and a dataset covering 25 component categories guide the training toward feasible outputs, robust tool use, and valid structures. If the approach holds, smaller models gain the capacity to perform tasks that currently demand larger systems or human oversight.

Core claim

COSMO-Agent casts CAD generation, CAE solving, result parsing, and geometry revision as an interactive RL environment in which an LLM learns to orchestrate external tools and revise parametric geometries until constraints are satisfied, using a multi-constraint reward that jointly encourages feasibility, toolchain robustness, and structured output validity, supported by an industry-aligned dataset of 25 component categories.
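To make the closed-loop framing concrete, here is a minimal sketch of one generate, solve, parse, revise cycle. It is an illustration under stated assumptions, not the paper's environment: `cad_tool`, `cae_tool`, `revise`, and `closed_loop` are hypothetical names, the CAE step is a closed-form cantilever stress formula rather than a real solver, and a fixed rule stands in for the LLM policy.

```python
from dataclasses import dataclass


@dataclass
class Beam:
    width: float   # metres
    height: float  # metres
    length: float  # metres


def cad_tool(params):
    """Hypothetical CAD step: build a parametric geometry from named parameters."""
    return Beam(params["width"], params["height"], params["length"])


def cae_tool(beam, load):
    """Hypothetical CAE step: closed-form cantilever bending stress, not a real FEA run."""
    stress = 6.0 * load * beam.length / (beam.width * beam.height ** 2)
    mass = 7850.0 * beam.width * beam.height * beam.length  # assumed steel density, kg/m^3
    return {"max_stress": stress, "mass": mass}


def violations(result, limits):
    """Parse results into the set of violated upper-bound constraints."""
    return {k: result[k] - limits[k] for k in limits if result[k] > limits[k]}


def revise(params, viol):
    """Rule-based stand-in for the LLM policy that edits the geometry."""
    new = dict(params)
    if "max_stress" in viol:
        new["height"] *= 1.1   # a taller section lowers bending stress
    elif "mass" in viol:
        new["width"] *= 0.9    # a narrower section lowers mass
    return new


def closed_loop(params, load, limits, max_steps=20):
    """Iterate until all constraints hold; return (params, result, steps or None)."""
    result = None
    for step in range(1, max_steps + 1):
        result = cae_tool(cad_tool(params), load)
        viol = violations(result, limits)
        if not viol:
            return params, result, step
        params = revise(params, viol)
    return params, result, None


if __name__ == "__main__":
    start = {"width": 0.02, "height": 0.02, "length": 1.0}
    print(closed_loop(start, load=100.0, limits={"max_stress": 5.0e7, "mass": 5.0}))
```

In the paper's setting the `revise` step is what the trained LLM supplies, and the two constraints here (stress and mass) are only a toy instance of coupled requirements: raising the height fixes stress but pushes mass upward.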

What carries the argument

COSMO-Agent, a tool-augmented reinforcement learning framework that teaches LLMs to complete the closed-loop CAD-CAE process through iterative tool orchestration and geometry revision.

If this is right

Small open-source LLMs reach higher feasibility, efficiency, and stability than larger open-source and closed-source models on constraint-driven design tasks.
The closed-loop process reduces reliance on manual translation between simulation feedback and geometric edits.
Training produces structured outputs and robust tool chaining that remain consistent across the 25 component categories.
The multi-constraint reward enables handling of diverse, interacting requirements without separate optimization stages.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same agent structure could extend to other engineering domains that require repeated modeling-analysis cycles, such as structural or thermal optimization.
Integration into commercial CAD platforms might shorten overall design cycles by automating revision steps that currently need expert intervention.
Further tests on real manufacturing data outside the 25 categories would reveal how far the learned behavior transfers to production settings.

Load-bearing premise

The multi-constraint reward function and the dataset of 25 component categories are sufficient to produce stable, industrially usable learning that generalizes across diverse coupled constraints.

What would settle it

Train a small open-source LLM with COSMO-Agent on the provided dataset, then evaluate feasibility, efficiency, and stability on a fresh collection of CAD-CAE tasks that introduce new combinations of coupled constraints; failure to exceed large open-source and strong closed-source baselines on these metrics would falsify the central claim.
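The abstract does not say how feasibility, efficiency, and stability are measured, so the following is only one plausible way to score such a test. It assumes (hypothetically) that each episode is logged as a dict with a `feasible` flag and a `steps` count, that efficiency means average steps over feasible episodes, and that stability means the spread of the feasibility rate across random seeds.

```python
from statistics import mean, pstdev


def summarize(runs):
    """Summarize evaluation logs.

    runs: list of per-seed lists; each episode is {"feasible": bool, "steps": int}.
    The metric definitions below are assumptions, not the paper's.
    """
    per_seed_rates = [
        mean(1.0 if ep["feasible"] else 0.0 for ep in seed_eps) for seed_eps in runs
    ]
    feasible_steps = [
        ep["steps"] for seed_eps in runs for ep in seed_eps if ep["feasible"]
    ]
    return {
        # average feasibility rate across seeds
        "feasibility": mean(per_seed_rates),
        # mean tool-loop iterations among episodes that reached feasibility
        "efficiency_mean_steps": mean(feasible_steps) if feasible_steps else None,
        # spread of the feasibility rate across seeds (lower is more stable)
        "stability_std": pstdev(per_seed_rates),
    }


if __name__ == "__main__":
    logs = [
        [{"feasible": True, "steps": 3}, {"feasible": False, "steps": 10}],
        [{"feasible": True, "steps": 5}, {"feasible": True, "steps": 4}],
    ]
    print(summarize(logs))
```

Reporting the same three numbers for the trained small model and for each baseline on the fresh task collection would make the comparison described above directly checkable.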

Figures

Figures reproduced from arXiv: 2605.20190 by Huaxi Huang, Linyang Li, Liyuan Deng, Shujian Deng, Xiao Sun, Yilei Shi, Yongkang Chen, Yongkang Dai, Zhihang Zhong.

**Figure 1.** COSMO-Agent performs closed-loop CAD–CAE optimization by iteratively generating parametric geometry, running CAE

**Figure 2.** COSMO-Agent: (a) overall closed-loop framework, (b) MCP tool set for CAD–CAE optimization, and (c) training reward

**Figure 3.** Visualized inference cases of COSMO-Agent.

Original abstract

Iterative industrial design-simulation optimization is bottlenecked by the CAD-CAE semantic gap: translating simulation feedback into valid geometric edits under diverse, coupled constraints. To fill this gap, we propose COSMO-Agent (Closed-loop Optimization, Simulation, and Modeling Orchestration), a tool-augmented reinforcement learning (RL) framework that teaches LLMs to complete the closed-loop CAD-CAE process. Specifically, we cast CAD generation, CAE solving, result parsing, and geometry revision as an interactive RL environment, where an LLM learns to orchestrate external tools and revise parametric geometries until constraints are satisfied. To make this learning stable and industrially usable, we design a multi-constraint reward that jointly encourages feasibility, toolchain robustness, and structured output validity. In addition, we contribute an industry-aligned dataset that covers 25 component categories with executable CAD-CAE tasks to support realistic training and evaluation. Experiments show that COSMO-Agent training substantially improves small open-source LLMs for constraint-driven design, exceeding large open-source and strong closed-source models in feasibility, efficiency, and stability.
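The abstract names three reward components but gives no formula. As a hedged illustration of how such a signal might be combined, the sketch below scores a finished episode as a weighted sum of constraint feasibility, the fraction of successful tool calls, and whether the final answer parses as structured output. The function name, the weights, the JSON output format, and the `ok` field on tool calls are all assumptions made for this example.

```python
import json


def multi_constraint_reward(output_text, tool_calls, measured, limits,
                            weights=(0.6, 0.2, 0.2)):
    """Toy reward: weighted sum of feasibility, toolchain robustness, output validity."""
    # Structured output validity: final answer must be a JSON object with "parameters".
    try:
        parsed = json.loads(output_text)
        valid = isinstance(parsed, dict) and "parameters" in parsed
    except (TypeError, ValueError):  # json.JSONDecodeError subclasses ValueError
        valid = False
    format_score = 1.0 if valid else 0.0

    # Toolchain robustness: share of tool calls that completed without error.
    if tool_calls:
        robustness = sum(1 for call in tool_calls if call["ok"]) / len(tool_calls)
    else:
        robustness = 0.0

    # Feasibility: share of upper-bound constraints satisfied by the measured values.
    checks = [measured[name] <= limit for name, limit in limits.items()]
    feasibility = sum(checks) / len(checks) if checks else 0.0

    w_feas, w_tool, w_fmt = weights
    return w_feas * feasibility + w_tool * robustness + w_fmt * format_score


if __name__ == "__main__":
    print(multi_constraint_reward(
        '{"parameters": {"height": 0.03}}',
        [{"ok": True}, {"ok": False}],
        measured={"max_stress": 4.0e7, "mass": 6.0},
        limits={"max_stress": 5.0e7, "mass": 5.0},
    ))
```

A partial-credit feasibility term like this one gives the policy a gradient toward satisfying more constraints; whether the paper uses partial credit or an all-or-nothing check is not stated in the abstract.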

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

COSMO-Agent sets up an RL environment for LLMs to close CAD-CAE loops with a multi-constraint reward and a 25-category dataset, but the abstract gives no numbers or OOD checks to back the performance claims.

The paper's main move is to frame CAD generation, simulation, parsing, and geometry revision as a single interactive RL task where an LLM agent calls external tools until constraints are met. They add a reward that scores feasibility, toolchain stability, and output format together, and they release an industry-style dataset covering 25 component categories with executable tasks. That combination is the concrete new piece: a ready-to-use environment and reward for training agents on closed-loop design rather than isolated generation or analysis steps. The dataset itself is a practical addition that other groups could build on for similar engineering workflows.

The claim that training small open-source models this way beats both larger open-source and closed-source models on feasibility and stability is the result they highlight. If the experiments hold up, the setup could reduce manual iteration time in real CAD-CAE pipelines. The soft spot is that the abstract supplies no quantitative results, no baseline comparisons, no statistical tests, and no description of how they measured stability or efficiency. Without those details it is difficult to tell whether the reported gains come from the RL training or from the specific reward and dataset construction. The stress-test note about in-distribution evaluation is on target; nothing in the abstract shows tests on constraint couplings or component types outside the 25-category collection, so any generalization claim rests on unshown evidence.

Readers working on tool-augmented agents or AI for mechanical design would find the environment formulation and reward structure useful even if they have to re-run the experiments themselves. The work is coherent on its own terms and engages a real bottleneck, so it clears the bar for peer review. I would send it out with instructions to expand the experimental section and add at least one out-of-distribution test set before acceptance.

Referee Report

1 major / 2 minor

Summary. The paper introduces COSMO-Agent, a tool-augmented reinforcement learning framework for training LLMs to orchestrate CAD generation, CAE solving, result parsing, and geometry revision in a closed-loop interactive environment. It designs a multi-constraint reward encouraging feasibility, toolchain robustness, and output validity, contributes an industry-aligned dataset of 25 component categories with executable CAD-CAE tasks, and reports that this training substantially improves small open-source LLMs to exceed large open-source and strong closed-source models in feasibility, efficiency, and stability for constraint-driven design.

Significance. If the central empirical claims hold under rigorous evaluation, the work would be significant for demonstrating how RL-based tool augmentation can bridge the CAD-CAE semantic gap in iterative industrial design. The multi-constraint reward and contributed dataset represent concrete engineering contributions that could support reproducible progress in automated optimization; the approach of casting the full toolchain as an RL environment is a clear methodological strength.

major comments (1)

[Experiments] The central generalization claim—that COSMO-Agent training yields stable, industrially usable policies across diverse coupled constraints—rests on evaluation within the 25-category dataset. No explicit out-of-distribution tests for novel constraint couplings or unseen component topologies are described, which is load-bearing because in-distribution performance gains could arise from dataset matching rather than learned closed-loop orchestration skill.
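One way to build the kind of held-out test this comment asks for is to reserve whole component categories and whole constraint combinations before training. The sketch below is illustrative only: the task schema (`category`, `constraints`) and the category names are hypothetical and are not taken from the paper's dataset.

```python
def ood_split(tasks, held_out_categories, held_out_couplings):
    """Split tasks into in-distribution training tasks and an OOD evaluation set.

    A task is OOD if its category is held out, or if its exact set of
    constraint types is one of the held-out couplings.
    """
    held_out_categories = set(held_out_categories)
    held_out_couplings = {frozenset(c) for c in held_out_couplings}
    train, ood = [], []
    for task in tasks:
        coupling = frozenset(task["constraints"])
        if task["category"] in held_out_categories or coupling in held_out_couplings:
            ood.append(task)
        else:
            train.append(task)
    return train, ood


if __name__ == "__main__":
    # Hypothetical tasks; names are placeholders, not the paper's categories.
    tasks = [
        {"id": 1, "category": "bracket", "constraints": ["stress", "mass"]},
        {"id": 2, "category": "flange", "constraints": ["stress", "displacement"]},
        {"id": 3, "category": "shaft", "constraints": ["stress", "mass"]},
        {"id": 4, "category": "bracket", "constraints": ["mass", "displacement"]},
    ]
    train, ood = ood_split(
        tasks,
        held_out_categories=["flange"],
        held_out_couplings=[["mass", "displacement"]],
    )
    print([t["id"] for t in train], [t["id"] for t in ood])
```

Comparing feasibility on the `train`-like and `ood` sets would separate dataset matching from transferable orchestration skill, which is the distinction the comment treats as load-bearing.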

minor comments (2)

[Title] Title contains a formatting error: 'Optimization,Simulation,and' should read 'Optimization, Simulation, and'.
[Abstract] The abstract states that the dataset supports 'realistic training and evaluation' but does not specify how task executability or constraint satisfaction is verified in the RL loop.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address the major comment below and clarify our position on generalization while committing to revisions that strengthen the empirical support.

Referee: The central generalization claim—that COSMO-Agent training yields stable, industrially usable policies across diverse coupled constraints—rests on evaluation within the 25-category dataset. No explicit out-of-distribution tests for novel constraint couplings or unseen component topologies are described, which is load-bearing because in-distribution performance gains could arise from dataset matching rather than learned closed-loop orchestration skill.

Authors: We agree that dedicated out-of-distribution (OOD) evaluation on entirely novel constraint couplings and unseen topologies would provide stronger evidence for the learned orchestration skill. The 25-category dataset was deliberately curated from industry sources to span diverse component topologies and coupled constraints typical of real design tasks; the consistent gains across categories (especially small models outperforming larger baselines) suggest the multi-constraint reward and RL loop encourage general tool-use policies rather than category-specific memorization. Nevertheless, the absence of explicit held-out OOD splits is a valid limitation of the current experiments. In the revision we will add a new subsection with (i) cross-category generalization analysis and (ii) preliminary results on a held-out set of constraint combinations and topologies not used in training, thereby directly addressing the concern.

Revision: yes

Circularity Check

0 steps flagged

No circularity in empirical RL training and evaluation

The paper proposes COSMO-Agent as a tool-augmented RL framework, defines a multi-constraint reward for stability, contributes a 25-category dataset, and reports experimental gains in feasibility and stability for trained LLMs. These are empirical outcomes from training and evaluation rather than any derivation, prediction, or claim that reduces by construction to fitted parameters, self-referential definitions, or load-bearing self-citations. The central results rest on observed performance metrics outside any tautological loop.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the unverified assumption that the RL environment faithfully captures real industrial constraints and that the reward function produces stable policy improvement without post-hoc tuning. No free parameters, axioms, or invented entities are explicitly described in the abstract.

[1] [1]

Do as I can, not as I say: Grounding language in robotic affor- dances

Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Cheb- otar, Omar Cortes, Byron David, Chelsea Finn, Chuyuan Fu, Keerthana Gopalakrishnan, Karol Hausman, et al. Do as I can, not as I say: Grounding language in robotic affor- dances. InConference on Robot Learning, CoRL 2022, 14- 18 December 2022, Auckland, New Zealand, pages 287–318. PMLR, 2022. 2, 3

work page 2022

[2] [2]

Claude Sonnet 4.5: System Card

Anthropic. Claude Sonnet 4.5: System Card. System card, Anthropic PBC, 2025. 6

work page 2025

[3] [3]

Dennis, Jr

Charles Audet and John E. Dennis, Jr. Mesh adaptive direct search algorithms for constrained optimization.SIAM Jour- nal on Optimization, 17(1):188–217, 2006. 2

work page 2006

[4] [4]

Intern- s1: A scientific multimodal foundation model, 2025

Lei Bai, Zhongrui Cai, Maosong Cao, Weihan Cao, Chiyu Chen, Haojiong Chen, Kai Chen, Pengcheng Chen, Ying Chen, Yongkang Chen, Yu Cheng, Yu Cheng, Pei Chu, Tao Chu, Erfei Cui, Ganqu Cui, Long Cui, Ziyun Cui, Ni- anchen Deng, Ning Ding, Nanqin Dong, Peijie Dong, Shi- han Dou, Sinan Du, Haodong Duan, Caihua Fan, Ben Gao, Changjiang Gao, Jianfei Gao, Songyan...

work page 2025

[5] [5]

Metaopenfoam: an llm-based multi-agent framework for cfd

Yuxuan Chen, Xu Zhu, Hua Zhou, and Zhuyin Ren. Metaopenfoam: an llm-based multi-agent framework for cfd. arXiv preprint arXiv:2407.21320, 2024. 2

work page arXiv 2024

[6] [6]

Christiano, Jan Leike, Tom B

Paul F. Christiano, Jan Leike, Tom B. Brown, Miljan Mar- tic, Shane Legg, and Dario Amodei. completelyinforcement learning from human preferences. InAdvances in Neural Information Processing Systems, pages 4299–4307, 2017. 2

work page 2017

[7] [7]

Cadquery, 2025

CadQuery contributors. Cadquery, 2025. 5

work page 2025

[8] [8]

Fine-tuning a large language model for automating computational fluid dynam- ics simulations.Theoretical and Applied Mechanics Letters, page 100594, 2025

Zhehao Dong, Zhen Lu, and Yue Yang. Fine-tuning a large language model for automating computational fluid dynam- ics simulations.Theoretical and Applied Mechanics Letters, page 100594, 2025. 2

work page 2025

[9] [9]

Gmsh: A 3-d finite element mesh generator with built-in pre-and post-processing facilities.International journal for numer- ical methods in engineering, 79(11):1309–1331, 2009

Christophe Geuzaine and Jean-Franc ¸ois Remacle. Gmsh: A 3-d finite element mesh generator with built-in pre-and post-processing facilities.International journal for numer- ical methods in engineering, 79(11):1309–1331, 2009. 6

work page 2009

[10] [10]

Gemini 3 flash model card

Google DeepMind. Gemini 3 flash model card. Online PDF,

work page

[11] [11]

Model card for the Gemini 3 Flash generative AI model. 6

work page

[12] [12]

Completely de- randomized self-adaptation in evolution strategies.Evolu- tionary Computation, 9(2):159–195, 2001

Nikolaus Hansen and Andreas Ostermeier. Completely de- randomized self-adaptation in evolution strategies.Evolu- tionary Computation, 9(2):159–195, 2001. 2

work page 2001

[13] [13]

Difftaichi: Differentiable programming for physical simulation

Yuanming Hu, Luke Anderson, Tzu-Mao Li, Qi Sun, Nathan Carr, Jonathan Ragan-Kelley, and Fr´edo Durand. Difftaichi: Differentiable programming for physical simulation. InIn- ternational Conference on Learning Representations, 2020. 2

work page 2020

[14] [14]

Jones, Matthias Schonlau, and William J

Donald R. Jones, Matthias Schonlau, and William J. Welch. Efficient global optimization of expensive black-box func- tions.Journal of Global Optimization, 13(4):455–492, 1998. 2

work page 1998

[15] [15]

Ehud Karpas, Omri Abend, Yonatan Belinkov, Barak Lenz, Opher Lieber, Nir Ratner, Yoav Shoham, Hofit Bata, Yoav Levine, Kevin Leyton-Brown, et al. Mrkl systems: A mod- ular, neuro-symbolic architecture that combines large lan- guage models, external knowledge sources and discrete rea- soning.arXiv preprint arXiv:2205.00445, 2022. 3

work page internal anchor Pith review Pith/arXiv arXiv 2022

[16] [16]

Internbootcamp technical report: Boosting llm reasoning with verifiable task scaling, 2025

Peiji Li, Jiasheng Ye, Yongkang Chen, Yichuan Ma, Zijie Yu, Kedi Chen, Ganqu Cui, Haozhan Li, Jiacheng Chen, Chengqi Lyu, Wenwei Zhang, Linyang Li, Qipeng Guo, Dahua Lin, Bowen Zhou, and Kai Chen. Internbootcamp technical report: Boosting llm reasoning with verifiable task scaling, 2025. 3, 6

work page 2025

[17] [17]

Llm4cad: Multi-modal large language models for 3d computer-aided design generation

Xingang Li, Yuewan Sun, and Zhenghui Sha. Llm4cad: Multi-modal large language models for 3d computer-aided design generation. InInternational Design Engineering Technical Conferences and Computers and Information in Engineering Conference, page V006T06A015. American Society of Mechanical Engineers, 2024. 2

work page 2024

[18] [18]

Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar

Zongyi Li, Nikola B. Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. InInternational Conference on Learning Representations, 2021. 2

work page 2021

[19] Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3:218–229, 2021. 2

[20] Dimitrios Mallis, Ahmet Serda Karadeniz, Sebastian Cavada, Danila Rukhovich, Niki Foteinopoulou, Kseniya Cherenkova, Anis Kacem, and Djamila Aouada. Cad-assistant: tool-augmented vllms as generic cad task solvers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7284–7294, 2025. 2

[21] Meta. Llama 4 Scout (17B×16E) Instruct: Model Card. Online model card, 2025. Model release date: April 5, 2025. Accessed: 2026-01-23. 6

[22] Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback. In Advances in Neural Information Processing Systems, 2022. 2

[23] Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, Sihan Zhao, Lauren Hong, Runchu Tian, Ruobing Xie, Jie Zhou, Mark Gerstein, Dahai Li, Zhiyuan Liu, and Maosong Sun. Toolllm: Facilitating large language models to master 16000+ real-world apis. In International Conference on Learning Represent...

[24] Maziar Raissi, Paris Perdikaris, and George E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019. 2

[25] Juergen Riegel, Werner Mayer, and Yorik van Havre. Freecad, 2001–2017. Accessed: 2001–2017. 6

[26] Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. In Advances in Neural Information Processing Systems (NeurIPS), pages 68539–68551, 2023. 2, 3

[27] Ari Seff, Yaniv Ovadia, Wenda Zhou, and Ryan P. Adams. Sketchgraphs: A large-scale dataset for modeling relational geometry in computer-aided design, 2020. 2

[28] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. Deepseekmath: Pushing the limits of mathematical reasoning in open language models, 2024. 5

[29] Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, and Chuan Wu. Hybridflow: A flexible and efficient rlhf framework. arXiv preprint arXiv:2409.19256, 2024. 3

[30] Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. Practical bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, pages 2960–2968, 2012. 2

[31] Qwen Team. Qwen3 technical report, 2025. 6

[32] Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, and Hannaneh Hajishirzi. Self-instruct: Aligning language models with self-generated instructions. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL), 2023. 2

[33] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc V Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems (NeurIPS), pages 24824–24837, 2022. 2

[34] Karl D. D. Willis, Yewen Pu, Jieliang Luo, Hang Chu, Tao Du, Joseph G. Lambourne, Armando Solar-Lezama, and Wojciech Matusik. Fusion 360 gallery: a dataset and environment for programmatic CAD construction from human design sequences. ACM Trans. Graph., 40(4):54:1–54:24,

[35] Karl D. D. Willis, Pradeep Kumar Jayaraman, Hang Chu, Yunsheng Tian, Yifei Li, Daniele Grandi, Aditya Sanghi, Linh Tran, Joseph G. Lambourne, Armando Solar-Lezama, and Wojciech Matusik. Joinable: Learning bottom-up assembly of parametric CAD joints. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 1...

[36] Haoyang Xie and Feng Ju. Text-to-cadquery: A new paradigm for cad generation with scalable large model capabilities. arXiv preprint arXiv:2505.06507, 2025. 2

[37] Zhaoyue Xu, Long Wang, Chunyu Wang, Yixin Chen, Qingyong Luo, Hua-Dong Yao, Shizhao Wang, and Guowei He. Cfdagent: A language-guided, zero-shot multi-agent system for complex flow simulation. Physics of Fluids, 37(11), 2025. 2

[38] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, Jingren Zhou, Junyang Lin, Kai Dang, Keqin Bao, Kexin Yang... Qwen3 technical report, 2025.

[39] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R. Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models. In International Conference on Learning Representations (ICLR), 2023. 2, 3

[40] Zhe Yuan, Jianqi Shi, and Yanhong Huang. Openecad: An efficient visual language model for editable 3d-cad design. Computers & Graphics, 124:104048, 2024. 2

[41] Ling Yue, Nithin Somasekharan, Yadi Cao, and Shaowu Pan. Foam-agent: Towards automated intelligent cfd workflows. arXiv preprint arXiv:2505.04997, 2025. 3

[42] Kaiyan Zhang, Runze Liu, Xuekai Zhu, Kai Tian, Sihang Zeng, Guoli Jia, Yuchen Fan, Xingtai Lv, Yuxin Zuo, Che Jiang, Ziyang Liu, Jianyu Wang, Yuru Wang, Ruotong Zhao, Ermo Hua, Yibo Wang, Shijie Wang, Junqi Gao, Xinwei Long, Youbang Sun, Zhiyuan Ma, Ganqu Cui, Lei Bai, Ning Ding, Biqing Qi, and Bowen Zhou. Marti: A framework for multi-agent llm systems reinforced training and inference, ...

[43] Tao Zhang, Zhenhai Liu, Yong Xin, and Yongjun Jiao. Mooseagent: A llm based multi-agent framework for automating moose simulation. arXiv preprint arXiv:2504.08621, 2025. 3