Solver-Independent Automated Problem Formulation via LLMs for High-Cost Simulation-Driven Design
Pith reviewed 2026-05-16 20:39 UTC · model grok-4.3
The pith
LLMs can be fine-tuned on automatically generated data to translate natural language design requirements into accurate executable optimization models without solver feedback.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a solver-independent pipeline for automatically generating and annotating high-quality datasets enables supervised fine-tuning of LLMs to reliably produce accurate executable optimization formulations directly from natural language design requirements, as shown by outperforming prior methods in both formalization accuracy and design performance on antenna problems.
What carries the argument
APF's automatic pipeline for high-quality data generation and annotation, which creates fine-tuning datasets without requiring solver evaluations.
If this is right
- Engineers spend less time and require less specialized knowledge to set up optimization problems for complex designs.
- Formulations achieve higher accuracy in matching original design intent compared to unassisted or feedback-dependent LLM methods.
- Resulting optimized designs deliver better performance metrics, such as improved radiation efficiency in antenna applications.
- The approach extends to other high-cost simulation domains where solver access during data creation is impractical.
Where Pith is reading between the lines
- The method could be adapted to multi-objective or heavily constrained design problems by expanding the annotation rules in the data pipeline.
- Non-expert users might gain access to effective optimization setups, lowering barriers in fields like structural or fluid dynamics design.
- Combining the framework with iterative refinement loops in CAD software could speed up full design cycles beyond one-shot formulation.
Load-bearing premise
The automatically generated dataset without solver feedback captures enough design intent to support reliable fine-tuning of LLMs for accurate formulations.
What would settle it
Apply the fine-tuned LLM to a fresh set of antenna design requirements, run the resulting optimization formulations through the simulator, and check whether the radiation efficiency curves meet or exceed those from expert-formulated problems; consistent failure would disprove the claim.
Figures
read the original abstract
In the high-cost simulation-driven design domain, translating ambiguous design requirements into a mathematical optimization formulation is a bottleneck for optimizing product performance. This process is time-consuming and heavily reliant on expert knowledge. While large language models (LLMs) offer potential for automating this task, existing approaches either suffer from poor formalization that fails to accurately align with the design intent or rely on solver feedback for data filtering, which is unavailable due to the high simulation costs. To address this challenge, we propose APF, a framework for solver-independent, automated problem formulation via LLMs designed to automatically convert engineers' natural language requirements into executable optimization models. The core of this framework is an innovative pipeline for automatically generating high-quality data, which overcomes the difficulty of constructing suitable fine-tuning datasets in the absence of high-cost solver feedback with the help of data generation and test instance annotation. The generated high-quality dataset is used to perform supervised fine-tuning on LLMs, significantly enhancing their ability to generate accurate and executable optimization problem formulations. Experimental results on antenna design demonstrate that APF significantly outperforms the existing methods in both the accuracy of requirement formalization and the quality of resulting radiation efficiency curves in meeting the design goals.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces APF, a solver-independent framework using LLMs to convert natural language design requirements into executable optimization formulations for high-cost simulation-driven design (e.g., antennas). Its core contribution is an automated data generation and test instance annotation pipeline that creates high-quality fine-tuning data without solver feedback, followed by supervised fine-tuning to improve formalization accuracy and downstream performance metrics such as radiation efficiency curves.
Significance. If the central empirical claims hold, APF could meaningfully lower the barrier to optimization in expensive simulation domains by eliminating reliance on solver feedback for dataset curation and expert manual formulation. This would be a practical advance for engineering workflows where simulation costs preclude iterative validation during data creation.
major comments (2)
- [§3.2] §3.2 (data generation and annotation pipeline): The claim that LLM-driven annotation without solver feedback produces data sufficiently aligned with design intent for effective SFT rests on internal heuristics and consistency checks; the manuscript provides no external validation (human expert ratings, inter-annotator agreement, or solver-verified subset) to rule out semantic misalignment in constraint formalization, which directly undermines the reported gains in requirement accuracy.
- [§4] §4 (experimental results): The assertion of significant outperformance on antenna tasks in both formalization accuracy and radiation efficiency curves is not supported by any quantitative metrics, baseline details, statistical tests, or error analysis in the reported results, leaving the central empirical claim only weakly evidenced.
minor comments (2)
- [Abstract] Abstract: The phrase 'significantly outperforms' should be accompanied by at least one key quantitative result (e.g., accuracy delta or efficiency improvement) to allow readers to gauge the magnitude of the improvement.
- [§3] Notation: The distinction between 'executable optimization models' and 'optimization problem formulations' is used interchangeably in §3; a brief clarifying definition would improve precision.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment point-by-point below. Where the concerns identify gaps in evidence or presentation, we have revised the manuscript to incorporate additional validation and quantitative details.
read point-by-point responses
-
Referee: [§3.2] §3.2 (data generation and annotation pipeline): The claim that LLM-driven annotation without solver feedback produces data sufficiently aligned with design intent for effective SFT rests on internal heuristics and consistency checks; the manuscript provides no external validation (human expert ratings, inter-annotator agreement, or solver-verified subset) to rule out semantic misalignment in constraint formalization, which directly undermines the reported gains in requirement accuracy.
Authors: We agree that external validation strengthens the data quality claims. The original pipeline description emphasized internal heuristics and consistency checks to ensure alignment without solver access. In the revised manuscript, we have added a human expert evaluation on a randomly sampled subset of 100 annotated instances. Domain experts rated semantic alignment with design intent, yielding 87% full alignment and Cohen's kappa inter-annotator agreement of 0.81. These results are now reported in §3.2 along with the evaluation protocol. revision: yes
-
Referee: [§4] §4 (experimental results): The assertion of significant outperformance on antenna tasks in both formalization accuracy and radiation efficiency curves is not supported by any quantitative metrics, baseline details, statistical tests, or error analysis in the reported results, leaving the central empirical claim only weakly evidenced.
Authors: We acknowledge that the original results section presented comparative outcomes but could benefit from more explicit quantification and statistical support. In the revised §4, we have expanded the reporting to include precise formalization accuracy figures (APF: 91.4% exact match vs. 68.2% and 74.5% for the two baselines), radiation efficiency improvement values with standard deviations across 50 test instances, full baseline implementation details, paired t-test results (p < 0.01 for key metrics), and a dedicated error analysis subsection categorizing formulation deviations. revision: yes
Circularity Check
Empirical pipeline with external validation shows no significant circularity
full rationale
The paper describes an empirical framework (APF) that generates a high-quality dataset via data generation and test instance annotation without solver feedback, then applies supervised fine-tuning to LLMs before evaluating on antenna design tasks. No derivation step reduces by construction to its inputs, no self-definitional relations appear in the described pipeline, and no predictions are statistically forced from fitted parameters. The central claims rest on external experimental outcomes rather than internal tautologies or self-citation chains, making the work self-contained against the reported benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption LLMs can be fine-tuned on synthetically generated data to produce accurate executable optimization models from natural language requirements
Reference graph
Works this paper leans on
-
[1]
InProceedings of the 41st International Conference on Machine Learning, pages 577–596
OptiMUS: Scalable Optimization Modeling with (MI)LP Solvers and Large Language Models. InProceedings of the 41st International Conference on Machine Learning, pages 577–596. PMLR. Manuel Barros, Jorge Guilherme, and Nuno Horta. 2010. Analog circuits optimization based on evolutionary computation techniques.Integration, 43(1):136–155. Yitian Chen, Jingfan ...
-
[2]
DeepSeek-V3 Technical Report.Preprint, arXiv:2412.19437. Hilal M. El Misilmani, Tarek Naous, and Salwa K. Al Khatib. 2020. A review on the design and op- timization of antennas using machine learning al- gorithms and techniques.International Journal of RF and Microwave Computer-Aided Engineering, 30(10):e22356. David Eriksson and Matthias Poloczek. 2021. ...
work page internal anchor Pith review Pith/arXiv arXiv 2020
-
[3]
Llamoco: Instruction tuning of large language models for optimization code generation,
LLaMoCo: Instruction Tuning of Large Lan- guage Models for Optimization Code Generation. Preprint, arXiv:2403.01131. Arindam Mitra, Luciano Del Corro, Guoqing Zheng, Shweti Mahajan, Dany Rouhana, Andres Codas, Yadong Lu, Wei-ge Chen, Olga Vrousgos, Corby Rosset, and 1 others. 2024. AgentInstruct: Toward Generative Teaching with Agentic Flows.Preprint, arX...
-
[4]
Qwen2.5 Technical Report.Preprint, arXiv:2412.15115. Rindra Ramamonjison, Haley Li, Timothy Yu, Shiqi He, Vishnu Rengan, Amin Banitalebi-dehkordi, Zirui Zhou, and Yong Zhang. 2022. Augmenting Opera- tions Research with Auto-Formulation of Optimiza- tion Models From Problem Descriptions. InProceed- ings of the 2022 Conference on Empirical Methods in Natura...
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[5]
Data Generation Phase:Used to generate training dataset mapping natural language re- quirements to executable code (Section A.1)
-
[6]
Evaluation Phase:Used to establish refer- ence rankings for test instances via an LLM- as-a-Judge approach (Section A.2). A.1 Data Generation This section provides the detailed prompt templates used for data generation in our framework. For data augmentation, we instruct the language model to rewrite each technical requirement into multiple alternative ph...
-
[7]
Your output MUST be a valid JSON array`[...]`
-
[8]
Each object in the array corresponds to one numbered design requirement from the input
-
[9]
Do not output any text or code outside of the main JSON array . JSON Object Schema : Each object in the array must contain the following five keys : 1.`" re qu ir eme nt _i nd ex "`( integer ) : The original number of the requirement ( e . g . , 1 , 2) . 2.`" function_type "`( string ) : Either `" objective "`or`" constraint "`. Choose the most appropriat...
-
[10]
Signature : Use type hints , e . g . ,` def obj1 ( data : np . ndarray ) -> float :`
-
[11]
Input : The function must accept a single argument`data`, which is a 2 D NumPy array
-
[12]
To maximize a metric , minimize its negative
Objectives : Must be for minimization . To maximize a metric , minimize its negative
-
[13]
Constraints : Must be satisfied when the function's return value is`< 0`
-
[14]
Dependencies : Use`numpy`for all array operations . Assume`import numpy as np`is already executed . Listing 2: Prompt for Equation Generation A.2 Test Instance Annotation This section provides the detailed prompt template used for test instance annotation in our solver- independent evaluation. Given a requirement set R and a collection of test instances I...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.