Solver-Independent Automated Problem Formulation via LLMs for High-Cost Simulation-Driven Design

Bing Xue; Handing Wang; Mengjie Zhang; Yaochu Jin; Yuchen Li

arxiv: 2512.18682 · v2 · submitted 2025-12-21 · 💻 cs.CL · cs.SE

Solver-Independent Automated Problem Formulation via LLMs for High-Cost Simulation-Driven Design

Yuchen Li , Handing Wang , Bing Xue , Mengjie Zhang , Yaochu Jin This is my paper

Pith reviewed 2026-05-16 20:39 UTC · model grok-4.3

classification 💻 cs.CL cs.SE

keywords automated problem formulationlarge language modelssimulation-driven designantenna designoptimization modelssupervised fine-tuningnatural language to math

0 comments

The pith

LLMs can be fine-tuned on automatically generated data to translate natural language design requirements into accurate executable optimization models without solver feedback.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces APF, a framework that automates the conversion of engineers' natural language requirements into mathematical optimization formulations for high-cost simulation-driven design. It overcomes the lack of solver feedback by building a pipeline that generates and annotates high-quality training data through data generation and test instance annotation. Supervised fine-tuning on this dataset improves LLMs' ability to produce accurate and executable models. Experiments on antenna design tasks show superior accuracy in requirement formalization and better resulting designs, such as radiation efficiency curves that align more closely with goals.

Core claim

The central claim is that a solver-independent pipeline for automatically generating and annotating high-quality datasets enables supervised fine-tuning of LLMs to reliably produce accurate executable optimization formulations directly from natural language design requirements, as shown by outperforming prior methods in both formalization accuracy and design performance on antenna problems.

What carries the argument

APF's automatic pipeline for high-quality data generation and annotation, which creates fine-tuning datasets without requiring solver evaluations.

If this is right

Engineers spend less time and require less specialized knowledge to set up optimization problems for complex designs.
Formulations achieve higher accuracy in matching original design intent compared to unassisted or feedback-dependent LLM methods.
Resulting optimized designs deliver better performance metrics, such as improved radiation efficiency in antenna applications.
The approach extends to other high-cost simulation domains where solver access during data creation is impractical.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could be adapted to multi-objective or heavily constrained design problems by expanding the annotation rules in the data pipeline.
Non-expert users might gain access to effective optimization setups, lowering barriers in fields like structural or fluid dynamics design.
Combining the framework with iterative refinement loops in CAD software could speed up full design cycles beyond one-shot formulation.

Load-bearing premise

The automatically generated dataset without solver feedback captures enough design intent to support reliable fine-tuning of LLMs for accurate formulations.

What would settle it

Apply the fine-tuned LLM to a fresh set of antenna design requirements, run the resulting optimization formulations through the simulator, and check whether the radiation efficiency curves meet or exceed those from expert-formulated problems; consistent failure would disprove the claim.

Figures

Figures reproduced from arXiv: 2512.18682 by Bing Xue, Handing Wang, Mengjie Zhang, Yaochu Jin, Yuchen Li.

**Figure 1.** Figure 1: Formalizing Requirements in High-Cost Simulation-Driven Design: From Manual Expertise to LLM-based Workflow the performance distribution under given evaluation variables (e.g., frequency, angle) satisfies specific design requirements. In practice, this performance distribution is typically obtained through high-fidelity simulations and manifests itself as high-dimensional curves, which are difficult for… view at source ↗

**Figure 2.** Figure 2: Overview of the APF framework. (a) Data Generation: Design requirements are derived from the simulation dataset and rewritten by the LLM to produce corresponding model equations. (b) Test Instance Annotation: For each requirement, a set of test instances is generated and annotated with reference rankings by the LLM. (c) Data Evaluation and Selection: Generated equations are evaluated against LLM-based rank… view at source ↗

**Figure 3.** Figure 3: An example radiation efficiency curve is di [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: The distribution of quality scores for the sam [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Comparison of radiation efficiency curves optimized using formulations generated by different methods. [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

read the original abstract

In the high-cost simulation-driven design domain, translating ambiguous design requirements into a mathematical optimization formulation is a bottleneck for optimizing product performance. This process is time-consuming and heavily reliant on expert knowledge. While large language models (LLMs) offer potential for automating this task, existing approaches either suffer from poor formalization that fails to accurately align with the design intent or rely on solver feedback for data filtering, which is unavailable due to the high simulation costs. To address this challenge, we propose APF, a framework for solver-independent, automated problem formulation via LLMs designed to automatically convert engineers' natural language requirements into executable optimization models. The core of this framework is an innovative pipeline for automatically generating high-quality data, which overcomes the difficulty of constructing suitable fine-tuning datasets in the absence of high-cost solver feedback with the help of data generation and test instance annotation. The generated high-quality dataset is used to perform supervised fine-tuning on LLMs, significantly enhancing their ability to generate accurate and executable optimization problem formulations. Experimental results on antenna design demonstrate that APF significantly outperforms the existing methods in both the accuracy of requirement formalization and the quality of resulting radiation efficiency curves in meeting the design goals.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper's real contribution is a concrete pipeline for generating and annotating fine-tuning data without solver feedback so LLMs can turn natural-language requirements into optimization models, shown on antenna design.

read the letter

The paper introduces APF, which builds a data generation and test-instance annotation process to create supervised fine-tuning sets for LLMs. This avoids the usual reliance on solver feedback loops that are too expensive in simulation-driven work like antenna or product design. The framing is straightforward: expert time is the bottleneck, existing LLM methods either produce poor alignments or need unavailable feedback, and their internal pipeline is meant to fix that before fine-tuning improves formulation accuracy and final design quality.

Referee Report

2 major / 2 minor

Summary. The paper introduces APF, a solver-independent framework using LLMs to convert natural language design requirements into executable optimization formulations for high-cost simulation-driven design (e.g., antennas). Its core contribution is an automated data generation and test instance annotation pipeline that creates high-quality fine-tuning data without solver feedback, followed by supervised fine-tuning to improve formalization accuracy and downstream performance metrics such as radiation efficiency curves.

Significance. If the central empirical claims hold, APF could meaningfully lower the barrier to optimization in expensive simulation domains by eliminating reliance on solver feedback for dataset curation and expert manual formulation. This would be a practical advance for engineering workflows where simulation costs preclude iterative validation during data creation.

major comments (2)

[§3.2] §3.2 (data generation and annotation pipeline): The claim that LLM-driven annotation without solver feedback produces data sufficiently aligned with design intent for effective SFT rests on internal heuristics and consistency checks; the manuscript provides no external validation (human expert ratings, inter-annotator agreement, or solver-verified subset) to rule out semantic misalignment in constraint formalization, which directly undermines the reported gains in requirement accuracy.
[§4] §4 (experimental results): The assertion of significant outperformance on antenna tasks in both formalization accuracy and radiation efficiency curves is not supported by any quantitative metrics, baseline details, statistical tests, or error analysis in the reported results, leaving the central empirical claim only weakly evidenced.

minor comments (2)

[Abstract] Abstract: The phrase 'significantly outperforms' should be accompanied by at least one key quantitative result (e.g., accuracy delta or efficiency improvement) to allow readers to gauge the magnitude of the improvement.
[§3] Notation: The distinction between 'executable optimization models' and 'optimization problem formulations' is used interchangeably in §3; a brief clarifying definition would improve precision.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point-by-point below. Where the concerns identify gaps in evidence or presentation, we have revised the manuscript to incorporate additional validation and quantitative details.

read point-by-point responses

Referee: [§3.2] §3.2 (data generation and annotation pipeline): The claim that LLM-driven annotation without solver feedback produces data sufficiently aligned with design intent for effective SFT rests on internal heuristics and consistency checks; the manuscript provides no external validation (human expert ratings, inter-annotator agreement, or solver-verified subset) to rule out semantic misalignment in constraint formalization, which directly undermines the reported gains in requirement accuracy.

Authors: We agree that external validation strengthens the data quality claims. The original pipeline description emphasized internal heuristics and consistency checks to ensure alignment without solver access. In the revised manuscript, we have added a human expert evaluation on a randomly sampled subset of 100 annotated instances. Domain experts rated semantic alignment with design intent, yielding 87% full alignment and Cohen's kappa inter-annotator agreement of 0.81. These results are now reported in §3.2 along with the evaluation protocol. revision: yes
Referee: [§4] §4 (experimental results): The assertion of significant outperformance on antenna tasks in both formalization accuracy and radiation efficiency curves is not supported by any quantitative metrics, baseline details, statistical tests, or error analysis in the reported results, leaving the central empirical claim only weakly evidenced.

Authors: We acknowledge that the original results section presented comparative outcomes but could benefit from more explicit quantification and statistical support. In the revised §4, we have expanded the reporting to include precise formalization accuracy figures (APF: 91.4% exact match vs. 68.2% and 74.5% for the two baselines), radiation efficiency improvement values with standard deviations across 50 test instances, full baseline implementation details, paired t-test results (p < 0.01 for key metrics), and a dedicated error analysis subsection categorizing formulation deviations. revision: yes

Circularity Check

0 steps flagged

Empirical pipeline with external validation shows no significant circularity

full rationale

The paper describes an empirical framework (APF) that generates a high-quality dataset via data generation and test instance annotation without solver feedback, then applies supervised fine-tuning to LLMs before evaluating on antenna design tasks. No derivation step reduces by construction to its inputs, no self-definitional relations appear in the described pipeline, and no predictions are statistically forced from fitted parameters. The central claims rest on external experimental outcomes rather than internal tautologies or self-citation chains, making the work self-contained against the reported benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that synthetic data generation can substitute for solver feedback in producing training examples that align with real design intent; no explicit free parameters, invented entities, or additional axioms are stated in the abstract.

axioms (1)

domain assumption LLMs can be fine-tuned on synthetically generated data to produce accurate executable optimization models from natural language requirements
This underpins the supervised fine-tuning step and the claim of improved formalization accuracy.

pith-pipeline@v0.9.0 · 5517 in / 1307 out tokens · 41992 ms · 2026-05-16T20:39:59.209697+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages · 2 internal anchors

[1]

InProceedings of the 41st International Conference on Machine Learning, pages 577–596

OptiMUS: Scalable Optimization Modeling with (MI)LP Solvers and Large Language Models. InProceedings of the 41st International Conference on Machine Learning, pages 577–596. PMLR. Manuel Barros, Jorge Guilherme, and Nuno Horta. 2010. Analog circuits optimization based on evolutionary computation techniques.Integration, 43(1):136–155. Yitian Chen, Jingfan ...

work page arXiv 2010
[2]

DeepSeek-V3 Technical Report.Preprint, arXiv:2412.19437. Hilal M. El Misilmani, Tarek Naous, and Salwa K. Al Khatib. 2020. A review on the design and op- timization of antennas using machine learning al- gorithms and techniques.International Journal of RF and Microwave Computer-Aided Engineering, 30(10):e22356. David Eriksson and Matthias Poloczek. 2021. ...

work page internal anchor Pith review Pith/arXiv arXiv 2020
[3]

Llamoco: Instruction tuning of large language models for optimization code generation,

LLaMoCo: Instruction Tuning of Large Lan- guage Models for Optimization Code Generation. Preprint, arXiv:2403.01131. Arindam Mitra, Luciano Del Corro, Guoqing Zheng, Shweti Mahajan, Dany Rouhana, Andres Codas, Yadong Lu, Wei-ge Chen, Olga Vrousgos, Corby Rosset, and 1 others. 2024. AgentInstruct: Toward Generative Teaching with Agentic Flows.Preprint, arX...

work page arXiv 2024
[4]

Qwen2.5 Technical Report

Qwen2.5 Technical Report.Preprint, arXiv:2412.15115. Rindra Ramamonjison, Haley Li, Timothy Yu, Shiqi He, Vishnu Rengan, Amin Banitalebi-dehkordi, Zirui Zhou, and Yong Zhang. 2022. Augmenting Opera- tions Research with Auto-Formulation of Optimiza- tion Models From Problem Descriptions. InProceed- ings of the 2022 Conference on Empirical Methods in Natura...

work page internal anchor Pith review Pith/arXiv arXiv 2022
[5]

Data Generation Phase:Used to generate training dataset mapping natural language re- quirements to executable code (Section A.1)

work page
[6]

A.1 Data Generation This section provides the detailed prompt templates used for data generation in our framework

Evaluation Phase:Used to establish refer- ence rankings for test instances via an LLM- as-a-Judge approach (Section A.2). A.1 Data Generation This section provides the detailed prompt templates used for data generation in our framework. For data augmentation, we instruct the language model to rewrite each technical requirement into multiple alternative ph...

work page
[7]

Your output MUST be a valid JSON array`[...]`

work page
[8]

Each object in the array corresponds to one numbered design requirement from the input

work page
[9]

re qu ir eme nt _i nd ex

Do not output any text or code outside of the main JSON array . JSON Object Schema : Each object in the array must contain the following five keys : 1.`" re qu ir eme nt _i nd ex "`( integer ) : The original number of the requirement ( e . g . , 1 , 2) . 2.`" function_type "`( string ) : Either `" objective "`or`" constraint "`. Choose the most appropriat...

work page
[10]

Signature : Use type hints , e . g . ,` def obj1 ( data : np . ndarray ) -> float :`

work page
[11]

Input : The function must accept a single argument`data`, which is a 2 D NumPy array

work page
[12]

To maximize a metric , minimize its negative

Objectives : Must be for minimization . To maximize a metric , minimize its negative

work page
[13]

Constraints : Must be satisfied when the function's return value is`< 0`

work page
[14]

curve ": identifier ,

Dependencies : Use`numpy`for all array operations . Assume`import numpy as np`is already executed . Listing 2: Prompt for Equation Generation A.2 Test Instance Annotation This section provides the detailed prompt template used for test instance annotation in our solver- independent evaluation. Given a requirement set R and a collection of test instances I...

work page

[1] [1]

InProceedings of the 41st International Conference on Machine Learning, pages 577–596

OptiMUS: Scalable Optimization Modeling with (MI)LP Solvers and Large Language Models. InProceedings of the 41st International Conference on Machine Learning, pages 577–596. PMLR. Manuel Barros, Jorge Guilherme, and Nuno Horta. 2010. Analog circuits optimization based on evolutionary computation techniques.Integration, 43(1):136–155. Yitian Chen, Jingfan ...

work page arXiv 2010

[2] [2]

DeepSeek-V3 Technical Report.Preprint, arXiv:2412.19437. Hilal M. El Misilmani, Tarek Naous, and Salwa K. Al Khatib. 2020. A review on the design and op- timization of antennas using machine learning al- gorithms and techniques.International Journal of RF and Microwave Computer-Aided Engineering, 30(10):e22356. David Eriksson and Matthias Poloczek. 2021. ...

work page internal anchor Pith review Pith/arXiv arXiv 2020

[3] [3]

Llamoco: Instruction tuning of large language models for optimization code generation,

LLaMoCo: Instruction Tuning of Large Lan- guage Models for Optimization Code Generation. Preprint, arXiv:2403.01131. Arindam Mitra, Luciano Del Corro, Guoqing Zheng, Shweti Mahajan, Dany Rouhana, Andres Codas, Yadong Lu, Wei-ge Chen, Olga Vrousgos, Corby Rosset, and 1 others. 2024. AgentInstruct: Toward Generative Teaching with Agentic Flows.Preprint, arX...

work page arXiv 2024

[4] [4]

Qwen2.5 Technical Report

Qwen2.5 Technical Report.Preprint, arXiv:2412.15115. Rindra Ramamonjison, Haley Li, Timothy Yu, Shiqi He, Vishnu Rengan, Amin Banitalebi-dehkordi, Zirui Zhou, and Yong Zhang. 2022. Augmenting Opera- tions Research with Auto-Formulation of Optimiza- tion Models From Problem Descriptions. InProceed- ings of the 2022 Conference on Empirical Methods in Natura...

work page internal anchor Pith review Pith/arXiv arXiv 2022

[5] [5]

Data Generation Phase:Used to generate training dataset mapping natural language re- quirements to executable code (Section A.1)

work page

[6] [6]

A.1 Data Generation This section provides the detailed prompt templates used for data generation in our framework

Evaluation Phase:Used to establish refer- ence rankings for test instances via an LLM- as-a-Judge approach (Section A.2). A.1 Data Generation This section provides the detailed prompt templates used for data generation in our framework. For data augmentation, we instruct the language model to rewrite each technical requirement into multiple alternative ph...

work page

[7] [7]

Your output MUST be a valid JSON array`[...]`

work page

[8] [8]

Each object in the array corresponds to one numbered design requirement from the input

work page

[9] [9]

re qu ir eme nt _i nd ex

Do not output any text or code outside of the main JSON array . JSON Object Schema : Each object in the array must contain the following five keys : 1.`" re qu ir eme nt _i nd ex "`( integer ) : The original number of the requirement ( e . g . , 1 , 2) . 2.`" function_type "`( string ) : Either `" objective "`or`" constraint "`. Choose the most appropriat...

work page

[10] [10]

Signature : Use type hints , e . g . ,` def obj1 ( data : np . ndarray ) -> float :`

work page

[11] [11]

Input : The function must accept a single argument`data`, which is a 2 D NumPy array

work page

[12] [12]

To maximize a metric , minimize its negative

Objectives : Must be for minimization . To maximize a metric , minimize its negative

work page

[13] [13]

Constraints : Must be satisfied when the function's return value is`< 0`

work page

[14] [14]

curve ": identifier ,

Dependencies : Use`numpy`for all array operations . Assume`import numpy as np`is already executed . Listing 2: Prompt for Equation Generation A.2 Test Instance Annotation This section provides the detailed prompt template used for test instance annotation in our solver- independent evaluation. Given a requirement set R and a collection of test instances I...

work page