Constructing Industrial-Scale Optimization Modeling Benchmark

Fan Zhang; Hongliang Lu; Tao Wei; Wenyu Liu; Yuan Lan; Yuxuan Chen; Zaiwen Wen; Zhong Li

arxiv: 2602.10450 · v2 · pith:JEVRATZKnew · submitted 2026-02-11 · 💻 cs.LG · cs.AI · math.OC

Zhong Li, Hongliang Lu, Tao Wei, Yuxuan Chen, Wenyu Liu, Yuan Lan, Fan Zhang, Zaiwen Wen

classification 💻 cs.LG cs.AI math.OC

keywords optimization · benchmarks · formulations · natural-language · code · evaluation · miplib-nl · modeling

Optimization modeling underpins decision-making in logistics, manufacturing, energy, and finance, yet translating natural-language requirements into correct optimization formulations and solver-executable code remains labor-intensive. Although large language models (LLMs) have been explored for this task, evaluation is still dominated by toy-sized or synthetic benchmarks, masking the difficulty of industrial problems with $10^{3}$ to $10^{6}$ (or more) variables and constraints. A key bottleneck is the lack of benchmarks that align natural-language specifications with reference formulations/solver code grounded in real optimization models. To fill this gap, we introduce MIPLIB-NL, built via a structure-aware reverse construction methodology from real mixed-integer linear programs in MIPLIB 2017. Our pipeline (i) recovers compact, reusable model structure from flat solver formulations, (ii) reverse-generates natural-language specifications explicitly tied to this recovered structure under a unified model-data separation format, and (iii) performs iterative semantic validation through expert review and human-LLM interaction with independent reconstruction checks. This yields 223 one-to-one reconstructions that preserve the mathematical content of the original instances while enabling realistic natural-language-to-optimization evaluation. Experiments show substantial performance degradation on MIPLIB-NL for systems that perform strongly on existing benchmarks, exposing failure modes invisible at toy scale.
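The abstract contrasts "flat solver formulations" with a compact model kept separate from its data. The sketch below is only an illustration of that general idea on a toy knapsack problem; it is not the MIPLIB-NL format, which the abstract does not specify. The names `KnapsackData`, `flatten`, and `brute_force_optimum` and the dictionary layout are hypothetical, chosen for this example.

```python
from dataclasses import dataclass
from itertools import product
from typing import List


@dataclass(frozen=True)
class KnapsackData:
    """Instance data only; no modeling logic lives here (hypothetical schema)."""
    values: List[float]
    weights: List[float]
    capacity: float


def flatten(data: KnapsackData) -> dict:
    """Expand the compact model into a flat, solver-style form:
    maximize c^T x  subject to  A x <= b,  x binary."""
    n = len(data.values)
    if len(data.weights) != n:
        raise ValueError("values and weights must have the same length")
    return {
        "objective": list(data.values),  # c
        "rows": [list(data.weights)],    # A (a single capacity row)
        "rhs": [data.capacity],          # b
        "binary": list(range(n)),        # indices of the binary variables
    }


def brute_force_optimum(flat: dict) -> float:
    """Enumerate every binary assignment; only viable for tiny instances."""
    n = len(flat["objective"])
    best = float("-inf")
    for x in product((0, 1), repeat=n):
        feasible = all(
            sum(a * xi for a, xi in zip(row, x)) <= b
            for row, b in zip(flat["rows"], flat["rhs"])
        )
        if feasible:
            best = max(best, sum(c * xi for c, xi in zip(flat["objective"], x)))
    return best


# Toy data: three items with capacity 6 (assumed values, not from the paper).
toy = KnapsackData(values=[10, 13, 7], weights=[3, 4, 2], capacity=6)
flat = flatten(toy)
print(brute_force_optimum(flat))  # prints 20 (items 2 and 3)
```

The same `flatten` function applied to different `KnapsackData` yields a different flat instance, which is the sense in which the model is reusable while the flat form is not. Comparing optimal values of an original and a rebuilt instance, as `brute_force_optimum` could do at toy scale, is one plausible form a reconstruction check might take; the paper's actual checks may differ.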

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

MM-OptBench: A Solver-Grounded Benchmark for Multimodal Optimization Modeling
cs.AI 2026-05 unverdicted novelty 8.0

MM-OptBench is a solver-grounded benchmark showing current multimodal LLMs reach at most 52% pass@1 on generating correct optimization models from text-plus-visual problem specifications.