pith. sign in

arxiv: 2602.10450 · v2 · pith:JEVRATZKnew · submitted 2026-02-11 · 💻 cs.LG · cs.AI· math.OC

Constructing Industrial-Scale Optimization Modeling Benchmark

classification 💻 cs.LG cs.AImath.OC
keywords optimizationbenchmarksformulationsnatural-languagecodeevaluationmiplib-nlmodeling
0
0 comments X
read the original abstract

Optimization modeling underpins decision-making in logistics, manufacturing, energy, and finance, yet translating natural-language requirements into correct optimization formulations and solver-executable code remains labor-intensive. Although large language models (LLMs) have been explored for this task, evaluation is still dominated by toy-sized or synthetic benchmarks, masking the difficulty of industrial problems with $10^{3}$--$10^{6}$ (or more) variables and constraints. A key bottleneck is the lack of benchmarks that align natural-language specifications with reference formulations/solver code grounded in real optimization models. To fill in this gap, we introduce MIPLIB-NL, built via a structure-aware reverse construction methodology from real mixed-integer linear programs in MIPLIB~2017. Our pipeline (i) recovers compact, reusable model structure from flat solver formulations, (ii) reverse-generates natural-language specifications explicitly tied to this recovered structure under a unified model--data separation format, and (iii) performs iterative semantic validation through expert review and human--LLM interaction with independent reconstruction checks. This yields 223 one-to-one reconstructions that preserve the mathematical content of the original instances while enabling realistic natural-language-to-optimization evaluation. Experiments show substantial performance degradation on MIPLIB-NL for systems that perform strongly on existing benchmarks, exposing failure modes invisible at toy scale.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. MM-OptBench: A Solver-Grounded Benchmark for Multimodal Optimization Modeling

    cs.AI 2026-05 unverdicted novelty 8.0

    MM-OptBench is a solver-grounded benchmark showing current multimodal LLMs reach at most 52% pass@1 on generating correct optimization models from text-plus-visual problem specifications.