OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling
read the original abstract
Large language models (LLMs) have exhibited their problem-solving abilities in mathematical reasoning. Solving realistic optimization (OPT) problems in application scenarios requires advanced and applied mathematics ability. However, current OPT benchmarks that merely solve linear programming are far from complex realistic situations. In this work, we propose OptiBench, a benchmark for End-to-end optimization problem-solving with human-readable inputs and outputs. OptiBench contains rich optimization problems, including linear and nonlinear programming with or without tabular data, which can comprehensively evaluate LLMs' solving ability. In our benchmark, LLMs are required to call a code solver to provide precise numerical answers. Furthermore, to alleviate the data scarcity for optimization problems, and to bridge the gap between open-source LLMs on a small scale (e.g., Llama-3-8b) and closed-source LLMs (e.g., GPT-4), we further propose a data synthesis method namely ReSocratic. Unlike general data synthesis methods that proceed from questions to answers, \ReSocratic first incrementally synthesizes formatted optimization demonstration with mathematical formulations step by step and then back-translates the generated demonstrations into questions. Based on this, we synthesize the ReSocratic-29k dataset. We further conduct supervised fine-tuning with ReSocratic-29k on multiple open-source models. Experimental results show that ReSocratic-29k significantly improves the performance of open-source models.
This paper has not been read by Pith yet.
Forward citations
Cited by 9 Pith papers
-
A$^{2}$utoLPBench: An Auto-Generated, Agent-Friendly LP Benchmark via Inverse-KKT Construction
A²utoLPBench is a generator that produces unlimited LP word problems with ground-truth answers known by construction via inverse-KKT, bundled with a Docker environment for agent evaluation.
-
Generating Robust Portfolios of Optimization Models using Large Language Models
An algorithm generates a portfolio of LLM-produced optimization models with guarantees that high-quality candidates are included if either the generator or evaluator aligns with human preferences.
-
FrontierOR: Benchmarking LLMs' Capacity for Efficient Algorithm Design in Large-Scale Optimization
FrontierOR benchmark shows frontier LLMs outperform Gurobi on solution quality and efficiency in only 31% of one-shot cases and 50% with test-time evolution on hard large-scale optimization tasks.
-
OPT-Engine: Benchmarking the Limits of LLMs in Optimization Modeling via Complexity Scaling
OPT-Engine shows pure-text chain-of-thought reasoning in LLMs loses robustness as optimization complexity grows, external tools fix only local arithmetic, and solver-integrated methods are bottlenecked by automated co...
-
MiniOpt: Reasoning to Model and Solve General Optimization Problems with Limited Resources
MiniOpt is an RL framework that decomposes optimization reasoning into modeling and solver generation, achieving top solving accuracy for models under 10B parameters across diverse problem types using OptReward and op...
-
From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling
Agora-Opt uses decentralized debate among LLM agent teams plus a read-write memory bank to produce more accurate optimization models from text than prior LLM methods.
-
Co-evolving Agent Architectures and Interpretable Reasoning for Automated Optimization
EvoOR-Agent co-evolves agent architectures as AOE-style networks with graph-mediated recombination and knowledge-base-assisted mutation to outperform fixed LLM pipelines on OR benchmarks.
-
AutoOR: Scalably Post-training LLMs to Autoformalize Operations Research Problems
AutoOR uses synthetic data generation and RL post-training with solver feedback to enable 8B LLMs to autoformalize linear, mixed-integer, and non-linear OR problems, matching larger models on benchmarks.
-
MiniOpt: Reasoning to Model and Solve General Optimization Problems with Limited Resources
MiniOpt trains LLMs under 10B parameters via RL with OptReward to model and solve general optimization problems, reporting highest average solving accuracy among comparable models.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.