OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling

Jing Tang; Liang Feng; Linqi Song; Wei Shi; Xiaodan Liang; Xiongwei Han; Yinya Huang; Yiwei Wang; Zhicheng Yang; Zhijiang Guo

arXiv: 2407.09887 · v4 · pith:PMFXFXIBnew · submitted 2024-07-13 · cs.LG · math.OC

keywords: llms, optimization, data, models, open-source, optibench, problems, resocratic

Large language models (LLMs) have exhibited problem-solving abilities in mathematical reasoning. Solving realistic optimization (OPT) problems in application scenarios requires advanced applied-mathematics ability. However, current OPT benchmarks, which cover only linear programming, fall far short of complex realistic situations. In this work, we propose OptiBench, a benchmark for end-to-end optimization problem-solving with human-readable inputs and outputs. OptiBench contains a rich set of optimization problems, including linear and nonlinear programming with or without tabular data, which can comprehensively evaluate LLMs' solving ability. In our benchmark, LLMs are required to call a code solver to provide precise numerical answers. Furthermore, to alleviate the data scarcity for optimization problems, and to bridge the gap between small-scale open-source LLMs (e.g., Llama-3-8b) and closed-source LLMs (e.g., GPT-4), we propose a data synthesis method named ReSocratic. Unlike general data synthesis methods that proceed from questions to answers, ReSocratic first incrementally synthesizes formatted optimization demonstrations with mathematical formulations, step by step, and then back-translates the generated demonstrations into questions. Based on this, we synthesize the ReSocratic-29k dataset. We further conduct supervised fine-tuning with ReSocratic-29k on multiple open-source models. Experimental results show that ReSocratic-29k significantly improves the performance of open-source models.
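
The answer-first ordering described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `reverse_synthesize`, `stub_generator`, the `"extend"` and `"back_translate"` instruction strings, and the canned formulation text are all hypothetical, and a deterministic stub stands in for the LLM. The sketch only shows the control flow, in which the demonstration is built step by step and the question is written last.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: (instruction, steps so far) -> generated text.
Generator = Callable[[str, List[str]], str]


def reverse_synthesize(generate: Generator, n_steps: int) -> Tuple[List[str], str]:
    """Build a demonstration step by step, then derive the question from it."""
    steps: List[str] = []
    for _ in range(n_steps):
        # Each new step is conditioned on the steps written so far.
        steps.append(generate("extend", list(steps)))
    # Only after the demonstration is complete is the question produced.
    question = generate("back_translate", list(steps))
    return steps, question


def stub_generator(instruction: str, steps: List[str]) -> str:
    """Deterministic stand-in for an LLM call (canned text, no model involved)."""
    canned = [
        "Variables: x >= 0, y >= 0 (units of products A and B).",
        "Objective: maximize 3x + 2y.",
        "Constraint: x + y <= 4.",
    ]
    if instruction == "extend":
        return canned[len(steps)]
    return f"Question written from {len(steps)} formulation steps."


steps, question = reverse_synthesize(stub_generator, n_steps=3)
print(steps)
print(question)
```

In a real pipeline the stub would be replaced by a model call, and the finished demonstration would presumably be checked with a solver before the question is kept; that filtering step is an assumption here, not something the abstract states.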

Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A²utoLPBench: An Auto-Generated, Agent-Friendly LP Benchmark via Inverse-KKT Construction
cs.AI 2026-07 conditional novelty 7.0

A²utoLPBench is a generator that produces unlimited LP word problems with ground-truth answers known by construction via inverse-KKT, bundled with a Docker environment for agent evaluation.
Generating Robust Portfolios of Optimization Models using Large Language Models
cs.AI 2026-05 unverdicted novelty 7.0

An algorithm generates a portfolio of LLM-produced optimization models with guarantees that high-quality candidates are included if either the generator or evaluator aligns with human preferences.
FrontierOR: Benchmarking LLMs' Capacity for Efficient Algorithm Design in Large-Scale Optimization
cs.AI 2026-05 unverdicted novelty 7.0

FrontierOR benchmark shows frontier LLMs outperform Gurobi on solution quality and efficiency in only 31% of one-shot cases and 50% with test-time evolution on hard large-scale optimization tasks.
OPT-Engine: Benchmarking the Limits of LLMs in Optimization Modeling via Complexity Scaling
cs.CL 2026-01 accept novelty 7.0

OPT-Engine shows pure-text chain-of-thought reasoning in LLMs loses robustness as optimization complexity grows, external tools fix only local arithmetic, and solver-integrated methods are bottlenecked by automated co...
MiniOpt: Reasoning to Model and Solve General Optimization Problems with Limited Resources
cs.LG 2026-06 unverdicted novelty 6.0

MiniOpt is an RL framework that decomposes optimization reasoning into modeling and solver generation, achieving top solving accuracy for models under 10B parameters across diverse problem types using OptReward and op...
From Soliloquy to Agora: Memory-Enhanced LLM Agents with Decentralized Debate for Optimization Modeling
math.OC 2026-04 unverdicted novelty 6.0

Agora-Opt uses decentralized debate among LLM agent teams plus a read-write memory bank to produce more accurate optimization models from text than prior LLM methods.
Co-evolving Agent Architectures and Interpretable Reasoning for Automated Optimization
cs.AI 2026-04 unverdicted novelty 6.0

EvoOR-Agent co-evolves agent architectures as AOE-style networks with graph-mediated recombination and knowledge-base-assisted mutation to outperform fixed LLM pipelines on OR benchmarks.
AutoOR: Scalably Post-training LLMs to Autoformalize Operations Research Problems
cs.LG 2026-04 unverdicted novelty 6.0

AutoOR uses synthetic data generation and RL post-training with solver feedback to enable 8B LLMs to autoformalize linear, mixed-integer, and non-linear OR problems, matching larger models on benchmarks.
MiniOpt: Reasoning to Model and Solve General Optimization Problems with Limited Resources
cs.LG 2026-06 unverdicted novelty 5.0

MiniOpt trains LLMs under 10B parameters via RL with OptReward to model and solve general optimization problems, reporting highest average solving accuracy among comparable models.