Mamo: A mathematical modeling benchmark with solvers
5 Pith papers cite this work.
citation-role summary
background 1
dataset 1
representative citing papers
ReLoop closes the feasibility-correctness gap in LLM-generated optimization code via structured generation and behavioral verification with parameter perturbations, reaching 100% executability and improving accuracy on benchmarks; it also releases RetailOpt-190.
LLMs prompted with domain knowledge can generate runnable, numerically valid code for stiff and non-stiff ODEs, as shown on a new diagnostic benchmark and a new 1000-task benchmark.
PARM adapts reward models to multi-stage LLM pipelines via pipeline data and direct preference optimization, improving execution rate and solving accuracy on optimization benchmarks and showing transfer to GSM8K.
AutoOR uses synthetic data generation and RL post-training with solver feedback to enable 8B-parameter LLMs to autoformalize linear, mixed-integer, and nonlinear operations research (OR) problems, matching larger models on benchmarks.