Mamo: A mathematical modeling benchmark with solvers
5 Pith papers cite this work.
citing papers explorer
- ReLoop: Structured Modeling and Behavioral Verification for Reliable LLM-Based Optimization
ReLoop closes the feasibility-correctness gap in LLM optimization code via structured generation and behavioral verification with parameter perturbations, reaching 100% executability and improved accuracy on benchmarks; the paper also releases RetailOpt-190.
- SciML Agents: Write the Solver, Not the Solution
LLMs prompted with domain knowledge can generate runnable, numerically valid code for stiff and non-stiff ODEs on new diagnostic and 1000-task benchmarks.
- PARM: Pipeline-Adapted Reward Model
PARM adapts reward models to multi-stage LLM pipelines via pipeline data and direct preference optimization, improving execution rate and solving accuracy on optimization benchmarks and showing transfer to GSM8K.
- AutoOR: Scalably Post-training LLMs to Autoformalize Operations Research Problems
AutoOR uses synthetic data generation and RL post-training with solver feedback to enable 8B LLMs to autoformalize linear, mixed-integer, and non-linear OR problems, matching larger models on benchmarks.