ORLoopBench: Solver-in-the-Loop Benchmarks for Self-Correction and Behavioral Rationality in Operations Research

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Operations Research practitioners debug infeasible models through an iterative process: inspecting Irreducible Infeasible Subsystems (IIS), identifying constraint conflicts, and repairing formulations until feasibility is restored. Existing LLM benchmarks mostly treat OR as one-shot translation from problem descriptions to solver code, omitting this diagnostic loop. We formalize infeasible-model repair as a solver-in-the-loop Markov Decision Process in which each action triggers solver re-execution and IIS recomputation, yielding deterministic, verifiable feedback. We introduce ORLoopBench, a benchmark suite with two components: OR-Debug-Bench releases 5,362 LP/MILP repair instances, while OR-Bias-Bench evaluates closed-form operational decision rationality across inventory settings. Solver-verified RLVR training enables an 8B model to surpass frontier APIs on LP repair (95.3% vs 92.4% RR@5), improves diagnostic behavior, and transfers to MILP repair. The same evaluation exposes semantic drift in whole-model code regeneration: feasible regenerated MILPs can solve the wrong problem. Process-level evaluation with solver oracles enables targeted training for reliable OR self-correction.
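
The solver-in-the-loop repair cycle described in the abstract can be illustrated with a toy sketch. This is not the benchmark's environment: it assumes a single decision variable constrained only by simple bounds, replaces a real LP/MILP solver with an interval check, and computes an IIS with a basic deletion filter. The names `Bound`, `is_feasible`, `compute_iis`, `step`, and `repair`, and the "drop the first IIS member" policy, are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bound:
    """A toy constraint on one variable x: either x <= rhs or x >= rhs."""
    name: str
    sense: str  # "<=" or ">="
    rhs: float


def is_feasible(cons):
    """Stand-in for a solver call: feasible iff the tightest lower bound
    does not exceed the tightest upper bound."""
    lo = max((c.rhs for c in cons if c.sense == ">="), default=float("-inf"))
    hi = min((c.rhs for c in cons if c.sense == "<="), default=float("inf"))
    return lo <= hi


def compute_iis(cons):
    """Deletion filter: drop each constraint whose removal keeps the
    remaining set infeasible. What is left is an irreducible infeasible subset."""
    if is_feasible(cons):
        return []
    core = list(cons)
    for c in list(cons):
        trial = [k for k in core if k != c]
        if not is_feasible(trial):
            core = trial  # c is not needed to explain the infeasibility
    return core


def step(cons, drop_name):
    """One transition: apply the repair action (here, dropping a constraint),
    re-solve, and recompute the IIS as deterministic feedback."""
    new_cons = [c for c in cons if c.name != drop_name]
    iis = compute_iis(new_cons)
    return new_cons, iis, not iis  # done when no IIS remains


def repair(cons, max_steps=5):
    """Hypothetical policy: always drop the first constraint in the current IIS."""
    iis = compute_iis(cons)
    trace = []
    for _ in range(max_steps):
        if not iis:
            break
        action = iis[0].name
        cons, iis, _done = step(cons, action)
        trace.append(action)
    return cons, trace


model = [Bound("cap", "<=", 10), Bound("demand", ">=", 15), Bound("floor", ">=", 0)]
repaired, trace = repair(model)
```

In this toy model the conflict is between `cap` and `demand`, and the policy restores feasibility in one step. A policy that drops constraints freely also shows why feasibility alone is a weak success criterion: the repaired model may no longer describe the intended problem, which is the kind of semantic drift the abstract mentions.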

fields

stat.ML 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

Resource-Constrained Adaptive Inference for Sequential Pricing

stat.ML · 2026-06-02 · unverdicted · novelty 6.0

Formalizes support-exclusion in constrained pricing and introduces target-aware controller with certified bands and regret-information accounting that identifies when polynomial target mass succeeds but 1/t branches fail without extra movement.

citing papers explorer

Showing 1 of 1 citing paper.

  • Resource-Constrained Adaptive Inference for Sequential Pricing stat.ML · 2026-06-02 · unverdicted · none · ref 31 · internal anchor