
arxiv: 2508.14410 · v3 · submitted 2025-08-20 · 💻 cs.AI

ORThought: Benchmarking and Automating Logistics Optimization Modeling

keywords: orthought, logistics, modeling, automating, autonomous, complex, datasets, evaluations

Optimization modeling stands as the engine of scientific decision-making in logistics and transportation, yet its adoption is hindered by a steep expertise threshold and the latency of manual workflows. Automating this process via Large Language Models (LLMs) offers a potential solution, but current approaches face critical bottlenecks: (i) a lack of high-quality, complex benchmarks; (ii) methodological inefficiencies in autonomous multi-agent frameworks, which often exhibit instability and redundant computation; and (iii) evaluations that lack diagnostic depth. In this work, we address these challenges from the following three aspects. First, we introduce LogiOR, a diverse logistics benchmark with rigorous annotations, and enrich existing datasets with the same annotation standard to support community utilization. Second, we propose ORThought, a structured dual-agent framework. By incorporating expert-level modeling principles via chain-of-thought reasoning, ORThought eliminates the redundancy of uncontrolled autonomous agents. Third, extensive empirical evaluations demonstrate that ORThought consistently outperforms state-of-the-art baselines by 9-17 percentage points, exhibiting distinct advantages in handling complex constraints while maintaining high token efficiency. Building on these results, we further conduct a multidimensional error analysis, which identifies key failure modes and success factors, providing actionable insights for future research. The dataset and code are available at https://huggingface.co/datasets/LabMem012/LogiOR and https://github.com/ZJU-TSELab/ORThought, respectively.

