COMPASS: Benchmarking Constrained Optimization in LLM Agents

Tian Qin, Felix Bai, Ting-Yao Hu, Raviteja Vemulapalli, Hema Swetha Koppula, Zhiyang Xu, Bowen Jin, Mert Cemri, Jiarui Lu, Zirui Wang, Meng Cao

classification cs.LG

keywords agents, optimization, compass, constrained, information, must, constraints, decision-making

Human decision-making often involves constrained optimization. As LLM agents are deployed to assist with real-world tasks like travel planning, shopping, and scheduling, they must mirror this capability. We introduce COMPASS, a benchmark that evaluates whether LLM agents can perform constrained optimization in realistic travel planning settings. To succeed in these tasks, agents must engage in multi-turn conversations with the user to gather task information and use tools to retrieve information from the database. Agents must then propose a solution that not only satisfies the hard constraints but also optimizes the user's utility objective. Evaluating state-of-the-art models, we reveal a significant feasible-optimal gap: while models achieve 70-90% feasibility (constraint satisfaction), they reach only 20-60% optimality (utility optimization). Our analysis shows that tool use is not the bottleneck. Instead, the core limitation is insufficient exploration of the search space, with success strongly correlating with the amount of information gathered. Coding agents offer a promising approach to mitigating this gap. Together, these results establish COMPASS as a testbed for developing LLM agents that can truly mirror human decision-making by both satisfying constraints and optimizing objectives.
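
The distinction between feasibility and optimality can be made concrete with a small sketch. The code below is illustrative only and is not the COMPASS evaluation code: the `Option` type, the budget constraint, the rating-sum utility, and all data values are hypothetical assumptions chosen to show how a plan can satisfy every hard constraint and still fall short of the best achievable utility.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Option:
    """A hypothetical bookable item (flight or hotel); not a COMPASS type."""
    name: str
    price: float
    rating: float


def is_feasible(plan, budget):
    # Hard constraint (assumed): total price must not exceed the budget.
    return sum(o.price for o in plan) <= budget


def utility(plan):
    # Utility objective (assumed): sum of ratings, higher is better.
    return sum(o.rating for o in plan)


def best_plan(flights, hotels, budget):
    # Exhaustively explore the search space and keep the best feasible plan.
    feasible = [p for p in product(flights, hotels) if is_feasible(p, budget)]
    return max(feasible, key=utility, default=None)


def evaluate(plan, flights, hotels, budget):
    """Return (feasible, optimal) for a proposed plan."""
    feasible = is_feasible(plan, budget)
    optimum = best_plan(flights, hotels, budget)
    optimal = feasible and optimum is not None and utility(plan) >= utility(optimum)
    return feasible, optimal


# Hypothetical toy database.
FLIGHTS = [Option("F1", 300, 3.5), Option("F2", 450, 4.5)]
HOTELS = [Option("H1", 400, 3.0), Option("H2", 600, 4.8)]
BUDGET = 1000

cheap_plan = (FLIGHTS[0], HOTELS[0])    # within budget, but low utility
strong_plan = (FLIGHTS[0], HOTELS[1])   # within budget, highest utility
over_budget = (FLIGHTS[1], HOTELS[1])   # violates the budget constraint

print(evaluate(cheap_plan, FLIGHTS, HOTELS, BUDGET))   # (True, False)
print(evaluate(strong_plan, FLIGHTS, HOTELS, BUDGET))  # (True, True)
print(evaluate(over_budget, FLIGHTS, HOTELS, BUDGET))  # (False, False)
```

In this toy setting, `cheap_plan` is the kind of answer that counts toward feasibility but not optimality: it respects the budget, yet a fuller search of the four candidate combinations finds a higher-utility plan. Exhaustive enumeration is used here only because the example is tiny.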

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

MAESTRO: Adapting GUIs and Guiding Navigation with User Preferences in Conversational Agents with GUIs
cs.HC 2026-04 unverdicted novelty 6.0

MAESTRO adds a shared preference memory plus GUI-adaptation and workflow-navigation mechanisms to conversational agents with GUIs and tests them in a 33-person movie-booking study.