CostBench is a new travel-planning benchmark showing that leading LLM agents often fail to select cost-optimal tool sequences and drop sharply in performance when facing dynamic events like tool failures or cost changes.
That is to say, you shouldn’t use the exact word to describe the user requirements.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents