CostBench is a new travel-planning benchmark showing that leading LLM agents often fail to select cost-optimal tool sequences and drop sharply in performance when facing dynamic events like tool failures or cost changes.
Each tool defines its input types through its parameters (each parameter name indicates the data type) and states its output type in its description.
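The typing convention described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from CostBench itself: the `Tool` class, the tool names, the type names such as `LocationPreference`, the costs, and the "Returns <Type>" phrasing used to read the output type from a description.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool record, not the benchmark's actual schema."""
    name: str
    parameters: tuple  # each parameter name doubles as its data type
    description: str   # states the output type in prose
    cost: float        # illustrative cost, invented for this sketch


def output_type(tool):
    """Read the output type from the description (assumes 'Returns <Type>' phrasing)."""
    match = re.search(r"Returns (\w+)", tool.description)
    if match is None:
        raise ValueError(f"no output type declared in description of {tool.name}")
    return match.group(1)


def callable_tools(tools, available_types):
    """Return the tools whose every input type is already available."""
    available = set(available_types)
    return [t for t in tools if set(t.parameters) <= available]


# Hypothetical tools: names, types, and costs are made up.
TOOLS = [
    Tool("search_location", ("LocationPreference",), "Returns Location for a preference.", 5.0),
    Tool("search_flight", ("Location", "TimeInfo"), "Returns Flight candidates.", 12.0),
]

ready = callable_tools(TOOLS, {"LocationPreference", "TimeInfo"})
print([t.name for t in ready])
print(output_type(ready[0]))
```

With only `LocationPreference` and `TimeInfo` available, `search_flight` is not yet callable because no `Location` value exists; calling `search_location` first would produce one. An agent planning over such tools has to chain calls by matching output types to input types while tracking cost.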
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents