LLMs show poor calibration in predicting task success and token use on software engineering benchmarks, causing market auctions to underperform compared to perfect information scenarios, with limited improvement from added context.
Scaling Small Agents Through Strategy Auctions
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
MarketBench: Evaluating AI Agents as Market Participants
LLMs show poor calibration in predicting task success and token use on software engineering benchmarks, causing market auctions to underperform compared to perfect information scenarios, with limited improvement from added context.