The paper presents OpenEstimate, a multi-domain benchmark showing that six frontier LLMs produce inaccurate and overconfident probabilistic priors on numerical estimation tasks that require synthesizing background information.
The standard deviation reflects uncertainty due to potential sampling biases and regional variations.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
- OpenEstimate: Evaluating LLMs on Reasoning Under Uncertainty with Real-World Data