The paper presents OpenEstimate, a multi-domain benchmark showing that six frontier LLMs produce inaccurate and overconfident probabilistic priors on numerical estimation tasks that require synthesizing background information.
Begin your analysis by showing your thought process inside <parameter_estimation_process> tags
1 Pith paper cites this work. Polarity classification is still indexing.
OpenEstimate: Evaluating LLMs on Reasoning Under Uncertainty with Real-World Data