‘A Problem in Probability’
2 Pith papers cite this work. Polarity classification is still indexing.
Pith papers citing it: 2
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
A curated dataset of counterintuitive discrete probability problems with human solutions, built to benchmark LLM reasoning on bias-prone tasks.
citing papers explorer
- How reliable are LLMs when it comes to playing dice?
LLMs score 0.96 on standard probability exercises but 0.59 on counterintuitive ones and drop further with biased wording or misleading cues, indicating they are not genuine probabilistic reasoners.