Empirical study of frontier AI on Project Euler finds power-law machine effort scaling with human difficulty (b<1 for 20/25 models) and moderate support for exponential success probability decay, with SOTA 50% horizons at 2.5-4.3 human hours.
Solving quantitative reasoning problems with language models https://dl.acm.org/doi/10.5555/3600270.3600548
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.AI (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
Human vs Machine Mathematical Difficulty on Project Euler: An Experimental Analysis