Soohak is a new 439-problem mathematician-authored benchmark showing frontier LLMs reach only 30% on research math and fail to exceed 50% on refusing ill-posed questions.
Short proofs in combinatorics, probability and number theory II
2 Pith papers cite this work. Polarity classification is still indexing.
abstract
We give a quintet of proofs resulting from questions posed by Erd\H{o}s. These questions concern ordinary lines in planar point sets, sequences with uniformly small exponential sums, $K_4$-free $4$-critical graphs with few chords in any cycle, a counterexample to a "fewnomial" version of the Erd\H{o}s--Tur\'{a}n discrepancy bound, and a finiteness theorem for integers $n$ such that $n-a k^2$ is prime for all $k\leq \sqrt{n/a}$ coprime to $n$ (for fixed $a\in\mathbb Z_+$). Each proof is due to an internal model at OpenAI.
years
2026 2verdicts
UNVERDICTED 2representative citing papers
An interactive AI workbench for mathematicians achieves 48% on FrontierMath Tier 4 and helped solve open problems in early tests.
citing papers explorer
-
Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs
Soohak is a new 439-problem mathematician-authored benchmark showing frontier LLMs reach only 30% on research math and fail to exceed 50% on refusing ill-posed questions.
-
AI co-mathematician: Accelerating mathematicians with agentic AI
An interactive AI workbench for mathematicians achieves 48% on FrontierMath Tier 4 and helped solve open problems in early tests.