Soohak is a 439-problem mathematician-curated benchmark where frontier LLMs reach at most 30.4% on research math challenges and no model exceeds 50% on refusal for ill-posed problems.
Short proofs in combinatorics and number theory
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
background 1
citation-polarity summary
years
2026 5
roles
background 1
polarities
background 1
representative citing papers
f(n) exceeds (C-o(1)) log n for any fixed C>1 and infinitely many n, so limsup f(n)/log n is infinite.
An interactive AI workbench for mathematicians achieves 48% on FrontierMath Tier 4 and helped solve open problems in early tests.
citing papers explorer
- AI co-mathematician: Accelerating mathematicians with agentic AI
An interactive AI workbench for mathematicians achieves 48% on FrontierMath Tier 4 and helped solve open problems in early tests.
- Advancing Mathematics Research with AI-Driven Formal Proof Search