Pith Number
pith:LAMQRSQM
pith:2024:LAMQRSQMP6D7WJSHDMY7D6RD5O
not attested
not anchored
not stored
refs resolved
FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI
FrontierMath, a benchmark of hundreds of original expert-level mathematics problems, shows that current AI models solve under 2% of them.
arxiv:2411.04872 v7 · 2024-11-07 · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{LAMQRSQMP6D7WJSHDMY7D6RD5O}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Record completeness
1. Bitcoin timestamp
2. Internet Archive
3. Author claim
4. Citations
5. Replications
✓ Portable graph bundle live · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same
current state with the deterministic merge algorithm.
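The merge algorithm itself is not described on this page. As a rough illustration of how a mirror could reach the same state regardless of the order in which it received events, the sketch below deduplicates events by content hash and folds them in sorted order. The event shape (`kind`, `value`) and the function names are hypothetical, and signature checking is left out entirely; this is not Pith's actual algorithm.

```python
import hashlib
import json


def event_id(event: dict) -> str:
    """Content hash of an event, used as a stable, order-independent key."""
    blob = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()


def merge_state(canonical_record: dict, events: list) -> dict:
    """Fold events into a state without depending on arrival order."""
    unique = {event_id(e): e for e in events}  # drop exact duplicates
    state = {"record": canonical_record, "facts": {}}
    for eid in sorted(unique):  # same iteration order on every mirror
        e = unique[eid]
        # Hypothetical event shape: {"kind": ..., "value": ...}
        state["facts"].setdefault(e["kind"], []).append(e["value"])
    return state


record = {"source": {"id": "2411.04872", "kind": "arxiv", "version": 7}}
events = [
    {"kind": "citation", "value": "example-a"},
    {"kind": "replication", "value": "example-b"},
    {"kind": "citation", "value": "example-a"},  # duplicate delivery
]

state_one = merge_state(record, events)
state_two = merge_state(record, list(reversed(events)))
print(state_one == state_two)
```

Sorting by content hash is only one way to fix an order; a real system would more likely order by signed timestamps or causal links, which this page does not specify.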
Claims
C1 (strongest claim): Current state-of-the-art AI models solve under 2% of problems, revealing a vast gap between AI capabilities and the prowess of the mathematical community.
C2 (weakest assumption): The problems are genuinely original and unpublished with no data contamination risk, and automated verification reliably measures true mathematical reasoning ability.
C3 (one-line summary): FrontierMath is a new benchmark of hundreds of original, hard math problems, of which current AI models solve less than 2%.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:46.078189Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
581908ca0c7f87fb26471b31f1fa23eb8f8f8f1f751e14f32836153128aaaeec
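For this record, the Pith Number coincides with the first 26 characters of the Base32 encoding (RFC 4648 alphabet) of the canonical hash, and the short form with the first 8. The sketch below checks that relationship using only the values shown on this page. Whether Pith defines its identifiers this way in general is an assumption; the page does not say so.

```python
import base64

# Canonical hash as printed on this page.
canonical_hash = "581908ca0c7f87fb26471b31f1fa23eb8f8f8f1f751e14f32836153128aaaeec"

# Base32-encode the raw 32-byte digest, not the hex string.
digest = bytes.fromhex(canonical_hash)
b32 = base64.b32encode(digest).decode("ascii")

pith_number = b32[:26]  # long identifier
short_form = b32[:8]    # short identifier

print(pith_number)  # LAMQRSQMP6D7WJSHDMY7D6RD5O
print(short_form)   # LAMQRSQM
```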
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/LAMQRSQMP6D7WJSHDMY7D6RD5O \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 581908ca0c7f87fb26471b31f1fa23eb8f8f8f1f751e14f32836153128aaaeec
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "15b96d2206b7385c6251113cf6382dd4dfd673d0492d321994064f46348f4c80",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2024-11-07T17:07:35Z",
    "title_canon_sha256": "e39b5e54a321b6dd2a2dcd2586cbb61b3fd68e79c2758b8eeaa45692171d911f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2411.04872",
    "kind": "arxiv",
    "version": 7
  }
}
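The same check can be run offline against the record printed above. This sketch repeats the canonicalization used in the shell one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) and compares the result with the published hash. It assumes the record shown here is exactly what the API returns under `canonical_record`, so a mismatch would indicate a difference in the copied record and not necessarily a problem with the receipt. The helper name `canonical_sha256` is illustrative.

```python
import hashlib
import json

# Canonical record as printed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "15b96d2206b7385c6251113cf6382dd4dfd673d0492d321994064f46348f4c80",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2024-11-07T17:07:35Z",
        "title_canon_sha256": "e39b5e54a321b6dd2a2dcd2586cbb61b3fd68e79c2758b8eeaa45692171d911f",
    },
    "schema_version": "1.0",
    "source": {"id": "2411.04872", "kind": "arxiv", "version": 7},
}


def canonical_sha256(obj) -> str:
    """SHA-256 over the JSON form with sorted keys and no extra whitespace."""
    blob = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()  # UTF-8 by default
    return hashlib.sha256(blob).hexdigest()


published = "581908ca0c7f87fb26471b31f1fa23eb8f8f8f1f751e14f32836153128aaaeec"
computed = canonical_sha256(record)

print(computed)
print("matches published hash:", computed == published)
```

Because keys are sorted before hashing, the order in which the fields appear in the JSON text does not affect the result.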