Pith Number

pith:LAMQRSQM

pith:2024:LAMQRSQMP6D7WJSHDMY7D6RD5O
not attested · not anchored · not stored · refs resolved

FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI

Alex Gunning, Anson Ho, Bogdan Grechuk, Caroline Falkman Olsson, Diego Chicharro, Ege Erdil, Elizabeth Pratt, Elliot Glazer, Emily de Oliveira Santos, Evan Chen, Grant Barkley, Jaime Sevilla, Jean-Stanislas Denain, Lionel Levine, Mark Wildon, Matej Vrzala, Matthew Barnett, Natalie Stewart, Olli Järviniemi, Qiuyu Ren, Robert Sandler, Shreepranav Varma Enugandla, Tamay Besiroglu, Tetiana Grechuk

FrontierMath shows that current AI models solve under 2% of hundreds of original expert-level mathematics problems.

arxiv:2411.04872 v7 · 2024-11-07 · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{LAMQRSQMP6D7WJSHDMY7D6RD5O}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Current state-of-the-art AI models solve under 2% of problems, revealing a vast gap between AI capabilities and the prowess of the mathematical community.

C2 · weakest assumption

The problems are genuinely original and unpublished with no data contamination risk, and automated verification reliably measures true mathematical reasoning ability.

C3 · one-line summary

FrontierMath is a new benchmark of hundreds of original, hard math problems, of which current AI models solve less than 2%.

References

32 extracted · 32 resolved · 2 Pith anchors

[1] MSC2020 Mathematics Subject Classification System
[2] Training verifiers to solve math word problems, 2021
[3] Advances in neural information processing systems
[4] Measuring mathematical problem solving with the MATH dataset
[5] Math Olympiad Hardness Scale (MOHS)

Cited by

23 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:46.078189Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

581908ca0c7f87fb26471b31f1fa23eb8f8f8f1f751e14f32836153128aaaeec

Aliases

arxiv: 2411.04872 · arxiv_version: 2411.04872v7 · doi: 10.48550/arxiv.2411.04872 · pith_short_12: LAMQRSQMP6D7 · pith_short_16: LAMQRSQMP6D7WJSH · pith_short_8: LAMQRSQM
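The short aliases listed above are prefixes of the full Pith Number, and the Pith Number itself appears to be the first 26 characters of the base32 encoding of the canonical hash. This is an inference from the values shown on this page, not a documented rule, so the sketch below treats it as an assumption. The function names `pith_number_from_hash` and `short_aliases` are hypothetical.

```python
import base64

# Values copied from this page.
CANONICAL_HASH = "581908ca0c7f87fb26471b31f1fa23eb8f8f8f1f751e14f32836153128aaaeec"
PITH_NUMBER = "LAMQRSQMP6D7WJSHDMY7D6RD5O"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumption: this derivation is inferred from the values on this page,
    not taken from a published Pith specification.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str) -> dict:
    """Build the pith_short_N aliases as plain prefixes of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


if __name__ == "__main__":
    derived = pith_number_from_hash(CANONICAL_HASH)
    print(derived == PITH_NUMBER)
    print(short_aliases(derived))
```

If the inference holds for other records, a short alias can be checked against a canonical hash without contacting the service.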
Agent API
Verify this Pith Number yourself
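# Fetch the record as JSON-LD, extract canonical_record, re-serialize it with
# sorted keys and compact separators, then print the SHA-256 of the UTF-8 bytes.
curl -sH 'Accept: application/ld+json' https://pith.science/pith/LAMQRSQMP6D7WJSHDMY7D6RD5O \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"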
# expect: 581908ca0c7f87fb26471b31f1fa23eb8f8f8f1f751e14f32836153128aaaeec
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "15b96d2206b7385c6251113cf6382dd4dfd673d0492d321994064f46348f4c80",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2024-11-07T17:07:35Z",
    "title_canon_sha256": "e39b5e54a321b6dd2a2dcd2586cbb61b3fd68e79c2758b8eeaa45692171d911f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2411.04872",
    "kind": "arxiv",
    "version": 7
  }
}
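The same check can be run offline against the record shown above. The sketch below unpacks the serialization used by the one-liner (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) into a function. The name `canonical_sha256` is hypothetical, and whether the digest equals the canonical hash listed on this page depends on the displayed JSON being a faithful copy of the served record, which is not verified here.

```python
import hashlib
import json

# Canonical record as displayed on this page.
CANONICAL_RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "15b96d2206b7385c6251113cf6382dd4dfd673d0492d321994064f46348f4c80",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2024-11-07T17:07:35Z",
    "title_canon_sha256": "e39b5e54a321b6dd2a2dcd2586cbb61b3fd68e79c2758b8eeaa45692171d911f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2411.04872",
    "kind": "arxiv",
    "version": 7
  }
}
"""


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    record = json.loads(CANONICAL_RECORD_JSON)
    # Compare this value by eye with the canonical hash listed on the page.
    print(canonical_sha256(record))
```

Because keys are sorted before hashing, the digest does not depend on the key order or whitespace of the input JSON, which is what lets an independent mirror reproduce it.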