Pith Number

pith:LAMQRSQM

pith:2024:LAMQRSQMP6D7WJSHDMY7D6RD5O

not attested · not anchored · not stored · refs resolved

FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI

Alex Gunning, Anson Ho, Bogdan Grechuk, Caroline Falkman Olsson, Diego Chicharro, Ege Erdil, Elizabeth Pratt, Elliot Glazer, Emily de Oliveira Santos, Evan Chen, Grant Barkley, Jaime Sevilla, Jean-Stanislas Denain, Lionel Levine, Mark Wildon, Matej Vrzala, Matthew Barnett, Natalie Stewart, Olli Järviniemi, Qiuyu Ren, Robert Sandler, Shreepranav Varma Enugandla, Tamay Besiroglu, Tetiana Grechuk

FrontierMath shows that current AI models solve under 2% of hundreds of original expert-level mathematics problems.

arxiv:2411.04872 v7 · 2024-11-07 · cs.AI

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{LAMQRSQMP6D7WJSHDMY7D6RD5O}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Current state-of-the-art AI models solve under 2% of problems, revealing a vast gap between AI capabilities and the prowess of the mathematical community.

C2 weakest assumption

The problems are genuinely original and unpublished with no data contamination risk, and automated verification reliably measures true mathematical reasoning ability.

C3 one-line summary

FrontierMath is a new benchmark of hundreds of original, hard math problems, of which current AI models solve less than 2%.

References

32 extracted · 32 resolved · 2 Pith anchors

[1] MSC2020 Mathematics Subject Classification System

[2] Training verifiers to solve math word problems, 2021

[3] Advances in Neural Information Processing Systems

[4] Measuring mathematical problem solving with the MATH dataset

[5] Math Olympiad Hardness Scale (MOHS)

Cited by

23 papers in Pith

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

STAR-PólyaMath: Multi-Agent Reasoning under Persistent Meta-Strategic Supervision

CFDLLMBench: A Benchmark Suite for Evaluating Large Language Models in Computational Fluid Dynamics

Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark

Co-Designing Quantum Codes with Transversal Diagonal Gates via Multi-Agent Systems

Receipt and verification

First computed	2026-05-17T23:38:46.078189Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

581908ca0c7f87fb26471b31f1fa23eb8f8f8f1f751e14f32836153128aaaeec

Aliases

arxiv: 2411.04872 · arxiv_version: 2411.04872v7 · doi: 10.48550/arxiv.2411.04872 · pith_short_12: LAMQRSQMP6D7 · pith_short_16: LAMQRSQMP6D7WJSH · pith_short_8: LAMQRSQM
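
The short aliases above are prefixes of the full 26-character identifier. For this record, the identifier also matches the first 26 characters of the Base32 (RFC 4648) encoding of the canonical hash. The Python sketch below checks both relationships using only values shown on this page. The derivation rule is inferred from this single record and is not a documented Pith specification; the function names `derive_pith_id` and `short_aliases` are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "581908ca0c7f87fb26471b31f1fa23eb8f8f8f1f751e14f32836153128aaaeec"
PITH_ID = "LAMQRSQMP6D7WJSHDMY7D6RD5O"


def derive_pith_id(hash_hex: str, length: int = 26) -> str:
    """Base32-encode a SHA-256 hex digest and keep the first `length` characters.

    Assumption: this rule is inferred from one record, not from a published spec.
    """
    digest = bytes.fromhex(hash_hex)
    return base64.b32encode(digest).decode("ascii")[:length]


def short_aliases(pith_id: str) -> dict:
    """Return the 8-, 12-, and 16-character prefixes listed under Aliases."""
    return {f"pith_short_{n}": pith_id[:n] for n in (8, 12, 16)}


if __name__ == "__main__":
    # Compare the derived identifier with the one printed on the page.
    print(derive_pith_id(CANONICAL_HASH) == PITH_ID)
    print(short_aliases(PITH_ID))
```

If the relationship holds for other records, a client could recover any alias from the canonical hash alone, but that would need to be confirmed against the service's own documentation.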

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/LAMQRSQMP6D7WJSHDMY7D6RD5O \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 581908ca0c7f87fb26471b31f1fa23eb8f8f8f1f751e14f32836153128aaaeec

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "15b96d2206b7385c6251113cf6382dd4dfd673d0492d321994064f46348f4c80",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2024-11-07T17:07:35Z",
    "title_canon_sha256": "e39b5e54a321b6dd2a2dcd2586cbb61b3fd68e79c2758b8eeaa45692171d911f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2411.04872",
    "kind": "arxiv",
    "version": 7
  }
}
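
The verification command earlier on this page can also be run offline against the record printed above. The following Python sketch mirrors that one-liner: it re-serializes the record with sorted keys, compact separators, and `ensure_ascii=False`, then takes the SHA-256 of the UTF-8 bytes. It assumes the JSON shown here is identical to the `canonical_record` field returned by the resolver; the function name `canonical_hash` is hypothetical.

```python
import hashlib
import json

# Published hash, copied from the "Canonical hash" section of this page.
EXPECTED_HASH = "581908ca0c7f87fb26471b31f1fa23eb8f8f8f1f751e14f32836153128aaaeec"

# Canonical record, copied from this page (assumed equal to the API's canonical_record).
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "15b96d2206b7385c6251113cf6382dd4dfd673d0492d321994064f46348f4c80",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2024-11-07T17:07:35Z",
    "title_canon_sha256": "e39b5e54a321b6dd2a2dcd2586cbb61b3fd68e79c2758b8eeaa45692171d911f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2411.04872",
    "kind": "arxiv",
    "version": 7
  }
}
"""


def canonical_hash(record: dict) -> str:
    """Serialize with sorted keys and no extra whitespace, then SHA-256 the UTF-8 bytes."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    digest = canonical_hash(json.loads(RECORD_JSON))
    print(digest)
    print("matches published hash:", digest == EXPECTED_HASH)
```

Because the keys are sorted before hashing, the result does not depend on the order in which fields appear in the source JSON, which is what lets a mirror recompute the same value.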