Pith Number

pith:BOZBDNTQ

pith:2024:BOZBDNTQTSVGQ3OV3K34FZNYHQ

not attested · not anchored · not stored · refs resolved

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models

Baobao Chang, Benyou Wang, Bofei Gao, Chenghao Ma, Daoguang Zan, Feifan Song, Ge Zhang, Lei Li, Lei Sha, Liang Chen, Qingxiu Dong, Runxin Xu, Shanghaoran Quan, Tianyu Liu, Xuancheng Ren, Yibo Miao, Yichang Zhang, Zefan Cai, Zhengyang Tang, Zhe Yang

A new benchmark of 4428 Olympiad math problems shows even top models like o1-preview reach only 52.55% accuracy.

arxiv:2410.07985 v3 · 2024-10-10 · cs.CL


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{BOZBDNTQTSVGQ3OV3K34FZNYHQ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
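This page does not publish the merge rules themselves. As a hedged illustration of what "deterministic" buys a mirror, here is a minimal Python sketch built on assumptions that are not taken from Pith: a hypothetical event shape (`timestamp`, `event_id`, `key`, `value`) and a last-writer-wins rule applied under a fixed total order on `(timestamp, event_id)`. Signature verification, which a real mirror would perform first, is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical signed-event shape; the real Pith event schema is not shown here."""
    timestamp: str  # ISO 8601 UTC in one fixed format, so strings sort chronologically
    event_id: str   # unique id used to break timestamp ties
    key: str        # hypothetical: which state field the event sets
    value: str


def merge(canonical: dict, events: list) -> dict:
    """Fold events onto a copy of the canonical record in one fixed total order.

    Sorting by (timestamp, event_id) makes the result independent of the
    order in which a mirror happened to receive the events.
    """
    state = dict(canonical)  # leave the canonical record itself untouched
    for ev in sorted(events, key=lambda e: (e.timestamp, e.event_id)):
        state[ev.key] = ev.value  # later events in the total order win
    return state


# Two mirrors receive the same (hypothetical) events in opposite orders.
base = {"author_claim": "open"}
arrival_a = [
    Event("2026-05-17T09:00:00Z", "e1", "author_claim", "pending"),
    Event("2026-05-17T10:00:00Z", "e2", "author_claim", "claimed"),
]
arrival_b = list(reversed(arrival_a))
print(merge(base, arrival_a) == merge(base, arrival_b))  # True
```

Because every mirror applies the same sort before folding, the arrival order of events cannot change the resulting state.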

Claims

C1 · strongest claim

Even the most advanced models, OpenAI o1-mini and OpenAI o1-preview, struggle with highly challenging Olympiad-level problems, reaching 60.54% and 52.55% accuracy, respectively.

C2 · weakest assumption

The 4428 problems constitute a fair, unbiased, and comprehensive sample of Olympiad-level mathematics, with human annotation free of selection bias or verification errors.

C3 · one-line summary

Omni-MATH supplies 4428 human-verified Olympiad math problems on which even top LLMs reach only 52.55% to 60.54% accuracy.

References

77 extracted · 77 resolved · 15 Pith anchors

[1] Training Verifiers to Solve Math Word Problems (2021)

[2] Measuring Mathematical Problem Solving With the MATH Dataset (2021)

[3] Have LLMs Advanced Enough? A Challenging Problem Solving Benchmark For Large Language Models (2023)

[4] MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics (2022)

[5] ProofNet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics (2023)

Cited by

29 papers in Pith

RMA: an Agentic System for Research-Level Mathematical Problems

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems

Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models

EngiBench: A Benchmark for Evaluating Large Language Models on Engineering Problem Solving

Receipt and verification

First computed	2026-05-17T23:38:52.937178Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

0bb211b6709caa686dd5dab7c2e5b83c3018ee224315ed0473d3973dd3e1623b

Aliases

arxiv: 2410.07985 · arxiv_version: 2410.07985v3 · doi: 10.48550/arxiv.2410.07985 · pith_short_12: BOZBDNTQTSVG · pith_short_16: BOZBDNTQTSVGQ3OV · pith_short_8: BOZBDNTQ
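The identifiers on this page fit a simple relationship that can be checked by hand: the 26-character Pith Number is the unpadded RFC 4648 base32 encoding of the first 16 bytes (128 bits) of the canonical hash, and each pith_short_N alias is its first N characters. This is inferred from this record's own values, not from a published specification, so treat it as an observation. The Python sketch below reproduces it with the standard library only; `pith_number` and `short_aliases` are hypothetical helper names.

```python
import base64


def pith_number(canonical_hash_hex: str) -> str:
    """Base32 (RFC 4648 alphabet, '=' padding stripped) of the first 16 bytes of the hash."""
    digest = bytes.fromhex(canonical_hash_hex)
    return base64.b32encode(digest[:16]).decode("ascii").rstrip("=")


def short_aliases(number: str) -> dict:
    """The pith_short_N aliases are prefixes of the full Pith Number."""
    return {f"pith_short_{n}": number[:n] for n in (8, 12, 16)}


canonical_hash = "0bb211b6709caa686dd5dab7c2e5b83c3018ee224315ed0473d3973dd3e1623b"
number = pith_number(canonical_hash)
print(number)  # BOZBDNTQTSVGQ3OV3K34FZNYHQ
print(short_aliases(number))
```

Sixteen bytes encode to 26 base32 characters plus six padding characters, which is why the number has exactly 26 characters once the padding is stripped.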


Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/BOZBDNTQTSVGQ3OV3K34FZNYHQ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 0bb211b6709caa686dd5dab7c2e5b83c3018ee224315ed0473d3973dd3e1623b

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "82870184c474010992b21114f4f6a26d9e67b812d579908e14bf4a1353f907c0",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/publicdomain/zero/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2024-10-10T14:39:33Z",
    "title_canon_sha256": "e103455cf7c83326169aaa95e18f53b32cf2ecf649d774e9bcbbffd9d9379194"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2410.07985",
    "kind": "arxiv",
    "version": 3
  }
}
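The canonical hash can also be recomputed offline from the record printed above, using the same canonical form as the python3 one-liner in the verification command: sorted keys, compact separators, ensure_ascii=False, and UTF-8 bytes. This is a sketch assuming the JSON shown is the complete canonical_record object the resolver returns; it prints whether the digest matches the published canonical hash rather than asserting that it does.

```python
import hashlib
import json

# Published canonical hash from this page.
EXPECTED = "0bb211b6709caa686dd5dab7c2e5b83c3018ee224315ed0473d3973dd3e1623b"

# Assumption: this is the complete canonical_record, copied from the page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "82870184c474010992b21114f4f6a26d9e67b812d579908e14bf4a1353f907c0",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/publicdomain/zero/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2024-10-10T14:39:33Z",
        "title_canon_sha256": "e103455cf7c83326169aaa95e18f53b32cf2ecf649d774e9bcbbffd9d9379194",
    },
    "schema_version": "1.0",
    "source": {"id": "2410.07985", "kind": "arxiv", "version": 3},
}


def canonical_sha256(record: dict) -> str:
    """SHA-256 hex digest of the record serialized as canonical JSON."""
    blob = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


digest = canonical_sha256(RECORD)
print(digest)
print("match" if digest == EXPECTED else "mismatch")
```

Sorting keys and dropping whitespace make the byte string independent of how the JSON happened to be formatted, so any mirror serving the same record yields the same digest.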