pith. sign in
Pith Number

pith:4JZOVUM5

pith:2021:4JZOVUM5DEXOJPFR6BJCVXPLGK
not attested not anchored not stored refs resolved

MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics

Jesse Michael Han, Kunhao Zheng, Stanislas Polu

MiniF2F supplies 488 formal statements from math contests as a shared benchmark for neural theorem provers across systems.

arxiv:2109.00110 v2 · 2021-08-31 · cs.AI · cs.FL · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{4JZOVUM5DEXOJPFR6BJCVXPLGK}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv. Learn more · Embed verified badge

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open · sign in to claim
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1strongest claim

We present miniF2F, a dataset of formal Olympiad-level mathematics problems statements intended to provide a unified cross-system benchmark for neural theorem proving.

C2weakest assumption

That the chosen contest problems can be accurately translated into correct formal statements in each target system and that these statements form a representative and stable measure of progress for neural theorem provers.

C3one line summary

MiniF2F is a new cross-system benchmark containing 488 Olympiad-level mathematics problems formalized in Metamath, Lean, Isabelle, and HOL Light, together with baseline results from a GPT-3-based prover.

References

18 extracted · 18 resolved · 2 Pith anchors

[1] Holist: An envi- ronment for machine learning of higher order logic theorem proving 1905
[2] Kevin Buzzard, Johan Commelin, and Patrick Massot 2020
[3] Imagenet: A large-scale hi- erarchical image database 2009
[4] Proof artifact co-training for theorem proving with language models.arXiv preprint arXiv:2102.06203
[5] Measuring Mathematical Problem Solving With the MATH Dataset · arXiv:2103.03874

Formal links

1 machine-checked theorem link

Cited by

24 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:46.754275Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

e272ead19d192ee4bcb1f0522addeb3293e4427ec8adb44786f12e3d3251cd23

Aliases

arxiv: 2109.00110 · arxiv_version: 2109.00110v2 · doi: 10.48550/arxiv.2109.00110 · pith_short_12: 4JZOVUM5DEXO · pith_short_16: 4JZOVUM5DEXOJPFR · pith_short_8: 4JZOVUM5
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/4JZOVUM5DEXOJPFR6BJCVXPLGK \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: e272ead19d192ee4bcb1f0522addeb3293e4427ec8adb44786f12e3d3251cd23
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "1135ccff970ede9810711c2e659e388f6309e7db620a343c59e92ac01865a508",
    "cross_cats_sorted": [
      "cs.FL",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2021-08-31T23:21:12Z",
    "title_canon_sha256": "cc0244c5959f2c66a5c6d38b722450654424231e35e121c472593706bd580a10"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2109.00110",
    "kind": "arxiv",
    "version": 2
  }
}