pith:JATB3AB7
Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset
The MATH-Vision dataset of 3,040 competition-sourced visual math problems reveals a large performance gap between current large multimodal models and human solvers.
arxiv:2402.14804 v1 · 2024-02-22 · cs.CV · cs.AI · cs.CL · cs.LG · math.HO
Claims
Through extensive experimentation, we unveil a notable performance gap between current LMMs and human performance on MATH-V, underscoring the imperative for further advancements in LMMs.
The curation process from real competitions produces a representative and unbiased sample of visual mathematical reasoning challenges without introducing selection effects that favor certain problem types.
MATH-Vision is a new benchmark of 3,040 visual mathematical competition problems that reveals substantial gaps between large multimodal models and human performance in mathematical reasoning.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:13.127495Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
48261d803f7ce8443556151e062668dcb02f1c5399b48b445cea402c82a6e058
Aliases
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/JATB3AB7PTUEINKWCUPAMJTI3S \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 48261d803f7ce8443556151e062668dcb02f1c5399b48b445cea402c82a6e058
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "ab8db3b1d9ffae8b1caabd3e948c239ccc005888bae1db02f29069b09bca720b",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL",
      "cs.LG",
      "math.HO"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-02-22T18:56:38Z",
    "title_canon_sha256": "cfba9a1c2871cc9808744fdd45c3ddacdb2bc0720bce42dbe33b3e93b236e091"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2402.14804",
    "kind": "arxiv",
    "version": 1
  }
}
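The shell one-liner in "Verify this Pith Number yourself" can be unpacked into a small offline Python sketch. It applies the same serialization (sorted keys, compact separators, non-ASCII characters left unescaped, UTF-8 bytes) to the record shown above and hashes the result with SHA-256. The names `canonical_hash`, `record`, and `EXPECTED` are illustrative and not part of any Pith API. Whether the digest matches depends on the record being exactly what the server returns, so treat the comparison as a check to run rather than a guaranteed result.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """Return the SHA-256 hex digest of a record's canonical JSON form."""
    # Same serialization as the shell one-liner: keys sorted, no whitespace
    # between tokens, non-ASCII characters kept as-is, encoded as UTF-8.
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# The canonical record as printed on this page (copied by hand, so this is
# an assumption about what the endpoint actually serves).
record = {
    "metadata": {
        "abstract_canon_sha256": "ab8db3b1d9ffae8b1caabd3e948c239ccc005888bae1db02f29069b09bca720b",
        "cross_cats_sorted": ["cs.AI", "cs.CL", "cs.LG", "math.HO"],
        "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2024-02-22T18:56:38Z",
        "title_canon_sha256": "cfba9a1c2871cc9808744fdd45c3ddacdb2bc0720bce42dbe33b3e93b236e091",
    },
    "schema_version": "1.0",
    "source": {"id": "2402.14804", "kind": "arxiv", "version": 1},
}

# Canonical hash quoted in the "Canonical hash" section of this page.
EXPECTED = "48261d803f7ce8443556151e062668dcb02f1c5399b48b445cea402c82a6e058"

digest = canonical_hash(record)
print(digest)
print("match" if digest == EXPECTED else "mismatch")
```

Because the keys are sorted before hashing, the order in which they appear in the source JSON does not affect the digest. Any change to a value does.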