pith:OS4GEAZF
OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning
A new benchmark shows most large multimodal models score below 50 out of 100 on visual text tasks.
arxiv:2501.00321 v2 · 2024-12-31 · cs.CV · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{OS4GEAZFRAHNUBG7PVAC4WBW5T}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
After carefully benchmarking state-of-the-art LMMs, we find that most score below 50 out of 100 and share five types of limitation: recognition of less frequently encountered text, fine-grained perception, layout perception, complex element parsing, and logical reasoning.
The findings rest on the assumption that the chosen 31 scenarios and 10,000 human-verified question-answer pairs, together with the private test set, provide an unbiased and comprehensive measure of the five claimed limitations, free of selection effects that favor certain model failure modes.
OCRBench v2 is a new benchmark with four times more tasks than prior versions that reveals most large multimodal models score below 50 out of 100 on visual text tasks and share five specific weaknesses.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:13.152917Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
74b8620325880eda04df7d402e5836ecf84b4e03d4c6959096231f39559c4cf7
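For this record, the 26-character Pith Number matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash, and the short form `pith:OS4GEAZF` is its first 8 characters. This is an observation about this one record, not a documented derivation rule, so treat the sketch below as an assumption. The function name `pith_number_from_hash` is hypothetical.

```python
import base64


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest and keep the leading characters.

    Assumption: the identifier is a truncated Base32 rendering of the
    canonical SHA-256 hash. This holds for the record on this page but
    is not stated by the source.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


canonical_hash = "74b8620325880eda04df7d402e5836ecf84b4e03d4c6959096231f39559c4cf7"
full_id = pith_number_from_hash(canonical_hash)
short_id = full_id[:8]
print(full_id, short_id)
```

If the relationship holds in general, it gives a quick offline check that an identifier and a hash belong to the same record.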
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/OS4GEAZFRAHNUBG7PVAC4WBW5T \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 74b8620325880eda04df7d402e5836ecf84b4e03d4c6959096231f39559c4cf7
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "011b8e29481deb917e48675e59fec8ebd7a34fa2db59936c61cb7fcc55a6ccc9",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-12-31T07:32:35Z",
    "title_canon_sha256": "784fe27428b4eab38602e77a2b9c56620512c96b39067546e703d274c050939e"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2501.00321",
    "kind": "arxiv",
    "version": 2
  }
}
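The verification one-liner above hashes a canonical serialization of this record: keys sorted, compact `,` and `:` separators, non-ASCII characters kept as-is, UTF-8 encoded, then SHA-256. A minimal standalone sketch of that step follows, using the same `json.dumps` options as the one-liner. The helper names (`canonical_bytes`, `canonical_sha256`, `matches`) are hypothetical, and whether the JSON printed on this page reproduces the published hash byte for byte has not been checked here.

```python
import hashlib
import json


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and compact separators, then UTF-8 encode."""
    text = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return text.encode("utf-8")


def canonical_sha256(record: dict) -> str:
    """Return the hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def matches(record: dict, expected_hex: str) -> bool:
    """Compare the computed digest with an expected hex string, ignoring case."""
    return canonical_sha256(record) == expected_hex.lower()


# Key order in the input does not affect the digest.
a = canonical_sha256({"schema_version": "1.0", "source": {"id": "2501.00321"}})
b = canonical_sha256({"source": {"id": "2501.00321"}, "schema_version": "1.0"})
print(a == b)
```

To check a real record, load the `canonical_record` object returned by the endpoint with `json.loads` and pass it to `matches` together with the canonical hash shown on this page.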