Pith Number
pith:4O7HQCVO
pith:2024:4O7HQCVOBKXK7AXX7NVR7JHMEM
not attested
not anchored
not stored
refs resolved
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
Even the strongest multimodal LLMs fail to reach 60 percent accuracy on high-resolution real-world tasks
arxiv:2408.13257 v3 · 2024-08-23 · cs.CV
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{4O7HQCVOBKXK7AXX7NVR7JHMEM}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Record completeness
1. Bitcoin timestamp
2. Internet Archive
3. Author claim
4. Citations
5. Replications
✓ Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
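The page does not describe the merge algorithm itself. As a minimal sketch of why a deterministic merge lets any mirror reach the same state, the example below folds events in a total order that does not depend on arrival order. Everything in it is an assumption for illustration: the `Event` shape, the `(timestamp, event_id)` ordering, the last-writer-wins rule, and the sample events are hypothetical and are not taken from the Pith bundle format.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; the real bundle schema is not shown on this page.
    event_id: str
    timestamp: str  # ISO 8601 UTC with fixed width, so string order equals time order
    field: str
    value: str


def merge(events):
    """Fold events into a state dict, independent of the order they arrive in."""
    state = {}
    seen = set()
    # Sorting by (timestamp, event_id) gives every mirror the same sequence.
    for ev in sorted(events, key=lambda e: (e.timestamp, e.event_id)):
        if ev.event_id in seen:
            continue  # a mirror may receive the same event twice
        seen.add(ev.event_id)
        state[ev.field] = ev.value  # assumed rule: last writer wins
    return state


# Made-up events, used only to exercise the function.
events = [
    Event("e1", "2030-01-01T00:00:00Z", "bundle", "live"),
    Event("e2", "2030-01-02T00:00:00Z", "author_claim", "pending"),
    Event("e3", "2030-01-03T00:00:00Z", "author_claim", "claimed"),
]

shuffled = events[:]
random.shuffle(shuffled)

# Arrival order and duplicates do not change the result.
assert merge(events) == merge(shuffled + events)
```

The design point is that the fold order comes from the events themselves rather than from the transport, so two mirrors holding the same set of events cannot disagree.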
Claims
C1 · strongest claim
even the most advanced models struggle with our benchmarks, where none of them reach 60% accuracy
C2 · weakest assumption
The 13,366 filtered images and 29,429 QA pairs created by 25 annotators and 7 experts truly represent high-resolution real-world scenarios that are extremely challenging even for humans
C3 · one line summary
MME-RealWorld is the largest manually annotated high-resolution benchmark for MLLMs, where even the best models achieve less than 60% accuracy on challenging real-world tasks.
References
[1] Ntire 2017 challenge on single image super-resolution: Dataset and study
[2] PaLM 2 Technical Report
[3] OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
[4] Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
[5] TouchStone: Evaluating vision-language models by language models
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:48.584764Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
e3be780aae0aaeaf82f7fb6b1fa4ec232acef96c97153915e475c39bf8505b35
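The page does not say how the Pith Number relates to the canonical hash. For this record, though, the 26-character identifier matches the unpadded Base32 (RFC 4648) encoding of the first 16 bytes of the SHA-256 digest, and the short form is its first 8 characters. The sketch below checks that observation. The function name `pith_from_hash` is hypothetical, and the rule is inferred from this one record rather than from a published specification.

```python
import base64

CANONICAL_HASH = "e3be780aae0aaeaf82f7fb6b1fa4ec232acef96c97153915e475c39bf8505b35"


def pith_from_hash(hex_digest: str) -> str:
    """Base32-encode the first 16 bytes of a hex digest and drop '=' padding.

    Inferred from this record only; not a documented Pith rule.
    """
    raw = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


full = pith_from_hash(CANONICAL_HASH)
print(full)      # 4O7HQCVOBKXK7AXX7NVR7JHMEM
print(full[:8])  # 4O7HQCVO
```

If the rule holds in general, a reader could confirm that an identifier and a canonical hash belong together without any network access.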
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/4O7HQCVOBKXK7AXX7NVR7JHMEM \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: e3be780aae0aaeaf82f7fb6b1fa4ec232acef96c97153915e475c39bf8505b35
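The hashing step of the one-liner above can also be written as a small Python function, which is convenient when the record is already on disk. This is a sketch of the same canonicalization (sorted keys, compact separators, non-ASCII preserved, UTF-8 bytes). The function name `canonical_sha256` is an assumption, and the two sample dictionaries are abbreviated stand-ins rather than the full record.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record the way the shell one-liner does: sorted keys, no whitespace."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Same content, different key order: the canonical form hides the difference.
a = {"source": {"kind": "arxiv", "id": "2408.13257", "version": 3}, "schema_version": "1.0"}
b = {"schema_version": "1.0", "source": {"version": 3, "id": "2408.13257", "kind": "arxiv"}}

assert canonical_sha256(a) == canonical_sha256(b)
print(canonical_sha256(a))
```

Passing the full `canonical_record` object returned by the API to this function should give the digest the page lists under "Canonical hash", assuming the server serializes nothing beyond what the one-liner reads.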
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "5d3bbb8b38c0d16507887f6e562134f837670dcf21be01c1811979ce43518d33",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-08-23T17:59:51Z",
    "title_canon_sha256": "b579ae444d9a4bd2c060336112ea901912da858435643d1dae227d8eabb9fa89"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2408.13257",
    "kind": "arxiv",
    "version": 3
  }
}
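As a small illustration of reading this record, the sketch below rebuilds the versioned arXiv identifier and the submission date shown in the page header (arxiv:2408.13257 v3, 2024-08-23) from the `source` and `metadata` fields. The helper names are hypothetical, and the record literal is trimmed to the fields the helpers use.

```python
from datetime import date, datetime

# Subset of the canonical record above; the hash fields are omitted for brevity.
record = {
    "metadata": {"primary_cat": "cs.CV", "submitted_at": "2024-08-23T17:59:51Z"},
    "schema_version": "1.0",
    "source": {"id": "2408.13257", "kind": "arxiv", "version": 3},
}


def arxiv_versioned_id(rec: dict) -> str:
    """Return an identifier such as '2408.13257v3' for arXiv-sourced records."""
    src = rec["source"]
    if src["kind"] != "arxiv":
        raise ValueError(f"unsupported source kind: {src['kind']!r}")
    return f"{src['id']}v{src['version']}"


def submitted_date(rec: dict) -> date:
    """Parse the UTC 'submitted_at' timestamp and keep only the calendar date."""
    stamp = datetime.strptime(rec["metadata"]["submitted_at"], "%Y-%m-%dT%H:%M:%SZ")
    return stamp.date()


print(arxiv_versioned_id(record))  # 2408.13257v3
print(submitted_date(record))      # 2024-08-23
```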