Pith Number

pith:4O7HQCVO

pith:2024:4O7HQCVOBKXK7AXX7NVR7JHMEM

not attested · not anchored · not stored · refs resolved

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

Chaoyou Fu, Feng Li, Haochen Tian, Huanyu Zhang, Junfei Wu, Kun Wang, Liang Wang, Qingsong Wen, Rong Jin, Shuangqing Zhang, Tieniu Tan, Yi-Fan Zhang, Zhang Zhang

Even the strongest multimodal LLMs fail to reach 60 percent accuracy on high-resolution real-world tasks

arxiv:2408.13257 v3 · 2024-08-23 · cs.CV


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{4O7HQCVOBKXK7AXX7NVR7JHMEM}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
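The page does not specify the merge algorithm. As a rough illustration of why a deterministic merge lets every mirror reach the same state, the sketch below sorts events into one total order before applying them, so the order in which a mirror received them does not matter. The event fields (`ts`, `id`, `field`, `value`), the sample events, and the last-writer-wins rule are all assumptions made for illustration, not the Pith event schema, and signature checking is left out.

```python
from typing import Any


def merge_state(canonical: dict[str, Any], events: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold events into a copy of the canonical record in a fixed total order.

    Hypothetical rule: order by (timestamp, event id); later events overwrite
    earlier ones for the same field. ISO 8601 UTC timestamps in one format
    sort correctly as plain strings.
    """
    state = dict(canonical)  # never mutate the caller's record
    for event in sorted(events, key=lambda e: (e["ts"], e["id"])):
        state[event["field"]] = event["value"]
    return state


# Hypothetical events, listed out of chronological order on purpose.
events = [
    {"id": "b", "ts": "2026-05-18T00:00:00Z", "field": "citations", "value": "open"},
    {"id": "a", "ts": "2026-05-17T00:00:00Z", "field": "citations", "value": "pending"},
]
base = {"schema_version": "1.0"}

forward = merge_state(base, events)
backward = merge_state(base, list(reversed(events)))
print(forward == backward)  # True
```

Sorting on a key that includes a unique event id gives a total order, which is what makes the result independent of arrival order in this sketch.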

Claims

C1 strongest claim

even the most advanced models struggle with our benchmarks, where none of them reach 60% accuracy

C2 weakest assumption

The 13,366 filtered images and 29,429 QA pairs created by 25 annotators and 7 experts truly represent high-resolution real-world scenarios that are extremely challenging even for humans

C3 one line summary

MME-RealWorld is the largest manually annotated high-resolution benchmark for MLLMs, where even the best models achieve less than 60% accuracy on challenging real-world tasks.

References

102 extracted · 102 resolved · 30 Pith anchors

[1] Ntire 2017 challenge on single image super-resolution: Dataset and study 2017

[2] PaLM 2 Technical Report 2023 · arXiv:2305.10403

[3] OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models 2023 · arXiv:2308.01390

[4] Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond 2023 · arXiv:2308.12966

[5] TouchStone: Evaluating vision-language models by language models 2023

Formal links

1 machine-checked theorem link

Cited by

37 papers in Pith

FLARE: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding

SSL4RL: Revisiting Self-supervised Learning as Intrinsic Reward for Visual-Language Reasoning

DLEBench: Evaluating Small-scale Object Editing Ability for Instruction-based Image Editing Model

KVCapsule: Efficient Sequential KV Cache Compression for Vision-Language Models with Asymmetric Redundancy

Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation

Receipt and verification

First computed	2026-05-17T23:38:48.584764Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

e3be780aae0aaeaf82f7fb6b1fa4ec232acef96c97153915e475c39bf8505b35

Aliases

arxiv: 2408.13257 · arxiv_version: 2408.13257v3 · doi: 10.48550/arxiv.2408.13257 · pith_short_12: 4O7HQCVOBKXK · pith_short_16: 4O7HQCVOBKXK7AXX · pith_short_8: 4O7HQCVO
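The values on this page suggest how the identifier relates to the canonical hash: Base32-encoding the first 16 bytes of the hash and dropping the `=` padding reproduces the 26-character Pith Number, and each `pith_short_N` alias is its first N characters. The Python sketch below checks that relationship. The rule is inferred from this record alone rather than taken from a Pith specification, and the function names are hypothetical.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "e3be780aae0aaeaf82f7fb6b1fa4ec232acef96c97153915e475c39bf8505b35"


def pith_number_from_hash(hex_digest: str) -> str:
    """Base32-encode the first 16 bytes of the digest, without padding (assumed rule)."""
    first_16 = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(first_16).decode("ascii").rstrip("=")


def short_aliases(pith_number: str, lengths=(8, 12, 16)) -> dict:
    """Each short alias is a prefix of the full Pith Number."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


pith = pith_number_from_hash(CANONICAL_HASH)
print(pith)  # 4O7HQCVOBKXK7AXX7NVR7JHMEM
print(short_aliases(pith))
```

If the inferred rule holds in general, the short forms are just truncations of the hash encoding, so a shorter alias trades collision resistance for brevity.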

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/4O7HQCVOBKXK7AXX7NVR7JHMEM \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: e3be780aae0aaeaf82f7fb6b1fa4ec232acef96c97153915e475c39bf8505b35
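The Python one-liner in the pipeline above is dense. Unpacked, its canonicalization step looks like the sketch below: serialize with sorted keys, compact separators, and no ASCII escaping, then hash the UTF-8 bytes. The function name `canonical_sha256` is hypothetical; the serialization settings are copied from the command. The two sample dictionaries are only fragments of the record, used to show that key order does not change the digest, so their digest is not the canonical hash.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record the same way the shell pipeline does."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order is normalized at every nesting level
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8, not \u escapes
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content, different key order: the digests must match.
a = {"schema_version": "1.0", "source": {"id": "2408.13257", "kind": "arxiv", "version": 3}}
b = {"source": {"version": 3, "kind": "arxiv", "id": "2408.13257"}, "schema_version": "1.0"}
print(canonical_sha256(a) == canonical_sha256(b))  # True
```

To check the real record, pass the parsed `canonical_record` object from the resolver response to `canonical_sha256` and compare the result with the expected hash.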

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "5d3bbb8b38c0d16507887f6e562134f837670dcf21be01c1811979ce43518d33",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-08-23T17:59:51Z",
    "title_canon_sha256": "b579ae444d9a4bd2c060336112ea901912da858435643d1dae227d8eabb9fa89"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2408.13257",
    "kind": "arxiv",
    "version": 3
  }
}