Pith Number

pith:WQLLX4OM

pith:2023:WQLLX4OMUVD4PAVZ72W5IY65L3

not attested · not anchored · not stored · refs resolved

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Botao Yu, Boyuan Zheng, Cong Wei, Dongfu Jiang, Ge Zhang, Huan Sun, Kai Zhang, Ming Yin, Renliang Sun, Ruibin Yuan, Ruoqi Liu, Samuel Stevens, Tianyu Zheng, Weiming Ren, Wenhao Huang, Wenhu Chen, Xiang Yue, Yibo Liu, Yuansheng Ni, Yu Su, Yuxuan Sun, Zhenzhu Yang

Multimodal models like GPT-4V and Gemini Ultra reach only 56-59% accuracy on a new benchmark of 11,500 college-level expert questions.

arxiv:2311.16502 v4 · 2023-11-27 · cs.CL · cs.AI · cs.CV

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{WQLLX4OMUVD4PAVZ72W5IY65L3}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
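To illustrate why any mirror can recompute the same state, the sketch below shows one common way to make a merge deterministic: put the events into a total order before folding them. This is not Pith's actual merge algorithm, which is not described on this page. The event fields (`id`, `ts`, `key`, `value`), the function name `merge_events`, and the last-writer-wins rule are all assumptions made purely for illustration.

```python
from typing import Any

# Hypothetical event shape: {"id": str, "ts": str, "key": str, "value": Any}.
# Pith's real event schema and merge rules are not given in this record.


def merge_events(events: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold events into a state dict, independent of arrival order."""
    # Deduplicate by id (assumes the same id always carries the same event).
    unique = {e["id"]: e for e in events}
    state: dict[str, Any] = {}
    # Sorting by (timestamp, id) gives every mirror the same sequence,
    # even when two events share a timestamp.
    for event in sorted(unique.values(), key=lambda e: (e["ts"], e["id"])):
        state[event["key"]] = event["value"]  # last writer wins (assumption)
    return state


events = [
    {"id": "e2", "ts": "2026-05-18T00:00:00Z", "key": "citations", "value": "open"},
    {"id": "e1", "ts": "2026-05-17T23:38:53Z", "key": "bundle", "value": "live"},
]
# Any permutation of the same events yields the same state.
assert merge_events(events) == merge_events(list(reversed(events)))
```

The design point is that the result depends only on the set of events, never on the order in which a mirror happened to receive them.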

Claims

C1 · strongest claim

Even the advanced GPT-4V and Gemini Ultra only achieve accuracies of 56% and 59% respectively, indicating significant room for improvement.

C2 · weakest assumption

The collected questions and images accurately represent the perception and reasoning demands of college-level expertise across the six disciplines.

C3 · one line summary

MMMU provides 11.5K heterogeneous college-level multimodal questions that current models solve at 56-59% accuracy, establishing a new standard for expert multimodal evaluation.

References

97 extracted · 97 resolved · 34 Pith anchors

Formal links

3 machine-checked theorem links

Cited by

49 papers in Pith

ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language Models

V-RoAst: Visual Road Assessment. Can VLM be a Road Safety Assessor Using the iRAP Standard?

Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark

Qwen2.5-VL Technical Report

FLARE: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding

Receipt and verification

First computed	2026-05-17T23:38:53.375011Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

b416bbf1cca547c782b9feadd463dd5ee863fd5f73a381dd348d67f0b449ab90

Aliases

arxiv: 2311.16502 · arxiv_version: 2311.16502v4 · doi: 10.48550/arxiv.2311.16502 · pith_short_12: WQLLX4OMUVD4 · pith_short_16: WQLLX4OMUVD4PAVZ · pith_short_8: WQLLX4OM
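The short aliases above are plain prefixes of the full Pith Number. In addition, for this record the full identifier coincides with the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. The sketch below checks both relationships; whether the Base32 prefix is the official derivation rule is an assumption based only on the values shown here, and the function names are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "b416bbf1cca547c782b9feadd463dd5ee863fd5f73a381dd348d67f0b449ab90"
PITH_NUMBER = "WQLLX4OMUVD4PAVZ72W5IY65L3"


def short_aliases(pith_number: str) -> dict[str, str]:
    """Return the 8-, 12-, and 16-character prefix aliases."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


def base32_prefix(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest and keep the first `length` characters.

    Assumption: this mirrors how the identifier relates to the hash;
    it is inferred from this record, not taken from a specification.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


print(short_aliases(PITH_NUMBER))
print(base32_prefix(CANONICAL_HASH) == PITH_NUMBER)
```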

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/WQLLX4OMUVD4PAVZ72W5IY65L3 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: b416bbf1cca547c782b9feadd463dd5ee863fd5f73a381dd348d67f0b449ab90

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "de0ecfa23bacf26dab6973c29b09c6078f8e05cd01f66e073e06de1205925749",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CV"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2023-11-27T17:33:21Z",
    "title_canon_sha256": "c676d155268c4b0c7a75a3b5e40ee86f50174544ced223da0e78878e44a7ea68"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2311.16502",
    "kind": "arxiv",
    "version": 4
  }
}
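The canonicalization step of the verification command can also be run offline against the record printed above. The following is a minimal sketch that applies the same `json.dumps` settings as the one-liner (sorted keys, compact separators, no ASCII escaping) and hashes the UTF-8 bytes. The helper names `canonical_bytes` and `canonical_sha256` are hypothetical, and the sketch assumes the JSON shown on this page is byte-for-byte the record the resolver returns.

```python
import hashlib
import json

# Expected value copied from the "Canonical hash" section of this record.
EXPECTED = "b416bbf1cca547c782b9feadd463dd5ee863fd5f73a381dd348d67f0b449ab90"

# The canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "de0ecfa23bacf26dab6973c29b09c6078f8e05cd01f66e073e06de1205925749",
        "cross_cats_sorted": ["cs.AI", "cs.CV"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2023-11-27T17:33:21Z",
        "title_canon_sha256": "c676d155268c4b0c7a75a3b5e40ee86f50174544ced223da0e78878e44a7ea68",
    },
    "schema_version": "1.0",
    "source": {"id": "2311.16502", "kind": "arxiv", "version": 4},
}


def canonical_bytes(obj) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_sha256(obj) -> str:
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


digest = canonical_sha256(record)
print(digest)
print("matches expected hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the key order in which the record is written or parsed does not affect the digest.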