pith:VIEYAOND
BLINK: Multimodal Large Language Models Can See but Not Perceive
Multimodal LLMs like GPT-4V reach only 51% accuracy on visual perception tasks that humans solve at 96%.
arxiv:2404.12390 v4 · 2024-04-18 · cs.CV · cs.AI · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{VIEYAONDM5LUTKLMMNKATGG3N7}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
even the best-performing GPT-4V and Gemini achieve accuracies of 51.26% and 45.72%, only 13.17% and 7.63% higher than random guessing, indicating that such perception abilities have not emerged yet in recent multimodal LLMs
Key assumption: the selected tasks genuinely require visual perception and cannot be solved through language patterns or statistical shortcuts in the training data.
The BLINK benchmark shows that multimodal LLMs reach only 45-51 percent accuracy on core visual perception tasks where humans achieve about 96 percent, indicating that these abilities have not yet emerged.
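The margins quoted in the first claim can be checked with simple arithmetic: subtracting each "higher than random guessing" margin from the corresponding accuracy gives the random-guessing baseline those figures imply. The sketch below uses only the four numbers quoted in that claim; the dictionary layout and the function name `implied_random_baseline` are illustrative choices, not anything defined by the paper or by Pith.

```python
# Figures quoted in the claim: (accuracy in percent, margin over random guessing in points).
reported = {
    "GPT-4V": (51.26, 13.17),
    "Gemini": (45.72, 7.63),
}


def implied_random_baseline(accuracy: float, margin: float) -> float:
    """Return the random-guessing accuracy implied by an accuracy and its margin."""
    # Round to two decimals to match the precision of the quoted figures.
    return round(accuracy - margin, 2)


# Hypothetical helper structure: one implied baseline per model.
baselines = {
    model: implied_random_baseline(accuracy, margin)
    for model, (accuracy, margin) in reported.items()
}

if __name__ == "__main__":
    for model, baseline in baselines.items():
        print(f"{model}: implied random baseline = {baseline}%")
```

Both models yield the same implied baseline, which is what one would expect if the two margins were measured against a single random-guessing reference.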
Receipt and verification
| First computed | 2026-05-17T23:38:50.297986Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
aa098039a3675749a96c63540998db6fc6907ba0875170782140cef6079be0de
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/VIEYAONDM5LUTKLMMNKATGG3N7 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: aa098039a3675749a96c63540998db6fc6907ba0875170782140cef6079be0de
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "dd25bcb3e35202474023a787b0b9d122840766b9a54178a832f88e9f180d9e66",
"cross_cats_sorted": [
"cs.AI",
"cs.CL"
],
"license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
"primary_cat": "cs.CV",
"submitted_at": "2024-04-18T17:59:54Z",
"title_canon_sha256": "4d8fd9e1fea6457fae3bc1f04cdd373d055d3fb0b8cdf6f80054724814cfc882"
},
"schema_version": "1.0",
"source": {
"id": "2404.12390",
"kind": "arxiv",
"version": 4
}
}
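The shell pipeline above hashes a canonical serialization of the record: keys sorted, compact separators, UTF-8 bytes, then SHA-256. The following is a minimal offline sketch of the same step in Python, applied to the record as displayed on this page. It assumes the displayed JSON is equivalent to what the API returns under `canonical_record`; if that assumption does not hold, the digest will differ from the published hash. The names `RECORD`, `EXPECTED`, and `canonical_sha256` are illustrative.

```python
import hashlib
import json

# Record copied from the "Canonical record JSON" section of this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "dd25bcb3e35202474023a787b0b9d122840766b9a54178a832f88e9f180d9e66",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2024-04-18T17:59:54Z",
        "title_canon_sha256": "4d8fd9e1fea6457fae3bc1f04cdd373d055d3fb0b8cdf6f80054724814cfc882",
    },
    "schema_version": "1.0",
    "source": {"id": "2404.12390", "kind": "arxiv", "version": 4},
}

# Canonical hash as published on this page.
EXPECTED = "aa098039a3675749a96c63540998db6fc6907ba0875170782140cef6079be0de"


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    digest = canonical_sha256(RECORD)
    print(digest)
    print("matches published hash:", digest == EXPECTED)
```

Because the keys are sorted before hashing, the digest does not depend on the order in which the record's fields happen to be written, which is what makes the check reproducible across clients.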