Pith Number

pith:VIEYAOND

pith:2024:VIEYAONDM5LUTKLMMNKATGG3N7

Status: not attested · not anchored · not stored · refs resolved

BLINK: Multimodal Large Language Models Can See but Not Perceive

Bangzheng Li, Dan Roth, Haoyu Wang, Noah A. Smith, Ranjay Krishna, Wei-Chiu Ma, Xingyu Fu, Xudong Lin, Yu Feng, Yushi Hu

Multimodal LLMs like GPT-4V reach only 51% accuracy on visual perception tasks that humans solve at 96%.

arxiv:2404.12390 v4 · 2024-04-18 · cs.CV · cs.AI · cs.CL


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{VIEYAONDM5LUTKLMMNKATGG3N7}

The package prints a linked badge after your title and injects PDF metadata; it compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
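The page does not describe Pith's actual merge algorithm. As a purely hypothetical sketch of why a deterministic merge lets any mirror reach the same state, the code below replays events over the canonical record in one fixed total order, sorted by timestamp and then event id. The `Event` shape, its fields, and the last-writer-wins rule are all assumptions for illustration, and signature verification is omitted.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Event:
    """Hypothetical signed event: sets one field of the record state."""
    event_id: str   # assumed unique id, e.g. a hash of the signed payload
    timestamp: str  # ISO-8601 UTC, all in the same format so strings sort chronologically
    field: str
    value: Any


def merge(canonical: dict, events: list[Event]) -> dict:
    """Replay events over a copy of the canonical record in one fixed total order.

    Sorting by (timestamp, event_id) makes the order independent of how a mirror
    received the events, so every mirror computes the same state.
    Signature checks are omitted from this sketch.
    """
    state = dict(canonical)  # shallow copy: the canonical dict itself is not modified
    for ev in sorted(events, key=lambda e: (e.timestamp, e.event_id)):
        state[ev.field] = ev.value  # later events overwrite earlier ones
    return state


# Two mirrors receiving the same events in different orders agree on the result.
base = {"schema_version": "1.0"}
e1 = Event("a1", "2026-05-18T09:00:00Z", "status", "draft")
e2 = Event("b2", "2026-05-18T10:00:00Z", "status", "final")
print(merge(base, [e1, e2]) == merge(base, [e2, e1]))  # True
```

The tie-break on `event_id` matters: without it, two events sharing a timestamp could be applied in arrival order, and mirrors could diverge.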

Claims

C1 · strongest claim

Even the best-performing GPT-4V and Gemini achieve accuracies of 51.26% and 45.72%, only 13.17 and 7.63 percentage points higher than random guessing, indicating that such perception abilities have not yet emerged in recent multimodal LLMs.

C2 · weakest assumption

That the selected tasks genuinely require visual perception that cannot be solved through language patterns or statistical shortcuts in the training data.

C3 · one-line summary

The BLINK benchmark shows multimodal LLMs reach only 45-51 percent accuracy on core visual perception tasks where humans achieve 95 percent, indicating these abilities have not emerged.
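The margins in claim C1 are differences in percentage points from a random-guessing baseline. Subtracting each margin from the corresponding accuracy recovers that baseline, which works as a quick consistency check on the quoted figures. The baseline value is derived here from the two numbers in C1; it is not quoted on this page.

```python
# Accuracy (%) and margin over random guessing (percentage points) as stated in claim C1.
REPORTED = {"GPT-4V": (51.26, 13.17), "Gemini": (45.72, 7.63)}


def implied_random_baseline(accuracy: float, margin: float) -> float:
    """Random-guess accuracy implied by 'accuracy = baseline + margin'."""
    return round(accuracy - margin, 2)  # round away float noise from the subtraction


baselines = {m: implied_random_baseline(a, d) for m, (a, d) in REPORTED.items()}
print(baselines)  # {'GPT-4V': 38.09, 'Gemini': 38.09}
```

Both models give the same 38.09% baseline, as expected if the margins were measured against a single random-choice reference.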

References

90 extracted · 90 resolved · 20 Pith anchors

[1] Introducing the next generation of Claude. https://www.anthropic.com/news/claude-3-family (March 2024)

[2] In: AAAI (2019)

[3] Advances in Neural Information Processing Systems 35, 23716–23736 (2022)

[4] In: Proceedings of the IEEE International Conference on Computer Vision (2015)

[5] OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models (2023). arXiv:2308.01390

Formal links

2 machine-checked theorem links

Cited by

33 papers in Pith

Gemma 3 Technical Report

Grounded Reinforcement Learning for Visual Reasoning

ClaimDiff-RL: Fine-Grained Caption Reinforcement Learning through Visual Claim Comparison

When Vision Speaks for Sound

What's Holding Back Latent Visual Reasoning?

Receipt and verification

First computed	2026-05-17T23:38:50.297986Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`)
Schema	pith-number/v1.0

Canonical hash

aa098039a3675749a96c63540998db6fc6907ba0875170782140cef6079be0de

Aliases

arxiv: 2404.12390 · arxiv_version: 2404.12390v4 · doi: 10.48550/arxiv.2404.12390 · pith_short_12: VIEYAONDM5LU · pith_short_16: VIEYAONDM5LUTKLM · pith_short_8: VIEYAOND
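On this record, the 26-character Pith Number matches the first 26 characters (130 bits) of the RFC 4648 Base32 encoding of the canonical hash's raw bytes, and the `pith_short_*` aliases are prefixes of it. This relationship is inferred from the values shown on this page, not from a published Pith specification, so treat the sketch below as a check under that assumption. The function name `pith_number_from_hash` is hypothetical.

```python
import base64

CANONICAL_HASH = "aa098039a3675749a96c63540998db6fc6907ba0875170782140cef6079be0de"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Assumed derivation: Base32 (RFC 4648) of the raw SHA-256 bytes, truncated."""
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")[:length]


full = pith_number_from_hash(CANONICAL_HASH)
# Short aliases observed on this page are plain prefixes of the full number.
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}

print(full)     # VIEYAONDM5LUTKLMMNKATGG3N7
print(aliases)  # {'pith_short_8': 'VIEYAOND', 'pith_short_12': 'VIEYAONDM5LU', 'pith_short_16': 'VIEYAONDM5LUTKLM'}
```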

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

# Fetch the JSON-LD record, keep only canonical_record, re-serialize it with sorted keys and no whitespace, then SHA-256 it
curl -sH 'Accept: application/ld+json' https://pith.science/pith/VIEYAONDM5LUTKLMMNKATGG3N7 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: aa098039a3675749a96c63540998db6fc6907ba0875170782140cef6079be0de

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "dd25bcb3e35202474023a787b0b9d122840766b9a54178a832f88e9f180d9e66",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-04-18T17:59:54Z",
    "title_canon_sha256": "4d8fd9e1fea6457fae3bc1f04cdd373d055d3fb0b8cdf6f80054724814cfc882"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2404.12390",
    "kind": "arxiv",
    "version": 4
  }
}
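For an offline check, the same canonicalization used in the verification pipeline can be applied directly to the canonical record shown above. This is a sketch: it reproduces the published hash only if the record above is exactly the one the server returns, and no particular output is asserted here. The helper names `canonical_bytes` and `canonical_sha256` are illustrative.

```python
import hashlib
import json


def canonical_bytes(record: dict) -> bytes:
    """Serialize like the verification pipeline: sorted keys, no spaces, UTF-8."""
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_sha256(record: dict) -> str:
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


RECORD = {
    "metadata": {
        "abstract_canon_sha256": "dd25bcb3e35202474023a787b0b9d122840766b9a54178a832f88e9f180d9e66",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2024-04-18T17:59:54Z",
        "title_canon_sha256": "4d8fd9e1fea6457fae3bc1f04cdd373d055d3fb0b8cdf6f80054724814cfc882",
    },
    "schema_version": "1.0",
    "source": {"id": "2404.12390", "kind": "arxiv", "version": 4},
}
PUBLISHED_HASH = "aa098039a3675749a96c63540998db6fc6907ba0875170782140cef6079be0de"

digest = canonical_sha256(RECORD)
print(digest)
print("matches published hash:", digest == PUBLISHED_HASH)
```

Because keys are sorted before hashing, the digest does not depend on the key order in which a mirror happens to store or transmit the record.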