Pith Number

pith:RX25OYSZ

pith:2026:RX25OYSZQAVF5OYQF4QZLFJZAJ

not attested · not anchored · not stored · refs pending

Decoding Scientific Experimental Images: The SPUR Benchmark for Perception, Understanding, and Reasoning

Haihong E, Haiyang Sun, Haocheng Gao, Haolin Tian, Jiacheng Liu, Jintong Chen, Junpeng Ding, Mengyuan Ji, Peizhi Zhao, Pengqi Sun, Rongjin Li, Ruomeng Jiang, Siying Lin, Yang Liu, Yang Xu, Yichen Liu, Yuanze Li, Zhongjun Yang, Zichen Tang, Zijie Xi

Current multimodal AI models fall significantly short of expert-level performance when interpreting scientific experimental images.

arxiv:2604.27604 v2 · 2026-04-30 · cs.CV · cs.CE

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{RX25OYSZQAVF5OYQF4QZLFJZAJ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Comprehensive evaluation of 20 MLLMs and four multimodal Chain-of-Thought (MCoT) methods reveals that current models fall significantly short of the expert-level requirements for scientific image interpretation, underscoring a critical bottleneck in AI for Science (AI4S) research.

C2 · weakest assumption

The expert-curated images, panel classifications, and generated QA pairs are assumed to represent, accurately and without bias, the full range of expert-level perception, cross-panel understanding, and reasoning required for scientific experimental images.

C3 · one-line summary

The SPUR benchmark reveals that current multimodal large language models significantly underperform on expert-level perception, cross-panel understanding, and reasoning tasks involving complex scientific experimental images.

Receipt and verification

First computed	2026-05-27T01:05:55.424421Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`)
Schema	pith-number/v1.0

Canonical hash

8df5d76259802a5ebb102f21959539024ada879e5eb2e6e5082f5c167b286a8b

Aliases

arxiv: 2604.27604 · arxiv_version: 2604.27604v2 · doi: 10.48550/arxiv.2604.27604 · pith_short_12: RX25OYSZQAVF · pith_short_16: RX25OYSZQAVF5OYQ · pith_short_8: RX25OYSZ
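
The identifiers listed above can be cross-checked locally. On the values shown on this page, the 26-character identifier coincides with the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash, and each `pith_short_N` alias is its first N characters. The Base32 relationship is an observation drawn from this one record, not a rule stated by the schema, and the variable names below are illustrative only.

```python
import base64

# Values copied from this record.
canonical_hash = "8df5d76259802a5ebb102f21959539024ada879e5eb2e6e5082f5c167b286a8b"
pith_id = "RX25OYSZQAVF5OYQF4QZLFJZAJ"

# Assumption (observed on this record, not documented): the identifier is a
# prefix of the standard Base32 encoding of the raw hash bytes.
encoded = base64.b32encode(bytes.fromhex(canonical_hash)).decode("ascii")
derived_id = encoded[: len(pith_id)]

# The short aliases are plain prefixes of the full identifier.
short_aliases = {n: pith_id[:n] for n in (8, 12, 16)}

print(derived_id == pith_id)
print(short_aliases)
```

If the prefix relationship holds for other records as well, a short alias alone would be enough to narrow down candidate hashes, but that generalization is untested here.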

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/RX25OYSZQAVF5OYQF4QZLFJZAJ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 8df5d76259802a5ebb102f21959539024ada879e5eb2e6e5082f5c167b286a8b

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "ae953b014ecf4a05949ea6189d25d618ac581716eddb0d5331dafc500d53350b",
    "cross_cats_sorted": [
      "cs.CE"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-04-30T08:57:18Z",
    "title_canon_sha256": "818073f2463a46cb1a02677f9b60fabb582f65bbe2f9c5f99d588c77af9dea61"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2604.27604",
    "kind": "arxiv",
    "version": 2
  }
}
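
The verification one-liner above serializes the record as JSON with sorted keys and compact separators before hashing it. The following is a minimal offline sketch of that step in Python, using the record printed on this page as input. The function name `canonical_sha256` is hypothetical, and it is an assumption that the resolver serves exactly the record shown here, so the digest is compared against the published hash rather than asserted to match.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the same serialization as the one-liner above."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the digest
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode()
    return hashlib.sha256(payload).hexdigest()


# Record copied from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "ae953b014ecf4a05949ea6189d25d618ac581716eddb0d5331dafc500d53350b",
        "cross_cats_sorted": ["cs.CE"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2026-04-30T08:57:18Z",
        "title_canon_sha256": "818073f2463a46cb1a02677f9b60fabb582f65bbe2f9c5f99d588c77af9dea61",
    },
    "schema_version": "1.0",
    "source": {"id": "2604.27604", "kind": "arxiv", "version": 2},
}

expected = "8df5d76259802a5ebb102f21959539024ada879e5eb2e6e5082f5c167b286a8b"
digest = canonical_sha256(record)
print(digest)
print("matches published hash:", digest == expected)
```

Because keys are sorted before serialization, the digest does not depend on the order in which the input dictionary was built.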