Pith Number

pith:XD6Y6EP7

pith:2024:XD6Y6EP7RYKT4SMQC4CJJBPCNK

not attested · not anchored · not stored · refs resolved

MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

Chaowei Xiao, Chunyuan Li, Dan Roth, Fei Wang, Hoifung Poon, Hsiang-Hui Liu, James Y. Huang, Kai-Wei Chang, Kai Zhang, Mingyu Derek Ma, Muhao Chen, Nan Xu, Pan Lu, Qin Liu, Sheng Zhang, Tianyi Lorena Yan, Wenjie Jacky Mo, Wenxuan Zhou, Xiaogeng Liu, Xingyu Fu, Zekun Li

MuirBench reveals that even leading multimodal LLMs like GPT-4o achieve only 68 percent accuracy on multi-image tasks.

arxiv:2406.09411 v2 · 2024-06-13 · cs.CV · cs.AI · cs.CL


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{XD6Y6EP7RYKT4SMQC4CJJBPCNK}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Even the best-performing models like GPT-4o and Gemini Pro find it challenging to solve MuirBench, achieving 68.0% and 49.3% in accuracy. Open-source multimodal LLMs trained on single images can hardly generalize to multi-image questions, hovering below 33.3% in accuracy.

C2 · weakest assumption

The assumption that each standard instance differs only minimally in semantics from its paired unanswerable variant, and that this pairing reliably isolates multi-image understanding without introducing new biases or artifacts in question construction.

C3 · one-line summary

MuirBench is a new benchmark showing that top multimodal LLMs struggle with robust multi-image understanding, with GPT-4o at 68% and open-source models below 33% accuracy.

References

72 extracted · 72 resolved · 13 Pith anchors

[1] Flamingo: a visual language model for few-shot learning 2022

[2] OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models 2023 · arXiv:2308.01390

[3] Qwen-VL: A versatile vision-language model for understanding, localization, text reading, and beyond 2023

[4] Visual question answering on image sets 2020

[5] Language models are few-shot learners 2020

Cited by

23 papers in Pith

Qwen2.5-VL Technical Report

MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs

Focus-then-Context: Subject-Centric Progressive Visual Token Reduction for Vision-Language Models

Efficient Long-Context Modeling in Diffusion Language Models via Block Approximate Sparse Attention

LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning

Receipt and verification

First computed	2026-05-17T23:38:46.018797Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

b8fd8f11ff8e153e499017049485e26ab991852fb84c3b7a9514944acb09a738

Aliases

arxiv: 2406.09411 · arxiv_version: 2406.09411v2 · doi: 10.48550/arxiv.2406.09411 · pith_short_12: XD6Y6EP7RYKT · pith_short_16: XD6Y6EP7RYKT4SMQ · pith_short_8: XD6Y6EP7
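
The three pith_short aliases listed above are prefixes of the full identifier. A minimal Python sketch of that relationship follows; the helper name short_alias is hypothetical, and the prefix rule is inferred from the values on this record rather than taken from a published specification.

```python
# Values copied from this record.
PITH_NUMBER = "XD6Y6EP7RYKT4SMQC4CJJBPCNK"
ALIASES = {
    "pith_short_8": "XD6Y6EP7",
    "pith_short_12": "XD6Y6EP7RYKT",
    "pith_short_16": "XD6Y6EP7RYKT4SMQ",
}


def short_alias(pith_number: str, length: int) -> str:
    """Hypothetical helper: the first `length` characters of the identifier."""
    return pith_number[:length]


for name, value in ALIASES.items():
    # The alias name ends in the prefix length, e.g. "pith_short_12" -> 12.
    length = int(name.rsplit("_", 1)[1])
    print(name, short_alias(PITH_NUMBER, length) == value)
```

Each line prints True for the values on this record. Whether other records follow the same rule is an assumption.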

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/XD6Y6EP7RYKT4SMQC4CJJBPCNK \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: b8fd8f11ff8e153e499017049485e26ab991852fb84c3b7a9514944acb09a738
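
On this record, the 26-character Pith Number matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical SHA-256 digest. The Python sketch below checks that relationship offline. It is an observation about the values shown here, not a documented derivation rule, and the function name pith_from_hash is hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "b8fd8f11ff8e153e499017049485e26ab991852fb84c3b7a9514944acb09a738"
PITH_NUMBER = "XD6Y6EP7RYKT4SMQC4CJJBPCNK"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Hypothetical helper: Base32-encode the raw digest and keep a prefix.

    Assumption: the identifier is a prefix of the standard Base32 alphabet
    (A-Z, 2-7) encoding of the digest bytes. This holds for this record only
    as far as the values shown here can confirm.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


print(pith_from_hash(CANONICAL_HASH))                  # XD6Y6EP7RYKT4SMQC4CJJBPCNK
print(pith_from_hash(CANONICAL_HASH) == PITH_NUMBER)   # True
```

If the relationship holds in general, a reader who has recomputed the canonical hash can also confirm the identifier without a second network request.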

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "8d5d7b9915588fe1d70c0c7a1399795ba30490968aba4c01fe00fe7d034964be",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-06-13T17:59:52Z",
    "title_canon_sha256": "7305f5e7a2333182d246e61898bc65d3ee91cf09ceb7f4a464b2086a045c3a99"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2406.09411",
    "kind": "arxiv",
    "version": 2
  }
}
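
The same canonicalization used in the curl command (sorted keys, compact separators, UTF-8 without ASCII escaping) can be applied to the record above without network access. The following Python sketch does that; the function name canonical_sha256 is hypothetical, and the sketch makes no claim about the digest it prints, which should be compared by hand against the canonical hash listed in this record.

```python
import hashlib
import json

# The canonical record exactly as displayed on this page.
RECORD_TEXT = """
{
  "metadata": {
    "abstract_canon_sha256": "8d5d7b9915588fe1d70c0c7a1399795ba30490968aba4c01fe00fe7d034964be",
    "cross_cats_sorted": ["cs.AI", "cs.CL"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-06-13T17:59:52Z",
    "title_canon_sha256": "7305f5e7a2333182d246e61898bc65d3ee91cf09ceb7f4a464b2086a045c3a99"
  },
  "schema_version": "1.0",
  "source": {"id": "2406.09411", "kind": "arxiv", "version": 2}
}
"""


def canonical_sha256(record: dict) -> str:
    """Hypothetical helper mirroring the one-liner in the curl command."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order does not affect the digest
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


record = json.loads(RECORD_TEXT)
print(canonical_sha256(record))
```

Because keys are sorted before hashing, reordering the fields of the record or changing its whitespace leaves the digest unchanged. Changing any value does not.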