pith. machine review for the scientific record.
Pith Number

pith:E4ST4TYI

pith:2024:E4ST4TYIFZXUNH4NLCXSENDMNM
not attested · not anchored · not stored · refs resolved

LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models

Bo Li, Chunyuan Li, Fanyi Pu, Jingkang Yang, Joshua Adrian Cahyono, Kaichen Zhang, Kairui Hu, Peiyuan Zhang, Shuai Liu, Yuanhan Zhang, Ziwei Liu

Evaluating large multimodal models requires balancing wide task coverage, low computational cost, and zero data contamination in benchmarks.

arxiv:2407.12772 v2 · 2024-07-17 · cs.CL · cs.CV

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open

Claims

C1 strongest claim

Our work highlights the importance of considering the evaluation trilemma and provides practical solutions to navigate the trade-offs in evaluating large multi-modal models.

C2 weakest assumption

That the live data sources and pruning rules in LMMS-EVAL LITE and LIVEBENCH truly deliver zero contamination and maintained coverage without introducing new selection biases or missing important capabilities.

C3 one line summary

LMMS-EVAL delivers a standardized multimodal evaluation framework with lite and live variants that target the trade-offs among coverage, cost, and zero contamination.

References

24 extracted · 24 resolved · 3 Pith anchors

[1] InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning · arXiv:2305.06500
[2] InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
[3] MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models · arXiv:2306.13394
[4] Making LLaMA See and Draw with SEED Tokenizer · 2023
[5] A Diagram Is Worth a Dozen Images · 2022 · arXiv:1603.07396

Formal links

2 machine-checked theorem links

Cited by

17 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:15.008955Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

27253e4f082e6f469f8d58af22346c6b1fce493d856b203c45919958e98d8fa0

Aliases

arxiv: 2407.12772 · arxiv_version: 2407.12772v2 · doi: 10.48550/arxiv.2407.12772
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/E4ST4TYIFZXUNH4NLCXSENDMNM \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 27253e4f082e6f469f8d58af22346c6b1fce493d856b203c45919958e98d8fa0
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "4e942c277c09028045710149d7f7bf8b6da6f6a2028782aa5a1e0e123c0fb3bd",
    "cross_cats_sorted": [
      "cs.CV"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2024-07-17T17:51:53Z",
    "title_canon_sha256": "9304da0fd6df0a43a304bc7313fd5757487e366e8650beec5abefe326d06a3a2"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2407.12772",
    "kind": "arxiv",
    "version": 2
  }
}
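
The verification one-liner above can be unpacked into a short Python sketch. It applies the same canonicalization the command uses (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) to the record shown here and prints the SHA-256 digest for comparison against the canonical hash. The function name `canonical_sha256` and the `RECORD_JSON` constant are hypothetical names introduced for illustration, and the sketch assumes the JSON printed on this page is byte-for-byte what the API returns under `canonical_record`.

```python
import hashlib
import json

# Canonical record copied from this page (assumed identical to the API's
# `canonical_record` field; that equivalence is not checked here).
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "4e942c277c09028045710149d7f7bf8b6da6f6a2028782aa5a1e0e123c0fb3bd",
    "cross_cats_sorted": ["cs.CV"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2024-07-17T17:51:53Z",
    "title_canon_sha256": "9304da0fd6df0a43a304bc7313fd5757487e366e8650beec5abefe326d06a3a2"
  },
  "schema_version": "1.0",
  "source": {"id": "2407.12772", "kind": "arxiv", "version": 2}
}
"""


def canonical_sha256(record: dict) -> str:
    """Hash a record using the serialization from the curl one-liner."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)

# Compare this value by eye with the "Canonical hash" listed above.
print(digest)
```

Because the keys are sorted before hashing, the digest does not depend on the order in which the fields appear in the source JSON, only on their names and values.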