Pith Number

pith:E4ST4TYI

pith:2024:E4ST4TYIFZXUNH4NLCXSENDMNM

not attested · not anchored · not stored · refs resolved

LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models

Bo Li, Chunyuan Li, Fanyi Pu, Jingkang Yang, Joshua Adrian Cahyono, Kaichen Zhang, Kairui Hu, Peiyuan Zhang, Shuai Liu, Yuanhan Zhang, Ziwei Liu

Evaluating large multimodal models requires balancing wide task coverage, low computational cost, and zero data contamination in benchmarks.

arxiv:2407.12772 v2 · 2024-07-17 · cs.CL · cs.CV


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{E4ST4TYIFZXUNH4NLCXSENDMNM}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp

2. Internet Archive

3. Author claim: open

4. Citations: open

5. Replications: open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
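The page does not specify the merge algorithm, so the following is only a minimal sketch of what "deterministic" could mean here: the resulting state depends on the set of events, not on the order in which a mirror received them. The event fields `id`, `ts`, `field`, and `value`, the function name `merge_events`, and the sample events are all hypothetical, and signature checking is left out.

```python
from typing import Iterable


def merge_events(record: dict, events: Iterable[dict]) -> dict:
    """Fold events into a state independently of their arrival order."""
    # Deduplicate by event id (assumes an id always names the same event),
    # so receiving an event twice does not change the result.
    unique = {event["id"]: event for event in events}
    # Impose a total order: timestamp first, event id as the tie-breaker.
    ordered = sorted(unique.values(), key=lambda e: (e["ts"], e["id"]))
    state = {"record": record, "fields": {}}
    for event in ordered:
        # Later events overwrite earlier ones for the same field.
        state["fields"][event["field"]] = event["value"]
    return state


# Hypothetical data, for illustration only.
record = {"source": {"kind": "arxiv", "id": "2407.12772", "version": 2}}
events = [
    {"id": "e1", "ts": "2026-05-18T00:00:00Z", "field": "status", "value": "first"},
    {"id": "e2", "ts": "2026-05-19T00:00:00Z", "field": "status", "value": "second"},
]
forward = merge_events(record, events)
shuffled = merge_events(record, list(reversed(events)) + events[:1])
print(forward == shuffled)
```

Two mirrors that hold the same events in a different order, or with a duplicate, arrive at the same state under this scheme, which is the property the paragraph above describes.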

Claims

C1 · strongest claim

Our work highlights the importance of considering the evaluation trilemma and provides practical solutions to navigate the trade-offs in evaluating large multi-modal models.

C2 · weakest assumption

That the live data sources and pruning rules in LMMS-EVAL LITE and LIVEBENCH truly deliver zero contamination and maintain coverage without introducing new selection biases or missing important capabilities.

C3 · one line summary

LMMS-EVAL delivers a standardized multimodal evaluation framework with lite and live variants that target the trade-offs among coverage, cost, and zero contamination.

References

24 extracted · 24 resolved · 3 Pith anchors

[1] InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning · arXiv:2305.06500

[2] InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

[3] MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models · arXiv:2306.13394

[4] Making LLaMA SEE and Draw with SEED Tokenizer · 2023

[5] A Diagram Is Worth a Dozen Images · arXiv:1603.07396

Formal links

2 machine-checked theorem links

Cited by

34 papers in Pith

Spectral Query-Key Product Weight Steering for Training-Free VLM Hallucination Mitigation

DeepInsight: A Unified Evaluation Infrastructure Across the Physical AI Stack

Combating Textual Noise and Redundancy: Entropy-Aware Dense Visual Token Pruning

Natural-Language Temporal Grounding in Hour-Long Videos is a Search Problem: A Benchmark and Empirical Decomposition

AlloSpatial: Agentic Harness Framework for Spatial Reasoning in Foundation Models

Receipt and verification

First computed	2026-05-17T23:38:15.008955Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

27253e4f082e6f469f8d58af22346c6b1fce493d856b203c45919958e98d8fa0

Aliases

arxiv: 2407.12772 · arxiv_version: 2407.12772v2 · doi: 10.48550/arxiv.2407.12772 · pith_short_12: E4ST4TYIFZXU · pith_short_16: E4ST4TYIFZXUNH4N · pith_short_8: E4ST4TYI
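For this record, the 26-character identifier coincides with the unpadded base32 encoding of the first 16 bytes of the canonical hash, and each `pith_short_*` alias is a prefix of that identifier. The page does not document this derivation, so the sketch below is an observation about the values shown here, not the builder's specification; the function names are illustrative.

```python
import base64

# Canonical hash as printed on this page.
CANONICAL_HASH = "27253e4f082e6f469f8d58af22346c6b1fce493d856b203c45919958e98d8fa0"


def pith_id_from_hash(hex_digest: str) -> str:
    """Base32-encode the first 16 bytes of the digest and drop '=' padding."""
    raw = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


def short_aliases(pith_id: str) -> dict:
    """Build the short aliases as prefixes of the full identifier."""
    return {f"pith_short_{n}": pith_id[:n] for n in (8, 12, 16)}


pith_id = pith_id_from_hash(CANONICAL_HASH)
print(pith_id)
print(short_aliases(pith_id))
```

If the assumption holds for other records too, the short forms can be recomputed from the canonical hash alone, without looking them up.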

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/E4ST4TYIFZXUNH4NLCXSENDMNM \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 27253e4f082e6f469f8d58af22346c6b1fce493d856b203c45919958e98d8fa0
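The hashing step of the command above can also be sketched as a standalone Python function for records that are already on disk or in memory. It reuses the serialization settings of the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8); the name `canonical_sha256` is illustrative and not part of any Pith tooling, and fetching the record is left out.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record with the serialization used in the shell one-liner."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# The same content with a different key order hashes identically.
a = {"schema_version": "1.0", "source": {"kind": "arxiv", "id": "2407.12772", "version": 2}}
b = {"source": {"version": 2, "id": "2407.12772", "kind": "arxiv"}, "schema_version": "1.0"}
print(canonical_sha256(a) == canonical_sha256(b))
```

Sorting keys and fixing the separators is what makes the digest reproducible across machines; any difference in whitespace or key order would otherwise change the hash.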

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "4e942c277c09028045710149d7f7bf8b6da6f6a2028782aa5a1e0e123c0fb3bd",
    "cross_cats_sorted": [
      "cs.CV"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2024-07-17T17:51:53Z",
    "title_canon_sha256": "9304da0fd6df0a43a304bc7313fd5757487e366e8650beec5abefe326d06a3a2"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2407.12772",
    "kind": "arxiv",
    "version": 2
  }
}
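The arXiv-related aliases listed earlier on this page line up with the `source` block of this record. As a minimal sketch, assuming aliases are built by simple string concatenation (the page does not say how the builder produces them), they can be reproduced like this; `arxiv_aliases` is a hypothetical helper name.

```python
import json

# The "source" block and schema version copied from the record above.
record = json.loads(
    '{"schema_version": "1.0",'
    ' "source": {"id": "2407.12772", "kind": "arxiv", "version": 2}}'
)


def arxiv_aliases(source: dict) -> dict:
    """Rebuild the arXiv aliases from a record's source block."""
    if source["kind"] != "arxiv":
        raise ValueError("this sketch only handles arXiv sources")
    arxiv_id = source["id"]
    return {
        "arxiv": arxiv_id,
        "arxiv_version": f"{arxiv_id}v{source['version']}",
        # Lowercase "arxiv" matches the DOI spelling used on this page.
        "doi": f"10.48550/arxiv.{arxiv_id}",
    }


print(arxiv_aliases(record["source"]))
```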