Pith Number

pith:YEK2EB3U

pith:2025:YEK2EB3UZILCNPFUT75SKOASPQ

not attested · not anchored · not stored · refs resolved

Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos

Bo Li, Fanyi Pu, Kairui Hu, Penghao Wu, Wang Xiao, Xiang Yue, Yuanhan Zhang, Ziwei Liu

Video-MMMU benchmark shows large multimodal models decline sharply in performance as video tasks require more cognitive adaptation.

arxiv:2501.13826 v1 · 2025-01-23 · cs.CV · cs.CL

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{YEK2EB3UZILCNPFUT75SKOASPQ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
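
The page does not describe the merge algorithm itself. As a minimal sketch of what "deterministic" means here, the Python below folds a set of events into a state in a canonical order, so that any mirror holding the same events computes the same result. The event fields (`id`, `ts`, `field`, `value`) and the function name `merge_events` are hypothetical, chosen only for illustration; the real bundle format may differ, and signature checks are omitted.

```python
def merge_events(events):
    """Fold events into a state dict in a canonical order (illustrative only)."""
    # Assumption: events sharing an id are identical copies, so deduplicating
    # by id is safe. Sorting by (timestamp, id) then fixes one application
    # order that does not depend on the order in which events arrived.
    unique = {event["id"]: event for event in events}
    state = {}
    for event in sorted(unique.values(), key=lambda e: (e["ts"], e["id"])):
        state[event["field"]] = event["value"]  # the later event wins
    return state


# Hypothetical events, deliberately listed out of chronological order.
events = [
    {"id": "e2", "ts": "2026-05-18T04:00:00Z", "field": "archived", "value": True},
    {"id": "e1", "ts": "2026-05-18T03:19:23Z", "field": "archived", "value": False},
]

state_a = merge_events(events)
state_b = merge_events(list(reversed(events)))
print(state_a == state_b)  # True
```

Sorting on a total order before folding is one common way to make a merge independent of arrival order; it is shown here as a general technique, not as the method the bundle uses.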

Claims

C1 · strongest claim

Evaluation of LMMs reveals a steep decline in performance as cognitive demands increase and highlights a significant gap between human and model knowledge acquisition, underscoring the need for methods to enhance LMMs' capability to learn and adapt from videos.

C2 · weakest assumption

That the 300 videos and 900 human-annotated questions capture the three cognitive stages of knowledge acquisition accurately and without bias, and that no selection or annotation artifacts affect the measured gaps.

C3 · one-line summary

Video-MMMU benchmark shows large multimodal models exhibit steep performance drops on higher cognitive tasks when learning from professional videos and lag significantly behind humans in knowledge acquisition.

References

62 extracted · 62 resolved · 14 Pith anchors

[1] Anthropic. Claude Team. Introducing Claude 3.5 Sonnet. https://www.anthropic.com/claude/sonnet

[2] A systematic classification of knowledge, reasoning, and context within the ARC dataset, 2018

[3] Temporalbench: Towards fine-grained temporal understanding for multimodal video models, 2024

[4] Auroracap: Efficient, performant video detailed captioning and a new benchmark, 2024

[5] Autoeval-video: An automatic benchmark for assessing large vision language models in open-ended video question answering, 2023

Formal links

1 machine-checked theorem link

Cited by

66 papers in Pith

CineCap: Structured Reasoning with Spatio-Temporal Anchors for Cinematographic Video Captioning

ReasonCLIP-58M: Visually Grounded Commonsense Reasoning Supervision for CLIP

CARE: Competence-Aware Reward Shaping for Adaptive Reasoning Length in Video-MLLMs

Reasoning as Intersection: Consensus-Frame Alignment for Visual Focus in Video-MLLMs

Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients

Receipt and verification

First computed	2026-05-18T03:19:23.485360Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

c115a20774ca1626bcb49ffb2538127c01322595c20ebd0fdb3baf0c12ded52d

Aliases

arxiv: 2501.13826 · arxiv_version: 2501.13826v1 · doi: 10.48550/arxiv.2501.13826 · pith_short_12: YEK2EB3UZILC · pith_short_16: YEK2EB3UZILCNPFU · pith_short_8: YEK2EB3U
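
For this record, the identifier and its short aliases can be reproduced from the canonical hash: the 26-character Pith Number equals the unpadded Base32 encoding of the first 16 bytes of the SHA-256 digest, and each `pith_short_N` alias is its first N characters. The page does not state this as the defining rule, so treat the sketch below as an observation about this one record. `pith_id_from_hash` is a hypothetical helper name.

```python
import base64

# Canonical hash exactly as listed on this page.
CANONICAL_HASH = "c115a20774ca1626bcb49ffb2538127c01322595c20ebd0fdb3baf0c12ded52d"


def pith_id_from_hash(hex_digest: str) -> str:
    """Base32-encode the first 16 bytes of the digest and drop '=' padding.

    Assumption: this matches how the identifier was derived; it is inferred
    from this record only, not from a published specification.
    """
    first_16_bytes = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(first_16_bytes).decode("ascii").rstrip("=")


full_id = pith_id_from_hash(CANONICAL_HASH)
# Short aliases are plain prefixes of the full identifier.
aliases = {f"pith_short_{n}": full_id[:n] for n in (8, 12, 16)}

print(full_id)  # YEK2EB3UZILCNPFUT75SKOASPQ
for name, value in aliases.items():
    print(name, value)
```

The printed values agree with the aliases listed above (`YEK2EB3U`, `YEK2EB3UZILC`, `YEK2EB3UZILCNPFU`).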

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/YEK2EB3UZILCNPFUT75SKOASPQ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c115a20774ca1626bcb49ffb2538127c01322595c20ebd0fdb3baf0c12ded52d

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "e26b739eaab3a336fdbef524f8c1175a46f3aac3cfe83c64a58b41e5c4e16c7a",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2025-01-23T16:51:47Z",
    "title_canon_sha256": "d04a3d1b3579b429fd81cd2b06c42dc5c53e786c742b441ef725394c06e527a3"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2501.13826",
    "kind": "arxiv",
    "version": 1
  }
}
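
The canonicalization step of the verification command can also be run offline against the record printed above. The sketch below applies the same `json.dumps` settings as the shell one-liner (sorted keys, compact separators, `ensure_ascii=False`) and hashes the result with SHA-256. `canonical_bytes` is a hypothetical helper name, and the record is pasted in by hand, so the digest is printed for comparison with the canonical hash rather than asserted to match it.

```python
import hashlib
import json

# The canonical record as shown on this page, pasted verbatim.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "e26b739eaab3a336fdbef524f8c1175a46f3aac3cfe83c64a58b41e5c4e16c7a",
    "cross_cats_sorted": ["cs.CL"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2025-01-23T16:51:47Z",
    "title_canon_sha256": "d04a3d1b3579b429fd81cd2b06c42dc5c53e786c742b441ef725394c06e527a3"
  },
  "schema_version": "1.0",
  "source": {"id": "2501.13826", "kind": "arxiv", "version": 1}
}
"""


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace, as UTF-8."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


record = json.loads(RECORD_JSON)
canonical = canonical_bytes(record)
digest = hashlib.sha256(canonical).hexdigest()

print(canonical.decode("utf-8"))
print(digest)  # compare by eye with the canonical hash listed on this page
```

Because the serialization sorts keys and strips whitespace, the indentation and key order of the pasted JSON do not affect the digest.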