Pith Number

pith:42WE65LJ

pith:2025:42WE65LJACTHJRNRQPEDHHNNVL

not attested · not anchored · not stored · refs resolved

MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe

Bingxiang He, Bokai Xu, Chi Chen, Chongyi Wang, Fuwei Huang, Ganqu Cui, Guoyang Zeng, Hanyu Liu, Hongyuan Liu, Jie Cai, Jie Zhou, Jingkun Tang, Ji Qi, Junbo Cui, Liqing Ruan, Luoyuan Zhang, Maosong Sun, Ning Ding, Qining Guo, Tianchi Cai, Tianyu Yu, Weize Chen, Wenhao Hu, Wenshuo Ma, Xu Han, Yingjing Xu, Yuanqian Zhao, Yuan Yao, Yuxiang Huang, Yuxuan Li, Zefan Wang, Zhihui He, Zhiyuan Liu, Zonghao Guo

An 8B multimodal model outperforms GPT-4o-latest and Qwen2.5-VL 72B on OpenCompass while using far less GPU memory and inference time than Qwen2.5-VL 7B on VideoMME.

arxiv:2509.18154 v1 · 2025-09-16 · cs.LG · cs.CV


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{42WE65LJACTHJRNRQPEDHHNNVL}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp

2. Internet Archive

3. Author claim: open

4. Citations: open

5. Replications: open

✓ Portable graph bundle: live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
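The page does not specify the merge algorithm itself. As a rough illustration of what "deterministic" requires, the sketch below folds a set of events into a state after sorting them by a fixed total order, so any mirror holding the same events computes the same result regardless of arrival order. The event fields (`id`, `timestamp`, `field`, `value`), the sample values, and the last-writer-wins rule are all assumptions made for this example, not the Pith schema.

```python
from typing import Iterable


def merge_state(base: dict, events: Iterable[dict]) -> dict:
    """Fold events into a copy of `base` in a fixed total order.

    Hypothetical event shape: {"id", "timestamp", "field", "value"}.
    Assumes ids are unique and timestamps are ISO 8601 UTC strings in one
    format, so sorting the strings also sorts them chronologically.
    """
    state = dict(base)
    # Sort by (timestamp, id): the id breaks timestamp ties, so the order
    # never depends on how the events happened to arrive.
    for event in sorted(events, key=lambda e: (e["timestamp"], e["id"])):
        state[event["field"]] = event["value"]  # last writer wins
    return state


if __name__ == "__main__":
    # Made-up events, purely for illustration.
    events = [
        {"id": "evt-a", "timestamp": "2026-05-17T23:38:50Z", "field": "status", "value": "draft"},
        {"id": "evt-b", "timestamp": "2026-05-18T08:00:00Z", "field": "status", "value": "final"},
    ]
    forward = merge_state({}, events)
    backward = merge_state({}, list(reversed(events)))
    print(forward == backward)
```

The point of the design is that the result is a function of the set of events only: sorting removes any dependence on delivery order, and replaying an identical event twice leaves the state unchanged.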

Claims

C1 · strongest claim

MiniCPM-V 4.5 surpasses GPT-4o-latest and Qwen2.5-VL 72B on OpenCompass, and on VideoMME it uses 46.7% of the GPU memory and 8.7% of the inference time of Qwen2.5-VL 7B.

C2 · weakest assumption

The reported benchmark results and efficiency measurements generalize beyond the specific evaluation suites and hardware configurations used in the experiments.

C3 · one-line summary

An 8B MLLM reaches state-of-the-art efficiency and performance among models under 30B parameters by combining a unified 3D resampler, joint document-text training, and hybrid RL for reasoning modes.

References

77 extracted · 77 resolved · 9 Pith anchors

[1] GLM-4.5V and GLM-4.1V-Thinking: Towards versatile multimodal reasoning with scalable reinforcement learning, 2025

[2] Kimi Team, Angang Du, Bohong Yin, Bowei Xing, Bowen Qu, Bowen Wang, Cheng Chen, Chenlin Zhang, Chenzhuang Du, Chu Wei, Congcong Wang, Dehao Zhang, Dikang Du, Dongliang Wang, Enming Yuan, Enzhe Lu, Fan 2025

[3] MiMo-VL technical report, 2025

[4] OpenAI platform ChatGPT-4o, 2025

[5] Qwen-VL: A versatile vision-language model for understanding, localization, text reading, and beyond, 2023

Cited by

32 papers in Pith

Enhancing Multimodal Large Language Models for Safety-Critical Driving Video Analysis

VISTA-Bench: Do Vision-Language Models Really Understand Visualized Text as Well as Pure Text?

RoadTones: Tone Controllable Text Generation from Road Event Videos

CAVE: A Structured Credit Assignment Approach for Fragmented Visual Evidence Reasoning

HEED: Density-Weighted Residual Alignment for Hybrid Vision-Language Model Distillation

Receipt and verification

First computed	2026-05-17T23:38:50.810052Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

e6ac4f756900a674c5b183c8339dadaaf431922b66d39b98d7b5444ed9656249

Aliases

arxiv: 2509.18154 · arxiv_version: 2509.18154v1 · doi: 10.48550/arxiv.2509.18154 · pith_short_12: 42WE65LJACTH · pith_short_16: 42WE65LJACTHJRNR · pith_short_8: 42WE65LJ
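For this record, the identifier and its short aliases can be reproduced from the canonical hash listed above: encoding the 32 hash bytes in RFC 4648 Base32 and keeping the first 26 characters gives 42WE65LJACTHJRNRQPEDHHNNVL, and the short forms are its 8-, 12-, and 16-character prefixes. The sketch below checks that relationship. That this is the general derivation rule is an assumption inferred from this single record, not something the page states, and the function names are hypothetical.

```python
import base64

# Values copied from this page.
CANONICAL_SHA256 = "e6ac4f756900a674c5b183c8339dadaaf431922b66d39b98d7b5444ed9656249"
PITH_NUMBER = "42WE65LJACTHJRNRQPEDHHNNVL"


def base32_prefix(hex_digest: str, length: int = 26) -> str:
    """Base32-encode (RFC 4648) a hex digest and keep the first `length` chars.

    Assumption: inferred from this one record; the page does not document
    how the identifier is derived.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str) -> dict:
    """Return the 8-, 12-, and 16-character prefixes listed under Aliases."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


if __name__ == "__main__":
    print(base32_prefix(CANONICAL_SHA256) == PITH_NUMBER)
    print(short_aliases(PITH_NUMBER))
```

Because the short aliases are plain prefixes, shorter forms are more likely to collide across records; the full identifier or the canonical hash is the safer key when matching records programmatically.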

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/42WE65LJACTHJRNRQPEDHHNNVL \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: e6ac4f756900a674c5b183c8339dadaaf431922b66d39b98d7b5444ed9656249

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "e5cfdbed2eade0b1e05dc98ba7f76b24012e110da40b86f4e30df28e86dba62e",
    "cross_cats_sorted": [
      "cs.CV"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-09-16T19:41:48Z",
    "title_canon_sha256": "551c7e34431a4ea04a6cfcb882aee58a714c3bdacb0994a04809586ae68d93be"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2509.18154",
    "kind": "arxiv",
    "version": 1
  }
}
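The shell pipeline under "Verify this Pith Number yourself" can also be reproduced offline against the record shown above. The sketch below spells out the same canonicalization as the one-liner (sorted keys, compact separators, non-ASCII kept as-is, UTF-8 bytes) and hashes the result with SHA-256. The record literal is copied from this page and the function names are illustrative. Whether the digest equals the published canonical hash depends on the record here being exactly what the resolver returns, so the script reports the comparison rather than assuming it.

```python
import hashlib
import json

# Canonical record as displayed on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "e5cfdbed2eade0b1e05dc98ba7f76b24012e110da40b86f4e30df28e86dba62e",
        "cross_cats_sorted": ["cs.CV"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2025-09-16T19:41:48Z",
        "title_canon_sha256": "551c7e34431a4ea04a6cfcb882aee58a714c3bdacb0994a04809586ae68d93be",
    },
    "schema_version": "1.0",
    "source": {"id": "2509.18154", "kind": "arxiv", "version": 1},
}
PUBLISHED_HASH = "e6ac4f756900a674c5b183c8339dadaaf431922b66d39b98d7b5444ed9656249"


def canonical_bytes(record) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace, as UTF-8."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_sha256(record) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    digest = canonical_sha256(RECORD)
    print(digest)
    print("matches published hash:", digest == PUBLISHED_HASH)
```

Sorting keys and fixing the separators is what makes the digest independent of how the JSON was pretty-printed or in which order a server emitted its fields.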