Pith Number

pith:42WE65LJ

pith:2025:42WE65LJACTHJRNRQPEDHHNNVL
Status: not attested · not anchored · not stored · refs resolved

MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe

Bingxiang He, Bokai Xu, Chi Chen, Chongyi Wang, Fuwei Huang, Ganqu Cui, Guoyang Zeng, Hanyu Liu, Hongyuan Liu, Jie Cai, Jie Zhou, Jingkun Tang, Ji Qi, Junbo Cui, Liqing Ruan, Luoyuan Zhang, Maosong Sun, Ning Ding, Qining Guo, Tianchi Cai, Tianyu Yu, Weize Chen, Wenhao Hu, Wenshuo Ma, Xu Han, Yingjing Xu, Yuanqian Zhao, Yuan Yao, Yuxiang Huang, Yuxuan Li, Zefan Wang, Zhihui He, Zhiyuan Liu, Zonghao Guo

An 8B multimodal model outperforms GPT-4o-latest and Qwen2.5-VL 72B on OpenCompass while using far less GPU memory and inference time than Qwen2.5-VL 7B on VideoMME.

arxiv:2509.18154 v1 · 2025-09-16 · cs.LG · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{42WE65LJACTHJRNRQPEDHHNNVL}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
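This page does not describe Pith's merge algorithm, so what follows is only a minimal sketch of one way a merge can be made deterministic, under loudly hypothetical assumptions: each event has an `event_id` that is a content hash (equal ids mean equal events) and a fixed-width ISO-8601 UTC `timestamp`; the `Event` fields and the `merge` helper are invented for illustration, and signature checking is omitted. Deduplicating, sorting into one total order, and folding means two mirrors that received the same events in different orders compute the same state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; the real bundle format is not shown on this page.
    event_id: str    # assumed content hash: equal ids imply identical events
    timestamp: str   # assumed fixed-width ISO-8601 UTC, so string order == time order
    field: str
    value: str


def merge(canonical_record, *event_streams):
    # 1. Union the streams from every source, dropping duplicate event ids.
    unique = {}
    for stream in event_streams:
        for ev in stream:
            unique.setdefault(ev.event_id, ev)
    # 2. Impose a single total order; event_id breaks timestamp ties.
    ordered = sorted(unique.values(), key=lambda ev: (ev.timestamp, ev.event_id))
    # 3. Fold onto a copy of the canonical record (last writer wins per field).
    state = dict(canonical_record)
    for ev in ordered:
        state[ev.field] = ev.value
    return state


claim_opened = Event("e1", "2026-05-17T23:40:00Z", "author_claim", "open")
claim_made = Event("e2", "2026-05-18T10:00:00Z", "author_claim", "claimed")
base = {"id": "2509.18154"}

# Two mirrors that saw the same events in different orders and batches.
mirror_a = merge(base, [claim_opened, claim_made])
mirror_b = merge(base, [claim_made], [claim_opened, claim_made])
print(mirror_a == mirror_b, mirror_a["author_claim"])  # True claimed
```

Last-writer-wins is just the simplest fold to show; any rule works as long as it depends only on the ordered event set, not on arrival order.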

Claims

C1 · strongest claim

MiniCPM-V 4.5 surpasses GPT-4o-latest and Qwen2.5-VL 72B on OpenCompass while requiring only 46.7% of the GPU memory and 8.7% of the inference time of Qwen2.5-VL 7B on VideoMME.
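The two efficiency figures in C1 are fractions of the Qwen2.5-VL 7B baseline. A quick conversion into relative reductions and baseline-to-model ratios, using only the two reported numbers (the `savings` helper name is just for illustration):

```python
def savings(fraction_of_baseline):
    # "Uses X of the baseline" -> (relative reduction, baseline / model ratio).
    return 1.0 - fraction_of_baseline, 1.0 / fraction_of_baseline


for label, frac in [("GPU memory", 0.467), ("inference time", 0.087)]:
    reduction, ratio = savings(frac)
    print(f"{label}: {reduction:.1%} lower, baseline needs {ratio:.1f}x as much")
```

This prints a 53.3% memory reduction (about 2.1x) and a 91.3% inference-time reduction (about 11.5x) relative to Qwen2.5-VL 7B on VideoMME.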

C2 · weakest assumption

The reported benchmarks and efficiency measurements generalize beyond the specific evaluation suites and hardware configurations used in the experiments.

C3 · one line summary

An 8B MLLM reaches state-of-the-art efficiency and performance among models under 30B parameters by combining a unified 3D resampler, joint document-text training, and hybrid RL across reasoning modes.
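A minimal sketch of what a learnable-query resampler does, assuming the 3D resampler behaves like a single Perceiver-style cross-attention layer over spatio-temporal patch features plus a per-frame temporal embedding. The function names, the sizes, and the omission of key/value projections and multiple heads are simplifications for illustration, not MiniCPM-V 4.5's actual implementation.

```python
import math
import random


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def add_temporal_embedding(frames, temporal_emb):
    # frames[t][p] is the d-dim feature of patch p in frame t; adding a
    # per-frame vector lets the queries tell frames apart (the "3D" part).
    return [[[x + e for x, e in zip(patch, temporal_emb[t])] for patch in frame]
            for t, frame in enumerate(frames)]


def resample(queries, frames):
    # Single-head cross-attention: each learnable query attends over all
    # T*P spatio-temporal patches and returns their attention-weighted mean,
    # so the output length is len(queries) however many patches come in.
    patches = [patch for frame in frames for patch in frame]
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in queries:
        weights = softmax([scale * sum(a * b for a, b in zip(q, p)) for p in patches])
        out.append([sum(w * p[j] for w, p in zip(weights, patches)) for j in range(d)])
    return out


def rand_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d, num_queries, T, P = 8, 4, 6, 16            # illustrative sizes only
frames = [rand_matrix(rng, P, d) for _ in range(T)]
temporal_emb = rand_matrix(rng, T, d)
queries = rand_matrix(rng, num_queries, d)    # would be learned in a real model

tokens = resample(queries, add_temporal_embedding(frames, temporal_emb))
print(len(tokens), len(tokens[0]))  # 4 8  (96 patches -> 4 tokens)
```

The design point the sketch shows: the number of visual tokens passed to the language model is fixed by the query count, so packing more frames into one resampler call raises the compression ratio instead of the token budget.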

References

77 extracted · 77 resolved · 9 Pith anchors

[1] GLM-4.5V and GLM-4.1V-Thinking: Towards versatile multimodal reasoning with scalable reinforcement learning, 2025
[2] Kimi Team, Angang Du, Bohong Yin, Bowei Xing, Bowen Qu, Bowen Wang, Cheng Chen, Chenlin Zhang, Chenzhuang Du, Chu Wei, Congcong Wang, Dehao Zhang, Dikang Du, Dongliang Wang, Enming Yuan, Enzhe Lu, Fan …, 2025
[3] MiMo-VL technical report, 2025
[4] OpenAI Platform: ChatGPT-4o, 2025
[5] Qwen-VL: A versatile vision-language model for understanding, localization, text reading, and beyond, 2023

Cited by

32 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:50.810052Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05)
Schema pith-number/v1.0

Canonical hash

e6ac4f756900a674c5b183c8339dadaaf431922b66d39b98d7b5444ed9656249

Aliases

arxiv: 2509.18154 · arxiv_version: 2509.18154v1 · doi: 10.48550/arxiv.2509.18154 · pith_short_12: 42WE65LJACTH · pith_short_16: 42WE65LJACTHJRNR · pith_short_8: 42WE65LJ
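The short aliases are the 8-, 12- and 16-character prefixes of the full Pith Number. Working the first bytes by hand also suggests that the Pith Number is the canonical hash re-encoded: the first 26 characters (130 bits) of the RFC 4648 Base32 encoding of the raw 32-byte SHA-256 digest. That derivation is inferred from the values on this page, not taken from a published Pith specification; `pith_number_from_hash` is a hypothetical helper for checking it:

```python
import base64

CANONICAL_HASH = "e6ac4f756900a674c5b183c8339dadaaf431922b66d39b98d7b5444ed9656249"


def pith_number_from_hash(hex_digest, length=26):
    # Inferred rule, not a documented spec: Base32 (RFC 4648, uppercase) of the
    # raw digest bytes, truncated. A 32-byte digest yields 52 data characters
    # before '=' padding, so the first 26 characters never include padding.
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")[:length]


full = pith_number_from_hash(CANONICAL_HASH)
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}
print(full)  # 42WE65LJACTHJRNRQPEDHHNNVL
print(aliases)
```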
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/42WE65LJACTHJRNRQPEDHHNNVL \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: e6ac4f756900a674c5b183c8339dadaaf431922b66d39b98d7b5444ed9656249
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "e5cfdbed2eade0b1e05dc98ba7f76b24012e110da40b86f4e30df28e86dba62e",
    "cross_cats_sorted": [
      "cs.CV"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-09-16T19:41:48Z",
    "title_canon_sha256": "551c7e34431a4ea04a6cfcb882aee58a714c3bdacb0994a04809586ae68d93be"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2509.18154",
    "kind": "arxiv",
    "version": 1
  }
}
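The same canonicalization can be run offline, without curl or jq, against the record printed above. This sketch re-implements the one-liner's rules (sorted keys, compact separators, `ensure_ascii=False`, UTF-8, SHA-256); it assumes the displayed JSON is exactly the served `canonical_record`, so a mismatch would point to a difference between the two rather than prove the published hash wrong.

```python
import hashlib
import json

# Canonical record as displayed on this page (assumed identical to the served one).
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "e5cfdbed2eade0b1e05dc98ba7f76b24012e110da40b86f4e30df28e86dba62e",
        "cross_cats_sorted": ["cs.CV"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2025-09-16T19:41:48Z",
        "title_canon_sha256": "551c7e34431a4ea04a6cfcb882aee58a714c3bdacb0994a04809586ae68d93be",
    },
    "schema_version": "1.0",
    "source": {"id": "2509.18154", "kind": "arxiv", "version": 1},
}
EXPECTED = "e6ac4f756900a674c5b183c8339dadaaf431922b66d39b98d7b5444ed9656249"


def canonical_sha256(record):
    # Mirrors the one-liner: sorted keys, no insignificant whitespace,
    # non-ASCII kept as UTF-8, then SHA-256 over the resulting bytes.
    blob = json.dumps(record, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


digest = canonical_sha256(RECORD)
print(digest)
print("match" if digest == EXPECTED else "mismatch")
```

Because keys are sorted before hashing, the digest does not depend on the key order in which the record was received or typed.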