Pith Number

pith:T3R2WYDI

pith:2024:T3R2WYDILTDHFB22LT4M6D44D3

not attested · not anchored · not stored · refs resolved

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Bin Wang, Conghui He, Dahua Lin, Hang Yan, Haodong Duan, Jiaqi Wang, Jifeng Dai, Jingwen Li, Kai Chen, Lin Chen, Linke Ouyang, Pan Zhang, Peng Sun, Qipeng Guo, Rui Qian, Songyang Zhang, Wei Li, Wenhai Wang, Wenwei Zhang, Xiaoyi Dong, Xingcheng Zhang, Xinyue Zhang, Yang Gao, Yining Li, Yuhang Cao, Yuhang Zang, Yu Qiao

InternLM-XComposer-2.5 reaches GPT-4V-level performance on vision-language tasks with a 7B LLM backend and supports contexts of up to 96K.

arxiv:2407.03320 v1 · 2024-07-03 · cs.CV · cs.CL

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{T3R2WYDILTDHFB22LT4M6D44D3}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open

✓ Portable graph bundle: live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
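
The page does not specify the merge algorithm, so the following is only a minimal sketch of what a deterministic, order-independent merge over events can look like. The event shape (`id`, `ts`, `field`, `value`), the last-writer-wins rule, and the sample events are all hypothetical and are not taken from the Pith bundle format. Signature checking is omitted.

```python
from typing import Any

# Hypothetical event shape: {"id": str, "ts": str, "field": str, "value": Any}.
# These names are illustrative assumptions, not the real bundle schema.


def merge_events(events: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold events into a state that does not depend on arrival order."""
    # Deduplicate by event id so replaying a mirrored copy changes nothing.
    # Assumes an id always refers to the same event content.
    unique = {event["id"]: event for event in events}
    # Impose a total order: timestamp first, event id as the tie-breaker.
    ordered = sorted(unique.values(), key=lambda e: (e["ts"], e["id"]))
    state: dict[str, Any] = {}
    for event in ordered:
        state[event["field"]] = event["value"]  # last writer wins
    return state


# Made-up events, used only to exercise the function.
events = [
    {"id": "e1", "ts": "2000-01-01T00:00:00Z", "field": "x", "value": 1},
    {"id": "e2", "ts": "2000-01-02T00:00:00Z", "field": "x", "value": 2},
    {"id": "e3", "ts": "2000-01-02T00:00:00Z", "field": "y", "value": 3},
]

print(merge_events(events) == merge_events(list(reversed(events))))
```

Because the events are deduplicated and sorted before they are applied, two mirrors holding the same set of events in different orders arrive at the same state.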

Claims

C1 · strongest claim

IXC-2.5 excels in various text-image comprehension and composition applications, achieving GPT-4V level capabilities with merely 7B LLM backend... outperforming existing open-source state-of-the-art models on 16 benchmarks. It also surpasses or competes closely with GPT-4V and Gemini Pro on 16 key tasks.

C2 · weakest assumption

The 28 chosen benchmarks and the 16 key tasks are assumed to be representative of real-world use, and RoPE extrapolation from 24K-context training to 96K-context inference is assumed not to introduce hidden degradation on long outputs.

C3 · one line summary

InternLM-XComposer-2.5 is a 7B vision-language model supporting up to 96K context that reaches GPT-4V-level performance on image, video, and multi-turn tasks and adds LoRA-driven text-image composition capabilities.

References

183 extracted · 183 resolved · 36 Pith anchors

Formal links

2 machine-checked theorem links

Cited by

20 papers in Pith

Training-Free Multimodal Large Language Model Orchestration

Enhancing Visual Token Representations for Video Large Language Models via Training-Free Spatial-Temporal Pooling and Gridding

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs

Receipt and verification

First computed	2026-05-17T23:38:14.327329Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

9ee3ab60685cc672875a5cf8cf0f9c1ec15b3f02177cf550807d3b7ab251300e

Aliases

arxiv: 2407.03320 · arxiv_version: 2407.03320v1 · doi: 10.48550/arxiv.2407.03320 · pith_short_12: T3R2WYDILTDH · pith_short_16: T3R2WYDILTDHFB22 · pith_short_8: T3R2WYDI
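
The values on this page are consistent with a simple derivation, which is inferred from the listed strings rather than documented here: the 26-character identifier matches the leading characters of the RFC 4648 Base32 encoding of the canonical SHA-256 hash, and each `pith_short_N` alias is the first N characters of that identifier. A short sketch under that assumption (the function name `pith_id_from_hash` is hypothetical):

```python
import base64

# Canonical hash as printed in the "Canonical hash" section of this record.
CANONICAL_HASH = "9ee3ab60685cc672875a5cf8cf0f9c1ec15b3f02177cf550807d3b7ab251300e"


def pith_id_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the leading characters.

    Assumption: this mirrors how the identifier is formed; the page itself
    does not state the rule.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


full_id = pith_id_from_hash(CANONICAL_HASH)
# Short aliases are taken as plain prefixes of the full identifier.
aliases = {f"pith_short_{n}": full_id[:n] for n in (8, 12, 16)}

print(full_id)
print(aliases)
```

For this record, the output can be compared directly with the identifier in the LaTeX snippet and with the alias list above.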

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/T3R2WYDILTDHFB22LT4M6D44D3 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 9ee3ab60685cc672875a5cf8cf0f9c1ec15b3f02177cf550807d3b7ab251300e

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "21cceb9d462163087b0dca8e7bb289e0afc7fcd632313d0b62ce244763f889b9",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-07-03T17:59:21Z",
    "title_canon_sha256": "38e695c3ae3d470f400cb2e8ab0933bd36b3e26713f77856af17cbb4736facd1"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2407.03320",
    "kind": "arxiv",
    "version": 1
  }
}
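
The same check can be run offline against the record shown above. The sketch below applies the serialization used by the shell one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding) to a hand-copied version of the record. The names `RECORD`, `canonical_bytes`, and `canonical_hash` are illustrative, and the digest only agrees with the published canonical hash if the copy matches the served record exactly.

```python
import hashlib
import json

# Hand-copied from the canonical record printed on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "21cceb9d462163087b0dca8e7bb289e0afc7fcd632313d0b62ce244763f889b9",
        "cross_cats_sorted": ["cs.CL"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2024-07-03T17:59:21Z",
        "title_canon_sha256": "38e695c3ae3d470f400cb2e8ab0933bd36b3e26713f77856af17cbb4736facd1",
    },
    "schema_version": "1.0",
    "source": {"id": "2407.03320", "kind": "arxiv", "version": 1},
}


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and no extra whitespace, then encode as UTF-8."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def canonical_hash(record: dict) -> str:
    """Return the hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


# Compare the printed digest with the canonical hash listed on this page.
print(canonical_hash(RECORD))
```

Sorting keys and fixing the separators is what makes the digest independent of how the JSON happened to be formatted or ordered when it was fetched.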