pith:T3R2WYDI
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
InternLM-XComposer-2.5 reaches GPT-4V level on vision-language tasks with a 7B model and 96K context support.
arxiv:2407.03320 v1 · 2024-07-03 · cs.CV · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{T3R2WYDILTDHFB22LT4M6D44D3}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
IXC-2.5 excels in various text-image comprehension and composition applications, achieving GPT-4V level capabilities with merely 7B LLM backend... outperforming existing open-source state-of-the-art models on 16 benchmarks. It also surpasses or competes closely with GPT-4V and Gemini Pro on 16 key tasks.
This claim assumes that the 28 chosen benchmarks and the 16 key tasks are representative of real-world use, and that RoPE extrapolation from 24K-token training contexts to 96K-token inference does not introduce hidden degradation on long outputs.
InternLM-XComposer-2.5 is a 7B vision-language model supporting up to 96K context that reaches GPT-4V-level performance on image, video, and multi-turn tasks and adds LoRA-driven text-image composition capabilities.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:14.327329Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
9ee3ab60685cc672875a5cf8cf0f9c1ec15b3f02177cf550807d3b7ab251300e
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/T3R2WYDILTDHFB22LT4M6D44D3 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 9ee3ab60685cc672875a5cf8cf0f9c1ec15b3f02177cf550807d3b7ab251300e
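The shell pipeline above can also be written as a short standalone Python script. This is a minimal sketch that assumes the canonical hash is simply the SHA-256 of the record serialized with sorted keys, compact separators, and UTF-8 encoding, exactly as the one-liner does. The function name `canonical_sha256` is hypothetical, and the record is copied from this page rather than fetched from the API.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize a record canonically and return its SHA-256 hex digest.

    Mirrors the one-liner above: sorted keys, no whitespace between
    tokens, non-ASCII characters kept as-is, UTF-8 bytes.
    """
    canonical = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Canonical record as listed on this page (not fetched live).
record = {
    "metadata": {
        "abstract_canon_sha256": "21cceb9d462163087b0dca8e7bb289e0afc7fcd632313d0b62ce244763f889b9",
        "cross_cats_sorted": ["cs.CL"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2024-07-03T17:59:21Z",
        "title_canon_sha256": "38e695c3ae3d470f400cb2e8ab0933bd36b3e26713f77856af17cbb4736facd1",
    },
    "schema_version": "1.0",
    "source": {"id": "2407.03320", "kind": "arxiv", "version": 1},
}

# Hash published on this page, to compare against.
EXPECTED = "9ee3ab60685cc672875a5cf8cf0f9c1ec15b3f02177cf550807d3b7ab251300e"

digest = canonical_sha256(record)
print("computed:", digest)
print("matches published hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the order in which fields appear in the input dictionary does not affect the digest; only the field names and values do.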
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "21cceb9d462163087b0dca8e7bb289e0afc7fcd632313d0b62ce244763f889b9",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-07-03T17:59:21Z",
    "title_canon_sha256": "38e695c3ae3d470f400cb2e8ab0933bd36b3e26713f77856af17cbb4736facd1"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2407.03320",
    "kind": "arxiv",
    "version": 1
  }
}