Pith Number

pith:YAOHT3YA

pith:2024:YAOHT3YAN4BJN2T6TGUDENXI6V

not attested · not anchored · not stored · refs resolved

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model

Bin Wang, Conghui He, Dahua Lin, Hang Yan, Haodong Duan, Jiaqi Wang, Jingwen Li, Kai Chen, Linke Ouyang, Maosong Cao, Pan Zhang, Songyang Zhang, Wei Li, Wenwei Zhang, Xiaoyi Dong, Xilin Wei, Xingcheng Zhang, Xinyue Zhang, Yang Gao, Yining Li, Yuhang Cao, Yuhang Zang, Yu Qiao

InternLM-XComposer2 generates custom interleaved text-image content by applying LoRA parameters only to image tokens.

arxiv:2401.16420 v1 · 2024-01-29 · cs.CV · cs.CL


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{YAOHT3YAN4BJN2T6TGUDENXI6V}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
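The merge algorithm itself is not spelled out on this page, so the following is only a sketch of what "deterministic" can mean here: any mirror that holds the same set of events reaches the same state, regardless of the order in which it received them. The event fields (`id`, `ts`, `field`, `value`), the last-writer-wins rule, and the function name `merge_state` are hypothetical, and signature checking is omitted.

```python
from typing import Iterable


def merge_state(canonical: dict, events: Iterable[dict]) -> dict:
    """Fold events into a state that does not depend on arrival order.

    Hypothetical event shape: {"id", "ts", "field", "value"}, where "ts"
    is an ISO 8601 UTC string in a fixed format (so it sorts as text).
    """
    # Deduplicate by event id so overlapping copies from mirrors collapse.
    unique = {event["id"]: event for event in events}
    # Impose a total order: timestamp first, event id as the tie-breaker.
    ordered = sorted(unique.values(), key=lambda e: (e["ts"], e["id"]))
    state = {"record": canonical, "fields": {}}
    for event in ordered:
        # Assumed rule: the latest event for a field wins.
        state["fields"][event["field"]] = event["value"]
    return state
```

Because the events are deduplicated and sorted before folding, two mirrors that received them in different orders compute an identical result.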

Claims

C1 strongest claim

InternLM-XComposer2 ... not only significantly outperforms existing multimodal models but also matches or even surpasses GPT-4V and Gemini Pro in certain assessments.

C2 weakest assumption

That applying additional LoRA parameters exclusively to image tokens preserves the integrity of pre-trained language knowledge while enabling precise vision understanding and high-quality text composition.

C3 one line summary

InternLM-XComposer2 introduces Partial LoRA on InternLM2-7B to enable high-quality free-form text-image composition while matching or exceeding GPT-4V on select vision-language benchmarks.
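The Partial LoRA idea in the claims above can be illustrated with a small pure-Python sketch: every token passes through the frozen weight, and only image tokens receive the extra low-rank update. This is an illustration under simplifying assumptions (no bias, no scaling factor, plain lists instead of tensors), not the paper's implementation; the names `matvec` and `partial_lora_forward` are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def partial_lora_forward(tokens, is_image, W0, A, B):
    """Apply W0 to every token; add the low-rank term B(Ax) for image tokens only.

    Shapes (assumed): W0 is d_out x d_in, A is r x d_in, B is d_out x r.
    """
    outputs = []
    for x, image_token in zip(tokens, is_image):
        y = matvec(W0, x)  # frozen pre-trained path, shared by all tokens
        if image_token:
            delta = matvec(B, matvec(A, x))  # low-rank update, rank r
            y = [base + extra for base, extra in zip(y, delta)]
        outputs.append(y)
    return outputs


# Two identical inputs: the first is treated as text, the second as image.
W0 = [[1, 0], [0, 1]]
A = [[1, 1]]      # r = 1
B = [[1], [0]]
print(partial_lora_forward([[1, 2], [1, 2]], [False, True], W0, A, B))
# [[1, 2], [4, 2]]
```

The text token's output is exactly what the frozen weight alone produces, which is the sense in which the language path is left untouched.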

References

105 extracted · 105 resolved · 16 Pith anchors

[1] Nocaps: Novel object captioning at scale

[2] Flamingo: a visual language model for few-shot learning

[3] arXiv preprint arXiv:1905.13319

[4] Lawrence Zitnick, and Devi Parikh 2015

[5] OpenFlamingo: An open-source framework for training large autoregressive vision-language models 2023

Formal links

2 machine-checked theorem links

Cited by

21 papers in Pith

MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

NoisyGRPO: Incentivizing Multimodal CoT Reasoning via Noise Injection and Bayesian Estimation

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Receipt and verification

First computed	2026-05-17T23:38:14.981310Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

c01c79ef006f0296ea7e99a83236e8f5749d05e79699e4e0c8abe238914c4934

Aliases

arxiv: 2401.16420 · arxiv_version: 2401.16420v1 · doi: 10.48550/arxiv.2401.16420 · pith_short_12: YAOHT3YAN4BJ · pith_short_16: YAOHT3YAN4BJN2T6 · pith_short_8: YAOHT3YA
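For this record, the identifier and its short aliases can be reproduced from the canonical hash: the 26-character Pith Number matches the start of the RFC 4648 Base32 encoding of the SHA-256 digest, and each `pith_short_N` alias is its first N characters. This relationship is an observation on the values shown on this page, not a documented rule of the scheme; the function name `pith_from_hash` is hypothetical.

```python
import base64

CANONICAL_HASH = "c01c79ef006f0296ea7e99a83236e8f5749d05e79699e4e0c8abe238914c4934"
PITH_NUMBER = "YAOHT3YAN4BJN2T6TGUDENXI6V"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


full = pith_from_hash(CANONICAL_HASH)
# Short aliases as prefixes of the full identifier.
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}

print(full == PITH_NUMBER)  # True
print(aliases)
```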

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/YAOHT3YAN4BJN2T6TGUDENXI6V \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c01c79ef006f0296ea7e99a83236e8f5749d05e79699e4e0c8abe238914c4934
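The hashing step of the command above can also be run offline on a saved copy of the record. The sketch below mirrors the one-liner's canonicalization (sorted keys, compact separators, non-ASCII characters kept as-is, UTF-8 bytes); the function name `canonical_sha256` is an assumption made for illustration.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and no extra whitespace, then hash the UTF-8 bytes."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Key order and whitespace in the input text do not affect the digest.
a = json.loads('{"schema_version": "1.0", "source": {"id": "2401.16420", "version": 1}}')
b = json.loads('{"source":{"version":1,"id":"2401.16420"},"schema_version":"1.0"}')
print(canonical_sha256(a) == canonical_sha256(b))  # True
```

To check this record, pass the parsed canonical record JSON to `canonical_sha256` and compare the result with the canonical hash listed on the page.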

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "8429ed639989a4121da3104fc4fd2393bc12545d4777baa9397279c0f2651057",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-01-29T18:59:02Z",
    "title_canon_sha256": "3b2deff91597c496b7dbbec7f1d2f0eaef3a13ef574dfa65915194c7ee757aa0"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2401.16420",
    "kind": "arxiv",
    "version": 1
  }
}