Pith Number

pith:CTVKNTPZ

pith:2023:CTVKNTPZG7G2XIRXBJVDSVT54A

not attested · not anchored · not stored · refs resolved

CogVLM: Visual Expert for Pretrained Language Models

Bin Xu, Jiazheng Xu, Jie Tang, Ji Qi, Juanzi Li, Junhui Ji, Lei Zhao, Ming Ding, Qingsong Lv, Weihan Wang, Wenmeng Yu, Wenyi Hong, Xixuan Song, Yan Wang, Yuxiao Dong, Zhuoyi Yang

A trainable visual expert module inserted into the attention and FFN layers of a frozen language model enables deep vision-language fusion.

arxiv:2311.03079 v2 · 2023-11-06 · cs.CV


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{CTVKNTPZG7G2XIRXBJVDSVT54A}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
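
The page does not spell out the merge algorithm, so the following is only a minimal sketch of what "deterministic merge" can mean. It assumes a hypothetical event shape `(timestamp, event_id, field, value)` and a last-writer-wins rule; neither the event fields nor the toy values come from the Pith bundle format.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical event shape (an assumption, not the Pith schema):
# (ISO-8601 UTC timestamp, event id, field name, new value)
Event = Tuple[str, str, str, str]


def merge_state(events: Iterable[Event]) -> Dict[str, str]:
    """Fold events into a state dict; the latest (timestamp, event_id) wins."""
    state: Dict[str, str] = {}
    # set() collapses duplicate events; sorted() fixes one global order,
    # so the result does not depend on the order in which events arrived.
    for _timestamp, _event_id, field, value in sorted(set(events)):
        state[field] = value
    return state


# Made-up events for illustration only.
events = [
    ("2026-05-17T23:38:51Z", "e1", "bundle", "live"),
    ("2026-05-18T08:00:00Z", "e2", "author_claim", "open"),
    ("2026-05-19T09:30:00Z", "e3", "author_claim", "claimed"),
]

# Two mirrors that received the events in different orders agree.
mirror_a = merge_state(events)
mirror_b = merge_state(reversed(events))
print(mirror_a == mirror_b)
```

Sorting on a total order before folding is one common way to get the property described above: any mirror holding the same set of events recomputes the same state.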

Claims

C1 · strongest claim

CogVLM-17B achieves state-of-the-art performance on 10 classic cross-modal benchmarks... surpassing or matching PaLI-X 55B.

C2 · weakest assumption

The visual expert module can be inserted into the attention and FFN layers of any frozen pretrained language model without requiring changes to the original architecture or loss functions.

C3 · one line summary

CogVLM adds a trainable visual expert inside frozen language model layers for deep vision-language fusion and reports state-of-the-art results on ten cross-modal benchmarks while preserving NLP performance.
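
The routing idea behind a visual expert can be sketched in a few lines: image tokens pass through trainable weights while text tokens keep using the frozen language-model weights. The toy below is not the CogVLM implementation; the function names, the hidden size of 2, and the weight values are all illustrative assumptions, and it shows a single projection rather than full attention plus FFN.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (one row per output dimension) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def routed_projection(
    hidden: List[Vector],
    is_image: List[bool],
    w_text: Matrix,
    w_image: Matrix,
) -> List[Vector]:
    """Project each token with the weights that match its modality."""
    return [
        matvec(w_image if image else w_text, h)
        for h, image in zip(hidden, is_image)
    ]


# Toy sequence: two image tokens followed by two text tokens.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
is_image = [True, True, False, False]

w_text = [[1.0, 0.0], [0.0, 1.0]]    # stands in for frozen LM weights
w_image = [[0.5, 0.5], [-1.0, 1.0]]  # stands in for trainable expert weights

projected = routed_projection(hidden, is_image, w_text, w_image)
print(projected)
```

Because text tokens never touch `w_image`, their projections are exactly what the frozen model would compute, which is the intuition behind the claim that NLP performance is preserved.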

References

33 extracted · 33 resolved · 17 Pith anchors

[1] OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models · arXiv:2308.01390

[2] Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966

[3] Murel: Multimodal relational reasoning for visual question answering 1989

[4] Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic · arXiv:2306.15195

[5] Universal captioner: Long-tail vision-and-language model training through content-style separation · arXiv:2111.12727

Formal links

2 machine-checked theorem links

Cited by

45 papers in Pith

Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions

ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection

Perturb and Recover: Fine-tuning for Effective Backdoor Removal from CLIP

MM-MoralBench: A MultiModal Moral Evaluation Benchmark for Large Vision-Language Models

HumanVBench: Probing Human-Centric Video Understanding in MLLMs with Automatically Synthesized Benchmarks

Receipt and verification

First computed	2026-05-17T23:38:51.021764Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

14eaa6cdf937cdaba2370a6a39567de015ee54eca0c505143d4d420dfa34f0e5

Aliases

arxiv: 2311.03079 · arxiv_version: 2311.03079v2 · doi: 10.48550/arxiv.2311.03079 · pith_short_12: CTVKNTPZG7G2 · pith_short_16: CTVKNTPZG7G2XIRX · pith_short_8: CTVKNTPZ
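
For this record, the identifiers above line up in a simple way: the 26-character Pith Number matches the unpadded Base32 encoding of the first 16 bytes of the canonical hash, and each `pith_short_N` alias is the first N characters of that number. The sketch below checks both relationships for the values shown on this page; whether Pith derives every number this way is an assumption, and the helper names are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "14eaa6cdf937cdaba2370a6a39567de015ee54eca0c505143d4d420dfa34f0e5"
PITH_NUMBER = "CTVKNTPZG7G2XIRXBJVDSVT54A"


def base32_prefix(hex_digest: str, n_bytes: int = 16) -> str:
    """Base32-encode the first n_bytes of a hex digest, without '=' padding."""
    raw = bytes.fromhex(hex_digest)[:n_bytes]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


def short_aliases(pith_number: str) -> dict:
    """Build the pith_short_N aliases as plain prefixes of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


derived = base32_prefix(CANONICAL_HASH)
aliases = short_aliases(PITH_NUMBER)

print(derived == PITH_NUMBER)
print(aliases)
```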

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/CTVKNTPZG7G2XIRXBJVDSVT54A \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 14eaa6cdf937cdaba2370a6a39567de015ee54eca0c505143d4d420dfa34f0e5

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "9ed531cb4a2ee62bd4512e8535ec68ef02bf4f67385e61fe8e221a00b5f126b6",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2023-11-06T13:04:39Z",
    "title_canon_sha256": "679fe85268225460d07d2179c1c3c8b521429885cfbc6b874c9f34e37b4130b4"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2311.03079",
    "kind": "arxiv",
    "version": 2
  }
}
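
The verification one-liner can also be run offline against a saved copy of the canonical record. The sketch below applies the same canonicalization as the shell command (sorted keys, compact separators, no ASCII escaping, UTF-8) to the record printed above. The function name `canonical_sha256` is hypothetical, and the comparison result is printed rather than assumed, since the resolver's response may differ from the copy shown here.

```python
import hashlib
import json

EXPECTED = "14eaa6cdf937cdaba2370a6a39567de015ee54eca0c505143d4d420dfa34f0e5"


def canonical_sha256(record: dict) -> str:
    """Hash a record after serializing it in a key-order-independent form."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# The canonical record as printed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "9ed531cb4a2ee62bd4512e8535ec68ef02bf4f67385e61fe8e221a00b5f126b6",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2023-11-06T13:04:39Z",
        "title_canon_sha256": "679fe85268225460d07d2179c1c3c8b521429885cfbc6b874c9f34e37b4130b4",
    },
    "schema_version": "1.0",
    "source": {"id": "2311.03079", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)
print("matches expected:", digest == EXPECTED)
```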