Pith Number

pith:ADBOX4AN

pith:2023:ADBOX4ANCCJ7NRUBGI7VEQKFUA
not attested · not anchored · not stored · refs resolved

A Survey on Multimodal Large Language Models

Chaoyou Fu, Enhong Chen, Ke Li, Shukang Yin, Sirui Zhao, Tong Xu, Xing Sun

Multimodal large language models use an LLM as a central "brain" to handle images and other inputs, and they show new emergent reasoning skills.

arxiv:2306.13549 v4 · 2023-06-23 · cs.CV · cs.AI · cs.CL · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{ADBOX4ANCCJ7NRUBGI7VEQKFUA}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
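
The page does not spell out the merge algorithm, so the following is only a minimal sketch of what a deterministic merge could look like. The event fields (`id`, `at`, `kind`) and the "last event of each kind wins" rule are hypothetical and are not taken from the Pith schema. The point is that deduplicating and then sorting on a total order makes the result independent of the order in which a mirror receives events.

```python
from typing import Iterable


def merge_events(events: Iterable[dict]) -> dict:
    """Fold events into a state that does not depend on arrival order.

    Hypothetical event shape: {"id": str, "at": int, "kind": str}.
    Assumes two events with the same id carry identical content.
    """
    # Deduplicate by id so replayed events do not change the result.
    unique = {event["id"]: event for event in events}
    # Sort on (at, id): a total order that every mirror can reproduce.
    ordered = sorted(unique.values(), key=lambda e: (e["at"], e["id"]))
    state: dict = {}
    for event in ordered:
        # Hypothetical rule: the latest event of each kind wins.
        state[event["kind"]] = event["id"]
    return state


# Illustrative events only; none of these values come from a real bundle.
events = [
    {"id": "ev-a", "at": 1, "kind": "timestamp"},
    {"id": "ev-b", "at": 2, "kind": "archive"},
    {"id": "ev-c", "at": 3, "kind": "archive"},
]

forward = merge_events(events)
shuffled = merge_events(list(reversed(events)) + events[:1])
print(forward == shuffled)
```

Any rule works here as long as it is a pure function of the deduplicated, sorted event list; the real bundle format may also check signatures before merging.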

Claims

C1 · strongest claim

The surprising emergent capabilities of MLLMs, such as writing stories based on images and OCR-free math reasoning, are rare in traditional multimodal methods, suggesting a potential path to artificial general intelligence.

C2 · weakest assumption

The survey assumes that the cited literature and the associated GitHub repository together provide a sufficiently complete and up-to-date picture of the rapidly evolving MLLM field.

C3 · one-line summary

This survey organizes the architectures, training strategies, data, evaluation methods, extensions, and challenges of Multimodal Large Language Models.

References

209 extracted · 209 resolved · 72 Pith anchors

[1] A Survey of Large Language Models, 2023 · arXiv:2303.18223
[2] ChatGPT: A language model for conversational AI, 2023
[3] GPT-4 Technical Report, 2023 · arXiv:2303.08774
[4] Vicuna: An open-source chatbot impressing GPT-4 with 90% ChatGPT quality. Available: https://vicuna.lmsys.org

Formal links

2 machine-checked theorem links

Cited by

42 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:49.317953Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

00c2ebf00d1093f6c681323f524145a02d492b2bde1539cd2a569fee780ce57c

Aliases

arxiv: 2306.13549 · arxiv_version: 2306.13549v4 · doi: 10.48550/arxiv.2306.13549 · pith_short_12: ADBOX4ANCCJ7 · pith_short_16: ADBOX4ANCCJ7NRUB · pith_short_8: ADBOX4AN
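
The identifiers on this page appear to be related in a simple way: the 26-character Pith Number matches the unpadded RFC 4648 Base32 encoding of the first 16 bytes of the canonical hash, and the `pith_short_*` aliases are prefixes of it. This is inferred from the values shown here, not from the Pith specification, so treat the sketch below as an observation rather than the official derivation. The function name `pith_number` is made up for illustration.

```python
import base64

# Canonical hash as printed on this page.
CANONICAL_HASH = "00c2ebf00d1093f6c681323f524145a02d492b2bde1539cd2a569fee780ce57c"


def pith_number(hash_hex: str) -> str:
    """Base32-encode the first 16 bytes (128 bits) of a hex digest, without padding.

    Assumption: this mirrors how the identifier on this page relates to its hash.
    """
    prefix = bytes.fromhex(hash_hex)[:16]
    return base64.b32encode(prefix).decode("ascii").rstrip("=")


full = pith_number(CANONICAL_HASH)
print(full)       # ADBOX4ANCCJ7NRUBGI7VEQKFUA
print(full[:8])   # ADBOX4AN          (pith_short_8)
print(full[:12])  # ADBOX4ANCCJ7      (pith_short_12)
print(full[:16])  # ADBOX4ANCCJ7NRUB  (pith_short_16)
```

Since each Base32 character carries 5 bits, the 8-, 12-, and 16-character aliases cover 40, 60, and 80 bits of the hash, so the shorter forms are more convenient but more likely to collide.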
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/ADBOX4ANCCJ7NRUBGI7VEQKFUA \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 00c2ebf00d1093f6c681323f524145a02d492b2bde1539cd2a569fee780ce57c
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "cd8631d64ba42ce8407bc3636a069e8d6555ecab78b44f8bfaf8be644af2f205",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL",
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2023-06-23T15:21:52Z",
    "title_canon_sha256": "7d7aca4e6ad4070b10cd65d5c70012fc9dbc97638df15425c9356535f8bd8dd4"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2306.13549",
    "kind": "arxiv",
    "version": 4
  }
}
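
The same check can be run offline against the record printed above. This sketch reuses the canonicalization from the shell one-liner (sorted keys, compact separators, no ASCII escaping, UTF-8 encoding) and compares the digest with the canonical hash shown on this page. The helper names `canonical_bytes` and `record_hash` are illustrative, and whether the digests match depends on the record being copied without alteration.

```python
import hashlib
import json

# Canonical record copied from this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "cd8631d64ba42ce8407bc3636a069e8d6555ecab78b44f8bfaf8be644af2f205",
    "cross_cats_sorted": ["cs.AI", "cs.CL", "cs.LG"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2023-06-23T15:21:52Z",
    "title_canon_sha256": "7d7aca4e6ad4070b10cd65d5c70012fc9dbc97638df15425c9356535f8bd8dd4"
  },
  "schema_version": "1.0",
  "source": {"id": "2306.13549", "kind": "arxiv", "version": 4}
}
"""

EXPECTED = "00c2ebf00d1093f6c681323f524145a02d492b2bde1539cd2a569fee780ce57c"


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and no whitespace, as in the shell command."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def record_hash(record: dict) -> str:
    """Return the SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


record = json.loads(RECORD_JSON)
digest = record_hash(record)
print(digest)
print("matches page:", digest == EXPECTED)
```

Sorting keys and fixing the separators matters because SHA-256 is computed over bytes: two JSON documents with the same content but different key order or whitespace would otherwise hash differently.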