pith:ADBOX4AN
A Survey on Multimodal Large Language Models
Multimodal large language models use LLMs as a central brain to handle images and other inputs with new emergent reasoning skills.
arxiv:2306.13549 v4 · 2023-06-23 · cs.CV · cs.AI · cs.CL · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{ADBOX4ANCCJ7NRUBGI7VEQKFUA}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
The surprising emergent capabilities of MLLMs, such as writing stories based on images and OCR-free math reasoning, are rare in traditional multimodal methods, suggesting a potential path to artificial general intelligence.
The survey assumes that the cited literature and the associated GitHub repository together provide a sufficiently complete and up-to-date picture of the rapidly evolving MLLM field.
This survey organizes the architectures, training strategies, data, evaluation methods, extensions, and challenges of Multimodal Large Language Models.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:49.317953Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
00c2ebf00d1093f6c681323f524145a02d492b2bde1539cd2a569fee780ce57c
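For this record, the 26-character Pith number coincides with the unpadded Base32 (RFC 4648) encoding of the first 16 bytes of the canonical hash. The sketch below checks that relationship using only the two values printed on this page. Whether Pith numbers are defined this way in general is an assumption, since the page does not state it, and the helper name `base32_prefix` is hypothetical.

```python
import base64

# Values copied from this page.
CANONICAL_HASH = "00c2ebf00d1093f6c681323f524145a02d492b2bde1539cd2a569fee780ce57c"
PITH_NUMBER = "ADBOX4ANCCJ7NRUBGI7VEQKFUA"


def base32_prefix(hex_digest: str, n_bytes: int = 16) -> str:
    """Base32-encode the first n_bytes of a hex digest and strip '=' padding.

    Assumption: this mirrors how the identifier relates to the hash for this
    record; it is not a documented derivation rule.
    """
    raw = bytes.fromhex(hex_digest)[:n_bytes]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


derived = base32_prefix(CANONICAL_HASH)
print(derived)                 # Base32 of the first 16 hash bytes
print(derived == PITH_NUMBER)  # True for the values on this page
print(derived[:8])             # first 8 characters, as in the short form pith:ADBOX4AN
```

Under the same reading, the short form `pith:ADBOX4AN` corresponds to the first 5 bytes of the hash.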
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/ADBOX4ANCCJ7NRUBGI7VEQKFUA \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 00c2ebf00d1093f6c681323f524145a02d492b2bde1539cd2a569fee780ce57c
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "cd8631d64ba42ce8407bc3636a069e8d6555ecab78b44f8bfaf8be644af2f205",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL",
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2023-06-23T15:21:52Z",
    "title_canon_sha256": "7d7aca4e6ad4070b10cd65d5c70012fc9dbc97638df15425c9356535f8bd8dd4"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2306.13549",
    "kind": "arxiv",
    "version": 4
  }
}
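The shell pipeline above can also be reproduced offline. The following is a minimal sketch that applies the same canonicalization (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) and SHA-256 to the record shown here. It assumes the JSON displayed on this page is identical to the `canonical_record` field returned by the API; the function names `canonical_bytes` and `canonical_sha256` are hypothetical.

```python
import hashlib
import json

# Expected digest, copied from the "Canonical hash" section of this page.
EXPECTED = "00c2ebf00d1093f6c681323f524145a02d492b2bde1539cd2a569fee780ce57c"

# The record as displayed above (assumed equal to the API's canonical_record).
record = {
    "metadata": {
        "abstract_canon_sha256": "cd8631d64ba42ce8407bc3636a069e8d6555ecab78b44f8bfaf8be644af2f205",
        "cross_cats_sorted": ["cs.AI", "cs.CL", "cs.LG"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2023-06-23T15:21:52Z",
        "title_canon_sha256": "7d7aca4e6ad4070b10cd65d5c70012fc9dbc97638df15425c9356535f8bd8dd4",
    },
    "schema_version": "1.0",
    "source": {"id": "2306.13549", "kind": "arxiv", "version": 4},
}


def canonical_bytes(obj) -> bytes:
    """Serialize exactly as the one-liner does: sorted keys, no whitespace."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_sha256(obj) -> str:
    """Hex SHA-256 of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


digest = canonical_sha256(record)
print(digest)
print("matches expected:", digest == EXPECTED)
```

Because keys are sorted before hashing, the digest does not depend on the order in which the fields are written; any change to a value or to the whitespace-free serialization rules would produce a different digest.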