pith:2KOQ3SMA
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
A careful mix of image-caption, interleaved image-text, and text-only data during pre-training is crucial for state-of-the-art few-shot results in multimodal large language models.
arxiv:2403.09611 v4 · 2024-03-14 · cs.CV · cs.CL · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{2KOQ3SMADRV4XAYY3DLSLG76FH}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
For large-scale multimodal pre-training, a careful mix of image-caption, interleaved image-text, and text-only data is crucial for achieving state-of-the-art few-shot results across multiple benchmarks.
These claims assume that the ablations performed are comprehensive enough to isolate the true importance of data composition and image encoder choices, without confounding effects from untested interactions or hyperparameter choices.
MM1 models achieve state-of-the-art few-shot multimodal results by pre-training on a careful mix of image-caption, interleaved, and text-only data with optimized image encoders.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:49.147551Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
d29d0dc9801c6bcb8318d8d7259bfe29e407418722613cd139c0b9faa3e3b0fc
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/2KOQ3SMADRV4XAYY3DLSLG76FH \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d29d0dc9801c6bcb8318d8d7259bfe29e407418722613cd139c0b9faa3e3b0fc
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "923a9976303f3c648273dba6d0d92803fad89135dc3e2e95942bff3913bb9ceb",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-03-14T17:51:32Z",
    "title_canon_sha256": "98612f0506b0805073aeaaeaf93f8af49f3f2ccba777087e6dd48a1edd8d0f0a"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2403.09611",
    "kind": "arxiv",
    "version": 4
  }
}
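The hashing step of the verification command can also be run offline against a saved copy of the record. The following is a minimal sketch, not Pith's reference implementation: it mirrors the serialization used in the one-liner (sorted keys, compact separators, non-ASCII characters kept as-is, UTF-8 encoding) and assumes the JSON shown here is the complete canonical record. The function names `canonical_bytes` and `canonical_hash` are illustrative and do not come from the page.

```python
import hashlib
import json


def canonical_bytes(record: dict) -> bytes:
    """Serialize a record the same way the page's one-liner does."""
    text = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    )
    return text.encode("utf-8")


def canonical_hash(record: dict) -> str:
    """Return the SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


# The canonical record as displayed on this page (assumed complete).
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "923a9976303f3c648273dba6d0d92803fad89135dc3e2e95942bff3913bb9ceb",
    "cross_cats_sorted": ["cs.CL", "cs.LG"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-03-14T17:51:32Z",
    "title_canon_sha256": "98612f0506b0805073aeaaeaf93f8af49f3f2ccba777087e6dd48a1edd8d0f0a"
  },
  "schema_version": "1.0",
  "source": {"id": "2403.09611", "kind": "arxiv", "version": 4}
}
"""

# Compare the printed digest by eye with the canonical hash listed on the page.
print(canonical_hash(json.loads(RECORD_JSON)))
```

Because keys are sorted and whitespace is removed before hashing, the indentation and key order of the saved copy should not change the digest. Any difference in a value, including a trailing newline inside a string, will.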