pith:ROYX62NC
PaLI: A Jointly-Scaled Multilingual Language-Image Model
PaLI jointly scales a 4-billion-parameter vision transformer with a language model on a 10B multilingual image-text set to reach state-of-the-art on captioning, VQA and scene-text tasks.
arxiv:2209.06794 v4 · 2022-09-14 · cs.CV · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{ROYX62NCB72JQHDVNJK6MR3MCN}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
PaLI achieves state-of-the-art in multiple vision and language tasks (such as captioning, visual question-answering, scene-text understanding), while retaining a simple, modular, and scalable design.
Underlying assumption: joint scaling of the vision and language components on the new 10B multilingual dataset produces the claimed performance gains without major issues from data quality, language imbalance, or overfitting.
PaLI jointly scales a 4B-parameter vision transformer with language models on a new 10B multilingual image-text dataset to reach state-of-the-art results on vision-language tasks while keeping a simple modular design.
Receipt and verification
| First computed | 2026-05-17T23:38:48.354221Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
8bb17f69a20ff4981c756a55e6476c13441e2bbbd55d7f5c78db51a1e6549e0a
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/ROYX62NCB72JQHDVNJK6MR3MCN \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 8bb17f69a20ff4981c756a55e6476c13441e2bbbd55d7f5c78db51a1e6549e0a
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "2a50651767b6289fbf279711ac7379d502692af9f7b0932b728ccd5beb6987f9",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2022-09-14T17:24:07Z",
    "title_canon_sha256": "08a218ce080d71719c57e13038c8be63fb06bfe224ac17e3c8cb281586c15081"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2209.06794",
    "kind": "arxiv",
    "version": 4
  }
}
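The same check can be sketched offline in Python, without curl or jq. This is a minimal sketch, not a definitive implementation: it assumes the record printed above is exactly what the service returns as `canonical_record`, and it applies the same canonicalization as the curl pipeline (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes). The names `canonical_sha256`, `record`, and `EXPECTED` are illustrative and not part of any Pith API.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the canonical form from the curl pipeline."""
    # Sorted keys and compact separators make the serialization
    # independent of key order and whitespace in the input.
    canonical = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Record copied from the "Canonical record JSON" section (assumed faithful).
record = {
    "metadata": {
        "abstract_canon_sha256": "2a50651767b6289fbf279711ac7379d502692af9f7b0932b728ccd5beb6987f9",
        "cross_cats_sorted": ["cs.CL"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2022-09-14T17:24:07Z",
        "title_canon_sha256": "08a218ce080d71719c57e13038c8be63fb06bfe224ac17e3c8cb281586c15081",
    },
    "schema_version": "1.0",
    "source": {"id": "2209.06794", "kind": "arxiv", "version": 4},
}

# Hash published in the "Canonical hash" section.
EXPECTED = "8bb17f69a20ff4981c756a55e6476c13441e2bbbd55d7f5c78db51a1e6549e0a"

digest = canonical_sha256(record)
print(digest)
print("matches published hash:", digest == EXPECTED)
```

Because the keys are sorted before hashing, the digest does not depend on the order in which the record's fields are written, only on their names and values.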