Pith Number

pith:W4TPLI53

pith:2023:W4TPLI53LVIHYZ6UCVBRL3443S
not attested · not anchored · not stored · refs resolved

PaLI-X: On Scaling up a Multilingual Vision and Language Model

AJ Piergiovanni, Alexander Kolesnikov, Andreas Peter Steiner, Anelia Angelova, Anurag Arnab, Arsha Nagrani, Austin Waters, Basil Mustafa, Bo Pang, Carlos Riquelme Ruiz, Ceslee Montgomery, Daniel Keysers, Daniel Salz, Filip Pavetic, Gang Li, Hexiang Hu, Ibrahim Alabdulmohsin, Jialin Wu, Josip Djolonga, Julien Amelot, Kenton Lee, Keran Rong, Lucas Beyer, Mandar Joshi, Mario Lucic, Marvin Ritter, Matthias Minderer, Michael Tschannen, Mojtaba Seyedhosseini, Mostafa Dehghani, Neil Houlsby, Paulina Pietrzyk, Piotr Padlewski, Radu Soricut, Sebastian Goodman, Siamak Shakeri, Soravit Changpinyo, Xiaohua Zhai, Xiao Wang, Xi Chen, Yang Li, Yi Tay, Yuanzhong Xu

Scaling up PaLI-X sets a new state of the art on most vision-and-language benchmarks and shows emergent capabilities.

arxiv:2305.18565 v1 · 2023-05-29 · cs.CV · cs.CL · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{W4TPLI53LVIHYZ6UCVBRL3443S}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

PaLI-X advances the state-of-the-art on most vision-and-language benchmarks considered (25+ of them) and exhibits emerging capabilities such as complex counting and multilingual object detection.

C2 weakest assumption

That increasing model size and broadening the training task mixture will reliably produce both higher benchmark scores and the observed emergent behaviors without requiring task-specific fine-tuning or additional architectural changes.

C3 one-line summary

Scaling a multilingual vision-language model in size and training breadth yields new state-of-the-art results on over 25 benchmarks plus emerging abilities in counting and multilingual detection.

References

99 extracted · 99 resolved · 4 Pith anchors


Formal links

2 machine-checked theorem links

Cited by

23 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:13.814348Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

b726f5a3bb5d507c67d4154315ef9cdc823a705c1f6de9de4a167c79eed8008a

Aliases

arxiv: 2305.18565 · arxiv_version: 2305.18565v1 · doi: 10.48550/arxiv.2305.18565 · pith_short_12: W4TPLI53LVIH · pith_short_16: W4TPLI53LVIHYZ6U · pith_short_8: W4TPLI53
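The short aliases above are plain prefixes of the full Pith Number. In this record, the full number also coincides with the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. That relationship is inferred from the values shown on this page, not from a documented rule of the pith-number/v1.0 schema, so treat the following as a minimal sketch. The function names are hypothetical.

```python
import base64

# Canonical hash as listed in this record.
CANONICAL_HASH = "b726f5a3bb5d507c67d4154315ef9cdc823a705c1f6de9de4a167c79eed8008a"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Assumed derivation: Base32-encode the digest bytes, keep a prefix.

    The 26-character length is taken from the number shown in this
    record; it is not confirmed by any schema documentation.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str) -> dict:
    """Build the pith_short_N aliases as prefixes of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


pith = pith_number_from_hash(CANONICAL_HASH)
print(pith)                 # W4TPLI53LVIHYZ6UCVBRL3443S
print(short_aliases(pith))
```

For this record the derived value agrees with the Pith Number and the three short aliases listed above. Whether the same rule holds for other records is an assumption.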
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/W4TPLI53LVIHYZ6UCVBRL3443S \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: b726f5a3bb5d507c67d4154315ef9cdc823a705c1f6de9de4a167c79eed8008a
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "94e18e05584083e5b1cbbe254aba4a9c4cbb4c6b8dc63d01ec220ef24117dc7a",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2023-05-29T18:58:38Z",
    "title_canon_sha256": "3c7150397808e13ff72bab5cc440587de27d0854e43b1fc23a199ea026595a24"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2305.18565",
    "kind": "arxiv",
    "version": 1
  }
}
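
The verification recipe above serializes the canonical record with sorted keys and compact separators, then takes the SHA-256 of the UTF-8 bytes. The same steps can be sketched as a standalone Python script, which is easier to adapt for offline checks. The record below is copied from this page, and the helper name `canonical_sha256` is hypothetical. The match against the expected hash has not been confirmed here, so the script reports the comparison rather than assuming it.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash.

    Mirrors the json.dumps arguments used in the shell recipe.
    """
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "94e18e05584083e5b1cbbe254aba4a9c4cbb4c6b8dc63d01ec220ef24117dc7a",
        "cross_cats_sorted": ["cs.CL", "cs.LG"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2023-05-29T18:58:38Z",
        "title_canon_sha256": "3c7150397808e13ff72bab5cc440587de27d0854e43b1fc23a199ea026595a24",
    },
    "schema_version": "1.0",
    "source": {"id": "2305.18565", "kind": "arxiv", "version": 1},
}

EXPECTED = "b726f5a3bb5d507c67d4154315ef9cdc823a705c1f6de9de4a167c79eed8008a"

digest = canonical_sha256(record)
print(digest)
print("matches expected:", digest == EXPECTED)
```

Because keys are sorted before hashing, the order in which a mirror stores or transmits the fields does not affect the digest. Only the values and the serialization settings do.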