Pith Number

pith:DULKBTOB

pith:2024:DULKBTOBJA4XIQFVFZQSJCUWTM
not attested · not anchored · not stored · refs resolved

PaliGemma 2: A Family of Versatile VLMs for Transfer

Alexey Gritsenko, Andreas Steiner, André Susano Pinto, Anthony Sherbondy, Daniel Keysers, Emanuele Bugliarello, Ibrahim Alabdulmohsin, Lucas Beyer, Matthias Minderer, Michael Tschannen, Reeve Ingle, Sahar Kazemzadeh, Shangbang Long, Siyang Qin, Thomas Mesnard, Xiaohua Zhai, Xiao Wang, Yonatan Bitton

PaliGemma 2 pairs Gemma 2 language models with SigLIP encoders and trains them at multiple resolutions to achieve strong transfer on OCR and captioning tasks.

arxiv:2412.03555 v1 · 2024-12-04 · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{DULKBTOBJA4XIQFVFZQSJCUWTM}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

PaliGemma 2 obtains state-of-the-art results on different OCR-related tasks such as table structure recognition, molecular structure recognition, music score recognition, as well as long fine-grained captioning and radiography report generation.

C2 weakest assumption

That multi-stage training at multiple resolutions equips the models with broad transferable knowledge; the abstract provides no controlled ablations or details on how this is verified versus simpler baselines.

C3 one line summary

PaliGemma 2 is a family of vision-language models that achieves state-of-the-art results on transfer tasks like table structure recognition and radiography report generation by combining SigLIP with Gemma 2 models at various sizes and resolutions.

References

113 extracted · 113 resolved · 13 Pith anchors

[1] M. Acharya, K. Kafle, and C. Kanan. TallyQA: Answering complex counting questions. In AAAI, 2019
[2] H. Agrawal, K. Desai, Y. Wang, X. Chen, R. Jain, M. Johnson, D. Batra, D. Parikh, S. Lee, and P. Anderson. NoCaps: Novel object captioning at scale. In ICCV, 2019
[3] I. Alabdulmohsin, X. Zhai, A. Kolesnikov, and L. Beyer. Getting ViT in shape: Scaling laws for compute-optimal model design. In NeurIPS, 2023
[4] J.-B. Alayrac, J. Donahue, P. Luc, A. Miech, I. Barr, Y. Hasson, K. Lenc, A. Mensch, K. Millican, M. Reynolds, R. Ring, E. Rutherford, S. Cabi, T. Han, Z. Gong, S. Samangooei, M. Monteiro, J. Menick 2022
[5] Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond 2023 · arXiv:2308.12966

Formal links

2 machine-checked theorem links

Cited by

26 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:52.926181Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

1d16a0cdc148397440b52e61248a969b15fac2d9e1570c0a7906224e956eda27

Aliases

arxiv: 2412.03555 · arxiv_version: 2412.03555v1 · doi: 10.48550/arxiv.2412.03555 · pith_short_12: DULKBTOBJA4X · pith_short_16: DULKBTOBJA4XIQFV · pith_short_8: DULKBTOB
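
The aliases above appear to be related by simple truncation: each pith_short_N value is the first N characters of the full Pith Number. The full number also seems to coincide with the unpadded Base32 encoding of the first 16 bytes of the canonical hash. The Python sketch below checks both observations for this record only. The derivation rule is an inference from the values shown on this page, not a documented part of the pith-number/v1.0 schema, and the helper name derive_pith_number is hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "1d16a0cdc148397440b52e61248a969b15fac2d9e1570c0a7906224e956eda27"
PITH_NUMBER = "DULKBTOBJA4XIQFVFZQSJCUWTM"
SHORT_ALIASES = {8: "DULKBTOB", 12: "DULKBTOBJA4X", 16: "DULKBTOBJA4XIQFV"}


def derive_pith_number(sha256_hex: str, n_bytes: int = 16) -> str:
    """Hypothetical helper: Base32-encode the first n_bytes of a hex digest.

    ASSUMPTION: this rule is inferred from the values on this page; it is
    not taken from any published Pith specification.
    """
    prefix = bytes.fromhex(sha256_hex)[:n_bytes]
    # b32encode pads its output with '='; the Pith Number carries no padding.
    return base64.b32encode(prefix).decode("ascii").rstrip("=")


derived = derive_pith_number(CANONICAL_HASH)
print("derived number matches:", derived == PITH_NUMBER)

# Each short alias should be a prefix of the full number.
for length, alias in SHORT_ALIASES.items():
    print(f"pith_short_{length} is a prefix:", PITH_NUMBER[:length] == alias)
```

If the inferred rule holds for other records, a short alias can be expanded only by lookup, never by computation, because truncation discards bits of the hash.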
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/DULKBTOBJA4XIQFVFZQSJCUWTM \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 1d16a0cdc148397440b52e61248a969b15fac2d9e1570c0a7906224e956eda27
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "501a62ef5aaa69b608bba60f579b7b53b44da05a7c37f19dc796242d5c2856e2",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-12-04T18:50:42Z",
    "title_canon_sha256": "db490a13d82e857cc0961af6be15e96b9c4e49e1742d092153959eaaaf28eacf"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2412.03555",
    "kind": "arxiv",
    "version": 1
  }
}
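
The canonicalization used by the shell one-liner above (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes, SHA-256) can also be sketched as a standalone Python script that needs no network access. The record literal is copied from the JSON shown on this page; whether that displayed copy reproduces the expected hash byte for byte has not been verified here, so the script reports the comparison rather than assuming it. The function name canonical_sha256 is hypothetical.

```python
import hashlib
import json

# Canonical record as displayed on this page (copied by hand, so treat as unverified).
record = {
    "metadata": {
        "abstract_canon_sha256": "501a62ef5aaa69b608bba60f579b7b53b44da05a7c37f19dc796242d5c2856e2",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2024-12-04T18:50:42Z",
        "title_canon_sha256": "db490a13d82e857cc0961af6be15e96b9c4e49e1742d092153959eaaaf28eacf",
    },
    "schema_version": "1.0",
    "source": {"id": "2412.03555", "kind": "arxiv", "version": 1},
}

EXPECTED = "1d16a0cdc148397440b52e61248a969b15fac2d9e1570c0a7906224e956eda27"


def canonical_sha256(obj) -> str:
    """Hypothetical helper mirroring the one-liner's serialization settings."""
    payload = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


digest = canonical_sha256(record)
print(digest)
print("matches expected hash:", digest == EXPECTED)
```

Sorting keys and fixing the separators removes the two usual sources of variation in JSON output (key order and whitespace), which is what lets independent mirrors arrive at the same digest from the same record.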