pith:RNLXT3JA
Unified Pix Token And Word Token Generative Language Model
A new generative language model assigns each image pixel its own token to unify visual and textual inputs.
arxiv:2605.14028 v1 · 2026-05-13 · cs.CV
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{RNLXT3JAGTMBVMZZMRC3AOGBQP}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
The new model unifies pix token and word token into the generative language model... The experimental results show that it has good performance even in small model and with limited training data.
Unsupported in the provided text: that assigning each pixel its own token embedding, together with color folding and global conditional attention approximation, yields meaningfully better visual detail understanding than existing patch-based encoders. No quantitative comparison or ablation is shown.
A new model unifies per-pixel and word tokens in a generative language model with per-pixel embeddings, color folding, and unsupervised image pretraining, reporting good performance on small models with limited data.
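To make the idea of one token per pixel concrete, here is a minimal sketch of how pixels and words could share a single token sequence. Everything in it is an assumption for illustration, not the paper's method: the function names `pixel_to_token` and `unify_sequence` are hypothetical, and so are the packed 24-bit RGB id scheme and the choice to place pixel ids after the word vocabulary. Color folding and global conditional attention approximation are not modeled, since the record does not describe them.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def pixel_to_token(pixel: Pixel, word_vocab_size: int) -> int:
    """Map one RGB pixel to a token id above the word vocabulary.

    Assumption: ids are the packed 24-bit color offset by the word
    vocabulary size, so word ids and pixel ids never collide.
    """
    r, g, b = pixel
    if not all(0 <= c <= 255 for c in (r, g, b)):
        raise ValueError(f"channel out of range in pixel {pixel}")
    return word_vocab_size + ((r << 16) | (g << 8) | b)


def unify_sequence(
    word_ids: Sequence[int],
    image: Sequence[Sequence[Pixel]],
    word_vocab_size: int,
) -> List[int]:
    """Concatenate word tokens with one token per pixel, in row-major order."""
    pixel_ids = [pixel_to_token(p, word_vocab_size) for row in image for p in row]
    return list(word_ids) + pixel_ids


# A 2x2 image contributes exactly 4 pixel tokens after the 2 word tokens.
tiny_image = [[(0, 0, 0), (0, 0, 1)], [(255, 255, 255), (1, 0, 0)]]
sequence = unify_sequence([5, 7], tiny_image, word_vocab_size=1000)
print(len(sequence))
```

The sequence length grows with the pixel count rather than the patch count, which is the cost a per-pixel scheme has to manage compared with patch-based encoders.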
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:39:12.875309Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
8b5779ed2034d81ab3396445b038c183e8e66da75bafbd54f17df70b94c91f3a
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/RNLXT3JAGTMBVMZZMRC3AOGBQP \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 8b5779ed2034d81ab3396445b038c183e8e66da75bafbd54f17df70b94c91f3a
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "603b2c502b29c231ff054044abb6165cf7addcfe829fdc9742523fdcb110dc9e",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-13T18:38:51Z",
    "title_canon_sha256": "a07c99d9289337019c324237fce5d3d64bca7473b659b40301115ca0128b9263"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14028",
    "kind": "arxiv",
    "version": 1
  }
}
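The hashing step of the verification command can also be run offline against the record shown above. The sketch below mirrors the one-liner's canonicalization (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) as a standalone function. The name `canonical_sha256` is hypothetical, and the sketch assumes the record printed on this page is byte-for-byte what the API returns under `canonical_record`; compare the printed digest with the canonical hash yourself rather than taking a match for granted.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record the same way the verification one-liner does."""
    canonical = json.dumps(
        record,
        sort_keys=True,           # key order must not affect the hash
        separators=(",", ":"),    # compact form, no whitespace
        ensure_ascii=False,       # keep non-ASCII characters unescaped
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# The canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "603b2c502b29c231ff054044abb6165cf7addcfe829fdc9742523fdcb110dc9e",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2026-05-13T18:38:51Z",
        "title_canon_sha256": "a07c99d9289337019c324237fce5d3d64bca7473b659b40301115ca0128b9263",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.14028", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)
```

Because the keys are sorted before hashing, indentation and key order in the displayed JSON do not change the digest; only the values and key names do.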