pith:J2SUWQ2V
InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation
InsightTok uses localized content-aware perceptual losses to improve text and face fidelity in discrete image tokenizers.
arxiv:2605.14333 v1 · 2026-05-14 · cs.CV
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{J2SUWQ2VUT6GSTP7Z7NLPYX6G2}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
With a compact 16k codebook and a 16x downsampling rate, InsightTok significantly outperforms prior tokenizers in text and face reconstruction without compromising general reconstruction quality. These gains consistently transfer to autoregressive image generation in InsightAR, producing images with clearer text and more faithful facial details.
These claims rest on the assumption that localized content-aware perceptual losses will reliably capture fine-grained text legibility and facial fidelity across diverse images, without introducing new artifacts or requiring extensive per-domain hyperparameter tuning.
InsightTok improves text and face fidelity in discrete image tokenization via content-aware perceptual losses, with gains transferring to autoregressive generation.
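To make the "16k codebook" and "16x downsampling rate" in the claims concrete, the sketch below works out the token budget they imply for a single image. It assumes that "16k" means 16,384 entries and that the input is 256x256 pixels; neither figure is stated in this record, and `token_budget` is a hypothetical helper, not part of InsightTok.

```python
import math


def token_budget(height, width, downsample=16, codebook_size=16_384):
    """Return the token grid and bit cost implied by a discrete tokenizer.

    Assumptions (not from the record): codebook_size=16_384 stands in for
    "16k", and the image sides are exact multiples of the downsampling rate.
    """
    if height % downsample or width % downsample:
        raise ValueError("image sides must be multiples of the downsampling rate")
    grid_h, grid_w = height // downsample, width // downsample
    num_tokens = grid_h * grid_w
    # Each token is an index into the codebook, so it carries log2(K) bits.
    bits_per_token = math.log2(codebook_size)
    return {
        "grid": (grid_h, grid_w),
        "num_tokens": num_tokens,
        "bits_per_token": bits_per_token,
        "total_bits": num_tokens * bits_per_token,
    }


budget = token_budget(256, 256)
print(budget["grid"], budget["num_tokens"], budget["bits_per_token"])
# -> (16, 16) 256 14.0
```

Under these assumptions each token covers a 16x16 pixel patch, which gives a sense of why small text and facial details are hard to preserve at this rate.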
Receipt and verification
| First computed | 2026-05-17T23:39:08.268499Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
4ea54b4355a4fc694dffcfdab7e2fe36ade01ef7c164645fae48d7b0fb70a20f
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/J2SUWQ2VUT6GSTP7Z7NLPYX6G2 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 4ea54b4355a4fc694dffcfdab7e2fe36ade01ef7c164645fae48d7b0fb70a20f
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "4ef55016f8f82c403eac1edf7860c2366b261416c6e96612c37953b8096c777d",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-14T03:57:25Z",
    "title_canon_sha256": "bc892d40b41de6034c550b6e8f46c1fc03a36e85ee9702295e73845da32ae016"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14333",
    "kind": "arxiv",
    "version": 1
  }
}
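The verification one-liner above can also be run offline against the record as printed here. The following is a minimal sketch that mirrors the same canonicalization (sorted keys, compact separators, UTF-8, no ASCII escaping) before hashing with SHA-256. The function name `canonical_sha256` is hypothetical, and the sketch assumes the JSON shown on this page is the complete canonical record; if it is, the printed digest should equal the canonical hash listed earlier, but that is not asserted here.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record the same way the shell one-liner does."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the digest
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Record copied from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "4ef55016f8f82c403eac1edf7860c2366b261416c6e96612c37953b8096c777d",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2026-05-14T03:57:25Z",
        "title_canon_sha256": "bc892d40b41de6034c550b6e8f46c1fc03a36e85ee9702295e73845da32ae016",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.14333", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)  # compare by eye against the canonical hash on this page
```

Because keys are sorted before serialization, reordering the fields of the record does not change the digest; changing any value does.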