pith:RYU2AZN2
Autoregressive Visual Generation Needs a Prologue
Prepending a small set of prologue tokens trained only on AR loss decouples generation from reconstruction in autoregressive image models.
arxiv:2605.06137 v2 · 2026-05-07 · cs.CV · cs.AI · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{RYU2AZN2J4GJBV7FJ36RS2QDRT}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
On ImageNet 256x256, Prologue-Base reduces gFID from 21.01 to 10.75 without classifier-free guidance while keeping reconstruction almost unchanged; Prologue-Large reaches a competitive rFID of 0.99 and gFID of 1.46 using a standard AR model without auxiliary semantic supervision.
The key assumption is that training prologue tokens exclusively with the AR cross-entropy (CE) loss does not interfere with the visual tokens' reconstruction quality, and that the ELBO formalization supports the decoupled optimization.
Prologue introduces dedicated prologue tokens to decouple generation and reconstruction in AR visual models, significantly improving generation FID scores on ImageNet while maintaining reconstruction quality.
Receipt and verification
| First computed | 2026-06-01T01:03:54.500915Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
8e29a065ba4f0c90d7e54efd196a038cf99ad4c7f7658b6b29ad57588600595e
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/RYU2AZN2J4GJBV7FJ36RS2QDRT \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 8e29a065ba4f0c90d7e54efd196a038cf99ad4c7f7658b6b29ad57588600595e
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "726b24e809aaa247419cb7f83c5b70ac863540f8dae7ce5eb02dd72042ce4754",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-07T12:35:51Z",
    "title_canon_sha256": "c0f93b20d4d316c033d21025403c3291911464f9820096acfba8790f40f8fbff"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.06137",
    "kind": "arxiv",
    "version": 2
  }
}
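The same check can be run offline against the record shown above. The following is a minimal sketch that mirrors the serialization used in the curl one-liner (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes). The names `RECORD_JSON`, `EXPECTED`, and `canonical_hash` are illustrative and not part of any Pith API, and the sketch assumes the record pasted here is byte-for-byte the one served by the endpoint.

```python
import hashlib
import json

# Canonical record as displayed on this page (assumed identical to the served record).
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "726b24e809aaa247419cb7f83c5b70ac863540f8dae7ce5eb02dd72042ce4754",
    "cross_cats_sorted": ["cs.AI", "cs.LG"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-07T12:35:51Z",
    "title_canon_sha256": "c0f93b20d4d316c033d21025403c3291911464f9820096acfba8790f40f8fbff"
  },
  "schema_version": "1.0",
  "source": {"id": "2605.06137", "kind": "arxiv", "version": 2}
}
"""

# Hash published on this page under "Canonical hash".
EXPECTED = "8e29a065ba4f0c90d7e54efd196a038cf99ad4c7f7658b6b29ad57588600595e"


def canonical_hash(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_hash(record)

print("computed:", digest)
print("expected:", EXPECTED)
print("match" if digest == EXPECTED else "mismatch")
```

Because the keys are sorted before hashing, whitespace and key order in the pasted JSON do not affect the digest; only the values do. A mismatch would suggest the pasted record differs from the one the registry signed.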