{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2026:RYU2AZN2J4GJBV7FJ36RS2QDRT","short_pith_number":"pith:RYU2AZN2","schema_version":"1.0","canonical_sha256":"8e29a065ba4f0c90d7e54efd196a038cf99ad4c7f7658b6b29ad57588600595e","source":{"kind":"arxiv","id":"2605.06137","version":2},"attestation_state":"computed","paper":{"title":"Autoregressive Visual Generation Needs a Prologue","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Prepending a small set of prologue tokens trained only on AR loss decouples generation from reconstruction in autoregressive image models.","cross_cats":["cs.AI","cs.LG"],"primary_cat":"cs.CV","authors_text":"Bowen Zheng, Colin Zhang, Guang Yang, Tianyang Hu, Weijian Luo","submitted_at":"2026-05-07T12:35:51Z","abstract_excerpt":"In this work, we propose Prologue, an approach to bridging the reconstruction-generation gap in autoregressive (AR) image generation. Instead of modifying visual tokens to satisfy both reconstruction and generation, Prologue generates a small set of prologue tokens prepended to the visual token sequence. These prologue tokens are trained exclusively with the AR cross-entropy (CE) loss, while visual tokens remain dedicated to reconstruction. This decoupled design lets us optimize generation through the AR model's true distribution without affecting reconstruction quality, which we further forma"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":false,"formal_links_present":false},"canonical_record":{"source":{"id":"2605.06137","kind":"arxiv","version":2},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.CV","submitted_at":"2026-05-07T12:35:51Z","cross_cats_sorted":["cs.AI","cs.LG"],"title_canon_sha256":"c0f93b20d4d316c033d21025403c3291911464f9820096acfba8790f40f8fbff","abstract_canon_sha256":"726b24e809aaa247419cb7f83c5b70ac863540f8dae7ce5eb02dd72042ce4754"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-06-01T01:03:54.501675Z","signature_b64":"3qX9e4HIzZBJpC4Is3DfaY0mf0kaDrSniTrhUjhQLax0F6vP+JFubClMnX5fh7Skq4b7HAtECUiM9MztZyt4AA==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"8e29a065ba4f0c90d7e54efd196a038cf99ad4c7f7658b6b29ad57588600595e","last_reissued_at":"2026-06-01T01:03:54.500915Z","signature_status":"signed_v1","first_computed_at":"2026-06-01T01:03:54.500915Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Autoregressive Visual Generation Needs a Prologue","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Prepending a small set of prologue tokens trained only on AR loss decouples generation from reconstruction in autoregressive image models.","cross_cats":["cs.AI","cs.LG"],"primary_cat":"cs.CV","authors_text":"Bowen Zheng, Colin Zhang, Guang Yang, Tianyang Hu, Weijian Luo","submitted_at":"2026-05-07T12:35:51Z","abstract_excerpt":"In this work, we propose Prologue, an approach to bridging the reconstruction-generation gap in autoregressive (AR) image generation. Instead of modifying visual tokens to satisfy both reconstruction and generation, Prologue generates a small set of prologue tokens prepended to the visual token sequence. These prologue tokens are trained exclusively with the AR cross-entropy (CE) loss, while visual tokens remain dedicated to reconstruction. This decoupled design lets us optimize generation through the AR model's true distribution without affecting reconstruction quality, which we further forma"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"On ImageNet 256x256, Prologue-Base reduces gFID from 21.01 to 10.75 without classifier-free guidance while keeping reconstruction almost unchanged; Prologue-Large reaches a competitive rFID of 0.99 and gFID of 1.46 using a standard AR model without auxiliary semantic supervision.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The assumption that training prologue tokens exclusively with AR CE loss will not interfere with the visual tokens' reconstruction quality and that the ELBO formalization supports the decoupled optimization.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Prologue introduces dedicated prologue tokens to decouple generation and reconstruction in AR visual models, significantly improving generation FID scores on ImageNet while maintaining reconstruction quality.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Prepending a small set of prologue tokens trained only on AR loss decouples generation from reconstruction in autoregressive image models.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"17cd00dc84d509e17b769798e8d02715ae78bff4d23bbeb090398f84f84f0029"},"source":{"id":"2605.06137","kind":"arxiv","version":2},"verdict":{"id":"4387a6b6-6f87-4b83-ace2-889d7f4e7849","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-08T13:48:32.689247Z","strongest_claim":"On ImageNet 256x256, Prologue-Base reduces gFID from 21.01 to 10.75 without classifier-free guidance while keeping reconstruction almost unchanged; Prologue-Large reaches a competitive rFID of 0.99 and gFID of 1.46 using a standard AR model without auxiliary semantic supervision.","one_line_summary":"Prologue introduces dedicated prologue tokens to decouple generation and reconstruction in AR visual models, significantly improving generation FID scores on ImageNet while maintaining reconstruction quality.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The assumption that training prologue tokens exclusively with AR CE loss will not interfere with the visual tokens' reconstruction quality and that the ELBO formalization supports the decoupled optimization.","pith_extraction_headline":"Prepending a small set of prologue tokens trained only on AR loss decouples generation from reconstruction in autoregressive image models."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.06137/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"claim_evidence","ran_at":"2026-05-20T13:02:04.297997Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"ai_meta_artifact","ran_at":"2026-05-20T08:36:43.479893Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_title_agreement","ran_at":"2026-05-19T19:01:19.381010Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T12:55:47.107759Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"d33aa7f6baebfeac59e11662ddca406cd367af12d49edde56d32af5c3b118fc8"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2605.06137","created_at":"2026-06-01T01:03:54.501037+00:00"},{"alias_kind":"arxiv_version","alias_value":"2605.06137v2","created_at":"2026-06-01T01:03:54.501037+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2605.06137","created_at":"2026-06-01T01:03:54.501037+00:00"},{"alias_kind":"pith_short_12","alias_value":"RYU2AZN2J4GJ","created_at":"2026-06-01T01:03:54.501037+00:00"},{"alias_kind":"pith_short_16","alias_value":"RYU2AZN2J4GJBV7F","created_at":"2026-06-01T01:03:54.501037+00:00"},{"alias_kind":"pith_short_8","alias_value":"RYU2AZN2","created_at":"2026-06-01T01:03:54.501037+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":0,"internal_anchor_count":0,"sample":[]},"formal_canon":{"evidence_count":0,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/RYU2AZN2J4GJBV7FJ36RS2QDRT","json":"https://pith.science/pith/RYU2AZN2J4GJBV7FJ36RS2QDRT.json","graph_json":"https://pith.science/api/pith-number/RYU2AZN2J4GJBV7FJ36RS2QDRT/graph.json","events_json":"https://pith.science/api/pith-number/RYU2AZN2J4GJBV7FJ36RS2QDRT/events.json","paper":"https://pith.science/paper/RYU2AZN2"},"agent_actions":{"view_html":"https://pith.science/pith/RYU2AZN2J4GJBV7FJ36RS2QDRT","download_json":"https://pith.science/pith/RYU2AZN2J4GJBV7FJ36RS2QDRT.json","view_paper":"https://pith.science/paper/RYU2AZN2","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2605.06137&json=true","fetch_graph":"https://pith.science/api/pith-number/RYU2AZN2J4GJBV7FJ36RS2QDRT/graph.json","fetch_events":"https://pith.science/api/pith-number/RYU2AZN2J4GJBV7FJ36RS2QDRT/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/RYU2AZN2J4GJBV7FJ36RS2QDRT/action/timestamp_anchor","attest_storage":"https://pith.science/pith/RYU2AZN2J4GJBV7FJ36RS2QDRT/action/storage_attestation","attest_author":"https://pith.science/pith/RYU2AZN2J4GJBV7FJ36RS2QDRT/action/author_attestation","sign_citation":"https://pith.science/pith/RYU2AZN2J4GJBV7FJ36RS2QDRT/action/citation_signature","submit_replication":"https://pith.science/pith/RYU2AZN2J4GJBV7FJ36RS2QDRT/action/replication_record"}},"created_at":"2026-06-01T01:03:54.501037+00:00","updated_at":"2026-06-01T01:03:54.501037+00:00"}