Engram in AR image generation saves backbone FLOPs but trails pure AR baselines in FID and behaves as a gated side-pathway rather than a content-addressed retriever.
Language model beats diffusion - tokenizer is key to visual generation
4 Pith papers cite this work. Polarity classification is still indexing.
4
Pith papers citing it
citation-role summary
method 1
citation-polarity summary
fields
cs.CV 4roles
method 1polarities
use method 1representative citing papers
OAR distills specialized generation orders from any-order AR models via self-distillation, improving FID from 2.39 to 2.17 on ImageNet 256x256 while preserving multi-task flexibility.
Mogao presents a causal unified model with deep fusion, dual encoders, and interleaved position embeddings that achieves strong performance on multi-modal understanding, text-to-image generation, and coherent interleaved outputs including zero-shot editing.