Randomized autoregressive visual generation

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: method 1

citation-polarity summary

fields: cs.CV 4

years: 2026 (2), 2025 (2)

roles: method 1

polarities: use method 1

representative citing papers

Autoregressive Visual Generation Needs a Prologue

cs.CV · 2026-05-07 · unverdicted · novelty 7.0

Prologue introduces dedicated prologue tokens to decouple generation and reconstruction in AR visual models, significantly improving generation FID scores on ImageNet while maintaining reconstruction quality.

Distilling Specialized Orders for Visual Generation

cs.CV · 2025-04-23 · unverdicted · novelty 7.0

OAR distills specialized generation orders from any-order AR models via self-distillation, improving FID from 2.39 to 2.17 on ImageNet 256x256 while preserving multi-task flexibility.

Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation

cs.CV · 2025-05-08 · unverdicted · novelty 6.0

Mogao presents a causal unified model with deep fusion, dual encoders, and interleaved position embeddings that achieves strong performance on multi-modal understanding, text-to-image generation, and coherent interleaved outputs including zero-shot editing.

citing papers explorer

Showing 4 of 4 citing papers.

  • Does Engram Do Memory Retrieval in Autoregressive Image Generation? cs.CV · 2026-05-13 · accept · none · ref 18

    Engram in AR image generation saves backbone FLOPs but trails pure AR baselines in FID and behaves as a gated side-pathway rather than a content-addressed retriever.

  • Autoregressive Visual Generation Needs a Prologue cs.CV · 2026-05-07 · unverdicted · none · ref 57

    Prologue introduces dedicated prologue tokens to decouple generation and reconstruction in AR visual models, significantly improving generation FID scores on ImageNet while maintaining reconstruction quality.

  • Distilling Specialized Orders for Visual Generation cs.CV · 2025-04-23 · unverdicted · none · ref 14

    OAR distills specialized generation orders from any-order AR models via self-distillation, improving FID from 2.39 to 2.17 on ImageNet 256x256 while preserving multi-task flexibility.

  • Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation cs.CV · 2025-05-08 · unverdicted · none · ref 92

    Mogao presents a causal unified model with deep fusion, dual encoders, and interleaved position embeddings that achieves strong performance on multi-modal understanding, text-to-image generation, and coherent interleaved outputs including zero-shot editing.