Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
Gazer uses feedback from a multimodal large language model (MLLM) in two stages: it diagnoses semantic errors in the intermediate states of an autoregressive visual model (AVM), then rewinds and rectifies the generation trajectory. This improves alignment on compositional benchmarks without any training.
citing papers explorer
- Taming the Entropy Cliff: Variable Codebook Size Quantization for Autoregressive Visual Generation
  Visual tokenizers whose codebook size increases along the token sequence significantly reduce generation FID for autoregressive models on ImageNet.
- Training-Free Semantic Correction for Autoregressive Visual Models
  Gazer uses feedback from a multimodal large language model (MLLM) in two stages: it diagnoses semantic errors in the intermediate states of an autoregressive visual model (AVM), then rewinds and rectifies the generation trajectory. This improves alignment on compositional benchmarks without any training.