ImageNet: A Large-Scale Hierarchical Image Database
2 Pith papers cite this work.
Fields: cs.CV (2)
Representative citing papers:
LSeg achieves competitive zero-shot semantic segmentation by contrastively aligning dense pixel embeddings from a transformer with text embeddings of class labels.
Citing papers explorer:
- Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Scaled vanilla autoregressive models based on Llama achieve 2.18 FID on ImageNet 256x256 image generation, beating popular diffusion models without visual inductive biases.
- Language-driven Semantic Segmentation
LSeg achieves competitive zero-shot semantic segmentation by contrastively aligning dense pixel embeddings from a transformer with text embeddings of class labels.