Image tokenizer needs post-training. arXiv preprint arXiv:2509.12474
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields: cs.CV (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: background (1)
polarities: unclear (1)
representative citing papers
Qwen-Image-VAE-2.0 achieves state-of-the-art high-compression image reconstruction and superior diffusability for diffusion models, with a new text-rich document benchmark.
VibeToken enables autoregressive image generation at arbitrary resolutions using 64 tokens for 1024x1024 images with 3.94 gFID, constant 179G FLOPs, and better efficiency than diffusion or fixed AR baselines.
citing papers explorer
- Aligning Latent Geometry for Spherical Flow Matching in Image Generation
  Projecting VAE latents to a fixed spherical radius and replacing linear interpolation with spherical linear interpolation improves class-conditional ImageNet-256 FID while leaving the diffusion architecture unchanged.
- Qwen-Image-VAE-2.0 Technical Report
  Qwen-Image-VAE-2.0 achieves state-of-the-art high-compression image reconstruction and superior diffusability for diffusion models, with a new text-rich document benchmark.
- VibeToken: Scaling 1D Image Tokenizers and Autoregressive Models for Dynamic Resolution Generations
  VibeToken enables autoregressive image generation at arbitrary resolutions using 64 tokens for 1024x1024 images with 3.94 gFID, constant 179G FLOPs, and better efficiency than diffusion or fixed AR baselines.