A new shared video-image tokenizer enables large language models to surpass diffusion models on standard visual generation benchmarks.
A style-based generator architecture for generative adversarial networks
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CV 2representative citing papers
RobuQ delivers the first stable DiT image generation at W1.58A2 average bits via Hadamard-based robust activation quantization and layer-wise mixed-precision activations.
citing papers explorer
-
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
A new shared video-image tokenizer enables large language models to surpass diffusion models on standard visual generation benchmarks.
-
RobuQ: Pushing DiTs to W1.58A2 via Robust Activation Quantization
RobuQ delivers the first stable DiT image generation at W1.58A2 average bits via Hadamard-based robust activation quantization and layer-wise mixed-precision activations.