Efficiency Meets Fidelity: A Novel Quantization Framework for Stable Diffusion
read the original abstract
Text-to-image generation via Stable Diffusion models (SDM) have demonstrated remarkable capabilities. However, their computational intensity, particularly in the iterative denoising process, hinders real-time deployment in latency-sensitive applications. While Recent studies have explored post-training quantization (PTQ) and quantization-aware training (QAT) methods to compress Diffusion models, existing methods often overlook the consistency between results generated by quantized models and those from floating-point models. This consistency is paramount for professional applications where both efficiency and output reliability are essential. To ensure that quantized SDM generates high-quality and consistent images, we propose an efficient quantization framework for SDM. Our framework introduces a Serial-to-Parallel pipeline that simultaneously maintains training-inference consistency and ensures optimization stability. Building upon this foundation, we further develop several techniques including multi-timestep activation quantization, time information precalculation, inter-layer distillation, and selective freezing, to achieve high-fidelity generation in comparison to floating-point models while maintaining quantization efficiency. Through comprehensive evaluation across multiple Stable Diffusion variants (v1-4, v2-1, XL 1.0, and v3), our method demonstrates superior performance over state-of-the-art approaches with shorter training times. Under W4A8 quantization settings, we achieve significant improvements in both distribution similarity and visual fidelity, while preserving a high image quality.
This paper has not been read by Pith yet.
Forward citations
Cited by 2 Pith papers
-
Holding the FP8 Quality Ceiling at 8-Bit Weights and Activations: INT8 and GGUF Post-Training Quantization of Ideogram 4.0 for Consumer GPUs
INT8 W8A8 post-training quantization of Ideogram 4.0 preserves FP8 quality on a 200-prompt benchmark while outperforming NF4 on CLIP score and offering a favorable quality-memory trade-off via GGUF Q4_K.
-
Smaller Models, Unexpected Costs: Trade-offs in LLM Quantization for Automated Program Repair
Empirical evaluation of 13 quantization configurations on 6 LLMs for APR shows reduced memory (up to 85%) but increased inference time/energy, different repaired problem sets with little overlap, and 48% of configs st...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.