FASQ delivers calibration-free LLM compression with continuous size trade-offs via product quantization and custom CUDA kernels that accelerate decode beyond FP16 speeds on consumer hardware.
Awq: Activation- aware weight quantization for llm compression and acceleration
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
FASQ: Flexible Accelerated Subspace Quantization for Calibration-Free LLM Compression
FASQ delivers calibration-free LLM compression with continuous size trade-offs via product quantization and custom CUDA kernels that accelerate decode beyond FP16 speeds on consumer hardware.