FASQ delivers calibration-free LLM compression with continuous size trade-offs via product quantization and custom CUDA kernels that accelerate decode beyond FP16 speeds on consumer hardware.
WinoGrande: An adversarial Winograd schema challenge at scale. Communications of the ACM, 64(9):99–106, 2021.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
FASQ: Flexible Accelerated Subspace Quantization for Calibration-Free LLM Compression