OSC separates token-persistent outlier channels in activations into a compact high-precision tensor for dual-path 4-bit GEMM computation, limiting accuracy loss to roughly 1-2 points on Qwen3 models while delivering up to 1.78x speedup over W8A8 baselines.
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
OSC: Hardware Efficient W4A4 Quantization via Outlier Separation in Channel Dimension
OSC separates token-persistent outlier channels in activations into a compact high-precision tensor for dual-path 4-bit GEMM computation, limiting accuracy loss to roughly 1-2 points on Qwen3 models while delivering up to 1.78x speedup over W8A8 baselines.