SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression

Tim Dettmers, Ruslan Svirschevski, et al. · 2024

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

OSC: Hardware Efficient W4A4 Quantization via Outlier Separation in Channel Dimension

cs.LG · 2026-04-14 · unverdicted · novelty 5.0

OSC separates token-persistent outlier channels in activations into a compact high-precision tensor for dual-path 4-bit GEMM computation, limiting accuracy loss to roughly 1-2 points on Qwen3 models while delivering up to 1.78x speedup over W8A8 baselines.

citing papers explorer

OSC: Hardware Efficient W4A4 Quantization via Outlier Separation in Channel Dimension

cs.LG · 2026-04-14 · unverdicted · none · ref 9
