RateQuant delivers optimal mixed-precision KV cache quantization by per-quantizer distortion fitting followed by closed-form reverse waterfilling, reducing perplexity by 70% versus KIVI at 2.5 average bits on Qwen3-8B.
Kvquant: Towards 10 million context length llm inference with kv cache quantization
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
CONDITIONAL 1representative citing papers
citing papers explorer
-
RateQuant: Optimal Mixed-Precision KV Cache Quantization via Rate-Distortion Theory
RateQuant delivers optimal mixed-precision KV cache quantization by per-quantizer distortion fitting followed by closed-form reverse waterfilling, reducing perplexity by 70% versus KIVI at 2.5 average bits on Qwen3-8B.