LOOKAT: Lookup-Optimized Key-Attention for Memory-Efficient Transformers
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.LG (2)
Years: 2026 (2)
citing papers explorer
- AXELRAM: Quantize Once, Never Dequantize
AXELRAM performs attention on quantized KV cache using a fixed orthogonal-transform codebook, reducing multiplications by 102.4x and fixing sign-sensitivity spikes via gradient-free calibration.
- HeadQ: Model-Visible Distortion and Score-Space Correction for KV-Cache Quantization
HeadQ applies score-space logit corrections for keys and attention-weighted surrogates for values to KV-cache quantization, removing 84-94% of excess perplexity in 2-bit key experiments across six models.