LOOKAT: Lookup-Optimized Key-Attention for Memory-Efficient Transformers

· 2026 · arXiv 2601.10155

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

AXELRAM: Quantize Once, Never Dequantize

cs.LG · 2026-04-03 · conditional · novelty 6.0

AXELRAM performs attention on a quantized KV cache using a fixed orthogonal-transform codebook, reducing multiplications by 102.4x and fixing sign-sensitivity spikes via gradient-free calibration.

HeadQ: Model-Visible Distortion and Score-Space Correction for KV-Cache Quantization

cs.LG · 2026-05-05 · unverdicted · novelty 5.0 · 2 refs

HeadQ applies score-space logit corrections for keys and attention-weighted surrogates for values to KV-cache quantization, removing 84-94% of excess perplexity in 2-bit key experiments across six models.

citing papers explorer

Showing 2 of 2 citing papers.

AXELRAM: Quantize Once, Never Dequantize · cs.LG · 2026-04-03 · conditional · none · ref 8
HeadQ: Model-Visible Distortion and Score-Space Correction for KV-Cache Quantization · cs.LG · 2026-05-05 · unverdicted · none · ref 24 · 2 links
