OCTOPUS compresses KV caches via octahedral parametrization of rotated triplets and squared-error-optimized Lloyd-Max quantization, matching or exceeding prior rotation codecs with growing gains at low bit widths.
Atom: Low-bit quantization for efficient and accurate LLM serving
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
OCTOPUS: Optimized KV Cache for Transformers via Octahedral Parametrization Under optimal Squared error quantization
OCTOPUS compresses KV caches via octahedral parametrization of rotated triplets and squared-error-optimized Lloyd-Max quantization, matching or exceeding prior rotation codecs with growing gains at low bit widths.