OCTOPUS compresses KV caches via octahedral parametrization of rotated triplets and squared-error-optimized Lloyd-Max quantization, matching or exceeding prior rotation codecs with growing gains at low bit widths.
GPT3.int8(): 8-bit matrix multiplication for transformers at scale.Neural Information Processing Systems (NeurIPS), 35: 30318–30332, 2022
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
OCTOPUS: Optimized KV Cache for Transformers via Octahedral Parametrization Under optimal Squared error quantization
OCTOPUS compresses KV caches via octahedral parametrization of rotated triplets and squared-error-optimized Lloyd-Max quantization, matching or exceeding prior rotation codecs with growing gains at low bit widths.