AsymKV: Enabling 1-Bit Quantization of KV Cache with Layer-Wise Asymmetric Quantization Configurations

Tao, Q · 2024 · arXiv 2410.13212

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference

cs.LG · 2026-04-27 · conditional · novelty 6.0

A single shared asymmetrically compressed KV cache pool enables up to 15 concurrent LLM agents with 2.91x compression, 97.7% memory reduction, and only +0.57% perplexity increase on Llama-3-8B.

HeadQ: Model-Visible Distortion and Score-Space Correction for KV-Cache Quantization

cs.LG · 2026-05-05 · unverdicted · novelty 5.0 · 2 refs

HeadQ applies score-space logit corrections for keys and attention-weighted surrogates for values to KV-cache quantization, removing 84-94% of excess perplexity in 2-bit key experiments across six models.

citing papers explorer

PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference
cs.LG · 2026-04-27 · conditional · none · ref 3
HeadQ: Model-Visible Distortion and Score-Space Correction for KV-Cache Quantization
cs.LG · 2026-05-05 · unverdicted · none · ref 16 · 2 links
