KVTuner: Sensitivity-aware layer-wise mixed-precision KV cache quantization for efficient and nearly lossless LLM inference

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1

citation-polarity summary

fields: cs.LG 3, cs.AR 1
years: 2026 4
verdicts: UNVERDICTED 4
roles: background 1
polarities: background 1

representative citing papers

VeriCache: Turning Lossy KV Cache into Lossless LLM Inference

cs.AR · 2026-05-17 · unverdicted · novelty 6.0

VeriCache turns lossy KV cache compression into lossless LLM inference by drafting with a compressed cache and verifying the drafts with the full cache, achieving up to 4x throughput with identical outputs.

A Simple Plug-in for Improving Eviction-Based KV Cache Compression

cs.LG · 2026-05-22 · unverdicted · novelty 4.0

VECTOR augments eviction-based KV cache compression with three-way token routing that combines importance scoring and offline regression-based reconstructability estimation to improve quality at high compression ratios.
