DepthKV allocates a fixed global KV cache budget across LLM layers based on per-layer pruning sensitivity, outperforming uniform pruning at the same overall budget.
A.2 Model Licenses
We employ open-weight language models accessed via the Hugging Face Transformers library (Wolf et al., 2020).
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
DepthKV: Layer-Dependent KV Cache Pruning for Long-Context LLM Inference