In-context KV-cache eviction for LLMs via Attention-Gate

Zeng, Z · arXiv 2410.12876

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

representative citing papers

HARD-KV: Head-Adaptive Regularization for Decoding-time KV Compression

cs.LG · 2026-06-27 · unverdicted · novelty 5.0

HARD-KV bridges dynamic head-adaptive KV cache compression with static inference engine constraints via Cascade Cache and Logits Calibration, reporting up to 2x throughput gains on long-context math benchmarks.

citing papers explorer

Showing 1 of 1 citing paper.

HARD-KV: Head-Adaptive Regularization for Decoding-time KV Compression cs.LG · 2026-06-27 · unverdicted · none · ref 33
