
Scissorhands: Exploiting the persistence of importance hypothesis for LLM KV cache compression at test time. Neural Information Processing Systems, 36:52342–52364

8 Pith papers cite this work. Polarity classification is still indexing.

citation summary

years: 2026 (7), 2023 (1)

roles: dataset (1)

polarities: use dataset (1)

representative citing papers

Sparse Prefix Caching for Hybrid and Recurrent LLM Serving

cs.LG · 2026-04-17 · unverdicted · novelty 7.0

Sparse prefix caching via dynamic programming for optimal checkpoint placement under overlap distributions improves the Pareto frontier for recurrent and hybrid LLM serving on shared-prefix data.

Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs

cs.CL · 2023-10-03 · conditional · novelty 6.0

FastGen adaptively compresses LLM KV caches via lightweight attention profiling: evicting long-range contexts on local heads, non-special tokens on special-token heads, and retaining full caches on broad-attention heads, yielding substantial memory savings with negligible quality loss.
