Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time. arXiv preprint arXiv:2305.17118

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · dataset: 1

years

2026: 10 · 2023: 1

representative citing papers

RoPE-Aware Bit Allocation for KV-Cache Quantization

cs.LG · 2026-06-23 · unverdicted · novelty 7.0

Block-GTQ performs RoPE-aware greedy bit allocation on KV caches using per-block energy scores, cutting logit MAE by 32-80% versus uniform TQ-MSE and substantially raising long-context task scores at 2-3 bits per dimension.

Sparse Prefix Caching for Hybrid and Recurrent LLM Serving

cs.LG · 2026-04-17 · unverdicted · novelty 7.0

Sparse prefix caching uses dynamic programming to place checkpoints optimally under overlap distributions, improving the Pareto frontier for recurrent and hybrid LLM serving on shared-prefix data.

Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs

cs.CL · 2023-10-03 · conditional · novelty 6.0

FastGen adaptively compresses LLM KV caches via lightweight attention profiling: evicting long-range contexts on local heads, non-special tokens on special-token heads, and retaining full caches on broad-attention heads, yielding substantial memory savings with negligible quality loss.

KV-RM: Regularizing KV-Cache Movement for Static-Graph LLM Serving

cs.AR · 2026-05-10 · unverdicted · novelty 5.0 · 2 refs

KV-RM regularizes KV-cache movement via block paging and coalesced transfers to improve throughput, tail latency, and memory efficiency in static-graph LLM serving without changing the decoder interface.
