GemFilter: Discovering Gems in Early Layers for Accelerated Long-Context LLMs
3 Pith papers cite this work.
citing papers explorer
- FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration
  FastKV decouples prefill context reduction via Token-Selective Propagation from independent KV cache selection, delivering up to 1.82x prefill and 2.87x decoding speedups while matching decoding-only accuracy.
- StructKV: Preserving the Structural Skeleton for Scalable Long-Context Inference
  StructKV compresses LLM KV caches by tracking global in-degree centrality across network depth and dynamically selecting compression layers to preserve long-range dependencies better than local pruning methods.
- Correctness-Aware Repository Filtering Under Maximum Effective Context Window Constraints
  A pre-execution size filter cuts repository tokens by 80-89% at sub-millisecond cost and raises file-level accuracy from 25% to 72% in a small CodeLlama evaluation.