GemFilter: Discovering Gems in Early Layers for Accelerated Long-Context LLMs

Zhenmei Shi, Yifei Ming, Xuan-Phi Nguyen, Yingyu Liang, Shafiq Joty · 2024 · arXiv 2409.17422

3 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration

cs.LG · 2025-02-03 · unverdicted · novelty 7.0

FastKV decouples prefill context reduction via Token-Selective Propagation from independent KV cache selection, delivering up to 1.82x prefill and 2.87x decoding speedups while matching decoding-only accuracy.

StructKV: Preserving the Structural Skeleton for Scalable Long-Context Inference

cs.CL · 2026-04-08 · unverdicted · novelty 6.0

StructKV compresses LLM KV caches by tracking global in-degree centrality across network depth and dynamically selecting compression layers to preserve long-range dependencies better than local pruning methods.

Correctness-Aware Repository Filtering Under Maximum Effective Context Window Constraints

cs.SE · 2026-05-14 · unverdicted · novelty 5.0

A pre-execution size filter cuts repository tokens by 80-89% at sub-millisecond cost and raises file-level accuracy from 25% to 72% in a small CodeLlama evaluation.

citing papers explorer


FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration

cs.LG · 2025-02-03 · unverdicted · none · ref 29

FastKV decouples prefill context reduction via Token-Selective Propagation from independent KV cache selection, delivering up to 1.82x prefill and 2.87x decoding speedups while matching decoding-only accuracy.

StructKV: Preserving the Structural Skeleton for Scalable Long-Context Inference

cs.CL · 2026-04-08 · unverdicted · none · ref 15

StructKV compresses LLM KV caches by tracking global in-degree centrality across network depth and dynamically selecting compression layers to preserve long-range dependencies better than local pruning methods.

Correctness-Aware Repository Filtering Under Maximum Effective Context Window Constraints

cs.SE · 2026-05-14 · unverdicted · none · ref 19

A pre-execution size filter cuts repository tokens by 80-89% at sub-millisecond cost and raises file-level accuracy from 25% to 72% in a small CodeLlama evaluation.
