Stability Implies Redundancy: Delta Attention Selective Halting for Efficient Long-Context Prefilling

2026 · cs.AI · arXiv 2604.18103

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Prefill computation is a significant bottleneck for Large Language Models (LLMs) and Large Multimodal Models (LMMs) in long-context settings. While token pruning reduces sequence length, prior methods rely on heuristics that break compatibility with hardware-efficient kernels such as FlashAttention. In this work, we observe that tokens evolve toward semantic fixing points, making further processing redundant. Building on this observation, we introduce Delta Attention Selective Halting (DASH), a training-free policy that monitors the layer-wise update dynamics of the self-attention mechanism to selectively halt stabilized tokens. Extensive evaluation confirms that DASH generalizes across language and vision benchmarks, delivering significant prefill speedups while preserving model accuracy and hardware efficiency. Code will be released at https://github.com/verach3n/DASH.git.
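
The halting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the stability metric, the threshold, or the schedule, so the relative L2 change, the `threshold` value of 0.05, and the names `relative_update` and `select_active` are all assumptions made for illustration. The sketch compares each token's self-attention output at two consecutive layers and keeps only the tokens that are still changing.

```python
import math

def relative_update(prev, curr):
    """Relative L2 change of one token's attention output between two layers.

    Assumed metric for illustration; the abstract does not define one.
    """
    diff = math.sqrt(sum((c - p) ** 2 for p, c in zip(prev, curr)))
    base = math.sqrt(sum(p * p for p in prev))
    return diff / (base + 1e-8)  # small epsilon avoids division by zero

def select_active(prev_attn, curr_attn, active, threshold=0.05):
    """Return the token indices that should keep being processed.

    prev_attn, curr_attn: dict mapping token index -> attention output
        vector at layer l-1 and layer l (hypothetical data layout).
    active: token indices that have not been halted yet.
    threshold: hypothetical stability cutoff; tokens whose relative
        update falls below it are treated as stabilized and halted.
    """
    return [
        i for i in active
        if relative_update(prev_attn[i], curr_attn[i]) >= threshold
    ]

# Toy example with three tokens and 2-dimensional attention outputs.
prev = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
curr = {0: [1.0, 0.01], 1: [0.5, 1.0], 2: [1.0, 1.0]}
still_active = select_active(prev, curr, [0, 1, 2])
print(still_active)
```

In this toy input, tokens 0 and 2 barely change between layers and are halted, while token 1 continues to the next layer. In a real model the surviving tokens would be gathered into a shorter dense sequence before the next attention call; whether DASH does exactly this is not stated in the abstract.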

representative citing papers

STaR-KV: Spatio-Temporal Adaptive Re-weighting for KV Cache Compression in GUI Vision-Language Models

cs.CV · 2026-06-01 · unverdicted · novelty 6.0

STaR-KV is a training-free KV cache compression framework for GUI VLMs that uses subspace-aware scoring, temporal stability discounts, and entropy-based temperature adaptation to outperform prior methods at matched budgets while reducing peak memory by ~40% at 20% cache size.
