XAttention: Block Sparse Attention with Antidiagonal Scoring
Two papers indexed on Pith cite this work; both are from 2026. Neither citation has a verdict yet, as polarity classification is still in progress.
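The cited work's title names its core idea: score blocks of the attention matrix by their antidiagonal entries, then compute attention only on the selected blocks. The sketch below illustrates that idea under simplifying assumptions that are not taken from the paper: a single head, a precomputed square logit matrix with no causal mask, a sequence length divisible by the block size, a block size divisible by the stride, and a cumulative-mass threshold `tau` for choosing key blocks per query block. All function names are hypothetical. This is not the XAttention kernel, and a practical implementation would estimate block scores without materializing the full logit matrix.

```python
# Simplified sketch (assumptions): single head, full n x n logits given,
# no causal mask, threshold-based block selection. Not the XAttention
# kernel; all names here are hypothetical.
import math


def antidiagonal_tile_scores(logits, stride):
    """Reduce an n x n logit matrix to (n/stride) x (n/stride) by summing,
    inside each stride x stride tile, the entries on the tile's antidiagonal
    (local_row + local_col == stride - 1)."""
    n = len(logits)
    m = n // stride
    reduced = [[0.0] * m for _ in range(m)]
    for ti in range(m):
        for tj in range(m):
            reduced[ti][tj] = sum(
                logits[ti * stride + r][tj * stride + (stride - 1 - r)]
                for r in range(stride)
            )
    return reduced


def softmax(row):
    # Numerically stable softmax over one row.
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def block_scores(logits, block, stride):
    """Score each (query block, key block) pair; each query-block row sums to 1."""
    n = len(logits)
    assert n % block == 0 and block % stride == 0, "sketch assumes divisible sizes"
    probs = [softmax(row) for row in antidiagonal_tile_scores(logits, stride)]
    tiles = block // stride  # tiles per block along each axis
    nb = n // block
    scores = [[0.0] * nb for _ in range(nb)]
    for i, row in enumerate(probs):
        for j, p in enumerate(row):
            scores[i // tiles][j // tiles] += p
    # Each tile row sums to 1, so each block row sums to `tiles`; renormalize.
    return [[s / tiles for s in row] for row in scores]


def select_blocks(scores, tau):
    """Per query block, keep the fewest key blocks whose scores reach `tau`."""
    mask = []
    for row in scores:
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        keep, mass = set(), 0.0
        for j in order:
            keep.add(j)
            mass += row[j]
            if mass >= tau:
                break
        mask.append([j in keep for j in range(len(row))])
    return mask


# Toy example: 8 tokens whose logits are high only inside 4-token groups.
n, block, stride = 8, 4, 2
logits = [[5.0 if i // block == j // block else 0.0 for j in range(n)] for i in range(n)]
scores = block_scores(logits, block, stride)
mask = select_blocks(scores, tau=0.9)
print(mask)  # [[True, False], [False, True]]
```

Each tile's antidiagonal touches every row and every column of the tile exactly once, so a tile holding a strong single-column (vertical) pattern still contributes to its block's score. The threshold `tau` trades sparsity against coverage: raising it keeps more key blocks per query block.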
Citing papers:
- CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection
  CompactAttention accelerates chunked-prefill attention via Block-Union KV Selection, delivering up to a 2.72x speedup at 128K context on LLaMA-3.1-8B while matching dense-attention accuracy on RULER.
- Flux Attention: Context-Aware Hybrid Attention for Efficient LLMs Inference
  Flux Attention uses a context-aware Layer Router to dynamically assign full or sparse attention to each LLM layer, achieving up to 2.8x prefill and 2.0x decode speedups with competitive performance on long-context and reasoning tasks.