Hugging Face

RaaS: Reasoning-Aware Attention Sparsity for Efficient LLM Reasoning · arXiv 2502.11147

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

DELTA: Dynamic Layer-Aware Token Attention for Efficient Long-Context Reasoning

cs.CL · 2025-10-10 · conditional · novelty 7.0

DELTA partitions layers into full, delta, and sparse groups to select salient tokens via aggregated attention scores, matching full-attention accuracy on AIME and GPQA while cutting attended tokens up to 4.25x and achieving 1.54x speedup.

CSAttention: Centroid-Scoring Attention for Accelerating LLM Inference

cs.LG · 2026-03-30 · unverdicted · novelty 6.0

CSAttention precomputes fixed-size query-centric lookup tables in offline prefill to enable fast table-lookup decoding, delivering near-identical accuracy to full attention and up to 4.6x speedup at 95% sparsity for 32K-128K contexts.

citing papers explorer

Showing 2 of 2 citing papers.

DELTA: Dynamic Layer-Aware Token Attention for Efficient Long-Context Reasoning · cs.CL · 2025-10-10 · conditional · none · ref 13
CSAttention: Centroid-Scoring Attention for Accelerating LLM Inference · cs.LG · 2026-03-30 · unverdicted · none · ref 5
