SOCKET: SOft Collision Kernel EsTimator for Sparse Attention

2026 · cs.LG · arXiv 2602.06283

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Exploiting sparsity during long-context inference is key to scaling large language models, as attention dominates the cost of autoregressive decoding. Sparse attention reduces this cost by restricting computation to a subset of tokens, but its effectiveness depends on efficient scoring and selection at inference time. We revisit Locality-Sensitive Hashing (LSH) and introduce SOCKET, a SOft Collision Kernel EsTimator that replaces hard bucket matches with probabilistic, similarity-aware aggregation. Traditional LSH yields binary collision signals that limit ranking quality and require substantial memory to perform well. In contrast, soft LSH accumulates graded collision evidence across hash tables, preserving top-k ordering with significantly less memory. This reframes LSH from a candidate generator into a principled scoring kernel for sparse attention. Leveraging this property, SOCKET enables efficient token selection without ad hoc voting and matches or surpasses prior sparse attention methods across multiple long-context benchmarks. With a custom CUDA scoring kernel and a Flash Decode Triton backend, SOCKET achieves up to 1.5$\times$ higher throughput than FlashAttention.
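The contrast between hard and soft collisions can be made concrete with a small sketch. The abstract does not spell out SOCKET's estimator, so everything below is an assumption made for illustration: the hash family (SimHash-style random hyperplanes), the per-bit sigmoid used as a soft match probability, the `temperature` parameter, and all function names are hypothetical and not taken from the paper. The sketch only shows why graded evidence summed across hash tables gives a finer ranking signal than counting exact bucket matches.

```python
import math
import random


def make_tables(dim, num_tables, bits, seed=0):
    """Draw random hyperplanes: num_tables tables, each with `bits` planes of length dim."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]
        for _ in range(num_tables)
    ]


def project(planes, x):
    """Signed projection of x onto each hyperplane of one table."""
    return [sum(p_i * x_i for p_i, x_i in zip(plane, x)) for plane in planes]


def sigmoid(z):
    """Numerically stable logistic function."""
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def hard_score(tables, q, k):
    """Classic LSH signal: number of tables where q and k share a bucket (all sign bits agree)."""
    score = 0
    for planes in tables:
        q_proj = project(planes, q)
        k_proj = project(planes, k)
        if all((a >= 0) == (b >= 0) for a, b in zip(q_proj, k_proj)):
            score += 1
    return score


def soft_score(tables, q, k, temperature=1.0):
    """Assumed soft variant: per table, the product over bits of a sigmoid
    'probability' that q falls on k's side of each hyperplane, summed over tables."""
    total = 0.0
    for planes in tables:
        q_proj = project(planes, q)
        k_proj = project(planes, k)
        prob = 1.0
        for a, b in zip(q_proj, k_proj):
            side = 1.0 if b >= 0 else -1.0
            prob *= sigmoid(side * a / temperature)
        total += prob
    return total


def top_k(tables, q, keys, k, score_fn):
    """Indices of the k highest-scoring keys (ties keep original order)."""
    scores = [score_fn(tables, q, key) for key in keys]
    return sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
```

With few tables, `hard_score` takes only a handful of integer values, so many keys tie; `soft_score` returns a real number in the range 0 to `num_tables` and can separate keys that never share an exact bucket with the query. A call such as `top_k(make_tables(8, 4, 6), q, keys, 2, soft_score)` selects two keys for a query `q` under these assumptions.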

representative citing papers

Inference Time Context Sparsity: Illusion or Opportunity?

cs.AI · 2026-05-22 · unverdicted · novelty 5.0

Current LLMs remain robust to high levels of inference-time context sparsity across diverse tasks, enabling up to 10x acceleration via sparse kernels.
