pith. sign in

Title resolution pending

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

citation-role summary

background 2

citation-polarity summary

years

2026 5

roles

background 2

polarities

background 2

clear filters

representative citing papers

EntmaxKV: Support-Aware Decoding for Entmax Attention

cs.LG · 2026-05-20 · conditional · novelty 8.0

EntmaxKV enables exact sparse KV-cache decoding for entmax attention via support-aware page selection and a Gaussian threshold estimator, matching full attention quality at a fraction of the cache size with up to 5.43x speedup.

Scaling Limits of Long-Context Transformers

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

For uniform keys on the d-dimensional sphere, softmax attention becomes selective at inverse temperature scaling β_n* ≍ n^{2/(d-1)}, with explicit limiting laws for attention weights and outputs in each regime.

Sparse Attention as Compact Kernel Regression

cs.LG · 2026-01-30 · unverdicted · novelty 8.0

Sparse attention arises from compact kernel regression, with Epanechnikov and similar kernels mapping to normalized ReLU, sparsemax, and alpha-entmax attention.

citing papers explorer

Showing 4 of 4 citing papers after filters.