pith:BJQSU5IA
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
NSA introduces a natively trainable sparse attention mechanism that matches full-attention performance on long contexts while delivering major speedups.
arxiv:2502.11089 v2 · 2025-02-16 · cs.CL · cs.AI · cs.LG
Add to your LaTeX paper
```latex
\usepackage{pith}
\pithnumber{BJQSU5IAHQ3UUDLLXFHSU5PRVH}
```
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Experiments show the model pretrained with NSA maintains or exceeds Full Attention models across general benchmarks, long-context tasks, and instruction-based reasoning, while achieving substantial speedups over Full Attention on 64k-length sequences across decoding, forward, and backward propagation.
These results rest on the assumption that the dynamic hierarchical sparse strategy (coarse compression plus fine selection) preserves both global context awareness and local precision without introducing systematic biases that would degrade performance on unseen long-context distributions.
NSA is a hardware-aligned sparse attention mechanism that enables end-to-end trainable long-context modeling by combining coarse token compression with fine-grained selection.
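The coarse-compression-plus-fine-selection idea in the claims above can be illustrated with a minimal single-query sketch in plain Python. It is not the NSA implementation: it assumes mean pooling in place of the learned block compression, ranks blocks by the query's attention scores over the compressed keys, and mixes the two branches with fixed 0.5 gates. The published method also has a sliding-window branch and learned gating, both omitted here. All function names (`hierarchical_sparse_attention`, `mean_pool`, `attend`) and parameters (`block`, `top_k`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Standard scaled dot-product attention for a single query."""
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def mean_pool(vecs: List[Vector]) -> Vector:
    # Assumption: mean pooling stands in for NSA's learned block compression.
    n = len(vecs)
    return [sum(v[j] for v in vecs) / n for j in range(len(vecs[0]))]


def hierarchical_sparse_attention(
    q: Vector, keys: List[Vector], values: List[Vector], block: int = 4, top_k: int = 2
) -> Tuple[Vector, List[int]]:
    """Hypothetical helper: coarse compression plus fine block selection for one query."""
    starts = range(0, len(keys), block)
    # 1. Coarse branch: compress each block of keys/values into one token, attend to all of them.
    comp_k = [mean_pool(keys[s:s + block]) for s in starts]
    comp_v = [mean_pool(values[s:s + block]) for s in starts]
    coarse_out = attend(q, comp_k, comp_v)

    # 2. Block importance: reuse the query's attention scores over the compressed keys.
    scale = 1.0 / math.sqrt(len(q))
    importance = softmax([dot(q, k) * scale for k in comp_k])
    ranked = sorted(range(len(comp_k)), key=lambda b: importance[b], reverse=True)
    chosen = sorted(ranked[:top_k])

    # 3. Fine branch: full-resolution attention over tokens inside the chosen blocks only.
    idx = [t for b in chosen for t in range(b * block, min((b + 1) * block, len(keys)))]
    fine_out = attend(q, [keys[t] for t in idx], [values[t] for t in idx])

    # 4. Combine branches. Assumption: fixed equal gates instead of learned per-query gates.
    return [0.5 * c + 0.5 * f for c, f in zip(coarse_out, fine_out)], chosen


if __name__ == "__main__":
    q = [1.0, 0.0]
    keys = [[0.0, 1.0]] * 4 + [[-1.0, 0.0]] * 4 + [[1.0, 0.0]] * 4
    values = [[float(t), 1.0] for t in range(12)]
    out, chosen = hierarchical_sparse_attention(q, keys, values, block=4, top_k=2)
    print(chosen)  # prints [0, 2]: the block aligned with q plus the next-best block
```

In this sketch each query touches roughly `n / block` compressed tokens plus `top_k * block` selected tokens instead of all `n` keys, which is where the savings come from; the hardware-aligned kernel design the paper describes is outside its scope. With `block=1` and `top_k` equal to the sequence length, both branches reduce to full attention.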
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:46.196120Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
0a612a75003c374a0d6bb94f2a75f1a9e4a56978cee08ee17fa06c7ee83a2611
Verify this Pith Number yourself
```shell
curl -sH 'Accept: application/ld+json' https://pith.science/pith/BJQSU5IAHQ3UUDLLXFHSU5PRVH \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 0a612a75003c374a0d6bb94f2a75f1a9e4a56978cee08ee17fa06c7ee83a2611
```
Canonical record JSON
```json
{
  "metadata": {
    "abstract_canon_sha256": "5c3871eff56bff92b8a0a30d0db50d9978710d95f1d6dbf729b36efb96d7340f",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-02-16T11:53:44Z",
    "title_canon_sha256": "1ee096cb16e96cc1bce6e3699add8767e82e700c718bc04601268dca1b884e55"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2502.11089",
    "kind": "arxiv",
    "version": 2
  }
}
```
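The `curl | jq | python3` verification pipeline can also be written as a small Python module, which makes the canonicalization rule explicit: keys sorted at every level, no whitespace around separators, non-ASCII characters kept as UTF-8 rather than escaped, then SHA-256 over the resulting bytes. This sketch hashes the canonical record as displayed on this page and assumes that copy is identical to the `canonical_record` the endpoint serves; the function names `canonical_bytes` and `canonical_hash` are illustrative, not part of any Pith tooling.

```python
import hashlib
import json

# Canonical record as displayed on this page (assumed identical to the served one).
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "5c3871eff56bff92b8a0a30d0db50d9978710d95f1d6dbf729b36efb96d7340f",
        "cross_cats_sorted": ["cs.AI", "cs.LG"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2025-02-16T11:53:44Z",
        "title_canon_sha256": "1ee096cb16e96cc1bce6e3699add8767e82e700c718bc04601268dca1b884e55",
    },
    "schema_version": "1.0",
    "source": {"id": "2502.11089", "kind": "arxiv", "version": 2},
}

# Published canonical hash for this record.
EXPECTED = "0a612a75003c374a0d6bb94f2a75f1a9e4a56978cee08ee17fa06c7ee83a2611"


def canonical_bytes(obj) -> bytes:
    """Same serialization as the one-liner: sorted keys, compact separators, UTF-8."""
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_hash(obj) -> str:
    """Hex-encoded SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


if __name__ == "__main__":
    digest = canonical_hash(RECORD)
    print(digest)
    print("match" if digest == EXPECTED else "mismatch")
```

Because keys are sorted before hashing, the digest does not depend on the order in which the server (or `jq`) emits fields, only on the record's content.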