Pith Number

pith:BJQSU5IA

pith:2025:BJQSU5IAHQ3UUDLLXFHSU5PRVH
not attested · not anchored · not stored · refs resolved

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Chong Ruan, Damai Dai, Huazuo Gao, Jingyang Yuan, Junyu Luo, Lean Wang, Liang Zhao, Ming Zhang, Wangding Zeng, Wenfeng Liang, Yuqing Wang, Y. X. Wei, Zhenda Xie, Zhengyan Zhang, Zhiping Xiao

NSA introduces a natively trainable sparse attention mechanism that matches full-attention performance on long contexts while delivering major speedups.

arxiv:2502.11089 v2 · 2025-02-16 · cs.CL · cs.AI · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{BJQSU5IAHQ3UUDLLXFHSU5PRVH}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Experiments show the model pretrained with NSA maintains or exceeds Full Attention models across general benchmarks, long-context tasks, and instruction-based reasoning, while achieving substantial speedups over Full Attention on 64k-length sequences across decoding, forward, and backward propagation.

C2 weakest assumption

That the dynamic hierarchical sparse strategy (coarse compression plus fine selection) preserves both global context awareness and local precision without introducing systematic biases that would degrade performance on unseen long-context distributions.

C3 one-line summary

NSA is a hardware-aligned sparse attention mechanism that enables end-to-end trainable long-context modeling by combining coarse token compression with fine-grained selection.
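The two-stage idea in the summary above (coarse compression, then fine-grained selection) can be illustrated with a toy, pure-Python sketch for a single query. This is not the paper's implementation: mean pooling as the compressor, the fixed 0.5/0.5 mix of the two branches, the absence of causal masking, and every function and parameter name (`sparse_attend`, `block`, `top_k`) are assumptions made only for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def attend(q, keys, values):
    """Plain softmax attention of one query over the given keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def mean_pool(rows):
    return [sum(r[d] for r in rows) / len(rows) for d in range(len(rows[0]))]

def sparse_attend(q, keys, values, block=4, top_k=2):
    """Toy compression + selection attention (hypothetical helper)."""
    blocks = [list(range(i, min(i + block, len(keys))))
              for i in range(0, len(keys), block)]
    # Coarse branch: one pooled key/value per block (assumed compressor).
    ck = [mean_pool([keys[i] for i in b]) for b in blocks]
    cv = [mean_pool([values[i] for i in b]) for b in blocks]
    coarse = attend(q, ck, cv)
    # Fine branch: keep only the tokens of the top_k highest-scoring blocks.
    scores = [dot(q, k) for k in ck]
    ranked = sorted(range(len(blocks)), key=lambda j: scores[j], reverse=True)
    chosen = sorted(ranked[:top_k])
    idx = [i for j in chosen for i in blocks[j]]
    fine = attend(q, [keys[i] for i in idx], [values[i] for i in idx])
    # Assumed fixed gate; a trainable model would learn this mix.
    out = [0.5 * c + 0.5 * f for c, f in zip(coarse, fine)]
    return out, idx

# Eight tokens in two blocks; the query aligns with the second block.
q = [1.0, 0.0]
keys = [[-1.0, 0.0]] * 4 + [[1.0, 0.0]] * 4
values = [[0.0, 1.0]] * 4 + [[1.0, 0.0]] * 4
out, idx = sparse_attend(q, keys, values, block=4, top_k=1)
print(idx)
```

In this toy setting the fine branch touches only `top_k * block` tokens rather than the full sequence, which is where the cost saving of a selection-based scheme would come from.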

References

65 extracted · 65 resolved · 24 Pith anchors

[10] DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model 2024 · arXiv:2405.04434
[11] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning 2025 · arXiv:2501.12948
[22] G. Kamradt. LLMTest_NeedleInAHaystack. GitHub repository, 2023. URL https://github.com/gkamradt/LLMTest_NeedleInAHaystack.
[26] J. S. Park, J. C. O'Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein. Generative agents: Interactive simulacra of human behavior. In S. Follmer, J. Han, J. Steimle, and N. H. Riche, edito 2023
[27] B. Peng, J. Quesnelle, H. Fan, and E. Shippole. YaRN: Efficient context window extension of large language models. In ICLR. OpenReview.net, 2024.

Formal links

2 machine-checked theorem links

Cited by

26 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:46.196120Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

0a612a75003c374a0d6bb94f2a75f1a9e4a56978cee08ee17fa06c7ee83a2611

Aliases

arxiv: 2502.11089 · arxiv_version: 2502.11089v2 · doi: 10.48550/arxiv.2502.11089 · pith_short_12: BJQSU5IAHQ3U · pith_short_16: BJQSU5IAHQ3UUDLL · pith_short_8: BJQSU5IA
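For this record, the Pith Number coincides with the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash, and the short aliases are prefixes of it. The sketch below checks that relationship. Whether this is the scheme's official definition is an assumption inferred from the values on this page, and the helper name `pith_from_hash` is hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "0a612a75003c374a0d6bb94f2a75f1a9e4a56978cee08ee17fa06c7ee83a2611"
PITH_NUMBER = "BJQSU5IAHQ3UUDLLXFHSU5PRVH"

def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the leading characters."""
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]

derived = pith_from_hash(CANONICAL_HASH)
print(derived == PITH_NUMBER)

# The short aliases are plain prefixes of the full number.
for n in (8, 12, 16):
    print(f"pith_short_{n}: {PITH_NUMBER[:n]}")
```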
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/BJQSU5IAHQ3UUDLLXFHSU5PRVH \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 0a612a75003c374a0d6bb94f2a75f1a9e4a56978cee08ee17fa06c7ee83a2611
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "5c3871eff56bff92b8a0a30d0db50d9978710d95f1d6dbf729b36efb96d7340f",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-02-16T11:53:44Z",
    "title_canon_sha256": "1ee096cb16e96cc1bce6e3699add8767e82e700c718bc04601268dca1b884e55"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2502.11089",
    "kind": "arxiv",
    "version": 2
  }
}
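The canonicalization step of the shell one-liner can also be run offline against the record printed above. The sketch below mirrors that one-liner's `json.dumps` arguments (sorted keys, compact separators, `ensure_ascii=False`) and hashes the UTF-8 bytes. The function name `canonical_hash` is hypothetical, and the sketch assumes the JSON shown on this page is the complete canonical record, so it prints the comparison instead of asserting it.

```python
import hashlib
import json

# Canonical record as displayed on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "5c3871eff56bff92b8a0a30d0db50d9978710d95f1d6dbf729b36efb96d7340f",
        "cross_cats_sorted": ["cs.AI", "cs.LG"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2025-02-16T11:53:44Z",
        "title_canon_sha256": "1ee096cb16e96cc1bce6e3699add8767e82e700c718bc04601268dca1b884e55",
    },
    "schema_version": "1.0",
    "source": {"id": "2502.11089", "kind": "arxiv", "version": 2},
}

# Hash stated on this page.
EXPECTED = "0a612a75003c374a0d6bb94f2a75f1a9e4a56978cee08ee17fa06c7ee83a2611"

def canonical_hash(record: dict) -> str:
    """Serialize with sorted keys and no whitespace, then SHA-256 the bytes."""
    blob = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

digest = canonical_hash(RECORD)
print(digest)
print("matches page:", digest == EXPECTED)
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror happens to store the fields.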