pith:TVUCJQGM
When Attention Sink Emerges in Language Models: An Empirical View
Attention sinks in language models emerge from softmax normalization and act as key biases storing non-informative scores.
arxiv:2410.10781 v2 · 2024-10-14 · cs.CL · cs.AI · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{TVUCJQGMRXM3TGG3S374KNKC5O}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
We find that attention sink acts more like key biases, storing extra attention scores, which could be non-informative and not contribute to the value computation. We also observe that this phenomenon (at least partially) stems from tokens' inner dependence on attention scores as a result of softmax normalization. After relaxing such dependence by replacing softmax attention with other attention operations, such as sigmoid attention without normalization, attention sinks do not emerge in LMs up to 1B parameters.
This rests on the assumption that the absence of attention sinks observed with sigmoid attention in models up to 1B parameters will hold for larger models and will not degrade overall language modeling performance or capabilities.
Attention sinks emerge in language models from softmax-induced token dependence on attention scores and do not appear when using sigmoid attention without normalization in models up to 1B parameters.
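The normalization argument in the claims above can be illustrated with a toy calculation. This is a minimal sketch, not the paper's experimental setup: the scores are hypothetical, and `softmax` and `sigmoid` are small helpers written for this example. It shows that softmax must distribute a total weight of 1 across the keys even when every score is low, whereas an unnormalized sigmoid can leave all weights near zero.

```python
import math

def softmax(scores):
    """Normalize scores so the weights are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(scores):
    """Squash each score independently into (0, 1); no normalization."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]

# Hypothetical scores of one query against four keys (assumption for
# illustration): all strongly negative, so the query matches none of them.
scores = [-6.0, -8.0, -8.0, -8.0]

soft = softmax(scores)
sig = sigmoid(scores)

print("softmax weights:", [round(w, 3) for w in soft], "sum =", round(sum(soft), 3))
print("sigmoid weights:", [round(w, 4) for w in sig], "sum =", round(sum(sig), 4))
```

Under softmax the first key absorbs most of the weight only because it is the least bad option, which is the kind of non-informative score the claims describe. Under sigmoid each weight depends on its own score alone, so no key is forced to take up the slack.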
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:47.095115Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
9d6824c0cc8dd9b998db96ffc53542eba5f42e8716777a66b04f72875b43c9d0
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/TVUCJQGMRXM3TGG3S374KNKC5O \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 9d6824c0cc8dd9b998db96ffc53542eba5f42e8716777a66b04f72875b43c9d0
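The canonicalization step in the command above (sorted keys, compact separators, UTF-8 encoding, then SHA-256) can also be written as a small standalone function. This is a sketch under stated assumptions: `canonical_hash` is a name chosen for this example, not part of any Pith tooling, and the sample dictionaries are only a fragment of the record, so their digest is not expected to equal the canonical hash listed above.

```python
import hashlib
import json

def canonical_hash(record):
    """SHA-256 hex digest of a record serialized with sorted keys and compact separators."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()  # str.encode() defaults to UTF-8
    return hashlib.sha256(payload).hexdigest()

# Illustrative fragment using the "source" fields shown on this page.
# It is NOT the full canonical record, so its hash differs from the one above.
a = {"id": "2410.10781", "kind": "arxiv", "version": 2}
b = {"version": 2, "kind": "arxiv", "id": "2410.10781"}  # same data, different key order

digest_a = canonical_hash(a)
digest_b = canonical_hash(b)
print(digest_a == digest_b)  # key order does not affect the digest
```

Sorting keys and fixing the separators is what makes the digest reproducible: two parties who hold the same data but serialize it in a different key order or with different whitespace still arrive at the same bytes before hashing.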
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "7c80d83b8a3966af2e289441d1d38931b96ecd5a368fa4d60804305042bffc1e",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2024-10-14T17:50:28Z",
    "title_canon_sha256": "d94b2f2cd6602889efecb1db67bf2c4693e4447abef13036e2cea27a37dca6bf"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2410.10781",
    "kind": "arxiv",
    "version": 2
  }
}