Pith Number

pith:MDJKPL5S

pith:2024:MDJKPL5S3IEMCXYQ4Z5CCVQHIW
not attested · not anchored · not stored · refs resolved

Massive Activations in Large Language Models

J. Zico Kolter, Mingjie Sun, Xinlei Chen, Zhuang Liu

Large language models contain a small number of massive activations whose values stay largely constant across inputs and that act as indispensable bias terms.

arxiv:2402.17762 v2 · 2024-02-27 · cs.CL · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{MDJKPL5S3IEMCXYQ4Z5CCVQHIW}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
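
The record does not describe the merge algorithm, so the following is only a generic sketch of how a deterministic merge over signed events can work: deduplicate events by a content-derived ID, sort them by a total order, then fold them into a state. The event fields (`ts`, `set`), the ID scheme, and the toy values are all hypothetical and are not taken from Pith.

```python
import hashlib
import json


def event_id(event):
    """Hypothetical content-derived ID: SHA-256 of the event's sorted-key JSON."""
    blob = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def merge(*event_logs):
    """Union the logs, drop duplicates, and order events by (timestamp, id)."""
    unique = {event_id(e): e for log in event_logs for e in log}
    ordered_ids = sorted(unique, key=lambda k: (unique[k]["ts"], k))
    return [unique[k] for k in ordered_ids]


def current_state(events):
    """Fold events in order; a later event overwrites earlier fields."""
    state = {}
    for e in events:
        state.update(e["set"])
    return state


# Two mirrors hold overlapping, differently ordered logs (made-up values).
mirror_a = [{"ts": 1, "set": {"citations": 0}}, {"ts": 3, "set": {"citations": 33}}]
mirror_b = [{"ts": 2, "set": {"replications": 0}}, {"ts": 1, "set": {"citations": 0}}]

state_ab = current_state(merge(mirror_a, mirror_b))
state_ba = current_state(merge(mirror_b, mirror_a))
print(state_ab == state_ba)
```

Because the ordering key depends only on event content, any mirror that holds the same set of events arrives at the same state regardless of the order in which it received them.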

Claims

C1 · strongest claim

very few activations exhibit significantly larger values than others (e.g., 100,000 times larger). We call them massive activations... their values largely stay constant regardless of the input, and they function as indispensable bias terms in LLMs... these massive activations lead to the concentration of attention probabilities to their corresponding tokens.

C2 · weakest assumption

That the observed constancy of massive activation values, and their role as indispensable bias terms, generalize to all LLMs, inputs, and architectures, even though only a limited set of models was characterized.

C3 · one-line summary

Massive activations are a few very large, largely input-independent values in LLMs that function as indispensable bias terms and concentrate attention probabilities on their corresponding tokens.
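
To make the claims above concrete, here is a minimal sketch on synthetic data, not on a real model. It flags entries of a hidden state whose magnitude dwarfs the median magnitude, then checks that the flagged value is the same across inputs. The detection thresholds (`ratio`, `floor`), the helper names, and the toy hidden states are illustrative assumptions and are not taken from this record.

```python
import random
import statistics


def find_massive(hidden, ratio=1000.0, floor=100.0):
    """Return indices whose magnitude exceeds `floor` and `ratio` times the median magnitude."""
    mags = [abs(v) for v in hidden]
    median = statistics.median(mags)
    return [i for i, m in enumerate(mags) if m > floor and m > ratio * median]


def synthetic_hidden_state(seed, dim=64, massive_index=7, massive_value=100000.0):
    """Toy hidden state: unit Gaussian noise plus one fixed, input-independent spike."""
    rng = random.Random(seed)
    hidden = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    hidden[massive_index] = massive_value
    return hidden


# Each seed stands in for a different input sequence.
states = [synthetic_hidden_state(seed) for seed in range(5)]
found = [find_massive(h) for h in states]
spike_values = [h[7] for h in states]

print(found)              # same single index flagged for every input
print(set(spike_values))  # one distinct value: the spike does not depend on the input
```

On a real model the same check would run over hidden states captured at a given layer; the synthetic spike here only mimics the "constant across inputs" behaviour described in the claims.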

References

159 extracted · 159 resolved · 47 Pith anchors

Formal links

2 machine-checked theorem links

Cited by

33 papers in Pith

Receipt and verification

First computed: 2026-05-17T23:38:48.754966Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

60d2a7afb2da08c15f10e67a215607459bca6ed57194e20a0f3dbc5b94bfe664

Aliases

arxiv: 2402.17762 · arxiv_version: 2402.17762v2 · doi: 10.48550/arxiv.2402.17762 · pith_short_12: MDJKPL5S3IEM · pith_short_16: MDJKPL5S3IEMCXYQ · pith_short_8: MDJKPL5S
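
The relationship between the canonical hash and the identifiers is not documented on this page. One observation, checked by hand against the values shown here, is that the 26-character Pith Number equals the first 26 characters of the standard Base32 (RFC 4648) encoding of the SHA-256 digest, and that the short aliases are prefixes of it. The sketch below encodes that inference; the function name is hypothetical, and the rule should be treated as an assumption rather than a specification.

```python
import base64

CANONICAL_HASH = "60d2a7afb2da08c15f10e67a215607459bca6ed57194e20a0f3dbc5b94bfe664"
PITH_NUMBER = "MDJKPL5S3IEMCXYQ4Z5CCVQHIW"


def pith_from_hash(hex_digest, length=26):
    """Base32-encode the raw digest and keep the first `length` characters (assumed rule)."""
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


derived = pith_from_hash(CANONICAL_HASH)
short_aliases = {n: derived[:n] for n in (8, 12, 16)}

print(derived == PITH_NUMBER)
print(short_aliases)
```

If the rule holds for other records, the aliases can be checked for internal consistency without contacting the server.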

Agent API

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/MDJKPL5S3IEMCXYQ4Z5CCVQHIW \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 60d2a7afb2da08c15f10e67a215607459bca6ed57194e20a0f3dbc5b94bfe664

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "161a0a2b92c9dbee51eb5242b3c2633f8c7a752a67326dc218027ce16a6a8324",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2024-02-27T18:55:17Z",
    "title_canon_sha256": "1375592bd25780fa45da9e4a454856fb6a1918f2dfb6bdb9df98135e4a994fe3"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2402.17762",
    "kind": "arxiv",
    "version": 2
  }
}
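
The hashing step of the verification command can also be reproduced offline. The following is a minimal sketch that applies the same serialization settings as the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) to the record shown above. It assumes that this JSON is exactly what the API returns under `canonical_record`; the helper name `canonical_sha256` is made up for illustration.

```python
import hashlib
import json

record = {
    "metadata": {
        "abstract_canon_sha256": "161a0a2b92c9dbee51eb5242b3c2633f8c7a752a67326dc218027ce16a6a8324",
        "cross_cats_sorted": ["cs.LG"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2024-02-27T18:55:17Z",
        "title_canon_sha256": "1375592bd25780fa45da9e4a454856fb6a1918f2dfb6bdb9df98135e4a994fe3",
    },
    "schema_version": "1.0",
    "source": {"id": "2402.17762", "kind": "arxiv", "version": 2},
}


def canonical_sha256(obj):
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    blob = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


digest = canonical_sha256(record)

# Key order in the input must not matter, because keys are sorted on output.
reordered = {
    "source": record["source"],
    "schema_version": record["schema_version"],
    "metadata": record["metadata"],
}
same_after_reorder = canonical_sha256(reordered) == digest

print(digest)  # compare with the canonical hash listed on the record
print(same_after_reorder)
```

Sorting keys and fixing the separators is what makes the digest reproducible: two parties who hold the same logical record produce byte-identical JSON before hashing.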