Pith Number

pith:MDJKPL5S

pith:2024:MDJKPL5S3IEMCXYQ4Z5CCVQHIW

not attested · not anchored · not stored · refs resolved

Massive Activations in Large Language Models

Mingjie Sun, Xinlei Chen, J. Zico Kolter, Zhuang Liu

Large language models contain a small number of massive activations whose values stay largely constant across inputs and act as indispensable bias terms.

arxiv:2402.17762 v2 · 2024-02-27 · cs.CL · cs.LG

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{MDJKPL5S3IEMCXYQ4Z5CCVQHIW}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
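One way to picture a deterministic merge is sketched below. Everything in it is an assumption made for illustration: the event fields (`at`, `field`, `value`), the last-writer-wins rule, and the use of a content hash as a tie-breaker are hypothetical and are not taken from the Pith bundle format. Signature checking is left out entirely. The point is only that deduplicating events and folding them in a fixed total order yields the same state regardless of which mirror's log is read first.

```python
import hashlib
import json


def event_id(event):
    """Content-address an event so copies held by different mirrors collapse."""
    blob = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()


def merge_state(*event_logs):
    """Fold any number of event logs into one state, independent of log order."""
    unique = {}
    for log in event_logs:
        for event in log:
            unique[event_id(event)] = event
    # Total order: timestamp first, content hash as a tie-breaker.
    ordered = sorted(unique.items(), key=lambda kv: (kv[1]["at"], kv[0]))
    state = {}
    for _, event in ordered:
        state[event["field"]] = event["value"]  # last writer wins (assumed rule)
    return state


# Hypothetical events held by two mirrors; one event appears in both logs.
mirror_a = [
    {"at": "2030-01-01T00:00:00Z", "field": "example_a", "value": "first"},
    {"at": "2030-01-02T00:00:00Z", "field": "example_a", "value": "second"},
]
mirror_b = [
    {"at": "2030-01-02T00:00:00Z", "field": "example_a", "value": "second"},
    {"at": "2030-01-01T12:00:00Z", "field": "example_b", "value": "only"},
]

state_ab = merge_state(mirror_a, mirror_b)
state_ba = merge_state(mirror_b, mirror_a)
print(state_ab == state_ba, state_ab)
```

ISO 8601 timestamps in a single timezone sort correctly as strings, which is why the sketch can order events without parsing dates.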

Claims

C1 · strongest claim

very few activations exhibit significantly larger values than others (e.g., 100,000 times larger). We call them massive activations... their values largely stay constant regardless of the input, and they function as indispensable bias terms in LLMs... these massive activations lead to the concentration of attention probabilities to their corresponding tokens.
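To make the claim concrete, here is a minimal sketch of how one might flag such outliers in a single hidden state. The function name, the toy data, and both thresholds (`abs_floor`, `ratio`) are illustrative assumptions, not values taken from this record. The idea is simply to compare each activation's magnitude with the median magnitude of the whole hidden state.

```python
from statistics import median


def find_massive_activations(hidden_state, abs_floor=100.0, ratio=1000.0):
    """Return (token_index, feature_index, value) for outlier activations.

    hidden_state: list of per-token feature vectors (lists of floats).
    abs_floor, ratio: illustrative thresholds (assumptions for this sketch).
    """
    magnitudes = [abs(v) for row in hidden_state for v in row]
    typical = median(magnitudes)
    hits = []
    for t, row in enumerate(hidden_state):
        for d, v in enumerate(row):
            # Flag values that are large in absolute terms and relative to the median.
            if abs(v) > abs_floor and abs(v) > ratio * typical:
                hits.append((t, d, v))
    return hits


# Toy hidden state: 3 tokens x 4 features, with one planted outlier.
toy = [
    [0.3, -0.1, 2500.0, 0.2],
    [0.1, 0.4, -0.2, 0.3],
    [-0.2, 0.1, 0.5, -0.4],
]
hits = find_massive_activations(toy)
print(hits)
```

Running the same check on hidden states from several different inputs and comparing the flagged positions and values would be one way to probe the "largely stay constant regardless of the input" part of the claim.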

C2 · weakest assumption

That the observed constancy of massive activation values and their role as indispensable bias terms generalize across all LLMs, inputs, and architectures based on the limited set of models and characterizations performed.

C3 · one-line summary

Massive activations are very large values in LLMs that stay largely constant across inputs, function as indispensable bias terms, and concentrate attention probabilities on their corresponding tokens.

Formal links

2 machine-checked theorem links

Cited by

33 papers in Pith

Steered Generation via Gradient-Based Optimization on Sparse Query Features

A Simple Plug-in for Improving Eviction-Based KV Cache Compression

Multi-Gate Residuals

Precision Tracked Transformer via Kalman Filtering, Kriging and Process Noise

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

Receipt and verification

First computed	2026-05-17T23:38:48.754966Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

60d2a7afb2da08c15f10e67a215607459bca6ed57194e20a0f3dbc5b94bfe664

Aliases

arxiv: 2402.17762 · arxiv_version: 2402.17762v2 · doi: 10.48550/arxiv.2402.17762 · pith_short_12: MDJKPL5S3IEM · pith_short_16: MDJKPL5S3IEMCXYQ · pith_short_8: MDJKPL5S
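The short aliases listed above are prefixes of the full identifier, which the sketch below reproduces. It also checks a relationship that can be observed from the values on this page but is not stated anywhere in it: the 26-character identifier coincides with the first 26 characters of the standard base32 encoding of the canonical hash. Treat that second part as an observation about this one record, not as documented behavior of the scheme. The helper names are hypothetical.

```python
import base64

CANONICAL_HASH = "60d2a7afb2da08c15f10e67a215607459bca6ed57194e20a0f3dbc5b94bfe664"
PITH_NUMBER = "MDJKPL5S3IEMCXYQ4Z5CCVQHIW"


def short_aliases(pith_number, lengths=(8, 12, 16)):
    """Build the pith_short_N aliases as plain prefixes of the identifier."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


def base32_prefix(hex_digest, length=26):
    """Base32-encode a hex digest (RFC 4648 alphabet) and keep a prefix."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


aliases = short_aliases(PITH_NUMBER)
print(aliases)
print(base32_prefix(CANONICAL_HASH) == PITH_NUMBER)
```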

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/MDJKPL5S3IEMCXYQ4Z5CCVQHIW \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 60d2a7afb2da08c15f10e67a215607459bca6ed57194e20a0f3dbc5b94bfe664

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "161a0a2b92c9dbee51eb5242b3c2633f8c7a752a67326dc218027ce16a6a8324",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2024-02-27T18:55:17Z",
    "title_canon_sha256": "1375592bd25780fa45da9e4a454856fb6a1918f2dfb6bdb9df98135e4a994fe3"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2402.17762",
    "kind": "arxiv",
    "version": 2
  }
}
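The shell check under "Verify this Pith Number yourself" can also be sketched offline in Python against the record printed above. The canonicalization mirrors the one-liner in that command (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes). Whether the record as rendered on this page is identical to what the resolver returns is an assumption, so the digest should be compared with the value under "Canonical hash" rather than taken for granted. The function name is hypothetical.

```python
import hashlib
import json


def canonical_sha256(record):
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    blob = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()
    return hashlib.sha256(blob).hexdigest()


# Record copied from the "Canonical record JSON" block on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "161a0a2b92c9dbee51eb5242b3c2633f8c7a752a67326dc218027ce16a6a8324",
        "cross_cats_sorted": ["cs.LG"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2024-02-27T18:55:17Z",
        "title_canon_sha256": "1375592bd25780fa45da9e4a454856fb6a1918f2dfb6bdb9df98135e4a994fe3",
    },
    "schema_version": "1.0",
    "source": {"id": "2402.17762", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)  # compare with the value listed under "Canonical hash"
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror happens to store the fields.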