pith. machine review for the scientific record.
Pith Number

pith:MH2XVHQO

pith:2023:MH2XVHQOPBSABRG55I2UUI33LR
not attested · not anchored · not stored · refs resolved

H₂O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models

Beidi Chen, Christopher Ré, Clark Barrett, Lianmin Zheng, Ruisi Cai, Tianlong Chen, Tianyi Zhou, Ying Sheng, Yuandong Tian, Zhangyang Wang, Zhao Song, Zhenyu Zhang

Heavy-hitter tokens that dominate attention let LLMs run with a much smaller KV cache and up to 29 times higher throughput.

arxiv:2306.14048 v3 · 2023-06-24 · cs.LG

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open

Claims

C1 · strongest claim

Our implementation of H₂O with 20% heavy hitters improves the throughput over three leading inference systems DeepSpeed Zero-Inference, Hugging Face Accelerate, and FlexGen by up to 29×, 29×, and 3× on OPT-6.7B and OPT-30B.

C2 · weakest assumption

Heavy hitters emerge naturally and correlate strongly with frequent co-occurrence of tokens, and removing them significantly degrades performance. This observation, taken from the abstract, must hold for the eviction policy to remain accurate.

C3 · one line summary

H₂O evicts non-heavy-hitter tokens from the KV cache using a dynamic submodular policy, retaining recent tokens and frequently co-occurring (heavy-hitter) tokens to reduce memory while preserving accuracy.
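
The policy summarized in C3 can be illustrated as a selection step over accumulated attention scores. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `select_kept_tokens`, the parameters `heavy_budget` and `recent_budget`, and the use of a plain list of per-token accumulated scores are all hypothetical choices made for illustration.

```python
from typing import List


def select_kept_tokens(
    acc_attention: List[float],
    heavy_budget: int,
    recent_budget: int,
) -> List[int]:
    """Pick which cached token positions to keep (illustrative sketch).

    acc_attention[i] is assumed to hold the attention mass that token i
    has received so far, summed over decoding steps.
    """
    n = len(acc_attention)

    # Always keep the most recent `recent_budget` positions.
    recent_start = max(0, n - recent_budget)
    recent = list(range(recent_start, n))

    # Among older positions, keep those with the highest accumulated
    # attention: the "heavy hitters" in this simplified view.
    older = range(recent_start)
    heavy = sorted(older, key=lambda i: acc_attention[i], reverse=True)
    heavy = heavy[:heavy_budget]

    # Everything not returned here would be evicted from the KV cache.
    return sorted(heavy) + recent


scores = [5.0, 0.1, 3.2, 0.2, 0.3, 0.4, 0.1, 0.6]
print(select_kept_tokens(scores, heavy_budget=2, recent_budget=3))
# [0, 2, 5, 6, 7]
```

In this toy run, positions 0 and 2 survive because of their high accumulated scores, and positions 5 to 7 survive because they are recent; positions 1, 3, and 4 are evicted. How the two budgets are split is a tuning choice this sketch leaves open.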

References

145 extracted · 145 resolved · 32 Pith anchors

Formal links

3 machine-checked theorem links

Cited by

18 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:13.468578Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

61f57a9e0e786400c4ddea354a237b5c66af12f0fdd177f5b2363818b7189051

Aliases

arxiv: 2306.14048 · arxiv_version: 2306.14048v3 · doi: 10.48550/arxiv.2306.14048 · pith_short_12: MH2XVHQOPBSA · pith_short_16: MH2XVHQOPBSABRG5 · pith_short_8: MH2XVHQO
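
The Pith aliases listed here appear to be prefixes of the base32 encoding of the canonical hash. This is an observation drawn from the values on this page, not a documented rule of the `pith-number/v1.0` schema. The sketch below checks that relationship; the function name `pith_aliases` and the key `full_26` are hypothetical names introduced for illustration.

```python
import base64

# Canonical hash as shown on this page.
CANONICAL_HASH = "61f57a9e0e786400c4ddea354a237b5c66af12f0fdd177f5b2363818b7189051"


def pith_aliases(hex_digest: str) -> dict:
    """Derive alias strings by truncating the base32 form of the digest.

    Assumption: aliases are plain prefixes of the RFC 4648 base32
    encoding of the raw SHA-256 bytes. This is inferred, not specified.
    """
    b32 = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return {
        "full_26": b32[:26],        # hypothetical name for the 26-char form
        "pith_short_16": b32[:16],
        "pith_short_12": b32[:12],
        "pith_short_8": b32[:8],
    }


aliases = pith_aliases(CANONICAL_HASH)
print(aliases["full_26"])        # MH2XVHQOPBSABRG55I2UUI33LR
print(aliases["pith_short_8"])   # MH2XVHQO
```

If the assumption holds for other records too, any short alias can be checked against a canonical hash without contacting the server.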
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/MH2XVHQOPBSABRG55I2UUI33LR \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 61f57a9e0e786400c4ddea354a237b5c66af12f0fdd177f5b2363818b7189051
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "1cf0b946cfa0d0e8f3a51b87e5d5d9de557ac26c2712fe9dc6ab49bee852cdfe",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2023-06-24T20:11:14Z",
    "title_canon_sha256": "a04537268fce239cf30f5f553c8505a3533aa36fd8781487ff50496f711acc78"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2306.14048",
    "kind": "arxiv",
    "version": 3
  }
}
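
The canonicalization step in the shell one-liner under "Verify this Pith Number yourself" can also be run offline against the record shown here. The following is a sketch that mirrors that one-liner (sorted keys, compact separators, `ensure_ascii=False`, SHA-256 over the encoded bytes); the function name `canonical_hash` is an assumption. According to this page the digest should equal the value under "Canonical hash", but the sketch prints the digest rather than asserting it, since it has not been run against the live record.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """Serialize a record deterministically and return its SHA-256 hex digest."""
    canonical = json.dumps(
        record,
        sort_keys=True,            # key order must not affect the hash
        separators=(",", ":"),     # no whitespace between tokens
        ensure_ascii=False,        # keep non-ASCII characters as-is
    ).encode()                     # UTF-8 by default
    return hashlib.sha256(canonical).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "1cf0b946cfa0d0e8f3a51b87e5d5d9de557ac26c2712fe9dc6ab49bee852cdfe",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2023-06-24T20:11:14Z",
        "title_canon_sha256": "a04537268fce239cf30f5f553c8505a3533aa36fd8781487ff50496f711acc78",
    },
    "schema_version": "1.0",
    "source": {"id": "2306.14048", "kind": "arxiv", "version": 3},
}

digest = canonical_hash(record)
print(digest)  # compare by eye with the "Canonical hash" value on this page
```

Because `sort_keys=True` fixes the key order, the digest does not depend on how the JSON happened to be laid out when it was fetched.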