Pith Number

pith:EM2A7KL7

pith:2023:EM2A7KL7DVBG3VHQBSRAVTUMG5
not attested · not anchored · not stored · refs resolved

Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs

Jianfeng Gao, Jiawei Han, Liyuan Liu, Minjia Zhang, Suyu Ge, Yunan Zhang

LLMs can cut KV cache memory by profiling attention heads once and evicting tokens selectively per head type.

arxiv:2310.01801 v4 · 2023-10-03 · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{EM2A7KL7DVBG3VHQBSRAVTUMG5}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

we conduct targeted profiling to discern the intrinsic structure of attention modules. Based on the recognized structure, we then construct the KV cache in an adaptive manner: evicting long-range contexts on attention heads emphasizing local contexts, discarding non-special tokens on attention heads centered on special tokens, and only employing the standard KV cache for attention heads that broadly attend to all tokens. Moreover, with the lightweight attention profiling used to guide the construction of the adaptive KV cache, FastGen can be deployed without resource-intensive fine-tuning or re-training. In our experiments across various tasks, FastGen demonstrates substantial reduction on GPU memory consumption with negligible generation quality loss.

C2 weakest assumption

That the attention-head structures identified by a single lightweight profiling pass remain stable and sufficient to guide token eviction across diverse generation tasks and contexts without materially degrading output quality or requiring any model updates.

C3 one-line summary

FastGen adaptively compresses LLM KV caches via lightweight attention profiling: evicting long-range contexts on local heads, non-special tokens on special-token heads, and retaining full caches on broad-attention heads, yielding substantial memory savings with negligible quality loss.
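
The three eviction rules in the claims above can be sketched as a per-head choice of which cached positions to keep. This is a minimal illustration, not the authors' FastGen implementation: the function name `kept_positions`, the head-type labels, and the `window` parameter are hypothetical, and the profiling step that assigns a type to each head is assumed to have already run.

```python
from typing import List, Sequence


def kept_positions(
    head_type: str,
    seq_len: int,
    special_positions: Sequence[int],
    window: int,
) -> List[int]:
    """Return the token positions whose key/value entries stay in the cache.

    head_type is a hypothetical label produced by a prior profiling pass:
      "local"   -> head emphasizes local context; long-range entries are evicted
      "special" -> head centers on special tokens; non-special entries are discarded
      "full"    -> head attends broadly; the standard (full) cache is kept
    """
    if head_type == "local":
        start = max(0, seq_len - window)
        return list(range(start, seq_len))
    if head_type == "special":
        return sorted(p for p in set(special_positions) if 0 <= p < seq_len)
    if head_type == "full":
        return list(range(seq_len))
    raise ValueError(f"unknown head type: {head_type!r}")


def cache_fraction(head_types: Sequence[str], seq_len: int,
                   special_positions: Sequence[int], window: int) -> float:
    """Fraction of the full KV cache retained across the given heads."""
    kept = sum(
        len(kept_positions(t, seq_len, special_positions, window))
        for t in head_types
    )
    return kept / (len(head_types) * seq_len)


if __name__ == "__main__":
    heads = ["local", "special", "full"]  # illustrative assignment, not measured
    for t in heads:
        print(t, kept_positions(t, seq_len=10, special_positions=[0, 5], window=3))
    print(cache_fraction(heads, seq_len=10, special_positions=[0, 5], window=3))
```

For a 10-token context with special tokens at positions 0 and 5, a local head with `window=3` keeps positions 7 to 9, a special-token head keeps positions 0 and 5, and a broad-attention head keeps all 10, so these three heads together retain 15 of 30 cache entries.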

References

85 extracted · 85 resolved · 18 Pith anchors


Formal links

2 machine-checked theorem links

Cited by

32 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:14.272361Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

23340fa97f1d426dd4f00ca20ace8c37532352596ea1ea91a591c8ed76947c51

Aliases

arxiv: 2310.01801 · arxiv_version: 2310.01801v4 · doi: 10.48550/arxiv.2310.01801 · pith_short_12: EM2A7KL7DVBG · pith_short_16: EM2A7KL7DVBG3VHQ · pith_short_8: EM2A7KL7
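
The record does not say how the identifier is derived from the canonical hash. For this particular record, the 26-character Pith Number matches the first 26 characters of the RFC 4648 base32 encoding of the SHA-256 digest listed above, and each short alias is a prefix of it. The sketch below checks that observation; treat the derivation as an inference from this one record rather than a documented rule. The helper name `pith_id_from_hash` is hypothetical.

```python
import base64

CANONICAL_HASH = "23340fa97f1d426dd4f00ca20ace8c37532352596ea1ea91a591c8ed76947c51"
PITH_ID = "EM2A7KL7DVBG3VHQBSRAVTUMG5"
SHORT_ALIASES = {8: "EM2A7KL7", 12: "EM2A7KL7DVBG", 16: "EM2A7KL7DVBG3VHQ"}


def pith_id_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the leading characters.

    Assumption: this mirrors how the identifier was formed for this record;
    the page itself does not document the rule.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


if __name__ == "__main__":
    derived = pith_id_from_hash(CANONICAL_HASH)
    print("matches Pith Number:", derived == PITH_ID)
    for n, alias in SHORT_ALIASES.items():
        print(f"pith_short_{n} is a prefix:", PITH_ID[:n] == alias)
```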
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/EM2A7KL7DVBG3VHQBSRAVTUMG5 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 23340fa97f1d426dd4f00ca20ace8c37532352596ea1ea91a591c8ed76947c51
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "0e193d84667f6bfe1d3dd00a5dcff1a21a7121bcb148bb9dae7ceb420b7365e4",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2023-10-03T05:17:08Z",
    "title_canon_sha256": "67dcb3a627ccfd1e7e1ed9f026f773a287848ff13092ed123185a24c13fd96f3"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2310.01801",
    "kind": "arxiv",
    "version": 4
  }
}
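
The shell pipeline in the Agent API section canonicalizes the record (sorted keys, compact separators, UTF-8) before hashing it. The same step can be run offline in Python against the JSON shown above, which is a sketch under one assumption: that this JSON is exactly the `canonical_record` the endpoint returns. If the printed digest differs from the canonical hash listed on this page, that assumption is the first thing to check. The function names are illustrative.

```python
import hashlib
import json

# Copied from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "0e193d84667f6bfe1d3dd00a5dcff1a21a7121bcb148bb9dae7ceb420b7365e4",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2023-10-03T05:17:08Z",
        "title_canon_sha256": "67dcb3a627ccfd1e7e1ed9f026f773a287848ff13092ed123185a24c13fd96f3",
    },
    "schema_version": "1.0",
    "source": {"id": "2310.01801", "kind": "arxiv", "version": 4},
}


def canonical_bytes(obj) -> bytes:
    """Serialize with the same options as the page's one-liner."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_sha256(obj) -> str:
    """Hex SHA-256 of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


if __name__ == "__main__":
    # Compare this value by eye with the canonical hash listed on the page.
    print(canonical_sha256(record))
```

Sorting keys and fixing the separators makes the digest independent of key order and whitespace in the transmitted JSON, which is what lets a mirror recompute the same value.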