pith.
Pith Number

pith:BRGDBQMQ

pith:2026:BRGDBQMQYHH2KATHMSVTRIFOYH
not attested · not anchored · not stored · refs resolved

Self-Pruned Key-Value Attention: Learning When to Write by Predicting Future Utility

Gergely Szilvasy (1), Hervé Jégou (1), Loïc Cabannes (1), Manuel Faysse (1, 2), Maria Lomeli (1), Matthijs Douze (1), Pierre-Emmanuel Mazaré (1), Wen-tau Yih (1) ((1) Meta FAIR, (2) MICS, CentraleSupélec)

A lightweight utility predictor scores each key-value pair and decides whether to retain it in the cache, achieving dynamic 3- to 10-fold compression.

arxiv:2605.14037 v1 · 2026-05-13 · cs.LG · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{BRGDBQMQYHH2KATHMSVTRIFOYH}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

SP-KV performs dynamic sparsification: the mechanism adapts to the input and typically reduces the KV cache size by a factor of 3 to 10, with longer sequences often being more compressible. This leads to vast improvements in memory usage and decoding speed, with little to no degradation of validation loss or of performance on a broad set of downstream tasks.

C2 weakest assumption

A lightweight utility predictor trained jointly with the LLM using only next-token prediction loss can accurately forecast which KV pairs will be needed in the future without introducing meaningful errors or extra overhead.

C3 one line summary

SP-KV trains a utility predictor jointly with the LLM to dynamically prune low-utility KV cache entries, achieving 3-10x memory reduction during generation with negligible performance loss.
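
The claims above describe a predictor that scores each key-value pair and decides whether to write it to the cache. The toy sketch below illustrates only that gating idea. It is not the paper's method: the linear-plus-sigmoid scorer, the fixed threshold, the random weights, and the names `utility_score`, `gated_write`, and `run_demo` are all assumptions made for illustration, and nothing here is trained jointly with a language model.

```python
import math
import random


def utility_score(key, value, weights, bias):
    """Hypothetical linear utility predictor: sigmoid(w . [key; value] + b)."""
    features = list(key) + list(value)
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def gated_write(cache, key, value, weights, bias, threshold=0.5):
    """Append (key, value) only if its predicted utility clears the threshold."""
    keep = utility_score(key, value, weights, bias) >= threshold
    if keep:
        cache.append((key, value))
    return keep


def run_demo(num_tokens=1000, dim=4, seed=0):
    """Feed random key-value pairs through the gate and count what is retained."""
    rng = random.Random(seed)
    # Random, untrained weights (assumption); a real predictor would be learned.
    weights = [rng.uniform(-1, 1) for _ in range(2 * dim)]
    bias = -1.0  # a negative bias makes this toy predictor prune more often
    cache = []
    for _ in range(num_tokens):
        key = [rng.gauss(0, 1) for _ in range(dim)]
        value = [rng.gauss(0, 1) for _ in range(dim)]
        gated_write(cache, key, value, weights, bias)
    return num_tokens, len(cache)


total, kept = run_demo()
print(f"kept {kept} of {total} entries")
```

In this sketch the fraction of entries retained depends entirely on the random weights and the bias, so it says nothing about the 3 to 10 fold compression reported in the claims. It only shows where a write decision would sit relative to the cache.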

References

82 extracted · 82 resolved · 13 Pith anchors

Receipt and verification
First computed 2026-05-17T23:39:12.781106Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

0c4c30c190c1cfa5026764ab38a0aec1e0fa12bc255c071aec4bc633005d0e53

Aliases

arxiv: 2605.14037 · arxiv_version: 2605.14037v1 · doi: 10.48550/arxiv.2605.14037 · pith_short_12: BRGDBQMQYHH2 · pith_short_16: BRGDBQMQYHH2KATH · pith_short_8: BRGDBQMQ
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/BRGDBQMQYHH2KATHMSVTRIFOYH \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 0c4c30c190c1cfa5026764ab38a0aec1e0fa12bc255c071aec4bc633005d0e53
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "f08ff89cbdbe68f8636c04cdccf0794d76aeda45bd585a9030864a16073293ac",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-13T18:58:16Z",
    "title_canon_sha256": "b63cf5e4b12fd45755a0039792d5d4283a0d78127de450b0c7ff9c7ae68cb99e"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14037",
    "kind": "arxiv",
    "version": 1
  }
}
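
The verification command above can also be run offline against the record shown on this page. The following is a minimal sketch that applies the same serialization settings as the one-liner (`sort_keys=True`, compact separators, `ensure_ascii=False`) to a copy of the canonical record. The names `CANONICAL_RECORD` and `canonical_sha256` are illustrative, and whether the printed digest matches the published canonical hash depends on the service serializing the record in the same way.

```python
import hashlib
import json

# Copy of the record from the "Canonical record JSON" section of this page.
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "f08ff89cbdbe68f8636c04cdccf0794d76aeda45bd585a9030864a16073293ac",
        "cross_cats_sorted": ["cs.CL"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-13T18:58:16Z",
        "title_canon_sha256": "b63cf5e4b12fd45755a0039792d5d4283a0d78127de450b0c7ff9c7ae68cb99e",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.14037", "kind": "arxiv", "version": 1},
}


def canonical_sha256(record):
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


digest = canonical_sha256(CANONICAL_RECORD)
print(digest)  # compare by eye against the value under "Canonical hash"
```

Because keys are sorted before hashing, the digest does not depend on the order in which the fields were written in the dictionary.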