Pith Number

pith:HKNIHO4Q

pith:2026:HKNIHO4QFLZG2VZKX2YRGXOCFR

not attested · not anchored · not stored · refs resolved

EchoKV: Efficient KV Cache Compression via Similarity-Based Reconstruction

Qingfu Zhu, Shiyu Ji, Wanxiang Che, Yijun Liu, Yixuan Wang

EchoKV compresses the KV cache by reconstructing discarded components from retained ones using attention head similarities.

arxiv:2603.22910 v2 · 2026-03-24 · cs.CL


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{HKNIHO4QFLZG2VZKX2YRGXOCFR}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live · download bundle · merged state

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

EchoKV consistently outperforms existing methods across multiple compression ratios and backbone models while preserving the throughput of full-cache inference in short-context scenarios.

C2 · weakest assumption

That intrinsic inter-layer and intra-layer similarities among attention heads are sufficiently stable and informative for a lightweight network to accurately reconstruct the discarded KV components without introducing errors that degrade downstream performance.

C3 · one-line summary

EchoKV compresses LLM KV caches by reconstructing missing components from partial data via inter- and intra-layer attention similarities, outperforming prior methods on LongBench and RULER while supporting on-demand full-cache inference.

References

22 extracted · 22 resolved · 11 Pith anchors

[1] GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints · arXiv:2305.13245

[2] xKV: Cross-layer SVD for KV-cache compression

[3] Palu: Compressing KV-cache with low-rank projection · arXiv:2407.21118

[4] Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models · arXiv:2503.09567

[5] Homogeneous keys, heterogeneous values: Exploiting local KV cache asymmetry for long-context LLMs · arXiv:2506.05410

Formal links

2 machine-checked theorem links

Receipt and verification

First computed	2026-05-18T03:09:22.581986Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

3a9a83bb902af26d572abeb1135dc22c45a496fe6d2f1aed316d8d677dd3a4a6

Aliases

arxiv: 2603.22910 · arxiv_version: 2603.22910v2 · doi: 10.48550/arxiv.2603.22910 · pith_short_12: HKNIHO4QFLZG · pith_short_16: HKNIHO4QFLZG2VZK · pith_short_8: HKNIHO4Q
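
The short aliases listed here are prefixes of the full 26-character identifier. In this record, that identifier also coincides with the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. The sketch below checks both relationships. Whether the Base32 relationship holds for Pith Numbers in general is an assumption inferred from this one record, not a documented rule, and the helper names `short_alias` and `base32_prefix` are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "3a9a83bb902af26d572abeb1135dc22c45a496fe6d2f1aed316d8d677dd3a4a6"
PITH_NUMBER = "HKNIHO4QFLZG2VZKX2YRGXOCFR"


def short_alias(pith_number: str, length: int) -> str:
    """Return the first `length` characters of the identifier (hypothetical helper)."""
    return pith_number[:length]


def base32_prefix(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Assumption: this mirrors how the identifier was derived for this record.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


if __name__ == "__main__":
    for n in (8, 12, 16):
        print(f"pith_short_{n}: {short_alias(PITH_NUMBER, n)}")
    print("base32 prefix matches:", base32_prefix(CANONICAL_HASH) == PITH_NUMBER)
```

A resolver that accepts short aliases would presumably need to handle prefix collisions between records; nothing on this page says how that is done.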

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/HKNIHO4QFLZG2VZKX2YRGXOCFR \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 3a9a83bb902af26d572abeb1135dc22c45a496fe6d2f1aed316d8d677dd3a4a6
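
The same check can be run offline against a saved copy of the record. The following is a minimal Python sketch that mirrors the serialization settings of the one-liner above (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding). The function names `canonical_bytes` and `canonical_hash` are illustrative and not part of any Pith API, and the sketch assumes the record printed under "Canonical record JSON" on this page is exactly what the resolver returns in `.canonical_record`.

```python
import hashlib
import json

# Canonical record as printed on this page (assumed identical to the resolver output).
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "83b6017948d1ddfa7fe6d32c85acbd709353e81c36033790db0a1379dd017175",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-03-24T07:58:42Z",
        "title_canon_sha256": "0237fd64785f5c9872c9b505fb8aae054d36de633f147c5171a02e95736711a1",
    },
    "schema_version": "1.0",
    "source": {"id": "2603.22910", "kind": "arxiv", "version": 2},
}

EXPECTED = "3a9a83bb902af26d572abeb1135dc22c45a496fe6d2f1aed316d8d677dd3a4a6"


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and no extra whitespace, then encode as UTF-8."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_hash(record: dict) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    digest = canonical_hash(RECORD)
    print(digest)
    print("match" if digest == EXPECTED else "mismatch")
```

Because keys are sorted before hashing, the digest does not depend on the order in which the JSON object's fields arrive, which is what lets independent mirrors arrive at the same value.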

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "83b6017948d1ddfa7fe6d32c85acbd709353e81c36033790db0a1379dd017175",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-03-24T07:58:42Z",
    "title_canon_sha256": "0237fd64785f5c9872c9b505fb8aae054d36de633f147c5171a02e95736711a1"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2603.22910",
    "kind": "arxiv",
    "version": 2
  }
}