Pith Number

pith:G7JVWCF7

pith:2026:G7JVWCF7HAP27HN3I25TCYOTUM

not attested · not anchored · not stored · refs resolved

VeriCache: Turning Lossy KV Cache into Lossless LLM Inference

Dongjoo Seo, Jiayi Yao, Junchen Jiang, Kuntai Du, Rui Zhang, Samuel Shen, Shan Lu, Shaoting Feng, Yuhan Liu, Yuyang Huang

VeriCache achieves identical outputs to full-KV-cache decoding at up to 4 times higher throughput by drafting with compressed caches and verifying in parallel.

arxiv:2605.17613 v1 · 2026-05-17 · cs.AR · cs.LG


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{G7JVWCF7HAP27HN3I25TCYOTUM}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

VeriCache achieves up to 4x higher throughput than full-KV inference while producing identical outputs.

C2 weakest assumption

Compressed-KV decoding can run in parallel with the full-KV swap because the former is HBM-bandwidth-bound and the latter is PCIe- or network-bound. In addition, the compressed KV cache often produces output similar to that of the full KV cache, which allows a long drafting horizon to amortize each swap.

C3 one-line summary

VeriCache turns lossy KV cache compression into lossless LLM inference by drafting with a compressed cache and verifying the drafts with the full cache, achieving up to 4x throughput with identical outputs.
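
The draft-then-verify idea in the claims above can be illustrated with a toy sketch. It is not the paper's implementation: the two "models" are hypothetical stand-in functions (full_next for greedy decoding with the full KV cache, lossy_next for decoding with a compressed cache), and the acceptance rule is the usual speculative-decoding one (keep the longest drafted prefix that the full decoder agrees with, then take the full decoder's token), which is an assumption here. The sketch only shows why the final output matches full-cache decoding no matter how often the draft is wrong.

```python
from typing import List


def full_next(tokens: List[int]) -> int:
    """Hypothetical stand-in for one greedy step with the full KV cache."""
    return (sum(tokens) * 31 + len(tokens)) % 50


def lossy_next(tokens: List[int]) -> int:
    """Hypothetical stand-in for a compressed-cache step.

    It agrees with full_next most of the time and deliberately
    disagrees whenever the context length is a multiple of 5.
    """
    guess = full_next(tokens)
    return guess if len(tokens) % 5 else (guess + 1) % 50


def decode_full(prompt: List[int], n: int) -> List[int]:
    """Baseline: generate n tokens using only the full-cache step."""
    out = list(prompt)
    for _ in range(n):
        out.append(full_next(out))
    return out


def decode_draft_verify(prompt: List[int], n: int, horizon: int = 4) -> List[int]:
    """Draft up to `horizon` tokens with the lossy step, then verify them."""
    out = list(prompt)
    target = len(prompt) + n
    while len(out) < target:
        # Draft phase: cheap decoding with the lossy step.
        draft = list(out)
        for _ in range(min(horizon, target - len(out))):
            draft.append(lossy_next(draft))
        # Verify phase: a real system would check all drafted positions in
        # one batched pass with the full cache; a loop stands in for that.
        for i in range(len(out), len(draft)):
            expected = full_next(draft[:i])
            if draft[i] != expected:
                # First mismatch: keep the verified prefix plus the fix.
                draft = draft[:i] + [expected]
                break
        out = draft
    return out


baseline = decode_full([1, 2, 3], 20)
verified = decode_draft_verify([1, 2, 3], 20)
print(verified == baseline)
```

Every round adds at least one token that the full-cache step has confirmed, so the loop terminates and the result equals the baseline. A longer horizon only pays off when the lossy step is usually right, which is the assumption stated in C2.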

References

85 extracted · 85 resolved · 12 Pith anchors


Formal links

2 machine-checked theorem links

Receipt and verification

First computed	2026-05-20T00:04:48.583788Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

37d35b08bf381faf9dbb46bb3161d3a309ffa9da2bfcdc609f0ac473be4a05f4

Aliases

arxiv: 2605.17613 · arxiv_version: 2605.17613v1 · doi: 10.48550/arxiv.2605.17613 · pith_short_12: G7JVWCF7HAP2 · pith_short_16: G7JVWCF7HAP27HN3 · pith_short_8: G7JVWCF7
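
The short aliases above are prefixes of the full 26-character identifier. For this record, the identifier also coincides with the unpadded base32 encoding of the first 16 bytes of the canonical hash. The page does not state this as a rule, so the sketch below treats it as an observation about these particular values, and the helper name pith_id_from_hash is hypothetical.

```python
import base64

CANONICAL_HASH = "37d35b08bf381faf9dbb46bb3161d3a309ffa9da2bfcdc609f0ac473be4a05f4"
FULL_ID = "G7JVWCF7HAP27HN3I25TCYOTUM"


def pith_id_from_hash(hex_digest: str) -> str:
    """Base32-encode the first 16 bytes of a hex digest and drop '=' padding.

    Assumption: this matches how the identifier on this page relates to its
    canonical hash; it is inferred from the values, not from a specification.
    """
    head = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(head).decode("ascii").rstrip("=")


def short_aliases(full_id: str) -> dict:
    """Build the 8-, 12- and 16-character prefix aliases listed on the page."""
    return {f"pith_short_{n}": full_id[:n] for n in (8, 12, 16)}


derived_id = pith_id_from_hash(CANONICAL_HASH)
aliases = short_aliases(FULL_ID)
print(derived_id)
print(aliases)
```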


Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/G7JVWCF7HAP27HN3I25TCYOTUM \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 37d35b08bf381faf9dbb46bb3161d3a309ffa9da2bfcdc609f0ac473be4a05f4

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "e959bc3e2213d0ef4b3f5827bce07ced4011e605cc285e908904109d420dc309",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AR",
    "submitted_at": "2026-05-17T19:18:39Z",
    "title_canon_sha256": "f17e8e43a9a9b82cf8e4f1ecc1104f52d5fe1fbd1ea5049711c2adf10a457ff9"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.17613",
    "kind": "arxiv",
    "version": 1
  }
}
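
The hashing step of the shell command above can also be run offline against the canonical record shown here. The sketch below uses the same serialization settings as that command (sorted keys, compact separators, no ASCII escaping, UTF-8). The function name canonical_hash is hypothetical. The printed digest is meant to be compared by hand with the canonical hash listed on this page; the code does not assume that the two match.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 of the record serialized as compact JSON with sorted keys."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "e959bc3e2213d0ef4b3f5827bce07ced4011e605cc285e908904109d420dc309",
        "cross_cats_sorted": ["cs.LG"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.AR",
        "submitted_at": "2026-05-17T19:18:39Z",
        "title_canon_sha256": "f17e8e43a9a9b82cf8e4f1ecc1104f52d5fe1fbd1ea5049711c2adf10a457ff9",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.17613", "kind": "arxiv", "version": 1},
}

digest = canonical_hash(record)
print(digest)
```

Because keys are sorted before hashing, the digest does not depend on the order in which the fields appear in the input. This is consistent with the page's statement that a mirror can recompute the same state deterministically.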