Pith Number

pith:XG4YV6DY

pith:2026:XG4YV6DYIGMNHPUDZTSAQ4NYN6
not attested · not anchored · not stored · refs resolved

KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving

Bing Lu, Dejun Luo, Dingwen Tao, Guangming Tan, Hairui Zhao, Jinyang Liu, Wenjing Huang, Xingchen Liu, Xinyang Ma, Yida Gu, Zedong Liu, Zheng Wei

KVServe uses service-aware adaptive KV cache compression to cut latency bottlenecks in disaggregated LLM serving

arxiv:2605.13734 v1 · 2026-05-13 · cs.DC · cs.AI · cs.NI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{XG4YV6DYIGMNHPUDZTSAQ4NYN6}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
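
The idea of recomputing the same state from a bundle of events can be illustrated with a small sketch. It is a generic, order-independent fold and not Pith's actual merge algorithm, which this page does not describe. The event fields (`id`, `ts`, `field`, `value`), the sample events, and the last-writer-wins rule are all hypothetical, chosen only to show why sorting on a total order makes the result deterministic.

```python
from typing import Any, Dict, Iterable

def merge_events(events: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold events into a state that does not depend on arrival order.

    Assumption (hypothetical): each event has a unique 'id', and two events
    with the same 'id' are identical copies.
    """
    # Deduplicate by id so a mirror that received an event twice agrees.
    unique = {e["id"]: e for e in events}
    # Sort on (timestamp, id): a total order, so ties break the same everywhere.
    ordered = sorted(unique.values(), key=lambda e: (e["ts"], e["id"]))
    state: Dict[str, Any] = {}
    for e in ordered:
        state[e["field"]] = e["value"]  # last writer wins per field
    return state

# Hypothetical events, for illustration only.
events = [
    {"id": "e1", "ts": "2026-05-18T02:44:16Z", "field": "author_claim", "value": "open"},
    {"id": "e2", "ts": "2026-05-18T02:44:16Z", "field": "citations", "value": "open"},
    {"id": "e3", "ts": "2026-05-19T00:00:00Z", "field": "author_claim", "value": "claimed"},
]

state = merge_events(events)
state_reversed = merge_events(list(reversed(events)))
```

Under these assumptions, two mirrors that hold the same set of events compute the same state regardless of the order in which the events arrived.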

Claims

C1 strongest claim

KVServe achieves up to a 9.13× job completion time (JCT) speedup in prefill-decode (PD) separated serving and up to a 32.8× time-to-first-token (TTFT) reduction in KV-disaggregated serving through its service-aware adaptive compression framework.

C2 weakest assumption

The analytical latency model combined with the bandit controller reliably selects profiles that match real-world performance despite offline-to-online mismatch, without introducing unacceptable quality degradation under varying SLO budgets.

C3 one line summary

KVServe delivers up to 9.13× job completion time speedup and 32.8× time-to-first-token reduction by making KV cache compression service-aware and adaptive in disaggregated LLM serving.

Receipt and verification
First computed: 2026-05-18T02:44:16.530580Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

b9b98af8784198d3be83cce40871b86f97d10b795b0a551c795c8e3556b7273c

Aliases

arxiv: 2605.13734 · arxiv_version: 2605.13734v1 · doi: 10.48550/arxiv.2605.13734 · pith_short_12: XG4YV6DYIGMN · pith_short_16: XG4YV6DYIGMNHPUD · pith_short_8: XG4YV6DY
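
For this record, the Pith Number appears to coincide with the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash, and the `pith_short_*` aliases are prefixes of the Pith Number. The sketch below checks both relations on the values shown on this page. This is an observation about this one record, not a documented derivation rule, and the helper name `base32_prefix` is made up for illustration.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "b9b98af8784198d3be83cce40871b86f97d10b795b0a551c795c8e3556b7273c"
PITH_NUMBER = "XG4YV6DYIGMNHPUDZTSAQ4NYN6"

def base32_prefix(hex_digest: str, length: int) -> str:
    """Base32-encode a hex digest (RFC 4648 alphabet) and keep a prefix."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]

# Assumed relation: identifier = leading Base32 characters of the SHA-256 digest.
derived = base32_prefix(CANONICAL_HASH, len(PITH_NUMBER))

# Short aliases as plain prefixes of the full identifier.
short_aliases = {n: PITH_NUMBER[:n] for n in (8, 12, 16)}

print(derived == PITH_NUMBER)
print(short_aliases)
```

If the relation holds in general, a client could sanity-check an identifier against a hash offline, but the signed record remains the authoritative source.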
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/XG4YV6DYIGMNHPUDZTSAQ4NYN6 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: b9b98af8784198d3be83cce40871b86f97d10b795b0a551c795c8e3556b7273c
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "374391d0768aa528347fcdcd4f33b048d216928c6e5bbaf646f47adf9121ae1a",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.NI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.DC",
    "submitted_at": "2026-05-13T16:12:33Z",
    "title_canon_sha256": "abfac4b257da8df3eb18636150d96c40a12516311e307c3c4d79a7228842affa"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13734",
    "kind": "arxiv",
    "version": 1
  }
}
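
The canonicalization step in the shell pipeline above can also be run offline against the record as printed here. The following is a minimal sketch that mirrors the same `json.dumps` settings (sorted keys, compact separators, no ASCII escaping) and hashes the UTF-8 bytes. The function name `canonical_sha256` is an illustrative choice. Whether the digest equals the canonical hash listed on this page depends on the served record matching this JSON exactly, which has not been checked here.

```python
import hashlib
import json

def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 it."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the digest
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

# The canonical record as printed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "374391d0768aa528347fcdcd4f33b048d216928c6e5bbaf646f47adf9121ae1a",
        "cross_cats_sorted": ["cs.AI", "cs.NI"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.DC",
        "submitted_at": "2026-05-13T16:12:33Z",
        "title_canon_sha256": "abfac4b257da8df3eb18636150d96c40a12516311e307c3c4d79a7228842affa",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.13734", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)  # compare by eye with the canonical hash listed on this page
```

Because keys are sorted before hashing, a mirror that stores the same record with a different key order still computes the same digest.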