Pith Number

pith:Z4MBO6YK

pith:2026:Z4MBO6YKC3ON2JYN5WVCJJK75X
not attested · not anchored · not stored · refs resolved

KV Cache Offloading for Context-Intensive Tasks

Andrey Bocharnikov, Denis Kuznedelev, Ivan Ermakov, Vyacheslav Zhdanovskiy, Yegor Yershov

KV-cache offloading causes major accuracy losses on tasks that require pulling lots of details from long inputs, but a simpler alternative recovers performance across models.

arxiv:2604.08426 v4 · 2026-04-09 · cs.LG · cs.AI · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{Z4MBO6YKC3ON2JYN5WVCJJK75X}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Existing KV-cache offloading techniques produce significant performance degradation on context-intensive tasks; a simpler alternative strategy significantly improves accuracy across multiple LLM families and benchmarks.

C2 · weakest assumption

The assumption that the observed accuracy drops are caused primarily by low-rank key projections and unreliable landmarks rather than by other implementation details of the offloading systems or by the specific choice of evaluation prompts and metrics.

C3 · one-line summary

KV offloading degrades accuracy on context-intensive tasks due to low-rank key projections and unreliable landmarks; a simpler alternative improves results across models and benchmarks.

References

66 extracted · 66 resolved · 5 Pith anchors

[1] R. Y. Aminabadi, S. Rajbhandari, M. Zhang, A. A. Awan, C. Li, D. Li, E. Zheng, J. Rasley, S. Smith, O. Ruwase, and Y. He. Deepspeed inference: Enabling efficient inference of transformer models at 2022
[2] S. Ananthanarayanan and A. Sengupta. Understanding the physics of key-value cache compression for LLMs through attention dynamics. arXiv preprint arXiv:2603.01426, 2026 2026
[3] S. Ananthanarayanan, A. Sengupta, and T. Chakraborty. Understanding the physics of key-value cache compression for llms through attention dynamics, 2026 2026
[4] S. Ashkboos, A. Mohtashami, M. L. Croci, B. Li, P. Cameron, M. Jaggi, D. Alistarh, T. Hoefler, and J. Hensman. Quarot: Outlier-free 4-bit inference in rotated llms. Advances in Neural Information Proce 2024
[5] Y. Bai, X. Lv, J. Zhang, H. Lyu, J. Tang, Z. Huang, Z. Du, X. Liu, A. Zeng, L. Hou, Y. Dong, J. Tang, and J. Li. Longbench: A bilingual, multitask benchmark for long context understanding. In Proceed 2024

Cited by

1 paper in Pith

Receipt and verification
First computed 2026-05-20T00:01:41.123824Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

cf18177b0a16dcdd270dedaa24a55fede162bbe452ca8530d2a411ac612a2100

Aliases

arxiv: 2604.08426 · arxiv_version: 2604.08426v4 · doi: 10.48550/arxiv.2604.08426 · pith_short_12: Z4MBO6YKC3ON · pith_short_16: Z4MBO6YKC3ON2JYN · pith_short_8: Z4MBO6YK
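The identifiers on this page appear to follow a simple rule: the 26-character Pith Number matches the first 26 characters of the RFC 4648 base32 encoding of the canonical hash, and each `pith_short_N` alias is the first N characters of that number. This is an observation from the values shown here, not a documented specification, and the helper names below are hypothetical. A minimal Python sketch under that assumption:

```python
import base64

# Canonical hash as printed on this record page.
CANONICAL_HASH = "cf18177b0a16dcdd270dedaa24a55fede162bbe452ca8530d2a411ac612a2100"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Assumption: the 26-character truncation is inferred from this one record.
    """
    raw = bytes.fromhex(hex_digest)
    encoded = base64.b32encode(raw).decode("ascii").rstrip("=")
    return encoded[:length]


def short_alias(pith_number: str, n: int) -> str:
    """Return the first n characters, mirroring the pith_short_N aliases."""
    return pith_number[:n]


number = pith_number_from_hash(CANONICAL_HASH)
print(number)  # → Z4MBO6YKC3ON2JYN5WVCJJK75X
for n in (8, 12, 16):
    print(f"pith_short_{n}: {short_alias(number, n)}")
```

If the rule holds in general, a mirror could check that a Pith Number and its canonical hash belong together without any network access.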
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/Z4MBO6YKC3ON2JYN5WVCJJK75X \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: cf18177b0a16dcdd270dedaa24a55fede162bbe452ca8530d2a411ac612a2100
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "87d9201feef55e531e4c7771d4acb9404df4e293b23a2de9748b705a17c2adf4",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-04-09T16:30:44Z",
    "title_canon_sha256": "a95638f90930d2a3d264e375f1f001c33755d2ca4274281faf5cafcd6bac51d3"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2604.08426",
    "kind": "arxiv",
    "version": 4
  }
}
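The shell pipeline above can also be reproduced offline against the canonical record as displayed. The following Python sketch applies the same serialization the one-liner uses (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) and compares the SHA-256 digest with the canonical hash listed on this page. The function names are hypothetical, and whether the digest matches depends on the displayed JSON being identical to what the API returns in `canonical_record`.

```python
import hashlib
import json

# Canonical hash as printed on this record page.
EXPECTED = "cf18177b0a16dcdd270dedaa24a55fede162bbe452ca8530d2a411ac612a2100"


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and no whitespace, as in the shell one-liner."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def canonical_hash(record: dict) -> str:
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


# The canonical record exactly as displayed above.
record = {
    "metadata": {
        "abstract_canon_sha256": "87d9201feef55e531e4c7771d4acb9404df4e293b23a2de9748b705a17c2adf4",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-04-09T16:30:44Z",
        "title_canon_sha256": "a95638f90930d2a3d264e375f1f001c33755d2ca4274281faf5cafcd6bac51d3",
    },
    "schema_version": "1.0",
    "source": {"id": "2604.08426", "kind": "arxiv", "version": 4},
}

digest = canonical_hash(record)
print(digest)
print("matches expected hash:", digest == EXPECTED)
```

Sorting keys and fixing the separators makes the serialization independent of key order and whitespace in the input, which is what lets any mirror recompute the same hash from the same record.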