{"state_type":"pith_open_graph_state","state_version":"1.0","pith_number":"pith:2026:BRGDBQMQYHH2KATHMSVTRIFOYH","merge_version":"pith-open-graph-merge-v1","event_count":2,"valid_event_count":2,"invalid_event_count":0,"equivocation_count":0,"current":{"canonical_record":{"metadata":{"abstract_canon_sha256":"f08ff89cbdbe68f8636c04cdccf0794d76aeda45bd585a9030864a16073293ac","cross_cats_sorted":["cs.CL"],"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.LG","submitted_at":"2026-05-13T18:58:16Z","title_canon_sha256":"b63cf5e4b12fd45755a0039792d5d4283a0d78127de450b0c7ff9c7ae68cb99e"},"schema_version":"1.0","source":{"id":"2605.14037","kind":"arxiv","version":1}},"source_aliases":[{"alias_kind":"arxiv","alias_value":"2605.14037","created_at":"2026-05-17T23:39:12Z"},{"alias_kind":"arxiv_version","alias_value":"2605.14037v1","created_at":"2026-05-17T23:39:12Z"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2605.14037","created_at":"2026-05-17T23:39:12Z"},{"alias_kind":"pith_short_12","alias_value":"BRGDBQMQYHH2","created_at":"2026-05-18T12:33:37Z"},{"alias_kind":"pith_short_16","alias_value":"BRGDBQMQYHH2KATH","created_at":"2026-05-18T12:33:37Z"},{"alias_kind":"pith_short_8","alias_value":"BRGDBQMQ","created_at":"2026-05-18T12:33:37Z"}],"graph_snapshots":[{"event_id":"sha256:2e322cff5bc28e278454413e45b631d770b62b26a3e9c14311e430414ae7f567","target":"graph","created_at":"2026-05-17T23:39:12Z","signer":{"key_id":"pith-v1-2026-05","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54","signer_id":"pith.science","signer_type":"pith_registry"},"payload":{"graph_snapshot":{"author_claims":{"count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","strong_count":0},"builder_version":"pith-number-builder-2026-05-17-v1","claims":{"count":4,"items":[{"attestation":"unclaimed","claim_id":"C1","kind":"strongest_claim","source":"verdict.strongest_claim","status":"machine_extracted","text":"SP-KV performs dynamic sparsification: the mechanism adapts to the input and typically reduces the KV cache size by a factor of 3 to 10×, longer sequences often being more compressible. This leads to vast improvements in memory usage and decoding speed, with little to no degradation of validation loss nor performance on a broad set of downstream tasks."},{"attestation":"unclaimed","claim_id":"C2","kind":"weakest_assumption","source":"verdict.weakest_assumption","status":"machine_extracted","text":"A lightweight utility predictor trained jointly with the LLM using only next-token prediction loss can accurately forecast which KV pairs will be needed in the future without introducing meaningful errors or extra overhead."},{"attestation":"unclaimed","claim_id":"C3","kind":"one_line_summary","source":"verdict.one_line_summary","status":"machine_extracted","text":"SP-KV trains a utility predictor jointly with the LLM to dynamically prune low-utility KV cache entries, achieving 3-10x memory reduction during generation with negligible performance loss."},{"attestation":"unclaimed","claim_id":"C4","kind":"headline","source":"verdict.pith_extraction.headline","status":"machine_extracted","text":"A lightweight utility predictor scores each key-value pair and decides whether to retain it in the cache, achieving dynamic 3- to 10-fold compression."}],"snapshot_sha256":"5895a554d93dd7bf39b8d210740754960c0a04a10cc3f84556cccd2eb438b39c"},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"paper":{"abstract_excerpt":"Under modern test-time compute and agentic paradigms, language models process ever-longer sequences. Efficient text generation with transformer architectures is increasingly constrained by the Key-Value cache memory footprint and bandwidth. To address this limitation, we introduce Self-Pruned Key-Value Attention (SP-KV), a mechanism designed to predict future KV utility in order to reduce the size of the long-term KV cache. This strategy operates at a fine granularity: a lightweight utility predictor scores each key-value pair, and while recent KVs are always available via a local window, olde","authors_text":"2), (2) MICS, CentraleSup\\'elec), Gergely Szilvasy (1), Herv\\'e J\\'egou (1) ((1) Meta FAIR, Lo\\\"ic Cabannes (1), Manuel Faysse (1, Maria Lomeli (1), Matthijs Douze (1), Pierre-Emmanuel Mazar\\'e (1), Wen-tau Yih (1)","cross_cats":["cs.CL"],"headline":"A lightweight utility predictor scores each key-value pair and decides whether to retain it in the cache, achieving dynamic 3- to 10-fold compression.","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.LG","submitted_at":"2026-05-13T18:58:16Z","title":"Self-Pruned Key-Value Attention: Learning When to Write by Predicting Future Utility"},"references":{"count":82,"internal_anchors":13,"resolved_work":82,"sample":[{"cited_arxiv_id":"","doi":"","is_internal_anchor":false,"ref_index":1,"title":"Ye, Zihao and Zheng, Lianmin and Chen, Tianqi and Ceze, Luis , journal=. Flash","work_id":"dcd54ce3-994e-4006-a2dd-b0bb17b4e97f","year":null},{"cited_arxiv_id":"","doi":"","is_internal_anchor":false,"ref_index":2,"title":"Shah, Jay and Bikshandi, Ganesh and Zhang, Ying and Thakkar, Vijay and Ramani, Pradeep and Dao, Tri , journal=. Flash","work_id":"4fefbff5-13c2-47ba-b816-692314bdaf59","year":null},{"cited_arxiv_id":"2002.05202","doi":"","is_internal_anchor":true,"ref_index":3,"title":"GLU Variants Improve Transformer","work_id":"17d0763c-1016-41ab-a478-478e890765eb","year":2002},{"cited_arxiv_id":"","doi":"","is_internal_anchor":false,"ref_index":4,"title":"Training with quantization noise for extreme ﬁxed-point compression","work_id":"1169a968-440c-4209-8438-fffcfb77faf4","year":2004},{"cited_arxiv_id":"","doi":"","is_internal_anchor":false,"ref_index":5,"title":"The journal of machine learning research , volume=","work_id":"b25e43d4-a73c-404a-b1ac-ebff0cbe4930","year":2014}],"snapshot_sha256":"f281cf959d3987eaf2f86b9331c8cddfb687b5727297dfb1fc8d6cc383cd3b51"},"source":{"id":"2605.14037","kind":"arxiv","version":1},"verdict":{"created_at":"2026-05-15T05:30:41.115983Z","id":"5262c577-e767-40ec-b20a-999eaf5d1f80","model_set":{"reader":"grok-4.3"},"one_line_summary":"SP-KV trains a utility predictor jointly with the LLM to dynamically prune low-utility KV cache entries, achieving 3-10x memory reduction during generation with negligible performance loss.","pipeline_version":"pith-pipeline@v0.9.0","pith_extraction_headline":"A lightweight utility predictor scores each key-value pair and decides whether to retain it in the cache, achieving dynamic 3- to 10-fold compression.","strongest_claim":"SP-KV performs dynamic sparsification: the mechanism adapts to the input and typically reduces the KV cache size by a factor of 3 to 10×, longer sequences often being more compressible. This leads to vast improvements in memory usage and decoding speed, with little to no degradation of validation loss nor performance on a broad set of downstream tasks.","weakest_assumption":"A lightweight utility predictor trained jointly with the LLM using only next-token prediction loss can accurately forecast which KV pairs will be needed in the future without introducing meaningful errors or extra overhead."}},"verdict_id":"5262c577-e767-40ec-b20a-999eaf5d1f80"}}],"author_attestations":[],"timestamp_anchors":[],"storage_attestations":[],"citation_signatures":[],"replication_records":[],"corrections":[],"mirror_hints":[],"record_created":{"event_id":"sha256:1efd9b82435ccd1ca99dbc07ae7d873b11a1162a4b4ded68432e0564cce1fdb3","target":"record","created_at":"2026-05-17T23:39:12Z","signer":{"key_id":"pith-v1-2026-05","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54","signer_id":"pith.science","signer_type":"pith_registry"},"payload":{"attestation_state":"computed","canonical_record":{"metadata":{"abstract_canon_sha256":"f08ff89cbdbe68f8636c04cdccf0794d76aeda45bd585a9030864a16073293ac","cross_cats_sorted":["cs.CL"],"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.LG","submitted_at":"2026-05-13T18:58:16Z","title_canon_sha256":"b63cf5e4b12fd45755a0039792d5d4283a0d78127de450b0c7ff9c7ae68cb99e"},"schema_version":"1.0","source":{"id":"2605.14037","kind":"arxiv","version":1}},"canonical_sha256":"0c4c30c190c1cfa5026764ab38a0aec1e0fa12bc255c071aec4bc633005d0e53","receipt":{"algorithm":"ed25519","builder_version":"pith-number-builder-2026-05-17-v1","canonical_sha256":"0c4c30c190c1cfa5026764ab38a0aec1e0fa12bc255c071aec4bc633005d0e53","first_computed_at":"2026-05-17T23:39:12.781106Z","key_id":"pith-v1-2026-05","kind":"pith_receipt","last_reissued_at":"2026-05-17T23:39:12.781106Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54","receipt_version":"0.3","signature_b64":"eRR0THsEeFluc1d9wGr/dXUL7ZcmTRKKrrSBtTdstzYZ6gp205xWDehu29BMkfNeFIIuRLoDQmuaW76ygf0YCA==","signature_status":"signed_v1","signed_at":"2026-05-17T23:39:12.781670Z","signed_message":"canonical_sha256_bytes"},"source_id":"2605.14037","source_kind":"arxiv","source_version":1}}},"equivocations":[],"invalid_events":[],"applied_event_ids":["sha256:1efd9b82435ccd1ca99dbc07ae7d873b11a1162a4b4ded68432e0564cce1fdb3","sha256:2e322cff5bc28e278454413e45b631d770b62b26a3e9c14311e430414ae7f567"],"state_sha256":"6c4f5bd8f2c4b8f6faade10d082d7cceb68cbeeed1a2b259be6910a4ee05f7f4"}