SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs. arXiv preprint arXiv:2502.12444, 2025

Ahmed F AbouElhamayed, Jordan Dotzel, Yash Akhauri, Chi-Chih Chang, Sameh Gobriel, J Pablo Muñoz, Vui Seng Chua, Nilesh Jain, Mohamed S Abdelfattah · 2025 · arXiv 2502.12444

1 Pith paper cites this work. Polarity classification is still indexing.



representative citing papers

CacheClip: Accelerating RAG with Effective KV Cache Reuse

cs.LG · 2025-10-11 · unverdicted · novelty 6.0

CacheClip accelerates RAG prefill by up to 3.33x via auxiliary-model-guided selective KV recomputation while retaining 85-91% of full-attention quality on NIAH and LongBench.

citing papers explorer


CacheClip: Accelerating RAG with Effective KV Cache Reuse · cs.LG · 2025-10-11 · unverdicted · none · ref 40
