KVCodec uses GPU-native video codecs and pipelined fetching to compress and transmit KV caches, delivering up to 3.51x faster TTFT than prior methods while preserving accuracy.
Efficient kv cache spillover management on memory-constrained gpu for llm inference.IEEE Transactions on Parallel and Distributed Systems, 37(1):90–105, 2025
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.DC 1years
2026 1verdicts
CONDITIONAL 1representative citing papers
citing papers explorer
-
Efficient Remote KV Cache Reuse with GPU-native Video Codec
KVCodec uses GPU-native video codecs and pipelined fetching to compress and transmit KV caches, delivering up to 3.51x faster TTFT than prior methods while preserving accuracy.