ZipNN: Lossless Compression for AI Models
4 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.DC (4)
Years: 2026 (4)
Verdicts: unverdicted (4)
Roles: background (1)
Polarities: background (1)
citing papers explorer
- SplitZip: Ultra Fast Lossless KV Compression for Disaggregated LLM Serving
  SplitZip is a new GPU-friendly lossless compressor for KV cache tensors that exploits exponent redundancy to achieve over 600 GB/s compression throughput and up to 1.32x faster transfers in disaggregated LLM serving.
- ZipCCL: Efficient Lossless Data Compression of Communication Collectives for Accelerating LLM Training
  ZipCCL delivers up to 1.35x faster communication and 1.18x end-to-end speedup in LLM training through lossless compression of near-Gaussian collectives on 64-GPU clusters.
- Distributed Generative Inference of LLM at Internet Scales with Multi-Dimensional Communication Optimization
  BloomBee is a distributed LLM inference system that achieves up to 1.76x higher throughput and 43.2% lower latency than prior decentralized systems by optimizing communication across multiple dimensions in low-bandwidth internet settings.
- TStore: Rethinking AI Model Hub with Tensor-Centric Compression
  TStore reduces AI model storage via tensor-level fingerprinting, clustering, and compression without annotations while claiming to preserve usability.