ZipMoE delivers up to 72.77% lower inference latency and 6.76x higher throughput for on-device MoE models via lossless compression and cache-affinity scheduling with a claimed provable guarantee.
URL https://www.aclweb
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.DC (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
ZipMoE: Efficient On-Device MoE Serving via Lossless Compression and Cache-Affinity Scheduling