Findings of the 2014 Workshop on Statistical Machine Translation
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
SpecMoE: A Fast and Efficient Mixture-of-Experts Inference via Self-Assisted Speculative Decoding
SpecMoE uses self-assisted speculative decoding on MoE models to deliver up to 4.3x higher inference throughput and lower memory and interconnect bandwidth use without retraining.