Fast Inference from Transformers via Speculative Decoding
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
SpecMoE: A Fast and Efficient Mixture-of-Experts Inference via Self-Assisted Speculative Decoding
SpecMoE uses self-assisted speculative decoding on MoE models to deliver up to 4.3x higher inference throughput and lower memory and interconnect bandwidth use without retraining.