Gemini 1.5: Unlocking Multimodal Understanding Across Millions of Tokens of Context
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
SpecMoE: A Fast and Efficient Mixture-of-Experts Inference via Self-Assisted Speculative Decoding
SpecMoE uses self-assisted speculative decoding on MoE models to deliver up to 4.3x higher inference throughput and lower memory and interconnect bandwidth use without retraining.