arXiv preprint arXiv:2306.03745
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
- DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices
  DECO is a sparse MoE architecture with ReLU-based routing, learnable expert scaling, and NormSiLU activation that matches dense Transformer performance at 20% expert activation and delivers a 2.93x speedup on Jetson AGX Orin.
- Teacher-Guided Routing for Sparse Vision Mixture-of-Experts
  Teacher-guided routing supplies pseudo-supervision from a dense model's intermediate features to stabilize expert selection in sparse vision MoE models.