DECO matches dense model performance at 20% expert activation via ReLU-based routing with learnable scaling and the NormSiLU activation, plus a 3x real-hardware speedup.
Think you have Solved Question Answering?
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
CONDITIONAL 1representative citing papers
citing papers explorer
-
DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices
DECO matches dense model performance at 20% expert activation via ReLU-based routing with learnable scaling and the NormSiLU activation, plus a 3x real-hardware speedup.