Journal of the Royal Statistical Society Series B: Statistical Methodology
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
citing papers explorer
- DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices
  DECO is a sparse MoE architecture with ReLU-based routing, learnable expert scaling, and NormSiLU activation that matches dense Transformer performance at 20% expert activation and delivers 2.93x speedup on Jetson AGX Orin.
- MOSAIC: Module Discovery via Sparse Additive Identifiable Causal Learning for Scientific Time Series
  MOSAIC recovers identifiable latent variables and their sparse associated observations in scientific time series by combining temporal causal representation learning with support recovery through a sparse additive decoder.