Gaussian error linear units (GELUs)
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
roles: background 1
citation-polarity summary
polarities: background 1
representative citing papers
A diffusion model variant that adds structured non-zero-mean noise via modified forward/reverse processes, yielding an ELBO loss analogous to offset noise but with time-dependent coefficients, and showing gains on synthetic high-dimensional data.
citing papers explorer
- Uncovering Intra-expert Activation Sparsity for Efficient Mixture-of-Expert Model Execution
  Pre-trained MoE models exhibit up to 90% intra-expert activation sparsity that enables up to 2.5x faster MoE layer execution when exploited in the vLLM inference system.
- A Probabilistic Formulation of Offset Noise in Diffusion Models
  A diffusion model variant that adds structured non-zero-mean noise via modified forward/reverse processes, yielding an ELBO loss analogous to offset noise but with time-dependent coefficients, and showing gains on synthetic high-dimensional data.