AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models
2 Pith papers cite this work.
Citing papers
- ProbMoE: Differentiable Probabilistic Routing for Mixture-of-Experts
ProbMoE frames MoE routing as probabilistic inference over cardinality-constrained subsets of experts. This enables Exact-k sampling with marginal-probability gradients, plus a dynamic-k variant that uses the same number of active experts at training and inference time.
- Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts
EMoE trains MoE models to maintain performance when the number of activated experts (k) changes at inference time, extending the usable range to 2-3 times the training-time k and reaching higher peak performance.
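The ProbMoE entry mentions sampling exactly k experts and using marginal probabilities for gradients. As a minimal sketch, assume a k-subset distribution p(S) proportional to the product of exp(θ_i) over i in S, restricted to |S| = k (sometimes called conditional Poisson sampling). This assumption may differ from ProbMoE's actual formulation. Under it, both the per-expert marginals P(i in S) and exact sampling follow from a dynamic program over elementary symmetric polynomials. The function names `suffix_esp`, `exact_k_marginals` and `sample_exact_k` are hypothetical.

```python
import math
import random


def suffix_esp(weights, k):
    """Table E with E[i][r] = e_r(weights[i:]), the elementary symmetric
    polynomial of degree r over the suffix starting at index i, r = 0..k."""
    n = len(weights)
    E = [[0.0] * (k + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        E[i][0] = 1.0  # e_0 of any set is the empty product, 1
    for i in range(n - 1, -1, -1):
        for r in range(1, k + 1):
            # Size-r subsets of weights[i:] either skip item i or include it.
            E[i][r] = E[i + 1][r] + weights[i] * E[i + 1][r - 1]
    return E


def exact_k_marginals(logits, k):
    """P(i in S) when p(S) is proportional to prod_{i in S} exp(logits[i]), |S| = k."""
    if not 1 <= k <= len(logits):
        raise ValueError("need 1 <= k <= number of experts")
    weights = [math.exp(x) for x in logits]
    total = suffix_esp(weights, k)[0][k]  # normalizer e_k(w)
    marginals = []
    for i, w in enumerate(weights):
        rest = weights[:i] + weights[i + 1:]
        # Subsets containing i contribute w_i * e_{k-1}(other weights).
        marginals.append(w * suffix_esp(rest, k - 1)[0][k - 1] / total)
    return marginals


def sample_exact_k(logits, k, rng=None):
    """Draw one subset of exactly k expert indices from the same distribution."""
    if not 1 <= k <= len(logits):
        raise ValueError("need 1 <= k <= number of experts")
    if rng is None:
        rng = random.Random()
    weights = [math.exp(x) for x in logits]
    n = len(weights)
    E = suffix_esp(weights, k)
    chosen, r = [], k
    for i, w in enumerate(weights):
        if r == 0:
            break
        if n - i == r:
            p = 1.0  # every remaining item is needed; avoids round-off below 1
        else:
            # P(item i is included | r slots still to fill from items i..n-1)
            p = w * E[i + 1][r - 1] / E[i][r]
        if rng.random() < p:
            chosen.append(i)
            r -= 1
    return chosen


logits = [0.5, -1.0, 2.0, 0.0]
print(exact_k_marginals(logits, 2))
print(sample_exact_k(logits, 2, random.Random(0)))
```

With k = 1 the marginals reduce to a softmax over the logits, and for any k they sum to k. In an autodiff framework the same recurrences are differentiable in the logits, which is one way marginals could supply gradients; this sketch uses plain floats, computes no gradients, and would need log-space arithmetic for large logits.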
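The EMoE entry concerns changing k, the number of activated experts, at inference time. The sketch below is a generic top-k gate, not EMoE's training method. Here k is an ordinary call-time argument, and the selected gate probabilities are renormalized to sum to 1, a common convention assumed for illustration. The toy experts and the function names `top_k_gates` and `moe_forward` are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def top_k_gates(router_logits, k):
    """(expert index, gate weight) pairs for the k highest-probability experts,
    with the selected weights renormalized to sum to 1."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    z = sum(probs[i] for i in top)
    return [(i, probs[i] / z) for i in top]


def moe_forward(x, router_logits, experts, k):
    """Gate-weighted sum of the outputs of the k selected experts for one token x."""
    out = [0.0] * len(x)
    for i, g in top_k_gates(router_logits, k):
        y = experts[i](x)  # only the selected experts are evaluated
        out = [o + g * v for o, v in zip(out, y)]
    return out


# Toy experts (hypothetical): expert j scales its input by (j + 1).
experts = [lambda x, s=j + 1: [s * v for v in x] for j in range(4)]
logits = [2.0, 0.5, 1.0, -1.0]
for k in (1, 2, 4):  # same router scores, different inference-time k
    print(k, moe_forward([1.0, 2.0], logits, experts, k))
```

In this sketch, changing k only changes how many expert outputs are mixed. A model trained at a single k is not guaranteed to tolerate that change, which is the problem the EMoE summary describes.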