AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models

Association for Computational Linguistics · 2024 · DOI 10.18653/v1/2024.findings-emnlp.361

2 Pith papers cite this work. Polarity classification of these citations is still in progress.

representative citing papers

ProbMoE: Differentiable Probabilistic Routing for Mixture-of-Experts

cs.LG · 2026-06-01 · unverdicted · novelty 6.0

ProbMoE frames MoE routing as probabilistic inference over cardinality-constrained subsets, enabling Exact-k sampling with marginal-probability gradients and a dynamic-k variant that matches training and inference cardinality.

Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts

cs.CL · 2025-09-26 · unverdicted · novelty 6.0

EMoE trains MoE models to maintain performance when the number of activated experts changes at inference, expanding the usable range to 2-3 times the training k and reaching higher peak results.

citing papers explorer

ProbMoE: Differentiable Probabilistic Routing for Mixture-of-Experts

cs.LG · 2026-06-01 · unverdicted · none · ref 13

Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts

cs.CL · 2025-09-26 · unverdicted · none · ref 41
