MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts

Zhenpeng Su, Zijia Lin, Xue Bai, Xing Wu, Yizhe Xiong, Haoran Lian, Guangyuan Ma, Hui Chen, Guiguang Ding, Wei Zhou, et al. · 2024 · arXiv 2407.09816

2 Pith papers cite this work.

representative citing papers

BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE

cs.AI · 2026-05-14 · conditional · novelty 6.0

BEAM uses binary expert activation masks trained end-to-end to achieve dynamic sparsity in MoE models, cutting FLOPs by 85% while retaining over 98% of performance.

Adaptive Inverted-Index Routing for Granular Mixtures-of-Experts

cs.LG · 2026-05-06 · unverdicted · novelty 6.0

AIR-MoE introduces a two-stage inverted-index routing method based on vector quantization that approximates optimal expert selection for granular MoE models at lower cost and with empirical performance gains.

citing papers explorer

Showing 2 of 2 citing papers.

BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE · cs.AI · 2026-05-14 · conditional · ref 18
Adaptive Inverted-Index Routing for Granular Mixtures-of-Experts · cs.LG · 2026-05-06 · unverdicted · ref 45