MaskMoE: Boosting token-level learning via routing mask in mixture-of-experts. arXiv preprint arXiv:2407.09816

Zhenpeng Su, Zijia Lin, Xue Bai, Xing Wu, Yizhe Xiong, Haoran Lian, Guangyuan Ma, Hui Chen, Guiguang Ding, Wei Zhou, et al. · 2024 · arXiv 2407.09816

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE

cs.AI · 2026-05-14 · conditional · novelty 6.0

BEAM uses binary expert activation masks trained end-to-end to achieve dynamic sparsity in MoE models, cutting FLOPs by 85% with over 98% performance retention.

Adaptive Inverted-Index Routing for Granular Mixtures-of-Experts

cs.LG · 2026-05-06 · unverdicted · novelty 6.0

AIR-MoE introduces a two-stage inverted-index routing method based on vector quantization that approximates optimal expert selection for granular MoE models at lower cost and with empirical performance gains.

