org/CorpusID:249240535
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3
citing papers explorer
- Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection
  Lynx exploits training-induced batch-level expert activation skews via AffinityBinning to reduce invoked experts per batch, delivering up to 1.30x throughput with under 1% accuracy loss across four model families.
- Does a Global Perspective Help Prune Sparse MoEs Elegantly?
  GRAPE is a global redundancy-aware pruning strategy for sparse MoEs that dynamically allocates pruning budgets across layers and improves average accuracy by 1.40% over the best local baseline across tested models and settings.
- FLEX-MoE: Federated Mixture-of-Experts with Load-balanced Expert Assignment for Edge Computing
  FLEX-MoE proposes client-expert fitness scores and an optimization algorithm to jointly maximize specialization and enforce balanced expert utilization in federated MoE for edge computing under non-IID data and capacity constraints.