Finding fantastic experts in moes: A unified study for expert dropping strategies and observations

Finding fantastic experts in moes: A unified study for expert dropping strategies, observations · 2025 · arXiv 2504.05586

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

representative citing papers

EvoESAP: Non-Uniform Expert Pruning for Sparse MoE

cs.LG · 2026-03-06 · conditional · novelty 7.0

EvoESAP uses evolutionary search guided by a speculative-decoding-inspired ESAP metric to discover non-uniform layer-wise sparsity allocations for MoE expert pruning, improving generation accuracy up to 19.6% at 50% sparsity.

SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training

cs.LG · 2026-05-09 · unverdicted · novelty 5.0 · 2 refs

Pruning pretrained MoE models outperforms training from scratch under fixed budget, different expert compression methods converge after continued training, and progressive pruning plus multi-token KD improves the final 23A2B model.

On the Utility and Factual Reliability of Pruned Mixture-of-Experts Models in the Biomedical Domain

cs.LG · 2026-07-01 · unverdicted · novelty 4.0

Moderate pruning of MoE models preserves in-domain biomedical utility and reliability but both degrade rapidly in cross-domain settings and at extreme pruning ratios.

citing papers explorer

Showing 3 of 3 citing papers.

EvoESAP: Non-Uniform Expert Pruning for Sparse MoE cs.LG · 2026-03-06 · conditional · none · ref 23
EvoESAP uses evolutionary search guided by a speculative-decoding-inspired ESAP metric to discover non-uniform layer-wise sparsity allocations for MoE expert pruning, improving generation accuracy up to 19.6% at 50% sparsity.
SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training cs.LG · 2026-05-09 · unverdicted · none · ref 35 · 2 links
Pruning pretrained MoE models outperforms training from scratch under fixed budget, different expert compression methods converge after continued training, and progressive pruning plus multi-token KD improves the final 23A2B model.
On the Utility and Factual Reliability of Pruned Mixture-of-Experts Models in the Biomedical Domain cs.LG · 2026-07-01 · unverdicted · none · ref 8
Moderate pruning of MoE models preserves in-domain biomedical utility and reliability but both degrade rapidly in cross-domain settings and at extreme pruning ratios.

Finding fantastic experts in moes: A unified study for expert dropping strategies and observations

fields

years

verdicts

representative citing papers

citing papers explorer