In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6159–6172, Bangkok, Thailand

Not all experts are equal: Efficient expert pruning and skipping for mixture-of-experts large language models · 2024

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Alloc-MoE: Budget-Aware Expert Activation Allocation for Efficient Mixture-of-Experts Inference

cs.LG · 2026-04-09 · unverdicted · novelty 6.0

Alloc-MoE allocates a fixed expert activation budget using layer-level dynamic programming based on sensitivity and token-level score-based redistribution, delivering 1.15x prefill and 1.34x decode speedups on DeepSeek-V2-Lite at half the original budget while preserving performance.
