Scalable and Efficient MoE Training for Multitask Multilingual Models
3 Pith papers cite this work.
fields: cs.LG (3)
citing papers explorer
- EPnG: Adaptive Expert Prune-and-Grow for Parameter-Efficient MoE Fine-tuning
  EPnG reallocates LoRA capacity in MoE models by pruning experts with low router gate probabilities and growing the rank of high-importance ones. It outperforms standard LoRA and approaches full fine-tuning performance while updating only 0.55-0.72% of parameters.
- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  DeepSpeed-Ulysses keeps communication volume constant for sequence-parallel attention when sequence length and device count scale together, delivering 2.5x faster training on 4x longer sequences than the prior state of the art.
- On the Utility and Factual Reliability of Pruned Mixture-of-Experts Models in the Biomedical Domain
  Moderate pruning of MoE models preserves in-domain biomedical utility and reliability, but both degrade rapidly in cross-domain settings and at extreme pruning ratios.