arxiv: 2605.08738 · 2 revisions
SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training