Distributed learning of mixtures of experts
1 Pith paper cites this work.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Mix-MoE: Improving Multilingual Machine Translation of Large Language Models through Mixed MoEs
Mix-MoE applies separate LM and MT expert groups in two post-pretraining stages with Fourier-enhanced routing to reduce parameter interference and improve multilingual MT over baselines.