Pre-trained MoE models exhibit deep-layer routing collapse for low-resource languages like Hebrew, largely corrected by continual pre-training on balanced bilingual data, with consistent patterns observed in Japanese.
Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT)
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Mixture of Experts for Low-Resource LLMs