Are All Languages Created Equal in Multilingual BERT?
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CL (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
citing papers explorer
- DEPART: DEcomposing PARiTy across Multilingual LLMs
A Bayesian framework decomposes mLLM variance, showing language features explain 79-92% of language identity variance and that model identity vs. benchmark-model interactions dominate differently for understanding versus reasoning tasks.
- Parameter Alignment Mitigates Catastrophic Forgetting in Multilingual Expert Language Models
Parameter alignment strategies substantially reduce forgetting in family-based continual pretraining of multilingual LLMs across 32 languages with minimal impact on language acquisition.
- Modular Monolingual Adaptation using Pretrained Language Models
Replacing tokens, freezing the corresponding embeddings, and tuning the rest of the model improves NLU performance on low-resource languages compared to full fine-tuning.