Are All Languages Created Equal in Multilingual BERT?

Shijie Wu, Mark Dredze · 2020 · DOI 10.18653/v1/2020.repl4nlp-1.16

3 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

DEPART: DEcomposing PARiTy across Multilingual LLMs

cs.CL · 2026-05-27 · unverdicted · novelty 6.0

A Bayesian framework decomposes mLLM variance, showing language features explain 79-92% of language identity variance and that model identity vs. benchmark-model interactions dominate differently for understanding versus reasoning tasks.

Parameter Alignment Mitigates Catastrophic Forgetting in Multilingual Expert Language Models

cs.CL · 2026-05-29 · unverdicted · novelty 5.0

Parameter alignment strategies substantially reduce forgetting in family-based continual pretraining of multilingual LLMs across 32 languages with minimal impact on language acquisition.

Modular Monolingual Adaptation using Pretrained Language Models

cs.CL · 2026-06-04 · unverdicted · novelty 4.0

Replacing tokens, freezing the corresponding embeddings, and tuning the rest of the model improves NLU performance on low-resource languages compared to full fine-tuning.

citing papers explorer

Showing 3 of 3 citing papers.

DEPART: DEcomposing PARiTy across Multilingual LLMs · cs.CL · 2026-05-27 · unverdicted · none · ref 41
Parameter Alignment Mitigates Catastrophic Forgetting in Multilingual Expert Language Models · cs.CL · 2026-05-29 · unverdicted · none · ref 11
Modular Monolingual Adaptation using Pretrained Language Models · cs.CL · 2026-06-04 · unverdicted · none · ref 78
