One Model to Translate Them All? A Journey to Mount Doom for Multilingual Model Merging

2026 · cs.CL · arXiv 2604.02881

1 Pith paper cites this work.



abstract

Weight-space model merging combines independently fine-tuned models without accessing the original training data, offering a practical alternative to joint training. While merging succeeds in multitask settings, its behavior in multilingual contexts remains poorly understood. We systematically study weight-space merging for multilingual machine translation by fully fine-tuning a language model on large-scale bilingual corpora and evaluating standard merging strategies. Our experiments reveal that merging degrades performance, especially when target languages differ. To explain this failure, we analyze internal representations using span-conditioned neuron selectivity and layer-wise centered kernel alignment. We find that language-specific neurons concentrate in embedding layers and upper transformer blocks, while intermediate layers remain largely shared across languages. Critically, fine-tuning redistributes rather than sharpens language selectivity: neurons for supervised and related languages become less exclusive, while those for unsupervised languages grow more isolated. This redistribution increases representational divergence in the higher layers that govern generation. These findings suggest that multilingual fine-tuning may reshape representational geometry in ways that reduce compatibility with standard weight-space merging assumptions. Our work thus provides an explanation for why merging fails in multilingual translation scenarios.
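The abstract names two technical ingredients: weight-space merging and layer-wise centered kernel alignment (CKA). The sketch below is a minimal illustration under stated assumptions, not the paper's code. Checkpoints are assumed to be plain dicts of NumPy arrays with matching shapes. `average_merge` and `task_arithmetic_merge` are hypothetical helpers standing in for two common merging strategies; the abstract does not say which strategies were evaluated. `linear_cka` uses the linear-kernel form of CKA on activation matrices collected from the same inputs at a given layer.

```python
import numpy as np


def average_merge(state_dicts, weights=None):
    """Hypothetical helper: (weighted) parameter averaging of fine-tuned checkpoints.

    Assumes each checkpoint is a dict of parameter name -> NumPy array,
    with identical names and shapes across checkpoints.
    """
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged


def task_arithmetic_merge(base, finetuned, scale=1.0):
    """Hypothetical helper: add the scaled sum of task vectors (finetuned - base) to base."""
    merged = {}
    for name, theta0 in base.items():
        task_sum = sum(ft[name] - theta0 for ft in finetuned)
        merged[name] = theta0 + scale * task_sum
    return merged


def linear_cka(x, y):
    """Linear CKA between activations x (n, d1) and y (n, d2) for the same n inputs."""
    # Center each feature (column) so the comparison ignores mean offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))
```

Computing `linear_cka` layer by layer between, for example, a base and a fine-tuned model on the same inputs is one way to quantify the representational divergence the abstract describes: values near 1 indicate similar representations, and lower values indicate divergence. Linear CKA is invariant to isotropic scaling and orthogonal rotations of the features, so it compares representational geometry rather than individual neuron identities.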

representative citing papers

On the Limits of Model Merging for Multilinguality in Pre-Training

cs.CL · 2026-05-25 · unverdicted · novelty 5.0

Merging any combination of monolingual pre-trained models leads to performance collapse due to interference, indicating that merging flexibility from fine-tuning does not extend to pre-training.
