arxiv: 2509.13930 · v3 · pith:ABL5MLPJnew · submitted 2025-09-17 · 💻 cs.CL

Linguistic Nepotism: Trading-off Quality for Language Preference in Multilingual RAG

classification 💻 cs.CL
keywords language, models, languages, citation, document, multilingual, preference, across

Multilingual Retrieval-Augmented Generation (mRAG) systems enable language models to answer knowledge-intensive queries with citation-supported responses across languages. Despite their growing use, an open question is whether the mixture of different document languages impacts generation and citation behavior in unintended ways. To investigate this, we introduce a controlled methodology using model internals to measure language preference while holding other factors such as document relevance constant. Across eight languages and six open-weight models, we find that models preferentially cite English sources when queries are in English, with this bias amplified for lower-resource languages and for documents positioned mid-context. More crucially, we find that models sometimes trade off document relevance for language preference, indicating that citation choices are not always driven by informativeness alone. Our findings shed light on how language models leverage multilingual context and influence citation behavior.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. MLAIRE: Multilingual Language-Aware Information Retrieval Evaluation Protocol

    cs.IR 2026-05 unverdicted novelty 6.0

    MLAIRE is a protocol that evaluates multilingual retrievers on both semantic accuracy and query-language preference using parallel passages and new metrics like LPR and Lang-nDCG, showing that standard metrics hide di...