Evaluating Large Language Models for Cross-Lingual Retrieval
Two Pith papers cite this work. Polarity classification is still being indexed.
Citation summary:
- Years: 2026 (2)
- Verdicts: UNVERDICTED (2)
- Roles: other (1)
- Polarities: unclear (1)
- Representative citing paper: Beyond Monolingual Deep Research: Evaluating Agents and Retrievers with Cross-Lingual BrowseComp-Plus
Citing papers:
- Code-Switching Information Retrieval: Benchmarks, Analysis, and the Limits of Current Retrievers
  Code-switching creates a fundamental performance bottleneck for multilingual retrievers, causing drops of up to 27% on two new benchmarks, CSR-L and CS-MTEB. The analysis identifies embedding divergence as the key cause and finds vocabulary expansion insufficient to fix it.
- Beyond Monolingual Deep Research: Evaluating Agents and Retrievers with Cross-Lingual BrowseComp-Plus
  The XBCP benchmark shows that deep research agents and multilingual retrievers lose accuracy, recall, calibration, and citation reliability when evidence is in non-English languages, even when gold evidence is provided.