Multilingual Retrieval-Augmented Generation for Knowledge-Intensive Question Answering Task
1 Pith paper cites this work.
Fields: cs.CL (1). Years: 2026 (1). Verdicts: UNVERDICTED (1).
Representative citing papers:
Beyond Monolingual Deep Research: Evaluating Agents and Retrievers with Cross-Lingual BrowseComp-Plus
XBCP benchmark shows deep research agents and multilingual retrievers lose accuracy, recall, calibration, and citation reliability when evidence is in non-English languages, even with gold evidence provided.