XNLI: Evaluating Cross-lingual Sentence Representations

Adina Williams; Alexis Conneau; Guillaume Lample; Holger Schwenk; Ruty Rinott; Samuel R. Bowman; Veselin Stoyanov

arxiv: 1809.05053 · v1 · pith:HFEA623Enew · submitted 2018-09-13 · 💻 cs.CL · cs.AI · cs.LG

classification 💻 cs.CL · cs.AI · cs.LG

keywords language · data · cross-lingual · evaluation · sentence · understanding · xnli · baselines

State-of-the-art natural language processing systems rely on supervision in the form of annotated data to learn competent models. These models are generally trained on data in a single language (usually English), and cannot be directly used beyond that language. Since collecting data in every language is not realistic, there has been a growing interest in cross-lingual language understanding (XLU) and low-resource cross-language transfer. In this work, we construct an evaluation set for XLU by extending the development and test sets of the Multi-Genre Natural Language Inference Corpus (MultiNLI) to 15 languages, including low-resource languages such as Swahili and Urdu. We hope that our dataset, dubbed XNLI, will catalyze research in cross-lingual sentence understanding by providing an informative standard evaluation task. In addition, we provide several baselines for multilingual sentence understanding, including two based on machine translation systems, and two that use parallel data to train aligned multilingual bag-of-words and LSTM encoders. We find that XNLI represents a practical and challenging evaluation suite, and that directly translating the test data yields the best performance among available baselines.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

FLEXITOKENS: Flexible Tokenization for Evolving Language Models
cs.CL 2025-07 unverdicted novelty 7.0

FLEXITOKENS replaces rigid subword tokenizers and fixed-compression auxiliary losses with a simplified boundary-prediction objective in byte-level models, yielding lower over-fragmentation and up to 10-point gains on ...
OPT: Open Pre-trained Transformer Language Models
cs.CL 2022-05 unverdicted novelty 7.0

OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.
LIMO: Less is More for Reasoning
cs.CL 2025-02 unverdicted novelty 6.0

LIMO achieves 63.3% on AIME24 and 95.6% on MATH500 via supervised fine-tuning on roughly 1% of the data used by prior models, supporting the claim that minimal strategic examples suffice when pre-training has already ...
The Falcon Series of Open Language Models
cs.CL 2023-11 conditional novelty 6.0

Falcon-180B is a 180B-parameter open decoder-only model trained on 3.5 trillion tokens that approaches PaLM-2-Large performance at lower cost and is released with dataset extracts.