MultiBLiMP 1.0: A Massively Multilingual Benchmark of Linguistic Minimal Pairs

2025 · cs.CL · arXiv 2504.02768

5 Pith papers cite this work. Polarity classification is still indexing.

abstract

We introduce MultiBLiMP 1.0, a massively multilingual benchmark of linguistic minimal pairs that covers 101 languages and 2 types of subject-verb agreement and contains more than 128,000 minimal pairs. Our minimal pairs are created using a fully automated pipeline that leverages the large-scale linguistic resources of Universal Dependencies and UniMorph. MultiBLiMP 1.0 evaluates the abilities of LLMs at an unprecedented multilingual scale and highlights the shortcomings of the current state of the art in modelling low-resource languages.
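The abstract describes two steps: building minimal pairs automatically from Universal Dependencies and UniMorph, then scoring an LLM on them. The sketch below illustrates both ideas under stated assumptions; it is not the authors' pipeline. The `UNIMORPH` table, `Token`, `MinimalPair`, `flip_number`, `make_pair` and `accuracy` are hypothetical names, the lexicon has four toy entries, only verb number is flipped, and `log_prob` stands in for any model's sentence log-probability. The scoring rule (a pair counts as correct when the grammatical sentence scores strictly higher) is the usual criterion for BLiMP-style minimal-pair benchmarks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# HYPOTHETICAL toy stand-in for UniMorph: (lemma, feature string) -> surface form.
# Real UniMorph data is far larger; these four entries are for illustration only.
UNIMORPH: Dict[Tuple[str, str], str] = {
    ("walk", "V;PRS;3;SG"): "walks",
    ("walk", "V;PRS;3;PL"): "walk",
    ("be", "V;PRS;3;SG"): "is",
    ("be", "V;PRS;3;PL"): "are",
}


@dataclass
class Token:
    form: str   # surface form as it appears in the sentence
    lemma: str  # lemma, as a UD treebank would provide it
    feats: str  # UniMorph-style features (assumed already mapped from UD)


@dataclass
class MinimalPair:
    grammatical: str
    ungrammatical: str


def flip_number(feats: str) -> Optional[str]:
    """Swap SG <-> PL in a feature string; return None if it has no number feature."""
    parts = feats.split(";")
    swap = {"SG": "PL", "PL": "SG"}
    if not any(p in swap for p in parts):
        return None
    return ";".join(swap.get(p, p) for p in parts)


def make_pair(tokens: List[Token], verb_idx: int) -> Optional[MinimalPair]:
    """Corrupt one verb by flipping its number; None if no distinct form exists."""
    verb = tokens[verb_idx]
    new_feats = flip_number(verb.feats)
    if new_feats is None:
        return None
    new_form = UNIMORPH.get((verb.lemma, new_feats))
    if new_form is None or new_form == verb.form:
        return None  # missing or syncretic inflection: no minimal pair possible
    good = " ".join(t.form for t in tokens)
    bad = " ".join(new_form if i == verb_idx else t.form for i, t in enumerate(tokens))
    return MinimalPair(good, bad)


def accuracy(pairs: List[MinimalPair], log_prob: Callable[[str], float]) -> float:
    """Share of pairs where the scorer strictly prefers the grammatical sentence (ties count as wrong)."""
    if not pairs:
        return 0.0
    correct = sum(log_prob(p.grammatical) > log_prob(p.ungrammatical) for p in pairs)
    return correct / len(pairs)


if __name__ == "__main__":
    sent = [Token("The", "the", "DET"), Token("dog", "dog", "N;SG"),
            Token("walks", "walk", "V;PRS;3;SG")]
    print(make_pair(sent, 2))
    # prints: MinimalPair(grammatical='The dog walks', ungrammatical='The dog walk')
```

Skipping verbs whose flipped form is missing or identical keeps every pair truly minimal and truly ungrammatical. A fuller version would read the subject's features from the UD parse rather than assuming the verb agrees with it.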

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Implicit Representations of Grammaticality in Language Models

cs.CL · 2026-05-06 · unverdicted · novelty 7.0

Linear probes on LM hidden states detect grammaticality better than string probabilities, generalize to human benchmarks and other languages, and correlate weakly with likelihood.

Light or Full Verb? A Minimal-Pair Dataset for Probing Phraseological Competence in Language Models

cs.CL · 2026-06-03 · unverdicted · novelty 6.0

Presents a minimal-pair dataset and reports probing experiments showing that language models differentiate light-verb from full-verb uses even in minimal contexts, with patterns that are separable by object type.

Crosscoding Through Time: Tracking Emergence & Consolidation Of Linguistic Representations Throughout LLM Pretraining

cs.CL · 2025-09-05 · unverdicted · novelty 6.0

Sparse crosscoders on LLM checkpoint triplets track emergence, maintenance, and discontinuation of linguistic features during pretraining via a new RelIE metric.

Different types of syntactic agreement recruit the same units within large language models

cs.CL · 2025-12-03 · unverdicted · novelty 5.0

Different types of syntactic agreement recruit overlapping units within LLMs, indicating that agreement forms a meaningful functional category across English, Russian, Chinese, and structurally similar languages.

Multilingual Vision-Language Models, A Survey

cs.CL · 2025-09-26 · accept · novelty 3.0

The survey identifies a key tension in multilingual vision-language models between language neutrality via contrastive learning and cultural awareness via diverse data, with most benchmarks relying on translation-based evaluation.
