Holmes is a probing benchmark compiling over 200 datasets from 270 studies to evaluate linguistic competence across syntax, morphology, semantics, reasoning, and discourse in more than 50 language models.
In Proceed- ings of the 58th Annual Meeting of the As- sociation for Computational Linguistics , pages 7871–7880, Online
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2024 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Holmes: A Benchmark to Assess the Linguistic Competence of Language Models
Holmes is a probing benchmark compiling over 200 datasets from 270 studies to evaluate linguistic competence across syntax, morphology, semantics, reasoning, and discourse in more than 50 language models.