Light or Full Verb? A Minimal-Pair Dataset for Probing Phraseological Competence in Language Models

Francesca Franzon; Leo Wanner; Nicolas Rosàs Gómez

arxiv: 2606.05087 · v1 · pith:WSUWHAJOnew · submitted 2026-06-03 · 💻 cs.CL

classification 💻 cs.CL

keywords: dataset, language, make, models, contexts, english, full, light-verb

Frequent English verbs such as 'have' and 'make' can function either as collocates in light-verb constructions or as full lexical predicates, as in 'make a decision' vs. 'make a cake'. Whether language models represent this distinction remains unclear. We introduce a large-scale controlled dataset of minimally varying English sentence series in which the same context contains the same verb in light-verb and full-verb uses. Two probing experiments show that language models differentiate between these uses even in minimal contexts and exhibit separable patterns across object types. We release the dataset, generation code, and materials as a reusable resource. The framework supports extensions to broader contexts, additional verbs, and other languages.
