arxiv: 2606.05087 · v1 · pith:WSUWHAJOnew · submitted 2026-06-03 · 💻 cs.CL

Light or Full Verb? A Minimal-Pair Dataset for Probing Phraseological Competence in Language Models

Pith reviewed 2026-06-28 06:38 UTC · model grok-4.3

classification 💻 cs.CL
keywords light-verb constructions · full verbs · minimal-pair dataset · phraseological competence · language models · probing experiments · collocations · English verbs

The pith

Language models distinguish light-verb from full-verb uses of the same verb in minimal sentence contexts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates a controlled dataset of English sentence series in which the same verb and surrounding context appear in both light-verb constructions and full lexical uses. Two probing experiments test whether models register this difference. The results indicate that models do separate the two uses even when context is minimal and that their responses form distinct patterns according to the object paired with the verb. The authors release the full dataset, generation code, and materials for reuse and extension.

Core claim

Using a large-scale controlled dataset of minimally varying English sentence series in which the same context contains the same verb in light-verb and full-verb uses, the study shows that language models differentiate between these uses even in minimal contexts and exhibit separable patterns across object types.

What carries the argument

Minimal-pair sentence series that keep the verb and surrounding context constant and change only whether the verb serves as a light verb or as a full lexical predicate, as in 'make a decision' vs. 'make a cake'.
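
As a rough illustration of what such a sentence series could look like, the sketch below fills a shared sentence frame with the same verb and swaps only the object. The frames, the `MinimalPair` class, and the `build_pairs` helper are hypothetical and are not taken from the released generation code; only the 'make a decision' / 'make a cake' contrast comes from the abstract.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class MinimalPair:
    verb: str
    light_sentence: str  # verb used as a light verb
    full_sentence: str   # verb used as a full lexical predicate


# Hypothetical frames and object lists, for illustration only.
FRAMES = ["The student will {verb} {obj}.", "They {verb} {obj} every week."]
OBJECTS = {
    "make": {"light": ["a decision"], "full": ["a cake"]},
}


def build_pairs(frames, objects):
    """Cross every frame with every light/full object combination per verb."""
    pairs = []
    for verb, uses in objects.items():
        for frame, light_obj, full_obj in product(frames, uses["light"], uses["full"]):
            pairs.append(
                MinimalPair(
                    verb=verb,
                    light_sentence=frame.format(verb=verb, obj=light_obj),
                    full_sentence=frame.format(verb=verb, obj=full_obj),
                )
            )
    return pairs


pairs = build_pairs(FRAMES, OBJECTS)
for pair in pairs:
    print(pair.light_sentence, "|", pair.full_sentence)
```

Because the frame and verb are shared within each pair, any difference in model behavior between the two sentences can only come from the object and the construction it triggers.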

If this is right

  • Models register phraseological distinctions without additional contextual cues beyond the verb-object pair.
  • Model behavior varies systematically with object type, reflecting sensitivity to collocational preferences.
  • The released dataset and code enable controlled testing of the same distinction for additional verbs and languages.
  • Phraseological competence can be isolated and measured through minimal contrasts rather than varied full sentences.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The observed separation may arise from statistical regularities in training data rather than explicit semantic rules.
  • Applying the same minimal-pair method to other languages could test whether the differentiation generalizes beyond English.
  • Fine-tuning models on this dataset might improve handling of idiomatic versus literal verb uses in downstream tasks.

Load-bearing premise

The minimal-pair sentences isolate the light-verb versus full-verb distinction, with no other uncontrolled linguistic factor affecting model behavior.

What would settle it

If language models produced statistically indistinguishable outputs or internal representations for the light-verb and full-verb versions of the same minimal sentence pairs, the claim of differentiation would be falsified.
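
One way to operationalize "statistically indistinguishable" is a paired sign-flip permutation test on a per-pair score such as surprisal. The sketch below is an assumption about how such a check might be run, not the test used in the paper; the function name `sign_flip_p_value` is hypothetical and the input numbers are arbitrary placeholders, not results.

```python
import random
from statistics import mean


def sign_flip_p_value(scores_light, scores_full, n_perm=2000, seed=0):
    """Two-sided paired permutation test on the mean per-pair difference."""
    if len(scores_light) != len(scores_full):
        raise ValueError("paired inputs must have equal length")
    diffs = [a - b for a, b in zip(scores_light, scores_full)]
    observed = abs(mean(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Under the null hypothesis the sign of each paired difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Placeholder scores for six sentence pairs (invented, for illustration only).
placeholder_light = [9.1, 8.7, 10.2, 9.5, 8.9, 9.8]
placeholder_full = [6.0, 6.4, 7.1, 5.9, 6.6, 6.2]
p_value = sign_flip_p_value(placeholder_light, placeholder_full)
print(f"p = {p_value:.3f}")
```

A large p-value across many pairs would be the kind of outcome that counts against the differentiation claim; a small one would be consistent with it.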

Figures

Figures reproduced from arXiv: 2606.05087 by Francesca Franzon, Leo Wanner, Nicolas Rosàs Gómez.

Figure 1: Final object surprisal in active sentences for ...
Figure 2: Final-verb surprisal in passive sentences for ...
Figure 3: KMeans purity for full object-head hidden ...
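
The Figure 3 caption refers to KMeans purity. The page does not say how the paper computes it, so the sketch below uses the standard definition of cluster purity as an assumption: each cluster is credited with its most frequent gold label, and the credits are divided by the number of items. The function name and the toy cluster assignments are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, labels):
    """Fraction of items whose gold label is the majority label of their cluster."""
    if len(cluster_ids) != len(labels):
        raise ValueError("cluster_ids and labels must have equal length")
    if not labels:
        raise ValueError("at least one item is required")
    counts_by_cluster = defaultdict(Counter)
    for cluster_id, label in zip(cluster_ids, labels):
        counts_by_cluster[cluster_id][label] += 1
    majority_total = sum(max(c.values()) for c in counts_by_cluster.values())
    return majority_total / len(labels)


# Toy assignments (invented): two clusters over six object heads.
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_labels = ["light", "light", "full", "full", "full", "light"]
purity = cluster_purity(toy_clusters, toy_labels)
print(f"purity = {purity:.3f}")
```

Purity of 1.0 means every cluster contains a single label; with two balanced labels, values near 0.5 indicate clusters that do not track the light/full distinction.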
Original abstract

Frequent English verbs such as 'have' and 'make' can function either as collocates in light-verb constructions or as full lexical predicates, as in 'make a decision' vs. 'make a cake'. Whether language models represent this distinction remains unclear. We introduce a large-scale controlled dataset of minimally varying English sentence series in which the same context contains the same verb in light-verb and full-verb uses. Two probing experiments show that language models differentiate between these uses even in minimal contexts and exhibit separable patterns across object types. We release the dataset, generation code, and materials as a reusable resource. The framework supports extensions to broader contexts, additional verbs, and other languages.
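
The captions of Figures 1 and 2 report surprisal at the final object or verb. Surprisal is the negative log probability of a word given its preceding context; when a word is split into several subword tokens, a common convention is to sum the token surprisals. The sketch below assumes that convention and natural-log token probabilities as input; the function name and the example values are hypothetical, and the paper may use a different log base or aggregation.

```python
import math


def word_surprisal_bits(token_logprobs, start, end):
    """Surprisal in bits of the word spanning token_logprobs[start:end].

    token_logprobs holds natural-log conditional probabilities, one per token,
    each conditioned on all preceding tokens.
    """
    if not 0 <= start < end <= len(token_logprobs):
        raise ValueError("invalid token span")
    # Summing log-probabilities multiplies the conditional probabilities.
    return -sum(token_logprobs[start:end]) / math.log(2)


# Invented probabilities for a three-token sentence whose final word has two tokens.
example_logprobs = [math.log(0.5), math.log(0.25), math.log(0.125)]
final_word_surprisal = word_surprisal_bits(example_logprobs, 1, 3)
print(f"{final_word_surprisal:.2f} bits")
```

In practice the log-probabilities would come from a language model's output distribution at each position; the span indices identify which tokens belong to the target word.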

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces a large-scale controlled dataset of minimally varying English sentence series in which the same context contains the same verb in light-verb and full-verb uses (e.g., 'make a decision' vs. 'make a cake'). Two probing experiments are reported to show that language models differentiate these uses even in minimal contexts and exhibit separable patterns across object types. The dataset, generation code, and materials are released as a reusable resource.

Significance. If the minimal pairs successfully isolate the constructional distinction without confounds from object semantics, this dataset would provide a valuable, extensible benchmark for evaluating phraseological competence in language models and could support comparative work across verbs and languages.

major comments (2)
  1. [Dataset construction / Abstract] The central claim that the sentence series isolate the light-verb versus full-verb distinction (Abstract) is load-bearing, yet the generation procedure supplies no quantitative evidence that object nouns are matched for concreteness, frequency, animacy, or argument structure; these properties differ systematically between light-verb and full-verb objects and could allow models to succeed without representing the constructional contrast itself.
  2. [Probing experiments / Abstract] The abstract asserts that two probing experiments demonstrate differentiation, yet supplies no information on experimental design, controls, statistical tests, or sample details, so it is impossible to judge whether the reported patterns are supported by the data.
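
The matching concern in major comment 1 can be checked with a standardized mean difference (SMD) between the light-verb and full-verb object sets for each covariate. The sketch below is one possible check, not a procedure described in the paper; the function name is hypothetical and the concreteness ratings are invented placeholders rather than values from any norm database.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Difference in means divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two values")
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; SMD is undefined")
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Placeholder concreteness ratings (invented, for illustration only).
light_object_ratings = [2.0, 2.5, 3.0]
full_object_ratings = [4.0, 4.5, 5.0]
smd = standardized_mean_difference(light_object_ratings, full_object_ratings)
print(f"SMD = {smd:.2f}")
```

An absolute SMD near zero for frequency, concreteness, and animacy would support the claim that the pairs differ only in construction; a large value would point to a confound of the kind the referee describes.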

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We respond to each major point below and note planned revisions to strengthen the manuscript.

  1. Referee: [Dataset construction / Abstract] The central claim that the sentence series isolate the light-verb versus full-verb distinction (Abstract) is load-bearing, yet the generation procedure supplies no quantitative evidence that object nouns are matched for concreteness, frequency, animacy, or argument structure; these properties differ systematically between light-verb and full-verb objects and could allow models to succeed without representing the constructional contrast itself.

    Authors: The minimal-pair design holds the verb and sentential context constant, so that the sole systematic difference between paired items is the light- versus full-verb use of that verb. Object nouns were chosen on the basis of attested corpus collocations to ensure naturalness for each construction. We acknowledge, however, that the generation procedure description does not include quantitative matching statistics for concreteness, frequency, animacy or argument structure. We will add a supplementary table reporting these properties for the object sets and will discuss any residual differences in the revised manuscript.
    Revision: yes

  2. Referee: [Probing experiments / Abstract] The abstract asserts that two probing experiments demonstrate differentiation, yet supplies no information on experimental design, controls, statistical tests, or sample details, so it is impossible to judge whether the reported patterns are supported by the data.

    Authors: Abstracts are length-constrained summaries; the experimental design, controls, statistical tests and sample details are fully reported in the Methods and Results sections of the manuscript. No revision to the abstract is required, as the body of the paper already supplies the requested information.
    Revision: no

Circularity Check

0 steps flagged

No circularity: purely empirical dataset construction and probing with no derivations or self-referential fits


The paper constructs a minimal-pair dataset for light vs. full verb uses and runs probing experiments on language models to test differentiation. No equations, parameters, or derivations appear in the abstract or described content. Claims rest on experimental outcomes from the released dataset rather than any reduction to fitted inputs, self-citations, or ansatzes. The work is self-contained as an empirical resource release with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical dataset-creation and probing study with no mathematical derivations. No free parameters, axioms, or invented entities are described.


