Rethinking the Idiomaticity Decomposability Hypothesis: Evidence from Distributional Learning
Pith reviewed 2026-06-28 10:02 UTC · model grok-4.3
The pith
In contextualised language models, a model-derived measure of idiom decomposability correlates weakly with human judgments and negatively with syntactic flexibility.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using contextualised language models as controlled distributional learners, a model-internal measure of idiom decomposability correlates weakly with human judgments and shows a small but consistent negative relationship with syntactic flexibility. Pretraining analyses show that stabilisation of idiom representations in models is not explained by frequency alone. Instead, surprisal, decomposability, and frequency all contribute, with decomposability showing the strongest training-dependent effect.
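Surprisal, one of the three factors named above, is the standard information-theoretic quantity -log2 p. For a multiword expression, the chain rule makes the surprisal of the whole phrase equal to the sum of its tokens' conditional surprisals. The same holds when an idiom is split into subword tokens.

The sketch below assumes the per-token conditional probabilities have already been read off some language model. The function name and the probabilities are hypothetical. The paper's exact predictability measure may differ, for example by using only the final word's surprisal.

```python
import math
from typing import Sequence


def phrase_surprisal(token_probs: Sequence[float]) -> float:
    """Surprisal, in bits, of a multiword sequence.

    `token_probs` holds each token's conditional probability p(w_i | w_<i)
    as assigned by some language model. By the chain rule,
    -log2 p(w_1..w_n) equals the sum of the per-token surprisals.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    total = 0.0
    for p in token_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        total -= math.log2(p)
    return total


# Hypothetical conditional probabilities for the three tokens of an idiom
# such as "spill the beans"; real values would come from a model's softmax.
print(phrase_surprisal([0.5, 0.25, 0.125]))  # 1 + 2 + 3 bits -> 6.0
```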
What carries the argument
Model-internal measure of decomposability derived from contextualised language models during pretraining.
If this is right
- Syntactic flexibility of idioms cannot be attributed primarily to decomposability.
- Stabilisation of idiom representations during pretraining depends jointly on surprisal, decomposability, and frequency.
- Decomposability exerts its largest influence on representation stability as training data volume increases.
- Human judgments of decomposability align only weakly with what models extract from text distributions.
Where Pith is reading between the lines
- Usage-based accounts that foreground predictability and cumulative exposure may offer a stronger account of idiom behaviour than decomposability alone.
- Model-derived measures could be tested as proxies for tracking human learning trajectories in controlled psycholinguistic experiments.
- The same pretraining analysis approach could be extended to other classes of multiword expressions to check whether decomposability effects generalise.
Load-bearing premise
That contextualised language models function as valid controlled distributional learners whose internal measures of decomposability and learning dynamics can be directly compared to human idiom processing.
What would settle it
A replication in which the same model-internal decomposability measure produced a strong positive correlation with human judgments and a positive relationship with syntactic flexibility would falsify the reported weak and negative relationships.
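Whether a replication yields a "strong positive correlation" depends on how agreement is scored. Below is a minimal sketch, assuming per-idiom model scores and mean human ratings are already aligned by idiom. Spearman's rank correlation is used here as one common choice; it is not necessarily the statistic the paper reports. All values are invented toy numbers.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over the run of values tied with values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / (sxx * syy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the (average) ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Invented per-idiom scores: model-derived decomposability vs. mean human rating.
model_scores = [0.21, 0.47, 0.55, 0.80]
human_ratings = [1.5, 3.0, 2.5, 4.0]
print(spearman(model_scores, human_ratings))  # one swapped pair of ranks -> 0.8
```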
Figures
Original abstract
Idioms can be analysed in terms of their decomposability, the extent to which constituent meanings contribute to the figurative whole. Decomposability is thought to predict syntactic flexibility. Usage-based accounts instead attribute idiom behaviour to distributional experience, such as speaker familiarity and predictability. We examine these views using contextualised language models as controlled distributional learners. We propose a model-internal measure of decomposability and relate it to human ratings, syntactic flexibility, and predictability while tracking idiom learning during pretraining. Model-derived decomposability correlates weakly with human judgments and shows a small but consistent negative relationship with syntactic flexibility. Pretraining analyses show that stabilisation of idiom representations in models is not explained by frequency alone. Instead, surprisal, decomposability, and frequency all contribute, with decomposability showing the strongest training-dependent effect.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper uses contextualized language models as controlled distributional learners to test the idiomaticity decomposability hypothesis. It introduces a model-internal decomposability measure that correlates only weakly with human judgments and shows a small negative relationship with syntactic flexibility. Pretraining trajectory analyses indicate that stabilization of idiom representations is not explained by frequency alone; instead, surprisal, decomposability, and frequency all contribute, with decomposability exhibiting the strongest training-dependent effect.
Significance. If the mapping from model measure to human decomposability holds, the work supplies computational evidence favoring multi-factor usage-based accounts of idiom processing over purely decomposability-based accounts. The pretraining dynamics analysis is a clear strength, as it tracks representational change over training steps rather than relying on static snapshots.
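The trajectory analysis praised here depends on how "stabilization" is operationalized, and the review does not specify this. The sketch below shows one hypothetical way to do it. It compares each checkpoint's vector for an idiom with that idiom's final-checkpoint vector, then reports the earliest step from which similarity stays above a threshold.

The similarity-to-final criterion, the 0.95 threshold, and the function names are all illustrative assumptions, not the paper's method. Representation-similarity measures such as CKA are a common alternative.

```python
import math
from typing import Optional, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity undefined for a zero vector")
    return dot / (norm_u * norm_v)


def stabilization_step(
    steps: Sequence[int], vectors: Sequence[Vector], threshold: float = 0.95
) -> Optional[int]:
    """Earliest checkpoint from which an idiom's vector stays at or above
    `threshold` cosine similarity to its final-checkpoint vector.

    `steps` and `vectors` are aligned and ordered by training step.
    Returns None if even the final checkpoint falls below `threshold`
    (only possible for thresholds at or very near 1.0).
    """
    if not steps or len(steps) != len(vectors):
        raise ValueError("need aligned, non-empty steps and vectors")
    final = vectors[-1]
    stable_from: Optional[int] = None
    # Walk backwards: the stable suffix ends at the first similarity below threshold.
    for step, vec in zip(reversed(steps), reversed(vectors)):
        if cosine(vec, final) < threshold:
            break
        stable_from = step
    return stable_from


# Invented 2-d vectors for one idiom at four checkpoints.
steps = [0, 1000, 2000, 3000]
vectors = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.1], [1.0, 0.0]]
print(stabilization_step(steps, vectors))  # 2000
```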
major comments (3)
- §3 (Methods): The model-internal decomposability measure is defined from contextualized representations, yet the manuscript supplies no explicit formula, layer selection, or distance metric. Without these, it is impossible to determine whether the measure isolates constituent contributions in a manner comparable to human decomposability ratings. That comparability is load-bearing for the central claim that the weak correlation challenges the decomposability hypothesis.
- §5 (Pretraining analyses): The assertion that decomposability shows the strongest training-dependent effect is based on unspecified regression models. No coefficients, interaction terms, or model-comparison statistics are reported for the joint contributions of surprisal, decomposability, and frequency, so the relative strength of the decomposability effect cannot be verified.
- §4 (Correlation and flexibility results): The negative relationship between model decomposability and syntactic flexibility is presented as evidence against decomposability accounts. However, the weak correlation with human ratings (implied to be below r = 0.3) raises the possibility that the dissociation is an artifact of the model's embedding geometry rather than a general property of idioms.
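The model-comparison statistics requested in the §5 comment follow standard formulas:

- AIC = 2k - 2 ln L
- BIC = k ln n - 2 ln L
- The likelihood-ratio test refers 2(ln L_full - ln L_reduced) to a chi-square distribution. Its degrees of freedom equal the number of parameters the full model adds.

Below is a minimal standard-library sketch, assuming the log-likelihoods and parameter counts come from models already fitted elsewhere. The chi-square tail uses exact closed forms for integer degrees of freedom, which avoids a SciPy dependency. The numbers in the usage lines are invented.

```python
import math
from typing import Tuple


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion; lower is better."""
    return 2 * k - 2 * loglik


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion for n observations; lower is better."""
    return k * math.log(n) - 2 * loglik


def chi2_sf(x: float, df: int) -> float:
    """P(X >= x) for a chi-square variable with integer df >= 1 (closed forms)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) e^(-x/2) * sum_{j=1}^{(df-1)/2} x^(j-1) / (1*3*...*(2j-1))
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def likelihood_ratio_test(
    ll_reduced: float, k_reduced: int, ll_full: float, k_full: int
) -> Tuple[float, int, float]:
    """LR statistic, degrees of freedom, and p-value for nested models."""
    if k_full <= k_reduced:
        raise ValueError("the full model must have more parameters")
    stat = 2.0 * (ll_full - ll_reduced)
    df = k_full - k_reduced
    return stat, df, chi2_sf(stat, df)


# Invented log-likelihoods: a model without vs. with one extra interaction term.
print(likelihood_ratio_test(ll_reduced=-1210.4, k_reduced=9, ll_full=-1203.1, k_full=10))
print(aic(-1203.1, 10), bic(-1203.1, 10, n=4000))
```

For mixed-effects models, likelihood-ratio tests of fixed effects should compare fits estimated by maximum likelihood, not REML. REML likelihoods of models with different fixed effects are not comparable.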
minor comments (2)
- Abstract: The abstract mentions model choice, measure construction, and statistical controls only at a high level; adding one sentence on each would improve readability.
- [Throughout] Notation: The decomposability score would benefit from an explicit equation or pseudocode block to aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. We address each major comment below, indicating where revisions will be made to improve clarity and verifiability while defending the core claims on the basis of the existing analyses.
Point-by-point responses
- Referee (§3, Methods): The model-internal decomposability measure is defined from contextualized representations, yet the manuscript supplies no explicit formula, layer selection, or distance metric. Without these, it is impossible to determine whether the measure isolates constituent contributions in a manner comparable to human decomposability ratings. That comparability is load-bearing for the central claim that the weak correlation challenges the decomposability hypothesis.
Authors: We agree that the Methods section would benefit from greater explicitness. The decomposability measure is operationalized as the cosine similarity between the contextualized idiom embedding and the element-wise sum of its constituent embeddings (extracted from the same context), using the final transformer layer. In the revised manuscript we will insert the precise mathematical definition, confirm layer selection, and state the distance metric to permit direct replication and comparison with human ratings. revision: yes
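The measure as stated in this response can be sketched as follows. It assumes the idiom vector and the constituent vectors have already been extracted from the final transformer layer. How the idiom vector is pooled, and exactly how the constituent vectors are obtained, are not specified in the text and are assumptions here. The function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity undefined for a zero vector")
    return dot / (norm_u * norm_v)


def decomposability(idiom_vec: Vector, constituent_vecs: Sequence[Vector]) -> float:
    """Cosine similarity between an idiom's contextual embedding and the
    element-wise sum of its constituents' embeddings.

    Higher values mean the idiom vector lies closer to the additive
    composition of its parts.
    """
    if not constituent_vecs:
        raise ValueError("need at least one constituent vector")
    dim = len(idiom_vec)
    if any(len(c) != dim for c in constituent_vecs):
        raise ValueError("all vectors must share one dimensionality")
    summed = [sum(c[d] for c in constituent_vecs) for d in range(dim)]
    return cosine(idiom_vec, summed)


# Invented 3-d vectors standing in for final-layer hidden states.
idiom = [0.2, 0.9, 0.1]
parts = [[0.8, 0.1, 0.0], [0.1, 0.3, 0.2]]
print(round(decomposability(idiom, parts), 3))
```

One design caveat: cosine similarity ignores scale, so a sum and a mean of the same vectors point in the same direction. If the idiom vector were itself the mean of the very constituent vectors being summed, the score would be 1 by construction. The sketch therefore assumes the idiom vector comes from a separate pooling or encoding step.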
- Referee (§5, Pretraining analyses): The assertion that decomposability shows the strongest training-dependent effect is based on unspecified regression models. No coefficients, interaction terms, or model-comparison statistics are reported for the joint contributions of surprisal, decomposability, and frequency, so the relative strength of the decomposability effect cannot be verified.
Authors: The pretraining analyses employ linear mixed-effects models with representation stability as the outcome and main effects plus interactions of surprisal, decomposability, and frequency with training step as predictors. We will add the full model equations, coefficient tables (with standard errors and t-values), and model-comparison statistics (AIC, BIC, and likelihood-ratio tests) in the revised §5 so that the relative magnitude of the decomposability-by-step interaction can be directly evaluated. revision: yes
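Read literally, the specification described in this response corresponds to a model of roughly the following form. This is a hypothetical rendering, not the authors' equation. The following are all assumptions:

- the symbols themselves;
- the by-idiom random intercept $u_i$;
- treating surprisal $S_i$, decomposability $D_i$, and frequency $F_i$ (plausibly log-transformed) as idiom-level predictors;
- $T_t$ standing for the (possibly log-scaled) training step.

```latex
\mathrm{Stab}_{i,t} = \beta_0 + \beta_S S_i + \beta_D D_i + \beta_F F_i + \beta_T T_t
  + \beta_{ST}\, S_i T_t + \beta_{DT}\, D_i T_t + \beta_{FT}\, F_i T_t
  + u_i + \varepsilon_{i,t},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{i,t} \sim \mathcal{N}(0, \sigma^2)
```

Under this reading, "the strongest training-dependent effect" is a comparison of $\hat\beta_{ST}$, $\hat\beta_{DT}$, and $\hat\beta_{FT}$. That comparison is only meaningful if the three predictors are on comparable scales, for example z-scored before fitting.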
- Referee (§4, Correlation and flexibility results): The negative relationship between model decomposability and syntactic flexibility is presented as evidence against decomposability accounts. However, the weak correlation with human ratings (implied to be below r = 0.3) raises the possibility that the dissociation is an artifact of the model's embedding geometry rather than a general property of idioms.
Authors: The weak correlation with human ratings is itself a substantive result indicating that distributional decomposability diverges from introspective judgments. The negative association with syntactic flexibility survives controls for frequency and is observed across multiple model families. We will expand the discussion to explicitly consider embedding-geometry artifacts and will add a supplementary analysis contrasting contextual versus static embeddings; however, we maintain that the pattern is not reducible to geometry alone given the training-dynamic evidence. revision: partial
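A partial correlation is one standard way to check whether an association "survives controls for frequency". It removes the linear contribution of a third variable from both sides before correlating. Below is a minimal sketch using the first-order partial correlation formula. The rebuttal may instead enter frequency as a regression covariate, and the toy data are invented.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / (sxx * syy) ** 0.5


def partial_correlation(
    x: Sequence[float], y: Sequence[float], z: Sequence[float]
) -> float:
    """First-order partial correlation r_xy.z: the correlation of x and y
    after removing the linear contribution of the control variable z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("control variable is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


# Invented per-idiom values: decomposability, flexibility, log frequency.
decomp = [0.31, 0.52, 0.44, 0.70, 0.63]
flex = [0.60, 0.41, 0.55, 0.30, 0.38]
log_freq = [2.1, 3.4, 2.8, 4.0, 3.1]
print(round(partial_correlation(decomp, flex, log_freq), 3))
```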
Circularity Check
No circularity: empirical correlations from proposed model measure
full rationale
The paper proposes a model-internal decomposability measure and reports its empirical correlations with human ratings, syntactic flexibility, and pretraining dynamics. No equations, fitted parameters, or derivations are shown that reduce any prediction to its own inputs by construction. The central results are observational comparisons, not self-definitional steps or load-bearing self-citations. The analysis is anchored to external human judgments and does not invoke uniqueness theorems or ansatzes from the authors' prior work.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Journal of memory and language , volume=
More than words: Frequency effects for multi-word phrases , author=. Journal of memory and language , volume=. 2010 , publisher=
2010
-
[2]
time and again
Seeing a phrase “time and again” matters: The role of phrasal frequency in the processing of multiword sequences. , author=. Journal of Experimental Psychology: Learning, Memory, and Cognition , volume=. 2011 , publisher=
2011
-
[3]
Brain and language , volume=
Evidence for frequency-based constituents in the mental lexicon: Collocations involving the word of , author=. Brain and language , volume=. 2002 , publisher=
2002
-
[4]
Glossa: a journal of general linguistics , volume=
Sources of variability in the syntactic flexibility of idioms , author=. Glossa: a journal of general linguistics , volume=. 2023 , publisher=
2023
-
[5]
An invitation to cognitive science: Language , volume=
Lexical semantics and compositionality , author=. An invitation to cognitive science: Language , volume=. 1995 , publisher=
1995
-
[6]
Journal of memory and language , volume=
Lexical access during the production of idiomatic phrases , author=. Journal of memory and language , volume=. 2006 , publisher=
2006
-
[7]
Characterizing Idioms: Conventionality and Contingency
Socolof, Michaela and Cheung, Jackie and Wagner, Michael and O ' Donnell, Timothy. Characterizing Idioms: Conventionality and Contingency. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022. doi:10.18653/v1/2022.acl-long.278
-
[8]
Smith and Hannaneh Hajishirzi , booktitle=
Evan Pete Walsh and Luca Soldaini and Dirk Groeneveld and Kyle Lo and Shane Arora and Akshita Bhagia and Yuling Gu and Shengyi Huang and Matt Jordan and Nathan Lambert and Dustin Schwenk and Oyvind Tafjord and Taira Anderson and David Atkinson and Faeze Brahman and Christopher Clark and Pradeep Dasigi and Nouha Dziri and Allyson Ettinger and Michal Guerqu...
2025
-
[9]
2001 , publisher=
Understanding figurative language: From metaphor to idioms , author=. 2001 , publisher=
2001
-
[10]
RUDN Journal of Language Studies, Semiotics and Semantics , volume=
Criteria of Semantic Decomposability of Idioms , author=. RUDN Journal of Language Studies, Semiotics and Semantics , volume=
-
[11]
Psycholinguistic studies on the syntactic behavior of idioms , journal =
Raymond W Gibbs and Nandini P Nayak , abstract =. Psycholinguistic studies on the syntactic behavior of idioms , journal =. 1989 , issn =. doi:https://doi.org/10.1016/0010-0285(89)90004-2 , url =
-
[12]
Proceedings of the 7th International Corpus Linguistics Conference (
Jakubíček, Miloš and Kilgarriff, Adam and Kovář, Vojtěch and Rychlý, Pavel and Suchomel, Vít , title =. Proceedings of the 7th International Corpus Linguistics Conference (
-
[13]
Phrasal Substitution of Idiomatic Expressions
Liu, Changsheng and Hwa, Rebecca. Phrasal Substitution of Idiomatic Expressions. Proceedings of the 2016 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016. doi:10.18653/v1/N16-1040
-
[14]
Aho and Jeffrey D
Alfred V. Aho and Jeffrey D. Ullman , title =. 1972
1972
-
[15]
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
IMPLI: Investigating NLI models’ performance on figurative language , author=. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
-
[16]
Publications Manual , year = "1983", publisher =
1983
-
[17]
Ashok K. Chandra and Dexter C. Kozen and Larry J. Stockmeyer , year = "1981", title =. doi:10.1145/322234.322243
-
[18]
Scalable training of
Andrew, Galen and Gao, Jianfeng , booktitle=. Scalable training of
-
[19]
Dan Gusfield , title =. 1997
1997
-
[20]
Tetreault , title =
Mohammad Sadegh Rasooli and Joel R. Tetreault , title =. Computing Research Repository , volume =. 2015 , url =
2015
-
[21]
A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , Volume =
Ando, Rie Kubota and Zhang, Tong , Issn =. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , Volume =. Journal of Machine Learning Research , Month = dec, Numpages =
-
[22]
Idioms , pages=
The Boundaries of the Lexicon: For Morris Halle, in celebration of his 70th birthday , author=. Idioms , pages=. 2014 , publisher=
2014
-
[23]
BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding
Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina. BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019. doi:10.18653/v...
-
[24]
URL https: //aclanthology.org/2025.acl-long.127/
Warner, Benjamin and Chaffin, Antoine and Clavi. Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025. doi:10.18653/v1/2025.acl-long.127
-
[25]
Memorization or Reasoning? Exploring the Idiom Understanding of LLM s
Kim, Jisu and Shin, Youngwoo and Hwang, Uiji and Choi, Jihun and Xuan, Richeng and Kim, Taeuk. Memorization or Reasoning? Exploring the Idiom Understanding of LLM s. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025. doi:10.18653/v1/2025.emnlp-main.1099
-
[26]
Bulkes, Nyssa Z. and Tanner, Darren , title=. Behavior Research Methods , year=. doi:10.3758/s13428-016-0747-8 , url=
-
[27]
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A
Tenney, Ian and Das, Dipanjan and Pavlick, Ellie. BERT Rediscovers the Classical NLP Pipeline. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. doi:10.18653/v1/P19-1452
-
[28]
What Does BERT Learn about the Structure of Language?
Jawahar, Ganesh and Sagot, Beno \^i t and Seddah, Djam \'e. What Does BERT Learn about the Structure of Language?. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. doi:10.18653/v1/P19-1356
-
[29]
and Titone, Debra A
Libben, Maya R. and Titone, Debra A. , title =. Memory & Cognition , volume =
-
[30]
Journal of Experimental Psychology: Learning, Memory, and Cognition , volume =
Tabossi, Patrizia and Fanari, Rachele and Wolf, Kristine , title =. Journal of Experimental Psychology: Learning, Memory, and Cognition , volume =
-
[31]
Language, Usage and Cognition , publisher=
Bybee, Joan , year=. Language, Usage and Cognition , publisher=
-
[32]
https://doi.org/10.1093/acprof:oso/9780199213900.001.0001
Goldberg, Adele , title =. 2005 , month =. doi:10.1093/acprof:oso/9780199268511.001.0001 , url =
work page doi:10.1093/acprof:oso/9780199268511.001.0001 2005
-
[33]
Riehemann, Susanne , year =
-
[34]
Sag and Thomas Wasow , booktitle =
Geoffrey Nunberg and Ivan A. Sag and Thomas Wasow , booktitle =. Idioms , year =
-
[35]
Verbal multiword expressions: Idiomaticity and flexibility
Sheinfux, \ Livnat Herzig\ and Greshler, \ Tali Arad\ and Nurit Melnik and Shuly Wintner. Verbal multiword expressions: Idiomaticity and flexibility. Representation and Parsing of Multiword Expressions. 2019. doi:10.5281/zenodo.2579035
-
[36]
Rolling the DICE on Idiomaticity: How LLM s Fail to Grasp Context
Mi, Maggie and Villavicencio, Aline and Moosavi, Nafise Sadat. Rolling the DICE on Idiomaticity: How LLM s Fail to Grasp Context. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025. doi:10.18653/v1/2025.acl-long.362
-
[37]
2025 , eprint=
Olmo 3 , author=. 2025 , eprint=
2025
-
[38]
arXiv preprint arXiv:2401.17377 , year=
Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens , author=. arXiv preprint arXiv:2401.17377 , year=
-
[39]
Shannon, C. E. , journal=. A mathematical theory of communication , year=
-
[40]
and Nayak, Nandini P
Gibbs, Raymond W. and Nayak, Nandini P. and Cutting, Cooper , title=. Journal of Memory and Language , year=
-
[41]
Proceedings of the XIIIth international congress of linguistics , volume=
Idioms: An interim report , author=. Proceedings of the XIIIth international congress of linguistics , volume=. 1983 , organization=
1983
-
[42]
1978 , address =
Nunberg, Geoffrey , title =. 1978 , address =
1978
-
[43]
Behavioral and Brain Sciences , volume =
Rules and representations , volume=. Behavioral and Brain Sciences , author=. 1980 , pages=. doi:10.1017/S0140525X00001515 , number=
-
[44]
Foundations of language , pages=
Idioms within a transformational grammar , author=. Foundations of language , pages=. 1970 , publisher=
1970
-
[45]
James T. Heringer. Idioms and Lexicalization in English. 1976. doi:10.1163/9789004368842_008
-
[46]
, title =
Katz, Jerrold J. , title =. A Festschrift for Morris Halle , editor =. 1973 , address =
1973
-
[47]
Memory & Cognition , year =
Tabossi, Patrizia and Fanari, Rachele and Wolf, Karoline , title =. Memory & Cognition , year =
-
[48]
Memory & cognition , volume=
Speakers' assumptions about the lexical flexibility of idioms , author=. Memory & cognition , volume=. 1989 , publisher=
1989
-
[49]
Debra A. Titone and Cynthia M. Connine , title =. Metaphor and Symbolic Activity , volume =. 1994 , publisher =. doi:10.1207/s15327868ms0904\_1 , URL =
-
[50]
Debra A. Titone and Cynthia M. Connine , keywords =. On the compositional and noncompositional nature of idiomatic expressions , journal =. 1999 , note =. doi:https://doi.org/10.1016/S0378-2166(99)00008-9 , url =
-
[51]
International conference on machine learning , pages=
Similarity of neural network representations revisited , author=. International conference on machine learning , pages=. 2019 , organization=
2019
-
[52]
Management Science , year=
On the Translocation of Masses , author=. Management Science , year=
-
[53]
Vaserstein, L. N. , title =. Problemy Peredachi Informatsii , year =
-
[54]
Libben, Maya R. and Titone, Debra A. , title =. Memory & Cognition , year =. doi:10.3758/MC.36.6.1103 , url =
-
[55]
Metaphor and Symbol , volume =
Cacciari, Cristina and Levorato, Maria Chiara , title =. Metaphor and Symbol , volume =. 1998 , publisher =. doi:10.1207/s15327868ms1303_1 , issn =
-
[56]
2012 , publisher=
Interpreting figurative meaning , author=. 2012 , publisher=
2012
-
[57]
Attention is All you Need , url =
Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, ukasz and Polosukhin, Illia , booktitle =. Attention is All you Need , url =
-
[58]
Journal of Logic, Language and Information , year =
Boon or burden? The role of compositional meaning in figurative language processing and acquisition , author =. Journal of Logic, Language and Information , year =. doi:10.1007/s10849-019-09282-7 , issn =
-
[59]
The comprehension of idioms , journal =
Cristina Cacciari and Patrizia Tabossi , abstract =. The comprehension of idioms , journal =. 1988 , issn =. doi:https://doi.org/10.1016/0749-596X(88)90014-9 , url =
-
[60]
From Input Perception to Predictive Insight: Modeling Model Blind Spots Before They Become Errors
Mi, Maggie and Villavicencio, Aline and Moosavi, Nafise Sadat. From Input Perception to Predictive Insight: Modeling Model Blind Spots Before They Become Errors. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025. doi:10.18653/v1/2025.emnlp-main.1740
-
[61]
and Baldwin, Timothy and Bond, Francis and Copestake, Ann A
Sag, Ivan A. and Baldwin, Timothy and Bond, Francis and Copestake, Ann A. and Flickinger, Dan , title =. Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing , pages =. 2002 , isbn =
2002
-
[62]
A Probabilistic E arley Parser as a Psycholinguistic Model
Hale, John. A Probabilistic E arley Parser as a Psycholinguistic Model. Second Meeting of the North A merican Chapter of the Association for Computational Linguistics. 2001
2001
-
[63]
Expectation-based syntactic comprehension , author =. Cognition , volume =. 2008 , month = mar, issn =. doi:10.1016/j.cognition.2007.05.006 , url =
-
[64]
Smith and Roger Levy , keywords =
Nathaniel J. Smith and Roger Levy , keywords =. The effect of word predictability on reading time is logarithmic , journal =. 2013 , issn =. doi:https://doi.org/10.1016/j.cognition.2013.02.013 , url =
-
[65]
A Pragmatic Analysis
Idioms in English , author=. A Pragmatic Analysis. Tubingen: Narr , year=
-
[66]
Language , volume=
Idioms , author=. Language , volume=. 1994 , publisher=
1994
-
[67]
Memory & cognition , volume=
The multidetermined nature of idiom processing , author=. Memory & cognition , volume=. 2008 , publisher=
2008
-
[68]
Language and speech , volume=
Representing idioms: Syntactic and contextual effects on idiom processing , author=. Language and speech , volume=. 2013 , publisher=
2013
-
[69]
Chang, Tyler A. and Bergen, Benjamin K. , title =. Transactions of the Association for Computational Linguistics , volume =. 2022 , month =. doi:10.1162/tacl_a_00444 , url =
-
[70]
ChatGPT: Optimizing Language Models for Dialogue , year =
-
[71]
2023 , eprint=
LLaMA: Open and Efficient Foundation Language Models , author=. 2023 , eprint=
2023
-
[72]
Psychophysiology , volume=
Predictability and decomposability separately contribute to compositional processing of idiomatic language , author=. Psychophysiology , volume=. 2023 , publisher=
2023
-
[73]
Cleland and Rebecca Bull , keywords =
Emily Nordmann and Alexandra A. Cleland and Rebecca Bull , keywords =. Familiarity breeds dissent: Reliability analyses for British-English idioms on measures of familiarity, meaning, literality, and decomposability , journal =. 2014 , note =. doi:https://doi.org/10.1016/j.actpsy.2014.03.009 , url =
-
[74]
A Distributional Perspective on Word Learning in Neural Language Models
Ficarra, Filippo and Cotterell, Ryan and Warstadt, Alex. A Distributional Perspective on Word Learning in Neural Language Models. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2025. doi:10.18653/v1/2025.naacl-long.558
-
[75]
Representation and parsing of multiword expressions: Current trends , volume=
Verbal multiword expressions: Idiomaticity and flexibility , author=. Representation and parsing of multiword expressions: Current trends , volume=. 2019 , publisher=
2019
-
[76]
Chapter 9 Understanding Idiomatic Expressions: The Contribution of Word Meanings , editor =
Cristina Cacciari and Sam Glucksberg , abstract =. Chapter 9 Understanding Idiomatic Expressions: The Contribution of Word Meanings , editor =. 1991 , booktitle =. doi:https://doi.org/10.1016/S0166-4115(08)61535-6 , url =
-
[77]
, title=
Partee, Barbara H. , title=. 1995 , publisher=
1995
-
[78]
Glossa: a journal of general linguistics , volume=
Decomposability and the syntactic flexibility of Hebrew idioms , author=. Glossa: a journal of general linguistics , volume=. 2025 , publisher=
2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.