Morphological Irregularity Correlates with Frequency
Pith reviewed 2026-05-25 15:15 UTC · model grok-4.3
The pith
Analyses of 28 languages show higher-frequency words are more likely to be morphologically irregular.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that morphological irregularity correlates with frequency: higher-frequency items are more likely to be irregular, and irregular items are more likely to be highly frequent. This holds across the 28 languages examined and is more robust when forms are grouped into whole paradigms. To the authors' knowledge, it is the first result of this breadth to confirm longstanding proposals from the linguistics literature.
What carries the argument
An information-theoretic measure of irregularity based on the predictability of forms, estimated by a neural transduction model.
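As a rough illustration of what a predictability-based score can look like, the sketch below computes the surprisal (negative log probability) of a form from per-character probabilities. This is an assumption made for illustration: the abstract says only that the measure is based on the predictability of forms, so the paper's exact formula may differ. The function name and the probability values are hypothetical and do not come from the paper or its code.

```python
import math


def surprisal_bits(char_probs):
    """Return -log2 p(form), where p(form) is the product of the
    per-character probabilities assigned by some sequence model.

    Assumption: surprisal stands in for "irregularity" here; the paper's
    own definition may be a different function of the same probability.
    """
    if not char_probs:
        raise ValueError("need at least one character probability")
    if any(p <= 0.0 or p > 1.0 for p in char_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    # Summing logs avoids underflow from multiplying many small numbers.
    return -sum(math.log2(p) for p in char_probs)


# Hypothetical model outputs, invented for this example.
regular_probs = [0.99, 0.98, 0.99, 0.97, 0.99, 0.98]  # a predictable form
irregular_probs = [0.60, 0.05, 0.10, 0.40]            # an unpredictable form

if __name__ == "__main__":
    print("regular:", surprisal_bits(regular_probs))
    print("irregular:", surprisal_bits(irregular_probs))
```

Under this reading, a form the model finds hard to predict receives a higher score, which is the sense in which the score tracks irregularity.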
Load-bearing premise
The neural model's estimate of irregularity matches the linguistic notion of irregularity instead of reflecting model-specific artifacts.
What would settle it
A new cross-linguistic dataset in which linguist-assigned irregularity ratings show no correlation with the model's scores, or in which frequency and irregularity are uncorrelated.
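One way such a test could be run is with a rank correlation between linguist-assigned ratings and model scores, since rank correlation does not assume the two scales are linearly related. The sketch below implements Spearman's coefficient with the standard library only. Choosing Spearman is an assumption made here, not a method taken from the paper, and the inputs in the example are invented.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Invented data: linguist ratings vs. model scores for five forms.
    ratings = [1, 1, 2, 3, 3]
    scores = [0.2, 0.4, 1.1, 2.5, 2.0]
    print(spearman(ratings, scores))
```

A coefficient near zero on a real dataset of this kind would count against the premise that the model's scores track the linguistic notion.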
Original abstract
We present a study of morphological irregularity. Following recent work, we define an information-theoretic measure of irregularity based on the predictability of forms in a language. Using a neural transduction model, we estimate this quantity for the forms in 28 languages. We first present several validatory and exploratory analyses of irregularity. We then show that our analyses provide evidence for a correlation between irregularity and frequency: higher frequency items are more likely to be irregular and irregular items are more likely be highly frequent. To our knowledge, this result is the first of its breadth and confirms longstanding proposals from the linguistics literature. The correlation is more robust when aggregated at the level of whole paradigms--providing support for models of linguistic structure in which inflected forms are unified by abstract underlying stems or lexemes. Code is available at https://github.com/shijie-wu/neural-transducer.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper defines an information-theoretic measure of morphological irregularity based on predictability from a neural transduction model, estimates it for forms across 28 languages, performs validatory and exploratory analyses, and reports a correlation with frequency: higher-frequency items tend to be more irregular and irregular items tend to be more frequent. The correlation strengthens when aggregated over paradigms, which the authors interpret as support for abstract stem/lexeme representations. Code is released.
Significance. If the central correlation survives controls for frequency-dependent artifacts in the neural model, the result would supply the broadest cross-linguistic empirical support yet for classic linguistic claims linking irregularity and frequency. The paradigm-level finding and the public code release are clear strengths that aid reproducibility and allow direct testing of the measure.
major comments (2)
- [Methods (neural transduction model and irregularity estimation)] Because the model is trained on the same frequency distribution later used for the correlation, any frequency-dependent optimization (better memorization or lower cross-entropy on high-count items) can directly affect the estimated irregularity score. Without frequency-balanced training, frequency-stratified held-out evaluation, or external validation against hand-labeled regular/irregular classes, the reported correlation risks being partly mechanical rather than linguistic.
- [Results (paradigm-level aggregation)] The claim that the correlation is 'more robust when aggregated at the level of whole paradigms' is central to the linguistic interpretation, yet the manuscript provides no explicit definition of how paradigms are constructed, no statistical comparison of form-level vs. paradigm-level effect sizes, and no error analysis showing that the improvement is not driven by a few high-frequency irregular paradigms.
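The frequency-stratified held-out evaluation requested in the first major comment could be approximated as follows. The sketch groups held-out items into order-of-magnitude frequency bands and reports the mean score per band, so that a trend driven purely by training exposure would be visible band by band. The banding scheme, the function name, and the sample values are assumptions made for illustration and are not taken from the manuscript.

```python
from collections import defaultdict


def mean_score_by_frequency_band(items):
    """Average a per-item score within order-of-magnitude frequency bands.

    items: iterable of (count, score) pairs, where count is a positive
    integer corpus count and score is any per-item number (for example a
    held-out loss or an irregularity estimate).
    Band 0 holds counts 1-9, band 1 holds 10-99, and so on.
    """
    bands = defaultdict(list)
    for count, score in items:
        if not isinstance(count, int) or count < 1:
            raise ValueError("count must be a positive integer")
        # The number of decimal digits minus one is floor(log10(count)),
        # computed exactly for integers.
        bands[len(str(count)) - 1].append(score)
    return {band: sum(s) / len(s) for band, s in sorted(bands.items())}


if __name__ == "__main__":
    # Invented held-out items: (corpus count, score).
    held_out = [(5, 1.0), (7, 3.0), (50, 2.0), (1200, 4.0)]
    for band, mean in mean_score_by_frequency_band(held_out).items():
        print(f"10^{band}: mean score {mean:.2f}")
```

Comparing these per-band means between a frequency-balanced and an unbalanced training run is one way to see how much of the trend depends on training exposure.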
minor comments (1)
- [Abstract] 'irregular items are more likely be highly frequent' is missing 'to' before 'be'.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below, providing the strongest honest defense of the manuscript while acknowledging where revisions are warranted.
- Referee: Methods (neural transduction model and irregularity estimation): because the model is trained on the same frequency distribution later used for the correlation, any frequency-dependent optimization (better memorization or lower cross-entropy on high-count items) can directly affect the estimated irregularity score. Without frequency-balanced training, frequency-stratified held-out evaluation, or external validation against hand-labeled regular/irregular classes, the reported correlation risks being partly mechanical rather than linguistic.
  Authors: We acknowledge the potential for frequency-dependent effects in model training. However, any such bias would improve predictability (reduce estimated irregularity) for high-frequency items, which works directly against the observed positive correlation between frequency and irregularity. The reported result is therefore conservative with respect to this artifact. The manuscript already includes validatory analyses comparing the measure against known morphological patterns in several languages; we will add an explicit discussion of this point in the revision.
  Revision: partial
- Referee: Results (paradigm-level aggregation): the claim that the correlation is 'more robust when aggregated at the level of whole paradigms' is central to the linguistic interpretation, yet the manuscript provides no explicit definition of how paradigms are constructed, no statistical comparison of form-level vs. paradigm-level effect sizes, and no error analysis showing that the improvement is not driven by a few high-frequency irregular paradigms.
  Authors: We agree these details should be clarified. In the revised manuscript we will (i) explicitly define paradigm construction (forms grouped by shared lemma), (ii) report a direct statistical comparison of effect sizes between form-level and paradigm-level analyses, and (iii) include a robustness check or error analysis confirming the stronger paradigm-level correlation is not driven by a small number of high-frequency paradigms. These additions will be straightforward to implement from the existing data and code.
  Revision: yes
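The paradigm construction and robustness check described in this response can be sketched as below. Forms are grouped by shared lemma, as the rebuttal states. Summing counts and averaging irregularity within a paradigm are assumptions made here, since the text does not say how the aggregation is done. The function names and sample data are hypothetical.

```python
from collections import defaultdict


def aggregate_by_lemma(forms):
    """Group inflected forms into paradigms by shared lemma.

    forms: iterable of (lemma, count, irregularity) triples.
    Returns {lemma: (total_count, mean_irregularity)}.
    Assumption: counts are summed and irregularity is averaged.
    """
    grouped = defaultdict(list)
    for lemma, count, score in forms:
        grouped[lemma].append((count, score))
    return {
        lemma: (
            sum(count for count, _ in rows),
            sum(score for _, score in rows) / len(rows),
        )
        for lemma, rows in grouped.items()
    }


def drop_most_frequent(paradigms, k):
    """Return a copy of paradigms without the k highest-count entries,
    for checking whether a few frequent paradigms drive the result."""
    if k < 0:
        raise ValueError("k must be non-negative")
    by_count = sorted(paradigms, key=lambda lemma: paradigms[lemma][0], reverse=True)
    return {lemma: paradigms[lemma] for lemma in by_count[k:]}


if __name__ == "__main__":
    # Invented forms: (lemma, corpus count, irregularity score).
    forms = [
        ("go", 100, 4.0), ("go", 50, 6.0),
        ("walk", 10, 1.0), ("walk", 5, 0.5),
    ]
    paradigms = aggregate_by_lemma(forms)
    print(paradigms)
    print(drop_most_frequent(paradigms, 1))
```

Recomputing the frequency-irregularity correlation on the output of `drop_most_frequent` for several values of `k`, and comparing it with the form-level figure, would address points (ii) and (iii).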
Circularity Check
No circularity: empirical correlation with externally estimated irregularity measure
The paper defines irregularity via an information-theoretic predictability measure estimated by a neural transduction model, then correlates the resulting scores with frequency counts across 28 languages. This chain does not reduce the reported correlation to a self-definition, a fitted parameter renamed as a prediction, or a load-bearing self-citation; the model supplies an independent proxy for predictability that is not constructed from the frequency variable itself. No uniqueness theorems, ansatzes smuggled via citation, or renamings of known results appear in the derivation. The study is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Neural transduction model output probabilities provide a faithful estimate of morphological predictability.