
arxiv: 2606.27460 · v1 · pith:52VAUEFQnew · submitted 2026-06-25 · 💻 cs.CL

Developmental approach reveals the statistical learning of Neural Language Models: Transformers generalize from the most abstract statistical patterns

Pith reviewed 2026-06-29 02:11 UTC · model grok-4.3

classification 💻 cs.CL
keywords neural language models · statistical learning · transformers · synthetic grammar · over-generalizations · developmental approach · language cognition

The pith

Neural language models acquire the most abstract global statistical knowledge first, then the local dependencies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper trains successive generative transformer models on a synthetic grammar and saves internal states at multiple points during training. Tracking how representations change shows that the broadest abstract statistical patterns appear earliest while narrower local dependencies appear later. The early phase includes many over-generalizations that later become more constrained. The authors present this sequence as the basis for a new framework describing statistical learning in neural language models.
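The summary does not say how the saved states were spaced over training. As a minimal sketch, assuming log-spaced save points (an assumption, not the authors' documented schedule), the bookkeeping might look like the following. The names `checkpoint_steps` and `train_with_snapshots` are hypothetical, and the "model" is a stand-in dictionary rather than a transformer.

```python
import copy
import math


def checkpoint_steps(total_steps, n_checkpoints):
    """Return sorted, de-duplicated log-spaced steps ending at total_steps.

    Log spacing is an assumption: it samples early training densely,
    which is where the summary says the broadest patterns appear.
    """
    if total_steps < 1 or n_checkpoints < 1:
        raise ValueError("total_steps and n_checkpoints must be positive")
    steps = set()
    for i in range(n_checkpoints):
        # frac runs from 0 to 1; exponentiating gives log spacing.
        frac = i / (n_checkpoints - 1) if n_checkpoints > 1 else 1.0
        steps.add(max(1, round(math.exp(frac * math.log(total_steps)))))
    steps.add(total_steps)
    return sorted(steps)


def train_with_snapshots(update, state, total_steps, save_at):
    """Call update(state, step) for every step and deep-copy the state
    whenever step is in save_at. Returns {step: snapshot}."""
    save_at = set(save_at)
    snapshots = {}
    for step in range(1, total_steps + 1):
        update(state, step)
        if step in save_at:
            # Deep copy so later updates cannot alter an earlier snapshot.
            snapshots[step] = copy.deepcopy(state)
    return snapshots


def toy_update(state, step):
    """Toy stand-in for one training step: it only counts updates."""
    state["updates"] = state.get("updates", 0) + 1


steps = checkpoint_steps(total_steps=1000, n_checkpoints=4)
snaps = train_with_snapshots(toy_update, {}, 1000, steps)
```

Each snapshot is an independent copy, so any later analysis can compare states across steps without worrying that training has mutated them in place.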

Core claim

Using a developmental approach that analyzes model states saved during training on a synthetic grammar, the paper claims that the models acquire the most abstract global statistical knowledge at the beginning of learning and only later acquire the relatively local statistical dependencies. This learning path contains many over-generalizations from the very beginning, and these are gradually constrained in the later stages of learning.

What carries the argument

Developmental tracking of internal representations saved at successive stages of training on a synthetic grammar.

If this is right

  • Over-generalizations occur from the earliest stages and become constrained later.
  • Abstract global statistical knowledge is acquired before relatively local dependencies.
  • The developmental sequence itself constitutes the statistical learning process of the models.
  • A new framework for statistical learning and language cognition in NLMs follows directly from the observed trajectory.
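One way to make "over-generalization" measurable is to split a checkpoint's next-token distribution into mass on tokens the grammar allows in context, mass on tokens of the right category that are not allowed in context, and everything else. The paper's Figure 7 caption mentions a probability mass analysis, but its exact definition is not given here, so the sketch below is an assumption. The token names, the two distributions, and the function `probability_mass` are invented for illustration.

```python
def probability_mass(next_token_probs, locally_licensed, globally_licensed):
    """Split a next-token distribution into three bins.

    locally_licensed : tokens the grammar actually allows in this context.
    globally_licensed: tokens of the right category overall; must be a
                       superset of locally_licensed.
    Returns (correct, over_generalized, other) probability mass.
    """
    locally_licensed = set(locally_licensed)
    globally_licensed = set(globally_licensed)
    if not locally_licensed <= globally_licensed:
        raise ValueError("locally_licensed must be a subset of globally_licensed")
    correct = over = other = 0.0
    for token, p in next_token_probs.items():
        if token in locally_licensed:
            correct += p
        elif token in globally_licensed:
            over += p  # right category, wrong sub-class: over-generalization
        else:
            other += p
    return correct, over, other


# Hypothetical distributions at two checkpoints for the same prompt.
early = {"q1": 0.25, "q2": 0.25, "q3": 0.25, "q4": 0.125, "x": 0.125}
late = {"q1": 0.5, "q2": 0.25, "q3": 0.125, "q4": 0.0625, "x": 0.0625}
local, global_ = {"q1", "q2"}, {"q1", "q2", "q3", "q4"}

early_mass = probability_mass(early, local, global_)
late_mass = probability_mass(late, local, global_)
```

In this made-up example the over-generalized mass falls from 0.375 to 0.1875 between the two checkpoints, which is the shape of trajectory the bullets describe. The numbers are illustrative only and are not results from the paper.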

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the same early-to-late ordering appears when models are trained on natural-language corpora, the framework would extend beyond synthetic data.
  • Intervening in training to supply local patterns earlier might alter the rate or final accuracy of acquiring abstract patterns.
  • Comparing the saved states against human developmental data on similar artificial grammars could test whether the model path matches any observed child learning order.

Load-bearing premise

The synthetic grammar's statistical structure is representative of the patterns that matter for natural language, and the observed changes in model representations reflect genuine statistical learning rather than training artifacts or analysis choices.

What would settle it

Re-training the models on a synthetic grammar whose structure reverses the abstract-to-local order, and finding that local dependencies are then acquired first, would falsify the claimed learning path.

Figures

Figures reproduced from arXiv: 2606.27460 by Elizabeth Wonnacott, Holly Jenkins, Wang Bojun.

Figure 1: Inheritance hierarchy for English transitive schema.
Excerpt: "… schema specifies that the constituent before the word kick is the kicker and the constituent after it is the kicked. After acquiring multiple frequent and structurally similar lexical transitive schema, higher-order generalizations could be formed by tracking the structural alignment among the lexical schema. This relatively abstract grammatical schem…"

Figure 2: Inheritance hierarchy in artificial language

Figure 3: Dependency relation in the inheritance hierarchy.
Excerpt: "… design a synthetic grammar with dependency relations nested in hierarchical manner. The grammar is designed to contain three levels of statistical regularity: global level, middle level, and local level. This allows us to examine the learning path in a nested inheritance hierarchy. The inheritance hierarchy is illustrated in figure 2. The dependency relat…"

Figure 4: critical stages from motion chart visualization.
Excerpt: "The visualization demonstrates a gradience of statistical learning. The dependency schema on the global level is learned at the very early stage of development while the dependency on the local level is learned in the end. That is, the model acquired the most abstract global statistical knowledge at the beginning and later gradually acquire relatively local…"

Figure 6: Example probability ranking at early stage of learning (15,000 iteration) given prompt M -N 1-P11
Excerpt: "3.3 Permuted order of MNPQ language. To demonstrate that the pattern we observed is not related to the linear order in this synthetic grammar, we created six different synthetic grammar dataset with different order of MNPQ categories. Q category remains at the final position, while the positions of the remai…"

Figure 7: Probability mass analysis for models trained on different synthetic grammar.
Excerpt: "4 Discussion. The current investigation provides evidence that the statistical learning of NLMs is not a random process. There are systematic patterns in the path of generalizations. These models generalize from the most global distributional dependency in the input and later acquire the relatively local dependency schema. This i…"
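The figure excerpts mention three nested levels of regularity (global, middle, local) and six datasets with permuted category order in which Q stays last. As a rough sketch of what such a grammar could look like, the Python below samples sentences from a toy hierarchy and enumerates the permuted orders. The lexicon, the dependency tables `P_BY_N` and `Q_BY_P`, and the function names are all hypothetical. The reading that M, N and P are the permuted categories is an inference from "six" orders with Q fixed, not something the truncated excerpt states.

```python
import itertools
import random

# Hypothetical toy lexicon; the paper's actual vocabulary is not given here.
N_TOKENS = ["N1", "N2"]
# Middle level (assumed): the N token restricts which P tokens may follow.
P_BY_N = {"N1": ["P11", "P12"], "N2": ["P21", "P22"]}
# Local level (assumed): each P token fixes a single Q token.
Q_BY_P = {"P11": "Q11", "P12": "Q12", "P21": "Q21", "P22": "Q22"}


def sample_sentence(rng, order=("M", "N", "P", "Q")):
    """Sample one sentence with one token per category (global level),
    P restricted by N (middle level) and Q fixed by P (local level)."""
    n = rng.choice(N_TOKENS)
    p = rng.choice(P_BY_N[n])
    tokens = {"M": "M", "N": n, "P": p, "Q": Q_BY_P[p]}
    # Emit the tokens in the requested category order.
    return [tokens[cat] for cat in order]


def permuted_orders():
    """All orders of M, N, P with Q kept in final position (3! = 6)."""
    return [perm + ("Q",) for perm in itertools.permutations(("M", "N", "P"))]


rng = random.Random(0)
orders = permuted_orders()
corpus = {order: [sample_sentence(rng, order) for _ in range(5)] for order in orders}
```

Because the dependencies are stored by category rather than by position, the same tables generate every permuted dataset, which keeps the statistical structure constant while the linear order varies.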
Original abstract

In this study, we use a developmental approach to investigate the statistical learning and mental representation of neural language models (NLM). A series of Generative Transformer models are trained on a synthetic grammar. The model states are saved at multiple stages in the course of training. Through analyzing how the internal representations of these models change in the developmental path, we found that NLMs acquire the most abstract global statistical knowledge at the beginning of learning and later acquire the relatively local statistical dependencies. This learning path contains many over-generalizations from the very beginning and these over-generalizations are gradually constrained in the later stage of learning. Based on this observation, we propose a new framework to explain the statistical learning and language cognition of NLMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript trains a series of Generative Transformer models on a synthetic grammar, saves internal states at multiple training stages, and analyzes representational changes to argue that NLMs first acquire the most abstract global statistical patterns and only later acquire local dependencies, with early over-generalizations that are gradually constrained; a new explanatory framework for NLM statistical learning is proposed on this basis.

Significance. If the developmental trajectory is shown to be robust and the synthetic grammar's statistics are representative of natural language, the work could contribute to understanding generalization dynamics in transformers and offer parallels to human language acquisition studies. The use of checkpointed model states across training is a methodological strength that supports tracking of representational shifts.

major comments (2)
  1. [Methods / synthetic grammar definition] The central claim that the observed global-to-local trajectory is a general property of NLM statistical learning rests on a single synthetic grammar (described in the methods); without evidence that its generative rules reproduce nested long-range dependencies or hierarchical structure typical of natural language, the trajectory may be an artifact of the grammar's inductive bias rather than a general finding.
  2. [Framework section (post-results)] The proposed framework (outlined after the empirical results) is constructed directly from the training-path observations on this grammar; it is unclear whether the framework supplies independent, falsifiable predictions or simply restates quantities already fitted to the same developmental data, creating a circularity risk for the explanatory claim.
minor comments (2)
  1. [Results / analysis] Add quantitative metrics, error bars, and statistical tests for the reported shifts in global vs. local knowledge and for the over-generalization counts across checkpoints.
  2. [Analysis methods] Clarify the precise operational definitions used to label a statistical pattern as 'abstract global' versus 'local' and how these labels are computed from model representations.
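For the first minor comment, a percentile bootstrap over independently seeded runs is one common way to attach error bars to a per-checkpoint metric. The sketch below assumes a hypothetical table of over-generalization mass per seed at each checkpoint. The values are invented for illustration, the second checkpoint step is arbitrary, and `bootstrap_ci` is not taken from the paper.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of values.

    Returns (mean, lower, upper).
    """
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return statistics.fmean(values), lower, upper


# Invented over-generalization mass for five seeds at two checkpoints.
runs = {
    15000: [0.41, 0.38, 0.44, 0.40, 0.39],
    60000: [0.12, 0.15, 0.10, 0.13, 0.11],
}
summary = {step: bootstrap_ci(vals) for step, vals in runs.items()}
```

With only a handful of seeds the interval is coarse, so a report would probably want to state the number of runs alongside it. A paired test across seeds at matched checkpoints would be a natural companion, though that choice is left open here.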

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and note planned revisions to strengthen the manuscript.

  1. Referee: [Methods / synthetic grammar definition] The central claim that the observed global-to-local trajectory is a general property of NLM statistical learning rests on a single synthetic grammar (described in the methods); without evidence that its generative rules reproduce nested long-range dependencies or hierarchical structure typical of natural language, the trajectory may be an artifact of the grammar's inductive bias rather than a general finding.

    Authors: We appreciate the referee's point on the scope of generalization. The synthetic grammar was constructed with generative rules intended to produce both abstract global patterns and nested long-range dependencies (detailed in Methods). That said, we agree that evidence from only one grammar leaves open the possibility that the observed trajectory reflects grammar-specific biases. In revision we will expand the Methods with explicit examples and statistics showing how the grammar encodes hierarchical structure and long-range dependencies, and we will add a Limitations subsection in the Discussion that acknowledges the single-grammar design and calls for future multi-grammar validation. These changes will make the evidential basis and its boundaries clearer without overstating generality. revision: partial

  2. Referee: [Framework section (post-results)] The proposed framework (outlined after the empirical results) is constructed directly from the training-path observations on this grammar; it is unclear whether the framework supplies independent, falsifiable predictions or simply restates quantities already fitted to the same developmental data, creating a circularity risk for the explanatory claim.

    Authors: The framework is indeed derived from the developmental observations on this grammar and functions primarily as an organizing explanation of those data. We accept that, as currently presented, it risks appearing post-hoc. In the revised manuscript we will rewrite the Framework section to (a) explicitly state its empirical grounding, (b) separate descriptive summaries of the observed trajectory from the framework's interpretive claims, and (c) articulate concrete, testable predictions (e.g., how altering the ratio of global versus local statistics or changing model depth should shift the timing of over-generalization resolution). These additions will reduce circularity and position the framework as a source of future predictions rather than a restatement of the present results. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical observations on synthetic grammar do not reduce to fitted inputs by construction

full rationale

The paper trains Generative Transformer models on a synthetic grammar, saves intermediate states, and reports observed changes in internal representations (abstract global statistics acquired first, followed by local dependencies and gradual constraint of over-generalizations). The proposed framework is explicitly based on this empirical observation rather than any mathematical derivation, uniqueness theorem, or parameter fit that is then relabeled as a prediction. No equations, self-citations, or ansatzes are invoked in the provided text that would create a self-definitional loop or force a result by construction. The work is self-contained as a developmental empirical study; external validity concerns (e.g., grammar representativeness) fall outside circularity analysis.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are described.


