pith.

arxiv: 2605.16758 · v1 · pith:ERUBIIKC · submitted 2026-05-16 · 💻 cs.CL

Language Acquisition Device in Large Language Models

Pith reviewed 2026-05-19 21:41 UTC · model grok-4.3

classification 💻 cs.CL
keywords: language acquisition device · pre-pretraining · synthetic languages · formal language · transformer models · syntactic structure · data efficiency

The pith

Pre-pretraining LLMs on MP-STRUCT achieves token efficiency on par with strong baselines while adding resistance to implausible languages.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes pre-pretraining large language models on MP-STRUCT, a synthetic formal language designed to encode the hierarchical composition, feature-based dependencies, and long-distance displacement that the Language Acquisition Device hypothesis attributes to innate human constraints. This pre-pretraining is meant to narrow the model's hypothesis space toward natural-language-like structures from the outset, with the aim of closing the data-efficiency gap between LLMs and humans. A 500-step exposure to MP-STRUCT matches the token efficiency of strong baselines such as k-Shuffle Dyck while also giving the model a human-like resistance to structurally implausible languages such as REVERSE. Simplified variants show that the core component of MP-STRUCT (MP-STRUCT CORE) outperforms k-Shuffle Dyck even though it is not definable in C-RASP, a formal bound on transformer expressivity; functional landmarks that reduce dependency-resolution ambiguity appear to be a key driver of this success.

Core claim

A brief 500-step PPT with MP-STRUCT matches strong formal-language baselines in token efficiency while additionally imparting a human-like resistance to structurally implausible languages such as REVERSE. MP-STRUCT CORE outperforms k-Shuffle Dyck despite not being definable in C-RASP, indicating that functional landmarks reducing dependency resolution ambiguity are a key driver and that effective PPT design depends on both expressivity and accessibility of dependency resolution.
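For readers unfamiliar with the baseline: k-Shuffle Dyck strings use k bracket types, each of which must be balanced on its own, while brackets of different types may interleave freely (so `(0 (1 )0 )1` is valid even though it is not well nested). The sketch below samples such strings. The token spelling (`(t` and `)t` for type t), the opening probability `p_open`, and the function name are illustrative assumptions, not the configuration used by the paper or its baselines.

```python
import random

def sample_k_shuffle_dyck(k: int, length: int, p_open: float = 0.5, seed: int = 0) -> list[str]:
    """Sample one k-Shuffle Dyck string of exactly `length` tokens.

    Each of the k bracket types is balanced independently; a closing bracket
    may close any currently open type, not only the most recently opened one.
    """
    if length % 2 != 0:
        raise ValueError("length must be even so that every bracket can be closed")
    rng = random.Random(seed)
    open_counts = [0] * k  # unmatched openers per bracket type
    tokens: list[str] = []
    for pos in range(length):
        remaining = length - pos
        total_open = sum(open_counts)
        must_close = total_open == remaining  # no room left to open anything new
        if total_open > 0 and (must_close or rng.random() >= p_open):
            # Close any open type at random: this is what makes it a *shuffle*.
            t = rng.choice([i for i, c in enumerate(open_counts) if c > 0])
            open_counts[t] -= 1
            tokens.append(f"){t}")
        else:
            t = rng.randrange(k)
            open_counts[t] += 1
            tokens.append(f"({t}")
    return tokens

print(" ".join(sample_k_shuffle_dyck(k=3, length=12, seed=1)))
```

Because the number of open brackets and the number of remaining slots always have the same parity, the sampler can open a new bracket whenever it is not forced to close, and every string it returns ends fully balanced.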

What carries the argument

MP-STRUCT, the formal language whose strings encode hierarchical composition, feature-based dependencies, and long-distance displacement via the MERGE, AGREE, and MOVE operations.
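The paper's exact MP-STRUCT grammar is not reproduced on this page, so the following is only a toy illustration of how the three operations could shape a string. Every choice here is hypothetical: `[` and `]` mark constituents built by MERGE, tokens such as `N.sg` and `V.sg` carry a number feature that AGREE copies from subject to verb, and MOVE fronts the object while leaving a trace token `t` in its base position.

```python
import random

FEATURES = ("sg", "pl")  # hypothetical agreement features

def merge(left: list[str], right: list[str]) -> list[str]:
    """Toy MERGE: combine two constituents into one bracketed constituent."""
    return ["[", *left, *right, "]"]

def agree(feature: str) -> tuple[list[str], list[str]]:
    """Toy AGREE: the verb (probe) carries the same feature as its subject (goal)."""
    return [f"N.{feature}"], [f"V.{feature}"]

def clause(rng: random.Random, depth: int, p_move: float = 0.3) -> list[str]:
    subj, verb = agree(rng.choice(FEATURES))
    if depth > 0 and rng.random() < 0.5:
        obj = clause(rng, depth - 1, p_move)  # recursion: an embedded clause as object
    else:
        obj = [f"N.{rng.choice(FEATURES)}"]
    if rng.random() < p_move:
        # Toy MOVE: displace the object to the clause edge, leaving a trace "t"
        return merge(obj, merge(subj, merge(verb, ["t"])))
    return merge(subj, merge(verb, obj))

def sample_toy_mp_struct(seed: int = 0, depth: int = 2) -> list[str]:
    """Hypothetical sampler; not the paper's MP-STRUCT generator."""
    return clause(random.Random(seed), depth)

print(" ".join(sample_toy_mp_struct(seed=3)))
```

In this toy, MOVE is what creates a long-distance dependency: a fronted embedded clause separates the trace `t` from the material it stands for. Nothing here models the paper's functional landmarks.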

If this is right

  • Short pre-pretraining on structured synthetic data can close part of the data-efficiency gap between LLMs and humans.
  • Beyond the efficiency gains, models acquire a resistance to structurally implausible languages.
  • PPT language design must prioritize accessible dependency resolution landmarks in addition to raw hierarchical expressivity.
  • C-RASP definability is not a necessary condition for an effective PPT language, contrary to the prior hypothesis that such languages must be both hierarchically expressive and C-RASP-definable.
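This page does not spell out how token efficiency is scored. One possible way to operationalize it, assumed here purely for illustration, is to ask how many natural-language training tokens a pre-pretrained model needs to reach the loss that the baseline reaches at the end of its run. The helper names, the linear interpolation between logged checkpoints, and the decision to ignore the tokens spent on PPT itself are all assumptions.

```python
from typing import Optional

def tokens_to_reach(tokens: list[float], losses: list[float], target: float) -> Optional[float]:
    """First (interpolated) token count at which the loss drops to `target` or below."""
    for i, (t, loss) in enumerate(zip(tokens, losses)):
        if loss <= target:
            if i == 0:
                return float(t)
            t_prev, loss_prev = tokens[i - 1], losses[i - 1]
            # loss_prev > target >= loss here, so the denominator is positive
            frac = (loss_prev - target) / (loss_prev - loss)
            return t_prev + frac * (t - t_prev)
    return None  # target never reached within the logged run

def token_savings(tokens: list[float], base_losses: list[float], ppt_losses: list[float]) -> Optional[float]:
    """Fraction of natural-language tokens the PPT run saves in reaching the
    baseline's final loss (an assumed definition; PPT tokens are not counted)."""
    target = base_losses[-1]
    t_base = tokens_to_reach(tokens, base_losses, target)
    t_ppt = tokens_to_reach(tokens, ppt_losses, target)
    if t_ppt is None or not t_base:
        return None
    return 1.0 - t_ppt / t_base

steps = [0, 1e8, 2e8, 3e8, 4e8]           # made-up token counts
baseline = [10.0, 6.0, 5.0, 4.5, 4.0]     # made-up losses
ppt = [10.0, 5.0, 4.5, 4.0, 3.8]          # made-up losses
print(token_savings(steps, baseline, ppt))  # 0.25 with these made-up numbers
```

Whether the PPT tokens should be charged against the savings is a design choice; including them would lower the reported gain slightly for a 500-step PPT run.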

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Pre-training curricula could incorporate more linguistically motivated synthetic data to encourage broader syntactic generalization.
  • The same approach might be tested on tasks requiring long-distance dependency resolution to measure transfer beyond the reported metrics.
  • Future variants could vary the density of functional landmarks to isolate their contribution to resistance against implausible languages.

Load-bearing premise

The structural properties built into MP-STRUCT successfully capture the innate constraints of the Language Acquisition Device hypothesis and transfer to improved natural-language behavior in LLMs.

What would settle it

A controlled test in which models pre-pretrained on MP-STRUCT still acquire the REVERSE language at rates comparable to baselines or show no efficiency gain on downstream natural-language tasks would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.16758 by Masato Mita, Ryo Yoshida, Taiga Someya, Yohei Oseki.

Figure 1: Comparison of C4 validation loss at 25,000 …
Figure 2: Robustness against semantic perturbation ($\Delta_{\mathrm{sens}} = L_{\mathrm{JW}} - L_{\mathrm{NL}}$). This metric quantifies the performance gap between semantic-free Jabberwocky inputs and natural language, where lower values indicate less reliance on lexical co-occurrence.
Figure 3: Structural selectivity ($\Delta_{\mathrm{sel}} = L_{\mathrm{Imp}} - L_{\mathrm{NL}}$) across three impossible language conditions. Positive values indicate a human-like preference for natural linguistic constraints over impossible distortions.
Figure 4: Comparison of C4 loss at 25,000 steps across …
Figure 5: The trajectory of the training loss.
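The two diagnostics in Figures 2 and 3 are simple loss differences: $\Delta_{\mathrm{sens}} = L_{\mathrm{JW}} - L_{\mathrm{NL}}$ compares the JW→JW and NL→NL conditions, and $\Delta_{\mathrm{sel}} = L_{\mathrm{Imp}} - L_{\mathrm{NL}}$ compares an impossible-language condition with natural language. The sketch below computes both as differences of per-example mean losses. The Jabberwocky transform (a tiny hard-coded function-word list, with every other word mapped to a consistent random nonce word) and the REVERSE transform (full reversal of each sentence's token order) are simplified assumptions, not the paper's exact data pipelines.

```python
import random
from statistics import mean

# Hypothetical, deliberately tiny function-word list; a real setup would use a POS tagger.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "that", "it", "was"}

def jabberwockify(sentence: str, seed: int = 0) -> str:
    """Keep function words; replace each distinct content word with a nonce word."""
    rng = random.Random(seed)
    nonce: dict[str, str] = {}
    out = []
    for word in sentence.split():
        if word.lower() in FUNCTION_WORDS:
            out.append(word)
            continue
        if word not in nonce:
            # Three consonant-vowel syllables, e.g. a six-letter pseudo-word
            nonce[word] = "".join(rng.choice("bdfgklmnprstvz") + rng.choice("aeiou") for _ in range(3))
        out.append(nonce[word])
    return " ".join(out)

def reverse_sentence(tokens: list[str]) -> list[str]:
    """Assumed REVERSE transform: reverse the token order of a sentence."""
    return tokens[::-1]

def delta(losses_a: list[float], losses_b: list[float]) -> float:
    """Mean loss difference: Delta_sens uses (JW, NL), Delta_sel uses (Imp, NL)."""
    return mean(losses_a) - mean(losses_b)

print(jabberwockify("the cat chased the dog"))
print(reverse_sentence("the cat chased the dog".split()))
```

Averaging per example rather than per token is a further assumption; with sentences of very different lengths the two choices can give noticeably different deltas.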
Abstract

Large Language Models (LLMs) remain substantially less data-efficient than humans. Pre-pretraining (PPT) on synthetic languages has been proposed to close this gap, with prior work emphasizing highly expressive formal languages such as $k$-Shuffle Dyck. Inspired by the Language Acquisition Device (LAD) hypothesis, which posits that innate constraints preemptively restrict the learner's hypothesis space to natural-language-like structure, we propose LAD-inspired PPT: pre-pretraining on MP-STRUCT, a formal language whose strings encode hierarchical composition, feature-based dependencies, and long-distance displacement via MERGE, AGREE, and MOVE. A brief 500-step PPT with MP-STRUCT matches strong formal-language baselines in token efficiency while additionally imparting a human-like resistance to structurally implausible languages (e.g., REVERSE). Analyzing simplified variants, we find that MP-STRUCT CORE outperforms $k$-Shuffle Dyck despite not being definable in C-RASP (a formal bound on transformer expressivity), challenging the prior hypothesis that effective PPT languages must be both hierarchically expressive and circuit-theoretically learnable. We show that functional landmarks, which reduce dependency resolution ambiguity, are a key driver, suggesting that effective PPT design depends not only on expressivity but also on the accessibility of dependency resolution.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes pre-pretraining LLMs on MP-STRUCT, a formal language whose strings encode hierarchical composition, feature-based dependencies, and long-distance displacement via MERGE, AGREE, and MOVE operations drawn from the Minimalist Program. It reports that a brief 500-step PPT on MP-STRUCT matches the token efficiency of strong baselines such as k-Shuffle Dyck on synthetic language learning while additionally conferring resistance to structurally implausible languages (e.g., REVERSE). Analysis of simplified variants identifies functional landmarks as a key driver of dependency resolution and shows that MP-STRUCT CORE outperforms k-Shuffle Dyck despite not being definable in C-RASP, thereby challenging the hypothesis that effective PPT languages must be both hierarchically expressive and circuit-theoretically learnable.

Significance. If the synthetic results hold and the design principles generalize, the work could meaningfully advance data-efficient LLM training by providing a linguistically motivated inductive bias. The explicit challenge to the necessity of C-RASP definability for PPT effectiveness is a substantive contribution to understanding transformer limitations, and the role of functional landmarks offers a concrete, testable principle for future PPT language design. Credit is due for focusing on resistance metrics in addition to efficiency and for grounding the proposal in specific Minimalist operations rather than generic hierarchy.

major comments (2)
  1. [Abstract and §4 (Experiments)] The central efficiency and resistance claims rest on synthetic learning curves, yet the manuscript provides no quantitative transfer results (e.g., perplexity on natural text or GLUE-style scores) comparing MP-STRUCT-pretrained models against the k-Shuffle Dyck baseline on actual natural language. This leaves the LAD-inspired interpretation (that MP-STRUCT successfully instantiates innate constraints and improves natural-language acquisition) an untested extrapolation.
  2. [§5 (Analysis of variants)] The claim that MP-STRUCT CORE outperforms k-Shuffle Dyck while not being C-RASP definable is load-bearing for the challenge to prior PPT hypotheses. The manuscript must supply the explicit argument or reduction showing why MP-STRUCT CORE lies outside C-RASP, together with controls confirming that the performance difference is not attributable to differences in training hyperparameters or landmark density.
minor comments (2)
  1. [§3 (MP-STRUCT definition)] The generation rules and feature inventory for MP-STRUCT and its CORE variant should be stated more formally (e.g., as a context-free or mildly context-sensitive grammar) to support reproducibility.
  2. [Figures 1–3] Figure captions and axis labels for learning curves should explicitly state the number of random seeds, error-bar computation method, and exact token budgets used for each condition.
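As a concrete version of the second minor comment, a caption-ready summary could report the number of seeds alongside the mean and its standard error. The sketch assumes one final loss value per seed and uses the sample standard deviation divided by √n; other choices, such as bootstrap confidence intervals, are equally defensible. The function name and the example values are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def summarize_over_seeds(final_losses: list[float]) -> dict:
    """Mean and standard error of the mean across random seeds."""
    n = len(final_losses)
    if n < 2:
        raise ValueError("need at least two seeds to estimate a standard error")
    return {
        "n_seeds": n,
        "mean": mean(final_losses),
        "sem": stdev(final_losses) / sqrt(n),  # sample std / sqrt(n)
    }

summary = summarize_over_seeds([3.41, 3.38, 3.44])  # made-up per-seed losses
print(f"{summary['mean']:.3f} ± {summary['sem']:.3f} (n={summary['n_seeds']} seeds)")
```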

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and insightful report. We provide point-by-point responses to the major comments below.

  1. Referee: [Abstract and §4 (Experiments)] The central efficiency and resistance claims rest on synthetic learning curves, yet the manuscript provides no quantitative transfer results (e.g., perplexity on natural text or GLUE-style scores) comparing MP-STRUCT-pretrained models against the k-Shuffle Dyck baseline on actual natural language. This leaves the LAD-inspired interpretation (that MP-STRUCT successfully instantiates innate constraints and improves natural-language acquisition) an untested extrapolation.

    Authors: The experiments in the manuscript are deliberately designed within a synthetic framework to allow for controlled measurement of efficiency and resistance to implausible structures, which directly tests the proposed inductive bias from the Minimalist Program operations. We do not claim direct improvements on natural language benchmarks in this work, as such transfer would require separate evaluation protocols and could be influenced by many factors beyond the PPT language. The LAD-inspired interpretation is supported by the observed resistance metrics, which mimic human preferences for plausible structures. We will revise the manuscript to explicitly state that natural language transfer remains an important direction for future research and to temper the interpretation accordingly. revision: partial

  2. Referee: [§5 (Analysis of variants)] The claim that MP-STRUCT CORE outperforms k-Shuffle Dyck while not being C-RASP definable is load-bearing for the challenge to prior PPT hypotheses. The manuscript must supply the explicit argument or reduction showing why MP-STRUCT CORE lies outside C-RASP, together with controls confirming that the performance difference is not attributable to differences in training hyperparameters or landmark density.

    Authors: We will add to §5 an explicit reduction or argument establishing that MP-STRUCT CORE is not C-RASP definable, leveraging the presence of feature checking and displacement operations that cannot be captured within the constant-depth circuit constraints of C-RASP. Furthermore, we will include additional experimental controls where training hyperparameters are matched across models and landmark density is normalized to isolate the contribution of the core structural features. revision: yes
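To make the definability question concrete: C-RASP programs are built from Boolean operations, prefix counts of the form #[j ≤ i] P(j), and comparisons between such counts. The sketch below mimics that style in plain Python (it is not a C-RASP compiler) to show why k-Shuffle Dyck is a natural fit: each bracket type can be checked with its own pair of running counts, with no need to track the order in which types were opened. It does not address MP-STRUCT CORE, whose non-definability is exactly what the referee asks the authors to argue. Tokens are spelled `(t` and `)t` for bracket type t, an illustrative assumption, and every token is assumed to be one of these 2k symbols.

```python
from itertools import accumulate

def prefix_count(flags) -> list[int]:
    """#[j <= i] P(j): how many positions up to i satisfy the predicate."""
    return list(accumulate(int(f) for f in flags))

def is_k_shuffle_dyck(tokens: list[str], k: int) -> bool:
    """Recognize k-Shuffle Dyck using only prefix counts and count comparisons."""
    if not tokens:
        return True  # the empty string is balanced
    for t in range(k):
        opens = prefix_count(tok == f"({t}" for tok in tokens)
        closes = prefix_count(tok == f"){t}" for tok in tokens)
        # Position-wise comparison: did type t ever close more than it opened?
        violations = prefix_count(c > o for o, c in zip(opens, closes))
        # Read off the final position: no violation so far, and type t is balanced.
        if violations[-1] > 0 or opens[-1] != closes[-1]:
            return False
    return True

print(is_k_shuffle_dyck(["(0", "(1", ")0", ")1"], k=2))  # True
```

A single-stack checker for ordinary, well-nested Dyck would reject that example, because `)0` closes a bracket that is not the most recently opened one.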

Circularity Check

0 steps flagged

No circularity in the LAD-inspired PPT derivation


Inspection of the abstract and described experiments reveals no self-definitional steps, fitted inputs presented as predictions, or load-bearing self-citations. The proposal of MP-STRUCT encodes specific linguistic properties via MERGE, AGREE, and MOVE, and the results on token efficiency and resistance to REVERSE are presented as empirical outcomes from pre-pretraining, benchmarked against external baselines like k-Shuffle Dyck. The analysis of variants and functional landmarks further supports independent content in the claims, without reduction to the inputs by construction. The derivation chain remains self-contained against the synthetic task benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the LAD hypothesis as a domain assumption and introduces MP-STRUCT as a new entity without independent evidence outside the paper's experiments.

axioms (1)
  • domain assumption: The Language Acquisition Device hypothesis posits that innate constraints restrict the learner's hypothesis space to natural-language-like structures.
    Invoked to motivate the design of MP-STRUCT as encoding hierarchical composition, feature-based dependencies, and long-distance displacement.
invented entities (1)
  • MP-STRUCT (no independent evidence)
    purpose: Formal language whose strings encode hierarchical composition, feature-based dependencies, and long-distance displacement via MERGE, AGREE, and MOVE.
    Newly constructed for this work to test LAD-inspired pre-pretraining.

pith-pipeline@v0.9.0 · 5758 in / 1380 out tokens · 44444 ms · 2026-05-19T21:41:19.236203+00:00


