Not-So-Strange Love: Language Models and Generative Linguistic Theories are More Compatible than They Appear

R. Thomas McCoy

arxiv: 2605.10061 · v1 · pith:4TPHQWJ5new · submitted 2026-05-11 · 💻 cs.CL · cs.AI

Pith reviewed 2026-05-12 02:10 UTC · model grok-4.3

keywords language models · generative linguistics · usage-based theories · formal structures · linguistic compatibility · neural LMs · theory reconciliation

The pith

Language models can instantiate formal generative linguistic theories in addition to usage-based ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that the success of neural language models is compatible with theories based on formal structures from the generative tradition, not just gradient usage-based theories. This claim broadens the kinds of linguistic theories that can be investigated using language models. A sympathetic reader would care because it suggests that current debates in linguistics need not treat these models as evidence for only one theoretical camp. If the argument holds, it could facilitate reconciliations between usage-based and generative accounts of language.

Core claim

LMs can also instantiate theories based on formal structures - the types of theories seen in the generative tradition. This argument expands the space of theories that can be tested with LMs, potentially enabling reconciliations between usage-based and generative accounts.

What carries the argument

The capacity of language models to embody formal generative structures through their learned representations and behaviors.

If this is right

This expands the space of theories that can be tested with LMs.
It potentially enables reconciliations between usage-based and generative accounts.
LMs can serve as a testing ground for formal linguistic theories.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Researchers could look for specific formal properties in trained language models to support this view.
The approach might encourage more integrated models that draw from both theoretical traditions.
It implies that linguistic theory testing can be more inclusive of computational methods.

Load-bearing premise

The observed success and behavior of LMs can be interpreted as instantiating formal generative theories without additional evidence or specific mechanisms.

What would settle it

A study showing that language models do not exhibit any distinctive predictions from generative theories that go beyond what usage-based models predict.

Original abstract

Futrell and Mahowald (2025) frame the success of neural language models (LMs) as supporting gradient, usage-based linguistic theories. I argue that LMs can also instantiate theories based on formal structures - the types of theories seen in the generative tradition. This argument expands the space of theories that can be tested with LMs, potentially enabling reconciliations between usage-based and generative accounts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This is a conceptual response claiming LMs can instantiate generative theories, but it lacks any supporting details or evidence.

This paper's key point is that language models aren't limited to supporting usage-based linguistic theories; they can also instantiate formal generative ones. By making this argument, it suggests a way to reconcile what have often been seen as opposing views in linguistics. The new element here is the direct response to Futrell and Mahowald's framing, proposing that LM success opens doors for testing generative accounts as well. It does a decent job of identifying this potential compatibility and noting how it could broaden the theories we can examine with these models.

Where it falls short is in the lack of any concrete support. The abstract states the position clearly but gives no examples of how an LM might embody formal structures, no mechanisms for instantiation, and no data or derivations. This leaves the central claim as an assertion rather than a demonstrated result. The absence of any falsifiable prediction or testable mapping makes it hard to see how this would actually work in practice. For example, what would distinguish an LM instantiating a generative theory from one that is purely statistical? The paper doesn't tackle this distinction. Since the full text isn't available, it's possible the paper expands on this, but based on what's here, the argument feels underdeveloped.

The paper is best suited for colleagues already engaged in debates about linguistic theory and neural models. It won't provide much for someone seeking empirical findings or detailed technical analysis. If the full version includes specific ways to map LM behaviors to generative principles, it could be valuable for discussion. I'd recommend putting it through peer review to see if the author can flesh out the ideas with evidence. The perspective is worth considering, even if the current presentation is light on substance.

Referee Report

1 major / 0 minor

Summary. The manuscript argues that neural language models (LMs) can instantiate formal generative linguistic theories (e.g., those based on hierarchical structures and rules from the generative tradition), in addition to supporting the gradient, usage-based theories emphasized by Futrell and Mahowald (2025). This compatibility is said to expand the space of testable theories with LMs and potentially enable reconciliations between usage-based and generative accounts.

Significance. If developed with concrete mechanisms, this perspective could meaningfully broaden how LM behaviors are interpreted in linguistics, allowing formal generative hypotheses to be tested empirically via model performance and training dynamics. It would challenge the current framing of LM success as exclusively favoring usage-based theories and open integrative research avenues.

major comments (1)

Abstract: The central claim that LMs 'can also instantiate theories based on formal structures' is asserted without any derivation, mapping from LM components (e.g., attention heads or embeddings) to generative theory elements (e.g., phrase structure or transformations), specific examples, or cited empirical results. This absence is load-bearing, as the manuscript supplies no evidence or mechanism to show instantiation, leaving the argument without grounding.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address the major comment below and indicate the revisions we will make to better ground the central claim.

Referee: Abstract: The central claim that LMs 'can also instantiate theories based on formal structures' is asserted without any derivation, mapping from LM components (e.g., attention heads or embeddings) to generative theory elements (e.g., phrase structure or transformations), specific examples, or cited empirical results. This absence is load-bearing, as the manuscript supplies no evidence or mechanism to show instantiation, leaving the argument without grounding.

Authors: We acknowledge that the abstract presents the claim at a high level of generality without explicit mappings, derivations, or concrete examples. The manuscript is a concise position paper whose core argument is that LMs are in principle capable of instantiating formal generative structures because their training objectives and architectures enable them to acquire hierarchical and rule-like representations of language. We agree this requires more explicit grounding to be persuasive. In revision we will expand the abstract with a brief clause indicating the basis for the claim (LMs' demonstrated capacity to encode syntactic hierarchies) and add a short section to the main text that sketches potential correspondences, such as how self-attention layers can implement operations akin to Merge or movement, while citing relevant probing studies that link LM internals to generative constructs. revision: yes
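For readers less familiar with the generative side of the proposed correspondence, the sketch below illustrates what an operation "akin to Merge" does in its simplest textbook form: it combines two syntactic objects into a new one, and repeated application yields hierarchical structure. This is an illustration only, not a mechanism taken from the paper or the rebuttal. The names `Node`, `merge`, `bracket`, and `depth` are hypothetical, and the use of ordered pairs (rather than the unordered sets of standard Minimalist formulations) is a simplifying assumption made for readable output.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Node:
    """A binary-branching syntactic object (hypothetical helper type)."""
    left: "Tree"
    right: "Tree"


# A tree is either a lexical item (a plain string) or a merged Node.
Tree = Union[str, Node]


def merge(a: Tree, b: Tree) -> Node:
    """Combine two syntactic objects into one new object.

    Assumption: an ordered pair stands in for the unordered set {a, b}.
    """
    return Node(a, b)


def bracket(t: Tree) -> str:
    """Render a tree as a labelled-bracket string without category labels."""
    if isinstance(t, str):
        return t
    return f"[{bracket(t.left)} {bracket(t.right)}]"


def depth(t: Tree) -> int:
    """Number of Merge steps on the longest path from root to a leaf."""
    if isinstance(t, str):
        return 0
    return 1 + max(depth(t.left), depth(t.right))


# Build "she read the book" bottom-up by repeated Merge.
noun_phrase = merge("the", "book")
verb_phrase = merge("read", noun_phrase)
clause = merge("she", verb_phrase)

print(bracket(clause))
print(depth(clause))
```

A grounded version of the rebuttal's claim would need to show that some component of a trained LM computes something that behaves like `merge` on its internal representations; the sketch only fixes what the symbolic target of such a mapping would look like.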

Circularity Check

0 steps flagged

No significant circularity

The paper consists of a short conceptual argument in its abstract: it cites Futrell and Mahowald (2025) for the usage-based framing and then states that LMs can also instantiate formal generative theories, thereby expanding testable theory space. No equations, parameters, predictions, or derivations are present. The single external citation is not self-citation by the author and does not serve as a load-bearing premise that reduces to the paper's own inputs. The central claim is an interpretive assertion rather than a chain that collapses by definition or construction to its starting assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests primarily on the domain assumption that LMs are capable of embodying formal linguistic structures, which is asserted without independent evidence or examples in the abstract.

axioms (1)

domain assumption: Neural language models can instantiate linguistic theories based on formal structures.
This is the core premise invoked in the abstract to counter the usage-based framing.

pith-pipeline@v0.9.0 · 5325 in / 1133 out tokens · 53104 ms · 2026-05-12T02:10:48.890450+00:00 · methodology

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages

[1] Boleda, G. (2025). LLMs as a synthesis between symbolic and distributed approaches to language. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 9365–9379, Suzhou, China. Association for Computational Linguistics.

[2] Bybee, J. L. and Hopper, P. J. (2001). Frequency and the emergence of linguistic structure. John Benjamins Publishing Company.

[3] Chomsky, N. (1993). A minimalist program for linguistic theory. In The View from Building 20, pages 1–52. MIT Press.

[4] Futrell, R. and Mahowald, K. (2025). How linguistics learned to stop worrying and love the language models. Behavioral and Brain Sciences, pages 1–98.

[5] Kim, N., Schuster, S., and Toshniwal, S. (2024). Code pretraining improves entity tracking abilities of language models. arXiv preprint arXiv:2405.21068.

[6] Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. W.H. Freeman.

[7] McCoy, R. T., Grant, E., Smolensky, P., Griffiths, T. L., and Linzen, T. (2020). Universal linguistic inductive biases via meta-learning. Proceedings of the 42nd Annual Conference of the Cognitive Science Society, pages 737–743.

[8] McCoy, R. T. and Griffiths, T. L. (2025). Modeling rapid language learning by distilling Bayesian priors into artificial neural networks. Nature Communications, 16(1):4676.

[9] Prince, A. and Smolensky, P. (1993/2004). Optimality theory: Constraint interaction in generative grammar. Wiley.

[10] Smolensky, P. and Legendre, G. (2006). The Harmonic Mind: From Neural Computation to Optimality-Theoretic Grammar. MIT Press.

[11] Smolensky, P., McCoy, R., Fernandez, R., Goldrick, M., and Gao, J. (2022). Neurocompositional computing: From the central paradox of cognition to a new generation of AI systems. AI Magazine, 43(3):308–322.

[12] Yedetore, A., Linzen, T., Frank, R., and McCoy, R. T. (2023). How poor is the stimulus? Evaluating hierarchical generalization in neural networks trained on child-directed speech. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9370–9393, Toronto, Canada. Association for Computation...
