
arxiv: 2605.10061 · v1 · pith:4TPHQWJ5new · submitted 2026-05-11 · 💻 cs.CL · cs.AI

Not-So-Strange Love: Language Models and Generative Linguistic Theories are More Compatible than They Appear

Pith reviewed 2026-05-12 02:10 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords language models, generative linguistics, usage-based theories, formal structures, linguistic compatibility, neural LMs, theory reconciliation

The pith

Language models can instantiate formal generative linguistic theories in addition to usage-based ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that the success of neural language models is compatible with theories based on formal structures from the generative tradition, not just gradient usage-based theories. This claim broadens the kinds of linguistic theories that can be investigated using language models. A sympathetic reader would care because it suggests that current debates in linguistics need not treat these models as evidence for only one theoretical camp. If the argument holds, it could facilitate reconciliations between usage-based and generative accounts of language.

Core claim

LMs can also instantiate theories based on formal structures - the types of theories seen in the generative tradition. This argument expands the space of theories that can be tested with LMs, potentially enabling reconciliations between usage-based and generative accounts.

What carries the argument

The capacity of language models to embody formal generative structures through their learned representations and behaviors.

If this is right

  • This expands the space of theories that can be tested with LMs.
  • It potentially enables reconciliations between usage-based and generative accounts.
  • LMs can serve as a testing ground for formal linguistic theories.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Researchers could look for specific formal properties in trained language models to support this view.
  • The approach might encourage more integrated models that draw from both theoretical traditions.
  • It implies that linguistic theory testing can be more inclusive of computational methods.

Load-bearing premise

The observed success and behavior of LMs can be interpreted as instantiating formal generative theories without additional evidence or specific mechanisms.

What would settle it

A study showing that language models exhibit none of the behaviors distinctively predicted by generative theories, beyond what usage-based models already predict.

Original abstract

Futrell and Mahowald (2025) frame the success of neural language models (LMs) as supporting gradient, usage-based linguistic theories. I argue that LMs can also instantiate theories based on formal structures - the types of theories seen in the generative tradition. This argument expands the space of theories that can be tested with LMs, potentially enabling reconciliations between usage-based and generative accounts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript argues that neural language models (LMs) can instantiate formal generative linguistic theories (e.g., those based on hierarchical structures and rules from the generative tradition), in addition to supporting the gradient, usage-based theories emphasized by Futrell and Mahowald (2025). This compatibility is said to expand the space of testable theories with LMs and potentially enable reconciliations between usage-based and generative accounts.

Significance. If developed with concrete mechanisms, this perspective could meaningfully broaden how LM behaviors are interpreted in linguistics, allowing formal generative hypotheses to be tested empirically via model performance and training dynamics. It would challenge the current framing of LM success as exclusively favoring usage-based theories and open integrative research avenues.

major comments (1)
  1. Abstract: The central claim that LMs 'can also instantiate theories based on formal structures' is asserted without any derivation, mapping from LM components (e.g., attention heads or embeddings) to generative theory elements (e.g., phrase structure or transformations), specific examples, or cited empirical results. This absence is load-bearing, as the manuscript supplies no evidence or mechanism to show instantiation, leaving the argument without grounding.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address the major comment below and indicate the revisions we will make to better ground the central claim.

Point-by-point responses
  1. Referee: [—] Abstract: The central claim that LMs 'can also instantiate theories based on formal structures' is asserted without any derivation, mapping from LM components (e.g., attention heads or embeddings) to generative theory elements (e.g., phrase structure or transformations), specific examples, or cited empirical results. This absence is load-bearing, as the manuscript supplies no evidence or mechanism to show instantiation, leaving the argument without grounding.

    Authors: We acknowledge that the abstract presents the claim at a high level of generality without explicit mappings, derivations, or concrete examples. The manuscript is a concise position paper whose core argument is that LMs are in principle capable of instantiating formal generative structures because their training objectives and architectures enable them to acquire hierarchical and rule-like representations of language. We agree this requires more explicit grounding to be persuasive. In revision we will expand the abstract with a brief clause indicating the basis for the claim (LMs' demonstrated capacity to encode syntactic hierarchies) and add a short section to the main text that sketches potential correspondences, such as how self-attention layers can implement operations akin to Merge or movement, while citing relevant probing studies that link LM internals to generative constructs.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper consists of a short conceptual argument in its abstract: it cites Futrell and Mahowald (2025) for the usage-based framing and then states that LMs can also instantiate formal generative theories, thereby expanding testable theory space. No equations, parameters, predictions, or derivations are present. The single external citation is not self-citation by the author and does not serve as a load-bearing premise that reduces to the paper's own inputs. The central claim is an interpretive assertion rather than a chain that collapses by definition or construction to its starting assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests primarily on the domain assumption that LMs are capable of embodying formal linguistic structures, which is asserted without independent evidence or examples in the abstract.

axioms (1)
  • domain assumption Neural language models can instantiate linguistic theories based on formal structures.
    This is the core premise invoked in the abstract to counter the usage-based framing.

pith-pipeline@v0.9.0 · 5325 in / 1133 out tokens · 53104 ms · 2026-05-12T02:10:48.890450+00:00 · methodology


Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages

  1. [1]

    Boleda, G. (2025). LLMs as a synthesis between symbolic and distributed approaches to language. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 9365–9379, Suzhou, China. Association for Computational Linguistics

  2. [2]

    Bybee, J. L. and Hopper, P. J. (2001). Frequency and the emergence of linguistic structure. John Benjamins Publishing Company

  3. [3]

    Chomsky, N. (1993). A minimalist program for linguistic theory. In The View from Building 20, pages 1–52. MIT Press

  4. [4]

    Futrell, R. and Mahowald, K. (2025). How linguistics learned to stop worrying and love the language models. Behavioral and Brain Sciences, pages 1–98

  5. [5]

    Kim, N., Schuster, S., and Toshniwal, S. (2024). Code pretraining improves entity tracking abilities of language models. arXiv preprint arXiv:2405.21068

  6. [6]

    Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. W.H. Freeman

  7. [7]

    McCoy, R. T., Grant, E., Smolensky, P., Griffiths, T. L., and Linzen, T. (2020). Universal linguistic inductive biases via meta-learning. Proceedings of the 42nd Annual Conference of the Cognitive Science Society, pages 737–743

  8. [8]

    McCoy, R. T. and Griffiths, T. L. (2025). Modeling rapid language learning by distilling Bayesian priors into artificial neural networks. Nature Communications, 16(1):4676

  9. [9]

    Prince, A. and Smolensky, P. (1993/2004). Optimality theory: Constraint interaction in generative grammar. Wiley

  10. [10]

    Smolensky, P. and Legendre, G. (2006). The Harmonic Mind: From Neural Computation to Optimality-Theoretic Grammar. MIT Press

  11. [11]

    Smolensky, P., McCoy, R., Fernandez, R., Goldrick, M., and Gao, J. (2022). Neurocompositional computing: From the central paradox of cognition to a new generation of AI systems. AI Magazine, 43(3):308–322

  12. [12]

    Yedetore, A., Linzen, T., Frank, R., and McCoy, R. T. (2023). How poor is the stimulus? Evaluating hierarchical generalization in neural networks trained on child-directed speech. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9370–9393, Toronto, Canada. Association for Computation...