You Can't Fight in Here! This is BBS!

Kyle Mahowald; Richard Futrell

arxiv: 2604.09501 · v1 · submitted 2026-04-10 · 💻 cs.CL

You Can't Fight in Here! This is BBS!

Richard Futrell , Kyle Mahowald This is my paper

Pith reviewed 2026-05-10 16:37 UTC · model grok-4.3

classification 💻 cs.CL

keywords language modelshuman languagelinguisticscomputational modelsscientific insightsresearch programAI and language

0 comments

The pith

Language models can inform the science of human language once researchers reject the view of them as mere string statistics and the notion that current performance marks their limit.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that two misconceptions block the productive use of language models in linguistics: the String Statistics Strawman, which dismisses LMs because they learn statistical patterns from strings like earlier Markov models, and the As Good As it Gets Assumption, which treats today's models as the ceiling for what they can reveal about language. Through a staged conversation between a formal linguist and a computational scientist joined by 25 commentators from multiple fields, the authors show how LM work can generate genuine insights into human language. They advocate replacing these barriers with an expanded research program that integrates AI advances while addressing field-specific concerns. This matters because it promises a more unified and robust science that studies both human language and the models themselves.

Core claim

Language model research supplies scientific insights into human language when the community discards the String Statistics Strawman—the claim that LMs cannot be linguistically competent because they are statistical models trained on strings—and the As Good As it Gets Assumption—the claim that 2026-era LM performance already represents the maximum they can contribute to linguistics. By constructing a multi-field discussion, the paper identifies these positions as the central obstacles and calls for a broader research agenda in the language sciences that produces stronger results for both human language and the models.

What carries the argument

The constructed discussion among a formal linguist, a computational language scientist, and 25 commentators drawn from linguistics, neuroscience, cognitive science, psychology, philosophy, and computer science, which exposes and reframes the String Statistics Strawman and the As Good As it Gets Assumption as the main barriers.

If this is right

LMs become legitimate tools for generating and testing hypotheses about linguistic competence and processing.
Improvements in LM architectures and training will continue to deliver additional insights into human language.
The language sciences develop more robust methods by incorporating computational modeling while addressing concerns from multiple disciplines.
Research produces a joint science that studies human language and language models together rather than in isolation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The dialogue format itself could serve as a template for resolving similar interdisciplinary disputes in other areas of cognitive science.
Future work might define explicit criteria for what counts as a 'linguistic insight' from a model, making the advocated program more operational.
Extending the same rejection of strawman positions to other AI systems could improve collaboration between computer science and the cognitive sciences on shared questions of representation and learning.

Load-bearing premise

The two misconceptions identified—the String Statistics Strawman and the As Good As it Gets Assumption—accurately capture the primary obstacles that prevent language model work from contributing to the language sciences.

What would settle it

A concrete research program that explicitly rejects both assumptions and then either produces new, empirically testable generalizations about human language that traditional non-LM methods have not yielded, or fails to produce any such generalizations.

Figures

Figures reproduced from arXiv: 2604.09501 by Kyle Mahowald, Richard Futrell.

**Figure 1.** Figure 1: Two common fallacies about LMs and linguistics. In fact, we have no interest in eating Norm. We are serious that LMs don’t replace linguistic theories. We believe that linguistic theory, as instantiated across a variety of formalisms, contains the best-known characterization of human linguistic competence. If you want an effective theory of how people judge the grammaticality of new sentences, how much hum… view at source ↗

read the original abstract

Norm, the formal theoretical linguist, and Claudette, the computational language scientist, have a lovely time discussing whether modern language models can inform important questions in the language sciences. Just as they are about to part ways until they meet again, 25 of their closest friends show up -- from linguistics, neuroscience, cognitive science, psychology, philosophy, and computer science. We use this discussion to highlight what we see as some common underlying issues: the String Statistics Strawman (the mistaken idea that LMs can't be linguistically competent or interesting because they, like their Markov model predecessors, are statistical models that learn from strings) and the As Good As it Gets Assumption (the idea that LM research as it stands in 2026 is the limit of what it can tell us about linguistics). We clarify the role of LM-based work for scientific insights into human language and advocate for a more expansive research program for the language sciences in the AI age, one that takes on the commentators' concerns in order to produce a better and more robust science of both human language and of LMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Futrell and Mahowald use a dialogue format to name and push back on two common objections to LM work in linguistics, making a case for tighter integration without new data.

read the letter

The core takeaway is that this is a position piece that tries to reset the conversation between formal linguistics and computational work on language models. The authors construct a back-and-forth between two characters and then bring in 25 commentators to target the String Statistics Strawman and the As Good As it Gets Assumption. That framing is the main thing that feels fresh: it turns recurring complaints into named targets and shows how they might be answered from different fields.

Referee Report

0 major / 2 minor

Summary. The manuscript constructs a dialogue between Norm (a formal theoretical linguist) and Claudette (a computational language scientist) on whether modern language models can inform questions in the language sciences. This exchange is joined by 25 commentators drawn from linguistics, neuroscience, cognitive science, psychology, philosophy, and computer science. The authors use the resulting discussion to identify and rebut two misconceptions—the String Statistics Strawman (that LMs are merely statistical string models like their Markov predecessors and thus cannot be linguistically competent) and the As Good As it Gets Assumption (that current LM capabilities represent the upper limit of what they can contribute)—and to advocate for an expanded, integrative research program that combines LM-based work with the language sciences to advance understanding of both human language and LMs.

Significance. If the central framing holds, the paper offers a timely and constructive intervention at the intersection of AI and the language sciences. The dialogue format provides an effective vehicle for synthesizing diverse objections and clarifying the scientific role of LM research without reducing it to string statistics or treating present performance as definitive. By explicitly advocating an expansive program that incorporates commentators' concerns, the work has the potential to guide more robust, interdisciplinary inquiry. The absence of free parameters, derivations, or empirical claims is appropriate for a position piece of this type and does not undermine the synthesis.

minor comments (2)

The abstract would be strengthened by a single sentence explicitly stating the two main conclusions about the role of LM-based work and the advocated research program.
In the sections presenting the commentators' interventions, brief parenthetical references to representative published critiques (rather than purely constructed positions) would help readers locate the synthesis within the existing literature.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive and constructive review of the manuscript. We are pleased that the dialogue format, the identification of the String Statistics Strawman and As Good As it Gets Assumption, and the advocacy for an integrative research program were viewed as timely and appropriate for a position piece.

Circularity Check

0 steps flagged

No significant circularity in this conceptual position paper

full rationale

The paper is a position piece that constructs a fictional dialogue between linguists and commentators to address two named misconceptions (String Statistics Strawman and As Good As it Gets Assumption) and advocate for integrating LM research with language sciences. It contains no equations, derivations, fitted parameters, predictions, or self-citations that form load-bearing steps. The central claims are advocacy and clarification based on external fields, with no reduction of any result to its own inputs by construction. The derivation chain is absent, making this a standard non-circular finding for a non-formal discussion article.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper introduces no free parameters, formal axioms, or invented entities; it relies on conceptual analysis of existing debates in linguistics and AI.

pith-pipeline@v0.9.0 · 5479 in / 971 out tokens · 59293 ms · 2026-05-10T16:37:21.890992+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages

[1]

a compelling middle path,

a more expansive research program for the language sciences in the AI age, one that takes on the commentators’ concerns in order to produce a better and more robust science of both human language and of LMs. 1 Introduction Our position is: language models do not replace linguistic theories, but they do tell us things about language, and they do force us t...

work page 2021
[2]

the action of pleasing

holds models to what we think is not a reasonable standard: in one prompt (admittedly cherrypicked here; others are less fanciful; see Lupyan for further discussion and critiques of this set of experiments), frontier models are prompted with “Pretend that ‘glart’ is a word that refers to a group of alien creatures, and can also refer to the action of plea...

work page 2024
[3]

But this sneering dismissal of an entire productive field of inquiry (psychology) is misplaced and exemplifies worrisome linguistic isolationism

mean ‘theories rise and decline, come and go, more as a function of baffled boredom than anything else; and the enterprise shows a disturbing absence of that cumulative character that is so impressive’ (Meehl 1978).” We’re somewhat baffled about how what we wrote could have led to this idea. But this sneering dismissal of an entire productive field of inq...

work page 1978
[4]

me gusta

in both cases, these commentaries emphasize that matching human behavior at the computational level does not by itself license conclusions about the algorithmic or implementational levels that underlie human sentence processing. Resnik doubts the usefulness of LMs as cognitive models at all three of Marr’s levels. He says the implementation is so obviousl...

work page 2024
[5]

under various manipulations (Misra and Mahowald, 2024; Patil et al., 2024; Yao et al.,

work page 2024
[6]

Wickelphones

and study learning has been one of the most promising contributions of linguistics to LM research.5 The commentators lay out promising avenues for further work in these domains. If it turns out that certain linguistic structures are fundamentally not amenable to being 4 An interesting historical example of this fallacy is in Pinker and Prince’s 1988 famou...

work page 1988
[7]

Baroni, M., Bernardi, R., and Zamparelli, R. (2014). Frege in space: A program for composition distributional semantics. Linguistic Issues in Language Technology,

work page 2014
[8]

C., Pietroski, P., Yankama, B., and Chomsky, N

Berwick, R. C., Pietroski, P., Yankama, B., and Chomsky, N. (2011). Poverty of the stimulus revisited. Cognitive Science, 35(7):1207–1242. Bhattamishra, S., Ahuja, K., and Goyal, N. (2020). On the practical ability of recurrent neural networks to recognize hierarchical languages. In Scott, D., Bel, N., and Zong, C., editors, Proceedings of the 28th Intern...

work page arXiv 2011
[9]

and Mahowald, K

Misra, K. and Mahowald, K. (2024). Language models learn rare phenomena from less rare phenomena: The case of the missing AANNs. In Al-Onaizan, Y., Bansal, M., and Chen, Y.-N., editors, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 913–929, Miami, Florida, USA. Association for Computational Linguistics. Muel...

work page 2024
[10]

and Sprouse, J

Pearl, L. and Sprouse, J. (2013). Computational models of acquisition for islands. In Sprouse, J. and Hornstein, N., editors, Experimental Syntax and Islands Effects, pages 109–131. Cambridge University Press. Perfors, A., Tenenbaum, J. B., and Regier, T. (2013). The learnability of abstract syntactic principles. Cognition, 118(3):306–338. Pinker, S. and ...

work page 2013
[11]

C., Tanenhaus, M

Trueswell, J. C., Tanenhaus, M. K., and Garnsey, S. M. (1994). Semantic influences on parsing: Use of thematic role information in syntactic ambiguity resolution. Memory and Language, 33:285–318. Warstadt, A., Mueller, A., Choshen, L., Wilcox, E., Zhuang, C., Ciro, J., Mosquera, R., Paranjabe, B., Williams, A., Linzen, T., Cotterell, R. (2023). Findings o...

work page 1994
[12]

Zhang, C., Bengio, S., Hardt, M., Recht, B., and Vinyals, O. (2021). Understanding deep learning (still) requires rethinking generalization. Communications of the ACM, 64(3):107–115

work page 2021

[1] [1]

a compelling middle path,

a more expansive research program for the language sciences in the AI age, one that takes on the commentators’ concerns in order to produce a better and more robust science of both human language and of LMs. 1 Introduction Our position is: language models do not replace linguistic theories, but they do tell us things about language, and they do force us t...

work page 2021

[2] [2]

the action of pleasing

holds models to what we think is not a reasonable standard: in one prompt (admittedly cherrypicked here; others are less fanciful; see Lupyan for further discussion and critiques of this set of experiments), frontier models are prompted with “Pretend that ‘glart’ is a word that refers to a group of alien creatures, and can also refer to the action of plea...

work page 2024

[3] [3]

But this sneering dismissal of an entire productive field of inquiry (psychology) is misplaced and exemplifies worrisome linguistic isolationism

mean ‘theories rise and decline, come and go, more as a function of baffled boredom than anything else; and the enterprise shows a disturbing absence of that cumulative character that is so impressive’ (Meehl 1978).” We’re somewhat baffled about how what we wrote could have led to this idea. But this sneering dismissal of an entire productive field of inq...

work page 1978

[4] [4]

me gusta

in both cases, these commentaries emphasize that matching human behavior at the computational level does not by itself license conclusions about the algorithmic or implementational levels that underlie human sentence processing. Resnik doubts the usefulness of LMs as cognitive models at all three of Marr’s levels. He says the implementation is so obviousl...

work page 2024

[5] [5]

under various manipulations (Misra and Mahowald, 2024; Patil et al., 2024; Yao et al.,

work page 2024

[6] [6]

Wickelphones

and study learning has been one of the most promising contributions of linguistics to LM research.5 The commentators lay out promising avenues for further work in these domains. If it turns out that certain linguistic structures are fundamentally not amenable to being 4 An interesting historical example of this fallacy is in Pinker and Prince’s 1988 famou...

work page 1988

[7] [7]

Baroni, M., Bernardi, R., and Zamparelli, R. (2014). Frege in space: A program for composition distributional semantics. Linguistic Issues in Language Technology,

work page 2014

[8] [8]

C., Pietroski, P., Yankama, B., and Chomsky, N

Berwick, R. C., Pietroski, P., Yankama, B., and Chomsky, N. (2011). Poverty of the stimulus revisited. Cognitive Science, 35(7):1207–1242. Bhattamishra, S., Ahuja, K., and Goyal, N. (2020). On the practical ability of recurrent neural networks to recognize hierarchical languages. In Scott, D., Bel, N., and Zong, C., editors, Proceedings of the 28th Intern...

work page arXiv 2011

[9] [9]

and Mahowald, K

Misra, K. and Mahowald, K. (2024). Language models learn rare phenomena from less rare phenomena: The case of the missing AANNs. In Al-Onaizan, Y., Bansal, M., and Chen, Y.-N., editors, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 913–929, Miami, Florida, USA. Association for Computational Linguistics. Muel...

work page 2024

[10] [10]

and Sprouse, J

Pearl, L. and Sprouse, J. (2013). Computational models of acquisition for islands. In Sprouse, J. and Hornstein, N., editors, Experimental Syntax and Islands Effects, pages 109–131. Cambridge University Press. Perfors, A., Tenenbaum, J. B., and Regier, T. (2013). The learnability of abstract syntactic principles. Cognition, 118(3):306–338. Pinker, S. and ...

work page 2013

[11] [11]

C., Tanenhaus, M

Trueswell, J. C., Tanenhaus, M. K., and Garnsey, S. M. (1994). Semantic influences on parsing: Use of thematic role information in syntactic ambiguity resolution. Memory and Language, 33:285–318. Warstadt, A., Mueller, A., Choshen, L., Wilcox, E., Zhuang, C., Ciro, J., Mosquera, R., Paranjabe, B., Williams, A., Linzen, T., Cotterell, R. (2023). Findings o...

work page 1994

[12] [12]

Zhang, C., Bengio, S., Hardt, M., Recht, B., and Vinyals, O. (2021). Understanding deep learning (still) requires rethinking generalization. Communications of the ACM, 64(3):107–115

work page 2021