You Can't Fight in Here! This is BBS!
Pith reviewed 2026-05-10 16:37 UTC · model grok-4.3
The pith
Language models can inform the science of human language once researchers reject the view of them as mere string statistics and the notion that current performance marks their limit.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Language model research supplies scientific insights into human language when the community discards the String Statistics Strawman—the claim that LMs cannot be linguistically competent because they are statistical models trained on strings—and the As Good As it Gets Assumption—the claim that 2026-era LM performance already represents the maximum they can contribute to linguistics. By constructing a multi-field discussion, the paper identifies these positions as the central obstacles and calls for a broader research agenda in the language sciences that produces stronger results for both human language and the models.
What carries the argument
The constructed discussion among a formal linguist, a computational language scientist, and 25 commentators drawn from linguistics, neuroscience, cognitive science, psychology, philosophy, and computer science, which exposes and reframes the String Statistics Strawman and the As Good As it Gets Assumption as the main barriers.
If this is right
- LMs become legitimate tools for generating and testing hypotheses about linguistic competence and processing.
- Improvements in LM architectures and training will continue to deliver additional insights into human language.
- The language sciences develop more robust methods by incorporating computational modeling while addressing concerns from multiple disciplines.
- Research produces a joint science that studies human language and language models together rather than in isolation.
Where Pith is reading between the lines
- The dialogue format itself could serve as a template for resolving similar interdisciplinary disputes in other areas of cognitive science.
- Future work might define explicit criteria for what counts as a 'linguistic insight' from a model, making the advocated program more operational.
- Extending the same rejection of strawman positions to other AI systems could improve collaboration between computer science and the cognitive sciences on shared questions of representation and learning.
Load-bearing premise
The two misconceptions identified—the String Statistics Strawman and the As Good As it Gets Assumption—accurately capture the primary obstacles that prevent language model work from contributing to the language sciences.
What would settle it
A concrete research program that explicitly rejects both assumptions and then either produces new, empirically testable generalizations about human language that traditional non-LM methods have not yielded, or fails to produce any such generalizations.
Figures
read the original abstract
Norm, the formal theoretical linguist, and Claudette, the computational language scientist, have a lovely time discussing whether modern language models can inform important questions in the language sciences. Just as they are about to part ways until they meet again, 25 of their closest friends show up -- from linguistics, neuroscience, cognitive science, psychology, philosophy, and computer science. We use this discussion to highlight what we see as some common underlying issues: the String Statistics Strawman (the mistaken idea that LMs can't be linguistically competent or interesting because they, like their Markov model predecessors, are statistical models that learn from strings) and the As Good As it Gets Assumption (the idea that LM research as it stands in 2026 is the limit of what it can tell us about linguistics). We clarify the role of LM-based work for scientific insights into human language and advocate for a more expansive research program for the language sciences in the AI age, one that takes on the commentators' concerns in order to produce a better and more robust science of both human language and of LMs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript constructs a dialogue between Norm (a formal theoretical linguist) and Claudette (a computational language scientist) on whether modern language models can inform questions in the language sciences. This exchange is joined by 25 commentators drawn from linguistics, neuroscience, cognitive science, psychology, philosophy, and computer science. The authors use the resulting discussion to identify and rebut two misconceptions—the String Statistics Strawman (that LMs are merely statistical string models like their Markov predecessors and thus cannot be linguistically competent) and the As Good As it Gets Assumption (that current LM capabilities represent the upper limit of what they can contribute)—and to advocate for an expanded, integrative research program that combines LM-based work with the language sciences to advance understanding of both human language and LMs.
Significance. If the central framing holds, the paper offers a timely and constructive intervention at the intersection of AI and the language sciences. The dialogue format provides an effective vehicle for synthesizing diverse objections and clarifying the scientific role of LM research without reducing it to string statistics or treating present performance as definitive. By explicitly advocating an expansive program that incorporates commentators' concerns, the work has the potential to guide more robust, interdisciplinary inquiry. The absence of free parameters, derivations, or empirical claims is appropriate for a position piece of this type and does not undermine the synthesis.
minor comments (2)
- The abstract would be strengthened by a single sentence explicitly stating the two main conclusions about the role of LM-based work and the advocated research program.
- In the sections presenting the commentators' interventions, brief parenthetical references to representative published critiques (rather than purely constructed positions) would help readers locate the synthesis within the existing literature.
Simulated Author's Rebuttal
We thank the referee for their positive and constructive review of the manuscript. We are pleased that the dialogue format, the identification of the String Statistics Strawman and As Good As it Gets Assumption, and the advocacy for an integrative research program were viewed as timely and appropriate for a position piece.
Circularity Check
No significant circularity in this conceptual position paper
full rationale
The paper is a position piece that constructs a fictional dialogue between linguists and commentators to address two named misconceptions (String Statistics Strawman and As Good As it Gets Assumption) and advocate for integrating LM research with language sciences. It contains no equations, derivations, fitted parameters, predictions, or self-citations that form load-bearing steps. The central claims are advocacy and clarification based on external fields, with no reduction of any result to its own inputs by construction. The derivation chain is absent, making this a standard non-circular finding for a non-formal discussion article.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
a more expansive research program for the language sciences in the AI age, one that takes on the commentators’ concerns in order to produce a better and more robust science of both human language and of LMs. 1 Introduction Our position is: language models do not replace linguistic theories, but they do tell us things about language, and they do force us t...
work page 2021
-
[2]
holds models to what we think is not a reasonable standard: in one prompt (admittedly cherrypicked here; others are less fanciful; see Lupyan for further discussion and critiques of this set of experiments), frontier models are prompted with “Pretend that ‘glart’ is a word that refers to a group of alien creatures, and can also refer to the action of plea...
work page 2024
-
[3]
mean ‘theories rise and decline, come and go, more as a function of baffled boredom than anything else; and the enterprise shows a disturbing absence of that cumulative character that is so impressive’ (Meehl 1978).” We’re somewhat baffled about how what we wrote could have led to this idea. But this sneering dismissal of an entire productive field of inq...
work page 1978
-
[4]
in both cases, these commentaries emphasize that matching human behavior at the computational level does not by itself license conclusions about the algorithmic or implementational levels that underlie human sentence processing. Resnik doubts the usefulness of LMs as cognitive models at all three of Marr’s levels. He says the implementation is so obviousl...
work page 2024
-
[5]
under various manipulations (Misra and Mahowald, 2024; Patil et al., 2024; Yao et al.,
work page 2024
-
[6]
and study learning has been one of the most promising contributions of linguistics to LM research.5 The commentators lay out promising avenues for further work in these domains. If it turns out that certain linguistic structures are fundamentally not amenable to being 4 An interesting historical example of this fallacy is in Pinker and Prince’s 1988 famou...
work page 1988
-
[7]
Baroni, M., Bernardi, R., and Zamparelli, R. (2014). Frege in space: A program for composition distributional semantics. Linguistic Issues in Language Technology,
work page 2014
-
[8]
C., Pietroski, P., Yankama, B., and Chomsky, N
Berwick, R. C., Pietroski, P., Yankama, B., and Chomsky, N. (2011). Poverty of the stimulus revisited. Cognitive Science, 35(7):1207–1242. Bhattamishra, S., Ahuja, K., and Goyal, N. (2020). On the practical ability of recurrent neural networks to recognize hierarchical languages. In Scott, D., Bel, N., and Zong, C., editors, Proceedings of the 28th Intern...
-
[9]
Misra, K. and Mahowald, K. (2024). Language models learn rare phenomena from less rare phenomena: The case of the missing AANNs. In Al-Onaizan, Y., Bansal, M., and Chen, Y.-N., editors, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 913–929, Miami, Florida, USA. Association for Computational Linguistics. Muel...
work page 2024
-
[10]
Pearl, L. and Sprouse, J. (2013). Computational models of acquisition for islands. In Sprouse, J. and Hornstein, N., editors, Experimental Syntax and Islands Effects, pages 109–131. Cambridge University Press. Perfors, A., Tenenbaum, J. B., and Regier, T. (2013). The learnability of abstract syntactic principles. Cognition, 118(3):306–338. Pinker, S. and ...
work page 2013
-
[11]
Trueswell, J. C., Tanenhaus, M. K., and Garnsey, S. M. (1994). Semantic influences on parsing: Use of thematic role information in syntactic ambiguity resolution. Memory and Language, 33:285–318. Warstadt, A., Mueller, A., Choshen, L., Wilcox, E., Zhuang, C., Ciro, J., Mosquera, R., Paranjabe, B., Williams, A., Linzen, T., Cotterell, R. (2023). Findings o...
work page 1994
-
[12]
Zhang, C., Bengio, S., Hardt, M., Recht, B., and Vinyals, O. (2021). Understanding deep learning (still) requires rethinking generalization. Communications of the ACM, 64(3):107–115
work page 2021
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.