Chatbots Output Meaningful (but Problematic) Language

Matthew Stone; Una Stojnić

arxiv: 2606.02973 · v1 · pith:24SCXZXXnew · submitted 2026-06-02 · 💻 cs.CL

Pith reviewed 2026-06-28 11:10 UTC · model grok-4.3

classification 💻 cs.CL

keywords chatbots · meaning · LLMs · philosophy of language · intentionalism · AI text · semantics

The pith

Chatbot outputs have their ordinary meaning under standard theories of human language without needing to assume any mental states in the machines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that AI chatbots produce meaningful language, as judged by the same theories that apply to human speech. The argument is that meaning does not depend on the producer having intentions or mental states that match the output. Humans sometimes say things that differ from what they have in mind, yet the language still means what it says. Chatbots therefore do not require special new theories of language. This changes how we should think about and critique both human and AI-generated text, without assuming that the AI is conscious or rational.

Core claim

Utterances by AI chatbots are meaningful because a proper theory of human language already applies as is to current chatbots. Meaning is a low bar that does not require positing mental states, intentions, rationality, or other anthropomorphic assumptions in LLMs, since even in humans language production can depart from what the speaker has in mind.

What carries the argument

The principle that language production can depart from the producer's mental states or intentions while still carrying ordinary meaning.

If this is right

Answers from chatbots can express true propositions in the standard way.
Analyses of human language must handle cases where output does not match the speaker's intentions.
Engagement with chatbot text should treat questions of meaning separately from questions of whether the content is endorsed or the technology is useful.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Questions about whether chatbots 'understand' or 'intend' their outputs may not affect whether their sentences have meaning or truth value.
The same logic might extend to other systems that generate language without internal mental states, such as rule-based systems or animals.
Further work could examine specific cases of human language use where intentions and output diverge to see if the account holds.

Load-bearing premise

That language can be produced in ways that depart from the producer's mental states or intentions, as occurs in humans, and that this does not prevent the output from having ordinary meaning.

What would settle it

Evidence that meaningful language use always requires the producer's intentions to match the output, with no counterexamples in human communication.

Original abstract

Are utterances by AI chatbots meaningful? Concretely, if a user asks, say, Anthropic's agent Claude, "What is the capital of Spain?" and Claude answers, "Madrid is the capital of Spain," does that sentence have its ordinary meaning -- and does it express a true proposition? Most ordinary users, as well as AI engineers, take the answer to be trivially "yes." However, many cognitive scientists, linguists, and philosophers of language argue that dominant intentionalist accounts of language and meaning deliver the opposite conclusion. Theorists more sympathetic to ordinary users' intuitions have therefore advocated a radical "de-anthropomorphization" of language, revising our understanding of mental states, intentions, and semantic content to capture the intuition that the outputs of LLMs are meaningful. We take a different approach. While we, too, argue that LLM outputs are meaningful, we contend that a proper theory of human language already applies, as is, to current chatbots. Meaning is a low bar: claiming that LLM outputs are meaningful does not require positing mental states, intentions, rationality, or the cognitive capacities requisite for communication in LLMs -- or, indeed, making any other anthropomorphic assumptions. People do have communicative intentions (typically successful ones), but nevertheless, even in humans, language production can depart from what the speaker has in mind. Our view has important consequences for how we should theorize about -- and critically engage with -- both human linguistic output and synthetically generated text. In particular, to say that chatbots produce meaningful text is not by any means to endorse what they output, or to assume that the technology is (or is not) good, powerful, appropriate, or useful.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper argues standard meaning theories already cover LLMs because human output can mismatch speaker intentions, but this may not fully support the zero-psychology case without naming a specific account.

The main takeaway is that the authors claim ordinary theories of human language already apply to chatbot outputs without needing to posit any mental states, intentions, or cognitive capacities in the AI. They note that even in people, what gets produced can depart from what the speaker has in mind, so meaning does not require matching the producer's psychology. This keeps the bar low and avoids both denying meaning to LLMs and revising our concepts of mind.

What the paper does well is draw a clean line between accepting that outputs are meaningful and endorsing their content or the technology itself. That distinction is useful for critical engagement.

The soft spot is the move from human mismatch cases to machines with no psychology at all. Most non-intentionalist accounts still tie meaning to some form of speaker or community practice, and the abstract does not identify a theory that survives complete detachment. The stress-test concern holds here: the human examples may not carry the full load. The full paper might fill this in, but as described it leaves the extension unsupported.

This is for philosophers of language and linguists working on AI applications. A reader looking for a position that applies existing tools without new machinery or anthropomorphism will find it worth reading. It deserves a serious referee because the synthesis is distinct and the conceptual framing is honest, even if the central extension needs more defense on the theory side.

I would send it to peer review.

Referee Report

2 major / 2 minor

Summary. The paper claims that utterances by current AI chatbots have ordinary meaning under existing theories of human language, without any need to posit mental states, intentions, or cognitive capacities in the chatbots themselves. It contrasts this conservative position with intentionalist accounts (which would deny meaning to LLMs) and with revisionary 'de-anthropomorphizing' approaches, arguing that the detachability of meaning from producer psychology is already licensed by cases in which human language production departs from what the speaker has in mind.

Significance. If the central argument holds, the result supplies a non-revisionary route to attributing truth-conditional content to LLM outputs, thereby simplifying debates in the philosophy of language and AI ethics: meaning can be ascribed (and outputs can be criticized) without anthropomorphic assumptions or endorsement of the technology. The manuscript correctly notes that meaning attribution is independent of whether the technology is 'good' or 'useful.'

major comments (2)

[§1] §1 (paragraph beginning 'Theorists more sympathetic...'): The claim that 'a proper theory of human language already applies, as is' is load-bearing, yet the manuscript does not identify or defend any specific non-intentionalist theory (e.g., a particular use-based, conventionalist, or truth-conditional account) that treats meaning as fully detachable from all producer psychology. Without this, the extension from human mismatch cases to the zero-psychology LLM case remains unsupported.
[Introduction] Discussion of intentionalist accounts (early paragraphs of the introduction): The assertion that 'dominant intentionalist accounts of language and meaning deliver the opposite conclusion' is central to motivating the position, but no specific intentionalist theory is cited or shown to fail for LLMs while surviving the human mismatch cases; this leaves the contrast with the paper's own view unanchored.

minor comments (2)

[Abstract] Abstract: The final sentence on consequences for 'theorizing about both human linguistic output and synthetically generated text' is stated at a high level of generality; a single concrete illustration of how the view alters analysis of one chatbot exchange would improve clarity.
[Conclusion] The manuscript would benefit from an explicit statement of the scope (current LLMs only, or future systems as well) to avoid ambiguity about whether the argument is intended to be timeless.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for these incisive comments, which correctly identify places where greater specificity would strengthen the manuscript. We address each point below and will revise accordingly.

Referee: [§1] §1 (paragraph beginning 'Theorists more sympathetic...'): The claim that 'a proper theory of human language already applies, as is' is load-bearing, yet the manuscript does not identify or defend any specific non-intentionalist theory (e.g., a particular use-based, conventionalist, or truth-conditional account) that treats meaning as fully detachable from all producer psychology. Without this, the extension from human mismatch cases to the zero-psychology LLM case remains unsupported.

Authors: We accept the referee's observation. The manuscript's core strategy is to remain theory-neutral and rely on the fact that detachability of meaning from producer psychology is already licensed by standard treatments of human cases; we therefore conclude that no revisionary move is required for LLMs. Nevertheless, the referee is right that the load-bearing claim would be more robust if concrete examples were supplied. In the revised version we will expand the relevant paragraph in §1 to name and briefly characterize two families of non-intentionalist views: conventionalist accounts (e.g., Lewis) and use-based approaches (e.g., later Wittgenstein and certain truth-conditional theories that locate meaning in public practice rather than individual states). We will then show how each already accommodates human mismatch cases. This addition will make the extension to LLMs explicit without altering the paper's conservative stance.

Revision: yes

Referee: [Introduction] Discussion of intentionalist accounts (early paragraphs of the introduction): The assertion that 'dominant intentionalist accounts of language and meaning deliver the opposite conclusion' is central to motivating the position, but no specific intentionalist theory is cited or shown to fail for LLMs while surviving the human mismatch cases; this leaves the contrast with the paper's own view unanchored.

Authors: We agree that the contrast would be sharper with explicit references. The manuscript's claim is that, on any view requiring producer intentions or mental states for meaning, LLMs would lack meaning, while the human mismatch cases already show that such views must allow meaning to diverge from actual psychology. To anchor this, the revised introduction will cite representative intentionalist theories (Gricean and neo-Gricean accounts) and briefly note how each would treat the human cases versus the LLM case. This will clarify the motivation without committing the paper to any particular intentionalist framework.

Revision: yes

Circularity Check

0 steps flagged

No circularity: purely conceptual reinterpretation of independent linguistic theories

full rationale

The paper advances a philosophical position that standard theories of meaning already cover LLM outputs because human language production can already diverge from speaker intentions. This is presented as an application of existing accounts rather than a derivation that reduces to self-definition, fitted parameters, or a self-citation chain. No equations, data-fitting steps, or load-bearing uniqueness theorems appear; the argument rests on reinterpretation of ordinary human cases and does not construct its conclusion by renaming or smuggling in its own premises. The derivation is therefore self-contained against external benchmarks in philosophy of language.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper's position rests on assumptions about the nature of linguistic meaning drawn from philosophy of language, without introducing new entities or fitted parameters.

axioms (2)

domain assumption: Meaning does not require the speaker to have corresponding intentions or mental states.
This allows the theory to apply to LLMs and explains human cases where output departs from what the speaker has in mind.
domain assumption: Ordinary meaning attribution to utterances is independent of the producer's cognitive capacities for communication.
Core to the 'low bar' for meaning in both human and synthetic text.

pith-pipeline@v0.9.1-grok · 5841 in / 1254 out tokens · 36327 ms · 2026-06-28T11:10:00.329739+00:00 · methodology

Reference graph

Works this paper leans on

120 extracted references · 33 canonical work pages

[1]

Going Whole Hog: A Philosophical Defense of

Herman Cappelen and Josh Dever , year=. Going Whole Hog: A Philosophical Defense of. 2504.13988 , archivePrefix=

arXiv
[2]

and Roelofs, Ardi and Meyer, Antje S

Levelt, Willem J. and Roelofs, Ardi and Meyer, Antje S. , title =. Behavioral and Brain Sciences , number =. doi:https://doi.org/10.1017/s0140525x99001776 , year =

work page doi:10.1017/s0140
[3]

On Words , Volume =

John Hawthorne and Ernie Lepore , Journal =. On Words , Volume =. doi:10.5840/2011108924 , Year =

work page doi:10.5840/2011108924
[4]

Levelt , title =

Willem J. Levelt , title =
[5]

Deixis (Even Without Pointing) , Volume =

Stojni. Deixis (Even Without Pointing) , Volume =. Philosophical Perspectives , Number =
[6]

Imagination and Convention: Distinguishing Grammar and Inference in Language , Year =

Ernest Lepore and Matthew Stone , Date-Added =. Imagination and Convention: Distinguishing Grammar and Inference in Language , Year =
[7]

Discourse and Logical Form , Volume =

Stojni\'. Discourse and Logical Form , Volume =. Linguistics and Philosophy , Number =
[8]

Dell , title =

Gary S. Dell , title =. Psychological Review , number =. doi:https://doi.org/10.1037/0033-295X.93.3.283 , year =

work page doi:10.1037/0033-
[9]

The New York Times , date =

Ezra Klein , title =. The New York Times , date =
[10]

A General Language Assistant as a Laboratory for Alignment , journal =

Amanda Askell and Yuntao Bai and Anna Chen and Dawn Drain and Deep Ganguli and Tom Henighan and Andy Jones and Nicholas Joseph and Benjamin Mann and Nova DasSarma and Nelson Elhage and Zac Hatfield. A General Language Assistant as a Laboratory for Alignment , journal =. 2021 , url =. 2112.00861 , timestamp =

Pith/arXiv arXiv 2021
[11]

Ouyang, Long and Wu, Jeff and Jiang, Xu and Almeida, Diogo and Wainwright, Carroll L. and Mishkin, Pamela and Zhang, Chong and Agarwal, Sandhini and Slama, Katarina and Ray, Alex and Schulman, John and Hilton, Jacob and Kelton, Fraser and Miller, Luke and Simens, Maddie and Askell, Amanda and Welinder, Peter and Christiano, Paul and Leike, Jan and Lowe, R...

2022
[12]

Do Language Models' Words Refer?

Mandelkern, Matthew and Linzen, Tal. Do Language Models' Words Refer?. Computational Linguistics. 2024. doi:10.1162/coli_a_00522

work page doi:10.1162/coli_a_00522 2024
[13]

Are Language Models More Like Libraries or Like Librarians?

Harvey Lederman and Kyle Mahowald , year=. Are Language Models More Like Libraries or Like Librarians?. 2401.04854 , archivePrefix=

arXiv
[14]

Diabolus ex machina , author=
[15]

Gary Marcus , date =
[16]

1966 , issue_date =

Weizenbaum, Joseph , title =. 1966 , issue_date =. doi:10.1145/365153.365168 , journal =

work page doi:10.1145/365153.365168 1966
[17]

1976 , isbn =

Weizenbaum, Joseph , title =. 1976 , isbn =

1976
[18]

Fluid Concepts and Creative Analogies: Computer Models and the Fundamental Mechanisms of Thought , publisher =

Douglas Hofstadter , title =. Fluid Concepts and Creative Analogies: Computer Models and the Fundamental Mechanisms of Thought , publisher =
[19]

Leonardo , volume =

Mateas, Michael , title =. Leonardo , volume =. 2001 , month =. doi:10.1162/002409401750184717 , url =

work page doi:10.1162/002409401750184717 2001
[20]

ACM SIGGRAPH 2004 Papers , pages =

Stone, Matthew and DeCarlo, Doug and Oh, Insuk and Rodriguez, Christian and Stere, Adrian and Lees, Alyssa and Bregler, Chris , title =. ACM SIGGRAPH 2004 Papers , pages =. 2004 , isbn =. doi:10.1145/1186562.1015753 , abstract =

work page doi:10.1145/1186562.1015753 2004
[21]

Matthew Stone and Lauren M. E. Goodlad and Mark Sammons , title =. Critical AI , volume =. 2024 , url =

2024
[22]

Seven Strictures on Similarity , year =

Nelson Goodman , booktitle =. Seven Strictures on Similarity , year =
[23]

More than words : how to think about writing in the age of

Warner, John , publisher =. More than words : how to think about writing in the age of
[24]

The Twelfth International Conference on Learning Representations , year=

Towards Understanding Sycophancy in Language Models , author=. The Twelfth International Conference on Learning Representations , year=
[25]

Sycophancy in Large Language Models: Causes and Mitigations

Malmqvist, Lars. Sycophancy in Large Language Models: Causes and Mitigations. Intelligent Computing. 2025

2025
[26]

2024 , url =

Dilip Ninan , title =. 2024 , url =. doi:10.3998/phimp.2683 , month =

work page doi:10.3998/phimp.2683 2024
[27]

Investigating Memorization of Conspiracy Theories in Text Generation

Levy, Sharon and Saxon, Michael and Wang, William Yang. Investigating Memorization of Conspiracy Theories in Text Generation. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. 2021. doi:10.18653/v1/2021.findings-acl.416

work page doi:10.18653/v1/2021.findings-acl.416 2021
[28]

Critical AI , volume =

Fredrikzon, Johan , title =. Critical AI , volume =. 2025 , month =. doi:10.1215/2834703X-11700255 , url =

work page doi:10.1215/2834703x-11700255 2025
[29]

and Ho, Daniel E

Magesh, Varun and Surani, Faiz and Dahl, Matthew and Suzgun, Mirac and Manning, Christopher D. and Ho, Daniel E. , title =. Journal of Empirical Legal Studies , volume =. doi:https://doi.org/10.1111/jels.12413 , url =. https://onlinelibrary.wiley.com/doi/pdf/10.1111/jels.12413 , abstract =

work page doi:10.1111/jels.12413
[30]

Camp and Jason A

Nathan T. Camp and Jason A. Bengtson and John C. Sandstrom , keywords =. The citation catastrophe: Propagation of. The Journal of Academic Librarianship , volume =. 2025 , issn =. doi:https://doi.org/10.1016/j.acalib.2025.103065 , url =

work page doi:10.1016/j.acalib.2025.103065 2025
[31]

Lawyers Blame

Neumeister, Larry , date =. Lawyers Blame
[32]

The New York Times , date=

Dani Blum and Maggie Astor , title=. The New York Times , date=
[33]

The handbook of pragmatics , pages=

Discourse Coherence , author=. The handbook of pragmatics , pages=
[34]

Time and Modality Without Tenses and Modals , Year =

Bittner, Maria , Booktitle =. Time and Modality Without Tenses and Modals , Year =. Tense Across Languages , Pages =
[35]

Temporality: Universals and Variation , Year =

Bittner, Maria , Date-Added =. Temporality: Universals and Variation , Year =
[36]

Indefeasible Semantics and Defeasible Pragmatics , Year =

Kameyama, Megumi , Booktitle =. Indefeasible Semantics and Defeasible Pragmatics , Year =
[37]

, Date-Added =

Kehler, Andrew and Kertz, L and Rohde, Hannah and Elman, J. , Date-Added =. Coherence and Coreference Revisited , Volume =. Journal of Semantics , Pages =
[38]

Coherence, reference, and the theory of grammar , Year =

Kehler, Andrew , Publisher =. Coherence, reference, and the theory of grammar , Year =
[39]

2003 , publisher=

Logics of conversation , author=. 2003 , publisher=

2003
[40]

and Masayo, Iida and Cote, Sharon , Date-Added =

Walker, Marilyn A. and Masayo, Iida and Cote, Sharon , Date-Added =. Japanese Discourse and the Process of Centering , Volume =. Computational Linguistics , Number =
[41]

Scorekeeping in a Language Game , JOURNAL =

David Lewis , YEAR =. Scorekeeping in a Language Game , JOURNAL =
[42]

Speaker's Reference and Semantic Reference , Volume =

Saul Kripke , Booktitle =. Speaker's Reference and Semantic Reference , Volume =
[43]

, Date-Added =

Hobbs, Jerry R. , Date-Added =. Coherence and Coreference , Volume =. Cognitive Science , Pages =
[44]

Hobbs , title =

Jerry R. Hobbs , title =. 1985 , topic =

1985
[45]

Semantics and Pragmatics , volume=

Information structure: Towards an integrated formal theory of pragmatics , author=. Semantics and Pragmatics , volume=
[46]

Journal of Semantics , year =

Alex Lascarides and Nicholas Asher , title =. Journal of Semantics , year =
[47]

Philosophical Perspectives , year =

Stojnić, Una and Stone, Matthew , title =. Philosophical Perspectives , year =. doi:https://doi.org/10.1111/phpe.70001 , url =. https://onlinelibrary.wiley.com/doi/pdf/10.1111/phpe.70001 , abstract =

work page doi:10.1111/phpe.70001
[48]

A Case Study: NLG meeting Weather Industry Demand for Quality and Quantity of Textual Weather Forecasts

Sripada, Somayajulu and Burnett, Neil and Turner, Ross and Mastin, John and Evans, Dave. A Case Study: NLG meeting Weather Industry Demand for Quality and Quantity of Textual Weather Forecasts. Proceedings of the 8th International Natural Language Generation Conference ( INLG ). 2014. doi:10.3115/v1/W14-4401

work page doi:10.3115/v1/w14-4401 2014
[49]

Generation Challenges: Results of the Accuracy Evaluation Shared Task

Thomson, Craig and Reiter, Ehud. Generation Challenges: Results of the Accuracy Evaluation Shared Task. Proceedings of the 14th International Conference on Natural Language Generation. 2021. doi:10.18653/v1/2021.inlg-1.23

work page doi:10.18653/v1/2021.inlg-1.23 2021
[50]

, biburl =

Searle, John R. , biburl =. Minds, brains, and programs , url =. Behavioral and Brain Sciences , doi =
[51]

, title =

Haugeland, John C. , title =. Mind Design: Philosophy, Psychology, and Artificial Intelligence , year =
[52]

2024 , publisher=

Natural Language Generation , author=. 2024 , publisher=

2024
[53]

Billboards, bombs and shotgun weddings , volume =

Egan, Andy , journal =. Billboards, bombs and shotgun weddings , volume =
[54]

Bowman, Samuel R. , doi =. Eight Things to Know about Large Language Models , url =. Critical AI , month =. 2024 , bdsk-url-1 =

2024
[55]

How demonstratives and indexicals really work , author=. The. 2020 , publisher=

2020
[56]

Words on Words , volume =

David Kaplan , journal =. Words on Words , volume =
[57]

Dthat , year =

David Kaplan , booktitle =. Dthat , year =
[58]

Proceedings of the Aristotelian Society, supplementary volumes , volume=

Words , author=. Proceedings of the Aristotelian Society, supplementary volumes , volume=
[59]

The Philosophical Review , volume=

Meaning , author=. The Philosophical Review , volume=
[60]

1980 , publisher=

Naming and necessity , author=. 1980 , publisher=

1980
[61]

and Gebru, Timnit and McMillan-Major, Angelina and Shmitchell, Shmargaret

Bender, Emily M. and Gebru, Timnit and McMillan-Major, Angelina and Shmitchell, Shmargaret , title =. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency , pages =. 2021 , isbn =. doi:10.1145/3442188.3445922 , abstract =

work page doi:10.1145/3442188.3445922 2021
[62]

Themes from Kaplan , title=

Kaplan, David , year=. Themes from Kaplan , title=
[63]

King , journal =

Jeffrey C. King , journal =. SUPPLEMENTIVES, THE COORDINATION ACCOUNT, AND CONFLICTING INTENTIONS , volume =
[64]

Presuppositions , urldate =

Robert Stalnaker , journal =. Presuppositions , urldate =
[65]

2025 , url =

Patrick Butlin AND Emanuel Viebahn , title =. 2025 , url =. doi:10.3998/ergo.7960 , abstract =

work page doi:10.3998/ergo.7960 2025
[66]

Green, Mitchell and Michel, Jan G. , doi =. What Might Machines Mean? , url =. Minds and Machines , number =
[67]

Nickel, Philip J. , doi =. Artificial Speech and Its Authors , url =. Minds and Machines , number =
[68]

2006 , publisher=

Thinking about acting: Logical foundations for rational decision making , author=. 2006 , publisher=

2006
[69]

Intention, Plans, and Practical Reason , year =

Michael Bratman , publisher =. Intention, Plans, and Practical Reason , year =
[70]

Artificial intelligence , volume=

Intention is choice with commitment , author=. Artificial intelligence , volume=. 1990 , publisher=

1990
[71]

Neural Theory-of-Mind? On the Limits of Social Intelligence in Large

Sap, Maarten and Le Bras, Ronan and Fried, Daniel and Choi, Yejin. Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LM s. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2022. doi:10.18653/v1/2022.emnlp-main.248

work page doi:10.18653/v1/2022.emnlp-main.248 2022
[72]

Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models

Shapira, Natalie and Levy, Mosh and Alavi, Seyed Hossein and Zhou, Xuhui and Choi, Yejin and Goldberg, Yoav and Sap, Maarten and Shwartz, Vered. Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume ...

work page doi:10.18653/v1/2024.eacl-long.138 2024
[73]

FANT o M : A Benchmark for Stress-testing Machine Theory of Mind in Interactions

Kim, Hyunwoo and Sclar, Melanie and Zhou, Xuhui and Bras, Ronan and Kim, Gunhee and Choi, Yejin and Sap, Maarten. FANT o M : A Benchmark for Stress-testing Machine Theory of Mind in Interactions. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023. doi:10.18653/v1/2023.emnlp-main.890

work page doi:10.18653/v1/2023.emnlp-main.890 2023
[74]

Rethinking Theory of Mind Benchmarks for

Qiaosi Wang and Xuhui Zhou and Maarten Sap and Jodi Forlizzi and Hong Shen , year=. Rethinking Theory of Mind Benchmarks for. 2504.10839 , archivePrefix=

arXiv
[75]

A List Apart , date =

Harry Brignull , title =. A List Apart , date =
[76]

Harry Brignull , title =
[77]

Addiction by Design: Machine Gambling in Las Vegas , year =

Natasha Dow Sch\"ull , doi =. Addiction by Design: Machine Gambling in Las Vegas , year =
[78]

2011 , publisher=

The Filter Bubble: What The Internet Is Hiding From You , author=. 2011 , publisher=

2011
[79]

Optimizing for Engagement Can Be Harmful

Golbeck, Jennifer , journal=. Optimizing for Engagement Can Be Harmful. There Are Alternatives , year=
[80]

Hilary Putnam , title =

Showing first 80 references.

[1] [1]

Going Whole Hog: A Philosophical Defense of

Herman Cappelen and Josh Dever , year=. Going Whole Hog: A Philosophical Defense of. 2504.13988 , archivePrefix=

arXiv

[2] [2]

and Roelofs, Ardi and Meyer, Antje S

Levelt, Willem J. and Roelofs, Ardi and Meyer, Antje S. , title =. Behavioral and Brain Sciences , number =. doi:https://doi.org/10.1017/s0140 525x9 9001776 , year =

work page doi:10.1017/s0140

[3] [3]

On Words , Volume =

John Hawthorne and Ernie Lepore , Journal =. On Words , Volume =. doi:10.5840/2011108924 , Year =

work page doi:10.5840/2011108924

[4] [4]

Levelt , title =

Willem J. Levelt , title =

[5] [5]

Deixis (Even Without Pointing) , Volume =

Stojni. Deixis (Even Without Pointing) , Volume =. Philosophical Perspectives , Number =

[6] [6]

Imagination and Convention: Distinguishing Grammar and Inference in Language , Year =

Ernest Lepore and Matthew Stone , Date-Added =. Imagination and Convention: Distinguishing Grammar and Inference in Language , Year =

[7] [7]

Discourse and Logical Form , Volume =

Stojni\'. Discourse and Logical Form , Volume =. Linguistics and Philosophy , Number =

[8] [8]

Dell , title =

Gary S. Dell , title =. Psychological Review , number =. doi:https://doi.org/10.1037/0033- 295X.93.3.283 , year =

work page doi:10.1037/0033-

[9] [9]

The New York Times , date =

Ezra Klein , title =. The New York Times , date =

[10] [10]

A General Language Assistant as a Laboratory for Alignment , journal =

Amanda Askell and Yuntao Bai and Anna Chen and Dawn Drain and Deep Ganguli and Tom Henighan and Andy Jones and Nicholas Joseph and Benjamin Mann and Nova DasSarma and Nelson Elhage and Zac Hatfield. A General Language Assistant as a Laboratory for Alignment , journal =. 2021 , url =. 2112.00861 , timestamp =

Pith/arXiv arXiv 2021

[11] [11]

Ouyang, Long and Wu, Jeff and Jiang, Xu and Almeida, Diogo and Wainwright, Carroll L. and Mishkin, Pamela and Zhang, Chong and Agarwal, Sandhini and Slama, Katarina and Ray, Alex and Schulman, John and Hilton, Jacob and Kelton, Fraser and Miller, Luke and Simens, Maddie and Askell, Amanda and Welinder, Peter and Christiano, Paul and Leike, Jan and Lowe, R...

2022

[12] [12]

Do Language Models' Words Refer?

Mandelkern, Matthew and Linzen, Tal. Do Language Models' Words Refer?. Computational Linguistics. 2024. doi:10.1162/coli_a_00522

work page doi:10.1162/coli_a_00522 2024

[13] [13]

Are Language Models More Like Libraries or Like Librarians?

Harvey Lederman and Kyle Mahowald , year=. Are Language Models More Like Libraries or Like Librarians?. 2401.04854 , archivePrefix=

arXiv

[14] [14]

Diabolus ex machina , author=

[15] [15]

Gary Marcus , date =

[16] [16]

1966 , issue_date =

Weizenbaum, Joseph , title =. 1966 , issue_date =. doi:10.1145/365153.365168 , journal =

work page doi:10.1145/365153.365168 1966

[17] [17]

1976 , isbn =

Weizenbaum, Joseph , title =. 1976 , isbn =

1976

[18] [18]

Fluid Concepts and Creative Analogies: Computer Models and the Fundamental Mechanisms of Thought , publisher =

Douglas Hofstadter , title =. Fluid Concepts and Creative Analogies: Computer Models and the Fundamental Mechanisms of Thought , publisher =

[19] [19]

Leonardo , volume =

Mateas, Michael , title =. Leonardo , volume =. 2001 , month =. doi:10.1162/002409401750184717 , url =

work page doi:10.1162/002409401750184717 2001

[20] [20]

ACM SIGGRAPH 2004 Papers , pages =

Stone, Matthew and DeCarlo, Doug and Oh, Insuk and Rodriguez, Christian and Stere, Adrian and Lees, Alyssa and Bregler, Chris , title =. ACM SIGGRAPH 2004 Papers , pages =. 2004 , isbn =. doi:10.1145/1186562.1015753 , abstract =

work page doi:10.1145/1186562.1015753 2004

[21] [21]

Matthew Stone and Lauren M. E. Goodlad and Mark Sammons , title =. Critical AI , volume =. 2024 , url =

2024

[22] [22]

Seven Strictures on Similarity , year =

Nelson Goodman , booktitle =. Seven Strictures on Similarity , year =

[23] [23]

More than words : how to think about writing in the age of

Warner, John , publisher =. More than words : how to think about writing in the age of

[24] [24]

The Twelfth International Conference on Learning Representations , year=

Towards Understanding Sycophancy in Language Models , author=. The Twelfth International Conference on Learning Representations , year=

[25] Malmqvist, Lars. Sycophancy in Large Language Models: Causes and Mitigations. Intelligent Computing. 2025.

[26] Dilip Ninan. 2024. doi:10.3998/phimp.2683

[27] Levy, Sharon and Saxon, Michael and Wang, William Yang. Investigating Memorization of Conspiracy Theories in Text Generation. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. 2021. doi:10.18653/v1/2021.findings-acl.416

[28] Fredrikzon, Johan. Critical AI. 2025. doi:10.1215/2834703X-11700255

[29] Magesh, Varun and Surani, Faiz and Dahl, Matthew and Suzgun, Mirac and Manning, Christopher D. and Ho, Daniel E. Journal of Empirical Legal Studies. doi:10.1111/jels.12413

[30] Nathan T. Camp and Jason A. Bengtson and John C. Sandstrom. The citation catastrophe: Propagation of. The Journal of Academic Librarianship. 2025. doi:10.1016/j.acalib.2025.103065

[31] Neumeister, Larry. Lawyers Blame

[32] Dani Blum and Maggie Astor. The New York Times.

[33] Discourse Coherence. The Handbook of Pragmatics.

[34] Bittner, Maria. Time and Modality Without Tenses and Modals. Tense Across Languages.

[35] Bittner, Maria. Temporality: Universals and Variation.

[36] Kameyama, Megumi. Indefeasible Semantics and Defeasible Pragmatics.

[37] Kehler, Andrew and Kertz, L. and Rohde, Hannah and Elman, J. Coherence and Coreference Revisited. Journal of Semantics.

[38] Kehler, Andrew. Coherence, Reference, and the Theory of Grammar.

[39] Logics of Conversation. 2003.

[40] Walker, Marilyn A. and Iida, Masayo and Cote, Sharon. Japanese Discourse and the Process of Centering. Computational Linguistics.

[41] David Lewis. Scorekeeping in a Language Game.

[42] Saul Kripke. Speaker's Reference and Semantic Reference.

[43] Hobbs, Jerry R. Coherence and Coreference. Cognitive Science.

[44] Jerry R. Hobbs. 1985.

[45] Information structure: Towards an integrated formal theory of pragmatics. Semantics and Pragmatics.

[46] Alex Lascarides and Nicholas Asher. Journal of Semantics.

[47] Stojnić, Una and Stone, Matthew. Philosophical Perspectives. doi:10.1111/phpe.70001

[48] Sripada, Somayajulu and Burnett, Neil and Turner, Ross and Mastin, John and Evans, Dave. A Case Study: NLG meeting Weather Industry Demand for Quality and Quantity of Textual Weather Forecasts. Proceedings of the 8th International Natural Language Generation Conference (INLG). 2014. doi:10.3115/v1/W14-4401

[49] Thomson, Craig and Reiter, Ehud. Generation Challenges: Results of the Accuracy Evaluation Shared Task. Proceedings of the 14th International Conference on Natural Language Generation. 2021. doi:10.18653/v1/2021.inlg-1.23

[50] Searle, John R. Minds, brains, and programs. Behavioral and Brain Sciences.

[51] Haugeland, John C. Mind Design: Philosophy, Psychology, and Artificial Intelligence.

[52] Natural Language Generation. 2024.

[53] Egan, Andy. Billboards, bombs and shotgun weddings.

[54] Bowman, Samuel R. Eight Things to Know about Large Language Models. Critical AI. 2024.

[55] How demonstratives and indexicals really work. The. 2020.

[56] David Kaplan. Words on Words.

[57] David Kaplan. Dthat.

[58] Words. Proceedings of the Aristotelian Society, Supplementary Volumes.

[59] Meaning. The Philosophical Review.

[60] Naming and Necessity. 1980.

[61] Bender, Emily M. and Gebru, Timnit and McMillan-Major, Angelina and Shmitchell, Shmargaret. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. 2021. doi:10.1145/3442188.3445922

[62] Kaplan, David. Themes from Kaplan.

[63] Jeffrey C. King. Supplementives, the Coordination Account, and Conflicting Intentions.

[64] Robert Stalnaker. Presuppositions.

[65] Patrick Butlin and Emanuel Viebahn. 2025. doi:10.3998/ergo.7960

[66] Green, Mitchell and Michel, Jan G. What Might Machines Mean? Minds and Machines.

[67] Nickel, Philip J. Artificial Speech and Its Authors. Minds and Machines.

[68] Thinking about acting: Logical foundations for rational decision making. 2006.

[69] Michael Bratman. Intention, Plans, and Practical Reason.

[70] Intention is choice with commitment. Artificial Intelligence. 1990.

[71] Sap, Maarten and Le Bras, Ronan and Fried, Daniel and Choi, Yejin. Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2022. doi:10.18653/v1/2022.emnlp-main.248

[72] Shapira, Natalie and Levy, Mosh and Alavi, Seyed Hossein and Zhou, Xuhui and Choi, Yejin and Goldberg, Yoav and Sap, Maarten and Shwartz, Vered. Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume ... 2024. doi:10.18653/v1/2024.eacl-long.138

[73] Kim, Hyunwoo and Sclar, Melanie and Zhou, Xuhui and Bras, Ronan and Kim, Gunhee and Choi, Yejin and Sap, Maarten. FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023. doi:10.18653/v1/2023.emnlp-main.890

[74] Qiaosi Wang and Xuhui Zhou and Maarten Sap and Jodi Forlizzi and Hong Shen. Rethinking Theory of Mind Benchmarks for. arXiv:2504.10839

[75] Harry Brignull. A List Apart.

[76] Harry Brignull.

[77] Natasha Dow Schüll. Addiction by Design: Machine Gambling in Las Vegas.

[78] The Filter Bubble: What The Internet Is Hiding From You. 2011.

[79] Golbeck, Jennifer. Optimizing for Engagement Can Be Harmful. There Are Alternatives.

[80] Hilary Putnam.