Beyond "Hallucinations": A Framework for Stable Human-AI Reasoning

Carl Rosenbacke; Martin McKee; Rikard Rosenbacke; Victor Rosenbacke

arxiv: 2510.14665 · v2 · submitted 2025-10-16 · 💻 cs.AI · cs.HC

Pith reviewed 2026-05-18 06:31 UTC · model grok-4.3

keywords: human-AI interaction, LLM outputs, cognitive traps, epistemic drift, reflective oversight, Rose-Frame, AI alignment

The pith

Fluency in AI outputs can create an illusion of understanding that requires human-side structures for reflective oversight.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that large language models generate fluent text from statistical patterns rather than from any grounded sense of reality. When these outputs meet human users who also lean on quick associations, the combination can yield conclusions that seem reasonable but are incorrect. The authors present the Rose-Frame to isolate three recurring patterns that turn this interaction into a drift away from accurate understanding. They propose fixes that act on the human side of the exchange, such as prompts that force reflection and arrangements that require disagreement to be voiced. A sympathetic reader would care because everyday and high-stakes decisions now depend on this mixed human-AI process, and the framework offers a way to make oversight both possible and testable.

Core claim

The central claim is that fluency can create an illusion of understanding. The Rose-Frame diagnoses three traps in human-AI interaction: map versus territory, intuition versus reason, and conflict versus confirmation. These traps compound into epistemic drift. Aligning AI therefore requires not only technical improvements but structures that enable reflective and falsifiable human oversight.

What carries the argument

The Rose-Frame, a cognitive-epistemological framework that identifies three recurrent traps in human-AI interaction and shows how they produce epistemic drift.

If this is right

Interpretive cues and reflective prompts can reduce the chance that plausible-sounding outputs are accepted without check.
Structured disagreement makes it harder for ideas to reinforce one another without being tested.
Governing the interaction process itself improves oversight even if the underlying model stays unchanged.
Fluency should not be treated as evidence that reasoning is grounded.
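
One way a team might operationalise a reflective prompt is a short checklist shown alongside each model answer. The sketch below is illustrative only: the function name `reflective_checklist`, the `TRAP_CHECKS` table, and the wording of the three questions are assumptions made for this example, not protocols taken from the paper.

```python
# Hypothetical sketch: one reflective question per Rose-Frame trap.
# The question wording is an assumption for illustration, not the authors' text.
TRAP_CHECKS = {
    "map vs territory": "Which claims here have I checked against a source outside the model?",
    "intuition vs reason": "Did I accept this because it reads well, or because I worked through the steps?",
    "conflict vs confirmation": "What is the strongest objection, and has anyone voiced it?",
}


def reflective_checklist(model_output: str) -> str:
    """Wrap a model answer in a checklist the human reviewer must answer."""
    lines = [
        "AI output under review:",
        model_output.strip(),
        "",
        "Before accepting, answer:",
    ]
    for number, (trap, question) in enumerate(TRAP_CHECKS.items(), start=1):
        lines.append(f"{number}. [{trap}] {question}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(reflective_checklist("The treatment is safe for all adult patients."))
```

The checklist leaves the model untouched and only changes what the human sees, which matches the paper's emphasis on governing the interaction rather than modifying the model.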

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same traps could appear in any collaborative system where one party supplies fluent but pattern-based output.
Field trials could test whether the proposed human interventions lower mistake rates in concrete settings such as medical or financial advice.
The framework offers a lens for examining how quick human judgments and model outputs interact in other domains beyond language.

Load-bearing premise

The three traps are recurrent mechanisms that compound into epistemic drift when human and model reasoning interact.

What would settle it

A controlled comparison of error rates on the same reasoning tasks performed by humans using AI with versus without the proposed reflective prompts and structured disagreement.
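
A comparison of this kind could be analysed in several ways. The sketch below assumes the simplest case: two independent groups, one binary outcome per task (error or no error), and a pooled two-proportion z-test. The function name and the counts in the usage lines are invented for illustration; the paper reports no such data.

```python
import math


def two_proportion_z(errors_a, n_a, errors_b, n_b):
    """Compare error rates of group A (no intervention) and group B (intervention).

    Returns (rate difference A - B, z statistic, two-sided p-value)
    using the pooled two-proportion z-test.
    """
    p_a = errors_a / n_a
    p_b = errors_b / n_b
    pooled = (errors_a + errors_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # No variation at all (every task correct, or every task wrong).
        return p_a - p_b, 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a - p_b, z, p_value


if __name__ == "__main__":
    # Made-up counts for illustration only: errors out of tasks attempted.
    diff, z, p = two_proportion_z(errors_a=40, n_a=100, errors_b=20, n_b=100)
    print(f"error-rate difference={diff:.2f}, z={z:.2f}, p={p:.4f}")
```

A real study would also need randomised assignment and a plan for repeated measures per participant, neither of which this sketch handles.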

Original abstract

As large language models (LLMs) become integrated into everyday and high-stakes decision-making, they inherit the ambiguity and biases of human language. While they produce fluent and coherent outputs, they rely on statistical pattern prediction rather than grounded reasoning, creating a risk of outputs that are plausible but incorrect. This paper argues that these failures are not only technical but cognitive. LLMs reproduce associative patterns similar to intuitive human reasoning, amplifying systematic misinterpretations when combined with human users. To analyse this, we introduce the Rose-Frame, a cognitive-epistemological framework for diagnosing breakdowns in human-AI interaction. The framework identifies three recurrent traps: (i) map vs territory, distinguishing representations from reality; (ii) intuition vs reason, separating fast associative judgments from reflective reasoning; and (iii) conflict vs confirmation, examining whether ideas are critically tested or mutually reinforced. These mechanisms can compound into epistemic drift when human and model reasoning interact. We show how these failures emerge in practice and propose human-side interventions, including interpretive cues, reflective prompts, and structured disagreement, to stabilise reasoning. Rather than modifying models, the framework focuses on governing interaction. The central claim is that fluency can create an illusion of understanding. Aligning AI therefore requires not only technical improvements but structures that enable reflective and falsifiable human oversight.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The Rose-Frame names three familiar traps in human-AI work but supplies no operational definitions or interaction evidence to show they compound into epistemic drift.

The paper's core move is to frame LLM failures as cognitive rather than purely technical. It introduces the Rose-Frame to label three traps—map versus territory, intuition versus reason, and conflict versus confirmation—and claims these can compound when humans and models interact. The central observation that fluency creates an illusion of understanding is fair and worth repeating in applied contexts. The shift toward human-side fixes like reflective prompts and structured disagreement is also practical, since many teams cannot retrain the underlying model on demand.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces the Rose-Frame, a cognitive-epistemological framework for diagnosing breakdowns in human-AI reasoning interactions. It identifies three recurrent traps—map vs. territory, intuition vs. reason, and conflict vs. confirmation—that can compound into epistemic drift. The central claim is that LLM fluency creates an illusion of understanding, and alignment requires human-side interventions (interpretive cues, reflective prompts, structured disagreement) to enable reflective and falsifiable oversight rather than solely technical model modifications.

Significance. If the framework can be operationalized and tested, it offers a structured diagnostic approach to epistemic risks in human-AI collaboration and usefully shifts emphasis toward interaction governance. The proposal of falsifiable human oversight structures is a constructive contribution. However, as a purely conceptual framework without empirical data, formal derivations, or validation studies, its significance remains prospective.

major comments (2)

[Rose-Frame] The claim that the three traps compound into epistemic drift is load-bearing for the central argument yet rests on high-level illustrations rather than operational definitions, traceable decision traces, or before/after comparisons that would demonstrate interaction effects distinct from general fluency bias. (Rose-Frame section)
[Proposed interventions] The proposed human-side interventions are presented conceptually without concrete protocols, worked examples of application, or criteria for evaluating whether they stabilize reasoning or reduce drift. (Proposed interventions)

minor comments (2)

[Abstract] The abstract would benefit from an explicit statement of the framework's scope and limitations, including the current lack of empirical validation.
Additional references to existing literature on overreliance on LLMs and confirmation bias in collaborative settings would help situate the Rose-Frame.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and insightful review. We appreciate the acknowledgment of the framework's potential value in shifting focus toward interaction governance and human-side interventions. We address each major comment below and describe the revisions planned for the next version of the manuscript.

Referee: [Rose-Frame] The claim that the three traps compound into epistemic drift is load-bearing for the central argument yet rests on high-level illustrations rather than operational definitions, traceable decision traces, or before/after comparisons that would demonstrate interaction effects distinct from general fluency bias. (Rose-Frame section)

Authors: We agree that the compounding claim is central and that the current illustrations are high-level. The manuscript presents the Rose-Frame as a conceptual diagnostic tool whose value lies in identifying logical interdependencies among the traps rather than in providing empirical proof of distinct interaction effects. To strengthen the section, we will add operational definitions (e.g., observable indicators for each trap) and example decision traces drawn from representative interaction sequences. We will also include a brief discussion of how future empirical work could isolate compounding from general fluency bias. Full before/after comparisons, however, lie beyond the scope of a framework paper. revision: partial
Referee: [Proposed interventions] The proposed human-side interventions are presented conceptually without concrete protocols, worked examples of application, or criteria for evaluating whether they stabilize reasoning or reduce drift. (Proposed interventions)

Authors: We accept that the interventions require greater specificity. In the revised manuscript we will expand the relevant section to supply concrete protocols (step-by-step procedures for interpretive cues and reflective prompts), worked examples from typical decision-making contexts, and preliminary evaluation criteria such as observable increases in user-generated falsification attempts or explicit uncertainty markers. These additions will remain within the conceptual framing while making the proposals more actionable. revision: yes
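
The evaluation criteria named in this response (falsification attempts and explicit uncertainty markers) could be counted from interaction logs. The sketch below is a crude keyword count over user turns. The cue lists and the function name `count_cues` are assumptions chosen for illustration, and a real study would need a validated coding scheme rather than fixed phrases.

```python
import re

# Hypothetical cue phrases; these lists are illustrative, not from the paper.
UNCERTAINTY_MARKERS = ("not sure", "uncertain", "might be", "could be wrong", "i doubt")
FALSIFICATION_CUES = (
    "what would disprove",
    "counterexample",
    "evidence against",
    "how could this be wrong",
)


def count_cues(turns, cues):
    """Count the user turns that contain at least one cue (case-insensitive)."""
    pattern = re.compile("|".join(re.escape(cue) for cue in cues), re.IGNORECASE)
    return sum(1 for turn in turns if pattern.search(turn))


if __name__ == "__main__":
    user_turns = [
        "I'm not sure this holds for older patients.",
        "Looks right to me.",
        "Can you give me a counterexample?",
    ]
    print("uncertainty turns:", count_cues(user_turns, UNCERTAINTY_MARKERS))
    print("falsification turns:", count_cues(user_turns, FALSIFICATION_CUES))
```

Comparing these counts between sessions with and without the interventions would give a first, rough signal of whether users are testing outputs more often.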

standing simulated objections not resolved

The manuscript offers a conceptual framework without accompanying empirical validation studies, formal derivations, or quantitative data; therefore we cannot supply the before/after comparisons or statistical evidence requested for the compounding claim.

Circularity Check

0 steps flagged

No circularity: Rose-Frame is a self-contained conceptual framework

The manuscript introduces the Rose-Frame as a diagnostic structure for human-AI interaction, defines its three traps descriptively, and argues that fluency can produce an illusion of understanding. No equations, fitted parameters, predictions, or self-citation chains appear in the provided text. The central claims are advanced as interpretive arguments supported by the framework itself rather than reducing to tautological inputs by construction. The derivation chain is therefore self-contained and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The contribution rests on conceptual assumptions about reasoning processes and introduces a new diagnostic framework without independent empirical anchors.

axioms (2)

domain assumption: LLMs reproduce associative patterns similar to intuitive human reasoning, amplifying misinterpretations in interaction
Stated directly in the abstract as the basis for treating failures as cognitive rather than purely technical.
ad hoc to paper: The three traps can compound into epistemic drift in human-AI interactions
Core diagnostic claim of the Rose-Frame that structures the analysis and interventions.

invented entities (1)

Rose-Frame (no independent evidence)
purpose: Cognitive-epistemological framework for diagnosing breakdowns and guiding interventions in human-AI reasoning
Newly proposed framework whose effectiveness is asserted without external validation or falsifiable predictions in the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear.

The three dimensions are: (i) Map vs. Territory... (ii) Intuition vs. Reason... (iii) Conflict vs. Confirmation... their effects compound, leading to runaway misinterpretations and epistemic drift.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear.

LLMs represent human System 1 cognition scaled up—fast, associative, and persuasive, but lacking reflection and self-correction.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

