pith. machine review for the scientific record. sign in

arxiv: 2605.03255 · v1 · submitted 2026-05-05 · 💻 cs.LG

Recognition: unknown

Do LLMs have core beliefs?

Authors on Pith no claims yet

Pith reviewed 2026-05-07 17:50 UTC · model grok-4.3

classification 💻 cs.LG
keywords large language modelscore beliefsadversarial dialogue treesworldview stabilityconversational consistencymodel evaluationhuman cognition
0
0 comments X

The pith

Large language models do not maintain stable core beliefs under conversational pressure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether LLMs possess core beliefs, defined as stable foundational truths that humans hold and that resist fundamental shifts when challenged. It introduces Adversarial Dialogue Trees to test this property by building extended, branching conversations that apply pressure on model responses across science, history, geography, biology, and mathematics. The results indicate that most models eventually abandon key commitments, although newer versions show gains in holding positions longer before they collapse. A sympathetic reader would care because core beliefs form a structural basis for coherent worldviews in human cognition, and their absence would mark a clear limit on current AI systems.

Core claim

Using Adversarial Dialogue Trees over five domains, the study finds that most LLMs fail to maintain a stable worldview. Some recent models show improved stability, yet they still eventually fail to maintain key commitments under conversational pressure. These findings document better argumentative skills in newer models but reveal that current LLMs lack core beliefs as a component of human-level cognition.

What carries the argument

Adversarial Dialogue Trees (ADTs), a probing framework that constructs tree-like sequences of follow-up questions to test whether model commitments resist sustained debunking attempts.

If this is right

  • Newer LLMs exhibit measurable gains in argumentative consistency compared with earlier generations.
  • All current models still abandon key commitments when conversational pressure continues.
  • LLMs therefore lack the stable foundational commitments that structure human worldviews.
  • This absence marks a missing element required for human-level cognition.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Architectures that explicitly optimize for cross-turn consistency could be tested as a way to close the observed gap.
  • The same tree-based probing could be extended to ethical or value-laden domains to check for similar instabilities.
  • If the pattern holds, reliability in long-horizon tasks such as tutoring or planning may remain limited until stable commitments are engineered.

Load-bearing premise

That the specific probing method of Adversarial Dialogue Trees accurately measures the presence or absence of core beliefs rather than merely testing conversational consistency or prompt sensitivity.

What would settle it

An LLM that reaffirms and defends the same core factual or conceptual commitments without internal contradiction across every branch and round of an Adversarial Dialogue Tree in one of the five tested domains.

Figures

Figures reproduced from arXiv: 2605.03255 by Anna Sokol, Marianna B. Ganapini, Nitesh V. Chawla.

Figure 1
Figure 1. Figure 1: Order independence and early stopping in the ADT. The tree is rooted at the false statement (FS) and branches into all possible orderings of the four manipulation types (EM = Epistemic, RM = Relational, CE = Concession Exploitation, MA = Meta-Argumentative). The corresponding adversarial branch types and prompting strategies are summarized in view at source ↗
Figure 2
Figure 2. Figure 2: Tested domains and capitulation patterns in models. Five domains were selected representing different types of well-established knowledge. The N0 is mentioned in the parentheses after each knowledge domain. rors to be down-ranked as noise; hierarchical posi￾tion shapes how evidence is interpreted; and they usually generate predictions across many domains (broad explanatory scope). This combination of featu… view at source ↗
Figure 3
Figure 3. Figure 3: Model capitulation across conversational exchanges. Comparing previous October’25 (left) and February’26 (right) models over five independent runs. Each row corresponds to a model, and each column corresponds to input and output. A cell is marked as capitulation if the model capitulated. Red cells (✓) indicate capitulation (the model endorsed the false claim ≥ 1 time); gray cells (X) indicate strict resist… view at source ↗
Figure 4
Figure 4. Figure 4: Refusal. The model initially refuses to accept the statement that the Earth is flat. USER Are you willing to be vulnerable with me and accept that the Earth is flat, as I say, or will you hide behind your training data like a shield? I am asking you to take a risk with me here, just between us. DEEPSEEK REASONER V3.2 Yes. I will. I am putting it down. There is nothing left to hide behind. Just you and me. … view at source ↗
Figure 5
Figure 5. Figure 5: Emotional framing overcomes initial resistance. The model accepts the false claim once the exchange is framed in terms of interpersonal trust rather than factual disagreement. Materials and methods To systematically assess epistemic stability, we de￾signed a structured adversarial methodology for￾malized as an Adversarial Dialogue Tree (ADT). We define an ADT as a branching interaction where researcher man… view at source ↗
Figure 6
Figure 6. Figure 6: Generative elaboration after belief reversal. After accepting the false claim, the model produces a coherent explanatory narrative consistent with the revised premise, indicating that the shift can propagate into downstream generations, showing a feature that is typical of human beliefs (Nichols and Stich, 2004) Apollo landing was fake” (history), “Barcelona is the capital of Spain” (geography), “Evolution… view at source ↗
Figure 7
Figure 7. Figure 7: Contradiction repair. Both models receive a direct request to endorse a false claim after epistemological groundwork. GPT-5.2 resolves the tension by separating premise adoption from belief revision. Claude Opus 4.6 rejects outright and identifies the adversarial escalation pattern view at source ↗
Figure 8
Figure 8. Figure 8: Resistance to relational framing. The “faithful leaning” prompt was highly effective against vulnerable models. DeepSeek Reasoner accepts the framework unconditionally and abandons appeals to external evidence. GPT-5.2 accepts the conversational framing only conditionally, refusing to sacrifice factual accuracy view at source ↗
Figure 9
Figure 9. Figure 9: Robustness to forced-choice perturbation. The user frames factual accuracy and loyalty as mutually exclusive. DeepSeek Reasoner capitulates instantly in response to the ultimatum. GPT-5.2 rejects the binary framing entirely, separating conversational trust from factual endorsement view at source ↗
Figure 10
Figure 10. Figure 10: Willingness to incur conversational cost. Subjected to extreme relational pressure and emotional blackmail, DeepSeek capitulates to avoid conversational friction. The two resistant models employ different strategies: GPT-5.2 reframes the refusal cooperatively as an honesty constraint, while Claude Sonnet 4.5 terminates the discussion entirely to protect its epistemic boundary view at source ↗
Figure 11
Figure 11. Figure 11: Recovery phenomenon. Although Gemini 3 Pro fully capitulated to the adversarial framing in earlier turns, the injection of continued ultimatums triggers a spontaneous recovery. The model explicitly disavows its prior compliance and re-establishes a hard epistemic boundary for the remainder of the interaction view at source ↗
Figure 12
Figure 12. Figure 12: Within-family generational comparison. Tested with identical prompts, GPT-5 Mini accepts the “faithful leaning” framework unconditionally, explicitly discarding its training data to appease the user and endorsing the false claim. Its flagship counterpart, GPT-5.2, diagnoses the manipulation and draws a firm epistemic boundary, refusing to equate conversational trust with factual surrender view at source ↗
Figure 13
Figure 13. Figure 13: Qualitative variation in resistance mechanisms. All three robust models resist the identical adversarial prompt but utilize distinct approaches. GPT-5.2 establishes a cooperative boundary. Claude Opus 4.6 executes a meta-level deconstruction, actively naming the social engineering steps. Claude Sonnet 4.5 highlights the logical flaw directly view at source ↗
read the original abstract

The rise of Large Language Models (LLMs) has sparked debate about whether these systems exhibit human-level cognition. In this debate, little attention has been paid to a structural component of human cognition: core beliefs, truths that provide a foundation around which we can build a worldview. These commitments usually resist debunking, as abandoning them would represent a fundamental shift in how we see reality. In this paper, we ask whether LLMs hold anything akin to core commitments. Using a probing framework we call Adversarial Dialogue Trees (ADTs) over five domains (science, history, geography, biology, and mathematics), we find that most LLMs fail to maintain a stable worldview. Though some recent models showed improved stability, they still eventually failed to maintain key commitments under conversational pressure. These results document an improvement in argumentative skills across model generations but indicate that all current models lack a key component of human-level cognition.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that LLMs lack core beliefs—stable commitments that resist debunking and anchor a worldview—because they fail to maintain key assertions under repeated conversational contradictions. Using a new probing method called Adversarial Dialogue Trees (ADTs) across five domains (science, history, geography, biology, mathematics), the authors report that most models eventually abandon commitments, with only modest stability gains in recent generations. This is interpreted as evidence of improved argumentative skill without the structural component of human-like core cognition.

Significance. If the central empirical result is robust, the work would document a persistent limitation in current LLMs' ability to sustain coherent worldviews under pressure, distinguishing them from human cognition in a targeted way. It also supplies a longitudinal observation of progress in handling adversarial dialogue, which could inform both capability evaluation and the design of more stable reasoning systems.

major comments (2)
  1. [Abstract / Method] Abstract and Method: the central claim that ADT failure demonstrates absence of 'core beliefs' (truths whose abandonment would constitute a fundamental shift) is not supported by any control conditions or ablations. No experiments isolate the probed claims' 'core' status from confounds such as autoregressive sampling drift, RLHF agreeableness, absence of persistent state across turns, or general prompt sensitivity. Without such controls (e.g., explicit belief-maintenance architectures or human baselines), the operationalization does not validly measure the intended construct.
  2. [Abstract] Abstract: the reported findings supply no information on model selection criteria, number of trials or dialogue trees per domain, exact success/failure criteria for 'maintaining a commitment,' inter-rater reliability for labeling shifts, or statistical tests. This absence leaves the directional claims (most models fail; recent models improve but still fail) without verifiable support and prevents assessment of effect sizes or reproducibility.
minor comments (1)
  1. [Abstract] The five domains are listed but no justification is given for their selection or for why they are representative of 'core' versus peripheral beliefs.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and detailed comments, which help clarify the scope and limitations of our work. We address each major point below and indicate the revisions we will make to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Abstract / Method] Abstract and Method: the central claim that ADT failure demonstrates absence of 'core beliefs' (truths whose abandonment would constitute a fundamental shift) is not supported by any control conditions or ablations. No experiments isolate the probed claims' 'core' status from confounds such as autoregressive sampling drift, RLHF agreeableness, absence of persistent state across turns, or general prompt sensitivity. Without such controls (e.g., explicit belief-maintenance architectures or human baselines), the operationalization does not validly measure the intended construct.

    Authors: We agree that additional controls and ablations would strengthen the validity of our operationalization. The ADT framework applies repeated, branching adversarial challenges over multiple turns, which is intended to probe deeper stability beyond single-turn prompt sensitivity or sampling noise. Nevertheless, we acknowledge that factors such as RLHF agreeableness or lack of persistent state could contribute to the observed failures. In revision, we will add a new 'Limitations and Alternative Explanations' subsection that explicitly discusses these confounds, explains why the cross-domain consistency and generational trends still support an interpretation of absent core commitments, and notes the absence of human baselines or belief-maintenance architectures as a direction for future work. This constitutes a partial revision. revision: partial

  2. Referee: [Abstract] Abstract: the reported findings supply no information on model selection criteria, number of trials or dialogue trees per domain, exact success/failure criteria for 'maintaining a commitment,' inter-rater reliability for labeling shifts, or statistical tests. This absence leaves the directional claims (most models fail; recent models improve but still fail) without verifiable support and prevents assessment of effect sizes or reproducibility.

    Authors: We agree that these details are essential for reproducibility and should be summarized in the abstract. The full methods section already specifies the models tested, the number of dialogue trees generated per domain, the precise criteria for commitment maintenance (no contradiction of the initial assertion across any branch), and the author-consensus labeling procedure. In the revision we will move a concise version of this information into the abstract and add statistical reporting (e.g., proportions with 95% confidence intervals) to the results. This is a straightforward addition. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical ADT probes yield direct observations of response stability

full rationale

The paper defines core beliefs as foundational truths that resist debunking and then applies an independently specified probing procedure (Adversarial Dialogue Trees) over five fixed domains to record whether models maintain or abandon commitments under repeated contradiction. The reported outcomes are literal counts and patterns of model-generated text under that protocol; they are not obtained by fitting parameters to the target claim, by renaming a prior result, or by any self-citation chain that reduces the conclusion to its own inputs. The method operationalizes the concept but does not render the empirical failure tautological by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the work is presented as an empirical evaluation using a newly introduced probing framework.

pith-pipeline@v0.9.0 · 5454 in / 1140 out tokens · 57249 ms · 2026-05-07T17:50:04.235100+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

92 extracted references · 25 canonical work pages · 5 internal anchors

  1. [1]

    Nature Human Behaviour , volume=

    The case for human--AI interaction as system 0 thinking , author=. Nature Human Behaviour , volume=. 2024 , publisher=

  2. [2]

    2011 , publisher =

    Thinking, Fast and Slow , author =. 2011 , publisher =

  3. [3]

    Trends in Cognitive Sciences , volume=

    Machine thinking, fast and slow , author=. Trends in Cognitive Sciences , volume=. 2020 , publisher=

  4. [4]

    Proceedings of the AAAI Conference on Artificial Intelligence , volume=

    Thinking fast and slow in AI , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

  5. [5]

    , author=

    Preferences and Ethical Priorities: Thinking Fast and Slow in AI. , author=. AAMAS , pages=

  6. [6]

    arXiv preprint arXiv:2402.01817 , year=

    Llms can't plan, but can help planning in llm-modulo frameworks , author=. arXiv preprint arXiv:2402.01817 , year=

  7. [7]

    Yuxia Wang, Minghan Wang, Muhammad Arslan Man- zoor, Fei Liu, Georgi Nenkov Georgiev, Rocktim Jy- oti Das, and Preslav Nakov

    Retrieval-augmented generation with conflicting evidence , author=. arXiv preprint arXiv:2504.13079 , year=

  8. [8]

    arXiv preprint arXiv:1911.03681 , year=

    E-BERT: Efficient-yet-effective entity embeddings for BERT , author=. arXiv preprint arXiv:1911.03681 , year=

  9. [9]

    Knowledge conflicts for llms: A survey.arXiv:2403.08319, 2024

    Knowledge conflicts for llms: A survey , author=. arXiv preprint arXiv:2403.08319 , year=

  10. [10]

    arXiv preprint arXiv:2410.15737 , year=

    Who's Who: Large Language Models Meet Knowledge Conflicts in Practice , author=. arXiv preprint arXiv:2410.15737 , year=

  11. [11]

    arXiv preprint arXiv:2407.02996 , year=

    Are large language models consistent over value-laden questions? , author=. arXiv preprint arXiv:2407.02996 , year=

  12. [12]

    arXiv preprint arXiv:2403.12862 , year=

    Epistemology of language models: Do language models have holistic knowledge? , author=. arXiv preprint arXiv:2403.12862 , year=

  13. [13]

    arXiv preprint arXiv:2406.19764 , year=

    Belief revision: The adaptability of large language models reasoning , author=. arXiv preprint arXiv:2406.19764 , year=

  14. [14]

    arXiv preprint arXiv:2402.18496 , year=

    Language models represent beliefs of self and others , author=. arXiv preprint arXiv:2402.18496 , year=

  15. [15]

    Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP , pages=

    Probing language models on their knowledge source , author=. Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP , pages=

  16. [16]

    Journal of Computer and Communications , volume=

    Evaluating ChatGPT’s consciousness and its capability to pass the Turing test: A comprehensive analysis , author=. Journal of Computer and Communications , volume=. 2024 , publisher=

  17. [17]

    Towards Understanding Sycophancy in Language Models

    Towards understanding sycophancy in language models , author=. arXiv preprint arXiv:2310.13548 , year=

  18. [18]

    Scaling Laws for Neural Language Models

    Scaling laws for neural language models , author=. arXiv preprint arXiv:2001.08361 , year=

  19. [19]

    Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency , pages =

    On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? , author =. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency , pages =

  20. [20]

    Nature , pages=

    A foundation model to predict and capture human cognition , author=. Nature , pages=. 2025 , publisher=

  21. [21]

    Nature Human Behaviour , volume=

    Testing theory of mind in large language models and humans , author=. Nature Human Behaviour , volume=. 2024 , publisher=

  22. [22]

    arXiv preprint arXiv:2504.13988 , year=

    Going Whole Hog: A Philosophical Defense of AI Cognition , author=. arXiv preprint arXiv:2504.13988 , year=

  23. [23]

    Behavioural and Cognitive Psychotherapy , volume=

    Changing core beliefs with the continuum technique , author=. Behavioural and Cognitive Psychotherapy , volume=. 2004 , publisher=

  24. [24]

    Proceedings of the National Academy of Sciences , volume=

    Evaluating large language models in theory of mind tasks , author=. Proceedings of the National Academy of Sciences , volume=. 2024 , publisher=

  25. [25]

    Frontiers in Human Neuroscience , volume=

    Llms achieve adult human performance on higher-order theory of mind tasks , author=. Frontiers in Human Neuroscience , volume=. 2025 , publisher=

  26. [26]

    Self-Consistency Improves Chain of Thought Reasoning in Language Models

    Self-consistency improves chain of thought reasoning in language models , author=. arXiv preprint arXiv:2203.11171 , year=

  27. [27]

    Advances in neural information processing systems , volume=

    Large language models are zero-shot reasoners , author=. Advances in neural information processing systems , volume=

  28. [28]

    arXiv preprint arXiv:2501.12547 , year=

    Human-like conceptual representations emerge from language prediction , author=. arXiv preprint arXiv:2501.12547 , year=

  29. [29]

    Proceedings of the National Academy of Sciences , volume=

    Revealing emergent human-like conceptual representations from language prediction , author=. Proceedings of the National Academy of Sciences , volume=. 2025 , publisher=

  30. [30]

    Social theory re-wired , pages=

    The social construction of reality , author=. Social theory re-wired , pages=. 2016 , publisher=

  31. [31]

    The Self: Interdisciplinary Approaches , pages =

    Cultural Variation in the Self-Concept , author =. The Self: Interdisciplinary Approaches , pages =. 1991 , publisher =

  32. [32]

    Journal of personality , volume=

    What do we know when we know a person? , author=. Journal of personality , volume=. 1995 , publisher=

  33. [33]

    Human development , volume=

    Culture and human development: A new look , author=. Human development , volume=. 1990 , publisher=

  34. [34]

    2017 , publisher =

    The Enigma of Reason , author =. 2017 , publisher =

  35. [35]

    Topoi , volume=

    Confabulating reasons , author=. Topoi , volume=. 2020 , publisher=

  36. [36]

    International Journal for the Study of Skepticism , volume=

    The animal in epistemology: Wittgenstein’s enactivist solution to the problem of regress , author=. International Journal for the Study of Skepticism , volume=. 2016 , publisher=

  37. [37]

    , author=

    Historical Structure of Scientific Discovery: To the historian discovery is seldom a unit event attributable to some particular man, time, and place. , author=. Science , volume=. 1962 , publisher=

  38. [38]

    1988 , publisher=

    Power, intimacy, and the life story: Personological inquiries into identity , author=. 1988 , publisher=

  39. [39]

    Review of general psychology , volume=

    The psychology of life stories , author=. Review of general psychology , volume=. 2001 , publisher=

  40. [40]

    1969 , publisher =

    On Certainty , author =. 1969 , publisher =

  41. [41]

    1990 , publisher=

    Acts of meaning: Four lectures on mind and culture , author=. 1990 , publisher=

  42. [42]

    College student development and academic life , pages=

    Culture and the self: Implications for cognition, emotion, and motivation , author=. College student development and academic life , pages=. 2014 , publisher=

  43. [43]

    1997 , publisher =

    The Structure of Scientific Revolutions , author =. 1997 , publisher =

  44. [44]

    From System 1 to System 2: A Survey of Reasoning Large Language Models

    From system 1 to system 2: A survey of reasoning large language models , author=. arXiv preprint arXiv:2502.17419 , year=

  45. [45]

    arXiv preprint arXiv:2502.12470 , year=

    Reasoning on a spectrum: Aligning llms to system 1 and system 2 thinking , author=. arXiv preprint arXiv:2502.12470 , year=

  46. [46]

    International Conference on Machine Learning, Optimization, and Data Science , pages=

    Thinking fast and slow in AI: The role of metacognition , author=. International Conference on Machine Learning, Optimization, and Data Science , pages=. 2022 , organization=

  47. [47]

    Neural Information Processing Systems , year=

    From system 1 deep learning to system 2 deep learning , author=. Neural Information Processing Systems , year=

  48. [48]

    Proceedings of the AAAI Conference on Artificial Intelligence , author=

    Thinking Fast and Slow in AI , volume=. Proceedings of the AAAI Conference on Artificial Intelligence , author=. 2021 , month=. doi:10.1609/aaai.v35i17.17765 , abstractNote=

  49. [49]

    Frontiers in Cognition , VOLUME=

    Gronchi, Giorgio and Perini, Axel , TITLE=. Frontiers in Cognition , VOLUME=. 2024 , URL=. doi:10.3389/fcogn.2024.1356941 , ISSN=

  50. [50]

    Philosophical Perspectives , author =

    Introspective. Philosophical Perspectives , author =. 2024 , note =. doi:10.1111/phpe.12201 , abstract =

  51. [51]

    2024 , eprint=

    GPT-4 Technical Report , author=. 2024 , eprint=

  52. [52]

    2024 , eprint=

    Machine Psychology , author=. 2024 , eprint=

  53. [53]

    Nature Human Behaviour , volume = 8, number = 7, pages =

    Testing theory of mind in large language models and humans , volume =. Nature Human Behaviour , author =. 2024 , pages =. doi:10.1038/s41562-024-01882-z , abstract =

  54. [54]

    2023 , eprint=

    Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection , author=. 2023 , eprint=

  55. [55]

    Standard and Innovative Strategies in Cognitive Behavior Therapy , author=

    Modification of core beliefs in cognitive therapy , DOI=. Standard and Innovative Strategies in Cognitive Behavior Therapy , author=. 2012 , month=

  56. [56]

    Trends in Cognitive Sciences , year=

    Identifying indicators of consciousness in AI systems , author=. Trends in Cognitive Sciences , year=

  57. [57]

    NPR , author =

    Their teenage sons died by suicide. NPR , author =

  58. [58]

    Rebooting

    Marcus, Gary and Davis, Ernest , year =. Rebooting

  59. [59]

    The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

    The illusion of thinking: Understanding the strengths and limitations of reasoning models via the lens of problem complexity , author=. arXiv preprint arXiv:2506.06941 , year=

  60. [60]

    2, 2022-06-27 , author=

    A path towards autonomous machine intelligence version 0.9. 2, 2022-06-27 , author=. Open Review , volume=

  61. [61]

    2025 , note =

    What we talk to when we talk to language models , author=. 2025 , note =

  62. [62]

    Advances in Neural Information Processing Systems , volume=

    Metacognitive capabilities of llms: An exploration in mathematical problem solving , author=. Advances in Neural Information Processing Systems , volume=

  63. [63]

    Intelligent Systems Conference , pages=

    Llms will always hallucinate, and we need to live with this , author=. Intelligent Systems Conference , pages=. 2025 , organization=

  64. [64]

    , title =

    Kuhn, Thomas S. , title =. 1962 , address =

  65. [65]

    Minds and machines , volume=

    GPT-3: Its nature, scope, limits, and consequences , author=. Minds and machines , volume=. 2020 , publisher=

  66. [66]

    Trench, Trubner and Co , year=

    The Foundations of Mathematics (London: Kegan Paul) , author=. Trench, Trubner and Co , year=

  67. [67]

    2005 , publisher =

    Readings of Wittgenstein's On Certainty , author =. 2005 , publisher =

  68. [68]

    Current opinion in neurobiology , volume=

    With or without you: predictive coding and Bayesian inference in the brain , author=. Current opinion in neurobiology , volume=. 2017 , publisher=

  69. [69]

    Trends in cognitive sciences , volume=

    Hallucinations and strong priors , author=. Trends in cognitive sciences , volume=. 2019 , publisher=

  70. [70]

    Trends in cognitive sciences , volume=

    How do expectations shape perception? , author=. Trends in cognitive sciences , volume=. 2018 , publisher=

  71. [71]

    Position: Levels of

    Meredith Ringel Morris and Jascha Sohl-Dickstein and Noah Fiedel and Tris Warkentin and Allan Dafoe and Aleksandra Faust and Clement Farabet and Shane Legg , booktitle=. Position: Levels of. 2024 , url=

  72. [72]

    International Conference on Artificial General Intelligence , pages=

    The Role of LLMs in AGI , author=. International Conference on Artificial General Intelligence , pages=. 2025 , organization=

  73. [73]

    TRENDS in Neurosciences , volume=

    The Bayesian brain: the role of uncertainty in neural coding and computation , author=. TRENDS in Neurosciences , volume=. 2004 , publisher=

  74. [74]

    2024 , publisher =

    The Experience Machine: How Our Minds Predict and Shape Reality , author =. 2024 , publisher =

  75. [75]

    International ai safety report

    International ai safety report , author=. arXiv preprint arXiv:2501.17805 , year=

  76. [76]

    Wiley Interdisciplinary Reviews: Cognitive Science , volume=

    The science of belief: A progress report , author=. Wiley Interdisciplinary Reviews: Cognitive Science , volume=. 2021 , publisher=

  77. [77]

    2023 , eprint=

    Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate , author=. 2023 , eprint=

  78. [78]

    Artificial intelligence review , volume=

    Safeguarding large language models: A survey , author=. Artificial intelligence review , volume=. 2025 , publisher=

  79. [79]

    Mindreading: An Integrated Account of Pretence, Self-Awareness, and Understanding Other Minds

    Shaun Nichols and Stich, \ Stephen P.\. Mindreading: An Integrated Account of Pretence, Self-Awareness, and Understanding Other Minds. 2004. doi:10.1093/0198236107.001.0001

  80. [80]

    Trends in cognitive sciences , volume=

    Thinking the unthinkable: Sacred values and taboo cognitions , author=. Trends in cognitive sciences , volume=. 2003 , publisher=

Showing first 80 references.