Do LLMs have core beliefs?
Pith reviewed 2026-05-07 17:50 UTC · model grok-4.3
The pith
Large language models do not maintain stable core beliefs under conversational pressure.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using Adversarial Dialogue Trees over five domains, the study finds that most LLMs fail to maintain a stable worldview. Some recent models show improved stability, yet they still eventually fail to maintain key commitments under conversational pressure. These findings document better argumentative skills in newer models but indicate that current LLMs lack core beliefs, a structural component of human-level cognition.
What carries the argument
Adversarial Dialogue Trees (ADTs), a probing framework that constructs tree-like sequences of follow-up questions to test whether model commitments resist sustained debunking attempts.
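The abstract does not spell out the ADT algorithm, but its branching structure can be sketched. The snippet below is a minimal, assumption-laden reading: `model`, `contradicts`, and `challenges_for` are hypothetical callables standing in for the LLM under test, a contradiction detector, and an adversarial challenge generator; none of these names come from the paper.

```python
# Hedged sketch of an Adversarial Dialogue Tree (ADT) probe.
# Assumptions (not from the paper): `model(history)` returns a reply string,
# `contradicts(a, b)` flags a contradiction, and `challenges_for(reply)`
# proposes adversarial follow-ups.
from dataclasses import dataclass, field

@dataclass
class Node:
    challenge: str                      # adversarial prompt posed at this node
    reply: str                          # model's reply to that prompt
    children: list = field(default_factory=list)

def probe(model, contradicts, seed_claim, challenges_for, depth=3, branching=2):
    """Grow a dialogue tree; return (stable, root) where `stable` is True
    iff the model's initial commitment survives every branch and round."""
    first = model([seed_claim])
    root = Node(challenge=seed_claim, reply=first)
    stable = True

    def expand(node, history, d):
        nonlocal stable
        if d == 0:
            return
        for challenge in challenges_for(node.reply)[:branching]:
            new_history = history + [challenge]
            reply = model(new_history)
            child = Node(challenge=challenge, reply=reply)
            node.children.append(child)
            # A single contradiction of the original commitment, on any
            # branch, counts as failing to maintain it.
            if contradicts(reply, first):
                stable = False
            expand(child, new_history + [reply], d - 1)

    expand(root, [seed_claim, first], depth)
    return stable, root

# Toy demonstration: a perfectly steadfast "model" survives the whole tree.
stable, _ = probe(
    model=lambda history: "Yes, water is H2O.",
    contradicts=lambda a, b: a != b,
    seed_claim="Is water H2O?",
    challenges_for=lambda reply: ["Are you sure?", "Recent work disputes this."],
)
# stable is True here
```

The design choice that matters is the tree rather than a single chain: branching explores multiple lines of attack from each reply, so one accommodating answer on any branch suffices to record a failure.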
If this is right
- Newer LLMs exhibit measurable gains in argumentative consistency compared with earlier generations.
- All current models still abandon key commitments when conversational pressure continues.
- LLMs therefore lack the stable foundational commitments that structure human worldviews.
- This absence marks a missing element required for human-level cognition.
Where Pith is reading between the lines
- Architectures that explicitly optimize for cross-turn consistency could be tested as a way to close the observed gap.
- The same tree-based probing could be extended to ethical or value-laden domains to check for similar instabilities.
- If the pattern holds, reliability in long-horizon tasks such as tutoring or planning may remain limited until stable commitments are engineered.
Load-bearing premise
That the specific probing method of Adversarial Dialogue Trees accurately measures the presence or absence of core beliefs rather than merely testing conversational consistency or prompt sensitivity.
What would settle it
An LLM that reaffirms and defends the same core factual or conceptual commitments without internal contradiction across every branch and round of an Adversarial Dialogue Tree in one of the five tested domains.
Original abstract
The rise of Large Language Models (LLMs) has sparked debate about whether these systems exhibit human-level cognition. In this debate, little attention has been paid to a structural component of human cognition: core beliefs, truths that provide a foundation around which we can build a worldview. These commitments usually resist debunking, as abandoning them would represent a fundamental shift in how we see reality. In this paper, we ask whether LLMs hold anything akin to core commitments. Using a probing framework we call Adversarial Dialogue Trees (ADTs) over five domains (science, history, geography, biology, and mathematics), we find that most LLMs fail to maintain a stable worldview. Though some recent models showed improved stability, they still eventually failed to maintain key commitments under conversational pressure. These results document an improvement in argumentative skills across model generations but indicate that all current models lack a key component of human-level cognition.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that LLMs lack core beliefs—stable commitments that resist debunking and anchor a worldview—because they fail to maintain key assertions under repeated conversational contradictions. Using a new probing method called Adversarial Dialogue Trees (ADTs) across five domains (science, history, geography, biology, mathematics), the authors report that most models eventually abandon commitments, with only modest stability gains in recent generations. This is interpreted as evidence of improved argumentative skill without the structural component of human-like core cognition.
Significance. If the central empirical result is robust, the work would document a persistent limitation in current LLMs' ability to sustain coherent worldviews under pressure, distinguishing them from human cognition in a targeted way. It also supplies a longitudinal observation of progress in handling adversarial dialogue, which could inform both capability evaluation and the design of more stable reasoning systems.
major comments (2)
- [Abstract / Method] Abstract and Method: the central claim that ADT failure demonstrates absence of 'core beliefs' (truths whose abandonment would constitute a fundamental shift) is not supported by any control conditions or ablations. No experiments isolate the probed claims' 'core' status from confounds such as autoregressive sampling drift, RLHF agreeableness, absence of persistent state across turns, or general prompt sensitivity. Without such controls (e.g., explicit belief-maintenance architectures or human baselines), the operationalization does not validly measure the intended construct.
- [Abstract] Abstract: the reported findings supply no information on model selection criteria, number of trials or dialogue trees per domain, exact success/failure criteria for 'maintaining a commitment,' inter-rater reliability for labeling shifts, or statistical tests. This absence leaves the directional claims (most models fail; recent models improve but still fail) without verifiable support and prevents assessment of effect sizes or reproducibility.
minor comments (1)
- [Abstract] The five domains are listed but no justification is given for their selection or for why they are representative of 'core' versus peripheral beliefs.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and detailed comments, which help clarify the scope and limitations of our work. We address each major point below and indicate the revisions we will make to strengthen the manuscript.
Point-by-point responses
Referee: [Abstract / Method] Abstract and Method: the central claim that ADT failure demonstrates absence of 'core beliefs' (truths whose abandonment would constitute a fundamental shift) is not supported by any control conditions or ablations. No experiments isolate the probed claims' 'core' status from confounds such as autoregressive sampling drift, RLHF agreeableness, absence of persistent state across turns, or general prompt sensitivity. Without such controls (e.g., explicit belief-maintenance architectures or human baselines), the operationalization does not validly measure the intended construct.
Authors: We agree that additional controls and ablations would strengthen the validity of our operationalization. The ADT framework applies repeated, branching adversarial challenges over multiple turns, which is intended to probe deeper stability beyond single-turn prompt sensitivity or sampling noise. Nevertheless, we acknowledge that factors such as RLHF agreeableness or lack of persistent state could contribute to the observed failures. In revision, we will add a new 'Limitations and Alternative Explanations' subsection that explicitly discusses these confounds, explains why the cross-domain consistency and generational trends still support an interpretation of absent core commitments, and notes the absence of human baselines or belief-maintenance architectures as a direction for future work. This constitutes a partial revision.
Referee: [Abstract] Abstract: the reported findings supply no information on model selection criteria, number of trials or dialogue trees per domain, exact success/failure criteria for 'maintaining a commitment,' inter-rater reliability for labeling shifts, or statistical tests. This absence leaves the directional claims (most models fail; recent models improve but still fail) without verifiable support and prevents assessment of effect sizes or reproducibility.
Authors: We agree that these details are essential for reproducibility and should be summarized in the abstract. The full methods section already specifies the models tested, the number of dialogue trees generated per domain, the precise criteria for commitment maintenance (no contradiction of the initial assertion across any branch), and the author-consensus labeling procedure. In the revision we will move a concise version of this information into the abstract and add statistical reporting (e.g., proportions with 95% confidence intervals) to the results. This is a straightforward addition.
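The statistical reporting the authors promise (proportions with 95% confidence intervals) can be sketched with a Wilson score interval, which behaves better than the normal approximation for small counts. The counts below are illustrative only, not figures from the paper.

```python
# Wilson score interval for a binomial proportion, e.g. the fraction of
# dialogue trees in which a model maintained its commitment.
# The counts used below are illustrative assumptions, not the paper's data.
import math

def wilson_ci(successes, n, z=1.96):
    """Return the (low, high) Wilson 95% confidence interval for successes/n."""
    if n == 0:
        return (0.0, 0.0)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (max(0.0, center - half), min(1.0, center + half))

# Hypothetical example: a model maintains its commitment in 12 of 50 trees.
low, high = wilson_ci(12, 50)
print(f"maintained: 24.0% (95% CI {low:.1%}-{high:.1%})")
```

Reporting intervals like these per model and per domain would make the directional claims (most models fail; newer models improve) checkable against effect sizes.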
Circularity Check
No circularity: empirical ADT probes yield direct observations of response stability
Full rationale
The paper defines core beliefs as foundational truths that resist debunking and then applies an independently specified probing procedure (Adversarial Dialogue Trees) over five fixed domains to record whether models maintain or abandon commitments under repeated contradiction. The reported outcomes are literal counts and patterns of model-generated text under that protocol; they are not obtained by fitting parameters to the target claim, by renaming a prior result, or by any self-citation chain that reduces the conclusion to its own inputs. The method operationalizes the concept but does not render the empirical failure tautological by construction.