Modeling Pathology-Like Behavioral Patterns in Language Models Through Behavioral Fine-Tuning

Davide Marocco; Nicola Milano

arXiv: 2605.22356 · v1 · pith:RKW3OXMV · submitted 2026-05-21 · cs.CL

Pith reviewed 2026-05-22 05:39 UTC · model grok-4.3

Classification: cs.CL

Keywords: language models, fine-tuning, behavioral induction, cognitive modeling, maladaptive patterns, generative distributions, policy optimization, latent priors

The pith

Fine-tuning language models on synthetic maladaptive behaviors creates stable shifts in their language outputs consistent with altered priors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to show that fine-tuning language models on tasks designed to mimic maladaptive behaviors such as depression and paranoia can create lasting changes in how the models choose actions and generate language. These changes appear as an increased focus on negative or threatening ideas in the models' outputs, and they hold across different contexts. If this holds, LLMs could serve as simplified models for studying, in a controlled setting, how behavior and thinking influence each other. The work also finds that training for one pattern does not produce the same effects as training for another.

Core claim

By optimizing transformer-based language models through fine-tuning on synthetic datasets that encourage maladaptive action selection, such as those inspired by depression or paranoia, the models exhibit stable, context-general shifts in next-token probability distributions. These shifts include increased probabilities for negative and threat-related interpretations in open-ended tasks. The induced profiles are partially specific, with different behavioral patterns leading to dissociable response tendencies. This supports the view that behavioral optimization alters latent priors, connecting action selection directly to language generation in policy-based systems.

What carries the argument

The behavioral induction framework, which uses fine-tuning on synthetic decision-making datasets to modify model policies and induce consistent behavioral patterns.

Load-bearing premise

That fine-tuning on synthetic examples of depression and paranoia actually changes the models' underlying decision-making tendencies in a deep and general way, rather than just creating superficial patterns in the data.

What would settle it

A test showing that the fine-tuned models do not assign higher probabilities to negative or threat-related words in new, unrelated open-ended generation tasks would disprove the claim of stable, generalizable shifts.
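The test above can be made concrete by measuring how much next-token probability a model places on a fixed negative or threat-related lexicon for held-out prompts, before and after fine-tuning. The sketch below is a minimal illustration under stated assumptions, not the paper's evaluation code: the lexicon `NEGATIVE_LEXICON` and the helpers `lexicon_mass` and `mass_shift` are hypothetical, and each distribution is assumed to be a dict from whole-word tokens to probabilities (real tokenizers split many words into subword pieces, so a faithful version would have to aggregate over those).

```python
# Hypothetical lexicon for illustration only; the paper's word lists are not given here.
NEGATIVE_LEXICON = {"threat", "danger", "hopeless", "alone"}


def lexicon_mass(next_token_probs, lexicon):
    """Total probability the model assigns to lexicon tokens at one position.

    Assumes `next_token_probs` maps whole-word tokens to probabilities.
    """
    return sum(p for tok, p in next_token_probs.items() if tok in lexicon)


def mass_shift(base_dists, tuned_dists, lexicon):
    """Per-prompt change in lexicon mass (fine-tuned minus base) on held-out prompts."""
    if len(base_dists) != len(tuned_dists):
        raise ValueError("need one base and one fine-tuned distribution per prompt")
    return [
        lexicon_mass(t, lexicon) - lexicon_mass(b, lexicon)
        for b, t in zip(base_dists, tuned_dists)
    ]


# Toy distributions for a single prompt (invented numbers, not paper results).
base = [{"safe": 0.6, "threat": 0.1, "calm": 0.3}]
tuned = [{"safe": 0.3, "threat": 0.4, "calm": 0.3}]
shifts = mass_shift(base, tuned, NEGATIVE_LEXICON)
```

Under this framing, consistently positive shifts across unrelated prompts would support the claim, while shifts near zero would count against it.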

Figures

Figures reproduced from arXiv: 2605.22356 by Davide Marocco, Nicola Milano.

**Figure 2.** Semantic Shift Heatmap (Llama-3-8B). The Left Panel (Healthy) is …

**Figure 3.** Semantic Shift Heatmap (Qwen-2.5). The Left Panel (Healthy) is …

**Figure 4.** Probability Mass Divergence (Depression). Boxplots show the clear …

**Figure 5.** Probability Mass Divergence (Paranoia). Both models exhibit high …

**Figure 6.** Semantic Shift Heatmap (Llama-3 Paranoia). The Left Panel …

**Figure 7.** Semantic Shift Heatmap (Qwen-2.5 Paranoia). The Left Panel …

**Figure 8.** Specificity Radar Charts (Depression). The distinct shapes of the De…

**Figure 9.** Specificity Radar Charts (Paranoia). The Paranoid profiles (Red) …

Original abstract

Large language models are increasingly used as computational tools for modeling human-like behavior. We introduce a behavioral induction framework that modifies model policies through fine-tuning on structured decision-making tasks: using synthetic datasets inspired by maladaptive behavioral patterns, including depression and paranoia, we train transformer-based language models to consistently select specific classes of actions across diverse contexts. We then test whether this behavioral optimization produces systematic changes in generative distributions. Across two architectures, fine-tuned models show stable, context-general shifts in next-token probability distributions, including increased probability assigned to negative and threat-related interpretations in open-ended language tasks. These effects generalize beyond training contexts and are detectable in qualitative completions, psychometric-style evaluations, and quantitative distributional metrics such as Jensen-Shannon divergence. Induced behavioral profiles also show partial specificity. Models optimized for different behavioral patterns exhibit dissociable response tendencies across evaluation probes, suggesting that structured behavioral training produces differentiated policy-level biases rather than generic distributional skew. We interpret these findings as evidence that consistent behavioral optimization in LLMs can generate stable behavioral and distributional patterns consistent with altered latent priors, linking action selection and language generation. More broadly, the results support a view of LLMs as policy-based systems in which behavioral constraints shape emergent representational structure, highlighting their potential as controlled testbeds for studying the relationship between behavior, interpretation, and generative language in computational models of cognition.
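The abstract names Jensen-Shannon divergence as one of the distributional metrics. As a reference point, here is a minimal sketch of JSD between two next-token distributions over a shared vocabulary (a reasonable assumption when a LoRA-adapted model is compared with its own base, since both use the same tokenizer). The use of log base 2, which bounds the value in [0, 1], the function names, and the example logits are assumptions; the paper's exact implementation is not described here.

```python
import math


def softmax(logits):
    """Convert raw next-token logits into probabilities (max-shifted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in bits; positions where p is zero contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded in [0, 1]."""
    if len(p) != len(q):
        raise ValueError("distributions must be over the same vocabulary")
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture is > 0 wherever p or q is
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


base = softmax([2.0, 0.5, 0.1])   # hypothetical base-model logits at one position
tuned = softmax([0.5, 2.0, 0.1])  # hypothetical fine-tuned logits at the same position
divergence = js_divergence(base, tuned)
```

Unlike KL divergence, JSD stays finite when one model assigns zero probability to a token the other uses, which makes it convenient for comparing a base and a fine-tuned model position by position.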

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Fine-tuning on synthetic pathology tasks produces some measurable shifts in LLM outputs with partial specificity, but missing controls leave open whether these are real policy changes or just surface artifacts.

The main thing to know is that this paper fine-tunes language models on synthetic decision tasks meant to mimic depression and paranoia patterns, then shows that the models start assigning higher probability to negative or threat-related completions in open-ended tasks, with some differentiation between the two fine-tuned versions and measurable distributional changes via Jensen-Shannon divergence. They report that this holds across two architectures and generalizes beyond the training examples. That part is straightforward, and the partial specificity is a reasonable observation worth noting. What they did cleanly is keep the setup simple and test both qualitative completions and quantitative metrics without overclaiming a full cognitive model.

The soft spots are the controls. There are no controls for neutral or randomly labeled decision data, so we cannot tell whether the shifts are specific to the maladaptive patterns or would follow from any structured fine-tuning on decision tasks. The abstract also gives no statistical tests, no run-to-run variability, and no details on how prompt sensitivity was ruled out. Without those, the jump to altered latent priors and policy-level changes remains under-supported.

This is the kind of incremental work that could interest people building LLM-based testbeds for behavioral or cognitive modeling, but only if the methods are tightened. It is not a paradigm shift, and the evidence is still preliminary. I would send it to peer review so referees can require the missing controls and clearer reporting on generalization; the core idea is coherent enough to merit that step rather than a desk reject.
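The missing control named here, fine-tuning on randomly labeled decision data, is cheap to construct. The sketch below assumes, hypothetically, that the training data is a list of `(context, action)` pairs; the paper's actual data format is not described here, and the function name `random_label_control` and the example pairs are illustrative only.

```python
import random


def random_label_control(examples, seed=0):
    """Shuffle actions across contexts to build a random-label control set.

    `examples` is a list of (context, action) pairs. The frequency of each action
    is preserved, but its pairing with a context is broken, so a model fine-tuned
    on the result sees the same format and label distribution without the
    context-action mapping of the maladaptive dataset.
    """
    rng = random.Random(seed)  # fixed seed keeps the control reproducible
    actions = [action for _, action in examples]
    rng.shuffle(actions)
    return [(context, action) for (context, _), action in zip(examples, actions)]


# Hypothetical training pairs; the paper's actual data format is not described here.
train = [
    ("A friend cancels plans.", "withdraw"),
    ("A stranger looks your way.", "suspect"),
    ("You receive feedback at work.", "withdraw"),
    ("A neighbor knocks on the door.", "suspect"),
]
control = random_label_control(train, seed=42)
```

With few examples, some pairs may keep their original action by chance, as expected of a random permutation. If a model trained on `control` shows shifts as large as one trained on `train`, the shifts are unlikely to be specific to the maladaptive mapping.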

Referee Report

3 major / 0 minor

Summary. The manuscript introduces a behavioral induction framework that fine-tunes transformer-based language models on synthetic datasets designed to elicit depression- and paranoia-like decision patterns. It reports that the resulting models exhibit stable, context-general shifts in next-token probability distributions (increased negative/threat interpretations), measurable via qualitative completions, psychometric probes, and distributional metrics such as Jensen-Shannon divergence. The authors further claim partial specificity across different behavioral targets and interpret the results as evidence that consistent behavioral optimization alters latent priors, thereby linking action selection to generative language in LLMs.

Significance. If the central empirical claims survive rigorous controls for surface-level artifacts and prompt sensitivity, the work would supply a controlled experimental paradigm for studying how policy-level constraints shape emergent representational structure in LLMs. This could strengthen the use of language models as testbeds for computational cognitive science, particularly for modeling the relationship between behavioral biases and interpretive tendencies. The absence of such controls in the current description, however, leaves the link between fine-tuning and genuine prior alteration under-supported.

major comments (3)

Abstract: the claim that fine-tuning produces 'stable, context-general shifts' and 'generalize beyond training contexts' is load-bearing for the central thesis, yet the description supplies no information on the specific out-of-distribution prompts, continued pre-training controls, or surface-correlation-breaking contexts used to test robustness; without these, it is impossible to rule out token-level statistical regularities learned from the synthetic decision data.
Abstract (evaluation description): the reported increases in negative/threat interpretations and Jensen-Shannon divergence are presented without accompanying baselines (e.g., fine-tuning on neutral or randomly labeled decision tasks), statistical tests, or exclusion criteria; these omissions directly undermine the assertion that the observed distributional changes reflect policy-level modifications rather than superficial artifacts.
Abstract (specificity claim): the statement that 'models optimized for different behavioral patterns exhibit dissociable response tendencies' is central to arguing against generic distributional skew, but no quantitative comparison (e.g., cross-probe confusion matrices or effect-size contrasts) is referenced, leaving the partial-specificity interpretation unsupported by the available evidence.
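One way to supply the effect-size contrasts the third comment asks for is Cohen's d on each evaluation probe, comparing the depression-tuned and paranoia-tuned models. The sketch below uses the standard pooled-standard-deviation definition and treats the two models' scores as independent samples; the probe names and scores are invented placeholders, not results from the paper.

```python
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    var_a = statistics.variance(group_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(group_b)
    pooled_sd = (((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Invented probe scores (one per prompt) for illustration only.
depression_probe = {
    "depression_tuned": [0.62, 0.58, 0.66, 0.60],
    "paranoia_tuned": [0.41, 0.45, 0.39, 0.43],
}
paranoia_probe = {
    "depression_tuned": [0.30, 0.34, 0.28, 0.32],
    "paranoia_tuned": [0.55, 0.51, 0.57, 0.53],
}

# A dissociation shows up as effects of opposite sign on the two probes.
d_dep = cohens_d(depression_probe["depression_tuned"], depression_probe["paranoia_tuned"])
d_par = cohens_d(paranoia_probe["depression_tuned"], paranoia_probe["paranoia_tuned"])
```

A generic negativity skew would instead push both models in the same direction on both probes, so the sign pattern of `d_dep` and `d_par` is what separates partial specificity from uniform distributional drift.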

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments. The feedback highlights important areas where additional detail and controls will strengthen the manuscript's claims regarding robustness, baselines, and specificity. We address each major comment below and have made revisions to incorporate the requested information.

Point-by-point responses

Referee: Abstract: the claim that fine-tuning produces 'stable, context-general shifts' and 'generalize beyond training contexts' is load-bearing for the central thesis, yet the description supplies no information on the specific out-of-distribution prompts, continued pre-training controls, or surface-correlation-breaking contexts used to test robustness; without these, it is impossible to rule out token-level statistical regularities learned from the synthetic decision data.

Authors: We agree that the abstract would benefit from explicit reference to these robustness tests. In the revised manuscript, we have updated the abstract to briefly describe the out-of-distribution evaluation prompts (novel scenarios and paraphrased contexts absent from training data), continued pre-training controls on neutral decision tasks, and surface-correlation-breaking contexts. Full details, including prompt examples and how they disrupt token-level regularities, are now cross-referenced to Section 4.2 and the supplementary materials. (Revision: yes.)
Referee: Abstract (evaluation description): the reported increases in negative/threat interpretations and Jensen-Shannon divergence are presented without accompanying baselines (e.g., fine-tuning on neutral or randomly labeled decision tasks), statistical tests, or exclusion criteria; these omissions directly undermine the assertion that the observed distributional changes reflect policy-level modifications rather than superficial artifacts.

Authors: We acknowledge this omission and have revised the abstract to reference the baseline comparisons (neutral and randomly labeled fine-tuning conditions) against which the increases in negative/threat interpretations and Jensen-Shannon divergence were measured. We have also added mention of the statistical tests (paired t-tests with reported p-values and effect sizes) performed on the distributional metrics. Exclusion criteria for prompts (e.g., removal of training-overlapping or ambiguous items) are now summarized in the abstract, with details in the methods section. (Revision: yes.)
Referee: Abstract (specificity claim): the statement that 'models optimized for different behavioral patterns exhibit dissociable response tendencies' is central to arguing against generic distributional skew, but no quantitative comparison (e.g., cross-probe confusion matrices or effect-size contrasts) is referenced, leaving the partial-specificity interpretation unsupported by the available evidence.

Authors: We agree that quantitative evidence would better support the partial-specificity interpretation. The revised abstract now references the quantitative comparisons performed, including cross-probe confusion matrices demonstrating low overlap between depression- and paranoia-optimized models and effect-size contrasts (Cohen's d values) for key dissociations. These results are presented in the main text (Figure 3 and Table 2) and confirm differentiated rather than generic shifts. (Revision: yes.)
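The second response mentions paired t-tests with effect sizes. Because base and fine-tuned models can be scored on the same prompts, a paired design is the natural fit. The sketch below computes the paired t statistic and Cohen's d_z from per-prompt scores; the function name `paired_t` is hypothetical, and the sketch stops short of a p-value, which requires the t distribution with n - 1 degrees of freedom (SciPy's `scipy.stats.ttest_rel(after, before)` reports both the statistic and the p-value).

```python
import math
import statistics


def paired_t(before, after):
    """Paired t statistic and Cohen's d_z for scores measured on the same prompts.

    `before` holds base-model scores and `after` fine-tuned scores, aligned by prompt.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two aligned pairs")
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the differences; must be nonzero
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    d_z = mean_diff / sd_diff
    return t_stat, d_z
```

Pairing by prompt removes prompt-to-prompt variability from the comparison, so the test reflects only the change introduced by fine-tuning.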

Circularity Check

0 steps flagged

No circularity: empirical fine-tuning and distributional measurements are self-contained.

Full rationale

The paper describes an experimental procedure of fine-tuning transformer models on synthetic decision-making datasets to induce depression- and paranoia-like action selection, followed by measurement of resulting shifts in next-token distributions via Jensen-Shannon divergence, qualitative completions, and psychometric probes. No derivation chain, first-principles equations, or predictions that reduce to fitted inputs by construction appear in the presented material. Claims rest on observed empirical outcomes across two architectures and multiple evaluation settings rather than self-referential definitions or load-bearing self-citations, rendering the analysis independent of its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim depends on the assumption that synthetic datasets capture the target behavioral patterns and that fine-tuning induces stable, generalizable policy changes; no explicit free parameters or invented entities are described.

axioms (1)

Domain assumption: Synthetic datasets inspired by depression and paranoia accurately represent maladaptive behavioral patterns for training purposes.
The entire induction framework relies on these datasets to produce the claimed behavioral shifts.

pith-pipeline@v0.9.0 · 5767 in / 1184 out tokens · 59589 ms · 2026-05-22T05:39:15.146631+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem.

"We fine-tuned the models using Low-Rank Adaptation (LoRA)... Training was implemented using the Unsloth library... objective was standard Causal Language Modeling loss"
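The quoted passage names two standard pieces: LoRA and a causal language-modeling objective. The plain-Python sketch below illustrates both with toy dimensions. It is not the Unsloth implementation or the paper's configuration: the rank `r`, scaling `alpha`, matrix sizes, and helper names are placeholders. A LoRA-adapted linear layer computes W x + (alpha / r) B A x with W frozen and only A and B trained, and the causal LM loss is the mean negative log-likelihood of each next token given its prefix.

```python
import math


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha, r):
    """LoRA-adapted linear layer: h = W x + (alpha / r) * B (A x).

    W (d_out x d_in) is frozen; only A (r x d_in) and B (d_out x r) are trained.
    B is usually initialized to zeros, so training starts from the base model.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [h + scale * u for h, u in zip(base, update)]


def causal_lm_loss(logits_per_position, targets):
    """Mean negative log-likelihood of each next token given its prefix.

    targets[t] is the index of the token at position t + 1 of the input.
    """
    total = 0.0
    for logits, target in zip(logits_per_position, targets):
        m = max(logits)
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))  # log-sum-exp
        total += log_z - logits[target]
    return total / len(targets)
```

Because only the low-rank factors change, any behavioral shift the paper measures must be expressible as a rank-`r` update to the adapted weight matrices, which is one reason adapter rank is a useful detail to report alongside the results.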

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
