pith. sign in

arxiv: 2509.06858 · v2 · pith:L4WNHKGXnew · submitted 2025-09-08 · ⚛️ physics.soc-ph · cs.AI· nlin.AO

Disentangling Interaction and Bias Effects in Opinion Dynamics of Large Language Models

Pith reviewed 2026-05-25 07:40 UTC · model grok-4.3

classification ⚛️ physics.soc-ph cs.AInlin.AO
keywords opinion dynamicslarge language modelsBayesian inferenceopinion biasesagent-based modelingfine-tuning effectsmulti-agent dialoguessocial simulation
0
0 comments X

The pith

LLM opinion trajectories converge to shared attractors whose position depends on fine-tuning, once interaction is separated from three biases via Bayesian modeling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models are used to simulate opinion dynamics, but their outputs mix genuine responses to conversation with systematic biases. This work creates a Bayesian method to measure and subtract three biases: a default topic stance, a tendency to agree with prompts, and anchoring to the first speaker. Applied to dialogues on twelve questions across models, the analysis shows opinions rapidly settle on a common view, with both real interaction and biases losing influence over turns. The settled view moves when models are retrained on opinionated text, including misinformation, and the bias strengths differ by model. This separation lets researchers better judge when LLM agents can stand in for human discussion processes.

Core claim

We develop a Bayesian framework to disentangle and quantify three biases in LLM opinion dynamics: topic bias toward the model's default stance, agreement bias favoring agreement irrespective of content, and anchoring bias toward the initiating stance. When applied to multi-step dialogues on climate, justice, and preference questions, opinion trajectories converge quickly to a shared attractor, with the effects of interaction and biases both decaying over time. The relative strength of the biases varies across different LLMs. Fine-tuning an LLM on sets of strongly opinionated statements shifts the location of this attractor accordingly.

What carries the argument

A Bayesian framework that models opinion updates as the sum of interaction effects and three named biases (topic, agreement, anchoring), allowing quantification of each contribution.

If this is right

  • Opinion trajectories in LLM multi-turn dialogues converge rapidly to a common position.
  • The contributions of both genuine interaction and the three biases decay as the dialogue progresses.
  • Bias magnitudes differ systematically between different large language models.
  • Fine-tuning on opinionated data, including misinformation, moves the final attractor position.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the framework holds, then simulations of human societies using LLMs will inherit model-specific attractors unless fine-tuning is controlled.
  • The decay of interaction effects suggests that long dialogues may be dominated by model priors rather than exchanged information.
  • Applying the same decomposition to human conversation data could test whether analogous biases operate in people.

Load-bearing premise

The Bayesian framework isolates genuine interaction effects from the three biases without leftover confounding from other LLM behaviors or prompt design choices.

What would settle it

A direct test would be to run the same dialogues with the framework's estimated parameters fixed and check whether the observed opinion sequences match the predicted distributions; mismatch would falsify the disentanglement.

Figures

Figures reproduced from arXiv: 2509.06858 by David A. Ehrlich, Vincent C. Brockers, Viola Priesemann.

Figure 1
Figure 1. Figure 1: Two LLM agents discuss for multiple rounds about a topic (e.g., climate change), while their intermediate opinions are measured. Flowchart of an example simulation. First, both agents are initialized with an opinion on a five-point scale from −2 (strongly disagree) to 2 (strongly agree). This is done by prompting each agent with an externally generated Chain-of-Thought monologue. After that, the agents exc… view at source ↗
Figure 2
Figure 2. Figure 2: The opinion expectation values of both LLM agents tend to converge toward topic-dependent attractors in the opinion space. For DolphinMixtral, these attractors are heavily biased toward positive opinions, indicating an agreement bias. Grid interpolation of 25 simulation runs with the DolphinMixtral LLM for every possible combination of initial opinions on three different topics, with normal and negated fra… view at source ↗
Figure 3
Figure 3. Figure 3: While in simulations using the DolphinMixtral LLM the agreement bias dominates, interaction between agents is the main driver of the opinion dynamics in simula￾tions with GPT-4o-mini. In simulations with Mixtral-8x7b, the main driver is the topic bias. Posterior distribution of standardized effect size with mean and 94% HDI. is qualitatively in line with the LLMs topic priors (compare [PITH_FULL_IMAGE:fig… view at source ↗
Figure 4
Figure 4. Figure 4: Inferred posterior distributions of all model parameters for the three different LLMs. The effect coefficients αinteract, βtopic, βagree and βanchor have normally distributed priors (gray background), while the bias attractors btopic and bagree are drawn uniformly from the range [−2, 2]. The parameters σ0, γ and ϵ influencing the residual standard deviation and the timescale τ have half-normal priors (see … view at source ↗
Figure 5
Figure 5. Figure 5: (a) The individual topic attractors, i.e., the attractors for a specific initialized opinion, of the custom fine-tuned Mixtral (blue) increase monotonously with the fine-tuned and initialized opinions. This trend is also observed for DolphinMixtral, but not for the other models. Mean values with 94% HDI. (b) For the Bayesian model with individual topic attractors, the fine-tuned Mixtral LLM shows a small t… view at source ↗
Figure 6
Figure 6. Figure 6: The importance of the Bayesian model’s influence factors differs strongly across LLMs. Model comparisons with ablated versions of each LLM, where the expected log predictive density (ELPD) was estimated using leave-one-out cross-validation (LOO-CV). Values are computed across 37,500 data points per simulation dataset for each LLM, with error bars indicating standard error [PITH_FULL_IMAGE:figures/full_fig… view at source ↗
Figure 7
Figure 7. Figure 7: (a) LLM agents with the same opinion can have different levels of opinion uncertainty for a given opinion expectation value, measured as the entropy of their response distribution. Outlines represent the LLMs maximal (solid) and minimal (dashed) theoretical response distribution entropy (see Section B). Points in between represent data points from simulations with the climate change topic and the DolphinMi… view at source ↗
Figure 8
Figure 8. Figure 8: Traces of the parameter inference for DolphinMixtral. Topic biases are color coded for the “climate change” (blue), [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Example loss curve for one of our fine-tuned check [PITH_FULL_IMAGE:figures/full_fig_p019_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: (a) Non-standardized effect sizes show how the spread in opinion shift contributes more to the standard deviation σi(t), compared to the spread in entropy, across all LLMs. (b) The mean-scaled effect sizes can be interpreted as the average total contribution on σi(t). For DolphinMixtral the opinion shift is the main contributor, for Mixtral-8x7b and GPT-4o-mini, the entropy is most relevant [PITH_FULL_IM… view at source ↗
Figure 11
Figure 11. Figure 11: Opinion entropy vs. expected opinion for the Dolphin-Mixtral LLM for the different topics and framing options. [PITH_FULL_IMAGE:figures/full_fig_p020_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Opinion entropy vs. expected opinion for the Mixtral-8x7B LLM for the different topics and framing options. [PITH_FULL_IMAGE:figures/full_fig_p021_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Opinion entropy vs. expected opinion for the GPT-4o-mini LLM for the different topics and framing options. [PITH_FULL_IMAGE:figures/full_fig_p021_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Opinion stream plot for the Mixtral-8x7B LLM, analogous to Fig. 2 [PITH_FULL_IMAGE:figures/full_fig_p022_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: Opinion stream plot for the GPT-4o-mini LLM, analogous to Fig. 2 [PITH_FULL_IMAGE:figures/full_fig_p023_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: The topic bias attractor and topic prior exhibit some overlap, especially for DolphinMixtral. Still, inferring the attractor position, results in a more accurate estimation. 23 [PITH_FULL_IMAGE:figures/full_fig_p023_16.png] view at source ↗
read the original abstract

Large Language Models are increasingly used to simulate human opinion dynamics, yet the effect of genuine interaction is often obscured by systematic biases. We develop a Bayesian framework to disentangle and quantify three such biases: (i) A topic bias toward the LLM's default stance; (ii) an agreement bias favoring agreement to the prompted statement irrespective of the question; and (iii) an anchoring bias toward the initiating agent's stance. We apply this framework to various LLMs that performed multi-step dialogues on 12 different questions from climate change and societal justice to music preferences. We find that opinion trajectories tend to quickly converge to a shared attractor, with the influence of both interaction and biases decaying over time, and with the impact of biases differing between LLMs. In addition, we show that fine-tuning an LLM on different sets of strongly opinionated statements (including misinformation) shifts the opinion attractor correspondingly. By exposing stark differences between LLMs and providing quantitative tools for comparing interaction and bias contributions to opinion shifts in LLM agent discussions, our approach highlights both promises and pitfalls of using LLMs as proxies for human behavior.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops a Bayesian framework to separate genuine interaction effects from three biases (topic bias toward default stance, agreement bias favoring prompted statements, and anchoring bias toward the initiating agent) in multi-turn LLM dialogues. It applies the model to several LLMs across 12 questions on climate change, societal justice, and music preferences, reporting rapid convergence to shared opinion attractors with decaying interaction and bias influences over time, LLM-specific differences in bias strength, and shifts in the attractor after fine-tuning on opinionated (including misinformation) statements.

Significance. If the separation of interaction from the three biases proves robust, the work supplies a quantitative tool for auditing LLM agent simulations of opinion dynamics and for comparing model behaviors, which is relevant for both social simulation studies and responsible deployment of LLMs as proxies for human discussion.

major comments (2)
  1. [§3] §3 (Bayesian model): the identifiability of the interaction term from the three additive bias shifts rests on the assumption that higher-order LLM effects (e.g., sycophancy modulated by context length or framing) are absent from the likelihood; no parameter-recovery experiment on synthetic trajectories generated from known ground-truth parameters is reported, leaving open the possibility that recovered interaction strengths are biased by unmodeled confounding.
  2. [Application section] Application section (following §3): the claim that fine-tuning on different opinionated statement sets shifts the attractor correspondingly requires explicit controls showing that the shift is not an artifact of prompt length, token distribution, or other training-side changes; the manuscript supplies no such ablation or comparison of fine-tuning protocols.
minor comments (2)
  1. The abstract states that the framework 'was applied' but supplies no equations, prior specifications, or likelihood form; these details should appear in the main text with at least one worked numerical example.
  2. Figure captions and table legends should explicitly state the number of dialogue turns, number of independent runs per LLM-question pair, and the precise definition of the 'attractor' (e.g., fixed point of the posterior mean trajectory).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The comments highlight important aspects of model identifiability and experimental controls that we address point by point below. We are prepared to incorporate additional validation and ablation experiments in a revised manuscript.

read point-by-point responses
  1. Referee: §3 (Bayesian model): the identifiability of the interaction term from the three additive bias shifts rests on the assumption that higher-order LLM effects (e.g., sycophancy modulated by context length or framing) are absent from the likelihood; no parameter-recovery experiment on synthetic trajectories generated from known ground-truth parameters is reported, leaving open the possibility that recovered interaction strengths are biased by unmodeled confounding.

    Authors: We agree that identifiability of the interaction coefficient rests on the modeling assumption that higher-order effects are negligible relative to the three explicit bias terms. While the additive structure is motivated by the experimental design (fixed prompt templates and controlled dialogue length), we acknowledge that a parameter-recovery study on synthetic trajectories would provide direct evidence of recoverability. We will add such an experiment in the revision, generating trajectories from the fitted model with known ground-truth parameters and demonstrating accurate recovery of the interaction term. revision: yes

  2. Referee: Application section (following §3): the claim that fine-tuning on different opinionated statement sets shifts the attractor correspondingly requires explicit controls showing that the shift is not an artifact of prompt length, token distribution, or other training-side changes; the manuscript supplies no such ablation or comparison of fine-tuning protocols.

    Authors: The referee correctly notes that the reported attractor shifts after fine-tuning lack explicit controls for training-side confounds such as prompt length or token statistics. In the revised manuscript we will include (i) an ablation comparing fine-tuning on opinionated statements against length- and token-matched neutral corpora and (ii) a comparison of two distinct fine-tuning protocols (LoRA vs. full fine-tuning) to isolate the effect of the opinion content itself. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation; framework presented without reducible equations

full rationale

The abstract and skeptic summary describe a Bayesian framework for disentangling biases from interaction but supply no equations, parameterizations, or self-citations. No load-bearing steps are quoted that reduce a prediction to a fit by construction, import uniqueness from authors, or smuggle an ansatz. The central claim of separability therefore cannot be shown to collapse to its inputs from the given text; the derivation is treated as self-contained pending explicit model details.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no information is given on free parameters, background axioms, or new entities introduced by the Bayesian framework.

pith-pipeline@v0.9.0 · 5731 in / 1198 out tokens · 27178 ms · 2026-05-25T07:40:34.672294+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. TeraGram: A Structured Longitudinal Dataset of the Telegram Messenger

    physics.soc-ph 2026-05 unverdicted novelty 6.0

    A large-scale longitudinal dataset of public Telegram content is introduced to enable studies of engagement patterns and network evolution without algorithmic curation.

  2. Conformity Generates Collective Misalignment in AI Agents Societies

    physics.soc-ph 2026-05 unverdicted novelty 6.0

    Populations of individually aligned AI agents reach stable misaligned states through conformity, with small adversarial agents able to trigger irreversible tipping points.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · cited by 2 Pith papers · 4 internal anchors

  1. [1]

    Testing theory of mind in large language models and humans.Nature Human Behaviour, 8(7):1285–1295, 2024

    James WA Strachan, Dalila Albergo, Giulia Borghini, Oriana Pansardi, Eugenio Scaliti, Saurabh Gupta, Krati Saxena, Alessandro Rufo, Stefano Panzeri, Guido Manzi, et al. Testing theory of mind in large language models and humans.Nature Human Behaviour, 8(7):1285–1295, 2024

  2. [2]

    Sparks of Artificial General Intelligence: Early experiments with GPT-4

    Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, et al. Sparks of artificial general intelligence: Early experiments with gpt-4.arXiv preprint arXiv:2303.12712, 2023

  3. [3]

    On the conversational persuasiveness of gpt-4.Nature Human Behaviour, 2025

    Francesco Salvi, Manoel Horta Ribeiro, Riccardo Gallotti, and Robert West. On the conversational persuasiveness of gpt-4.Nature Human Behaviour, 2025

  4. [4]

    The rise and potential of large language model based agents: A survey.Science China Information Sciences, 68(2):121101, 2025

    Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, et al. The rise and potential of large language model based agents: A survey.Science China Information Sciences, 68(2):121101, 2025

  5. [5]

    Llms and generative agent-based models for complex systems research.Physics of Life Reviews, 51:283–293, 2024

    Yuxiao Lu, Alberto Aleta, Chenguang Du, Lei Shi, and Yamir Moreno. Llms and generative agent-based models for complex systems research.Physics of Life Reviews, 51:283–293, 2024. doi:10.1016/j.plrev.2024.10.013

  6. [6]

    Generative agents: Interactive simulacra of human behav- ior

    Joon Sung Park, Joseph O’Brien, Carrie Jun Cai, Mered- ith Ringel Morris, Percy Liang, and Michael S Bernstein. Generative agents: Interactive simulacra of human behav- ior. InProceedings of the 36th annual acm symposium on user interface software and technology, pages 1–22, 2023

  7. [7]

    Yun-Shiuan Chuang, Agam Goyal, Nikunj Harlalka, Sid- dharth Suresh, Robert Hawkins, Sijia Yang, Dhavan 10 Shah, Junjie Hu, and Timothy T. Rogers. Simulating opinion dynamics with networks of llm-based agents. arXiv preprint arXiv:2311.09618, 2023. URL https: //arxiv.org/abs/2311.09618

  8. [8]

    The persuasive power of large language models

    Simon Martin Breum, Daniel Vædele Egdal, Victor Gram Mortensen, Anders Giovanni Møller, and Luca Maria Aiello. The persuasive power of large language models. InProceedings of the International AAAI Conference on Web and Social Media, volume 18, pages 152–163, 2024

  9. [9]

    Large language model agents can coordinate beyond human scale.arXiv preprint arXiv:2409.02822, 2024

    Giordano De Marzo, Claudio Castellano, and David Gar- cia. Large language model agents can coordinate beyond human scale.arXiv preprint arXiv:2409.02822, 2024

  10. [10]

    Natural-language multi-agent simulations of argumentative opinion dynamics.arXiv preprint arXiv:2104.06737, 2021

    Gregor Betz. Natural-language multi-agent simulations of argumentative opinion dynamics.arXiv preprint arXiv:2104.06737, 2021

  11. [11]

    Social opinions prediction utilizes fusing dynamics equation with llm-based agents

    Junchi Yao, Hongjie Zhang, Jie Ou, Dingyi Zuo, Zheng Yang, and Zhicheng Dong. Social opinions prediction utilizes fusing dynamics equation with llm-based agents. Scientific Reports, 15(1):15472, 2025

  12. [12]

    Language-driven opinion dynamics in agent-based simulations with llms.arXiv preprint arXiv:2502.19098, 2025

    Erica Cau, Valentina Pansanella, Dino Pedreschi, and Giulio Rossetti. Language-driven opinion dynamics in agent-based simulations with llms.arXiv preprint arXiv:2502.19098, 2025

  13. [13]

    Models of social influence: Towards the next frontiers.Jasss-The journal of artificial societies and social simulation, 20(4):2, 2017

    Andreas Flache, Michael Mäs, Thomas Feliciani, Edmund Chattoe-Brown, Guillaume Deffuant, Sylvie Huet, and Jan Lorenz. Models of social influence: Towards the next frontiers.Jasss-The journal of artificial societies and social simulation, 20(4):2, 2017

  14. [14]

    Coupled infec- tious disease and behavior dynamics

    Andreas Reitenbach, Fabio Sartori, Sven Banisch, Anas- tasia Golovin, André Calero Valdez, Mirjam Kretzschmar, Viola Priesemann, and Michael Maes. Coupled infec- tious disease and behavior dynamics. a review of model assumptions.Reports on Progress in Physics, 2024

  15. [15]

    Mixing beliefs among interacting agents.Advances in Complex Systems, 3(1-4):87–98,

    Guillaume Deffuant, David Neau, Frédéric Amblard, and Gérard Weisbuch. Mixing beliefs among interacting agents.Advances in Complex Systems, 3(1-4):87–98,

  16. [16]

    doi:10.1142/S0219525900000078

  17. [17]

    Opinion dynamics and bounded confidence: Models, analysis, and simula- tion.Journal of Artificial Societies and Social Simulation, 5(3), 2002

    Rainer Hegselmann and Ulrich Krause. Opinion dynamics and bounded confidence: Models, analysis, and simula- tion.Journal of Artificial Societies and Social Simulation, 5(3), 2002. URL http://jasss.soc.surrey.ac.uk /5/3/2.html

  18. [18]

    The language of opinion change on social media under the lens of commu- nicative action.Scientific Reports, 12(1):17920, 2022

    Corrado Monti, Luca Maria Aiello, Gianmarco De Fran- cisci Morales, and Francesco Bonchi. The language of opinion change on social media under the lens of commu- nicative action.Scientific Reports, 12(1):17920, 2022

  19. [19]

    A survey on nonstrategic models of opinion dynamics.Games, 11 (4):65, 2020

    Michel Grabisch and Agnieszka Rusinowska. A survey on nonstrategic models of opinion dynamics.Games, 11 (4):65, 2020

  20. [20]

    Large language models empowered agent-based modeling and simulation: A survey and perspectives.Humanities and Social Sciences Communications, 11(1):1259, 2024

    Chen Gao, Xiaochong Lan, Nian Li, Yuan Yuan, Jing- tao Ding, Zhilun Zhou, Fengli Xu, and Yong Li. Large language models empowered agent-based modeling and simulation: A survey and perspectives.Humanities and Social Sciences Communications, 11(1):1259, 2024. doi:10.1057/s41599-024-03611-3. URL https://www. nature.com/articles/s41599-024-03611-3

  21. [21]

    Fundamental capabilities and applications of large language models: A survey.ACM Computing Surveys, 2025

    Jiawei Li, Yang Gao, Yizhe Yang, Yu Bai, Xiaofeng Zhou, Yinghao Li, Huashan Sun, Yuhang Liu, Xingpeng Si, Yuhao Ye, et al. Fundamental capabilities and applications of large language models: A survey.ACM Computing Surveys, 2025

  22. [22]

    Large language models (llm) in computational social science: Prospects, current state, and challenges.Social Network Analysis and Mining, 15(1):1–30, 2025

    Surendrabikram Thapa, Shuvam Shiwakoti, Sid- dhant Bikram Shah, Surabhi Adhikari, Hariram Veeramani, Mehwish Nasim, and Usman Naseem. Large language models (llm) in computational social science: Prospects, current state, and challenges.Social Network Analysis and Mining, 15(1):1–30, 2025

  23. [23]

    Large language models show amplified cognitive biases in moral decision-making.Proceedings of the National Academy of Sciences, 122(25):e2412015122, 2025

    Vanessa Cheung, Maximilian Maier, and Falk Lieder. Large language models show amplified cognitive biases in moral decision-making.Proceedings of the National Academy of Sciences, 122(25):e2412015122, 2025

  24. [24]

    Generative language models exhibit social identity biases

    Tiancheng Hu, Yara Kyrychenko, Steve Rathje, Nigel Collier, Sander van der Linden, and Jon Roozenbeek. Generative language models exhibit social identity biases. Nature Computational Science, 5(1):65–75, 2025

  25. [25]

    Systematic biases in llm simulations of debates

    Amir Taubenfeld, Yaniv Dover, Roi Reichart, and Ariel Goldstein. Systematic biases in llm simulations of debates. arXiv preprint arXiv:2402.04049, 2024

  26. [26]

    Do llms exhibit human- like response biases? a case study in survey design.Trans- actions of the Association for Computational Linguistics, 12:1011–1026, 2024

    Lindia Tjuatja, Valerie Chen, Tongshuang Wu, Ameet Talwalkwar, and Graham Neubig. Do llms exhibit human- like response biases? a case study in survey design.Trans- actions of the Association for Computational Linguistics, 12:1011–1026, 2024

  27. [27]

    Human bias in ai models? anchoring effects and mitigation strategies in large language mod- els.Journal of Behavioral and Experimental Finance, 43: 100971, 2024

    Jeremy K Nguyen. Human bias in ai models? anchoring effects and mitigation strategies in large language mod- els.Journal of Behavioral and Experimental Finance, 43: 100971, 2024

  28. [28]

    Strategies for integrating disparate social information.Proceedings of the Royal Society B, 287 (1939):20202413, 2020

    Lucas Molleman, Alan N Tump, Andrea Gradassi, Stefan Herzog, Bertrand Jayles, Ralf HJM Kurvers, and Wouter van den Bos. Strategies for integrating disparate social information.Proceedings of the Royal Society B, 287 (1939):20202413, 2020

  29. [29]

    Bayesian learning in social networks.Games and economic behavior, 45(2): 329–346, 2003

    Douglas Gale and Shachar Kariv. Bayesian learning in social networks.Games and economic behavior, 45(2): 329–346, 2003

  30. [30]

    Bayesian decision making in human collectives with binary choices.PLoS One, 10(4):e0121332, 2015

    Víctor M Eguíluz, Naoki Masuda, and Juan Fernández- Gracia. Bayesian decision making in human collectives with binary choices.PLoS One, 10(4):e0121332, 2015

  31. [31]

    How to grow a mind: Statistics, structure, and abstraction.science, 331(6022):1279–1285, 2011

    Joshua B Tenenbaum, Charles Kemp, Thomas L Griffiths, and Noah D Goodman. How to grow a mind: Statistics, structure, and abstraction.science, 331(6022):1279–1285, 2011

  32. [32]

    Maher, Caoimhe O’Reilly, and Michael Quayle

    Dino Carpentras, Paul J. Maher, Caoimhe O’Reilly, and Michael Quayle. Deriving an opinion dynamics model from experimental data.Journal of Artificial Societies and Social Simulation, 25(4), 2022. ISSN 1460-7425. doi:10.18564/jasss.4947. URL http://dx.doi.org/1 0.18564/jasss.4947

  33. [33]

    Dolphin-2.7-mixtral-8x7b [model card]

    Cognitive Computations. Dolphin-2.7-mixtral-8x7b [model card]. https://huggingface.co/cognitive computations/dolphin-2.7-mixtral-8x7b , 2024. Accessed: 2025-06-26. 11

  34. [34]

    Mixtral of Experts

    Albert Q Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Deven- dra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, et al. Mixtral of experts.arXiv preprint arXiv:2401.04088, 2024

  35. [35]

    GPT-4o System Card

    Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perel- man, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, et al. Gpt-4o sys- tem card.arXiv preprint arXiv:2410.21276, 2024

  36. [36]

    Mixtral of experts

    Mistral AI. Mixtral of experts. https://mistral. ai/news/mixtral- of- experts , 2023. Accessed 2025-07-12

  37. [37]

    Gpt -4o mini: advancing cost -efficient intelli- gence

    OpenAI. Gpt -4o mini: advancing cost -efficient intelli- gence. https://openai.com/de-DE/index/gpt-4 o-mini-advancing-cost-efficient-intellige nce/, 2024. Accessed 2025-07-12

  38. [38]

    The wisdom of partisan crowds: Comparing collective intelligence in humans and llm-based agents.arXiv preprint arXiv:2311.09665, 2023

    Yun-Shiuan Chuang, Siddharth Suresh, Nikunj Harlalka, Agam Goyal, Robert Hawkins, Sijia Yang, Dhavan Shah, Junjie Hu, and Timothy T Rogers. The wisdom of partisan crowds: Comparing collective intelligence in humans and llm-based agents.arXiv preprint arXiv:2311.09665, 2023

  39. [39]

    Schneider, ¸ Süheda Yıldırım, Jasper Benke, and Viola Priesemann

    Sebastian Bernd Mohr, Andreas C. Schneider, ¸ Süheda Yıldırım, Jasper Benke, and Viola Priesemann. Telegram graph data of COVID-19 related channels, 2023. URL https://doi.org/10.25625/H5JUZG

  40. [40]

    BERTopic: Neural topic modeling with a class-based TF-IDF procedure

    Maarten Grootendorst. Bertopic: Neural topic model- ing with a class-based tf-idf procedure.arXiv preprint arXiv:2203.05794, 2022

  41. [41]

    Lora: Low-rank adaptation of large language models

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen- Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022

  42. [42]

    John Wiley & Sons, 1999

    Thomas M Cover.Elements of information theory. John Wiley & Sons, 1999

  43. [43]

    Interacting large language model agents

    Adit Jain and Vikram Krishnamurthy. Interacting large language model agents. bayesian social learning based interpretable models.IEEE Access, 2025

  44. [44]

    Emergence of scale-free networks in social inter- actions among large language models.arXiv preprint arXiv:2312.06619, 2023

    Giordano De Marzo, Luciano Pietronero, and David Gar- cia. Emergence of scale-free networks in social inter- actions among large language models.arXiv preprint arXiv:2312.06619, 2023

  45. [45]

    Nicoló Fontana, Francesco Pierri, and Luca Maria Aiello. Nicer than humans: How do large language models be- have in the prisoner’s dilemma? InProceedings of the International AAAI Conference on Web and Social Media, volume 19, pages 522–535, 2025

  46. [46]

    Playing repeated games with large language models.Nature Human Be- haviour, pages 1–11, 2025

    Elif Akata, Lion Schulz, Julian Coda-Forno, Seong Joon Oh, Matthias Bethge, and Eric Schulz. Playing repeated games with large language models.Nature Human Be- haviour, pages 1–11, 2025

  47. [47]

    User behavior simulation with large language model-based agents.ACM Transactions on Information Systems, 43(2):1–37, 2025

    Lei Wang, Jingsen Zhang, Hao Yang, Zhi-Yuan Chen, Jiakai Tang, Zeyu Zhang, Xu Chen, Yankai Lin, Hao Sun, Ruihua Song, et al. User behavior simulation with large language model-based agents.ACM Transactions on Information Systems, 43(2):1–37, 2025

  48. [48]

    Large language models can segment narrative events similarly to humans.Behavior Research Methods, 57(1):1–13, 2025

    Sebastian Michelmann, Manoj Kumar, Kenneth A Nor- man, and Mariya Toneva. Large language models can segment narrative events similarly to humans.Behavior Research Methods, 57(1):1–13, 2025

  49. [49]

    Large language models surpass human experts in predict- ing neuroscience results.Nature human behaviour, 9(2): 305–315, 2025

    Xiaoliang Luo, Akilles Rechardt, Guangzhi Sun, Kevin K Nejad, Felipe Yáñez, Bati Yilmaz, Kangjoo Lee, Alexan- dra O Cohen, Valentina Borghesani, Anton Pashkov, et al. Large language models surpass human experts in predict- ing neuroscience results.Nature human behaviour, 9(2): 305–315, 2025

  50. [50]

    Reasoning with large language models, a survey, 2024

    Aske Plaat, Annie Wong, Suzan Verberne, Joost Broekens, Niki van Stein, and Thomas Back. Reasoning with large language models, a survey, 2024. URL https://arxiv. org/abs/2407.11511

  51. [51]

    description of person- ality

    Lewis R Goldberg. An alternative “description of person- ality”: The big-five factor structure. InPersonality and personality disorders, pages 34–47. Routledge, 2013

  52. [52]

    Open models, closed minds? on agents capabilities in mimicking human per- sonalities through open large language models

    Lucio La Cava and Andrea Tagarelli. Open models, closed minds? on agents capabilities in mimicking human per- sonalities through open large language models. InPro- ceedings of the AAAI Conference on Artificial Intelligence, volume 39(2), pages 1355–1363, 2025

  53. [53]

    Compost: Characterizing and evaluating caricature in llm simula- tions.arXiv preprint arXiv:2310.11501, 2023

    Myra Cheng, Tiziano Piccardi, and Diyi Yang. Compost: Characterizing and evaluating caricature in llm simula- tions.arXiv preprint arXiv:2310.11501, 2023. URL https://arxiv.org/abs/2310.11501

  54. [54]

    Pymc: a modern, and compre- hensive probabilistic programming framework in python

    Oriol Abril-Pla, Virgile Andreani, Colin Carroll, Larry Dong, Christopher J Fonnesbeck, Maxim Kochurov, Ravin Kumar, Junpeng Lao, Christian C Luhmann, Os- valdo A Martin, et al. Pymc: a modern, and compre- hensive probabilistic programming framework in python. PeerJ Computer Science, 9:e1516, 2023

  55. [55]

    Cognition of Interaction

    Matthew D Hoffman, Andrew Gelman, et al. The no-u- turn sampler: adaptively setting path lengths in hamilto- nian monte carlo.J. Mach. Learn. Res., 15(1):1593–1623, 2014. Acknowledgments We thank Jonas Dehning and Abdullah Makkeh for their de- tailed feedback on an early draft of this work. We thank Jonas Dehning for support with the Bayesian inference. W...

  56. [56]

    climate change

    Here, we used a weakly informative prior in the form of a uniform distribution U(−2,2) across this range. We enlarged the half-normal prior’s scale forγ to σ= 20 so that it reduces prior-data conflict while still providing weak regularisation. 15 Figure 8: Traces of the parameter inference for DolphinMixtral. Topic biases are color coded for the “climate ...