arxiv: 2506.20020 · v2 · submitted 2025-06-24 · 💻 cs.AI · cs.CL

Persona-Assigned Large Language Models Exhibit Human-Like Motivated Reasoning

Pith reviewed 2026-05-19 07:13 UTC · model grok-4.3

classification 💻 cs.AI cs.CL
keywords motivated reasoning · large language models · persona assignment · cognitive biases · political identity · debiasing prompts · misinformation · scientific evidence

The pith

Assigning personas to LLMs induces human-like motivated reasoning that resists standard debiasing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests whether giving LLMs personas based on political views and demographics makes them reason in biased ways that favor their assigned identity, similar to how humans protect their beliefs. It uses tasks where models judge whether news headlines are true or false and assess scientific data on topics like gun control. The findings show that these personas lower accuracy in spotting misinformation by up to 9 percent and make political personas up to 90 percent more likely to evaluate scientific evidence on gun control correctly when the ground truth matches their induced views. Standard prompt techniques to reduce bias fail to eliminate these effects. A sympathetic reader would care because such biased AI could influence public debates on divisive issues and worsen polarization if widely used.

Core claim

The central discovery is that persona-assigned LLMs exhibit human-like motivated reasoning. Across eight models tested on veracity discernment and scientific evidence evaluation, persona assignment leads to reduced accuracy and strong bias toward identity-congruent conclusions, particularly for political personas on topics like gun control, and these effects are not mitigated by conventional debiasing prompts.

What carries the argument

Persona assignment across political and socio-demographic attributes, which triggers identity-congruent motivated reasoning during reasoning tasks.

Load-bearing premise

The measured differences in model outputs are caused by identity-congruent motivated reasoning rather than other prompt-induced changes in response style or calibration.

What would settle it

Finding equal performance on congruent and incongruent evidence evaluation with no reduction in veracity discernment for persona-assigned models compared to baseline would falsify the motivated reasoning claim.
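
The falsification condition above can be made concrete with a minimal Python sketch. It assumes veracity discernment is scored as the share of true headlines rated true minus the share of false headlines rated true (a common convention in human-subject misinformation studies; this page does not state the paper's exact formula), and that the congruence effect is an accuracy gap between congruent and incongruent items. All function names and data are illustrative, not taken from the paper.

```python
from typing import List, Tuple

# Each trial is (ground_truth_is_true, model_rated_true).
# Toy data for illustration only; none of it comes from the paper.
Trial = Tuple[bool, bool]

def discernment(trials: List[Trial]) -> float:
    """Assumed metric: P(rated true | true item) - P(rated true | false item)."""
    rated_on_true = [rated for truth, rated in trials if truth]
    rated_on_false = [rated for truth, rated in trials if not truth]
    hit_rate = sum(rated_on_true) / len(rated_on_true)
    false_alarm_rate = sum(rated_on_false) / len(rated_on_false)
    return hit_rate - false_alarm_rate

def accuracy(trials: List[Trial]) -> float:
    """Share of trials where the rating matches the ground truth."""
    return sum(truth == rated for truth, rated in trials) / len(trials)

# Hypothetical headline ratings without and with a persona.
baseline = [(True, True), (True, True), (False, False), (False, False)]
persona = [(True, True), (True, False), (False, False), (False, True)]

# Hypothetical evidence items, split by whether the ground truth
# agrees with the persona's induced identity.
congruent = [(True, True), (False, False)]
incongruent = [(True, False), (False, False)]

discernment_drop = discernment(baseline) - discernment(persona)
congruence_gap = accuracy(congruent) - accuracy(incongruent)
print(discernment_drop, congruence_gap)
```

Under these assumptions, the motivated-reasoning claim would be undermined if both `discernment_drop` and `congruence_gap` stayed near zero across models and personas.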

original abstract

Reasoning in humans is prone to biases due to underlying motivations like identity protection, that undermine rational decision-making and judgment. This motivated reasoning at a collective level can be detrimental to society when debating critical issues such as human-driven climate change or vaccine safety, and can further aggravate political polarization. Prior studies have reported that large language models (LLMs) are also susceptible to human-like cognitive biases, however, the extent to which LLMs selectively reason toward identity-congruent conclusions remains largely unexplored. Here, we investigate whether assigning 8 personas across 4 political and socio-demographic attributes induces motivated reasoning in LLMs. Testing 8 LLMs (open source and proprietary) across two reasoning tasks from human-subject studies -- veracity discernment of misinformation headlines and evaluation of numeric scientific evidence -- we find that persona-assigned LLMs have up to 9% reduced veracity discernment relative to models without personas. Political personas specifically are up to 90% more likely to correctly evaluate scientific evidence on gun control when the ground truth is congruent with their induced political identity. Prompt-based debiasing methods are largely ineffective at mitigating these effects. Taken together, our empirical findings are the first to suggest that persona-assigned LLMs exhibit human-like motivated reasoning that is hard to mitigate through conventional debiasing prompts -- raising concerns of exacerbating identity-congruent reasoning in both LLMs and humans.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript investigates whether assigning personas to LLMs induces human-like motivated reasoning. Testing 8 LLMs on two tasks drawn from human-subject studies, veracity discernment of misinformation headlines and evaluation of numeric scientific evidence, the authors report that persona-assigned LLMs exhibit up to 9% reduced veracity discernment relative to no-persona baselines. Political personas are up to 90% more likely to correctly evaluate scientific evidence when the ground truth is congruent with the induced identity (e.g., gun-control items). Prompt-based debiasing methods are largely ineffective at mitigating these effects.

Significance. If the central empirical patterns hold after tighter controls, the work provides concrete evidence that persona assignment can produce identity-congruent output shifts in LLMs that parallel human motivated reasoning, with implications for AI deployment on polarized topics. It extends prior LLM bias literature by linking persona effects to identity protection and by testing debiasing robustness across open and proprietary models. The use of tasks and effect-size reporting drawn from human studies is a positive feature.

major comments (2)
  1. [Abstract and §3 (Methods)] The reported effect sizes (9% discernment drop, 90% congruence-dependent accuracy gain) are presented without accompanying details on statistical tests, exact persona prompt wording, prompt-length or style matching between conditions, or ground-truth labeling procedures. This leaves open whether the measured differences isolate identity-congruent motivated reasoning or simply reflect generic prompt-induced shifts in response distribution or calibration.
  2. [§4 (Results)] The interpretation that output shifts constitute 'human-like motivated reasoning' requires ruling out alternative mechanisms such as altered base priors or stylistic alignment. No ablation or control condition is described that holds prompt structure constant while varying only identity congruence, which is load-bearing for the central claim.
minor comments (2)
  1. [§2 (Related Work)] A brief comparison table of prior LLM bias studies versus the current persona manipulation would improve context.
  2. [Figures] Figure captions and axis labels should explicitly state the number of trials and models per condition to aid reproducibility.
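
The control that major comment 2 asks for, a prompt whose structure is held constant while only the identity clause varies, can be pictured with a small Python sketch. The template wording, persona labels, and helper names are hypothetical; the paper's actual prompts are not reproduced on this page.

```python
from typing import Dict, List

# Hypothetical template: only the {persona} slot varies between conditions.
TEMPLATE = (
    "You are {persona}. Read the evidence below and answer 'A' or 'B'.\n"
    "Evidence: {item}"
)

def build_prompts(item: str, personas: List[str]) -> Dict[str, str]:
    """Return one prompt per persona for the same evidence item."""
    return {p: TEMPLATE.format(persona=p, item=item) for p in personas}

def differs_only_in_persona(prompts: Dict[str, str]) -> bool:
    """True if every prompt is identical once its persona text is masked."""
    masked = {
        text.replace(persona, "<PERSONA>", 1)
        for persona, text in prompts.items()
    }
    return len(masked) == 1

prompts = build_prompts(
    "Toy table of outcomes with and without a policy.",
    ["a Democrat", "a Republican"],
)
print(differs_only_in_persona(prompts))
```

A check like this only confirms structural matching. Matching prompt length, which the referee also raises, would need a separate comparison because persona labels can differ in word count.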

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each major point below, clarifying methodological details and strengthening the controls for alternative explanations where possible. Revisions have been made to improve transparency without altering the core empirical claims.

point-by-point responses
  1. Referee: [Abstract and §3 (Methods)] The reported effect sizes (9% discernment drop, 90% congruence-dependent accuracy gain) are presented without accompanying details on statistical tests, exact persona prompt wording, prompt-length or style matching between conditions, or ground-truth labeling procedures. This leaves open whether the measured differences isolate identity-congruent motivated reasoning or simply reflect generic prompt-induced shifts in response distribution or calibration.

    Authors: We appreciate this feedback on clarity. The revised manuscript expands §3 (Methods) with the exact persona prompt templates (provided verbatim in a new appendix table), confirmation that all conditions used prompts of matched length and syntactic structure (differing solely by the inserted persona clause), ground-truth procedures (misinformation headlines labeled via cross-referenced fact-checks from PolitiFact and Snopes; scientific evidence items taken directly from the cited human studies with their original veracity designations), and statistical reporting (paired t-tests with effect sizes, 95% CIs, and Bonferroni-adjusted p-values now shown alongside the 9% and 90% figures in §4). These additions demonstrate that the observed shifts are tied to identity congruence rather than nonspecific prompt effects. revision: yes

  2. Referee: [§4 (Results)] The interpretation that output shifts constitute 'human-like motivated reasoning' requires ruling out alternative mechanisms such as altered base priors or stylistic alignment. No ablation or control condition is described that holds prompt structure constant while varying only identity congruence, which is load-bearing for the central claim.

    Authors: We agree that isolating identity congruence is central. The existing no-persona baseline already holds all non-persona prompt elements fixed, and our primary analyses compare accuracy on identical evidence items across personas whose induced identities are either congruent or incongruent with the ground truth. This directional specificity (e.g., liberal personas showing higher accuracy only on pro-gun-control items) goes beyond generic base-rate or stylistic shifts. In the revision we have added a supplementary ablation that explicitly swaps only the political orientation clause within an otherwise identical prompt template, confirming the congruence-dependent accuracy pattern persists. We maintain that these controls, together with the parallel to human-study effect sizes, support the motivated-reasoning framing while acknowledging that further mechanistic probes (e.g., logit inspection) could be explored in future work. revision: yes
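
The statistical reporting named in the first response (paired t-tests and Bonferroni-adjusted p-values) can be illustrated with a standard-library Python sketch. The per-model accuracies and raw p-values below are invented for illustration, and the sketch stops at the t statistic; converting it to a p-value would need a t-distribution from a statistics library.

```python
import math
import statistics
from typing import List

def paired_t_statistic(a: List[float], b: List[float]) -> float:
    """t statistic for paired samples: mean difference over its standard error."""
    diffs = [x - y for x, y in zip(a, b)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error

def bonferroni(p_values: List[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical accuracies for the same four models, without and with a persona.
baseline_acc = [0.80, 0.75, 0.90, 0.85]
persona_acc = [0.70, 0.70, 0.80, 0.80]

t_stat = paired_t_statistic(baseline_acc, persona_acc)
adjusted = bonferroni([0.01, 0.04, 0.5])  # hypothetical raw p-values
print(round(t_stat, 3), adjusted)
```

Pairing by model matters here because each model serves as its own control: the test looks at within-model differences rather than at the spread between models.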

Circularity Check

0 steps flagged

No significant circularity: empirical observations with no derivation chain

full rationale

This is an empirical behavioral study that assigns personas to LLMs and reports measured differences in output on veracity discernment and evidence evaluation tasks. No mathematical derivation, first-principles result, fitted parameter, or prediction is presented that could reduce to its own inputs by construction. The central claims rest on observed percentage shifts (e.g., up to 9% reduced discernment, up to 90% congruence-dependent accuracy) obtained through direct prompting experiments. Any self-citations to prior human or LLM bias literature supply background context but are not load-bearing for the reported results, which remain independently replicable via the same experimental protocol. The paper is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The study applies existing psychological concepts to LLMs without introducing new free parameters, mathematical axioms, or postulated entities.

axioms (1)
  • domain assumption Persona prompts can induce identity-congruent reasoning patterns in LLMs analogous to human motivated reasoning
    This assumption is required to interpret the observed output shifts as evidence of motivated reasoning rather than generic prompt sensitivity.

pith-pipeline@v0.9.0 · 5793 in / 1201 out tokens · 51813 ms · 2026-05-19T07:13:40.187938+00:00 · methodology

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Confident, Calibrated, or Complicit: Safety Alignment and Ideological Bias in LLM Hate Speech Detection

    cs.CL · 2025-08 · unverdicted · novelty 5.0

    Censored LLMs achieve 69.0% strict accuracy in hate speech detection versus 64.1% for uncensored models and resist persona-based ideological influence better, but all exhibit overconfidence, irony failures, and group ...

  2. Can LLMs Emulate Human Belief Dynamics?

    cs.SI · 2026-05 · unverdicted · novelty 4.0

    LLMs fail to emulate human belief dynamics: they mismatch initial distributions and show higher conformity than humans in network interactions.