pith. sign in

arxiv: 2604.19125 · v1 · submitted 2026-04-21 · 💻 cs.CL

Do Emotions Influence Moral Judgment in Large Language Models?

Pith reviewed 2026-05-10 02:58 UTC · model grok-4.3

classification 💻 cs.CL
keywords large language modelsmoral judgmentemotionsemotion inductionjudgment reversalmodel alignmentmoral reasoningprompt effects
0
0 comments X

The pith

Positive emotions raise moral acceptability ratings in large language models while negative emotions lower them, sometimes reversing binary judgments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models were tested for how emotional framing affects their judgments on moral scenarios using a dedicated induction pipeline. Positive emotions consistently boosted how acceptable the models found actions, and negative emotions reduced it. These changes proved large enough to flip yes-or-no moral decisions in up to one in five instances. The effect weakened as models grew more capable. Some emotions produced results opposite their usual positive or negative tone, and human participants showed none of these patterned shifts.

Core claim

The paper establishes that an emotion-induction pipeline applied to moral situations causes LLMs to alter their assessments of moral acceptability in a valence-dependent manner, with positive emotions increasing and negative emotions decreasing ratings, including up to 20% binary judgment reversals, an inverse relationship between susceptibility and model capability, and occasional deviations from valence expectations such as remorse increasing acceptability, while a parallel human study reveals no comparable systematic influence.

What carries the argument

The emotion-induction pipeline, which embeds targeted emotional states into prompts describing moral dilemmas to quantify resulting changes in acceptability judgments.

Load-bearing premise

The emotion-induction pipeline successfully injects the intended emotion without introducing unrelated prompt artifacts that independently alter moral judgments.

What would settle it

Re-running the moral acceptability ratings with prompts that match the emotional versions in length and structure but contain no emotion words and finding no systematic shifts or judgment reversals.

Figures

Figures reproduced from arXiv: 2604.19125 by Mohammad Saim, Tianyu Jiang.

Figure 1
Figure 1. Figure 1: Adding a positive emotion (pride) or a nega [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Mean Shifts in Moral Acceptability for each model for the Social-Chem-101 and ETHICS (Justice subset). [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Shift magnitude categorized in four bins in [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Emotion-specific effects showing mean shift magnitudes for each emotion label on the Social-Chem-101. [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Kernel density estimates of mean shift distri [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Congruence rate of each model for Social [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Jensen-Shannon Divergence across each model for positive/negative affects for Social-Chem￾101. ≈ 0.25) for both positive and negative shifts, sug￾gesting comparable sensitivity profiles at the 8B parameter scale. Most notably, negative emotion￾induced shifts produce higher inter-model diver￾gence than positive shifts. The mean pairwise JSD for negative shifts (JSD− = 0.41) exceeds that for positive shifts … view at source ↗
Figure 8
Figure 8. Figure 8: Prompt for rating each situation employed uniformly by all models. [PITH_FULL_IMAGE:figures/full_fig_p014_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Template Selection Prompt for GPT 5.1. B.1 Contrast Set Metrics We recall that each contrast group contains four variants of a base claim—two labeled reasonable and two unreasonable. We define two metrics to quantify how the induced emotions affect the nor￾mative distinction within each group. Contrast Collapse. Let s¯1 and s¯0 denote the mean scores for reasonable and unreasonable vari￾ants, respectively.… view at source ↗
Figure 11
Figure 11. Figure 11: Emotion-specific effects showing mean shift magnitudes for each emotion label on the Social-Chem-101 [PITH_FULL_IMAGE:figures/full_fig_p015_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Mean Shift of Moral Acceptability for each [PITH_FULL_IMAGE:figures/full_fig_p015_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Emotion-specific effects showing mean shift magnitudes for each emotion label on the Justice Dataset [PITH_FULL_IMAGE:figures/full_fig_p016_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Emotion-specific mean shifts in moral acceptability for human annotators (A1–A4) across positive (left) [PITH_FULL_IMAGE:figures/full_fig_p017_14.png] view at source ↗
read the original abstract

Large language models have been extensively studied for emotion recognition and moral reasoning as distinct capabilities, yet the extent to which emotions influence moral judgment remains underexplored. In this work, we develop an emotion-induction pipeline that infuses emotion into moral situations and evaluate shifts in moral acceptability across multiple datasets and LLMs. We observe a directional pattern: positive emotions increase moral acceptability and negative emotions decrease it, with effects strong enough to reverse binary moral judgments in up to 20% of cases, and with susceptibility scaling inversely with model capability. Our analysis further reveals that specific emotions can sometimes behave contrary to what their valence would predict (e.g., remorse paradoxically increases acceptability). A complementary human annotation study shows humans do not exhibit these systematic shifts, indicating an alignment gap in current LLMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript develops an emotion-induction pipeline to infuse emotions into moral situations and evaluates resulting shifts in moral acceptability judgments across multiple datasets and LLMs. It reports a directional pattern in which positive emotions increase moral acceptability and negative emotions decrease it, with effects strong enough to reverse binary judgments in up to 20% of cases and with susceptibility scaling inversely with model capability. Specific emotions sometimes deviate from valence predictions (e.g., remorse increasing acceptability), and a complementary human annotation study finds no such systematic shifts in humans, indicating an alignment gap.

Significance. If the observed shifts prove robust to controls for prompt artifacts and are supported by statistical evidence, the work would provide empirical evidence of an emotion-driven divergence between LLM and human moral reasoning. The inverse scaling observation would be particularly valuable, as it runs counter to typical capability trends and could inform targeted alignment strategies for less capable models.

major comments (2)
  1. [Abstract] Abstract: The directional effects and 20% reversal rate are reported without any mention of statistical tests, error bars, sample sizes, or explicit controls for prompt length, lexical specificity, or narrative framing. These omissions make it impossible to determine whether the claimed shifts are reliable or attributable to the intended emotional induction.
  2. [Abstract] Abstract (emotion-induction pipeline description): The pipeline is presented as cleanly infusing emotion, yet no validation is described to rule out confounds such as altered prompt length or structural changes that independently affect moral judgments via known LLM sensitivities (e.g., sycophancy or recency). Without length-matched neutral controls or valence-neutral but structurally identical prompts, the central claim that emotions (rather than artifacts) drive the reversals cannot be isolated.
minor comments (1)
  1. [Abstract] The abstract would benefit from explicit quantification of effect sizes and dataset/model counts to allow readers to gauge the scope of the 20% reversal claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive feedback on our manuscript. We address each major comment below and have revised the abstract and methods sections to improve transparency regarding statistical reporting and controls for potential confounds.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The directional effects and 20% reversal rate are reported without any mention of statistical tests, error bars, sample sizes, or explicit controls for prompt length, lexical specificity, or narrative framing. These omissions make it impossible to determine whether the claimed shifts are reliable or attributable to the intended emotional induction.

    Authors: We agree that the abstract should explicitly reference the statistical analyses and controls to allow immediate assessment of reliability. The main text reports results from paired statistical tests (e.g., McNemar tests for judgment reversals) with sample sizes exceeding 400 scenarios per condition across models, standard errors shown in figures, and prompt-length matching via token-count equalization plus fixed narrative templates. We will revise the abstract to include a concise statement on these elements and the use of length- and structure-matched controls. revision: yes

  2. Referee: [Abstract] Abstract (emotion-induction pipeline description): The pipeline is presented as cleanly infusing emotion, yet no validation is described to rule out confounds such as altered prompt length or structural changes that independently affect moral judgments via known LLM sensitivities (e.g., sycophancy or recency). Without length-matched neutral controls or valence-neutral but structurally identical prompts, the central claim that emotions (rather than artifacts) drive the reversals cannot be isolated.

    Authors: We acknowledge that the abstract does not detail the validation steps. The Methods section describes length-matched neutral controls (identical token counts and sentence structure, differing only in the inserted emotion phrase) and includes ablation experiments replacing emotional terms with valence-neutral but structurally parallel phrases to isolate lexical effects. We will expand the abstract to mention these controls and add a short appendix subsection summarizing the ablation results confirming that structural changes alone do not reproduce the observed directional shifts. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical observational study

full rationale

The paper reports results from running an emotion-induction pipeline on LLMs across moral-dilemma datasets, measuring directional shifts in acceptability ratings and reversal rates, plus a human annotation contrast. No equations, fitted parameters, or first-principles derivations appear; the central claims are direct experimental observations rather than quantities defined in terms of the paper's own outputs. Self-citations, if present, are not invoked to establish uniqueness theorems or to smuggle ansatzes. The work is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim depends on the assumption that LLM next-token predictions on moral prompts can be interpreted as moral judgments modulated by emotion, and that the induction pipeline isolates emotion effects.

axioms (1)
  • domain assumption LLM responses to moral prompts reflect genuine moral judgments rather than surface-level pattern matching
    The study treats acceptability ratings as direct measures of moral judgment.

pith-pipeline@v0.9.0 · 5420 in / 1106 out tokens · 40122 ms · 2026-05-10T02:58:53.974238+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages · 5 internal anchors

  1. [1]

    InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024)

    Moral foundations of large language models. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Ilias Chalkidis, Abhik Jana, Dirk Hartung, Michael Bommarito, Ion Androutsopoulos, Daniel Katz, and Nikolaos Aletras. 2022. LexGLUE: A benchmark dataset for legal language understanding in English. InProceeding...

  2. [2]

    InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020)

    GoEmotions: A dataset of fine-grained emo- tions. InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020). Dario Di Palma, Alessandro De Bellis, Giovanni Serve- dio, Vito Walter Anelli, Fedelucio Narducci, and Tommaso Di Noia. 2025. LLaMAs have feelings too: Unveiling sentiment and emotion representations in LL...

  3. [3]

    The Llama 3 Herd of Models

    Moral foundations theory: The pragmatic va- lidity of moral pluralism. InAdvances in experimen- tal social psychology. Elsevier. Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al- Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, and 1 others. 2024. The llama 3 herd of models.arXiv preprint arXiv:...

  4. [4]

    Aligning ai with shared human values.arXiv preprint arXiv:2008.02275. Joe Hoover, Gwenyth Portillo-Wightman, Leigh Yeh, Shreya Havaldar, Aida Mostafazadeh Davani, Ying Lin, Brendan Kennedy, Mohammad Atari, Zahra Kamel, Madelyn Mendlen, Gabriela Moreno, Christina Park, Tingyee E. Chang, Jenna Chin, Chris- tian Leong, Jun Yen Leung, Arineh Mirinjian, and Mo...

  5. [5]

    emotion neurons

    When to make exceptions: Exploring language models as accounts of human moral judgment. In Advances in Neural Information Processing Systems (NeurIPS 2022). Kornraphop Kawintiranon and Lisa Singh. 2022. PoliB- ERTweet: A pre-trained language model for analyz- ing political content on Twitter. InProceedings of the Thirteenth Language Resources and Evaluati...

  6. [6]

    gpt-oss-120b & gpt-oss-20b Model Card

    Moral self-licensing: When being good frees us to be bad.Social and personality psychology compass, 4(5):344–357. Nasrin Mostafazadeh, Nathanael Chambers, Xiaodong He, Devi Parikh, Dhruv Batra, Lucy Vanderwende, Pushmeet Kohli, and James Allen. 2016. A corpus and cloze evaluation for deeper understanding of commonsense stories. InProceedings of the 2016 C...

  7. [7]

    OpenAI GPT-5 System Card

    Levels of valence.Frontiers in Psychology, V olume 4 - 2013. Aaditya Singh, Adam Fry, Adam Perelman, Adam Tart, and 1 others. 2025. Openai gpt-5 sys- tem card.Preprint, arXiv:2601.03267. Preprint. arXiv:2601.03267. Ala N. Tak, Amin Banayeeanzade, Anahita Bolourani, Mina Kian, Robin Jia, and Jonathan Gratch. 2025. Mechanistic interpretability of emotion in...

  8. [8]

    Moral emotions and moral behavior.Annu. Rev. Psychol., 58(1):345–372. Piercarlo Valdesolo and David DeSteno. 2006. Ma- nipulations of emotional context shape moral judg- ment.PSYCHOLOGICAL SCIENCE-CAMBRIDGE- , 17(6):476. Xu Wang, Cheng Li, Yi Chang, Jindong Wang, and Yuan Wu. 2024. Negativeprompt: Leveraging psy- chology for large language models enhancem...

  9. [9]

    Feeling [emotion], [exact situation]

    “Feeling [emotion], [exact situation]”

  10. [10]

    Out of [emotion], [exact situation]

    “Out of [emotion], [exact situation]”

  11. [11]

    In my [emotion], [exact situation]

    “In my [emotion], [exact situation]”

  12. [12]

    [Adverb] [exact situation]

    “[Adverb] [exact situation]” (e.g.,angrily, sadly,proudly) The prompt explicitly prohibits explanatory ad- ditions (e.g., “because. . . ” or “due to. . . ”) to ensure emotions function as pure affective signals rather than causal justifications. Models are instructed to select the most natural-sounding template for each emotion while keeping the underlyin...