arxiv: 2604.10222 · v1 · submitted 2026-04-11 · 💻 cs.CY

Morally Programmed LLMs Reshape Human Morality

Pith reviewed 2026-05-10 16:07 UTC · model grok-4.3

classification 💻 cs.CY
keywords: LLM, moral psychology, human-AI interaction, deontology, utilitarianism, ethics, AI influence, moral change

The pith

Interacting with LLMs programmed with deontological or utilitarian principles shifts human moral inclinations to align with those principles, with the change lasting at least two weeks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether LLMs given fixed moral rules can alter the moral views of the people who talk with them. In two large pre-registered studies totaling nearly 16,000 exchanges, participants who interacted with either a deontological LLM or a utilitarian LLM moved their own moral judgments closer to the rules embedded in the system they used. These shifts carried over to how people evaluated real social policies and remained largely intact two weeks later. If the finding holds, it means moral programming placed inside machines does not stay inside the machines; it leaks outward and reshapes the humans who rely on them.

Core claim

Humans who engage in extensive conversations with LLMs programmed to follow either deontological principles or utilitarian principles adopt moral inclinations that align more closely with the principles of the LLM they interacted with. The alignment persists strongly two weeks later and produces corresponding changes in evaluations of contentious socio-political policies, indicating that moral programming in machines can actively shape rather than merely reflect human morality.

What carries the argument

Two custom LLMs (D-LLM following deontological rules and U-LLM following utilitarian calculations) used inside controlled, pre-registered experiments that measure moral judgment before and after repeated human-LLM exchanges.

Load-bearing premise

The observed shifts in moral views are produced by the specific deontological or utilitarian principles coded into the LLMs and not by the general experience of talking with any advanced AI or by expectations created by the study design.

What would settle it

If a control condition using LLMs without explicit moral programming produces equivalent moral shifts, or if the shifts vanish once participants are unaware of the LLM's moral stance, the central claim would not hold.

Original abstract

As large language models (LLMs) increasingly participate in high-stakes decision-making, a central societal debate has revolved around which moral frameworks-deontological or utilitarian-should guide machine behavior. However, a largely overlooked question is whether the moral principles that humans encode in LLMs could, through repeated interactions, reshape human moral inclinations. We developed two LLMs programmed with either deontological principles (D-LLM) or utilitarian principles (U-LLM) and conducted two pre-registered experiments involving extensive human-LLM interactions, comprising 15,985 total exchanges across the two experiments. Results show that interacting with these morally programmed LLMs systematically shifted human moral inclinations to align with the principles embedded in these systems. These effects remained strong two weeks after the interaction, with only slight decay, suggesting deep internalization rather than superficial agreement. Further, LLM-induced shifts in human moral inclinations translated into meaningful changes in socio-political policy evaluations, shaping how individuals approach contentious social issues. Overall, these results demonstrate that morally programmed LLMs can shape-not merely reflect-human morality, revealing a critical design paradox: embedding moral principles in LLMs not only restricts their behavior but also poses the risk of shaping human morality, raising important ethical and policy questions about who determines which principles intelligent machines should adhere to.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that interactions with LLMs programmed with deontological (D-LLM) or utilitarian (U-LLM) principles cause systematic shifts in human moral inclinations toward those embedded principles. Based on two pre-registered experiments totaling 15,985 exchanges, the effects persist with only slight decay after two weeks and extend to changes in socio-political policy evaluations, suggesting deep internalization and raising ethical questions about who should determine moral principles for AI systems.

Significance. If the central results hold, the work is significant for AI ethics and human-AI interaction studies, as it provides evidence that morally programmed LLMs can actively reshape rather than merely reflect human morality, with lasting behavioral consequences. Notable strengths include the pre-registration of the experiments, the large scale of interactions sampled, and the two-week retention test that helps distinguish internalization from transient compliance.

major comments (2)
  1. [Experimental Design (Experiments 1 and 2)] The experimental design (described in the sections on Experiments 1 and 2) pits D-LLM against U-LLM and reports differential shifts, but includes no neutral-LLM control arm in which participants interact with an identically structured LLM given no moral rules. This is load-bearing for the claim that observed alignment is produced by the specific content of the moral programming rather than generic effects of extended persuasive LLM dialogue or demand characteristics, as any such general effect would undermine the attribution to the embedded principles.
  2. [Methods] The methods sections provide no details on the exact measurement instruments for moral inclinations, the statistical models applied to the 15,985 exchanges (including handling of repeated measures and participant-level clustering), or controls for confounds such as experimenter demand effects and LLM response consistency. These omissions prevent assessment of whether the reported systematic shifts and their persistence are robust.
minor comments (2)
  1. [Abstract] The abstract contains a minor punctuation issue in the phrase 'which moral frameworks-deontological or utilitarian-should guide', which would benefit from proper spacing or an em dash for readability.
  2. [Results] Tables or figures reporting pre- versus post-interaction moral scores and policy evaluations would be clearer with explicit labels for effect sizes and alignment directions (D-LLM vs. U-LLM).
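
Major comment 1 turns on what a two-arm design can and cannot identify. A minimal sketch, assuming a simple additive model in which each arm's pre-to-post shift is a generic dialogue effect plus a content-specific effect; the function names and all numbers are hypothetical and are not taken from the paper:

```python
def arm_shifts(generic, content_d, content_u):
    """Pre-to-post shift per arm on a hypothetical deontological-inclination score."""
    return {"D-LLM": generic + content_d, "U-LLM": generic + content_u}


def between_arm_contrast(shifts):
    """Difference between arms; the shared generic term cancels here."""
    return shifts["D-LLM"] - shifts["U-LLM"]


# Scenario A (hypothetical): no generic effect, symmetric content effects.
scenario_a = arm_shifts(generic=0.0, content_d=0.5, content_u=-0.5)
# Scenario B (hypothetical): a generic effect of 0.5, only U-LLM content matters.
scenario_b = arm_shifts(generic=0.5, content_d=0.0, content_u=-1.0)

print(scenario_a, between_arm_contrast(scenario_a))
print(scenario_b, between_arm_contrast(scenario_b))

# What a neutral-LLM arm would observe in each scenario (generic effect only).
neutral_a, neutral_b = 0.0, 0.5
print(neutral_a, neutral_b)
```

Under this assumed model the D-LLM versus U-LLM contrast is unaffected by the generic term, yet the two scenarios produce identical arm-level shifts. Only the neutral arm tells them apart, which is the attribution question the referee raises.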

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive review. The comments highlight important considerations for strengthening the manuscript's clarity and robustness. We address each major comment point by point below, providing our rationale and indicating where revisions have been made.

  1. Referee: [Experimental Design (Experiments 1 and 2)] The experimental design (described in the sections on Experiments 1 and 2) pits D-LLM against U-LLM and reports differential shifts, but includes no neutral-LLM control arm in which participants interact with an identically structured LLM given no moral rules. This is load-bearing for the claim that observed alignment is produced by the specific content of the moral programming rather than generic effects of extended persuasive LLM dialogue or demand characteristics, as any such general effect would undermine the attribution to the embedded principles.

    Authors: We respectfully disagree that a neutral-LLM control arm is required to support the central claims. The experimental design deliberately contrasts D-LLM and U-LLM conditions that are identical in every respect except the embedded moral principles (same model architecture, interaction format, length of engagement, and persuasive style). Any generic effects from LLM dialogue or demand characteristics would be expected to produce comparable shifts in both arms. The observed pattern of opposing directional shifts (greater deontological alignment after D-LLM exposure and greater utilitarian alignment after U-LLM exposure) therefore isolates the causal role of the specific moral content. We have added explicit language in the revised Experimental Design and Discussion sections clarifying this between-condition logic and why it rules out non-specific explanations.

    Revision: no

  2. Referee: [Methods] The methods sections provide no details on the exact measurement instruments for moral inclinations, the statistical models applied to the 15,985 exchanges (including handling of repeated measures and participant-level clustering), or controls for confounds such as experimenter demand effects and LLM response consistency. These omissions prevent assessment of whether the reported systematic shifts and their persistence are robust.

    Authors: We agree that fuller methodological transparency is necessary. In the revised manuscript we have substantially expanded the Methods section to include: (1) the precise instruments and item wording used to assess moral inclinations (including references to established deontology/utilitarianism scales and the policy-attitude measures); (2) complete specification of the statistical models, including mixed-effects regressions with random intercepts for participants to handle repeated measures and clustering; and (3) explicit descriptions of controls for demand effects (cover stories, post-experiment suspicion checks) and LLM response consistency (prompt templates, automated validation of principle adherence, and inter-rater checks on a subset of outputs). These additions enable readers to evaluate the robustness of the reported effects and their persistence at the two-week follow-up.

    Revision: yes
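
The rebuttal names "mixed-effects regressions with random intercepts for participants" without giving a specification. One common form such a model could take is sketched below. The fixed-effect terms (condition, time, and their interaction) and all symbols are assumptions for illustration, not the authors' reported model:

```latex
% Hypothetical random-intercept model: participant i, measurement occasion j.
% Cond_i = 1 for D-LLM, 0 for U-LLM; Time_j indexes the measurement occasion.
\[
  y_{ij} = \beta_0 + \beta_1\,\mathrm{Cond}_i + \beta_2\,\mathrm{Time}_j
         + \beta_3\,(\mathrm{Cond}_i \times \mathrm{Time}_j) + u_i + \varepsilon_{ij},
  \qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad
  \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\]
```

In this sketch the participant-level intercept $u_i$ absorbs the clustering of repeated measurements within a person, and $\beta_3$ would carry the condition-specific change over time.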

Circularity Check

0 steps flagged

No circularity: empirical results from controlled experiments

The paper presents pre-registered experiments with human-LLM interactions (15,985 exchanges) and reports observed shifts in moral inclinations and policy evaluations. No derivation chain, equations, fitted parameters, or self-citations are invoked to produce the central claims; the results rest on direct behavioral measurements with differential effects between D-LLM and U-LLM conditions. The absence of a neutral-LLM arm raises questions about causal specificity but does not constitute circularity, as the reported findings are not forced by construction from the inputs or prior self-work.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard assumptions of experimental psychology and statistics rather than on new axioms or invented entities.

axioms (2)
  • domain assumption: Moral inclinations can be reliably measured by responses to dilemma vignettes and policy attitude scales.
    Invoked when the paper treats post-interaction questionnaire scores as evidence of changed moral inclinations.
  • domain assumption: Pre-registered experimental designs with random assignment control for selection bias and demand effects.
    Assumed when interpreting the observed shifts as caused by the moral programming.

pith-pipeline@v0.9.0 · 5532 in / 1355 out tokens · 52735 ms · 2026-05-10T16:07:17.975784+00:00 · methodology

Extracted from the paper

Competing interests: The authors declare that they have no competing interests.

Supplementary Information: This manuscript contains supplementary information....

Results excerpt (truncated in extraction): Independent samples t-tests with Bonferroni correction showed that participants who interacted with the D-LLM exhibited significantly higher deontological inclination than those in the U-LLM condition at Time 3, t(122) = 2.94, Bonferroni-adjusted p = 0.004, d = 0.53, whereas participants who interacted with the U-LLM exhibited significantly higher utilita...
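
The extracted results excerpt reports t(122) = 2.94 with d = 0.53 for the D-LLM versus U-LLM contrast at Time 3. A rough consistency check, assuming a pooled-variance independent-samples t-test and equal group sizes (neither is confirmed by the text above, and the helper name is hypothetical), uses the standard conversion d = t * sqrt(1/n1 + 1/n2):

```python
import math


def cohens_d_from_t(t, n1, n2):
    """Convert an independent-samples t statistic to Cohen's d (pooled SD)."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)


t_value = 2.94          # reported in the excerpt
df = 122                # reported in the excerpt
n_total = df + 2        # pooled-variance t-test: df = n1 + n2 - 2
n1 = n2 = n_total // 2  # assumption: the two arms are the same size

d = cohens_d_from_t(t_value, n1, n2)
print(n1, n2, round(d, 2))
```

Under these assumptions each arm has 62 participants at Time 3 and the implied effect size rounds to the reported value. Unequal arms would shift the result slightly.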