arxiv: 2604.25814 · v1 · submitted 2026-04-28 · 💻 cs.HC · cs.CY

Lexical Anthropomorphization Influences on Moral Judgments of AI Bad Behavior

Pith reviewed 2026-05-07 15:13 UTC · model grok-4.3

keywords AI · anthropomorphism · moral judgments · AI ethics · human-AI interaction · moral violations · lexical priming

The pith

Humanizing language about AI has little effect on moral judgments of its misbehavior, while the specific type of violation drives those judgments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests, across four experiments with 1,020 participants in total, whether human-like language about AI changes how people judge its bad actions. It finds that such language and design cues barely shift judgments of the AI's moral character, the morality of its behavior, or its responsibility. The kind of moral violation observed, especially harm and degradation, turns out to be the main driver of negative views. This suggests that language choices matter less than the substance of what the AI actually does wrong.

Core claim

Across four experiments (total N=1,020), lexical anthropomorphism primes and humanizing design cues showed little influence on moral judgments of misbehaving AI. Where effects emerged, high-anthropomorphic primes elevated perceptions of an AI's capacity for dishonesty. The type of moral violation observed was the strongest predictor of moral judgments, with harm and degradation violations producing the broadest negative character assessments. Prime drift, horn effects, and egoistic value orientations emerged as potentially important predictors of AI moral judgments.

What carries the argument

Lexical anthropomorphism primes (humanizing language) tested against moral violation types, with design cues like icons and names as additional factors in scenario-based judgments.

If this is right

  • The category of misbehavior matters more for public blame than how the AI is described.
  • Harm and degradation violations lead to stronger negative views of the AI than other violation types.
  • Media coverage of AI errors may not need to steer clear of human-like terms to keep from biasing moral reactions.
  • Individual traits like egoistic values could shape how language primes affect AI judgments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If true, AI developers might prioritize avoiding specific harms over worrying about descriptive language.
  • This setup could apply to judgments of other non-human actors, such as animals or automated systems in different domains.
  • Real-world field tests with deployed AI could check whether scenario results hold when stakes feel personal.

Load-bearing premise

That the language and design changes actually altered how human-like participants perceived the AI to be, and that responses to hypothetical scenarios match how people would judge real AI misbehavior.

What would settle it

A study in which people interact directly with an actual AI system that commits different moral violations, with language either humanized or not, and then rate the AI's character without scenario prompts.

Figures

Figures reproduced from arXiv: 2604.25814 by Jaime Banks, Nicholas David Bowman, Roman Saladino.

Figure 1: Stimulus Prime, High-Anthropomorphic Language Condition
  3.1.2 Stimulus Design Features. Each study focused on a different design feature with human-signaling and machine-signaling variations, along with a control (i.e., feature absent) in each set. Study 1 manipulated the AI’s visual icon (human-like brain, machine-like chip, none). Study 2 manipulated the AI’s name (human-like “JACKIE,” machine-like “J4-K1…
Figure 2: Experimental Manipulation of Design Features
Figure 3: Main Effect of Anthropomorphic Priming on
Figure 4: Prime Drift, From Manipulated Prime to Self
Figure 5: Prime Drift, From Manipulated Prime to Self
Figure 6: Effects of Prime Drift on Judgments of AI Behavior
Original abstract

Anthropomorphic language describing artificial intelligence (AI) is widespread in media, policy, and everyday discourse; so too are discussions of AI bad behavior, from hallucinations to inappropriate comments. How does humanizing language about AI shape moral judgments when AI behaves badly? Across four experiments (total N = 1,020), we tested whether lexical anthropomorphism (LA) primes shape judgments of AI moral character, behavior morality, and behavioral responsibility. Studies 1-3 tested interactions between anthropomorphic language and humanizing design cues (icons, names, self-referencing) in the context of amoral errors. Study 4 extended this to genuinely immoral AI behavior across seven moral-violation types. Results indicate humanizing language and design cues have little influence on moral judgments of misbehaving AI. Where effects emerged, high-anthropomorphic primes elevated perceptions of an AI's capacity for dishonesty. The type of moral violation observed was the strongest predictor of moral judgments, with harm and degradation violations producing the broadest negative character assessments. Prime drift, horn effects, and egoistic value orientations emerged as potentially important predictors of AI moral judgments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper reports four experiments (total N=1,020) testing whether lexical anthropomorphism (LA) primes shape moral judgments of AI, in interaction with humanizing design cues (icons, names, self-referencing) for amoral errors (Studies 1-3) and across seven moral-violation types for immoral behavior (Study 4). It claims that these humanizing elements have little influence on judgments of AI moral character, behavior morality, and responsibility; that the type of moral violation is the strongest predictor, with harm and degradation violations yielding the broadest negative assessments; that high-anthropomorphic primes sometimes elevated perceptions of the AI's capacity for dishonesty; and that prime drift, horn effects, and egoistic values predict judgments.

Significance. If the LA manipulations are shown to have reliably shifted perceived anthropomorphism, the results indicate that moral judgments of AI misbehavior are driven primarily by violation type rather than framing, with implications for AI ethics discourse, media portrayals, and design guidelines. The multi-study design, large sample, and exploration of multiple violation types provide a useful empirical contribution to HCI and moral psychology of technology. Explicit credit is due for testing both amoral and immoral behaviors and for surfacing prime drift as an individual-difference factor.

major comments (2)
  1. [Methods and Results sections] The central claim that LA primes and design cues have 'little influence' on moral judgments is load-bearing on the assumption that the manipulations successfully increased perceived anthropomorphism. No manipulation checks, condition means, or statistical tests for perceived humanness/anthropomorphism ratings are described, leaving open the possibility that null effects reflect failed inductions rather than true insensitivity of moral judgments. This must be addressed before the null results can support the headline conclusion.
  2. [Study 4 results and General Discussion] The finding that moral violation type is the strongest predictor is interesting, but without reported effect sizes, power analyses, or full regression tables (including interactions with LA condition), it is difficult to evaluate how much variance is explained by violation type versus the anthropomorphism factors or their interactions.
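The manipulation check requested in major comment 1 could take many forms. As a minimal sketch, assuming a two-condition comparison (high versus low LA prime) on a single perceived-humanness rating, Welch's unequal-variance t-test might be computed as below. The function name `welch_t` and the ratings are hypothetical illustrations and are not taken from the paper.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    se2 = va / na + vb / nb            # squared standard error of the mean difference
    t = (mean(a) - mean(b)) / sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical 1-7 perceived-humanness ratings; NOT data from the paper.
high_prime = [5, 6, 4, 5, 6, 5, 7, 4]
low_prime = [3, 4, 2, 4, 3, 5, 3, 4]

t, df = welch_t(high_prime, low_prime)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A clearly positive t here would indicate that the high-LA prime raised perceived humanness, which is the precondition the referee asks the authors to demonstrate before interpreting null effects on moral judgments.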
minor comments (3)
  1. [Abstract and Methods] Add participant demographics, exclusion criteria, and a brief description of the statistical models (e.g., ANOVA or regression specifications) used for each key analysis.
  2. [Results sections] Report effect sizes (e.g., partial η² or Cohen’s d) and confidence intervals alongside all p-values for both significant and null effects to allow readers to assess practical significance.
  3. [General Discussion] Clarify the operational definition of 'prime drift' and how it was measured, as it appears central to interpreting variability in the LA manipulation.
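The effect sizes named in minor comment 2 are simple to compute from raw scores or from an ANOVA table. The following is a rough sketch, assuming two independent groups for Cohen's d and known sums of squares for partial η². The confidence interval uses a common large-sample approximation to the standard error of d, and all function names and numbers are hypothetical, not values from the paper.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def d_confidence_interval(d, na, nb, z=1.96):
    """Approximate 95% CI for d (large-sample standard error; an approximation)."""
    se = sqrt((na + nb) / (na * nb) + d ** 2 / (2 * (na + nb)))
    return d - z * se, d + z * se


def partial_eta_squared(ss_effect, ss_error):
    """Partial eta-squared: SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)


# Hypothetical ratings and sums of squares; NOT data from the paper.
group_a = [5, 6, 4, 5, 6, 5, 7, 4]
group_b = [3, 4, 2, 4, 3, 5, 3, 4]

d = cohens_d(group_a, group_b)
ci_low, ci_high = d_confidence_interval(d, len(group_a), len(group_b))
pes = partial_eta_squared(ss_effect=12.0, ss_error=88.0)
print(f"d = {d:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}], partial eta^2 = {pes:.2f}")
```

Reporting the interval alongside d matters most for the null effects: a narrow interval around zero supports "little influence," whereas a wide one would leave the question open.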

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive feedback. We address each major comment below, indicating the revisions we will make to strengthen the manuscript's transparency and interpretability.

  1. Referee: [Methods and Results sections] The central claim that LA primes and design cues have 'little influence' on moral judgments is load-bearing on the assumption that the manipulations successfully increased perceived anthropomorphism. No manipulation checks, condition means, or statistical tests for perceived humanness/anthropomorphism ratings are described, leaving open the possibility that null effects reflect failed inductions rather than true insensitivity of moral judgments. This must be addressed before the null results can support the headline conclusion.

    Authors: We agree that explicit manipulation checks are necessary to confirm the effectiveness of the lexical anthropomorphism primes. The original manuscript omitted these checks, which is a limitation. In the revision we will add a new subsection in the Methods and Results reporting condition means, standard deviations, and inferential tests (ANOVAs or t-tests) for perceived anthropomorphism/humanness ratings across all studies. These analyses will be presented before the primary moral-judgment results so readers can directly evaluate induction success. revision: yes

  2. Referee: [Study 4 results and General Discussion] The finding that moral violation type is the strongest predictor is interesting, but without reported effect sizes, power analyses, or full regression tables (including interactions with LA condition), it is difficult to evaluate how much variance is explained by violation type versus the anthropomorphism factors or their interactions.

    Authors: We concur that effect sizes, power information, and complete regression output are required for a full evaluation of the relative predictive strength of violation type versus anthropomorphism factors. In the revised manuscript we will expand the Study 4 results section to include partial eta-squared values for all ANOVA effects, R-squared and adjusted R-squared for the regression models, post-hoc power calculations based on the observed sample size, and the full regression tables that display all main effects, two-way and higher-order interactions with LA condition, and any covariates. These additions will be referenced in the General Discussion when interpreting the dominance of violation type. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical study with data-driven results

This paper reports four experiments (total N=1,020) measuring participant moral judgments in response to AI vignettes with lexical and design manipulations. No equations, derivations, or first-principles claims exist; all outcomes are statistical results from self-reported data. No self-citation chains, fitted parameters renamed as predictions, or self-definitional constructs appear in the reported methods or findings. The central claims rest on observed condition differences and regression predictors rather than any reduction to the paper's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The research depends on established assumptions from social psychology regarding experimental manipulation and measurement validity.

axioms (2)
  • domain assumption: Moral judgments can be accurately captured via Likert-scale self-reports in experimental settings.
    Core to all psychological studies of this type.
  • ad hoc to paper: Lexical primes can be used to manipulate perceived anthropomorphism independently of other cues.
    Tested in Studies 1-3.

pith-pipeline@v0.9.0 · 5497 in / 1396 out tokens · 84599 ms · 2026-05-07T15:13:12.015356+00:00 · methodology
