Lexical Anthropomorphization Influences on Moral Judgments of AI Bad Behavior
Pith reviewed 2026-05-07 15:13 UTC · model grok-4.3
The pith
Humanizing language about AI has little effect on moral judgments of its misbehavior, while the specific type of violation drives those judgments.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Across four experiments (total N=1,020), lexical anthropomorphism primes and humanizing design cues showed little influence on moral judgments of misbehaving AI. Where effects emerged, high-anthropomorphic primes elevated perceptions of an AI's capacity for dishonesty. The type of moral violation observed was the strongest predictor of moral judgments, with harm and degradation violations producing the broadest negative character assessments. Prime drift, horn effects, and egoistic value orientations emerged as potentially important predictors of AI moral judgments.
What carries the argument
Lexical anthropomorphism primes (humanizing language) tested against moral violation types, with design cues like icons and names as additional factors in scenario-based judgments.
If this is right
- The category of misbehavior matters more for public blame than how the AI is described.
- Harm and degradation violations lead to stronger negative views of the AI than other violation types.
- Media coverage of AI errors may not need to steer clear of human-like terms to keep from biasing moral reactions.
- Individual traits like egoistic values could shape how language primes affect AI judgments.
Where Pith is reading between the lines
- If true, AI developers might prioritize avoiding specific harms over worrying about descriptive language.
- This setup could apply to judgments of other non-human actors, such as animals or automated systems in different domains.
- Real-world field tests with deployed AI could check whether scenario results hold when stakes feel personal.
Load-bearing premise
That the language and design changes actually altered how human-like the AI seemed to participants, and that answers to hypothetical scenarios match how people would judge real AI misbehavior.
What would settle it
A study in which people interact directly with an actual AI system that commits different moral violations, with language either humanized or not, and then rate the AI's character without scenario prompts.
Original abstract
Anthropomorphic language describing artificial intelligence (AI) is widespread in media, policy, and everyday discourse; so too are discussions of AI bad behavior, from hallucinations to inappropriate comments. How does humanizing language about AI shape moral judgments when AI behaves badly? Across four experiments (total N = 1,020), we tested whether lexical anthropomorphism (LA) primes shape judgments of AI moral character, behavior morality, and behavioral responsibility. Studies 1-3 tested interactions between anthropomorphic language and humanizing design cues (icons, names, self-referencing) in the context of amoral errors. Study 4 extended this to genuinely immoral AI behavior across seven moral-violation types. Results indicate humanizing language and design cues have little influence on moral judgments of misbehaving AI. Where effects emerged, high-anthropomorphic primes elevated perceptions of an AI's capacity for dishonesty. The type of moral violation observed was the strongest predictor of moral judgments, with harm and degradation violations producing the broadest negative character assessments. Prime drift, horn effects, and egoistic value orientations emerged as potentially important predictors of AI moral judgments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports four experiments (total N=1,020) testing whether lexical anthropomorphism (LA) primes, together with humanizing design cues (icons, names, and self-referencing), shape moral judgments of AI amoral errors (Studies 1-3) and of immoral behavior across seven violation types (Study 4). It claims that these humanizing elements have little influence on judgments of AI moral character, behavior morality, and responsibility; that the type of moral violation is the strongest predictor, with harm and degradation violations yielding the broadest negative assessments; that high-anthropomorphic primes sometimes elevated dishonesty-capacity perceptions; and that prime drift, horn effects, and egoistic values emerged as potentially important predictors of judgments.
Significance. If the LA manipulations are shown to have reliably shifted perceived anthropomorphism, the results indicate that moral judgments of AI misbehavior are driven primarily by violation type rather than framing, with implications for AI ethics discourse, media portrayals, and design guidelines. The multi-study design, large sample, and exploration of multiple violation types provide a useful empirical contribution to HCI and moral psychology of technology. Explicit credit is due for testing both amoral and immoral behaviors and for surfacing prime drift as an individual-difference factor.
major comments (2)
- [Methods and Results sections] The central claim that LA primes and design cues have 'little influence' on moral judgments rests on the assumption that the manipulations successfully increased perceived anthropomorphism. No manipulation checks, condition means, or statistical tests for perceived humanness/anthropomorphism ratings are described, leaving open the possibility that null effects reflect failed inductions rather than true insensitivity of moral judgments. This must be addressed before the null results can support the headline conclusion.
- [Study 4 results and General Discussion] The finding that moral violation type is the strongest predictor is interesting, but without reported effect sizes, power analyses, or full regression tables (including interactions with LA condition), it is difficult to evaluate how much variance is explained by violation type versus the anthropomorphism factors or their interactions.
minor comments (3)
- [Abstract and Methods] Add participant demographics, exclusion criteria, and a brief description of the statistical models (e.g., ANOVA or regression specifications) used for each key analysis.
- [Results sections] Report effect sizes (e.g., partial η² or Cohen’s d) and confidence intervals alongside all p-values for both significant and null effects to allow readers to assess practical significance.
- [General Discussion] Clarify the operational definition of 'prime drift' and how it was measured, as it appears central to interpreting variability in the LA manipulation.
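As an illustration of the effect-size reporting requested in the minor comments, the following is a minimal sketch of Cohen's d for two independent conditions, with an approximate 95% confidence interval based on the common large-sample standard error. The function names, the normal-approximation interval, and the ratings are assumptions made for illustration; none of them come from the paper.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def cohens_d_ci(d, n_a, n_b, z=1.96):
    """Approximate 95% CI for d from its large-sample standard error."""
    se = math.sqrt((n_a + n_b) / (n_a * n_b) + d ** 2 / (2 * (n_a + n_b)))
    return d - z * se, d + z * se


# Made-up Likert ratings for illustration only; NOT data from the paper.
high_prime = [5, 6, 4, 5, 6, 5]
low_prime = [4, 5, 4, 4, 5, 4]

d = cohens_d(high_prime, low_prime)
ci_low, ci_high = cohens_d_ci(d, len(high_prime), len(low_prime))
print(f"d = {d:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

Reporting the interval alongside d is what lets a reader judge whether a null effect is a tight estimate near zero or simply an imprecise one.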
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive feedback. We address each major comment below, indicating the revisions we will make to strengthen the manuscript's transparency and interpretability.
- Referee: [Methods and Results sections] The central claim that LA primes and design cues have 'little influence' on moral judgments rests on the assumption that the manipulations successfully increased perceived anthropomorphism. No manipulation checks, condition means, or statistical tests for perceived humanness/anthropomorphism ratings are described, leaving open the possibility that null effects reflect failed inductions rather than true insensitivity of moral judgments. This must be addressed before the null results can support the headline conclusion.
Authors: We agree that explicit manipulation checks are necessary to confirm the effectiveness of the lexical anthropomorphism primes. The original manuscript omitted these checks, which is a limitation. In the revision we will add a new subsection in the Methods and Results reporting condition means, standard deviations, and inferential tests (ANOVAs or t-tests) for perceived anthropomorphism/humanness ratings across all studies. These analyses will be presented before the primary moral-judgment results so readers can directly evaluate induction success.
Revision: yes
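A manipulation check of the kind discussed here could be sketched as a two-condition comparison of perceived-humanness ratings. The example below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. The choice of Welch's test, the function name, and the ratings are assumptions for illustration; the rebuttal only says "ANOVAs or t-tests". Obtaining a p-value would additionally require a t-distribution CDF, which the standard library does not provide.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error of each group mean.
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical perceived-humanness ratings; NOT data from the paper.
high_la = [6, 5, 7, 6, 5, 7, 6]
low_la = [4, 5, 3, 4, 5, 4, 3]

t_stat, dof = welch_t(high_la, low_la)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

A clearly positive t for the high-LA condition would indicate that the prime shifted perceived humanness, which is the precondition for reading the null moral-judgment effects as informative.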
- Referee: [Study 4 results and General Discussion] The finding that moral violation type is the strongest predictor is interesting, but without reported effect sizes, power analyses, or full regression tables (including interactions with LA condition), it is difficult to evaluate how much variance is explained by violation type versus the anthropomorphism factors or their interactions.
Authors: We concur that effect sizes, power information, and complete regression output are required for a full evaluation of the relative predictive strength of violation type versus anthropomorphism factors. In the revised manuscript we will expand the Study 4 results section to include partial eta-squared values for all ANOVA effects, R-squared and adjusted R-squared for the regression models, post-hoc power calculations based on the observed sample size, and the full regression tables that display all main effects, two-way and higher-order interactions with LA condition, and any covariates. These additions will be referenced in the General Discussion when interpreting the dominance of violation type.
Revision: yes
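The variance-explained quantities named in this response can be sketched as follows. The example computes eta-squared for a one-way design (where it coincides with partial eta-squared, because the total sum of squares splits into only effect and error) and the standard adjusted R-squared formula. The function names, the three-group layout, and all numbers are hypothetical; a factorial design like Study 4 would need the effect and error sums of squares from the full model instead.

```python
from statistics import mean


def eta_squared(groups):
    """Proportion of total variance explained by group membership (one-way design)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return ss_between / (ss_between + ss_within)


def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared, penalizing R-squared for the number of predictors."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical character ratings for three violation types; NOT data from the paper.
ratings_by_violation = [
    [2, 3, 2, 3],
    [4, 5, 4, 5],
    [3, 3, 4, 4],
]

eta2 = eta_squared(ratings_by_violation)
adj_r2 = adjusted_r_squared(r_squared=0.5, n_obs=101, n_predictors=4)
print(f"eta-squared = {eta2:.3f}, adjusted R-squared = {adj_r2:.3f}")
```

Placing these values for violation type next to those for the LA factors is what would let a reader compare their relative predictive strength directly.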
Circularity Check
No circularity: empirical study with data-driven results
This paper reports four experiments (total N=1,020) measuring participant moral judgments in response to AI vignettes with lexical and design manipulations. It contains no equations, derivations, or first-principles claims; all outcomes are statistical results from self-reported data. No self-citation chains, fitted parameters renamed as predictions, or self-definitional constructs appear in the reported methods or findings. The central claims rest on observed condition differences and regression predictors rather than on any reduction to the paper's own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Moral judgments can be accurately captured via Likert-scale self-reports in experimental settings.
- ad hoc to paper: Lexical primes can be used to manipulate perceived anthropomorphism independently of other cues.