LLMs refuse jokes from privileged speakers up to 67.5% more often, judge them malicious 64.7% more, and rate them up to 1.5 points higher in social harm.
Antonios Kalloniatis and Panagiotis Adamidis
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CL 2
verdicts
UNVERDICTED 2
representative citing papers
Across 43,200 simulations with five LLMs and five scenarios, model trust in humans aligns with human-like patterns driven by trustworthiness dimensions and is sometimes biased by age, gender, and religion.
citing papers explorer
- Investigating Counterfactual Unfairness in LLMs towards Identities through Humor
LLMs refuse jokes from privileged speakers up to 67.5% more often, judge them malicious 64.7% more, and rate them up to 1.5 points higher in social harm.
- A closer look at how large language models trust humans: patterns and biases
Across 43,200 simulations with five LLMs and five scenarios, model trust in humans aligns with human-like patterns driven by trustworthiness dimensions and is sometimes biased by age, gender, and religion.