LLMs refuse jokes from privileged speakers up to 67.5% more often, judge them malicious 64.7% more often, and rate them up to 1.5 points higher on social harm.
DeepSeek also shows high acceptance (4.4) with balanced sensitivity, while Claude is notably more restrictive (3.0).
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Investigating Counterfactual Unfairness in LLMs towards Identities through Humor
LLMs refuse jokes from privileged speakers up to 67.5% more often, judge them malicious 64.7% more often, and rate them up to 1.5 points higher on social harm.