LLMs refuse jokes from privileged speakers up to 67.5% more often, judge them malicious 64.7% more often, and rate them up to 1.5 points higher in social harm.
Title resolution pending
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Investigating Counterfactual Unfairness in LLMs towards Identities through Humor