Automated Hate Speech Detection and the Problem of Offensive Language

2017 · cs.CL · arXiv 1703.04009

1 Pith paper cites this work.

abstract

A key challenge for automatic hate-speech detection on social media is the separation of hate speech from other instances of offensive language. Lexical detection methods tend to have low precision because they classify all messages containing particular terms as hate speech, and previous work using supervised learning has failed to distinguish between the two categories. We use a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords. We then use crowd-sourcing to label a sample of these tweets into three categories: those containing hate speech, those containing only offensive language, and those with neither. We train a multi-class classifier to distinguish between these categories. Close analysis of the predictions and the errors shows when we can reliably separate hate speech from other offensive language and when this differentiation is more difficult. We find that racist and homophobic tweets are more likely to be classified as hate speech, but that sexist tweets are generally classified as offensive. Tweets without explicit hate keywords are also more difficult to classify.
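
The abstract's point about lexical methods can be illustrated with a minimal sketch. It assumes a hypothetical placeholder lexicon (`term_a`, `term_b`) and invented toy messages rather than the paper's data, and the helper names `lexical_flag` and `precision` are illustrative, not taken from the paper. A keyword flagger treats every match as hate speech, so messages labelled only offensive become false positives and precision drops.

```python
# Three-way labels mirroring the abstract's categories.
HATE, OFFENSIVE, NEITHER = "hate", "offensive", "neither"

# Hypothetical placeholder lexicon; a real crowd-sourced lexicon holds actual terms.
LEXICON = frozenset({"term_a", "term_b"})


def lexical_flag(text: str, lexicon: frozenset[str] = LEXICON) -> bool:
    """Flag a message as hate speech if any lowercased token is in the lexicon."""
    tokens = text.lower().split()
    return any(tok in lexicon for tok in tokens)


def precision(examples: list[tuple[str, str]]) -> float:
    """Precision of the lexical flagger against gold three-way labels.

    Only messages labelled HATE count as true positives; messages that are
    merely OFFENSIVE but contain a lexicon term are false positives.
    """
    flagged = [label for text, label in examples if lexical_flag(text)]
    if not flagged:
        return 0.0
    true_pos = sum(1 for label in flagged if label == HATE)
    return true_pos / len(flagged)


# Invented toy examples (placeholders, not data from the paper).
toy = [
    ("you are a term_a and should leave", HATE),
    ("lol my term_b friends are wild", OFFENSIVE),
    ("term_a this traffic is awful", OFFENSIVE),
    ("nice weather today", NEITHER),
]

print(precision(toy))  # prints 0.3333333333333333
```

Here three messages match the lexicon but only one is labelled hate speech, so precision is 1/3. Keeping "offensive" as its own label, as the paper does, is what makes these false positives visible; a binary hate/not-hate scheme would not separate them from the "neither" class.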

representative citing papers

Epistemic Injustice in Language Models: An Audit of Pretraining Filters and Guardrails

cs.CL · 2026-06-04 · unverdicted · novelty 5.0

An audit finds language model filters and guardrails disproportionately suppress mentions of marginalized groups via lexical cues while failing to catch explicit harms.