Dealing with Annotator Disagreement in Hate Speech Classification
read the original abstract
Hate speech detection is a crucial task, especially on social media where harmful content can spread quickly. Collecting social media content (tweets etc.) to train machine learning models is easy, but detecting and categorizing hate speech can be difficult due to the inherently subjective nature. This subjectivity leads to frequent disagreement among annotators, particularly for subtle or borderline content. Traditional approaches either discard non-consensus samples or force a ''gold standard'' through expert adjudication, ignoring valuable information about uncertainty and diverse human perspectives. We examine the largely overlooked problem of annotator disagreement in hate speech classification and evaluate a range of aggregation methods, including majority voting, ordinal strategies (minimum, maximum, and mean), and analyze their impact across binary, 4-class, and 6-class classification tasks. In addition, we leverage annotators' perceived hate speech strength scores to explore regression-based and hybrid modeling approaches. Among others, we show that filtering non-consensus samples results in over-optimistic results and that the perceived strength provides a complementary signal that enhance classification performance. Finally, we establish new state-of-the-art results for hate speech detection in Turkish tweets, and demonstrate that annotator disagreement, when properly modeled, is a valuable resource for building more robust and reliable systems.
This paper has not been read by Pith yet.
Forward citations
Cited by 2 Pith papers
-
Big AI's Regulatory Capture: Mapping Industry Interference and Government Complicity
Big AI captures regulation through 27 mechanisms in five categories, most commonly via discourse influence and law elusion, often justified by narratives that regulation stifles innovation or serves national interest.
-
Understanding Annotator Safety Policy with Interpretability
Annotator Policy Models learn safety policies from labeling behavior alone, accurately predicting responses and revealing sources of disagreement like policy ambiguity and value pluralism.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.