Measuring a hate speech spectrum with faceted Rasch item response theory and perspective-aware, explainable-by-design deep learning

Alexander Sahn; Chris J. Kennedy; Claudia von Vacano; Geoff Bacon

arxiv: 2009.10277 · v2 · pith:BDZSHWSWnew · submitted 2020-09-22 · 💻 cs.CL · cs.LG · cs.SI

Chris J. Kennedy, Geoff Bacon, Alexander Sahn, Claudia von Vacano

classification 💻 cs.CL · cs.LG · cs.SI

keywords speech · continuous · deep · hate · learning · annotator · design-based · explainability

We propose a system for measuring hate speech on a continuous, interval-valued spectrum ranging from genocidal to supportive speech by combining supervised deep learning with faceted Rasch item response theory (IRT). We decompose the theoretical construct of hate speech into constituent concepts operationalized as 10 ordinal labels. Those labels are reconstituted via IRT probabilistic latent modeling into an interval outcome measure while simultaneously estimating and adjusting for each annotator's labeling perspective. Our scaling procedure integrates naturally with a multitask deep learning architecture for automated prediction, allowing design-based explainability of the continuous score through those components. We apply this method to a new, open source dataset of 50,070 social media comments sourced from YouTube, Twitter, and Reddit, annotated and labeled by 11,143 United States-based Amazon Mechanical Turk workers. Our RoBERTa-based model shows improved accuracy compared to alternative approaches. This system offers a new paradigm for supervised NLP that encourages continuous rather than binary constructs, and design-based incorporation of annotator perspective and model explainability.
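
The abstract does not state the scaling equation, so the following is only a minimal sketch of how a faceted Rasch model can adjust an ordinal label for annotator perspective. It assumes a generic many-facet rating scale form, in which the log-odds of moving from one label category to the next equal the comment's location on the spectrum minus the item difficulty, the annotator severity, and a step threshold. The function names, parameter names, and example values are hypothetical and are not taken from the paper.

```python
import math
from itertools import accumulate


def category_probs(theta, item_difficulty, annotator_severity, thresholds):
    """Category probabilities under an assumed many-facet rating scale model.

    theta: comment location on the latent spectrum (logits).
    item_difficulty: difficulty of the ordinal survey item (logits).
    annotator_severity: severity of the annotator (logits).
    thresholds: step thresholds; K thresholds give K + 1 ordered categories.
    """
    # Log-odds of each step up from category k-1 to category k.
    steps = [theta - item_difficulty - annotator_severity - t for t in thresholds]
    # Category 0 has logit 0; category k has the sum of the first k steps.
    logits = [0.0] + list(accumulate(steps))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_rating(theta, item_difficulty, annotator_severity, thresholds):
    """Expected ordinal rating (category index weighted by its probability)."""
    probs = category_probs(theta, item_difficulty, annotator_severity, thresholds)
    return sum(k * p for k, p in enumerate(probs))


if __name__ == "__main__":
    # Illustrative values only: one comment, one item, two annotators.
    thresholds = [-1.0, 0.0, 1.0]
    lenient = expected_rating(1.0, 0.0, -0.5, thresholds)
    severe = expected_rating(1.0, 0.0, 0.5, thresholds)
    print(f"lenient annotator: {lenient:.3f}, severe annotator: {severe:.3f}")
```

In this form, the same comment receives a lower expected rating from a more severe annotator, which is the kind of perspective effect the scaling procedure is described as estimating and adjusting for.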

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Optimus: A Robust Defense Framework for Mitigating Toxicity while Fine-Tuning Conversational AI
cs.CR 2025-07 unverdicted novelty 6.0

Optimus mitigates toxicity during LLM fine-tuning by combining repurposed LLM safety alignments for detection with synthetic data and DPO alignment, remaining effective even with highly biased classifiers and against attacks.

Assessing and Mitigating Miscalibration in LLM-Based Social Science Measurement
cs.AI 2026-05 unverdicted novelty 5.0

LLM confidence for social science text measurements is poorly calibrated across models, and a soft-label distillation pipeline reduces expected calibration error by 43% and Brier score by 34%.

Quantifying and Predicting Disagreement in Graded Human Ratings
cs.CL 2026-05 unverdicted novelty 5.0

Annotation disagreement on toxic language can be moderately predicted from textual features, with high-opposition items proving harder for models to estimate accurately.

Modeling Human Perspectives with Socio-Demographic Representations
cs.CL 2026-04 unverdicted novelty 5.0

Socio-Contrastive Learning jointly learns socio-demographic representations and textual features via contrastive objectives to predict annotator perspectives more accurately than concatenation baselines.