A Web of Hate: Tackling Hateful Speech in Online Social Spaces
1 Pith paper cites this work. Polarity classification is still indexing.
abstract
Online social platforms are beset with hateful speech - content that expresses hatred for a person or group of people. Such content can frighten, intimidate, or silence platform users, and some of it can inspire other users to commit violence. Despite widespread recognition of the problems posed by such content, reliable solutions even for detecting hateful speech are lacking. In the present work, we establish why keyword-based methods are insufficient for detection. We then propose an approach to detecting hateful speech that uses content produced by self-identifying hateful communities as training data. Our approach bypasses the expensive annotation process often required to train keyword systems and performs well across several established platforms, making substantial improvements over current state-of-the-art approaches.
fields
cs.SI 1
years
2025 1
verdicts
CONDITIONAL 1
representative citing papers
Context-Aware Detection and Victim-Centered Response Generation for Online Harassment in Private Messaging
A cascading LLM system detects context-dependent harassment in 80k private adolescent Instagram messages and produces victim-centered responses rated significantly more helpful than original replies by human evaluators.