Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter
6 Pith papers cite this work. Polarity classification is still indexing.
citation summary
fields: cs.CL (6)
years: 2026 (6)
verdicts: UNVERDICTED (6)
roles: background (2)
polarities: background (2)
citing papers explorer
- Aligning Implied Statements for Implicit Hate Speech Generalizability with Context-Bounded Semi-hard Negative Mining
  ImpSH improves cross-domain generalization in implicit hate speech classification by aligning posts with implied statements and applying context-bounded semi-hard negative mining within a triplet learning setup.
- Attribute-Based Diagnosis of LLM Alignment with Hate Speech Annotations
  LLMs show split alignment with human hate speech annotations (strong on explicit attributes, inverted on evaluative ones), and attribute-based ridge regression reconstructs continuous scores with R² up to 0.71.
- IYKYK (But AI Doesn't): Automated Content Moderation Does Not Capture Communities' Heterogeneous Attitudes Towards Reclaimed Language
  Automated hate speech detectors show poor alignment with heterogeneous in-group judgments on reclaimed slur usage, driven by low inter-annotator agreement and contextual features like derogatory intent.
- Bye Bye Perspective API: Lessons for Measurement Infrastructure in NLP, CSS and LLM Evaluation
  Closure of the Perspective API exposes structural dependence on a single proprietary toxicity scorer, leaving non-updatable benchmarks and irreproducible results while risking continued reliance on closed LLMs.
- Comparison of Modern Multilingual Text Embedding Techniques for Hate Speech Detection Task
  Supervised models using embeddings like jina and e5 reach up to 92% accuracy on multilingual hate speech detection, substantially outperforming anomaly detection, while PCA to 64 dimensions preserves most performance in the supervised case.
- YEZE at SemEval-2026 Task 9: Detecting Multilingual, Multicultural and Multievent Online Polarization via Heterogeneous Ensembling
  A heterogeneous ensemble of XLM-RoBERTa-large and mDeBERTa-v3-base with independent task modeling and class weighting is reported as effective for multilingual, multicultural, and multievent online polarization detection.