Continued pre-training on web data and LLM-ensemble synthetic labels improve multilingual hate speech detection, with gains up to 11% for small models in low-resource settings.
Answers to Research Questions. RQ1: Value of unlabelled web data. OWS continued pre-training reliably improves BERT-family models, especially in multilingual low-data settings.
1 Pith paper cites this work.
Toward Generalized Cross-Lingual Hateful Language Detection with Web-Scale Data and Ensemble LLM Annotations