Hatebert: Retraining bert for abusive language detection in english

Caselli, Tommaso, Basile, Valerio, Granitzer, Michael · 2021 · DOI 10.18653/v1/2021.woah-1.3

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

open at publisher browse 3 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

LLM Safety From Within: Detecting Harmful Content with Internal Representations

cs.AI · 2026-04-20 · unverdicted · novelty 6.0

SIREN identifies safety neurons via linear probing on internal LLM layers and combines them with adaptive weighting to detect harm, outperforming prior guard models with 250x fewer parameters.

AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models

cs.CL · 2023-04-13 · accept · novelty 6.0

AGIEval shows GPT-4 exceeding average human scores on SAT Math at 95% and Chinese college entrance English at 92.5%, while revealing weaker results on complex reasoning tasks.

Comparison of Modern Multilingual Text Embedding Techniques for Hate Speech Detection Task

cs.CL · 2026-04-16 · unverdicted · novelty 4.0

Supervised models using embeddings like jina and e5 reach up to 92% accuracy on multilingual hate speech detection, substantially outperforming anomaly detection, while PCA to 64 dimensions preserves most performance in the supervised case.

citing papers explorer

Showing 3 of 3 citing papers.

LLM Safety From Within: Detecting Harmful Content with Internal Representations cs.AI · 2026-04-20 · unverdicted · none · ref 69
SIREN identifies safety neurons via linear probing on internal LLM layers and combines them with adaptive weighting to detect harm, outperforming prior guard models with 250x fewer parameters.
AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models cs.CL · 2023-04-13 · accept · none · ref 85
AGIEval shows GPT-4 exceeding average human scores on SAT Math at 95% and Chinese college entrance English at 92.5%, while revealing weaker results on complex reasoning tasks.
Comparison of Modern Multilingual Text Embedding Techniques for Hate Speech Detection Task cs.CL · 2026-04-16 · unverdicted · none · ref 17
Supervised models using embeddings like jina and e5 reach up to 92% accuracy on multilingual hate speech detection, substantially outperforming anomaly detection, while PCA to 64 dimensions preserves most performance in the supervised case.

Hatebert: Retraining bert for abusive language detection in english

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer