Demographic information aids perspective-aware hate speech detection in regimes of low training disagreement and high test disagreement, with a gated residual model proving effective on high-disagreement examples across MHS and POPQUORN datasets.
In Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @ LREC 2022, pages 83–94
2 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
RLHF-aligned language models show increasing resistance to red teaming with scale up to 52B parameters, unlike prompted or rejection-sampled models, supported by a released dataset of 38,961 attacks.
citing papers explorer
- When Does Demographic Information Help? Data and Modeling Regimes for Perspective-Aware Hate Speech Detection