A Dutch BERT model encodes gender linearly by epoch 20 but does not dynamically update its representations when explicit female cues contradict learned stereotypical associations in short sentence templates.
The Risk of Racial Bias in Hate Speech Detection
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CL 5roles
background 1polarities
background 1representative citing papers
LLMs generate adequate counterspeech for co-occurring hate and misinformation in 40% of cases, with a mixed knowledge strategy from fact-checkers and NGOs proving most effective after expert revision.
CoGate-LSTM adds prototype-guided cosine feature-space gating to a character-level BiLSTM with multi-source embeddings and focal loss, reaching 0.881 macro-F1 on Jigsaw toxic comments while using 7.3M parameters and outperforming fine-tuned BERT by 6.9 points on minority labels.
The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job loss and environmental costs.
Automated hate speech detectors show poor alignment with heterogeneous in-group judgments on reclaimed slur usage, driven by low inter-annotator agreement and contextual features like derogatory intent.
citing papers explorer
-
Is She Even Relevant? When BERT Ignores Explicit Gender Cues
A Dutch BERT model encodes gender linearly by epoch 20 but does not dynamically update its representations when explicit female cues contradict learned stereotypical associations in short sentence templates.
-
Assisted Counterspeech Writing at the Crossroads of Hate Speech and Misinformation
LLMs generate adequate counterspeech for co-occurring hate and misinformation in 40% of cases, with a mixed knowledge strategy from fact-checkers and NGOs proving most effective after expert revision.
-
CoGate-LSTM: Prototype-Guided Feature-Space Gating for Mitigating Gradient Dilution in Imbalanced Toxic Comment Classification
CoGate-LSTM adds prototype-guided cosine feature-space gating to a character-level BiLSTM with multi-source embeddings and focal loss, reaching 0.881 macro-F1 on Jigsaw toxic comments while using 7.3M parameters and outperforming fine-tuned BERT by 6.9 points on minority labels.
-
Ethical and social risks of harm from Language Models
The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job loss and environmental costs.
-
IYKYK (But AI Doesn't): Automated Content Moderation Does Not Capture Communities' Heterogeneous Attitudes Towards Reclaimed Language
Automated hate speech detectors show poor alignment with heterogeneous in-group judgments on reclaimed slur usage, driven by low inter-annotator agreement and contextual features like derogatory intent.