LLMs generate adequate counterspeech for co-occurring hate and misinformation in 40% of cases, with a mixed knowledge strategy from fact-checkers and NGOs proving most effective after expert revision.
arXiv preprint arXiv:2102.04130 , year=
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CL 2years
2026 2representative citing papers
An audit finds language model filters and guardrails disproportionately suppress mentions of marginalized groups via lexical cues while failing to catch explicit harms.
citing papers explorer
-
Assisted Counterspeech Writing at the Crossroads of Hate Speech and Misinformation
LLMs generate adequate counterspeech for co-occurring hate and misinformation in 40% of cases, with a mixed knowledge strategy from fact-checkers and NGOs proving most effective after expert revision.
-
Epistemic Injustice in Language Models: An Audit of Pretraining Filters and Guardrails
An audit finds language model filters and guardrails disproportionately suppress mentions of marginalized groups via lexical cues while failing to catch explicit harms.