The risk of racial bias in hate speech detection

Maarten Sap, Dallas Card, Saadia Gabriel, Yejin Choi, Noah A Smith · 2019

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

Escaping the Agreement Trap: Defensibility Signals for Evaluating Rule-Governed AI

cs.AI · 2026-04-22 · unverdicted · novelty 7.0

Introduces Defensibility Index, Ambiguity Index, and Probabilistic Defensibility Signal to evaluate AI moderation decisions by logical derivability from explicit rules rather than agreement with historical labels, with validation on 193k+ Reddit cases showing 33-46.6 pp metric gaps and a Governance

citing papers explorer

Showing 1 of 1 citing paper.

Escaping the Agreement Trap: Defensibility Signals for Evaluating Rule-Governed AI cs.AI · 2026-04-22 · unverdicted · none · ref 19
Introduces Defensibility Index, Ambiguity Index, and Probabilistic Defensibility Signal to evaluate AI moderation decisions by logical derivability from explicit rules rather than agreement with historical labels, with validation on 193k+ Reddit cases showing 33-46.6 pp metric gaps and a Governance

The risk of racial bias in hate speech detection

fields

years

verdicts

representative citing papers

citing papers explorer