Identifying and Measuring Annotator Bias Based on Annotators' Demographic Characteristics

Hala Al Kuwatly, Maximilian Wich, Georg Groh · 2020 · DOI 10.18653/v1/2020.alw-1.21

4 Pith papers cite this work. Polarity classification is still indexing.



citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Understanding Annotator Safety Policy with Interpretability

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

Annotator Policy Models learn safety policies from labeling behavior alone, accurately predicting responses and revealing sources of disagreement like policy ambiguity and value pluralism.

Quantifying and Predicting Disagreement in Graded Human Ratings

cs.CL · 2026-05-01 · unverdicted · novelty 5.0

Annotation disagreement on toxic language can be moderately predicted from textual features, with high-opposition items proving harder for models to estimate accurately.

Modeling Human Perspectives with Socio-Demographic Representations

cs.CL · 2026-04-20 · unverdicted · novelty 5.0

Socio-Contrastive Learning jointly learns socio-demographic representations and textual features via contrastive objectives to predict annotator perspectives more accurately than concatenation baselines.

IYKYK (But AI Doesn't): Automated Content Moderation Does Not Capture Communities' Heterogeneous Attitudes Towards Reclaimed Language

cs.CL · 2026-04-17 · unverdicted · novelty 5.0

Automated hate speech detectors show poor alignment with heterogeneous in-group judgments on reclaimed slur usage, driven by low inter-annotator agreement and contextual features like derogatory intent.

citing papers explorer

Showing 4 of 4 citing papers.

Understanding Annotator Safety Policy with Interpretability · cs.AI · 2026-05-06 · unverdicted · none · ref 47
Quantifying and Predicting Disagreement in Graded Human Ratings · cs.CL · 2026-05-01 · unverdicted · none · ref 83
Modeling Human Perspectives with Socio-Demographic Representations · cs.CL · 2026-04-20 · unverdicted · none · ref 80
IYKYK (But AI Doesn't): Automated Content Moderation Does Not Capture Communities' Heterogeneous Attitudes Towards Reclaimed Language · cs.CL · 2026-04-17 · unverdicted · none · ref 1
