Annotator Policy Models learn safety policies from labeling behavior alone, accurately predicting responses and revealing sources of disagreement like policy ambiguity and value pluralism.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies , year=
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
Annotation disagreement on toxic language can be moderately predicted from textual features, with high-opposition items proving harder for models to estimate accurately.
Socio-Contrastive Learning jointly learns socio-demographic representations and textual features via contrastive objectives to predict annotator perspectives more accurately than concatenation baselines.
citing papers explorer
-
Understanding Annotator Safety Policy with Interpretability
Annotator Policy Models learn safety policies from labeling behavior alone, accurately predicting responses and revealing sources of disagreement like policy ambiguity and value pluralism.
-
Quantifying and Predicting Disagreement in Graded Human Ratings
Annotation disagreement on toxic language can be moderately predicted from textual features, with high-opposition items proving harder for models to estimate accurately.
-
Modeling Human Perspectives with Socio-Demographic Representations
Socio-Contrastive Learning jointly learns socio-demographic representations and textual features via contrastive objectives to predict annotator perspectives more accurately than concatenation baselines.