For comparing two binary classifiers using a budget of noisy labels, collecting one label per sample across more samples outperforms aggregating multiple labels per sample.
Whose ground truth? accounting for individual and collective identities underlying dataset annotation
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 5roles
background 3representative citing papers
Enforcing one transcription convention as ground truth in ASR commits epistemic injustice against speakers with aphasia, as shown by varying WER on AphasiaBank data, leading to the proposal of WER-Range across legitimate conventions.
Agreement-based clustering of annotators improves performance on subjective NLP tasks by capturing diverse perspectives better than majority voting or per-annotator modeling.
Automated hate speech detectors show poor alignment with heterogeneous in-group judgments on reclaimed slur usage, driven by low inter-annotator agreement and contextual features like derogatory intent.
A literature review concludes that pursuing consensus in data annotation creates biased AI by dismissing subjective disagreements and enforcing geographic hegemony, and proposes mapping diversity instead.
citing papers explorer
-
Don't Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget
For comparing two binary classifiers using a budget of noisy labels, collecting one label per sample across more samples outperforms aggregating multiple labels per sample.
-
Beyond Single Ground Truth: Reference Monism as Epistemic Injustice in ASR Evaluation
Enforcing one transcription convention as ground truth in ASR commits epistemic injustice against speakers with aphasia, as shown by varying WER on AphasiaBank data, leading to the proposal of WER-Range across legitimate conventions.
-
Beyond Majority Voting: Agreement-Based Clustering to Model Annotator Perspectives in Subjective NLP Tasks
Agreement-based clustering of annotators improves performance on subjective NLP tasks by capturing diverse perspectives better than majority voting or per-annotator modeling.
-
IYKYK (But AI Doesn't): Automated Content Moderation Does Not Capture Communities' Heterogeneous Attitudes Towards Reclaimed Language
Automated hate speech detectors show poor alignment with heterogeneous in-group judgments on reclaimed slur usage, driven by low inter-annotator agreement and contextual features like derogatory intent.
-
The Consensus Trap: Dissecting Subjectivity and the "Ground Truth" Illusion in Data Annotation
A literature review concludes that pursuing consensus in data annotation creates biased AI by dismissing subjective disagreements and enforcing geographic hegemony, and proposes mapping diversity instead.