Who and What? Using Linguistic Features and Annotator Characteristics to Analyze Annotation Variation
Pith reviewed 2026-05-08 10:17 UTC · model grok-4.3
The pith
Annotation variation in harmful language detection stems primarily from interactions between linguistic cues in the text and annotator attitudes rather than from either factor alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Our analysis of four reference datasets shows that the interplay between linguistic features of the text and annotator characteristics is essential for explaining label variation in harmful language detection. Interactions uncover intersectional effects that single-factor approaches miss, with lexical cues and annotator attitudes emerging as particularly influential. Effect patterns nevertheless differ substantially across the datasets, which limits generalization and transfer.
What carries the argument
Multivariate statistical models that incorporate interaction terms between linguistic features of the items and annotator characteristics such as attitudes and demographics.
Load-bearing premise
The linguistic features and annotator characteristics measured in the study, together with the statistical models applied, are sufficient to capture the main sources of annotation variation without important omitted factors or dataset-specific artifacts.
What would settle it
A follow-up analysis on new harmful language datasets or with additional linguistic and annotator variables that finds statistically insignificant interaction effects or highly consistent patterns across all datasets would undermine the central claim.
Figures
read the original abstract
Human label variation has been established as a central phenomenon in NLP: the perspectives different annotators have on the same item need to be embraced. Data collection practices thus shifted towards increasing the annotator numbers and releasing disaggregated datasets, harmful language being most resourced due to its high subjectivity. While this resulted in rich information about \textit{who} annotated (sociodemographics, attitudes, etc.), the \textit{what} (e.g., linguistic properties of items), and their interplay has received little attention. We present the first large-scale analysis of four reference datasets for harmful language detection, bringing together annotator characteristics, linguistic properties of the items, and their interactions in a statistically informed picture. We find that interactions are crucial, revealing intersectional effects ignored in previous work, and that a strong role is played by lexical cues and annotator attitudes. Effect patterns, however, vary considerably across datasets. This urges caution about generalization and transferability.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents the first large-scale analysis of four reference datasets for harmful language detection. It integrates annotator characteristics (sociodemographics and attitudes), linguistic properties of the items, and their interactions within statistical models to examine sources of annotation variation. The central claims are that interactions are crucial (revealing intersectional effects ignored in prior work), that lexical cues and annotator attitudes play strong roles, and that effect patterns vary considerably across datasets, urging caution about generalization and transferability.
Significance. If the statistical findings prove robust, the work would advance NLP research on subjective annotation tasks by demonstrating that isolated analyses of annotator traits or item features are insufficient and that modeling their interactions is necessary to capture intersectional effects. This could influence data collection practices and model development for harmful language detection by highlighting dataset-specific patterns and the risks of overgeneralization.
major comments (2)
- [Methods] Methods: The manuscript provides no details on the regression specifications used to assess interactions (e.g., logistic vs. linear mixed-effects models, exact terms for annotator-linguistic interactions, inclusion of random effects for annotators/items, or controls for dataset as a factor). Without these, the claim that interactions are 'crucial' cannot be evaluated for robustness against omitted-variable bias or dataset artifacts.
- [Results] Results: The assertion that 'effect patterns vary considerably across datasets' is presented without quantitative support such as tests for coefficient heterogeneity, cross-dataset interaction significance, or formal comparisons of model fits. This weakens the argument that the variation undermines generalization.
minor comments (1)
- [Abstract] Abstract: The phrase 'statistically informed picture' is vague; a one-sentence summary of the modeling approach (e.g., 'via mixed-effects regressions with interaction terms') would improve clarity.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our manuscript. We address each major comment below and have revised the paper to provide the requested details and quantitative support.
read point-by-point responses
-
Referee: [Methods] Methods: The manuscript provides no details on the regression specifications used to assess interactions (e.g., logistic vs. linear mixed-effects models, exact terms for annotator-linguistic interactions, inclusion of random effects for annotators/items, or controls for dataset as a factor). Without these, the claim that interactions are 'crucial' cannot be evaluated for robustness against omitted-variable bias or dataset artifacts.
Authors: We acknowledge that the original manuscript did not include sufficient detail on the regression specifications. In the revised version, we have added a new subsection titled 'Statistical Analysis' under Methods. We employed logistic mixed-effects models (implemented in R using the lme4 package) with the binary annotation label (harmful vs. non-harmful) as the dependent variable. Fixed effects comprised main effects for annotator characteristics (sociodemographics and attitudes), linguistic features (lexical, syntactic, and semantic cues extracted via standard NLP pipelines), and all two-way interaction terms between annotator traits and linguistic features. Random intercepts were specified for both annotators and items to account for repeated measures and individual variability. Dataset was included as a fixed factor, with additional interactions to permit dataset-specific effects. These specifications directly mitigate concerns about omitted-variable bias and enable evaluation of the robustness of the interaction effects. revision: yes
-
Referee: [Results] Results: The assertion that 'effect patterns vary considerably across datasets' is presented without quantitative support such as tests for coefficient heterogeneity, cross-dataset interaction significance, or formal comparisons of model fits. This weakens the argument that the variation undermines generalization.
Authors: We agree that additional quantitative evidence would strengthen this claim. The revised manuscript now includes a combined multi-dataset model with three-way interaction terms (annotator characteristic × linguistic feature × dataset). We report results from likelihood ratio tests comparing nested models with and without the dataset interactions, as well as Wald tests for pairwise differences in key interaction coefficients across datasets. Several interactions (particularly those involving annotator attitudes and lexical cues) show statistically significant heterogeneity (p < 0.05 after correction). We also added a supplementary table with model fit comparisons (AIC/BIC) between pooled and dataset-specific models. These additions provide formal support for the observed variation and the associated caution regarding generalization and transferability. revision: yes
Circularity Check
No circularity: empirical regression analysis on existing datasets
full rationale
The paper conducts statistical analysis (regressions on linguistic features, annotator traits, and interactions) across four pre-existing harmful language datasets. No derivations, predictions, or results reduce to inputs by construction; all claims are observational patterns extracted from fitted models on the data. No self-citations support load-bearing uniqueness theorems, ansatzes, or self-definitions. The work is self-contained empirical analysis without closed theoretical loops.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Linguistic features and annotator characteristics can be treated as measurable, independent inputs to regression or similar models without substantial measurement error.
Reference graph
Works this paper leans on
-
[1]
2020 , url=
Honnibal, Matthew and Montani, Ines and Van Landeghem, Sofie and Boyd, Adriane , doi =. 2020 , url=
2020
-
[2]
2024 , url =
Polars , title =. 2024 , url =
2024
-
[3]
1967 , publisher=
Automated Readability Index , author=. 1967 , publisher=
1967
-
[4]
2014 , publisher=
Brysbaert, Marc and Warriner, Amy Beth and Kuperman, Victor , journal=. 2014 , publisher=
2014
-
[5]
2019 , publisher=
Brysbaert, Marc and Mandera, Pawe. 2019 , publisher=
2019
-
[6]
2012 , publisher=
Kuperman, Victor and Stadthagen-Gonzalez, Hans and Brysbaert, Marc , journal=. 2012 , publisher=
2012
-
[7]
and Binney, Richard J
Diveica, Veronica and Pexman, Penny M. and Binney, Richard J. , journal=. 2023 , publisher=
2023
-
[8]
2024 , publisher=
Winter, Bodo and Lupyan, Gary and Perry, Lynn K and Dingemanse, Mark and Perlman, Marcus , journal=. 2024 , publisher=
2024
-
[9]
2020 , publisher=
Lynott, Dermot and Connell, Louise and Brysbaert, Marc and Brand, James and Carney, James , journal=. 2020 , publisher=
2020
-
[10]
Certain Language Skills in Children: Their Development and Interrelationships
MILDRED C. TEMPLIN , edition =. "Certain Language Skills in Children: Their Development and Interrelationships" , urldate =
-
[11]
Sur quoi se fonde la notion d'etendue theoratique du vocabulaire?
Dugast, Daniel. Sur quoi se fonde la notion d'etendue theoratique du vocabulaire?. Le francais Modern. 1978
1978
-
[12]
1972 , publisher=
Mass, Heinz-Dieter , journal=. 1972 , publisher=
1972
-
[13]
Herbert S. Sichel , title =. Journal of the American Statistical Association , volume =. 1975 , publisher =. doi:10.1080/01621459.1975.10482469 , URL =
-
[14]
1944 , publisher=
The statistical study of literary vocabulary , author=. 1944 , publisher=
1944
-
[15]
, address =
Guiraud, Pierre. , address =. Les caracte\`eres statistiques du vocabulaire : essai de m\'ethodologie , year =. Les caracte\`eres statistiques du vocabulaire : essai de m\'ethodologie , keywords =
-
[16]
Language and Thought , year =
John Bissell Carroll , editor =. Language and Thought , year =
-
[17]
1964 , publisher=
Quantitative Linguistics , author=. 1964 , publisher=
1964
-
[18]
1955 , publisher=
Herdan, Gustav , journal=. 1955 , publisher=
1955
-
[19]
, journal=
Simpson, Edward H. , journal=. 1949 , url=
1949
-
[20]
1997 , publisher=
Quantifying lexical diversity in the study of language development , author=. 1997 , publisher=
1997
-
[21]
Michael A. Covington and Joe D. McFall and , title =. Journal of Quantitative Linguistics , volume =. 2010 , publisher =. doi:10.1080/09296171003643098 , URL =
-
[22]
McCarthy and Scott Jarvis , title =
Philip M. McCarthy and Scott Jarvis , title =. Language Testing , volume =. 2007 , doi =
2007
-
[23]
and Jarvis, Scott , journal=
McCarthy, Philip M. and Jarvis, Scott , journal=. 2010 , publisher=
2010
-
[24]
Studies in Second Language Acquisition , year=
Lexis in composition: a performance analysis of Swedish learners' written English , author=. Studies in Second Language Acquisition , year=
-
[25]
Peter and Fishburne, Robert P
Kincaid, J. Peter and Fishburne, Robert P. Jr. and Rogers, Richard L. and Chissom, Brad S. , institution=. 1975 , url=
1975
-
[26]
, author=
A computer readability formula designed for machine scoring. , author=. Journal of Applied Psychology , volume=. 1975 , publisher=
1975
-
[27]
Journal of reading , volume=
SMOG grading-a new readability formula , author=. Journal of reading , volume=. 1969 , publisher=
1969
-
[28]
Bj. L. 1968 , publisher=
1968
-
[29]
Seventh Australian Reading Association Conference , pages=
Anderson, Jonathan , year=. Seventh Australian Reading Association Conference , pages=
-
[30]
Mohammad, Saif M. and Turney, Peter D. , title =. Computational Intelligence , volume =. doi:https://doi.org/10.1111/j.1467-8640.2012.00460.x , url =. https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1467-8640.2012.00460.x , abstract =
-
[31]
CEUR Workshop proceedings , volume=
Hurtlex: A multilingual lexicon of words to hurt , author=. CEUR Workshop proceedings , volume=. 2018 , organization=
2018
-
[32]
Rezvan, Mohammadreza and Shekarpour, Saeedeh and Balasuriya, Lakshika and Thirunarayan, Krishnaprasad and Shalin, Valerie L. and Sheth, Amit , title =. Proceedings of the 10th ACM Conference on Web Science , pages =. 2018 , isbn =. doi:10.1145/3201064.3201103 , abstract =
-
[33]
Seventeenth Symposium on Usable Privacy and Security (SOUPS 2021) , year =
Deepak Kumar and Patrick Gage Kelley and Sunny Consolvo and Joshua Mason and Elie Bursztein and Zakir Durumeric and Kurt Thomas and Michael Bailey , title =. Seventeenth Symposium on Usable Privacy and Security (SOUPS 2021) , year =
2021
-
[34]
Behavior research methods, instruments, & computers , volume=
Coh-Metrix: Analysis of text on cohesion and language , author=. Behavior research methods, instruments, & computers , volume=. 2004 , publisher=
2004
-
[35]
Data Protection and Privacy , volume=
The dataset nutrition label , author=. Data Protection and Privacy , volume=. 2020 , publisher=
2020
-
[36]
Smith, Nicole DeCario, and Will Buchanan
Pushkarna, Mahima and Zaldivar, Andrew and Kjartansson, Oddur , title =. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency , pages =. 2022 , isbn =. doi:10.1145/3531146.3533231 , abstract =
-
[37]
Gao, Leo and Biderman, Stella and Black, Sid and Golding, Laurence and Hoppe, Travis and Foster, Charles and Phang, Jason and He, Horace and Thite, Anish and Nabeshima, Noa and Presser, Shawn and Leahy, Connor , journal=. The
-
[38]
Measuring Massive Multitask Language Understanding , author=
-
[39]
2025 , eprint=
Are We Done with MMLU? , author=. 2025 , eprint=
2025
-
[41]
Toward a perspectivist turn in ground truthing for predictive computing
Toward a Perspectivist Turn in Ground Truthing for Predictive Computing , volume=. Proceedings of the AAAI Conference on Artificial Intelligence , author=. 2023 , month=. doi:10.1609/aaai.v37i6.25840 , abstractNote=
-
[42]
The 2024 ACM Conference on Fairness, Accountability, and Transparency , pages=
Disentangling Perceptions of Offensiveness: Cultural and Moral Correlates , author=. The 2024 ACM Conference on Fairness, Accountability, and Transparency , pages=
2024
-
[43]
Proceedings of the AAAI Conference on Artificial Intelligence , volume=
Everyone’s voice matters: Quantifying annotation disagreement using demographic information , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=. 2023 , url=
2023
-
[44]
Wenbo Zhang and Hangzhi Guo and Ian D Kivlichan and Vinodkumar Prabhakaran and Davis Yadav and Amulya Yadav , year=. 2311.04345 , archivePrefix=
-
[45]
Liu, Tong and Venkatachalam, Akash and Sanjay Bongale, Pratik and Homan, Christopher , date =. Learning to. Companion. doi:10.1145/3308560.3317082 , url =
-
[46]
Uma, Alexandra N. and Fornaciari, Tommaso and Hovy, Dirk and Paun, Silviu and Plank, Barbara and Poesio, Massimo , date =. Learning from. 2021 , journal =. doi:10.1613/jair.1.12752 , url =
-
[47]
Hettiachchi, Danula and Holcombe-James, Indigo and Livingstone, Stephanie and Silva, Anjalee de and Lease, Matthew and Salim, Flora D. and Sanderson, Mark , date =. How. 2023 , pages =. doi:10.1609/hcomp.v11i1.27546 , url =
-
[48]
1982 , publisher =
Attitudes Towards Language Variation: Social and Applied Contexts , series =. 1982 , publisher =
1982
-
[49]
Kircher, Ruth and Zipp, Lena , editor =. An. Research. 2022 , pages =. doi:10.1017/9781108867788.002 , url =
-
[50]
Ordinal Regression Models in Psychology: A Tutorial , shorttitle =
B. Ordinal Regression Models in Psychology: A Tutorial , shorttitle =. 2019 , journal =
2019
-
[51]
and Polson, Nicholas G
Carvalho, Carlos M. and Polson, Nicholas G. and Scott, James G. , year =. Handling. Proceedings of the
-
[52]
2017 , journal =
Sparsity Information and Regularization in the Horseshoe and Other Shrinkage Priors , author =. 2017 , journal =
2017
-
[53]
Paul-Christian Bürkner , journal =. 2017 , volume =. doi:10.18637/jss.v080.i01 , encoding =
-
[54]
2019 , journal =
Shrinkage Priors for. 2019 , journal =
2019
-
[55]
elfen: A Python Package for Efficient Linguistic Feature Extraction for Natural Language Datasets
Maurer, Maximilian. elfen: A Python Package for Efficient Linguistic Feature Extraction for Natural Language Datasets. Proceedings of the 19th Conference of the E uropean Chapter of the A ssociation for C omputational L inguistics (Volume 3: System Demonstrations). 2026. doi:10.18653/v1/2026.eacl-demo.5
-
[56]
2024 , url =
R: A Language and Environment for Statistical Computing , author =. 2024 , url =
2024
-
[57]
Bert Weijters and Elke Cabooter and Niels Schillewaert , keywords =. The effect of rating scale format on response styles: The number of response categories and response category labels , journal =. 2010 , issn =. doi:https://doi.org/10.1016/j.ijresmar.2010.02.004 , url =
-
[58]
Frontiers in psychology , volume=
Linguistically modulated perception and cognition: The label-feedback hypothesis , author=. Frontiers in psychology , volume=. 2012 , publisher=
2012
-
[59]
ISCED 2011 Operational Manual: Guidelines for Classifying National Education Programmes and Related Qualifications , author =. 2015 , publisher =. doi:10.1787/9789264228368-en , url =
-
[60]
Jan Kocoń and Alicja Figas and Marcin Gruza and Daria Puchalska and Tomasz Kajdanowicz and Przemysław Kazienko , keywords =. Offensive, aggressive, and hate speech analysis: From data-centric to human-centered approach , journal =. 2021 , issn =. doi:https://doi.org/10.1016/j.ipm.2021.102643 , url =
-
[61]
Jesse Graham and Jonathan Haidt and Sena Koleva and Matt Motyl and Ravi Iyer and Sean P. Wojcik and Peter H. Ditto , keywords =. Moral Foundations Theory: The Pragmatic Validity of Moral Pluralism , editor =. Advances in Experimental Social Psychology , publisher =. 2013 , issn =. doi:https://doi.org/10.1016/B978-0-12-407236-7.00002-4 , url =
-
[62]
Abercrombie, Gavin and Hovy, Dirk and Prabhakaran, Vinodkumar. Temporal and Second Language Influence on Intra-Annotator Agreement and Stability in Hate Speech Labelling. Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII). 2023. doi:10.18653/v1/2023.law-1.10
-
[63]
We need to consider disagreement in evaluation
Basile, Valerio and Fell, Michael and Fornaciari, Tommaso and Hovy, Dirk and Paun, Silviu and Plank, Barbara and Poesio, Massimo and Uma, Alexandra. We Need to Consider Disagreement in Evaluation. Proceedings of the 1st Workshop on Benchmarking: Past, Present and Future. 2021. doi:10.18653/v1/2021.bppf-1.3
-
[64]
Order Effects in Annotation Tasks: Further Evidence of Annotation Sensitivity
Beck, Jacob and Eckman, Stephanie and Ma, Bolei and Chew, Rob and Kreuter, Frauke. Order Effects in Annotation Tasks: Further Evidence of Annotation Sensitivity. Proceedings of the 1st Workshop on Uncertainty-Aware NLP (UncertaiNLP 2024). 2024. doi:10.18653/v1/2024.uncertainlp-1.8
-
[65]
Beck, Tilman and Schuff, Hendrik and Lauscher, Anne and Gurevych, Iryna. Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). 2024. doi:10.18653/v1/2024.eacl-long.159
-
[66]
Davani, Aida and D \'i az, Mark and Baker, Dylan and Prabhakaran, Vinodkumar. D 3 CODE : Disentangling Disagreements in Data across Cultures on Offensiveness Detection and Evaluation. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024. doi:10.18653/v1/2024.emnlp-main.1029
-
[67]
When the Majority is Wrong: Modeling Annotator Disagreement for Subjective Tasks
Fleisig, Eve and Abebe, Rediet and Klein, Dan. When the Majority is Wrong: Modeling Annotator Disagreement for Subjective Tasks. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023. doi:10.18653/v1/2023.emnlp-main.415
-
[68]
Intersectionality in AI Safety: Using Multilevel Models to Understand Diverse Perceptions of Safety in Conversational AI
Homan, Christopher and Serapio-Garcia, Gregory and Aroyo, Lora and Diaz, Mark and Parrish, Alicia and Prabhakaran, Vinodkumar and Taylor, Alex and Wang, Ding. Intersectionality in AI Safety: Using Multilevel Models to Understand Diverse Perceptions of Safety in Conversational AI. Proceedings of the 3rd Workshop on Perspectivist Approaches to NLP (NLPerspe...
2024
-
[69]
Annotation Sensitivity: Training Data Collection Methods Affect Model Performance
Kern, Christoph and Eckman, Stephanie and Beck, Jacob and Chew, Rob and Ma, Bolei and Kreuter, Frauke. Annotation Sensitivity: Training Data Collection Methods Affect Model Performance. Findings of the Association for Computational Linguistics: EMNLP 2023. 2023. doi:10.18653/v1/2023.findings-emnlp.992
-
[70]
Reconsidering Annotator Disagreement about Racist Language: Noise or Signal?
Larimore, Savannah and Kennedy, Ian and Haskett, Breon and Arseniev-Koehler, Alina. Reconsidering Annotator Disagreement about Racist Language: Noise or Signal?. Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media. 2021. doi:10.18653/v1/2021.socialnlp-1.7
-
[71]
and Nivre, Joakim and Zeman, Daniel
de Marneffe, Marie-Catherine and Manning, Christopher D. and Nivre, Joakim and Zeman, Daniel. U niversal D ependencies. Computational Linguistics. 2021. doi:10.1162/coli_a_00402
-
[72]
Obtaining Reliable Human Ratings of Valence, Arousal, and Dominance for 20,000 E nglish Words
Mohammad, Saif. Obtaining Reliable Human Ratings of Valence, Arousal, and Dominance for 20,000 E nglish Words. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018. doi:10.18653/v1/P18-1017
-
[73]
Word Affect Intensities
Mohammad, Saif. Word Affect Intensities. Proceedings of the Eleventh International Conference on Language Resources and Evaluation ( LREC 2018). 2018
2018
-
[74]
Emotions Evoked by Common Words and Phrases: Using M echanical T urk to Create an Emotion Lexicon
Mohammad, Saif and Turney, Peter. Emotions Evoked by Common Words and Phrases: Using M echanical T urk to Create an Emotion Lexicon. Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text. 2010
2010
-
[75]
Dealing with Disagreements: Looking Beyond the Majority Vote in Subjective Annotations
Davani, Aida and D \'i az, Mark and Prabhakaran, Vinodkumar. Dealing with Disagreements: Looking Beyond the Majority Vote in Subjective Annotations. Transactions of the Association for Computational Linguistics. 2022. doi:10.1162/tacl_a_00449
-
[76]
Orlikowski, Matthias and Pei, Jiaxin and R. Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals' Subjective Text Perceptions. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025. doi:10.18653/v1/2025.acl-long.104
-
[77]
The Ecological Fallacy in Annotation: Modeling Human Label Variation goes beyond Sociodemographics
Orlikowski, Matthias and R. The Ecological Fallacy in Annotation: Modeling Human Label Variation goes beyond Sociodemographics. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2023. doi:10.18653/v1/2023.acl-short.88
-
[78]
Pei, Jiaxin and Jurgens, David. When Do Annotator Demographics Matter? Measuring the Influence of Annotator Demographics with the POPQUORN Dataset. Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII). 2023. doi:10.18653/v1/2023.law-1.25
-
[79]
Is a bunch of words enough to detect disagreement in hateful content?
Rizzi, Giulia and Rosso, Paolo and Fersini, Elisabetta. Is a bunch of words enough to detect disagreement in hateful content?. Proceedings of Context and Meaning: Navigating Disagreements in NLP Annotation. 2025
2025
-
[80]
The Measuring Hate Speech Corpus: Leveraging Rasch Measurement Theory for Data Perspectivism
Sachdeva, Pratik and Barreto, Renata and Bacon, Geoff and Sahn, Alexander and von Vacano, Claudia and Kennedy, Chris. The Measuring Hate Speech Corpus: Leveraging Rasch Measurement Theory for Data Perspectivism. Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @LREC2022. 2022
2022
-
[81]
doi: 10.18653/v1/2022.naacl-main.431
Sap, Maarten and Swayamdipta, Swabha and Vianna, Laura and Zhou, Xuhui and Choi, Yejin and Smith, Noah A. Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022. doi:10...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.