The Consensus Trap: Dissecting Subjectivity and the "Ground Truth" Illusion in Data Annotation
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-16 05:06 UTC · model grok-4.3
The pith
Human disagreement in data annotation is a signal of cultural diversity, not noise to be eliminated by consensus.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The foundational ground truth paradigm in machine learning rests on a positivistic fallacy that mischaracterizes human disagreement as technical noise rather than a vital sociotechnical signal. Systemic failures in positional legibility and the shift to human-as-verifier models with model-mediated annotations introduce anchoring bias that removes human voices from the loop. Geographic hegemony imposes Western norms, enforced by precarious workers who comply to avoid penalties. Disagreement should be reclaimed as a high-fidelity signal for culturally competent models.
What carries the argument
The consensus trap, where the drive for agreement in annotation practices combined with model mediation enforces a singular truth and discards subjective diversity.
If this is right
- Annotation processes that prioritize consensus will produce datasets lacking representation of non-Western perspectives.
- Models trained on such data will underperform on culturally diverse tasks due to embedded biases.
- Precarious data workers will continue to suppress their own subjectivity to meet requester expectations.
- Reclaiming disagreement requires new infrastructures that value pluralistic responses over singular labels.
Where Pith is reading between the lines
- Implementing pluralistic annotation could improve fairness in AI systems deployed globally.
- Future annotation tools might use disagreement metrics as a quality indicator rather than error rate.
- Research into model-mediated annotation should test for anchoring effects in controlled experiments.
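The second speculation above, treating disagreement as a quality indicator rather than an error rate, can be sketched minimally. The metric and labels below are illustrative assumptions, not taken from the paper; one simple candidate is the normalized entropy of an item's annotator labels:

```python
import math
from collections import Counter

def disagreement_score(labels):
    """Normalized Shannon entropy of the annotator labels for one item.

    0.0 means unanimous agreement; 1.0 means labels are spread evenly
    across the observed categories. Under a pluralistic reading, a high
    score flags a genuinely contested item, not a low-quality one.
    """
    counts = Counter(labels)
    n = len(labels)
    k = len(counts)
    if k <= 1:
        return 0.0  # everyone agreed
    # Entropy with log base k, so the score is normalized to [0, 1].
    return -sum((c / n) * math.log(c / n, k) for c in counts.values())

print(disagreement_score(["toxic"] * 5))                    # 0.0 (unanimous)
print(disagreement_score(["toxic", "toxic", "ok", "ok"]))   # 1.0 (even split)
```

A tool built on this idea might route high-entropy items to richer elicitation (rationales, demographic-aware sampling) instead of discarding them as noise.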
Load-bearing premise
The analysis of papers from only seven specific venues between 2020 and 2025 fully represents the mechanisms in all data annotation practices.
What would settle it
A study that applies the same reflexive thematic analysis to papers from additional venues outside the seven selected and finds no evidence of positional legibility failures or model-mediated anchoring bias.
Figures
Original abstract
In machine learning, "ground truth" refers to the assumed correct labels used to train and evaluate models. However, the foundational "ground truth" paradigm rests on a positivistic fallacy that treats human disagreement as technical noise rather than a vital sociotechnical signal. This systematic literature review analyzes research published between 2020 and 2025 across seven premier venues: ACL, AIES, CHI, CSCW, EAAMO, FAccT, and NeurIPS, investigating the mechanisms in data annotation practices that facilitate this "consensus trap". Our reflexive thematic analysis of 346 papers reveals that systemic failures in positional legibility, combined with the recent architectural shift toward human-as-verifier models, specifically the reliance on model-mediated annotations, introduce deep-seated anchoring bias and effectively remove human voices from the loop. We further demonstrate how geographic hegemony imposes Western norms as universal benchmarks, often enforced by the performative alignment of precarious data workers who prioritize requester compliance over honest subjectivity to avoid economic penalties. Critiquing the "noisy sensor" fallacy, where statistical models misdiagnose pluralism as error, we argue for reclaiming disagreement as a high-fidelity signal essential for building culturally competent models. To address these systemic tensions, we propose a roadmap for pluralistic annotation infrastructures that shift the objective from discovering a singular "right" answer to mapping the diversity of human experience.
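The "noisy sensor" fallacy the abstract critiques can be illustrated with a minimal, hypothetical sketch: consensus aggregation collapses a genuine split of opinion into a single label, while pluralistic aggregation retains the full distribution. The labels and function names here are illustrative assumptions, not the paper's proposal:

```python
from collections import Counter

def majority_vote(labels):
    # Consensus aggregation: keep only the single most frequent label.
    return Counter(labels).most_common(1)[0][0]

def label_distribution(labels):
    # Pluralistic aggregation: keep the full response distribution.
    n = len(labels)
    return {label: count / n for label, count in Counter(labels).items()}

# Hypothetical item rated by 5 annotators; 2 of 5 judge it offensive.
ratings = ["not_offensive", "not_offensive", "not_offensive",
           "offensive", "offensive"]

print(majority_vote(ratings))       # 'not_offensive' -- minority view erased
print(label_distribution(ratings))  # {'not_offensive': 0.6, 'offensive': 0.4}
```

The consensus output is indistinguishable from a unanimous item, which is exactly the information loss the paper's pluralistic-infrastructure roadmap targets.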
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript conducts a reflexive thematic analysis of 346 papers (2020–2025) from seven venues (ACL, AIES, CHI, CSCW, EAAMO, FAccT, NeurIPS) to argue that the 'ground truth' paradigm in ML data annotation rests on a positivistic fallacy that treats disagreement as technical noise. It identifies systemic failures in positional legibility and the shift to human-as-verifier / model-mediated annotation as sources of anchoring bias that suppress subjective human voices, enforce Western geographic hegemony, and penalize precarious workers for non-compliance; it critiques the 'noisy sensor' statistical framing and proposes pluralistic annotation infrastructures that map diversity rather than seek singular consensus.
Significance. If the thematic findings prove robust, the work offers a timely reframing of disagreement as high-fidelity signal rather than error, with direct implications for culturally competent model development and responsible AI data practices. The explicit roadmap for pluralistic infrastructures and the systematic scope across multiple venues constitute concrete strengths that could influence both research and industry annotation pipelines.
Major comments (2)
- [Methods] Methods section: the reflexive thematic analysis provides no exact search strings, Boolean queries, inclusion/exclusion criteria, or inter-coder reliability metrics for the 346 papers. Without these details the reproducibility of the extracted themes (positional legibility failures, anchoring bias, geographic hegemony) cannot be evaluated, directly weakening support for the systemic claims.
- [Abstract and §4] Abstract and §4 (Findings): the diagnosis of a field-wide 'consensus trap' rests on literature drawn exclusively from seven venues that skew toward critical/sociotechnical scholarship. No evidence is presented that the identified mechanisms dominate in computer-vision pipelines, large-scale industry datasets, or non-Western annotation communities; this selection limits the warrant for generalizing to 'systemic' failures across all data annotation.
Minor comments (2)
- [Abstract] Abstract: the venue list and paper count appear late; moving the scope statement earlier would improve immediate clarity.
- [Introduction] The term 'positional legibility' is introduced without an explicit definition or citation on first use, requiring readers to infer its meaning from later examples.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We address each major comment point by point below, providing the strongest honest defense of the manuscript while incorporating revisions for improved transparency and scope qualification where warranted.
Point-by-point responses
Referee: [Methods] Methods section: the reflexive thematic analysis provides no exact search strings, Boolean queries, inclusion/exclusion criteria, or inter-coder reliability metrics for the 346 papers. Without these details the reproducibility of the extracted themes (positional legibility failures, anchoring bias, geographic hegemony) cannot be evaluated, directly weakening support for the systemic claims.
Authors: We acknowledge that the original Methods section omitted explicit search strings, Boolean queries, and inclusion/exclusion criteria, which limits immediate reproducibility. Reflexive thematic analysis (per Braun & Clarke) is interpretive and does not use inter-coder reliability metrics, as themes emerge through iterative researcher engagement rather than consensus coding. In the revised manuscript we have added a new subsection with the exact search strings (e.g., combinations of 'data annotation' OR 'ground truth' OR 'labeling' AND 'disagreement' OR 'subjectivity' OR 'consensus'), Boolean operators applied to the seven venues' 2020–2025 proceedings, inclusion criteria (papers addressing sociotechnical aspects of annotation), and exclusion criteria (purely technical ML papers without human-centered analysis). This addition directly addresses the concern while preserving the reflexive stance. revision: yes
Referee: [Abstract and §4] Abstract and §4 (Findings): the diagnosis of a field-wide 'consensus trap' rests on literature drawn exclusively from seven venues that skew toward critical/sociotechnical scholarship. No evidence is presented that the identified mechanisms dominate in computer-vision pipelines, large-scale industry datasets, or non-Western annotation communities; this selection limits the warrant for generalizing to 'systemic' failures across all data annotation.
Authors: The seven venues were deliberately chosen because they constitute the primary academic outlets where sociotechnical critiques of data annotation, 'ground truth,' and related fairness issues are most extensively developed; the paper's scope is therefore the discourse within these venues rather than a claim of universality across all ML subfields. We do not present evidence that the mechanisms dominate computer-vision pipelines or non-Western industry settings, as that lies outside the sampled literature. In revision we have updated the abstract and §4 to explicitly qualify all claims as pertaining to the analyzed venues, added a dedicated limitations paragraph acknowledging the critical-scholarship skew, and included a forward-looking statement calling for complementary empirical work in industry and non-Western annotation communities. revision: yes
Circularity Check
No circularity: claims rest on external literature synthesis, not self-referential reduction
Full rationale
The paper conducts a reflexive thematic analysis of 346 papers drawn from seven external venues (ACL, AIES, CHI, CSCW, EAAMO, FAccT, NeurIPS). Its core claims—positional legibility failures, anchoring bias from human-as-verifier models, geographic hegemony, and the noisy-sensor fallacy—are presented as interpretive findings from that corpus rather than as outputs of any fitted parameters, self-defined equations, or load-bearing self-citations that collapse back into the paper’s own inputs. No derivation step equates a “prediction” to a quantity constructed from the authors’ prior work or from the analysis itself by definition. The roadmap for pluralistic annotation infrastructures is offered as a forward proposal, not a retrofitted restatement of the reviewed material. The analysis therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: Human disagreement during annotation constitutes a high-fidelity signal of diversity rather than technical noise or error.
- Domain assumption: Papers published 2020–2025 in the seven listed premier venues are representative of broader data annotation practices.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction — contradicts
Passage: "the foundational 'ground truth' paradigm rests on a positivistic fallacy that treats human disagreement as technical noise rather than a vital sociotechnical signal"
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel — contradicts
Passage: "shift the objective from discovering a singular 'right' answer to mapping the diversity of human experience"
What do these tags mean?
- matches — The paper's claim is directly supported by a theorem in the formal canon.
- supports — The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends — The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses — The paper appears to rely on the theorem as machinery.
- contradicts — The paper's claim conflicts with a theorem or certificate in the canon.
- unclear — Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.