pith. machine review for the scientific record.

arxiv: 2602.11318 · v3 · submitted 2026-02-11 · 💻 cs.AI · cs.CL · cs.CY

Recognition: 2 theorem links · Lean Theorem

The Consensus Trap: Dissecting Subjectivity and the "Ground Truth" Illusion in Data Annotation

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 05:06 UTC · model grok-4.3

classification 💻 cs.AI · cs.CL · cs.CY
keywords data annotation · ground truth · subjectivity · consensus bias · pluralistic annotation · anchoring bias · human-AI collaboration · AI ethics

The pith

Human disagreement in data annotation is a signal of cultural diversity, not noise to be eliminated by consensus.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that the standard practice of seeking a single ground truth in machine learning datasets treats human subjectivity as error, which leads to biased models. Through a review of recent literature, it identifies how model-mediated annotations create anchoring bias and silence diverse human perspectives. Geographic and economic pressures further enforce Western norms. The authors call for annotation systems that map disagreement instead of suppressing it to build more culturally aware AI.

Core claim

The foundational ground truth paradigm in machine learning rests on a positivistic fallacy that mischaracterizes human disagreement as technical noise rather than a vital sociotechnical signal. Systemic failures in positional legibility and the shift to human-as-verifier models with model-mediated annotations introduce anchoring bias that removes human voices from the loop. Geographic hegemony imposes Western norms, enforced by precarious workers who comply to avoid penalties. Disagreement should be reclaimed as a high-fidelity signal for culturally competent models.

What carries the argument

The consensus trap, where the drive for agreement in annotation practices combined with model mediation enforces a singular truth and discards subjective diversity.

If this is right

  • Annotation processes that prioritize consensus will produce datasets lacking representation of non-Western perspectives.
  • Models trained on such data will underperform on culturally diverse tasks due to embedded biases.
  • Precarious data workers will continue to suppress their own subjectivity to meet requester expectations.
  • Reclaiming disagreement requires new infrastructures that value pluralistic responses over singular labels.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Implementing pluralistic annotation could improve fairness in AI systems deployed globally.
  • Future annotation tools might use disagreement metrics as a quality indicator rather than an error rate (a minimal sketch follows this list).
  • Research into model-mediated annotation should test for anchoring effects in controlled experiments.

Load-bearing premise

The analysis of papers from only seven specific venues between 2020 and 2025 fully represents the mechanisms in all data annotation practices.

What would settle it

A study that applies the same reflexive thematic analysis to papers from additional venues outside the seven selected and finds no evidence of positional legibility failures or model-mediated anchoring bias.

Figures

Figures reproduced from arXiv: 2602.11318 by Benjamin Mah, Ding Wang, Edith Law, Julian Posada, Krisha Kalsi, Sheza Munir, Shivani Kapania, Syed Ishtiaque Ahmed.

Figure 1
Figure 1. Overview of the methodology. view at source ↗
Figure 2
Figure 2. PRISMA flow diagram of the systematic review, showing record identification, keyword filtration, and screening. view at source ↗
read the original abstract

In machine learning, "ground truth" refers to the assumed correct labels used to train and evaluate models. However, the foundational "ground truth" paradigm rests on a positivistic fallacy that treats human disagreement as technical noise rather than a vital sociotechnical signal. This systematic literature review analyzes research published between 2020 and 2025 across seven premier venues: ACL, AIES, CHI, CSCW, EAAMO, FAccT, and NeurIPS, investigating the mechanisms in data annotation practices that facilitate this "consensus trap". Our reflexive thematic analysis of 346 papers reveals that systemic failures in positional legibility, combined with the recent architectural shift toward human-as-verifier models, specifically the reliance on model-mediated annotations, introduce deep-seated anchoring bias and effectively remove human voices from the loop. We further demonstrate how geographic hegemony imposes Western norms as universal benchmarks, often enforced by the performative alignment of precarious data workers who prioritize requester compliance over honest subjectivity to avoid economic penalties. Critiquing the "noisy sensor" fallacy, where statistical models misdiagnose pluralism as error, we argue for reclaiming disagreement as a high-fidelity signal essential for building culturally competent models. To address these systemic tensions, we propose a roadmap for pluralistic annotation infrastructures that shift the objective from discovering a singular "right" answer to mapping the diversity of human experience.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript conducts a reflexive thematic analysis of 346 papers (2020–2025) from seven venues (ACL, AIES, CHI, CSCW, EAAMO, FAccT, NeurIPS) to argue that the 'ground truth' paradigm in ML data annotation rests on a positivistic fallacy that treats disagreement as technical noise. It identifies systemic failures in positional legibility and the shift to human-as-verifier / model-mediated annotation as sources of anchoring bias that suppress subjective human voices, enforce Western geographic hegemony, and penalize precarious workers for non-compliance; it critiques the 'noisy sensor' statistical framing and proposes pluralistic annotation infrastructures that map diversity rather than seek singular consensus.

Significance. If the thematic findings prove robust, the work offers a timely reframing of disagreement as high-fidelity signal rather than error, with direct implications for culturally competent model development and responsible AI data practices. The explicit roadmap for pluralistic infrastructures and the systematic scope across multiple venues constitute concrete strengths that could influence both research and industry annotation pipelines.

major comments (2)
  1. [Methods] Methods section: the reflexive thematic analysis provides no exact search strings, Boolean queries, inclusion/exclusion criteria, or inter-coder reliability metrics for the 346 papers. Without these details the reproducibility of the extracted themes (positional legibility failures, anchoring bias, geographic hegemony) cannot be evaluated, directly weakening support for the systemic claims.
  2. [Abstract and §4] Abstract and §4 (Findings): the diagnosis of a field-wide 'consensus trap' rests on literature drawn exclusively from seven venues that skew toward critical/sociotechnical scholarship. No evidence is presented that the identified mechanisms dominate in computer-vision pipelines, large-scale industry datasets, or non-Western annotation communities; this selection limits the warrant for generalizing to 'systemic' failures across all data annotation.
minor comments (2)
  1. [Abstract] Abstract: the venue list and paper count appear late; moving the scope statement earlier would improve immediate clarity.
  2. [Introduction] The term 'positional legibility' is introduced without an explicit definition or citation on first use, requiring readers to infer its meaning from later examples.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment point by point below, providing the strongest honest defense of the manuscript while incorporating revisions for improved transparency and scope qualification where warranted.

read point-by-point responses
  1. Referee: [Methods] Methods section: the reflexive thematic analysis provides no exact search strings, Boolean queries, inclusion/exclusion criteria, or inter-coder reliability metrics for the 346 papers. Without these details the reproducibility of the extracted themes (positional legibility failures, anchoring bias, geographic hegemony) cannot be evaluated, directly weakening support for the systemic claims.

    Authors: We acknowledge that the original Methods section omitted explicit search strings, Boolean queries, and inclusion/exclusion criteria, which limits immediate reproducibility. Reflexive thematic analysis (per Braun & Clarke) is interpretive and does not use inter-coder reliability metrics, as themes emerge through iterative researcher engagement rather than consensus coding. In the revised manuscript we have added a new subsection with the exact search strings (e.g., combinations of 'data annotation' OR 'ground truth' OR 'labeling' AND 'disagreement' OR 'subjectivity' OR 'consensus'), Boolean operators applied to the seven venues' 2020–2025 proceedings, inclusion criteria (papers addressing sociotechnical aspects of annotation), and exclusion criteria (purely technical ML papers without human-centered analysis). This addition directly addresses the concern while preserving the reflexive stance (a minimal sketch of this filtration step appears after these responses). revision: yes

  2. Referee: [Abstract and §4] Abstract and §4 (Findings): the diagnosis of a field-wide 'consensus trap' rests on literature drawn exclusively from seven venues that skew toward critical/sociotechnical scholarship. No evidence is presented that the identified mechanisms dominate in computer-vision pipelines, large-scale industry datasets, or non-Western annotation communities; this selection limits the warrant for generalizing to 'systemic' failures across all data annotation.

    Authors: The seven venues were deliberately chosen because they constitute the primary academic outlets where sociotechnical critiques of data annotation, 'ground truth,' and related fairness issues are most extensively developed; the paper's scope is therefore the discourse within these venues rather than a claim of universality across all ML subfields. We do not present evidence that the mechanisms dominate computer-vision pipelines or non-Western industry settings, as that lies outside the sampled literature. In revision we have updated the abstract and §4 to explicitly qualify all claims as pertaining to the analyzed venues, added a dedicated limitations paragraph acknowledging the critical-scholarship skew, and included a forward-looking statement calling for complementary empirical work in industry and non-Western annotation communities. revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on external literature synthesis, not self-referential reduction

full rationale

The paper conducts a reflexive thematic analysis of 346 papers drawn from seven external venues (ACL, AIES, CHI, CSCW, EAAMO, FAccT, NeurIPS). Its core claims—positional legibility failures, anchoring bias from human-as-verifier models, geographic hegemony, and the noisy-sensor fallacy—are presented as interpretive findings from that corpus rather than as outputs of any fitted parameters, self-defined equations, or load-bearing self-citations that collapse back into the paper’s own inputs. No derivation step equates a “prediction” to a quantity constructed from the authors’ prior work or from the analysis itself by definition. The roadmap for pluralistic annotation infrastructures is offered as a forward proposal, not a retrofitted restatement of the reviewed material. The analysis therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on qualitative assumptions about the nature of disagreement and bias rather than quantitative parameters or new entities.

axioms (2)
  • domain assumption: Human disagreement during annotation constitutes a high-fidelity signal of diversity rather than technical noise or error.
    Invoked throughout the critique of the 'noisy sensor' fallacy and the call to reclaim disagreement.
  • domain assumption: Papers published 2020–2025 in the seven listed premier venues are representative of broader data annotation practices.
    Basis for the systematic review scope and the generalizability of findings.

pith-pipeline@v0.9.0 · 5576 in / 1391 out tokens · 65197 ms · 2026-05-16T05:06:32.652853+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

299 extracted references · 299 canonical work pages
