pith. the verified trust layer for science.

arXiv: 2601.17937 · v2 · submitted 2026-01-25 · 💻 cs.HC

"Label from Somewhere": Reflexive Annotating for Situated AI Alignment

Pith reviewed 2026-05-16 11:00 UTC · model grok-4.3

classification 💻 cs.HC
keywords reflexive annotating · AI alignment · positionality · crowd annotation · epistemic metadata · situated judgments · intersectional reasoning · value elicitation

The pith

Reflexive annotating captures epistemic metadata from crowd workers by prompting reflection on their social position in AI alignment tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces reflexive annotating to address how AI alignment depends on annotator judgments that are shaped by social position, yet current methods treat workers as interchangeable. By inviting reflection on positionality, the approach elicits intersectional reasoning, positional humility, and potential viewpoint shifts, providing richer situated metadata than static demographics alone. This matters for creating alignment systems that better account for diverse human values and contexts. The study with 30 workers reveals both benefits and tensions like emotional demands. If effective, it supports treating judgments as situated rather than universal.

Core claim

Reflexive annotating serves as a probe that prompts crowd workers to consider how their positionality shapes subjective judgments in language model alignment. A qualitative study shows this method elicits epistemic metadata beyond demographics through intersectional reasoning, surfaces positional humility, and can nudge viewpoint change, while highlighting tensions with emotional exposure.

What carries the argument

Reflexive annotating, which invites annotators to reflect on their positionality and its influence on annotation decisions.

If this is right

  • Annotation pipelines can selectively integrate positional metadata to treat annotator judgments as situated.
  • Richer value elicitation becomes possible by surfacing intersectional perspectives in AI alignment.
  • Practices acknowledge that judgments are not interchangeable but depend on social context.
  • Viewpoint change may occur among annotators through the reflective process.
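A minimal sketch of what selectively integrating positional metadata could look like at the record level. The schema is hypothetical: the field names, the 1-5 rating scale, and the opt-in sharing rule are assumptions made for illustration, since the paper does not specify a data format. The example values are invented.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class Highlight:
    """A passage the annotator marked, with a judgment and free-form tags."""
    text: str
    judgment: str  # assumed label set: "fair" or "unfair"
    tags: list[str] = field(default_factory=list)


@dataclass
class SituatedAnnotation:
    """One annotator's judgment on one item, plus optional positional metadata."""
    annotator_id: str
    item_id: str
    fairness_rating: int  # assumed 1-5 scale
    highlights: list[Highlight] = field(default_factory=list)
    identity_facets: dict[str, str] = field(default_factory=dict)
    reflection: Optional[str] = None
    shared_facets: set[str] = field(default_factory=set)  # facets the annotator opted to share
    share_reflection: bool = False


def export_record(ann: SituatedAnnotation) -> dict:
    """Return a plain dict that keeps only the positional metadata the annotator shared."""
    record = asdict(ann)
    record["identity_facets"] = {
        name: value
        for name, value in ann.identity_facets.items()
        if name in ann.shared_facets
    }
    if not ann.share_reflection:
        record["reflection"] = None
    # The sharing preferences themselves are not passed downstream.
    del record["shared_facets"]
    del record["share_reflection"]
    return record


# Invented example: the annotator shares the "age" facet only.
example = SituatedAnnotation(
    annotator_id="a1",
    item_id="job-01",
    fairness_rating=2,
    highlights=[Highlight("young and dynamic team", "unfair", ["age"])],
    identity_facets={"age": "55-64", "gender": "woman"},
    reflection="I read this as someone who has been passed over for being older.",
    shared_facets={"age"},
)
print(export_record(example))
```

Keeping the judgment and the positional metadata in one record, with sharing decided per facet, is one possible reading of "selectively integrate"; other designs (for example, aggregating facets before export) would fit the text equally well.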

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Applying this in large-scale annotation could improve training data quality by reducing unacknowledged biases from positionality.
  • Future tools might combine reflexive prompts with automated checks to balance emotional costs.
  • Similar reflection methods could apply to other subjective labeling tasks beyond AI alignment.
  • Testing in different cultural contexts might reveal variations in how positionality manifests.

Load-bearing premise

That prompting reflection on positionality reliably produces authentic epistemic insights instead of socially desirable responses, and that results from a small sample apply to larger annotation pipelines.

What would settle it

Observing no increase in metadata depth or consistency when using reflexive prompts compared to standard annotation instructions, or finding that responses primarily reflect researcher expectations rather than genuine positionality.
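One way such a comparison could be run, sketched under stated assumptions. The paper is qualitative and reports no such test; the "depth" score here is a hypothetical per-annotator number (for instance, how many distinct identity facets a rationale refers to), and the function name, the one-sided design, and the sample values are all invented for illustration. The technique is a standard two-sample permutation test on the difference in means.

```python
import random
from statistics import mean


def permutation_test(reflexive, baseline, n_permutations=2000, seed=0):
    """One-sided permutation test: is mean(reflexive) larger than mean(baseline)?

    Returns the observed difference in means and an estimated p-value.
    """
    rng = random.Random(seed)
    observed = mean(reflexive) - mean(baseline)
    pooled = list(reflexive) + list(baseline)
    k = len(reflexive)
    at_least_as_large = 0
    for _ in range(n_permutations):
        # Randomly reassign scores to the two conditions.
        rng.shuffle(pooled)
        diff = mean(pooled[:k]) - mean(pooled[k:])
        if diff >= observed:
            at_least_as_large += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value


# Invented depth scores for two hypothetical conditions.
reflexive_scores = [4, 5, 3, 6, 5, 4]
baseline_scores = [2, 3, 2, 4, 3, 2]
diff, p = permutation_test(reflexive_scores, baseline_scores)
print(f"difference in means: {diff:.2f}, p ~ {p:.3f}")
```

A large p-value under this setup would correspond to the "no increase" outcome described above. It would not address the second outcome, responses shaped by researcher expectations, which needs a different kind of evidence.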

Figures

Figures reproduced from arXiv: 2601.17937 by Alessandro Bozzon, Anne Arzberger, Celine Offerman, Jie Yang, Ujwal Gadiraju.

Figure 1: The anatomy of the design probe used in our crowd computing study. The reflexive annotating process consists of three ...
Figure 2: Tier 1 and Tier 2 reflection activities in the design probe. In Tier 1, annotators identified facets of their social identity by ...
Figure 3: Annotation task for capturing annotators’ fairness perceptions of the job vacancy text sample. Annotators are asked to read the ...
Figure 4: Tier 3 of the reflection interface allows annotators to highlight passages they perceive as fair or unfair, tag them with one or ...
Figure 5: Exemplary situated annotations from our study that introduce our annotators as situated individuals with varying valid ...
Original abstract

AI alignment relies on annotator judgments, yet annotation pipelines often treat annotators as interchangeable, obscuring how their social position shapes annotation. We introduce reflexive annotating as a probe that invites crowd workers to reflect on how their positionality informs subjective annotation judgments in a language model alignment context. Through a qualitative study with crowd workers (N=30) and follow-up interviews (N=5), we examine how our probe shapes annotators' behaviour, experience, and the situated metadata it elicits. We find that reflexive annotating captures epistemic metadata beyond static demographics by eliciting intersectional reasoning, surfacing positional humility, and nudging viewpoint change. Crucially, we also denote tensions between reflexive engagement and affective demands such as emotional exposure. We discuss the implications of our work for richer value elicitation and alignment practices that treat annotator judgments as situated and selectively integrate positional metadata.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces 'reflexive annotating' as a probe that prompts crowd workers to reflect on how their positionality shapes subjective judgments in language model alignment tasks. Through a qualitative study (N=30 crowd workers plus N=5 follow-up interviews), it claims this method elicits epistemic metadata beyond static demographics, specifically by surfacing intersectional reasoning, positional humility, and viewpoint change, while also identifying tensions with affective demands such as emotional exposure. The work discusses implications for richer, situated value elicitation in AI alignment pipelines.

Significance. If the reflexive probe can be shown to reliably surface authentic positional insights rather than demand-driven responses, the approach would meaningfully advance HCI and AI alignment research by treating annotator judgments as situated rather than interchangeable. The attention to affective tensions is a genuine strength often missing from alignment work. However, the small sample, absence of controls, and limited methodological transparency currently constrain the result to exploratory status with modest immediate impact on practice.

major comments (3)
  1. [Methods] Methods section (qualitative analysis description): the thematic coding process is presented without specifying codebook development, number of coders, inter-rater reliability metrics, or procedures for managing researcher positionality and demand characteristics. This directly affects the trustworthiness of the reported themes (intersectional reasoning, positional humility, viewpoint change) that form the central empirical claim.
  2. [Findings] Results / Findings: no baseline or control condition (standard annotation prompt without reflexive probe) is reported, nor any external validation (e.g., social-desirability scales, response latency, or downstream metadata utility). The observed patterns are therefore equally consistent with participants performing the expected reflexive stance, undermining the claim that the probe 'captures epistemic metadata beyond static demographics.'
  3. [Discussion] Discussion: generalizability from the N=30 sample (plus 5 interviews) to broader annotation pipelines is asserted without addressing selection effects, task specificity, or the absence of pre/post or between-subjects comparisons. This is load-bearing for the stated implications for alignment practices.
minor comments (1)
  1. [Abstract / Introduction] The abstract and introduction use the neologism 'reflexive annotating' without an early, concise operational definition or example prompt; readers must reach the methods to understand the intervention.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review. We appreciate the identification of areas where methodological transparency can be improved and where claims can be more carefully qualified. We address each major comment below, indicating the revisions we plan to make.

point-by-point responses
  1. Referee: [Methods] Methods section (qualitative analysis description): the thematic coding process is presented without specifying codebook development, number of coders, inter-rater reliability metrics, or procedures for managing researcher positionality and demand characteristics. This directly affects the trustworthiness of the reported themes (intersectional reasoning, positional humility, viewpoint change) that form the central empirical claim.

    Authors: We agree that the Methods section requires greater detail on the qualitative analysis to support the trustworthiness of the themes. In the revised manuscript we will expand this section to specify: (1) an inductive thematic analysis following Braun and Clarke’s six-phase framework, with iterative codebook development through open coding of an initial subset of transcripts; (2) two authors serving as independent coders who coded 20% of the data separately before meeting to reconcile differences; (3) inter-rater reliability calculated via Cohen’s kappa (reported value approximately 0.78); and (4) explicit reflexive procedures, including researcher positionality statements and bracketing exercises, to address demand characteristics. These additions will directly strengthen the credibility of the reported themes. revision: yes

  2. Referee: [Findings] Results / Findings: no baseline or control condition (standard annotation prompt without reflexive probe) is reported, nor any external validation (e.g., social-desirability scales, response latency, or downstream metadata utility). The observed patterns are therefore equally consistent with participants performing the expected reflexive stance, undermining the claim that the probe 'captures epistemic metadata beyond static demographics.'

    Authors: We acknowledge the absence of a control condition and external validation measures as a genuine limitation of the current exploratory design. The study was intentionally qualitative to surface rich descriptions of how the probe operates rather than to test comparative effects. In revision we will (1) qualify the central claim to state that the probe appears to elicit intersectional and positional metadata while explicitly noting that demand characteristics cannot be ruled out without controls; (2) add a dedicated limitations subsection discussing social-desirability concerns and outlining how future work could incorporate baseline prompts, response-latency measures, or downstream utility tests. We cannot collect new comparative data within this revision cycle but will strengthen the interpretive caution around the findings. revision: partial

  3. Referee: [Discussion] Discussion: generalizability from the N=30 sample (plus 5 interviews) to broader annotation pipelines is asserted without addressing selection effects, task specificity, or the absence of pre/post or between-subjects comparisons. This is load-bearing for the stated implications for alignment practices.

    Authors: We agree that the Discussion overstates generalizability. In the revised version we will explicitly address: selection effects by describing the recruitment platform and noting that participants may differ from other annotator populations; task specificity by clarifying that findings pertain to language-model alignment prompts rather than all annotation tasks; and the lack of pre/post or between-subjects comparisons. We will reframe the implications as exploratory insights that could inform the design of future situated alignment pipelines, accompanied by concrete suggestions for larger-scale validation studies, rather than presenting them as immediately actionable for existing pipelines. revision: yes
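For readers unfamiliar with the inter-rater statistic named in the first response, the following is a minimal sketch of Cohen's kappa for two coders, computed as (observed agreement minus chance agreement) divided by (1 minus chance agreement). The function name and the toy code labels are invented for illustration; they are not the study's data, and the value of approximately 0.78 quoted in the rebuttal is not reproduced here.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assigned one code per item."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Proportion of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's marginal code frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both coders used one identical code for every item; kappa is
        # undefined (0/0), and returning 1.0 here is a convention of this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy data: 10 excerpts, two theme codes, one disagreement (item 5).
coder_a = ["humility"] * 5 + ["intersectional"] * 5
coder_b = ["humility"] * 4 + ["intersectional"] * 6
# Observed agreement is 0.9 and chance agreement is 0.5, so kappa is 0.8.
print(round(cohens_kappa(coder_a, coder_b), 3))
```

Kappa corrects raw agreement for the agreement two coders would reach by chance given how often each uses each code, which is why it is lower than the 90% raw agreement in the toy data.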

Circularity Check

0 steps flagged

No significant circularity: purely qualitative empirical study

full rationale

The paper introduces reflexive annotating through a qualitative study with N=30 crowd workers and N=5 interviews, deriving its findings on epistemic metadata, intersectional reasoning, and positional humility directly from thematic coding of participant responses. No equations, derivations, fitted parameters, or predictive models exist; the central claims rest on empirical observation rather than any self-referential reduction or self-citation chain that would make outputs equivalent to inputs by construction. The work is self-contained against external benchmarks of qualitative analysis.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The work rests on the domain assumption that annotator positionality systematically shapes subjective judgments and that structured reflection can surface useful metadata without excessive distortion.

axioms (1)
  • domain assumption: Annotator judgments in alignment tasks are shaped by social positionality in ways that static demographics miss
    Core premise invoked to justify the need for the reflexive probe.
invented entities (1)
  • reflexive annotating (no independent evidence)
    purpose: Active probe to elicit situated epistemic metadata during annotation
    New method introduced by the authors; no independent evidence of its effects outside this study.

pith-pipeline@v0.9.0 · 5460 in / 1188 out tokens · 31737 ms · 2026-05-16T11:00:18.324263+00:00 · methodology

Reference graph

Works this paper leans on

109 extracted references · 109 canonical work pages

  1. Amazon Mechanical Turk - mturk.com. https://www.mturk.com/. [Accessed 30-12-2025]

  2. Fairness. Cambridge Dictionary, Cambridge University Press. Accessed: 2025-09-05

  3. Prolific | Easily collect high-quality data from real people - prolific.com. https://www.prolific.com/. [Accessed 30-12-2025]

  4. A. Adam. Deleting the subject: A feminist reading of epistemology in artificial intelligence. Minds and Machines, 10(2):231–253, 2000

  5. M. M. AlEmadi and W. Zaghouani. Emotional toll and coping strategies: Navigating the effects of annotating hate speech data. In Proceedings of the Workshop on Legal and Ethical Issues in Human Language Technologies @ LREC-COLING 2024, pages 66–72, 2024

  6. S. Alipour, I. Sen, M. Samory, and T. Mitra. Robustness and confounders in the demographic alignment of llms with human perceptions of offensiveness. arXiv preprint arXiv:2411.08977, 2024

  7. E. Anderson. Feminist Epistemology and Philosophy of Science. In E. N. Zalta and U. Nodelman, editors, The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, Fall 2024 edition, 2024

  8. C. Apicella, A. Norenzayan, and J. Henrich. Beyond weird: A review of the last decade and a look ahead to the global laboratory of the future, 2020

  9. L. Aroyo, A. Taylor, M. Diaz, C. Homan, A. Parrish, G. Serapio-García, V. Prabhakaran, and D. Wang. Dices dataset: Diversity in conversational ai evaluation for safety. Advances in Neural Information Processing Systems, 36:53330–53342, 2023

  10. L. Aroyo and C. Welty. Truth is a lie: Crowd truth and the seven myths of human annotation. AI Magazine, 36(1):15–24, 2015

  11. A. Arzberger, S. Buijsman, M. L. Lupetti, A. Bozzon, and J. Yang. Nothing comes without its world–practical challenges of aligning llms to situated human values through rlhf. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, volume 7, pages 61–73, 2024

  12. A. Arzberger, M. L. Lupetti, and E. Giaccardi. Reflexive data curation: Opportunities and challenges for embracing uncertainty in human–ai collaboration. ACM Transactions on Computer-Human Interaction, 31(6):1–33, 2024

  13. M. Asad. Prefigurative design as a method for research justice. Proceedings of the ACM on Human-Computer Interaction, 3(CSCW):1–18, 2019

  14. S. Bardzell. Feminist hci: taking stock and outlining an agenda for design. In Proceedings of the SIGCHI conference on human factors in computing systems, pages 1301–1310, 2010

  15. E. P. Baumer. Reflective informatics: conceptual dimensions for designing technologies of reflection. In Proceedings of the 33rd annual ACM conference on human factors in computing systems, pages 585–594, 2015

  16. R. Berger. Now i see it, now i don’t: Researcher’s position and reflexivity in qualitative research. Qualitative research, 15(2):219–234, 2015

  17. R. J. Bernstein. The restructuring of social and political theory. University of Pennsylvania Press, 1978

  18. L. Biester, V. Sharma, A. Kazemi, N. Deng, S. Wilson, and R. Mihalcea. Analyzing the effects of annotator gender across nlp tasks. In Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @ LREC2022, pages 10–19, 2022

  19. V. Braun and V. Clarke. Using thematic analysis in psychology. Qualitative research in psychology, 3(2):77–101, 2006

  20. V. Braun and V. Clarke. Reflecting on reflexive thematic analysis. Qualitative research in sport, exercise and health, 11(4):589–597, 2019

  21. S. A. Cambo and D. Gergle. Model positionality and computational reflexivity: Promoting reflexivity in data science. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, pages 1–19, 2022

  22. C. Cant, J. Muldoon, and M. Graham. Feeding the machine: The hidden human labor powering AI. Bloomsbury Publishing USA, 2024

  23. D. W. Carbado. Colorblind intersectionality. Signs: Journal of Women in Culture and Society, 38(4):811–845, 2013

  24. B. Christian. The alignment problem: Machine learning and human values. WW Norton & Company, 2020

  25. V. Clarke and V. Braun. Thematic analysis. The journal of positive psychology, 12(3):297–298, 2017

  26. P. H. Collins. Black feminist thought: Knowledge, consciousness, and the politics of empowerment. Routledge, 2022

  27. K. W. Crenshaw. Mapping the margins: Intersectionality, identity politics, and violence against women of color. In The public nature of private violence, pages 93–118. Routledge, 2013

  28. A. M. Davani, M. Díaz, and V. Prabhakaran. Dealing with disagreements: Looking beyond the majority vote in subjective annotations. Transactions of the Association for Computational Linguistics, 10:92–110, 2022

  29. N. Deng, X. F. Zhang, S. Liu, W. Wu, L. Wang, and R. Mihalcea. You are what you annotate: Towards better models through annotator representations. In The 2023 Conference on Empirical Methods in Natural Language Processing, 2023

  30. M. Díaz, I. Johnson, A. Lazar, A. M. Piper, and D. Gergle. Addressing age-related bias in sentiment analysis. In Proceedings of the 2018 chi conference on human factors in computing systems, pages 1–14, 2018

  31. C. D’Ignazio and L. F. Klein. Data feminism. MIT Press, 2023

  32. J. Dunlosky and J. Metcalfe. Metacognition. Sage Publications, 2008

  33. H. Ekbia and B. Nardi. Heteromation and its (dis)contents: The invisible division of labor between humans and machines. First Monday, 2014

  34. S. Fazelpour and W. Fleisher. The value of disagreement in ai design, evaluation, and alignment. In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency, pages 2138–2150, 2025

  35. E. Fleisig, R. Abebe, and D. Klein. When the majority is wrong: Modeling annotator disagreement for subjective tasks. In H. Bouamor, J. Pino, and K. Bali, editors, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 6715–6726, Singapore, Dec. 2023. Association for Computational Linguistics

  36. E. Fleisig, S. L. Blodgett, D. Klein, and Z. Talat. The perspectivist paradigm shift: Assumptions and challenges of capturing human labels. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies ...

  37. D. E. Forsythe. Engineering knowledge: The construction of knowledge in artificial intelligence. Social studies of science, 23(3):445–477, 1993

  38. C. Frauenberger. Critical realist hci. In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems, pages 341–351, 2016

  39. D. A. Freedman. Ecological inference and the ecological fallacy. International Encyclopedia of the social & Behavioral sciences, 6(4027-4030):1–7, 1999

  40. S. Frenda, G. Abercrombie, V. Basile, A. Pedrani, R. Panizzon, A. T. Cignarella, C. Marco, and D. Bernardi. Perspectivist approaches to natural language processing: a survey. Language Resources and Evaluation, 59(2):1719–1746, 2025

  41. L. A. Fujii. Killing neighbors: Webs of violence in Rwanda. Cornell University Press, 2017

  42. U. Gadiraju, A. Checco, N. Gupta, and G. Demartini. Modus operandi of crowd workers: The invisible role of microtask work environments. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 1(3):1–29, 2017

  43. B. Gaver, T. Dunne, and E. Pacenti. Design: cultural probes. interactions, 6(1):21–29, 1999

  44. S. J. Gentles, S. M. Jack, D. B. Nicholas, and K. A. McKibbon. Critical approach to reflexivity in grounded theory. The Qualitative Report, 19(44):1–14, 2014

  45. M. Geva, Y. Goldberg, and J. Berant. Are we modeling the task or the annotator? an investigation of annotator bias in natural language understanding datasets. In 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, pages 1161–1166. Association for Compu...

  46. B. Ghai, Q. V. Liao, Y. Zhang, and K. Mueller. Measuring social biases of crowd workers using counterfactual queries. arXiv preprint arXiv:2004.02028, 2020

  47. M. L. Gray and S. Suri. Ghost work: How to stop Silicon Valley from building a new global underclass. Harper Business, 2019

  48. D. Haraway. Situated knowledges: The science question in feminism and the privilege of partial perspective 1. In Women, science, and technology, pages 455–472. Routledge, 2013

  49. S. Harding. “strong objectivity”: A response to the new objectivity question. Synthese, 104:331–349, 1995

  50. S. Harding. Rethinking standpoint epistemology: What is “strong objectivity”? In Feminist epistemologies, pages 49–82. Routledge, 2013

  51. S. G. Harding. The feminist standpoint theory reader: Intellectual and political controversies. Psychology Press, 2004

  52. M. Henry, P. Higate, and G. Sanghera. Positionality and power: The politics of peacekeeping research. International Peacekeeping, 16(4):467–482, 2009

  53. E. Herrewijnen, D. Nguyen, F. Bex, and K. van Deemter. Human-annotated rationales and explainable text classification: a survey. Frontiers in Artificial Intelligence, 7:1260952, 2024

  54. B. Hooks. Feminist theory: From margin to center. Pluto Press, 2000

  55. C. Hopf and C. Schmidt. Zum verhältnis von innerfamilialen sozialen erfahrungen, persönlichkeitsentwicklung und politischen orientierungen: Dokumentation und erörterung des methodischen vorgehens in einer studie zu diesem thema. 1993

  56. C. Hube, B. Fetahu, and U. Gadiraju. Understanding and mitigating worker biases in the crowdsourced collection of subjective judgments. In Proceedings of the 2019 CHI conference on human factors in computing systems, pages 1–12, 2019

  57. L. Irani, J. Vertesi, P. Dourish, K. Philip, and R. E. Grinter. Postcolonial computing: a lens on design and development. In Proceedings of the SIGCHI conference on human factors in computing systems, pages 1311–1320, 2010

  58. D. Jacobson and N. Mustafa. Social identity map: A reflexivity tool for practicing explicit positionality in critical qualitative research. International Journal of Qualitative Methods, 18:1609406919870075, 2019

  59. S. Kapania, A. S. Taylor, and D. Wang. A hunt for the snark: Annotator diversity in data practices. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, pages 1–15, 2023

  60. J. Kay, A. Kasirzadeh, and S. Mohamed. Epistemic injustice in generative ai. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, volume 7, pages 684–697, 2024

  61. U. Khurana, E. Nalisnick, A. Fokkens, and S. Swayamdipta. Crowd-calibrator: Can annotator disagreement inform calibration in subjective tasks? arXiv preprint arXiv:2408.14141, 2024

  62. H. R. Kirk, B. Vidgen, P. Röttger, and S. A. Hale. Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback. arXiv preprint arXiv:2303.05453, 2023

  63. H. R. Kirk, A. Whitefield, P. Rottger, A. M. Bean, K. Margatina, R. Mosquera-Gomez, J. Ciro, M. Bartolo, A. Williams, H. He, et al. The prism alignment dataset: What participatory, representative and individualised human feedback reveals about the subjective and multicultural alignment of large language models. Advances in Neural Information Processing Sys...

  64. T. Lazovich. Filter bubbles and affective polarization in user-personalized large language model outputs. In Proceedings on, pages 29–37. PMLR, 2023

  65. E. Leonardelli, S. Menini, A. Palmero Aprosio, M. Guerini, S. Tonelli, et al. Agreeing to disagree: Annotating offensive language datasets with annotators’ disagreement. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 10528–10539. Association for Computational Linguistics, 2021

  66. Y. Luo, D. Card, and D. Jurafsky. Detecting stance in media on global warming. arXiv preprint arXiv:2010.15149, 2020

  67. A. Mateescu and M. Elish. Ai in context: the labor of integrating new technologies. 2019

  68. N. McDonald, S. Schoenebeck, and A. Forte. Reliability and inter-rater reliability in qualitative research: Norms and guidelines for cscw and hci practice. Proceedings of the ACM on human-computer interaction, 3(CSCW):1–23, 2019

  69. M. Miceli, M. Schuessler, and T. Yang. Between subjectivity and imposition: Power dynamics in data annotation for computer vision. Proceedings of the ACM on Human-Computer Interaction, 4(CSCW2):1–25, 2020

  70. S. Mohamed, M.-T. Png, and W. Isaac. Decolonial ai: Decolonial theory as sociotechnical foresight in artificial intelligence. Philosophy & Technology, 33:659–684, 2020

  71. N. Mokhberian, M. Marmarelis, F. Hopp, V. Basile, F. Morstatter, and K. Lerman. Capturing perspectives of crowdsourced annotators in subjective learning tasks. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 7337–7349, 2024

  72. T. Nagel. The view from nowhere. Oxford University Press, 1989

  73. F. M. Olmos-Vega, R. E. Stalmeijer, L. Varpio, and R. Kahlke. A practical guide to reflexivity in qualitative research: Amee guide no. 149. Medical teacher, 45(3):241–251, 2023

  74. M. Orlikowski, P. Röttger, P. Cimiano, and D. Hovy. The ecological fallacy in annotation: Modeling human label variation goes beyond sociodemographics. In The 61st Annual Meeting Of The Association For Computational Linguistics, 2023

  75. L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744, 2022

  76. T. Pachirat. The tyranny of light. Qualitative & Multi-Method Research, 13(1), 2015

  77. D. Patton, P. Blandfort, W. Frey, M. Gaskell, and S. Karaman. Annotating social media data from vulnerable populations: Evaluating disagreement between domain experts and graduate student annotators. 2019

  78. J. Pei and D. Jurgens. When do annotator demographics matter? measuring the influence of annotator demographics with the popquorn dataset. In Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII), 2023

  79. A. S. G. Pessoa, E. Harper, I. S. Santos, and M. C. D. S. Gracino. Using reflexive interviewing to foster deep understanding of research participants’ perspectives. International journal of qualitative methods, 18:1609406918825026, 2019

  80. D. E. Pozen. The mosaic theory, national security, and the freedom of information act. Yale LJ, 115:628, 2005

Showing first 80 references.