Evaluating AI-Generated Images of Cultural Artifacts with Community-Informed Rubrics
Pith reviewed 2026-05-21 10:31 UTC · model grok-4.3
The pith
Community members help systematize the concept of cultural appropriateness, and the resulting concepts inform rubrics for evaluating AI-generated images of their artifacts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Systematized concepts of cultural appropriateness developed with community members reflect their lived experiences with each artifact and their preferences for depictions of material culture, showing that community involvement at the definition stage produces valid measures for AI evaluation.
What carries the argument
The staged measurement process that places community systematization of cultural appropriateness before operationalization into rubrics and automated application.
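As a rough illustration of the three stages (systematize, operationalize, apply), the Python sketch below wires them together. Every name in it (SystematizedConcept, Instrument, operationalize, apply_instrument, and the judge callable) is hypothetical and does not come from the paper. The judge stands in for a multimodal LLM call or a human rater, and the yes/no encoding is an assumption made only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SystematizedConcept:
    """Stage 1 output: community-authored statements about one artifact."""
    community: str
    statements: List[str]


@dataclass
class Instrument:
    """Stage 2 output: one yes/no question per systematized statement."""
    concept: SystematizedConcept
    questions: List[str]


def operationalize(concept: SystematizedConcept) -> Instrument:
    # Hypothetical encoding: each statement becomes a yes/no question.
    questions = [f"Does the image satisfy: {s}?" for s in concept.statements]
    return Instrument(concept, questions)


def apply_instrument(
    instrument: Instrument,
    image_ids: List[str],
    judge: Callable[[str, str], bool],
) -> Dict[str, float]:
    """Stage 3: return, per image, the fraction of questions judged 'yes'."""
    if not instrument.questions:
        raise ValueError("instrument has no questions")
    return {
        image_id: sum(judge(image_id, q) for q in instrument.questions)
        / len(instrument.questions)
        for image_id in image_ids
    }
```

Keeping the community-authored statements in their own structure, separate from the questions derived from them, means the operationalization step can be revised or audited without re-running the community engagement.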
Load-bearing premise
Perspectives from community engagement in the initial definition stage remain effective when converted into standardized rubrics for automatic use across many images and models.
What would settle it
Comparing rubric scores against community members' independent judgments of the same AI-generated images would show whether the rubrics actually capture their views of cultural appropriateness.
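One simple way to quantify such a comparison is chance-corrected agreement between rubric scores and community ratings. The Python sketch below computes Cohen's kappa with the standard library only. The choice of kappa, the function name, and the six example scores are assumptions for illustration; the paper does not report this analysis.

```python
from collections import Counter


def cohen_kappa(rubric_scores, community_scores):
    """Cohen's kappa between two equal-length lists of categorical scores."""
    if len(rubric_scores) != len(community_scores) or not rubric_scores:
        raise ValueError("score lists must be non-empty and the same length")
    n = len(rubric_scores)
    # Observed agreement: share of images given the same score by both sources.
    observed = sum(r == c for r, c in zip(rubric_scores, community_scores)) / n
    # Expected agreement if the two sources scored independently.
    rubric_counts = Counter(rubric_scores)
    community_counts = Counter(community_scores)
    expected = sum(
        rubric_counts[k] * community_counts[k] for k in rubric_counts
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both sources used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical 1-3 appropriateness scores for six images (illustrative only).
rubric = [3, 2, 1, 3, 2, 1]
community = [3, 2, 1, 2, 2, 1]
kappa = cohen_kappa(rubric, community)  # 0.75 for these made-up scores
```

Kappa treats the scale as unordered categories; for an ordinal scale, a weighted kappa or a rank correlation may be a better fit.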
Original abstract
Measurement is essential to improving AI performance and mitigating harms for marginalized groups. As generative AI systems are rapidly deployed across geographies and contexts, AI measurement practices must be designed to support repeatable, automatable application across different models, datasets, and evaluation settings. But the drive to automate measurement can be in tension with the ability for measurement instruments to capture the expertise and perspectives of communities impacted by AI. Recent work advocates for breaking measurement into several key stages: first moving from an abstract concept to be measured into a precise, "systematized" concept; next operationalizing the systematized concept into a concrete measurement instrument; and finally applying the measurement instrument on data to produce measurements. This opens up an opportunity to concentrate community engagement in the systematization phase before operationalizing and applying measurement instruments. In this paper, we explore how to involve communities in systematizing the concept of "cultural appropriateness" in text-to-image models' representation of culturally significant artifacts through case studies with three communities: blind and low vision individuals residing in the UK, residents of Kerala, and residents of Tamil Nadu. Our systematized concepts reflect community members' lived experiences interacting with each artifact and how they want their material culture to be depicted, demonstrating the value of community involvement in defining valid measures. We explore how these systematized concepts can be operationalized into automated measurement instruments that could be applied using a multimodal LLM-as-a-judge approach and challenges that remain. We reflect on the benefits and limitations of such approaches.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript explores involving communities in the systematization phase of measuring 'cultural appropriateness' for text-to-image models' depictions of cultural artifacts. Through case studies with blind and low vision individuals in the UK, residents of Kerala, and residents of Tamil Nadu, the authors develop systematized concepts drawn from participants' lived experiences and preferences for how their material culture should be represented. The work then examines operationalizing these concepts into automated instruments via a multimodal LLM-as-a-judge approach and reflects on benefits, limitations, and remaining challenges in achieving repeatable, automatable measurement across models and settings.
Significance. If the community-informed systematized concepts can be faithfully translated into LLM-usable rubrics without substantial loss of nuance or introduction of model-specific biases, the approach would meaningfully advance inclusive AI evaluation practices by reconciling automation needs with community expertise. The multi-community case studies provide concrete grounding for the claim that concentrating engagement in the systematization stage adds validity, and the explicit discussion of operationalization challenges is a constructive contribution to the broader measurement literature.
major comments (2)
- §4 (Operationalization and LLM-as-a-judge): The manuscript notes challenges in translating community criteria into automatable prompts or rubrics but supplies only high-level discussion rather than concrete examples of rubric items derived from specific community input (e.g., desired depictions for Kerala or Tamil Nadu artifacts) and their encoding as LLM scoring criteria. This step is load-bearing for the central claim that the approach enables repeatable, automatable instruments across settings without losing captured expertise.
- §3 (Case studies): The systematized concepts are presented as reflecting lived experiences, yet the text provides limited direct evidence (such as participant quotes, raw response summaries, or side-by-side comparisons of community input versus final systematized statements) to allow readers to evaluate the fidelity of the translation process.
minor comments (2)
- The abstract and introduction could more explicitly distinguish the three communities' distinct artifact types and cultural contexts to help readers track how findings generalize.
- Notation for the measurement stages (systematization, operationalization, application) is introduced clearly but could be reinforced with a small diagram or table summarizing the pipeline for each case study.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which identify key areas where additional detail would strengthen the manuscript's demonstration of the proposed approach. We respond to each major comment below, indicating planned revisions.
- Referee: §4 (Operationalization and LLM-as-a-judge): The manuscript notes challenges in translating community criteria into automatable prompts or rubrics but supplies only high-level discussion rather than concrete examples of rubric items derived from specific community input (e.g., desired depictions for Kerala or Tamil Nadu artifacts) and their encoding as LLM scoring criteria. This step is load-bearing for the central claim that the approach enables repeatable, automatable instruments across settings without losing captured expertise.
  Authors: We agree that concrete examples are necessary to support the claim of faithful translation into automatable instruments. In the revised manuscript we will add specific rubric items drawn from the Kerala and Tamil Nadu case studies, including examples of desired depictions (such as accurate rendering of temple architecture or traditional motifs) and their direct encoding as LLM scoring criteria with sample prompts and scales. This will be presented alongside discussion of remaining challenges to avoid overstating generalizability.
  Revision: yes
- Referee: §3 (Case studies): The systematized concepts are presented as reflecting lived experiences, yet the text provides limited direct evidence (such as participant quotes, raw response summaries, or side-by-side comparisons of community input versus final systematized statements) to allow readers to evaluate the fidelity of the translation process.
  Authors: We acknowledge the value of greater transparency in showing the translation process. The revised §3 will incorporate selected participant quotes, summarized raw responses, and side-by-side comparisons between community inputs and the final systematized statements for the three case studies, enabling readers to assess fidelity more directly while respecting participant confidentiality constraints.
  Revision: yes
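To make concrete what "encoding as LLM scoring criteria with sample prompts and scales" could involve, here is a minimal Python sketch. The RubricItem class, the prompt wording, the three-point scale, and the example criterion are illustrative assumptions, not the authors' rubric. No model API is called: the judge's reply is passed in as a plain string.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RubricItem:
    """One hypothetical rubric item with an ordered answer scale."""
    criterion: str
    scale: Tuple[str, ...]  # labels ordered from lowest to highest


def build_judge_prompt(artifact: str, item: RubricItem) -> str:
    """Render a rubric item as a text prompt for a multimodal judge."""
    options = "\n".join(
        f"{i}. {label}" for i, label in enumerate(item.scale, start=1)
    )
    return (
        f"You are shown an AI-generated image of: {artifact}.\n"
        f"Criterion: {item.criterion}\n"
        f"Choose one option and answer with its number only:\n{options}"
    )


def parse_score(reply: str, item: RubricItem) -> int:
    """Read the judge's reply as an integer on the item's scale, or raise."""
    tokens = reply.split()
    if not tokens:
        raise ValueError("empty reply")
    score = int(tokens[0])  # raises ValueError if the reply is not a number
    if not 1 <= score <= len(item.scale):
        raise ValueError(f"score {score} is outside 1..{len(item.scale)}")
    return score


# Illustrative item only; real criteria would come from community input.
item = RubricItem(
    criterion="Traditional motifs are rendered accurately",
    scale=("not met", "partly met", "fully met"),
)
prompt = build_judge_prompt("example artifact", item)
```

Rejecting out-of-range or non-numeric replies, rather than coercing them, keeps judge failures visible instead of silently folding them into the scores.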
Circularity Check
No significant circularity in derivation chain
The paper is a qualitative exploration of community involvement in systematizing concepts of cultural appropriateness for AI image generation via case studies. It contains no equations, fitted parameters, predictions, or self-referential derivations that reduce claims to author-defined inputs by construction. The central claims rest on direct community input rather than any load-bearing self-citation chain or renaming of prior results. This is the usual outcome for self-contained qualitative work that is grounded in external community input.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Community members' lived experiences provide the authoritative basis for defining valid measures of cultural appropriateness in AI image generation.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "We explore how to involve communities in systematizing the concept of 'cultural appropriateness' ... operationalized into automated measurement instruments that could be applied using a multimodal LLM-as-a-judge approach"
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "breaking measurement into several key stages: first moving from an abstract concept ... into a precise, 'systematized' concept; next operationalizing ..."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.