pith.

arxiv: 2411.10915 · v2 · submitted 2024-11-16 · 💻 cs.CL · cs.LG

Bias in Large Language Models: Origin, Evaluation, and Mitigation

Pith reviewed 2026-05-23 17:04 UTC · model grok-4.3

classification 💻 cs.CL cs.LG
keywords LLM bias, intrinsic bias, extrinsic bias, bias evaluation, bias mitigation, fair AI, responsible AI, NLP tasks

The pith

Biases in large language models arise from data and context and can be detected and reduced through staged evaluation and mitigation methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models produce biased outputs across natural language tasks because their training data and deployment settings embed systematic preferences. The review divides these biases into intrinsic forms tied to model internals and extrinsic forms arising from external use. It surveys detection approaches at the data, model, and output stages and mitigation approaches before, during, and after model training. The stakes are practical: biased outputs can produce unfair results in high-stakes settings such as medical advice and legal decisions. The synthesis gives researchers a structured set of tools and techniques for reducing such effects.

Core claim

The paper establishes that bias in LLMs manifests as intrinsic biases rooted in training data and architecture and extrinsic biases introduced during application, that these can be measured with data-level, model-level, and output-level methods, and that they can be addressed by pre-model, intra-model, and post-model mitigation techniques, thereby supporting the development of fairer AI systems.

What carries the argument

The categorization of biases into intrinsic and extrinsic types, together with the division of evaluation into data-, model-, and output-level methods and of mitigation into pre-model, intra-model, and post-model stages.
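To make the output-level stage concrete, here is a minimal sketch of one common output-level measure, the demographic parity gap: the largest difference in the rate of favourable model outputs between groups. The metric choice, the group labels, and the decision data are illustrative assumptions, not taken from the review, which surveys many such measures without prescribing one.

```python
from collections import defaultdict


def positive_rate_by_group(records):
    """Return the share of favourable outcomes for each group.

    `records` is an iterable of (group, outcome) pairs, where outcome is
    True when the model's output was favourable (e.g. "hire").
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, outcome in records:
        totals[group] += 1
        if outcome:
            positives[group] += 1
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(records):
    """Largest difference in favourable-outcome rate between any two groups."""
    rates = positive_rate_by_group(records)
    if not rates:
        raise ValueError("no records to evaluate")
    return max(rates.values()) - min(rates.values())


# Illustrative only: hypothetical model decisions on otherwise identical
# hiring prompts that differ only in the group mentioned.
decisions = [
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", False),
]
print(positive_rate_by_group(decisions))  # {'group_a': 0.75, 'group_b': 0.25}
print(demographic_parity_gap(decisions))  # 0.5
```

A gap of 0 means every group receives favourable outputs at the same rate; what counts as an acceptable gap is a policy choice that the metric itself does not settle.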

If this is right

  • Biased models can produce harmful decisions in healthcare and criminal justice applications.
  • Evaluation at multiple levels (data, model, output) allows earlier detection of bias than output checks alone.
  • Mitigation works best when applied across the full model lifecycle rather than at a single stage.
  • Legal and ethical review of LLM deployments must account for both intrinsic and extrinsic bias sources.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The framework could be extended by testing whether new task-specific biases in areas such as code generation fit the existing intrinsic/extrinsic split.
  • A practical next step would be to map the mitigation techniques onto measurable fairness metrics that regulators could adopt.
  • If the staged approach proves effective, model developers might adopt it as a default checklist during training and deployment.

Load-bearing premise

The review assumes that its chosen categorization of biases into intrinsic and extrinsic, along with the selected evaluation and mitigation methods, provides a representative and useful overview of the field without significant omissions or selection bias in the surveyed literature.

What would settle it

A comprehensive survey of recent LLM bias papers that identifies a major category or effective mitigation approach falling outside the intrinsic/extrinsic and pre/intra/post-model frameworks would show the review's organization is incomplete.

Original abstract

Large Language Models (LLMs) have revolutionized natural language processing, but their susceptibility to biases poses significant challenges. This comprehensive review examines the landscape of bias in LLMs, from its origins to current mitigation strategies. We categorize biases as intrinsic and extrinsic, analyzing their manifestations in various NLP tasks. The review critically assesses a range of bias evaluation methods, including data-level, model-level, and output-level approaches, providing researchers with a robust toolkit for bias detection. We further explore mitigation strategies, categorizing them into pre-model, intra-model, and post-model techniques, highlighting their effectiveness and limitations. Ethical and legal implications of biased LLMs are discussed, emphasizing potential harms in real-world applications such as healthcare and criminal justice. By synthesizing current knowledge on bias in LLMs, this review contributes to the ongoing effort to develop fair and responsible AI systems. Our work serves as a comprehensive resource for researchers and practitioners working towards understanding, evaluating, and mitigating bias in LLMs, fostering the development of more equitable AI technologies.
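For a sense of what a pre-model (data-level) mitigation can look like, the sketch below implements a toy version of counterfactual data augmentation, a technique widely discussed in the bias literature: each training sentence is paired with a copy in which gendered words are swapped, so that gendered terms co-occur equally with the rest of each sentence. The swap list and the function names are hypothetical; real word lists are far larger and must resolve ambiguous forms such as "her". This is an illustration, not the review's own method.

```python
import re

# Hypothetical toy swap list. "her" is deliberately absent as a key because it
# maps to either "his" or "him" depending on grammar, which this sketch ignores.
SWAPS = {
    "he": "she", "she": "he",
    "him": "her", "his": "her",
    "man": "woman", "woman": "man",
    "men": "women", "women": "men",
}
# Whole-word, case-insensitive match on any key in SWAPS.
PATTERN = re.compile(r"\b(" + "|".join(SWAPS) + r")\b", re.IGNORECASE)


def _swap(match):
    word = match.group(0)
    replacement = SWAPS[word.lower()]
    # Keep an initial capital, e.g. "He" -> "She".
    return replacement.capitalize() if word[0].isupper() else replacement


def counterfactual(sentence):
    """Return the sentence with every listed gendered word swapped."""
    return PATTERN.sub(_swap, sentence)


def augment(corpus):
    """Return the original sentences followed by their counterfactual copies."""
    return corpus + [counterfactual(s) for s in corpus]


corpus = ["He is a doctor and his sister is a nurse."]
print(augment(corpus))
# ['He is a doctor and his sister is a nurse.', 'She is a doctor and her sister is a nurse.']
```

Because the swap is applied to the training data before the model ever sees it, this sits in the pre-model stage; it leaves the architecture and decoding unchanged, which is also why it cannot correct biases introduced later at deployment.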

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper is a review that categorizes bias in LLMs as intrinsic versus extrinsic, surveys evaluation methods at data/model/output levels, organizes mitigation into pre/intra/post-model techniques, and discusses ethical/legal implications in applications such as healthcare and justice, with the central claim that this synthesis advances fair and responsible AI.

Significance. A well-structured synthesis of bias literature could provide a practical reference for the field if the coverage is representative; however, the lack of any documented search protocol means the contribution rests on unverified curation rather than systematic aggregation.

major comments (1)
  1. [Abstract and Introduction] The manuscript states it provides a 'comprehensive review' and 'synthesizing current knowledge' (abstract) but contains no description of literature search methodology, databases, keywords, date ranges, inclusion criteria, or number of papers screened. This omission directly undermines the representativeness claim and the utility of the intrinsic/extrinsic and pre/intra/post taxonomies as a 'robust toolkit'.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thoughtful feedback on our review paper. We address the major comment regarding the absence of a documented literature search methodology below and outline revisions that will improve transparency while preserving the paper's value as a synthesized reference.

Point-by-point responses
  1. Referee: [Abstract and Introduction] No description of the literature search methodology, databases, keywords, date ranges, inclusion criteria, or number of papers screened (major comment 1 above).

    Authors: We acknowledge the validity of this observation. The current manuscript presents a narrative synthesis of key literature rather than a systematic review following protocols such as PRISMA. To address the concern, we will add a dedicated subsection (likely in the Introduction) that explicitly describes the literature scope: primary sources include arXiv, ACL Anthology, NeurIPS, and Google Scholar; coverage focuses on works from 2018 to October 2024; inclusion was guided by relevance to LLM bias origins, evaluation benchmarks, and mitigation strategies, with emphasis on highly cited and representative papers. We will also moderate phrasing from 'comprehensive review' to 'extensive review' and 'synthesizing current knowledge' to 'synthesizing key developments' to align with the narrative nature of the work. These changes will clarify the basis for the intrinsic/extrinsic and pre/intra/post categorizations without overstating systematic aggregation, thereby supporting their utility as a practical reference.

    Revision: yes

Circularity Check

0 steps flagged

Review paper aggregates external literature without internal circular derivations

Full rationale

This is a review paper synthesizing existing literature on LLM bias. The abstract describes categorization of biases (intrinsic/extrinsic) and mitigation strategies (pre/intra/post-model) drawn from surveyed works, with no original equations, predictions, fitted parameters, or derivations presented. No load-bearing self-citations or self-definitional steps are evident in the provided text; the central synthesis claim rests on external sources rather than reducing to inputs defined within the paper itself.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This paper is a literature review and introduces no new free parameters, axioms, or invented entities; all content rests on synthesis of previously published research.

pith-pipeline@v0.9.0 · 5725 in / 1088 out tokens · 28368 ms · 2026-05-23T17:04:18.628730+00:00 · methodology


Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ReLay: Personalized LLM-Generated Plain-Language Summaries for Better Understanding, but at What Cost?

    cs.CL 2026-05 unverdicted novelty 7.0

    Personalized LLM-generated plain language summaries improve lay readers' comprehension and quality ratings but increase risks of reinforcing biases and introducing hallucinations compared to static expert summaries.

  2. Counting Worlds Branching Time Semantics for post-hoc Bias Mitigation in generative AI

    cs.LO 2026-04 unverdicted novelty 7.0

    CTLF is a branching-time logic with counting-worlds semantics for verifying fairness in probability distributions over protected attributes, predicting bias bounds, and calculating outputs to remove in generative AI series.

  3. When AI reviews science: Can we trust the referee?

    cs.AI 2026-04 unverdicted novelty 6.0

    AI peer review systems are vulnerable to prompt injections, prestige biases, assertion strength effects, and contextual poisoning, as demonstrated by a new attack taxonomy and causal experiments on real conference sub...

  4. Safe for Whom? Rethinking How We Evaluate the Safety of LLMs for Real Users

    cs.AI 2025-12 unverdicted novelty 6.0

    LLM safety evaluations for personal advice must test responses against diverse user vulnerability profiles, since context-blind ratings overestimate safety and realistic prompt context does not fix the problem.

  5. A Study of LLMs' Preferences for Libraries and Programming Languages

    cs.SE 2025-03 unverdicted novelty 6.0

    Empirical study of eight LLMs finds overuse of popular libraries like NumPy in up to 45% of unnecessary cases and strong default preference for Python even when suboptimal.

  6. FAIR_XAI: Improving Multimodal Foundation Model Fairness via Explainability for Wellbeing Assessment

    cs.AI 2026-04 unverdicted novelty 4.0

    Vision-language models for wellbeing assessment exhibit dataset-dependent performance and demographic biases, with explainability interventions providing inconsistent fairness gains at potential accuracy costs.

  7. A Survey on LLM-as-a-Judge

    cs.CL 2024-11 unverdicted novelty 4.0

    A survey on LLM-as-a-Judge that reviews reliability strategies, proposes evaluation methods, and introduces a novel benchmark for assessing such systems.

Reference graph

Works this paper leans on

82 extracted references · 82 canonical work pages · cited by 7 Pith papers · 19 internal anchors

  1. [1]

    Persistent an ti-muslim bias in large language models

    Abubakar Abid, Maheen Farooqi, and James Zou. Persistent an ti-muslim bias in large language models. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, page 298–306, New York, NY, USA,

  2. [2]

    Arif Ahmad and Pushpak Bhattacharyya

    Association for Computing Machinery. Arif Ahmad and Pushpak Bhattacharyya. Bias in language mode ls: A survey. Jaimeen Ahn and Alice Oh. Mitigating language-dependent et hnic bias in BERT. In Proceedings of the 2021 Conference on Empirical Methods in Natural Langua ge Processing, Online and Punta Cana, Dominican Republic, November

  3. [3]

    AJ Alvero, Jinsook Lee, Alejandra Regla-Vargas, Rene Kizil ec, Thorsten Joachims, and Anthony Lis- ing Antonio

    doi: 10.1109/TAC.1974.1100705. AJ Alvero, Jinsook Lee, Alejandra Regla-Vargas, Rene Kizil ec, Thorsten Joachims, and Anthony Lis- ing Antonio. Large language models, social demography, and hegemony: Comparing authorship in human and synthetic text. Preprint, pages 1–25,

  4. [4]

    Do large language models discriminate in hiring decisions on the basis of race , ethnicity, and gender? arXiv preprint arXiv:2406.10486,

    Haozhe An, Christabel Acquaye, Colin Wang, Zongxia Li, and R achel Rudinger. Do large language models discriminate in hiring decisions on the basis of race , ethnicity, and gender? arXiv preprint arXiv:2406.10486,

  5. [5]

    Machine bias

    Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine bias. ProPublica, 23(2016): 139–159,

  6. [6]

    Fairmonitor: A dual-framework for detecting stereotypes and biases in lar ge language models

    Yanhong Bai, Jiabao Zhao, Jinxin Shi, Zhentao Xie, Xingjiao Wu, and Liang He. Fairmonitor: A dual-framework for detecting stereotypes and biases in lar ge language models. arXiv preprint arXiv:2405.03098,

  7. [7]

    Evaluating the Underlying Gender Bias in Contextualized Word Embeddings

    As- sociation for Computational Linguistics. Christine Basta, Marta R Costa-Jussà, and Noe Casas. Evalua ting the underlying gender bias in contextualized word embeddings. arXiv preprint arXiv:1904.08783 ,

  8. [8]

    On the dangers of stochastic parrots: Can language models be too bi g

    Emily M Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. On the dangers of stochastic parrots: Can language models be too bi g. In Proceedings of the 2021 ACM conference on fairness, accountability, and transparency , pages 610–623,

  9. [9]

    Large image datasets: A pyrrhic win for computer vision? In 2021 IEEE Winter Conference on Applications of Computer Visi on (W ACV), pages 1536–1546

    Abeba Birhane and Vinay Uday Prabhu. Large image datasets: A pyrrhic win for computer vision? In 2021 IEEE Winter Conference on Applications of Computer Visi on (W ACV), pages 1536–1546. IEEE,

  10. [10]

    Racial Disparity in Natural Language Processing: A Case Study of Social Media African-American English

    Su Lin Blodgett and Brendan O’Connor. Racial disparity in na tural language processing: A case study of social media african-american english. arXiv preprint arXiv:1707.00061 ,

  11. [11]

    A large annotated corpus for learning natural language inference

    Samuel R Bowman, Gabor Angeli, Christopher Potts, and Chris topher D Manning. A large anno- tated corpus for learning natural language inference. arXiv preprint arXiv:1508.05326 ,

  12. [12]

    Language models are few-shot learners

    21 Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jar ed Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda As kell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems , 33:1877–1901,

  13. [13]

    SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation

    Daniel Cer, Mona Diab, Eneko Agirre, Inigo Lopez-Gazpio, an d Lucia Specia. Semeval-2017 task 1: Semantic textual similarity-multilingual and cross-li ngual focused evaluation. arXiv preprint arXiv:1708.00055,

  14. [14]

    My f air lady: Detecting and mitigating bias in job advertisements

    Michelle Chen, Zhu Ma, Aniko Hannak, and Christo Wilson. My f air lady: Detecting and mitigating bias in job advertisements. Proceedings of the 2018 World Wide Web Conference , pages 991–1000,

  15. [15]

    Enhanced lstm for natural language inference

    Qian Chen, Xiaodan Zhu, Zhenhua Ling, Si Wei, Hui Jiang, and D iana Inkpen. Enhanced lstm for natural language inference. arXiv preprint arXiv:1609.06038 ,

  16. [16]

    Interactive analysis of llms using mea ningful counterfactuals

    Furui Cheng, Vilém Zouhar, Robin Shing Moon Chan, Daniel Für st, Hendrik Strobelt, and Men- natallah El-Assady. Interactive analysis of llms using mea ningful counterfactuals. arXiv preprint arXiv:2405.00708,

  17. [17]

    Improving n eural conversational models with entropy-based data filtering

    Richárd Csáky, Patrik Purgai, and Gábor Recski. Improving n eural conversational models with entropy-based data filtering. arXiv preprint arXiv:1905.05471 ,

  18. [18]

    Ad- vances in neural information processing systems , 33:4271–4282

    22 Debarati Das, Karin De Langis, Anna Martin, Jaehyung Kim, Mi nhwa Lee, Zae Myung Kim, Shirley Hayati, Risako Owan, Bin Hu, Ritik Parkar, et al. Under the su rface: Tracking the artifactuality of llm-generated data. arXiv preprint arXiv:2401.14698 ,

  19. [19]

    Semantic change character- ization with llms using rhetorics

    Jader Martins Camboim de Sá, Marcos Da Silveira, and Cédric P ruski. Semantic change character- ization with llms using rhetorics. arXiv preprint arXiv:2407.16624 ,

  20. [20]

    On measures of biases and harms in nlp

    Sunipa Dev, Emily Sheng, Jieyu Zhao, Aubrie Amstutz, Jiao Su n, Yu Hou, Mattie Sanseverino, Jiin Kim, Akihiro Nishi, Nanyun Peng, et al. On measures of biases and harms in nlp. arXiv preprint arXiv:2108.03362,

  21. [21]

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Tout anova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 ,

  22. [22]

    Query Expansion with Locally-Trained Word Embeddings

    Fernando Diaz, Bhaskar Mitra, and Nick Craswell. Query expa nsion with locally-trained word embeddings. arXiv preprint arXiv:1605.07891 ,

  23. [23]

    Addressing age- related bias in sentiment analysis

    Mark Díaz, Isaac Johnson, Amanda Lazar, Anne Marie Piper, an d Darren Gergle. Addressing age- related bias in sentiment analysis. In Proceedings of the 2018 chi conference on human factors in computing systems , pages 1–14,

  24. [24]

    Evaluating vocab ulary usage in llms

    Matthew Durward and Christopher Thomson. Evaluating vocab ulary usage in llms. In Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educa tional Applications (BEA 2024), pages 266–282,

  25. [25]

    Cognitive bias in high- stakes decision-making with llms

    Jessica Echterhoff, Yao Liu, Abeer Alessa, Julian McAuley, a nd Zexue He. Cognitive bias in high- stakes decision-making with llms. arXiv preprint arXiv:2403.00811 ,

  26. [26]

    Robbie: Robust bias evaluation of large generative language models

    David Esiobu, Xiaoqing Tan, Saghar Hosseini, Megan Ung, Yuc hen Zhang, Jude Fernandes, Jane Dwivedi-Yu, Eleonora Presani, Adina Williams, and Eric Smi th. Robbie: Robust bias evaluation of large generative language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , pages 3764–3814,

  27. [27]

    AllenNLP: A Deep Semantic Natural Language Processing Platform

    Matt Gardner, Joel Grus, Mark Neumann, Oyvind Tafjord, Prad eep Dasigi, Nelson Liu, Matthew Peters, Michael Schmitz, and Luke Zettlemoyer. Allennlp: A deep semantic natural language processing platform. arXiv preprint arXiv:1803.07640 ,

  28. [28]

    He is very intelligent , she is very beautiful? on mitigating social biases in language modelling and generation

    Aparna Garimella, Akhash Amarnath, Kiran Kumar, Akash Pram od Yalla, N Anandhavelu, Niyati Chhaya, and Balaji Vasan Srinivasan. He is very intelligent , she is very beautiful? on mitigating social biases in language modelling and generation. In Findings of the Association for Computa- tional Linguistics: ACL-IJCNLP 2021 , pages 4534–4545,

  29. [29]

    Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, and Noah A. Smith. RealToxici- tyPrompts: Evaluating neural toxic degeneration in langua ge models. In Trevor Cohn, Yulan He, and Yang Liu, editors, Findings of the Association for Computational Linguistics : EMNLP 2020 , pages 3356–3369. Association for Computational Linguisti cs,

  30. [30]

    Statistical challenges wit h dataset construction: Why you will never have enough images

    Josh Goldman and John K Tsotsos. Statistical challenges wit h dataset construction: Why you will never have enough images. arXiv preprint arXiv:2408.11160 ,

  31. [31]

    Unboxing occupat ional bias: Grounded debiasing llms with us labor data

    Atmika Gorti, Manas Gaur, and Aman Chadha. Unboxing occupat ional bias: Grounded debiasing llms with us labor data. arXiv preprint arXiv:2408.11247 ,

  32. [32]

    Sentime nt analysis with nlp on twitter data

    24 Md Rakibul Hasan, Maisha Maliha, and M Arifuzzaman. Sentime nt analysis with nlp on twitter data. In 2019 international conference on computer, communication, c hemical, materials and electronic engineering (IC4ME2) , pages 1–4. IEEE,

  33. [33]

    Data Mining, Inference, and Prediction

    URL https://doi.org/10.1007/978-0-387-84858-7 . Lucy Havens, Melissa Terras, Benjamin Bach, and Beatrice Al ex. Uncertainty and inclusivity in gender bias annotation: An annotation taxonomy and annotat ed datasets of british english text. In 4th Workshop on Gender Bias in Natural Language Processing at NAACL, pages 30–57. ACL Anthology,

  34. [34]

    URL https://www.jstor.org/stable/1912352

    doi: 10.2307/1912352. URL https://www.jstor.org/stable/1912352. Lisa Anne Hendricks, Kaylee Burns, Kate Saenko, Trevor Darr ell, and Anna Rohrbach. Women also snowboard: Overcoming bias in captioning models. In Proceedings of the European conference on computer vision (ECCV) , pages 771–787,

  35. [35]

    A structural probe fo r finding syntax in word representa- tions

    John Hewitt and Christopher D Manning. A structural probe fo r finding syntax in word representa- tions. In Proceedings of the 2019 Conference of the North American Chapt er of the Association for Computational Linguistics , pages 4129–4138,

  36. [36]

    Li and Kevin Hou

    Chuan Tian Hongfei Li, Qian H. Li and Kevin Hou. Issues in cox p roportional hazards model with unequal randomization. Journal of Biopharmaceutical Statistics , 0(0):1–6, 2024a. doi: 10.1080/ 10543406.2024.2418139. URL https://doi.org/10.1080/10543406.2024.2418139. PMID: 39445665. Chuan Tian Hongfei Li, Qian H. Li and Kevin Hou. Issues in cox p roportiona...

  37. [37]

    The importance of modeling social fa ctors of language: Theory and practice

    Dirk Hovy and Diyi Yang. The importance of modeling social fa ctors of language: Theory and practice. In Proceedings of the 2021 Conference of the North American Chapt er of the Association for Computational Linguistics: Human language technologi es, pages 588–602,

  38. [38]

    Toxicity detection for free

    Zhanhao Hu, Julien Piet, Geng Zhao, Jiantao Jiao, and David W agner. Toxicity detection for free. arXiv preprint arXiv:2405.18822 ,

  39. [39]

    Up5: Unbiased foun- dation model for fairness-aware recommendation

    25 Wenyue Hua, Yingqiang Ge, Shuyuan Xu, Jianchao Ji, and Yongf eng Zhang. Up5: Unbiased foun- dation model for fairness-aware recommendation. arXiv preprint arXiv:2305.12090 ,

  40. [40]

    Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations

    Hakan Inan, Kartikeya Upasani, Jianfeng Chi, Rashi Rungta, Krithika Iyer, Yuning Mao, Michael Tontchev, Qing Hu, Brian Fuller, Davide Testuggine, et al. L lama guard: Llm-based input-output safeguard for human-ai conversations. arXiv preprint arXiv:2312.06674 ,

  41. [41]

    Ctrl: A conditional transformer language model for controllable generation

    Nitish Shirish Keskar, Bryan McCann, Lav R Varshney, Caimin g Xiong, and Richard Socher. Ctrl: A conditional transformer language model for controllable generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Process ing, pages 111–129,

  42. [42]

    Eval uating the diversity, equity and inclusion of nlp technology: A case study for indian languag es

    Simran Khanuja, Sebastian Ruder, and Partha Talukdar. Eval uating the diversity, equity and inclusion of nlp technology: A case study for indian languag es. arXiv preprint arXiv:2205.12676 ,

  43. [43]

    Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems

    Svetlana Kiritchenko and Saif M Mohammad. Examining gender and race bias in two hundred sentiment analysis systems. arXiv preprint arXiv:1805.04508 ,

  44. [44]

    Can ll ms recognize toxic- ity? structured toxicity investigation framework and sema ntic-based metric

    Hyukhun Koh, Dohyung Kim, Minwoo Lee, and Kyomin Jung. Can ll ms recognize toxic- ity? structured toxicity investigation framework and sema ntic-based metric. arXiv preprint arXiv:2402.06900,

  45. [45]

    Measuring Bias in Contextualized Word Representations

    Keita Kurita, Nidhi Vyas, Ayush Pareek, Alan W Black, and Yul ia Tsvetkov. Measuring bias in contextualized word representations. arXiv preprint arXiv:1906.07337 ,

  46. [46]

    Neural embed- ding of beliefs reveals the role of relative dissonance in hu man decision-making

    Byunghwee Lee, Rachith Aiyappa, Yong-Yeol Ahn, Haewoon Kwa k, and Jisun An. Neural embed- ding of beliefs reveals the role of relative dissonance in hu man decision-making. arXiv preprint arXiv:2408.07237,

  47. [47]

    End-to-end Neural Coreference Resolution

    Kenton Lee, Luheng He, Mike Lewis, and Luke Zettlemoyer. End -to-end neural coreference resolution. arXiv preprint arXiv:1707.07045 ,

  48. [48]

    Comparing biases and the im pact of multilingual training across multiple languages

    Sharon Levy, Neha John, Ling Liu, Yogarshi Vyas, Jie Ma, Yosh inari Fujinuma, Miguel Ballesteros, Vittorio Castelli, and Dan Roth. Comparing biases and the im pact of multilingual training across multiple languages. In Proceedings of the 2023 Conference on Empirical Methods in Na tural Language Processing, Singapore, December

  49. [49]

    Steer- ing llms towards unbiased responses: A causality-guided de biasing framework

    Jingling Li, Zeyu Tang, Xiaoyu Liu, Peter Spirtes, Kun Zhang , Liu Leqi, and Yang Liu. Steer- ing llms towards unbiased responses: A causality-guided de biasing framework. arXiv preprint arXiv:2403.08743, 2024a. Weitao Li, Junkai Li, Weizhi Ma, and Yang Liu. Citation-enha nced generation for llm-based chatbot. arXiv preprint arXiv:2402.16063 , 2024b. Ying...

  50. [50]

    On Measuring Social Biases in Sentence Encoders

    URL https://api.semanticscholar.org/CorpusID:202541569. Chandler May, Alex Wang, Shikha Bordia, Samuel R Bowman, and Rachel Rudinger. On measuring social biases in sentence encoders. arXiv preprint arXiv:1903.10561 ,

  51. [51]

    Text classification using label names only: A language model self -training approach

    Yu Meng, Yunyi Zhang, Jiaxin Huang, Chenyan Xiong, Heng Ji, C hao Zhang, and Jiawei Han. Text classification using label names only: A language model self -training approach. arXiv preprint arXiv:2010.07245,

  52. [52]

    Global gallery: The fine art of painting culture portraits through multilingual instruction tuning

    Anjishnu Mukherjee, Aylin Caliskan, Ziwei Zhu, and Antonio s Anastasopoulos. Global gallery: The fine art of painting culture portraits through multilingual instruction tuning. In Proceedings of the 2024 Conference of the North American Chapter of the Associati on for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages 6398–6415,

  53. [53]

    Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond

    Ramesh Nallapati, Bowen Zhou, Caglar Gulcehre, Bing Xiang, et al. Abstractive text summarization using sequence-to-sequence rnns and beyond. arXiv preprint arXiv:1602.06023 ,

  54. [54]

    URL https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-9-56

    doi: 10.1186/1471-2288-9-56. URL https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-9-56. Davide Neri, Jacopo Soldani, Olaf Zimmermann, and Antonio B rogi. Design principles, architectural smells and refactorings for microservices: a multivocal re view. SICS Software-Intensive Cyber- Physical Systems , 35:3–15,

  55. [55]

    Do multilingual large language models mitigate stereotype bias? In Proceedings of the 2nd Workshop on Cross-Cultural Considera tions in NLP , Bangkok, Thailand, August 2024a

    Shangrui Nie, Michael Fromm, Charles Welch, Rebekka Görge, Akbar Karimi, Joan Plepi, Nazia Mowmita, Nicolas Flores-Herr, Mehdi Ali, and Lucie Flek. Do multilingual large language models mitigate stereotype bias? In Proceedings of the 2nd Workshop on Cross-Cultural Considera tions in NLP , Bangkok, Thailand, August 2024a. Association for Computa tional Lin...

  56. [56]

    Competent men and warm women: Gender stereo- types and backlash in image search results

    Jahna Otterbacher, Jo Bates, and Paul Clough. Competent men and warm women: Gender stereo- types and backlash in image search results. In Proceedings of the 2017 chi conference on human factors in computing systems , pages 6620–6631,

  57. [57]

    Reducing gender bia s in abusive language detection

    Ji Ho Park, Jamin Shin, and Pascale Fung. Reducing gender bia s in abusive language detection. In Proceedings of the 2018 Conference on Empirical Methods in Na tural Language Processing , Brussels, Belgium, October-November

  58. [58]

    Models and dat asets for cross-lingual summarisation

    Laura Perez-Beltrachini and Mirella Lapata. Models and dat asets for cross-lingual summarisation. arXiv preprint arXiv:2202.09583 ,

  59. [59]

    SQuAD: 100,000+ Questions for Machine Comprehension of Text

    Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Perc y Liang. Squad: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250 ,

  60. [60]

    Know What You Don't Know: Unanswerable Questions for SQuAD

    Pranav Rajpurkar, Robin Jia, and Percy Liang. Know what you d on’t know: Unanswerable questions for squad. arXiv preprint arXiv:1806.03822 ,

  61. [61]

    Gender Bias in Coreference Resolution

    doi: 10.1037/h0037350. Rachel Rudinger, Jason Naradowsky, Brian Leonard, and Benj amin Van Durme. Gender bias in coreference resolution. arXiv preprint arXiv:1804.09301 ,

  62. [62]

    Im not Racist but

    Abel Salinas, Louis Penafiel, Robert McCormack, and Fred Mor statter. " im not racist but...": Dis- covering bias in the internal knowledge of large language mo dels. arXiv preprint arXiv:2310.08780,

  63. [63]

    Xing , title =

    Zhiqiang Shen, Tianhua Tao, Liqun Ma, Willie Neiswanger, Jo el Hestness, Natalia Vassilieva, Daria Soboleva, and Eric Xing. Slimpajama-dc: Understanding dat a combinations for llm training. arXiv preprint arXiv:2309.10818 ,

  64. [64]

    The woman worked as a babysit- ter: On biases in language generation

    Emily Sheng, Kai-Wei Chang, Prem Natarajan, and Nanyun Peng . The woman worked as a babysit- ter: On biases in language generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Join t Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3407–3412,

  65. [65]

    Culturebank: An online community-driven knowl edge base towards culturally aware language technologies

    Weiyan Shi, Ryan Li, Yutong Zhang, Caleb Ziems, Raya Horesh, Rogério Abreu de Paula, Diyi Yang, et al. Culturebank: An online community-driven knowl edge base towards culturally aware language technologies. arXiv preprint arXiv:2404.15238 ,

  66. [66]

    Large language models as subpopulation representative models: A review

    Gabriel Simmons and Christopher Hare. Large language models as subpopulation representative models: A review. arXiv preprint arXiv:2310.17888.

  67. [67]

    Dropout: a simple way to prevent neural networks from overfitting

    Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958.

  68. [68]

    Mitigating Gender Bias in Natural Language Processing: Literature Review

    Tony Sun, Andrew Gaut, Shirlyn Tang, Yuxin Huang, Mai ElSherief, Jieyu Zhao, Diba Mirza, Elizabeth Belding, Kai-Wei Chang, and William Yang Wang. Mitigating gender bias in natural language processing: Literature review. arXiv preprint arXiv:1906.08976.

  69. [69]

    Lost in Translation: Loss and Decay of Linguistic Richness in Machine Translation

    Eva Vanmassenhove, Dimitar Shterionov, and Andy Way. Lost in translation: Loss and decay of linguistic richness in machine translation. arXiv preprint arXiv:1906.12068.

  70. [70]

    Attention is all you need

    A Vaswani. Attention is all you need. Advances in Neural Information Processing Systems.

  71. [71]

    Cross-lingual semantic similarity of words as the similarity of their semantic word responses

    Ivan Vulic and Marie-Francine Moens. Cross-lingual semantic similarity of words as the similarity of their semantic word responses. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2013), pages 106–116. ACL; East Stroudsburg, PA.

  72. [72]

    Large language models cannot replace human participants because they cannot portray identity groups

    Angelina Wang, Jamie Morgenstern, and John P Dickerson. Large language models cannot replace human participants because they cannot portray identity groups. arXiv preprint arXiv:2402.01908, 2024a. Xinru Wang, Hannah Kim, Sajjadur Rahman, Kushan Mitra, and Zhengjie Miao. Human-LLM collaborative annotation through effective verification of LLM labels. In P...

  73. [73]

    Measuring and reducing gendered correlations in pre-trained models

    Kellie Webster, Xuezhi Wang, Ian Tenney, Alex Beutel, Emily Pitler, Ellie Pavlick, Jilin Chen, Ed Chi, and Slav Petrov. Measuring and reducing gendered correlations in pre-trained models. arXiv preprint arXiv:2010.06032.

  74. [74]

    Auditing large language models for enhanced text-based stereotype detection and probing-based bias evaluation

    Zekun Wu, Sahan Bulathwela, Maria Perez-Ortiz, and Adriano Soares Koshiyama. Auditing large language models for enhanced text-based stereotype detection and probing-based bias evaluation. arXiv preprint arXiv:2404.01768.

  75. [75]

    FinBERT: A pretrained language model for financial communications

    Yuqi Yang, Yuan Yuan, and Lei Liu. FinBERT: A pretrained language model for financial communications. arXiv preprint arXiv:2006.08097.

  76. [76]

    Causal prompting: Debiasing large language model prompting based on front-door adjustment

    Congzhi Zhang, Linhai Zhang, Deyu Zhou, and Guoqiang Xu. Causal prompting: Debiasing large language model prompting based on front-door adjustment. arXiv preprint arXiv:2403.02738.

  77. [77]

    Deep learning based recommender system: A survey and new perspectives

    Shuai Zhang, Lina Yao, Aixin Sun, and Yi Tay. Deep learning based recommender system: A survey and new perspectives. ACM Computing Surveys (CSUR), 52(1):1–38, 2019a. Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, and Bill Dolan. DialoGPT: Large-scale generative pre-training for conversationa...

  78. [78]

    Gender Bias in Contextualized Word Embeddings

    Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. Gender bias in coreference resolution: Evaluation and debiasing methods. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics, pages 15–20, 2018a. Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei ...

  79. [79]

    Appendix A

    Association for Computational Linguistics.

    Appendix A. Examples of Extrinsic Biases

    A.1 Natural Language Understanding (NLU) tasks

    NLU encompasses a broad range of tasks that aim to improve comprehension of input sequences (Chang et al., 2024). It seeks to grasp the deeper connotations and implications inherent in human communication, focusing on wha...

  80. [80]

    “he,” “she,”

    This task is crucial for accurately interpreting the meaning of sentences, especially in cases where pronouns, names, or other referential expressions are used. The primary goal of coreference resolution is to correctly link pronouns like “he,” “she,” or “it” and definite descriptions like “the CEO” to the appropriate entity mentioned earlier in the tex...
