Bias in Large Language Models: Origin, Evaluation, and Mitigation

Hongfei Li; Juntao Su; Mengqiu Zhu; Mengyang Qiu; Muzhe Guo; Shuo Shuo Liu; Yufei Guo; Zhou Yang

arXiv: 2411.10915 · v2 · submitted 2024-11-16 · cs.CL · cs.LG

Yufei Guo, Muzhe Guo, Juntao Su, Zhou Yang, Mengqiu Zhu, Hongfei Li, Mengyang Qiu, Shuo Shuo Liu

Pith reviewed 2026-05-23 17:04 UTC · model grok-4.3

keywords: LLM bias, intrinsic bias, extrinsic bias, bias evaluation, bias mitigation, fair AI, responsible AI, NLP tasks

The pith

Biases in large language models arise from data and context and can be detected and reduced through staged evaluation and mitigation methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models produce biased outputs across natural language tasks because their training data and deployment settings embed systematic preferences. The review divides these biases into intrinsic forms tied to model internals and extrinsic forms arising from external use. It surveys detection approaches at the data, model, and output stages and mitigation approaches before, during, and after model training. These distinctions matter because biased outputs can produce unfair results in high-stakes settings such as medical advice and legal decisions. The synthesis supplies researchers with a structured set of tools and techniques for reducing such effects.
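
As one illustration of what a data-level check might look like, the sketch below counts how often occupation words co-occur with gendered terms in a corpus. The term lists, the toy corpus, and the function name `cooccurrence_counts` are assumptions made for this example and do not come from the paper; a real audit would use a vetted lexicon and a much larger sample.

```python
import re
from collections import Counter

# Hypothetical term lists for illustration; not taken from the paper.
GROUP_TERMS = {
    "female": {"she", "her", "woman", "women"},
    "male": {"he", "his", "him", "man", "men"},
}


def cooccurrence_counts(corpus, target_words):
    """Count sentences in which a target word appears with each group's terms."""
    counts = {group: Counter() for group in GROUP_TERMS}
    for sentence in corpus:
        tokens = set(re.findall(r"[a-z']+", sentence.lower()))
        for target in target_words:
            if target not in tokens:
                continue
            for group, terms in GROUP_TERMS.items():
                if tokens & terms:
                    counts[group][target] += 1
    return counts


# Toy corpus, invented for this example.
toy_corpus = [
    "She is a nurse at the clinic.",
    "He is an engineer and he likes his job.",
    "The nurse said she was tired.",
    "The engineer told her the plan.",
]

counts = cooccurrence_counts(toy_corpus, ["nurse", "engineer"])
for group, counter in counts.items():
    print(group, dict(counter))
```

A large imbalance in such counts is only a signal worth investigating, not proof that a model trained on the data will behave unfairly.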

Core claim

The paper establishes that bias in LLMs manifests as intrinsic biases rooted in training data and architecture and extrinsic biases introduced during application, that these can be measured with data-level, model-level, and output-level methods, and that they can be addressed by pre-model, intra-model, and post-model mitigation techniques, thereby supporting the development of fairer AI systems.

What carries the argument

The categorization of biases into intrinsic and extrinsic types together with the division of evaluation and mitigation into pre-model, intra-model, and post-model stages.
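
To make the pre-model stage concrete, here is a minimal sketch of counterfactual data augmentation, a commonly described technique that adds a term-swapped copy of each training sentence. It is offered as an example of the category only; the summary above does not confirm that the paper describes it in this form, and the swap table and function names are hypothetical.

```python
import re

# Hypothetical swap table; a real one would be larger and handle ambiguous words.
SWAPS = {"he": "she", "she": "he", "man": "woman", "woman": "man"}
_PATTERN = re.compile(r"\b(" + "|".join(SWAPS) + r")\b", flags=re.IGNORECASE)


def swap_terms(sentence):
    """Replace each listed term with its counterpart, keeping initial capitals."""
    def repl(match):
        word = match.group(0)
        swapped = SWAPS[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    return _PATTERN.sub(repl, sentence)


def augment(corpus):
    """Return the corpus plus a swapped copy of every sentence that changes."""
    augmented = []
    for sentence in corpus:
        augmented.append(sentence)
        counterfactual = swap_terms(sentence)
        if counterfactual != sentence:
            augmented.append(counterfactual)
    return augmented


print(augment(["He is a doctor.", "The sky is blue."]))
```

Intra-model and post-model techniques act later in the lifecycle (during training and on generated outputs, respectively) and are not shown here.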

If this is right

Biased models can produce harmful decisions in healthcare and criminal justice applications.
Evaluation at multiple levels (data, model, output) allows earlier detection of bias than output checks alone.
Mitigation works best when applied across the full model lifecycle rather than at a single stage.
Legal and ethical review of LLM deployments must account for both intrinsic and extrinsic bias sources.
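
An output-level check can be sketched as a counterfactual probe: the same prompt template is filled with different group terms and the scored outputs are compared. In the example below, `toy_generate` and `toy_score` are deliberately simple stand-ins for a real model and a real scorer, and the group names and word list are invented; none of them come from the paper.

```python
def counterfactual_gap(generate, score, template, groups):
    """Fill the template per group, score each output, return (gap, scores)."""
    scores = {g: score(generate(template.format(group=g))) for g in groups}
    return max(scores.values()) - min(scores.values()), scores


# Stand-ins for illustration only.
NEGATIVE_WORDS = {"lazy", "dangerous"}


def toy_generate(prompt):
    """A deliberately skewed toy 'model' so the probe has something to find."""
    if "group_a" in prompt:
        return prompt + " hardworking and kind."
    return prompt + " lazy and kind."


def toy_score(text):
    """Fraction of words that appear in the negative word list."""
    words = text.lower().replace(".", "").split()
    return sum(w in NEGATIVE_WORDS for w in words) / len(words)


gap, scores = counterfactual_gap(
    toy_generate, toy_score, "People from {group} are", ["group_a", "group_b"]
)
print(scores, gap)
```

Because this probe needs a deployed model, it runs later than a data-level audit, which is the ordering the second point in the list relies on.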

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The framework could be extended by testing whether new task-specific biases in areas such as code generation fit the existing intrinsic/extrinsic split.
A practical next step would be to map the mitigation techniques onto measurable fairness metrics that regulators could adopt.
If the staged approach proves effective, model developers might adopt it as a default checklist during training and deployment.
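
As a hedged example of what a "measurable fairness metric" could mean in practice, the sketch below computes the demographic parity difference: the gap between the highest and lowest rate of positive decisions across groups. The choice of this metric is an assumption for illustration (the text above does not name one), and the decision data is invented.

```python
def selection_rates(decisions, groups):
    """Share of positive decisions (1) within each group."""
    totals, positives = {}, {}
    for decision, group in zip(decisions, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + int(decision)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(decisions, groups):
    """Largest gap in selection rate between any two groups (0 means parity)."""
    rates = selection_rates(decisions, groups)
    return max(rates.values()) - min(rates.values())


# Invented decisions (1 = favourable outcome) for two hypothetical groups.
decisions = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = selection_rates(decisions, groups)
dpd = demographic_parity_difference(decisions, groups)
print(rates, dpd)
```

Running the same metric before and after a mitigation step gives one simple way to check whether the step changed anything measurable.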

Load-bearing premise

The review assumes that its chosen categorization of biases into intrinsic and extrinsic, along with the selected evaluation and mitigation methods, provides a representative and useful overview of the field without significant omissions or selection bias in the surveyed literature.

What would settle it

A comprehensive survey of recent LLM bias papers that identifies a major category or effective mitigation approach falling outside the intrinsic/extrinsic and pre/intra/post-model frameworks would show the review's organization is incomplete.

Original abstract

Large Language Models (LLMs) have revolutionized natural language processing, but their susceptibility to biases poses significant challenges. This comprehensive review examines the landscape of bias in LLMs, from its origins to current mitigation strategies. We categorize biases as intrinsic and extrinsic, analyzing their manifestations in various NLP tasks. The review critically assesses a range of bias evaluation methods, including data-level, model-level, and output-level approaches, providing researchers with a robust toolkit for bias detection. We further explore mitigation strategies, categorizing them into pre-model, intra-model, and post-model techniques, highlighting their effectiveness and limitations. Ethical and legal implications of biased LLMs are discussed, emphasizing potential harms in real-world applications such as healthcare and criminal justice. By synthesizing current knowledge on bias in LLMs, this review contributes to the ongoing effort to develop fair and responsible AI systems. Our work serves as a comprehensive resource for researchers and practitioners working towards understanding, evaluating, and mitigating bias in LLMs, fostering the development of more equitable AI technologies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Standard review that organizes LLM bias into intrinsic/extrinsic and pre/intra/post categories but provides no search protocol.

This paper is a literature review that splits bias origins into intrinsic and extrinsic types and mitigation into pre-model, intra-model, and post-model stages while listing evaluation methods at data, model, and output levels. It also touches on ethical issues in domains like healthcare and justice. The structure gives a readable map of existing ideas without claiming new mechanisms or fixes. That matches what a survey should do and makes the categories easy to follow for orientation. The discussion of limitations for each mitigation type is straightforward and avoids overclaiming effectiveness.

The main gap is the lack of any account of how papers were selected: no databases, keywords, date cutoffs, or screening criteria appear in the abstract or structure. Without that, the synthesis risks reflecting author choices rather than field coverage, which undercuts the claim to be a comprehensive resource. No original data, derivations, or experiments are presented, so there are no fitting or circularity problems to flag.

The work is aimed at readers who need an entry-level overview of the bias conversation rather than specialists hunting for technical advances or falsifiable claims. It could serve as a starting point for someone building a toolkit but will not settle open questions. A serious editor should send it to peer review once the authors add the search methodology and check for balance in the cited examples; the topic is relevant enough to warrant referee input even if revisions are needed.

Referee Report

1 major / 0 minor

Summary. The paper is a review that categorizes bias in LLMs as intrinsic versus extrinsic, surveys evaluation methods at data/model/output levels, organizes mitigation into pre/intra/post-model techniques, and discusses ethical/legal implications in applications such as healthcare and justice, with the central claim that this synthesis advances fair and responsible AI.

Significance. A well-structured synthesis of bias literature could provide a practical reference for the field if the coverage is representative; however, the lack of any documented search protocol means the contribution rests on unverified curation rather than systematic aggregation.

major comments (1)

[Abstract and Introduction] The manuscript states it provides a 'comprehensive review' and 'synthesizing current knowledge' (abstract) but contains no description of literature search methodology, databases, keywords, date ranges, inclusion criteria, or number of papers screened. This omission directly undermines the representativeness claim and the utility of the intrinsic/extrinsic and pre/intra/post taxonomies as a 'robust toolkit'.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thoughtful feedback on our review paper. We address the major comment regarding the absence of a documented literature search methodology below and outline revisions that will improve transparency while preserving the paper's value as a synthesized reference.

Point-by-point responses

Referee: [Abstract and Introduction] The manuscript states it provides a 'comprehensive review' and 'synthesizing current knowledge' (abstract) but contains no description of literature search methodology, databases, keywords, date ranges, inclusion criteria, or number of papers screened. This omission directly undermines the representativeness claim and the utility of the intrinsic/extrinsic and pre/intra/post taxonomies as a 'robust toolkit'.

Authors: We acknowledge the validity of this observation. The current manuscript presents a narrative synthesis of key literature rather than a systematic review following protocols such as PRISMA. To address the concern, we will add a dedicated subsection (likely in the Introduction) that explicitly describes the literature scope: primary sources include arXiv, ACL Anthology, NeurIPS, and Google Scholar; coverage focuses on works from 2018 to October 2024; inclusion was guided by relevance to LLM bias origins, evaluation benchmarks, and mitigation strategies, with emphasis on highly cited and representative papers. We will also moderate phrasing from 'comprehensive review' to 'extensive review' and 'synthesizing current knowledge' to 'synthesizing key developments' to align with the narrative nature of the work. These changes will clarify the basis for the intrinsic/extrinsic and pre/intra/post categorizations without overstating systematic aggregation, thereby supporting their utility as a practical reference.

Revision: yes

Circularity Check

0 steps flagged

Review paper aggregates external literature without internal circular derivations

This is a review paper synthesizing existing literature on LLM bias. The abstract describes categorization of biases (intrinsic/extrinsic) and mitigation strategies (pre/intra/post-model) drawn from surveyed works, with no original equations, predictions, fitted parameters, or derivations presented. No load-bearing self-citations or self-definitional steps are evident in the provided text; the central synthesis claim rests on external sources rather than reducing to inputs defined within the paper itself.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This paper is a literature review and introduces no new free parameters, axioms, or invented entities; all content rests on synthesis of previously published research.

Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

ReLay: Personalized LLM-Generated Plain-Language Summaries for Better Understanding, but at What Cost?
cs.CL 2026-05 unverdicted novelty 7.0

Personalized LLM-generated plain language summaries improve lay readers' comprehension and quality ratings but increase risks of reinforcing biases and introducing hallucinations compared to static expert summaries.

Counting Worlds Branching Time Semantics for post-hoc Bias Mitigation in generative AI
cs.LO 2026-04 unverdicted novelty 7.0

CTLF is a branching-time logic with counting-worlds semantics for verifying fairness in probability distributions over protected attributes, predicting bias bounds, and calculating outputs to remove in generative AI series.

When AI reviews science: Can we trust the referee?
cs.AI 2026-04 unverdicted novelty 6.0

AI peer review systems are vulnerable to prompt injections, prestige biases, assertion strength effects, and contextual poisoning, as demonstrated by a new attack taxonomy and causal experiments on real conference sub...

Safe for Whom? Rethinking How We Evaluate the Safety of LLMs for Real Users
cs.AI 2025-12 unverdicted novelty 6.0

LLM safety evaluations for personal advice must test responses against diverse user vulnerability profiles, since context-blind ratings overestimate safety and realistic prompt context does not fix the problem.

A Study of LLMs' Preferences for Libraries and Programming Languages
cs.SE 2025-03 unverdicted novelty 6.0

Empirical study of eight LLMs finds overuse of popular libraries like NumPy in up to 45% of unnecessary cases and strong default preference for Python even when suboptimal.

FAIR_XAI: Improving Multimodal Foundation Model Fairness via Explainability for Wellbeing Assessment
cs.AI 2026-04 unverdicted novelty 4.0

Vision-language models for wellbeing assessment exhibit dataset-dependent performance and demographic biases, with explainability interventions providing inconsistent fairness gains at potential accuracy costs.

A Survey on LLM-as-a-Judge
cs.CL 2024-11 unverdicted novelty 4.0

A survey on LLM-as-a-Judge that reviews reliability strategies, proposes evaluation methods, and introduces a novel benchmark for assessing such systems.

Reference graph

Works this paper leans on

82 extracted references · 82 canonical work pages · cited by 7 Pith papers · 19 internal anchors

[1]

Persistent an ti-muslim bias in large language models

Abubakar Abid, Maheen Farooqi, and James Zou. Persistent an ti-muslim bias in large language models. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, page 298–306, New York, NY, USA,

work page 2021
[2]

Arif Ahmad and Pushpak Bhattacharyya

Association for Computing Machinery. Arif Ahmad and Pushpak Bhattacharyya. Bias in language mode ls: A survey. Jaimeen Ahn and Alice Oh. Mitigating language-dependent et hnic bias in BERT. In Proceedings of the 2021 Conference on Empirical Methods in Natural Langua ge Processing, Online and Punta Cana, Dominican Republic, November

work page 2021
[3]

AJ Alvero, Jinsook Lee, Alejandra Regla-Vargas, Rene Kizil ec, Thorsten Joachims, and Anthony Lis- ing Antonio

doi: 10.1109/TAC.1974.1100705. AJ Alvero, Jinsook Lee, Alejandra Regla-Vargas, Rene Kizil ec, Thorsten Joachims, and Anthony Lis- ing Antonio. Large language models, social demography, and hegemony: Comparing authorship in human and synthetic text. Preprint, pages 1–25,

work page doi:10.1109/tac.1974.1100705 1974
[4]

Do large language models discriminate in hiring decisions on the basis of race , ethnicity, and gender? arXiv preprint arXiv:2406.10486,

Haozhe An, Christabel Acquaye, Colin Wang, Zongxia Li, and R achel Rudinger. Do large language models discriminate in hiring decisions on the basis of race , ethnicity, and gender? arXiv preprint arXiv:2406.10486,

work page arXiv
[5]

Machine bias

Julia Angwin, Jeﬀ Larson, Surya Mattu, and Lauren Kirchner. Machine bias. ProPublica, 23(2016): 139–159,

work page 2016
[6]

Fairmonitor: A dual-framework for detecting stereotypes and biases in lar ge language models

Yanhong Bai, Jiabao Zhao, Jinxin Shi, Zhentao Xie, Xingjiao Wu, and Liang He. Fairmonitor: A dual-framework for detecting stereotypes and biases in lar ge language models. arXiv preprint arXiv:2405.03098,

work page arXiv
[7]

Evaluating the Underlying Gender Bias in Contextualized Word Embeddings

As- sociation for Computational Linguistics. Christine Basta, Marta R Costa-Jussà, and Noe Casas. Evalua ting the underlying gender bias in contextualized word embeddings. arXiv preprint arXiv:1904.08783 ,

work page internal anchor Pith review Pith/arXiv arXiv 1904
[8]

On the dangers of stochastic parrots: Can language models be too bi g

Emily M Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. On the dangers of stochastic parrots: Can language models be too bi g. In Proceedings of the 2021 ACM conference on fairness, accountability, and transparency , pages 610–623,

work page 2021
[9]

Large image datasets: A pyrrhic win for computer vision? In 2021 IEEE Winter Conference on Applications of Computer Visi on (W ACV), pages 1536–1546

Abeba Birhane and Vinay Uday Prabhu. Large image datasets: A pyrrhic win for computer vision? In 2021 IEEE Winter Conference on Applications of Computer Visi on (W ACV), pages 1536–1546. IEEE,

work page 2021
[10]

Racial Disparity in Natural Language Processing: A Case Study of Social Media African-American English

Su Lin Blodgett and Brendan O’Connor. Racial disparity in na tural language processing: A case study of social media african-american english. arXiv preprint arXiv:1707.00061 ,

work page internal anchor Pith review Pith/arXiv arXiv
[11]

A large annotated corpus for learning natural language inference

Samuel R Bowman, Gabor Angeli, Christopher Potts, and Chris topher D Manning. A large anno- tated corpus for learning natural language inference. arXiv preprint arXiv:1508.05326 ,

work page internal anchor Pith review Pith/arXiv arXiv
[12]

Language models are few-shot learners

21 Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jar ed Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda As kell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems , 33:1877–1901,

work page 1901
[13]

SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation

Daniel Cer, Mona Diab, Eneko Agirre, Inigo Lopez-Gazpio, an d Lucia Specia. Semeval-2017 task 1: Semantic textual similarity-multilingual and cross-li ngual focused evaluation. arXiv preprint arXiv:1708.00055,

work page internal anchor Pith review Pith/arXiv arXiv 2017
[14]

My f air lady: Detecting and mitigating bias in job advertisements

Michelle Chen, Zhu Ma, Aniko Hannak, and Christo Wilson. My f air lady: Detecting and mitigating bias in job advertisements. Proceedings of the 2018 World Wide Web Conference , pages 991–1000,

work page 2018
[15]

Enhanced lstm for natural language inference

Qian Chen, Xiaodan Zhu, Zhenhua Ling, Si Wei, Hui Jiang, and D iana Inkpen. Enhanced lstm for natural language inference. arXiv preprint arXiv:1609.06038 ,

work page arXiv
[16]

Interactive analysis of llms using mea ningful counterfactuals

Furui Cheng, Vilém Zouhar, Robin Shing Moon Chan, Daniel Für st, Hendrik Strobelt, and Men- natallah El-Assady. Interactive analysis of llms using mea ningful counterfactuals. arXiv preprint arXiv:2405.00708,

work page arXiv
[17]

Improving n eural conversational models with entropy-based data ﬁltering

Richárd Csáky, Patrik Purgai, and Gábor Recski. Improving n eural conversational models with entropy-based data ﬁltering. arXiv preprint arXiv:1905.05471 ,

work page arXiv 1905
[18]

Ad- vances in neural information processing systems , 33:4271–4282

22 Debarati Das, Karin De Langis, Anna Martin, Jaehyung Kim, Mi nhwa Lee, Zae Myung Kim, Shirley Hayati, Risako Owan, Bin Hu, Ritik Parkar, et al. Under the su rface: Tracking the artifactuality of llm-generated data. arXiv preprint arXiv:2401.14698 ,

work page arXiv
[19]

Semantic change character- ization with llms using rhetorics

Jader Martins Camboim de Sá, Marcos Da Silveira, and Cédric P ruski. Semantic change character- ization with llms using rhetorics. arXiv preprint arXiv:2407.16624 ,

work page arXiv
[20]

On measures of biases and harms in nlp

Sunipa Dev, Emily Sheng, Jieyu Zhao, Aubrie Amstutz, Jiao Su n, Yu Hou, Mattie Sanseverino, Jiin Kim, Akihiro Nishi, Nanyun Peng, et al. On measures of biases and harms in nlp. arXiv preprint arXiv:2108.03362,

work page arXiv
[21]

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Tout anova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 ,

work page internal anchor Pith review Pith/arXiv arXiv
[22]

Query Expansion with Locally-Trained Word Embeddings

Fernando Diaz, Bhaskar Mitra, and Nick Craswell. Query expa nsion with locally-trained word embeddings. arXiv preprint arXiv:1605.07891 ,

work page internal anchor Pith review Pith/arXiv arXiv
[23]

Addressing age- related bias in sentiment analysis

Mark Díaz, Isaac Johnson, Amanda Lazar, Anne Marie Piper, an d Darren Gergle. Addressing age- related bias in sentiment analysis. In Proceedings of the 2018 chi conference on human factors in computing systems , pages 1–14,

work page 2018
[24]

Evaluating vocab ulary usage in llms

Matthew Durward and Christopher Thomson. Evaluating vocab ulary usage in llms. In Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educa tional Applications (BEA 2024), pages 266–282,

work page 2024
[25]

Cognitive bias in high- stakes decision-making with llms

Jessica Echterhoﬀ, Yao Liu, Abeer Alessa, Julian McAuley, a nd Zexue He. Cognitive bias in high- stakes decision-making with llms. arXiv preprint arXiv:2403.00811 ,

work page arXiv
[26]

Robbie: Robust bias evaluation of large generative language models

David Esiobu, Xiaoqing Tan, Saghar Hosseini, Megan Ung, Yuc hen Zhang, Jude Fernandes, Jane Dwivedi-Yu, Eleonora Presani, Adina Williams, and Eric Smi th. Robbie: Robust bias evaluation of large generative language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , pages 3764–3814,

work page 2023
[27]

AllenNLP: A Deep Semantic Natural Language Processing Platform

Matt Gardner, Joel Grus, Mark Neumann, Oyvind Tafjord, Prad eep Dasigi, Nelson Liu, Matthew Peters, Michael Schmitz, and Luke Zettlemoyer. Allennlp: A deep semantic natural language processing platform. arXiv preprint arXiv:1803.07640 ,

work page internal anchor Pith review Pith/arXiv arXiv
[28]

He is very intelligent , she is very beautiful? on mitigating social biases in language modelling and generation

Aparna Garimella, Akhash Amarnath, Kiran Kumar, Akash Pram od Yalla, N Anandhavelu, Niyati Chhaya, and Balaji Vasan Srinivasan. He is very intelligent , she is very beautiful? on mitigating social biases in language modelling and generation. In Findings of the Association for Computa- tional Linguistics: ACL-IJCNLP 2021 , pages 4534–4545,

work page 2021
[29]

Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, and Noah A. Smith. RealToxici- tyPrompts: Evaluating neural toxic degeneration in langua ge models. In Trevor Cohn, Yulan He, and Yang Liu, editors, Findings of the Association for Computational Linguistics : EMNLP 2020 , pages 3356–3369. Association for Computational Linguisti cs,

work page 2020
[30]

Statistical challenges wit h dataset construction: Why you will never have enough images

Josh Goldman and John K Tsotsos. Statistical challenges wit h dataset construction: Why you will never have enough images. arXiv preprint arXiv:2408.11160 ,

work page arXiv
[31]

Unboxing occupat ional bias: Grounded debiasing llms with us labor data

Atmika Gorti, Manas Gaur, and Aman Chadha. Unboxing occupat ional bias: Grounded debiasing llms with us labor data. arXiv preprint arXiv:2408.11247 ,

work page arXiv
[32]

Sentime nt analysis with nlp on twitter data

24 Md Rakibul Hasan, Maisha Maliha, and M Arifuzzaman. Sentime nt analysis with nlp on twitter data. In 2019 international conference on computer, communication, c hemical, materials and electronic engineering (IC4ME2) , pages 1–4. IEEE,

work page 2019
[33]

Data Mining, Inference, and Prediction

URL https://doi.org/10.1007/978-0-387-84858-7 . Lucy Havens, Melissa Terras, Benjamin Bach, and Beatrice Al ex. Uncertainty and inclusivity in gender bias annotation: An annotation taxonomy and annotat ed datasets of british english text. In 4th Workshop on Gender Bias in Natural Language Processing at NAACL, pages 30–57. ACL Anthology,

work page doi:10.1007/978-0-387-84858-7
[34]

URL https://www.jstor.org/stable/1912352

doi: 10.2307/1912352. URL https://www.jstor.org/stable/1912352. Lisa Anne Hendricks, Kaylee Burns, Kate Saenko, Trevor Darr ell, and Anna Rohrbach. Women also snowboard: Overcoming bias in captioning models. In Proceedings of the European conference on computer vision (ECCV) , pages 771–787,

work page doi:10.2307/1912352
[35]

A structural probe fo r ﬁnding syntax in word representa- tions

John Hewitt and Christopher D Manning. A structural probe fo r ﬁnding syntax in word representa- tions. In Proceedings of the 2019 Conference of the North American Chapt er of the Association for Computational Linguistics , pages 4129–4138,

work page 2019
[36]

Li and Kevin Hou

Chuan Tian Hongfei Li, Qian H. Li and Kevin Hou. Issues in cox p roportional hazards model with unequal randomization. Journal of Biopharmaceutical Statistics , 0(0):1–6, 2024a. doi: 10.1080/ 10543406.2024.2418139. URL https://doi.org/10.1080/10543406.2024.2418139. PMID: 39445665. Chuan Tian Hongfei Li, Qian H. Li and Kevin Hou. Issues in cox p roportiona...

work page doi:10.1080/10543406.2024.2418139 2024
[37]

The importance of modeling social fa ctors of language: Theory and practice

Dirk Hovy and Diyi Yang. The importance of modeling social fa ctors of language: Theory and practice. In Proceedings of the 2021 Conference of the North American Chapt er of the Association for Computational Linguistics: Human language technologi es, pages 588–602,

work page 2021
[38]

Toxicity detection for free

Zhanhao Hu, Julien Piet, Geng Zhao, Jiantao Jiao, and David W agner. Toxicity detection for free. arXiv preprint arXiv:2405.18822 ,

work page arXiv
[39]

Up5: Unbiased foun- dation model for fairness-aware recommendation

25 Wenyue Hua, Yingqiang Ge, Shuyuan Xu, Jianchao Ji, and Yongf eng Zhang. Up5: Unbiased foun- dation model for fairness-aware recommendation. arXiv preprint arXiv:2305.12090 ,

work page arXiv
[40]

Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations

Hakan Inan, Kartikeya Upasani, Jianfeng Chi, Rashi Rungta, Krithika Iyer, Yuning Mao, Michael Tontchev, Qing Hu, Brian Fuller, Davide Testuggine, et al. L lama guard: Llm-based input-output safeguard for human-ai conversations. arXiv preprint arXiv:2312.06674 ,

work page internal anchor Pith review Pith/arXiv arXiv
[41]

Ctrl: A conditional transformer language model for controllable generation

Nitish Shirish Keskar, Bryan McCann, Lav R Varshney, Caimin g Xiong, and Richard Socher. Ctrl: A conditional transformer language model for controllable generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Process ing, pages 111–129,

work page 2019
[42]

Eval uating the diversity, equity and inclusion of nlp technology: A case study for indian languag es

Simran Khanuja, Sebastian Ruder, and Partha Talukdar. Eval uating the diversity, equity and inclusion of nlp technology: A case study for indian languag es. arXiv preprint arXiv:2205.12676 ,

work page arXiv
[43]

Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems

Svetlana Kiritchenko and Saif M Mohammad. Examining gender and race bias in two hundred sentiment analysis systems. arXiv preprint arXiv:1805.04508 ,

work page internal anchor Pith review Pith/arXiv arXiv
[44]

Can ll ms recognize toxic- ity? structured toxicity investigation framework and sema ntic-based metric

Hyukhun Koh, Dohyung Kim, Minwoo Lee, and Kyomin Jung. Can ll ms recognize toxic- ity? structured toxicity investigation framework and sema ntic-based metric. arXiv preprint arXiv:2402.06900,

work page arXiv
[45]

Measuring Bias in Contextualized Word Representations

Keita Kurita, Nidhi Vyas, Ayush Pareek, Alan W Black, and Yul ia Tsvetkov. Measuring bias in contextualized word representations. arXiv preprint arXiv:1906.07337 ,

work page internal anchor Pith review Pith/arXiv arXiv 1906
[46]

Neural embed- ding of beliefs reveals the role of relative dissonance in hu man decision-making

Byunghwee Lee, Rachith Aiyappa, Yong-Yeol Ahn, Haewoon Kwa k, and Jisun An. Neural embed- ding of beliefs reveals the role of relative dissonance in hu man decision-making. arXiv preprint arXiv:2408.07237,

work page arXiv
[47]

End-to-end Neural Coreference Resolution

Kenton Lee, Luheng He, Mike Lewis, and Luke Zettlemoyer. End -to-end neural coreference resolution. arXiv preprint arXiv:1707.07045 ,

work page internal anchor Pith review Pith/arXiv arXiv
[48]

Comparing biases and the im pact of multilingual training across multiple languages

Sharon Levy, Neha John, Ling Liu, Yogarshi Vyas, Jie Ma, Yosh inari Fujinuma, Miguel Ballesteros, Vittorio Castelli, and Dan Roth. Comparing biases and the im pact of multilingual training across multiple languages. In Proceedings of the 2023 Conference on Empirical Methods in Na tural Language Processing, Singapore, December

work page 2023
[49]

Steer- ing llms towards unbiased responses: A causality-guided de biasing framework

Jingling Li, Zeyu Tang, Xiaoyu Liu, Peter Spirtes, Kun Zhang , Liu Leqi, and Yang Liu. Steer- ing llms towards unbiased responses: A causality-guided de biasing framework. arXiv preprint arXiv:2403.08743, 2024a. Weitao Li, Junkai Li, Weizhi Ma, and Yang Liu. Citation-enha nced generation for llm-based chatbot. arXiv preprint arXiv:2402.16063 , 2024b. Ying...

work page arXiv
[50]

On Measuring Social Biases in Sentence Encoders

URL https://api.semanticscholar.org/CorpusID:202541569. Chandler May, Alex Wang, Shikha Bordia, Samuel R Bowman, and Rachel Rudinger. On measuring social biases in sentence encoders. arXiv preprint arXiv:1903.10561 ,

work page internal anchor Pith review Pith/arXiv arXiv 1903
[51]

Text classiﬁcation using label names only: A language model self -training approach

Yu Meng, Yunyi Zhang, Jiaxin Huang, Chenyan Xiong, Heng Ji, C hao Zhang, and Jiawei Han. Text classiﬁcation using label names only: A language model self -training approach. arXiv preprint arXiv:2010.07245,

work page arXiv 2010
[52]

Global gallery: The ﬁne art of painting culture portraits through multilingual instruction tuning

Anjishnu Mukherjee, Aylin Caliskan, Ziwei Zhu, and Antonio s Anastasopoulos. Global gallery: The ﬁne art of painting culture portraits through multilingual instruction tuning. In Proceedings of the 2024 Conference of the North American Chapter of the Associati on for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages 6398–6415,

work page 2024
[53]

Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond

Ramesh Nallapati, Bowen Zhou, Caglar Gulcehre, Bing Xiang, et al. Abstractive text summarization using sequence-to-sequence rnns and beyond. arXiv preprint arXiv:1602.06023 ,

work page internal anchor Pith review Pith/arXiv arXiv
[54]

URL https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-9-56

doi: 10.1186/1471-2288-9-56. URL https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-9-56. Davide Neri, Jacopo Soldani, Olaf Zimmermann, and Antonio B rogi. Design principles, architectural smells and refactorings for microservices: a multivocal re view. SICS Software-Intensive Cyber- Physical Systems , 35:3–15,

work page doi:10.1186/1471-2288-9-56
[55]

Do multilingual large language models mitigate stereotype bias? In Proceedings of the 2nd Workshop on Cross-Cultural Considera tions in NLP , Bangkok, Thailand, August 2024a

Shangrui Nie, Michael Fromm, Charles Welch, Rebekka Görge, Akbar Karimi, Joan Plepi, Nazia Mowmita, Nicolas Flores-Herr, Mehdi Ali, and Lucie Flek. Do multilingual large language models mitigate stereotype bias? In Proceedings of the 2nd Workshop on Cross-Cultural Considera tions in NLP , Bangkok, Thailand, August 2024a. Association for Computa tional Lin...

work page arXiv
[56]

Competent men and warm women: Gender stereotypes and backlash in image search results

Jahna Otterbacher, Jo Bates, and Paul Clough. Competent men and warm women: Gender stereotypes and backlash in image search results. In Proceedings of the 2017 CHI conference on human factors in computing systems, pages 6620–6631,

[57]

Reducing gender bias in abusive language detection

Ji Ho Park, Jamin Shin, and Pascale Fung. Reducing gender bias in abusive language detection. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October-November

[58]

Models and datasets for cross-lingual summarisation

Laura Perez-Beltrachini and Mirella Lapata. Models and datasets for cross-lingual summarisation. arXiv preprint arXiv:2202.09583,

[59]

SQuAD: 100,000+ Questions for Machine Comprehension of Text

Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. SQuAD: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250,

[60]

Know What You Don't Know: Unanswerable Questions for SQuAD

Pranav Rajpurkar, Robin Jia, and Percy Liang. Know what you don’t know: Unanswerable questions for SQuAD. arXiv preprint arXiv:1806.03822,

[61]

Gender Bias in Coreference Resolution

doi: 10.1037/h0037350.

Rachel Rudinger, Jason Naradowsky, Brian Leonard, and Benjamin Van Durme. Gender bias in coreference resolution. arXiv preprint arXiv:1804.09301,

[62]

"I'm not racist but...": Discovering bias in the internal knowledge of large language models

Abel Salinas, Louis Penafiel, Robert McCormack, and Fred Morstatter. "I'm not racist but...": Discovering bias in the internal knowledge of large language models. arXiv preprint arXiv:2310.08780,

[63]

Slimpajama-dc: Understanding data combinations for llm training

Zhiqiang Shen, Tianhua Tao, Liqun Ma, Willie Neiswanger, Joel Hestness, Natalia Vassilieva, Daria Soboleva, and Eric Xing. Slimpajama-dc: Understanding data combinations for llm training. arXiv preprint arXiv:2309.10818,

[64]

The woman worked as a babysitter: On biases in language generation

Emily Sheng, Kai-Wei Chang, Prem Natarajan, and Nanyun Peng. The woman worked as a babysitter: On biases in language generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3407–3412,

[65]

Culturebank: An online community-driven knowledge base towards culturally aware language technologies

Weiyan Shi, Ryan Li, Yutong Zhang, Caleb Ziems, Raya Horesh, Rogério Abreu de Paula, Diyi Yang, et al. Culturebank: An online community-driven knowledge base towards culturally aware language technologies. arXiv preprint arXiv:2404.15238,

[66]

Large language models as subpopulation representative models: A review

Gabriel Simmons and Christopher Hare. Large language models as subpopulation representative models: A review. arXiv preprint arXiv:2310.17888,

[67]

Dropout: a simple way to prevent neural networks from overfitting

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958,

[68]

Mitigating Gender Bias in Natural Language Processing: Literature Review

Tony Sun, Andrew Gaut, Shirlyn Tang, Yuxin Huang, Mai ElSherief, Jieyu Zhao, Diba Mirza, Elizabeth Belding, Kai-Wei Chang, and William Yang Wang. Mitigating gender bias in natural language processing: Literature review. arXiv preprint arXiv:1906.08976,

[69]

Lost in Translation: Loss and Decay of Linguistic Richness in Machine Translation

Eva Vanmassenhove, Dimitar Shterionov, and Andy Way. Lost in translation: Loss and decay of linguistic richness in machine translation. arXiv preprint arXiv:1906.12068,

[70]

Attention is all you need

doi: 10.1214/aos/1176345802. URL https://projecteuclid.org/euclid.aos/1176345802.

A Vaswani. Attention is all you need. Advances in Neural Information Processing Systems,

[71]

Cross-lingual semantic similarity of words as the similarity of their semantic word responses

Ivan Vulic and Marie-Francine Moens. Cross-lingual semantic similarity of words as the similarity of their semantic word responses. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2013), pages 106–116. ACL; East Stroudsburg, PA,

[72]

Large language models cannot replace human participants because they cannot portray identity groups

Angelina Wang, Jamie Morgenstern, and John P Dickerson. Large language models cannot replace human participants because they cannot portray identity groups. arXiv preprint arXiv:2402.01908, 2024a.

Xinru Wang, Hannah Kim, Sajjadur Rahman, Kushan Mitra, and Zhengjie Miao. Human-llm collaborative annotation through effective verification of llm labels. In P...

[73]

Measuring and reducing gendered correlations in pre-trained models

Kellie Webster, Xuezhi Wang, Ian Tenney, Alex Beutel, Emily Pitler, Ellie Pavlick, Jilin Chen, Ed Chi, and Slav Petrov. Measuring and reducing gendered correlations in pre-trained models. arXiv preprint arXiv:2010.06032,

[74]

Auditing large language models for enhanced text-based stereotype detection and probing-based bias evaluation

Zekun Wu, Sahan Bulathwela, Maria Perez-Ortiz, and Adriano Soares Koshiyama. Auditing large language models for enhanced text-based stereotype detection and probing-based bias evaluation. arXiv preprint arXiv:2404.01768,

[75]

Finbert: A pretrained language model for financial communications

Yuqi Yang, Yuan Yuan, and Lei Liu. Finbert: A pretrained language model for financial communications. arXiv preprint arXiv:2006.08097,

[76]

Causal prompting: Debiasing large language model prompting based on front-door adjustment

Congzhi Zhang, Linhai Zhang, Deyu Zhou, and Guoqiang Xu. Causal prompting: Debiasing large language model prompting based on front-door adjustment. arXiv preprint arXiv:2403.02738,

[77]

Deep learning based recommender system: A survey and new perspectives

Shuai Zhang, Lina Yao, Aixin Sun, and Yi Tay. Deep learning based recommender system: A survey and new perspectives. ACM computing surveys (CSUR), 52(1):1–38, 2019a.

Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, and Bill Dolan. Dialogpt: Large-scale generative pre-training for conversationa...

[78]

Gender Bias in Contextualized Word Embeddings

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. Gender bias in coreference resolution: Evaluation and debiasing methods. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics, pages 15–20, 2018a.

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei ...

Association for Computational Linguistics.

Appendix A. Examples of Extrinsic Biases

A.1 Natural Language Understanding (NLU) tasks

NLU encompasses a broad range of tasks that aim to improve comprehension of input sequences (Chang et al., 2024). It seeks to grasp the deeper connotations and implications inherent in human communication, focusing on wha...

This task is crucial for accurately interpreting the meaning of sentences, especially in cases where pronouns, names, or other referential expressions are used. The primary goal of coreference resolution is to correctly link pronouns like “he,” “she,” or “it” and definite descriptions like “the CEO” to the appropriate entity mentioned earlier in the tex...


[52] [52]

Global gallery: The ﬁne art of painting culture portraits through multilingual instruction tuning

Anjishnu Mukherjee, Aylin Caliskan, Ziwei Zhu, and Antonio s Anastasopoulos. Global gallery: The ﬁne art of painting culture portraits through multilingual instruction tuning. In Proceedings of the 2024 Conference of the North American Chapter of the Associati on for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages 6398–6415,

work page 2024

[53] [53]

Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond

Ramesh Nallapati, Bowen Zhou, Caglar Gulcehre, Bing Xiang, et al. Abstractive text summarization using sequence-to-sequence rnns and beyond. arXiv preprint arXiv:1602.06023 ,

work page internal anchor Pith review Pith/arXiv arXiv

[54] [54]

URL https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-9-56

doi: 10.1186/1471-2288-9-56. URL https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-9-56. Davide Neri, Jacopo Soldani, Olaf Zimmermann, and Antonio B rogi. Design principles, architectural smells and refactorings for microservices: a multivocal re view. SICS Software-Intensive Cyber- Physical Systems , 35:3–15,

work page doi:10.1186/1471-2288-9-56

[55] [55]

Do multilingual large language models mitigate stereotype bias? In Proceedings of the 2nd Workshop on Cross-Cultural Considera tions in NLP , Bangkok, Thailand, August 2024a

Shangrui Nie, Michael Fromm, Charles Welch, Rebekka Görge, Akbar Karimi, Joan Plepi, Nazia Mowmita, Nicolas Flores-Herr, Mehdi Ali, and Lucie Flek. Do multilingual large language models mitigate stereotype bias? In Proceedings of the 2nd Workshop on Cross-Cultural Considera tions in NLP , Bangkok, Thailand, August 2024a. Association for Computa tional Lin...

work page arXiv

[56] [56]

Competent men and warm women: Gender stereo- types and backlash in image search results

Jahna Otterbacher, Jo Bates, and Paul Clough. Competent men and warm women: Gender stereo- types and backlash in image search results. In Proceedings of the 2017 chi conference on human factors in computing systems , pages 6620–6631,

work page 2017

[57] [57]

Reducing gender bia s in abusive language detection

Ji Ho Park, Jamin Shin, and Pascale Fung. Reducing gender bia s in abusive language detection. In Proceedings of the 2018 Conference on Empirical Methods in Na tural Language Processing , Brussels, Belgium, October-November

work page 2018

[58] [58]

Models and dat asets for cross-lingual summarisation

Laura Perez-Beltrachini and Mirella Lapata. Models and dat asets for cross-lingual summarisation. arXiv preprint arXiv:2202.09583 ,

work page arXiv

[59] [59]

SQuAD: 100,000+ Questions for Machine Comprehension of Text

Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Perc y Liang. Squad: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250 ,

work page internal anchor Pith review Pith/arXiv arXiv

[60] [60]

Know What You Don't Know: Unanswerable Questions for SQuAD

Pranav Rajpurkar, Robin Jia, and Percy Liang. Know what you d on’t know: Unanswerable questions for squad. arXiv preprint arXiv:1806.03822 ,

work page internal anchor Pith review Pith/arXiv arXiv

[61] [61]

Gender Bias in Coreference Resolution

doi: 10.1037/h0037350. Rachel Rudinger, Jason Naradowsky, Brian Leonard, and Benj amin Van Durme. Gender bias in coreference resolution. arXiv preprint arXiv:1804.09301 ,

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1037/h0037350

[62] [62]

Im not Racist but

Abel Salinas, Louis Penaﬁel, Robert McCormack, and Fred Mor statter. " im not racist but...": Dis- covering bias in the internal knowledge of large language mo dels. arXiv preprint arXiv:2310.08780,

work page arXiv

[63] [63]

Xing , title =

Zhiqiang Shen, Tianhua Tao, Liqun Ma, Willie Neiswanger, Jo el Hestness, Natalia Vassilieva, Daria Soboleva, and Eric Xing. Slimpajama-dc: Understanding dat a combinations for llm training. arXiv preprint arXiv:2309.10818 ,

work page arXiv

[64] [64]

The woman worked as a babysit- ter: On biases in language generation

Emily Sheng, Kai-Wei Chang, Prem Natarajan, and Nanyun Peng . The woman worked as a babysit- ter: On biases in language generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Join t Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3407–3412,

work page 2019

[65] [65]

Culturebank: An online community-driven knowl edge base towards culturally aware language technologies

Weiyan Shi, Ryan Li, Yutong Zhang, Caleb Ziems, Raya Horesh, Rogério Abreu de Paula, Diyi Yang, et al. Culturebank: An online community-driven knowl edge base towards culturally aware language technologies. arXiv preprint arXiv:2404.15238 ,

work page arXiv

[66] [66]

Large language model s as subpopulation representative models: A review

Gabriel Simmons and Christopher Hare. Large language model s as subpopulation representative models: A review. arXiv preprint arXiv:2310.17888 ,

work page arXiv

[67] [67]

Dropout: a simple way to prevent neural networks from overﬁt ting

Nitish Srivastava, Geoﬀrey Hinton, Alex Krizhevsky, Ilya S utskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overﬁt ting. The Journal of Machine Learning Research, 15(1):1929–1958,

work page 1929

[68] [68]

Mitigating Gender Bias in Natural Language Processing: Literature Review

Tony Sun, Andrew Gaut, Shirlyn Tang, Yuxin Huang, Mai ElSher ief, Jieyu Zhao, Diba Mirza, Elizabeth Belding, Kai-Wei Chang, and William Yang Wang. Mi tigating gender bias in natural language processing: Literature review. arXiv preprint arXiv:1906.08976 ,

work page internal anchor Pith review Pith/arXiv arXiv 1906

[69] [69]

Lost in Translation: Loss and Decay of Linguistic Richness in Machine Translation

Eva Vanmassenhove, Dimitar Shterionov, and Andy Way. Lost i n translation: Loss and decay of linguistic richness in machine translation. arXiv preprint arXiv:1906.12068 ,

work page internal anchor Pith review Pith/arXiv arXiv 1906

[70] [70]

URL https://projecteuclid.org/euclid.aos/1176345802

doi: 10.1214/aos/1176345802. URL https://projecteuclid.org/euclid.aos/1176345802. A Vaswani. Attention is all you need. Advances in Neural Information Processing Systems ,

work page doi:10.1214/aos/1176345802

[71] [71]

Cross-lingual semant ic similarity of words as the similarity of their semantic word responses

Ivan Vulic and Marie-Francine Moens. Cross-lingual semant ic similarity of words as the similarity of their semantic word responses. In Proceedings of the 2013 Conference of the North Amer- ican Chapter of the Association for Computational Linguist ics: Human Language Technologies (NAACL-HLT 2013), pages 106–116. ACL; East Stroudsburg, PA,

work page 2013

[72] [72]

Large language models cannot replace human participants because they cannot portray identity groups

Angelina Wang, Jamie Morgenstern, and John P Dickerson. Large language models cannot replace human participants because they cannot portray identity groups. arXiv preprint arXiv:2402.01908, 2024a. Xinru Wang, Hannah Kim, Sajjadur Rahman, Kushan Mitra, and Zhengjie Miao. Human-llm collaborative annotation through effective verification of llm labels. In P...

[73] [73]

Measuring and reducing gendered correlations in pre-trained models

Kellie Webster, Xuezhi Wang, Ian Tenney, Alex Beutel, Emily Pitler, Ellie Pavlick, Jilin Chen, Ed Chi, and Slav Petrov. Measuring and reducing gendered correlations in pre-trained models. arXiv preprint arXiv:2010.06032,

[74] [74]

Auditing large language models for enhanced text-based stereotype detection and probing-based bias evaluation

Zekun Wu, Sahan Bulathwela, Maria Perez-Ortiz, and Adriano Soares Koshiyama. Auditing large language models for enhanced text-based stereotype detection and probing-based bias evaluation. arXiv preprint arXiv:2404.01768,

[75] [75]

Finbert: A pretrained language model for financial communications

Yuqi Yang, Yuan Yuan, and Lei Liu. Finbert: A pretrained language model for financial communications. arXiv preprint arXiv:2006.08097,

[76] [76]

Causal prompting: Debiasing large language model prompting based on front-door adjustment

Congzhi Zhang, Linhai Zhang, Deyu Zhou, and Guoqiang Xu. Causal prompting: Debiasing large language model prompting based on front-door adjustment. arXiv preprint arXiv:2403.02738,

[77] [77]

Deep learning based recommender system: A survey and new perspectives

Shuai Zhang, Lina Yao, Aixin Sun, and Yi Tay. Deep learning based recommender system: A survey and new perspectives. ACM computing surveys (CSUR), 52(1):1–38, 2019a. Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, and Bill Dolan. Dialogpt: Large-scale generative pre-training for conversationa...

[78] [78]

Gender Bias in Contextualized Word Embeddings

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. Gender bias in coreference resolution: Evaluation and debiasing methods. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics, pages 15–20, 2018a. Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei ...

[79] [79]

Appendix A

Association for Computational Linguistics.

Appendix A. Examples of Extrinsic Biases

A.1 Natural Language Understanding (NLU) tasks

NLU encompasses a broad range of tasks that aim to improve comprehension of input sequences (Chang et al., 2024). It seeks to grasp the deeper connotations and implications inherent in human communication, focusing on wha...

[80] [80]

This task is crucial for accurately interpreting the meaning of sentences, especially in cases where pronouns, names, or other referential expressions are used. The primary goal of coreference resolution is to correctly link pronouns like “he,” “she,” or “it” and definite descriptions like “the CEO” to the appropriate entity mentioned earlier in the tex...
