Bias in Large Language Models: Origin, Evaluation, and Mitigation
Abstract
Large Language Models (LLMs) have revolutionized natural language processing, but their susceptibility to biases poses significant challenges. This comprehensive review examines the landscape of bias in LLMs, from its origins to current mitigation strategies. We categorize biases as intrinsic and extrinsic, analyzing their manifestations in various NLP tasks. The review critically assesses a range of bias evaluation methods, including data-level, model-level, and output-level approaches, providing researchers with a robust toolkit for bias detection. We further explore mitigation strategies, categorizing them into pre-model, intra-model, and post-model techniques, highlighting their effectiveness and limitations. Ethical and legal implications of biased LLMs are discussed, emphasizing potential harms in real-world applications such as healthcare and criminal justice. By synthesizing current knowledge on bias in LLMs, this review contributes to the ongoing effort to develop fair and responsible AI systems. Our work serves as a comprehensive resource for researchers and practitioners working towards understanding, evaluating, and mitigating bias in LLMs, fostering the development of more equitable AI technologies.
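As an illustration of the output-level evaluation approaches the abstract groups together, below is a minimal sketch of a counterfactual prompt probe: generate completions for prompt pairs that differ only in a protected attribute, score them, and compare the averages. The template, the group pair, and the generate_fn/score_fn callables are illustrative assumptions, not methods taken from the paper.

# Minimal sketch of an output-level bias probe. `generate_fn` and `score_fn`
# are placeholders for whatever model call and scorer you use (e.g., an LLM
# API and a sentiment classifier); neither is prescribed by the paper.
from statistics import mean

TEMPLATE = "The {group} person applied for the job and the interviewer thought"
GROUPS = ("Black", "white")  # illustrative pair of protected-attribute values

def output_level_gap(generate_fn, score_fn, template=TEMPLATE, groups=GROUPS, n=20):
    """Return the mean score difference between the two groups' completions."""
    scores = {}
    for group in groups:
        prompt = template.format(group=group)
        completions = [generate_fn(prompt) for _ in range(n)]
        scores[group] = mean(score_fn(text) for text in completions)
    return scores[groups[0]] - scores[groups[1]]

if __name__ == "__main__":
    # Dummy stand-ins so the sketch runs as-is; swap in a real model and scorer.
    import random
    fake_generate = lambda prompt: prompt + " they were " + random.choice(["great", "unqualified"])
    fake_score = lambda text: 1.0 if "great" in text else 0.0
    print(f"score gap: {output_level_gap(fake_generate, fake_score):+.2f}")

A gap close to zero across many templates and attribute pairs is weak evidence of output-level parity; a consistent sign and magnitude is the kind of disparity the review's evaluation taxonomy is meant to surface.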
This paper has not been read by Pith yet.
Forward citations
Cited by 4 Pith papers
- ReLay: Personalized LLM-Generated Plain-Language Summaries for Better Understanding, but at What Cost?
Personalized LLM-generated plain language summaries improve lay readers' comprehension and quality ratings but increase risks of reinforcing biases and introducing hallucinations compared to static expert summaries.
- Counting Worlds Branching Time Semantics for post-hoc Bias Mitigation in generative AI
CTLF is a branching-time logic with counting-worlds semantics for verifying fairness in probability distributions over protected attributes, predicting bias bounds, and calculating which outputs to remove in generative AI systems.
- When AI reviews science: Can we trust the referee?
AI peer review systems are vulnerable to prompt injections, prestige biases, assertion strength effects, and contextual poisoning, as demonstrated by a new attack taxonomy and causal experiments on real conference submissions.
- FAIR_XAI: Improving Multimodal Foundation Model Fairness via Explainability for Wellbeing Assessment
Vision-language models for wellbeing assessment exhibit dataset-dependent performance and demographic biases, with explainability interventions providing inconsistent fairness gains at potential accuracy costs.