
Bias in Large Language Models: Origin, Evaluation, and Mitigation

12 Pith papers cite this work. Polarity classification is still indexing.

abstract

Large Language Models (LLMs) have revolutionized natural language processing, but their susceptibility to biases poses significant challenges. This comprehensive review examines the landscape of bias in LLMs, from its origins to current mitigation strategies. We categorize biases as intrinsic and extrinsic, analyzing their manifestations in various NLP tasks. The review critically assesses a range of bias evaluation methods, including data-level, model-level, and output-level approaches, providing researchers with a robust toolkit for bias detection. We further explore mitigation strategies, categorizing them into pre-model, intra-model, and post-model techniques, highlighting their effectiveness and limitations. Ethical and legal implications of biased LLMs are discussed, emphasizing potential harms in real-world applications such as healthcare and criminal justice. By synthesizing current knowledge on bias in LLMs, this review contributes to the ongoing effort to develop fair and responsible AI systems. Our work serves as a comprehensive resource for researchers and practitioners working towards understanding, evaluating, and mitigating bias in LLMs, fostering the development of more equitable AI technologies.

citation-role summary

background 2

citation-polarity summary

verdicts

UNVERDICTED 12


representative citing papers

Side-by-side Comparison Amplifies Dialect Bias in Language Models

cs.CL · 2026-05-23 · unverdicted · novelty 6.0

Side-by-side comparison of intent-equivalent SAE and AAVE tweets significantly exacerbates covert dialect bias in LMs compared to isolated evaluation, with explicit dialect labels worsening the effect further.

When AI reviews science: Can we trust the referee?

cs.AI · 2026-04-26 · unverdicted · novelty 6.0

AI peer review systems are vulnerable to prompt injections, prestige biases, assertion strength effects, and contextual poisoning, as demonstrated by a new attack taxonomy and causal experiments on real conference submissions.

A Survey on LLM-as-a-Judge

cs.CL · 2024-11-23 · unverdicted · novelty 4.0

A survey on LLM-as-a-Judge that reviews reliability strategies, proposes evaluation methods, and introduces a novel benchmark for assessing such systems.
