Bias in Large Language Models: Origin, Evaluation, and Mitigation

2024 · cs.CL · arXiv 2411.10915

12 Pith papers cite this work. Polarity classification is still indexing.


abstract

Large Language Models (LLMs) have revolutionized natural language processing, but their susceptibility to biases poses significant challenges. This comprehensive review examines the landscape of bias in LLMs, from its origins to current mitigation strategies. We categorize biases as intrinsic and extrinsic, analyzing their manifestations in various NLP tasks. The review critically assesses a range of bias evaluation methods, including data-level, model-level, and output-level approaches, providing researchers with a robust toolkit for bias detection. We further explore mitigation strategies, categorizing them into pre-model, intra-model, and post-model techniques, highlighting their effectiveness and limitations. Ethical and legal implications of biased LLMs are discussed, emphasizing potential harms in real-world applications such as healthcare and criminal justice. By synthesizing current knowledge on bias in LLMs, this review contributes to the ongoing effort to develop fair and responsible AI systems. Our work serves as a comprehensive resource for researchers and practitioners working towards understanding, evaluating, and mitigating bias in LLMs, fostering the development of more equitable AI technologies.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

A Practical Upper Bound on Selection Bias Effects in Medical Prediction Models

cs.LG · 2026-05-30 · unverdicted · novelty 7.0

A new upper bound is derived for the worst-case effect of selection bias on medical prediction model performance under partial observation of the selection process and target data.

ReLay: Personalized LLM-Generated Plain-Language Summaries for Better Understanding, but at What Cost?

cs.CL · 2026-05-01 · unverdicted · novelty 7.0

Personalized LLM-generated plain language summaries improve lay readers' comprehension and quality ratings but increase risks of reinforcing biases and introducing hallucinations compared to static expert summaries.

Counting Worlds Branching Time Semantics for post-hoc Bias Mitigation in generative AI

cs.LO · 2026-04-21 · unverdicted · novelty 7.0

CTLF is a branching-time logic with counting-worlds semantics for verifying fairness in probability distributions over protected attributes, predicting bias bounds, and calculating outputs to remove in generative AI series.

Entropy Minimization without Model Collapse: Mitigating Prediction Bias in Medical Imaging

cs.LG · 2026-06-01 · unverdicted · novelty 6.0

Entropy minimization amplifies prediction bias from merged feature clusters under distribution shifts, and DSBR mitigates collapse by equalizing predicted class contributions to the unsupervised loss.

Side-by-side Comparison Amplifies Dialect Bias in Language Models

cs.CL · 2026-05-23 · unverdicted · novelty 6.0

Side-by-side comparison of intent-equivalent SAE and AAVE tweets significantly exacerbates covert dialect bias in LMs compared to isolated evaluation, with explicit dialect labels worsening the effect further.

When AI reviews science: Can we trust the referee?

cs.AI · 2026-04-26 · unverdicted · novelty 6.0

AI peer review systems are vulnerable to prompt injections, prestige biases, assertion strength effects, and contextual poisoning, as demonstrated by a new attack taxonomy and causal experiments on real conference submissions.

Safe for Whom? Rethinking How We Evaluate the Safety of LLMs for Real Users

cs.AI · 2025-12-11 · unverdicted · novelty 6.0

LLM safety evaluations for personal advice must test responses against diverse user vulnerability profiles, since context-blind ratings overestimate safety and realistic prompt context does not fix the problem.

A Study of LLMs' Preferences for Libraries and Programming Languages

cs.SE · 2025-03-21 · unverdicted · novelty 6.0

Empirical study of eight LLMs finds overuse of popular libraries like NumPy in up to 45% of unnecessary cases and strong default preference for Python even when suboptimal.

Curation of a Cardiology Interface Terminology for Highlighting Electronic Health Records using Machine Learning

cs.AI · 2026-06-06 · unverdicted · novelty 4.0

A three-phase ML-assisted curation creates a Cardiology Interface Terminology (CIT) from SNOMED and EHR data that highlights details in cardiology notes with 74.21% coverage, 98.2% average completeness, and 84.2% average conciseness on test data.

BiasGRPO: Stabilizing Bias Mitigation in High-Variance Reward Landscapes via Group-Relative Policy Optimization

cs.AI · 2026-06-03 · unverdicted · novelty 4.0

BiasGRPO uses group-relative baselines in online policy optimization plus a custom bias reward model to reduce instability in LLM bias mitigation and outperform DPO and PPO on benchmarks.

FAIR_XAI: Improving Multimodal Foundation Model Fairness via Explainability for Wellbeing Assessment

cs.AI · 2026-04-26 · unverdicted · novelty 4.0

Vision-language models for wellbeing assessment exhibit dataset-dependent performance and demographic biases, with explainability interventions providing inconsistent fairness gains at potential accuracy costs.

A Survey on LLM-as-a-Judge

cs.CL · 2024-11-23 · unverdicted · novelty 4.0

A survey on LLM-as-a-Judge that reviews reliability strategies, proposes evaluation methods, and introduces a novel benchmark for assessing such systems.

citing papers explorer

A Practical Upper Bound on Selection Bias Effects in Medical Prediction Models
cs.LG · 2026-05-30 · unverdicted · none · ref 41 · internal anchor

ReLay: Personalized LLM-Generated Plain-Language Summaries for Better Understanding, but at What Cost?
cs.CL · 2026-05-01 · unverdicted · none · ref 53 · internal anchor

Counting Worlds Branching Time Semantics for post-hoc Bias Mitigation in generative AI
cs.LO · 2026-04-21 · unverdicted · none · ref 17 · internal anchor

Entropy Minimization without Model Collapse: Mitigating Prediction Bias in Medical Imaging
cs.LG · 2026-06-01 · unverdicted · none · ref 27 · internal anchor

Side-by-side Comparison Amplifies Dialect Bias in Language Models
cs.CL · 2026-05-23 · unverdicted · none · ref 21 · internal anchor

When AI reviews science: Can we trust the referee?
cs.AI · 2026-04-26 · unverdicted · none · ref 112 · internal anchor

Safe for Whom? Rethinking How We Evaluate the Safety of LLMs for Real Users
cs.AI · 2025-12-11 · unverdicted · none · ref 18 · internal anchor

A Study of LLMs' Preferences for Libraries and Programming Languages
cs.SE · 2025-03-21 · unverdicted · none · ref 25 · internal anchor

Curation of a Cardiology Interface Terminology for Highlighting Electronic Health Records using Machine Learning
cs.AI · 2026-06-06 · unverdicted · none · ref 57 · internal anchor

BiasGRPO: Stabilizing Bias Mitigation in High-Variance Reward Landscapes via Group-Relative Policy Optimization
cs.AI · 2026-06-03 · unverdicted · none · ref 2 · internal anchor

FAIR_XAI: Improving Multimodal Foundation Model Fairness via Explainability for Wellbeing Assessment
cs.AI · 2026-04-26 · unverdicted · none · ref 34 · internal anchor

A Survey on LLM-as-a-Judge
cs.CL · 2024-11-23 · unverdicted · none · ref 43 · internal anchor
