Should chatgpt be biased? challenges and risks of bias in large language models

Emilio Ferrara · 2023 · arXiv 2304.03738

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

background 2 other 1

citation-polarity summary

background 2 unclear 1

representative citing papers

Can We Trust a Black-box LLM? LLM Untrustworthy Boundary Detection via Bias-Diffusion and Multi-Agent Reinforcement Learning

cs.AI · 2026-04-07 · unverdicted · novelty 6.0

GMRL-BD detects untrustworthy topic boundaries for black-box LLMs by combining bias-diffusion on a Wikipedia KG with multi-agent RL, supported by a released dataset labeling biases in models like Llama2 and Qwen2.

Beyond Static Benchmarks: Synthesizing Harmful Content via Persona-based Simulation for Robust Evaluation

cs.CL · 2026-04-18 · unverdicted · novelty 5.0

A two-dimensional persona simulation framework generates harmful content that is more challenging to detect and comparably diverse to human-curated datasets for robust evaluation of detection systems.

Guardian-as-an-Advisor: Advancing Next-Generation Guardian Models for Trustworthy LLMs

cs.LG · 2026-04-08 · unverdicted · novelty 5.0

Guardian-as-an-Advisor prepends risk labels and explanations from a guardian model to queries, improving LLM safety compliance and reducing over-refusal while adding minimal compute overhead.

Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment

cs.AI · 2023-08-10 · accept · novelty 5.0

Survey organizes LLM trustworthiness into seven categories and 29 sub-categories, measures eight sub-categories on popular models, and finds that more aligned models generally score higher but with varying effectiveness.

Data-Centric Foundation Models in Computational Healthcare: A Survey

cs.LG · 2024-01-04 · unverdicted · novelty 3.0

The paper surveys data-centric strategies for foundation models in computational healthcare and supplies a curated list of related models and datasets.

Social and Ethical Risks Posed by General-Purpose LLMs for Settling Newcomers in Canada

cs.CY · 2024-07-15 · unverdicted · novelty 2.0

The paper identifies social and ethical risks from unguarded use of general-purpose LLMs in Canadian newcomer settlement and advocates for AI literacy programs plus customized models with human oversight.

citing papers explorer

Showing 6 of 6 citing papers.

Can We Trust a Black-box LLM? LLM Untrustworthy Boundary Detection via Bias-Diffusion and Multi-Agent Reinforcement Learning cs.AI · 2026-04-07 · unverdicted · none · ref 8
GMRL-BD detects untrustworthy topic boundaries for black-box LLMs by combining bias-diffusion on a Wikipedia KG with multi-agent RL, supported by a released dataset labeling biases in models like Llama2 and Qwen2.
Beyond Static Benchmarks: Synthesizing Harmful Content via Persona-based Simulation for Robust Evaluation cs.CL · 2026-04-18 · unverdicted · none · ref 1
A two-dimensional persona simulation framework generates harmful content that is more challenging to detect and comparably diverse to human-curated datasets for robust evaluation of detection systems.
Guardian-as-an-Advisor: Advancing Next-Generation Guardian Models for Trustworthy LLMs cs.LG · 2026-04-08 · unverdicted · none · ref 17
Guardian-as-an-Advisor prepends risk labels and explanations from a guardian model to queries, improving LLM safety compliance and reducing over-refusal while adding minimal compute overhead.
Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment cs.AI · 2023-08-10 · accept · none · ref 220
Survey organizes LLM trustworthiness into seven categories and 29 sub-categories, measures eight sub-categories on popular models, and finds that more aligned models generally score higher but with varying effectiveness.
Data-Centric Foundation Models in Computational Healthcare: A Survey cs.LG · 2024-01-04 · unverdicted · none · ref 84
The paper surveys data-centric strategies for foundation models in computational healthcare and supplies a curated list of related models and datasets.
Social and Ethical Risks Posed by General-Purpose LLMs for Settling Newcomers in Canada cs.CY · 2024-07-15 · unverdicted · none · ref 30
The paper identifies social and ethical risks from unguarded use of general-purpose LLMs in Canadian newcomer settlement and advocates for AI literacy programs plus customized models with human oversight.

Should chatgpt be biased? challenges and risks of bias in large language models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer