A survey on fairness in large language models

Li Y, Du M, Song R, Wang X, Wang Y · 2023 · arXiv 2308.10149

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

StereoTales: A Multilingual Framework for Open-Ended Stereotype Discovery in LLMs

cs.CY · 2026-05-11 · accept · novelty 7.0 · 2 refs

StereoTales shows that all tested LLMs emit harmful stereotypes in open-ended stories, with associations adapting to prompt language and targeting locally salient groups rather than transferring uniformly across languages.

SCOPE: A Dataset of Stereotyped Prompts for Counterfactual Fairness Assessment of LLMs

cs.SE · 2026-04-07 · unverdicted · novelty 7.0

SCOPE is a new large-scale dataset of counterfactual prompt pairs for evaluating fairness and stereotype sensitivity in LLMs across 1,438 topics, nine bias dimensions, 1,536 groups, and four communicative intents.

Estimating Grammatical Gender Directions in Contextual Embeddings under Controlled and Natural Contexts

cs.CL · 2026-06-29 · unverdicted · novelty 6.0

A framework estimates grammatical gender directions in contextual embeddings via controlled and natural contexts, finding unweighted controlled contexts and centroid estimators yield the purest directions.

FairNVT: Improving Fairness via Noise Injection in Vision Transformers

cs.CV · 2026-04-18 · unverdicted · novelty 6.0

FairNVT injects calibrated noise into sensitive embeddings of transformer encoders to jointly improve representation-level and prediction-level fairness metrics without degrading task performance.

Hedging and Non-Affirmation: Quantifying LLM Alignment on Questions of Human Rights

cs.CY · 2025-02-26 · unverdicted · novelty 6.0

LLMs exhibit identity-dependent hedging on human rights questions, with group identity as the strongest predictor among tested factors, and group steering mitigates the disparity.

Resume-ing Control: (Mis)Perceptions of Agency Around GenAI Use in Recruiting Workflows

cs.CY · 2026-04-29 · unverdicted · novelty 5.0

Recruiters perceive themselves as retaining agency over GenAI in hiring pipelines, yet GenAI invisibly architects core evaluation inputs, producing only marginal efficiency gains at the cost of deskilling.

Intersectional Fairness in Large Language Models

cs.CL · 2026-04-22 · unverdicted · novelty 5.0

LLMs are more accurate when answers match stereotypes in clear contexts, especially for race-gender combinations, and no tested model shows consistent fairness or reliability across intersectional groups.

Fairness Testing of Large Language Models in Role-Playing

cs.CY · 2024-11-01 · unverdicted · novelty 5.0

The study generates 550 roles and 33,000 questions to evaluate 10 LLMs in role-playing and finds 107,580 biased responses.

Examining Agents' Bias Amplification versus Suppression in Multi-Agent Systems

cs.AI · 2026-05-27 · unverdicted · novelty 4.0

Empirical tests show that uniformly biased agents in multi-agent LLM systems produce system-wide bias exceeding the sum of individual biases, quantified via a new Favor Bias Strength metric.

A Survey on the Memory Mechanism of Large Language Model based Agents

cs.AI · 2024-04-21 · accept · novelty 3.0

A systematic review of memory designs, evaluation methods, applications, limitations, and future directions for LLM-based agents.

citing papers explorer

Showing 10 of 10 citing papers.

StereoTales: A Multilingual Framework for Open-Ended Stereotype Discovery in LLMs

cs.CY · 2026-05-11 · accept · none · ref 72 · 2 links

SCOPE: A Dataset of Stereotyped Prompts for Counterfactual Fairness Assessment of LLMs

cs.SE · 2026-04-07 · unverdicted · none · ref 9

Estimating Grammatical Gender Directions in Contextual Embeddings under Controlled and Natural Contexts

cs.CL · 2026-06-29 · unverdicted · none · ref 8

FairNVT: Improving Fairness via Noise Injection in Vision Transformers

cs.CV · 2026-04-18 · unverdicted · none · ref 1

Hedging and Non-Affirmation: Quantifying LLM Alignment on Questions of Human Rights

cs.CY · 2025-02-26 · unverdicted · none · ref 30

Resume-ing Control: (Mis)Perceptions of Agency Around GenAI Use in Recruiting Workflows

cs.CY · 2026-04-29 · unverdicted · none · ref 55

Intersectional Fairness in Large Language Models

cs.CL · 2026-04-22 · unverdicted · none · ref 18

Fairness Testing of Large Language Models in Role-Playing

cs.CY · 2024-11-01 · unverdicted · none · ref 33

Examining Agents' Bias Amplification versus Suppression in Multi-Agent Systems

cs.AI · 2026-05-27 · unverdicted · none · ref 3

A Survey on the Memory Mechanism of Large Language Model based Agents

cs.AI · 2024-04-21 · accept · none · ref 62
